Template Matching - NCC

  • Dear Kevin,


    may I ask

    - Where does the parallelism error come from? (I assume both the "Less" and "PixeltoVal" operators should be single I/O.)

    - Also, does the "CoordinateY" operator only read the Y coordinate of each pixel and output those values as a stream, so that it works like a kind of line scanning?


    Best

    BL

    pasted-from-clipboard.png

  • Hi Bingnan,

    The Y-Coordinate operator gives you the Y coordinate of each pixel; as you correctly mentioned, it only counts in the Y direction. There is also an X-Coordinate operator, which counts in the X direction.

    Please note that in your current design you are overwriting the image data coming from the camera, because the Y-Coordinate operator only outputs the pixel index. If you want to use both, I suggest placing a Branch operator before it.
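As a rough software analogy (plain Python, not VisualApplets code; the names `image`, `coordinate_y` and `branch` are made up for illustration), the following sketch shows why replacing the pixel stream with Y coordinates loses the gray values, and how duplicating the stream first keeps both:

```python
# Toy 2x3 image of gray values (hypothetical data).
image = [[5, 200, 7],
         [9, 3, 255]]

def coordinate_y(img):
    """Replace every pixel by the index of its line (its Y coordinate)."""
    return [[y for _ in row] for y, row in enumerate(img)]

def branch(img):
    """Duplicate the stream so two paths can process it independently."""
    return [row[:] for row in img], [row[:] for row in img]

pixels, for_coords = branch(image)
coord_y = coordinate_y(for_coords)

print(coord_y)  # line index of every pixel
print(pixels)   # the original gray values are still available
```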

    Regarding your question about parallelism: if you check the documentation of PixelToSignal, you will see the following table:

    pasted-from-clipboard.png

    Here you can see that the Input Link I requires a parallelism of 1.

    Here are some pointers that may help you:

    ParallelDN -> Set it to 1

    SplitParallel -> Here you can split the parallel pixels into links that each have a parallelism of 1 and then process them further.

    It all depends on what your use case exactly is.
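To make the two options more concrete, here is a minimal Python model of a link with a given parallelism. It is only an analogy under the assumption that the line length is a multiple of the parallelism; the function names are hypothetical and do not correspond to any VisualApplets API.

```python
def transport(line, parallelism):
    """Group one image line into words of `parallelism` neighboring pixels."""
    return [line[i:i + parallelism] for i in range(0, len(line), parallelism)]

def parallel_dn(words):
    """Model of reducing the parallelism to 1: same pixels, one per step."""
    return [[pixel] for word in words for pixel in word]

def split_parallel(words, parallelism):
    """Model of splitting one link into `parallelism` links of parallelism 1."""
    return [[word[k] for word in words] for k in range(parallelism)]

line = [10, 20, 30, 40, 50, 60, 70, 80]
words = transport(line, 4)         # 2 steps, 4 pixels each
serial = parallel_dn(words)        # 8 steps, 1 pixel each
links = split_parallel(words, 4)   # 4 links, 2 steps each

print(words)
print(serial)
print(links)
```

In this model, reducing the parallelism keeps a single link but needs more steps, while splitting keeps the number of steps and produces several links that each satisfy a parallelism-1 input.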


    Best Regards

    Kevin

  • pasted-from-clipboard.png

    Hi Kevin,


    Thank you for the information. Attached is the output picture, which I want to convert into a trigger signal. My idea is to use a "line scan": the white dots (detected objects) move from left to right, and whenever a white dot passes, I output a signal using a thresholding operator (for example GreaterThan). How should I do this?

    Thank you in advance.


    Best

    Bingnan

  • Hi Bingnan,

    I think the operator you are looking for is called "SplitImage". In the module properties you can set the height to "1", so the image gets split into individual lines.

    I will attach an example design. Note that you need to adapt it to your input image stream and to your parallelism (it currently works with a parallelism of 2).

    Best Regards

    Kevin

    Note: This was not tested on any hardware.

  • Hi


    In this case, if for example the height of the image is N, do I need to make N branches (parallelism)?

    Is there a way to do this like a "for" loop, e.g. for (i=0; i<N; i++)?


    Best,

    B

  • Hi Bingnan,

    No, the parallelism is usually in the "width". This means that with a parallelism of 8, 8 pixels are transported in parallel. These pixels are usually neighbors in one line.

    If you set the height in the module properties of the "SplitImage" operator to 1, it will divide an image with N lines into N images.

    The parallelism stays the same: if you have a parallelism of 8, you need to split it into 8 parallel links and then combine the logic with an OR operator at the end.

    This is done similarly in the example design that I attached before.
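The idea described above can be sketched in plain Python as follows. This is not the attached design and not VisualApplets code; `split_image`, `line_has_object`, the threshold of 128 and the test image are assumptions chosen for illustration.

```python
def split_image(image, height=1):
    """Divide an image with N lines into sub-images of `height` lines each."""
    return [image[i:i + height] for i in range(0, len(image), height)]

def line_has_object(line, threshold, parallelism):
    """Compare the pixels of each parallel word to the threshold, then OR the results."""
    hit = False
    for start in range(0, len(line), parallelism):
        word = line[start:start + parallelism]
        comparisons = [pixel > threshold for pixel in word]  # one comparison per parallel pixel
        hit = hit or any(comparisons)                        # OR over the parallel results
    return hit

# Hypothetical 3-line image with one white dot in the middle line.
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 255, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

triggers = [
    line_has_object(sub_image[0], threshold=128, parallelism=8)
    for sub_image in split_image(image, height=1)
]
print(triggers)
```

No loop over the N lines has to be built by hand in this model: the lines arrive one after another as a stream, and the same comparison logic is applied to each of them.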

    Best Regards,

    Kevin

  • Hi


    Attached is my design; during the design, I had some questions:

    1. The operator IS_GreaterThan in the box "Trigger" has the properties of Bit width and Parallelism 8, so I had to split the parallel link into 8. Is it because 8 bit is the size of the integer data?

    2. After Split_Img, should I add ParallelDn?

    3. I did not add a selectROI operator because I want to adjust the ROI manually in real time while monitoring the camera. Would that cause any problems, for example with resource allocation?


    I would be grateful if you could give any advice on my design. Thank you in advance.


    Best

    Bingnan

  • Dear Kevin,


    When I build it, the DRC reports no errors, but the build into a .hap file fails. Attached are my design and the log of the build process. Could you please check what the problem is?


    Thanks,

    Bingnan

  • Hi Bingnan,

    In the netlist generation step you can see that the hardware resources on the FPGA are exhausted.

    pasted-from-clipboard.png



    None of the resources should be above 100%.


    Below that you will also find a list of the operators and elements that require too many resources.

    Additionally, you can check in the top bar:
    pasted-from-clipboard.png

    There you will find a tabular view of the required resources, which can also be sorted.

    Finally, you can right-click on any operator or H-Box and click on "FPGA Resources" to see how many resources it needs. If the entry is grayed out, you need to run DRC2.


    If we take a look at the first resources, we can see that a lot of them are used in NCC/Sigma_R

    pasted-from-clipboard.png




    The kernels are 22x22, which is quite large. Would it be possible to downsample the input images, and therefore downsample the mask to 11x11?

    Also, the input parallelism is 8. With this kernel size, that becomes 22x22x8 pixels computed in parallel. If you put a ParallelDN operator before the NCC H-Box, you can save a lot of resources.
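The effect of both suggestions on the number of pixels computed in parallel can be checked with a few lines of Python. The pixel count is only a rough proxy, used here as an assumption; the actual LUT usage does not have to scale by the same factor.

```python
def window_pixels(kernel_size, parallelism):
    """Pixels processed in parallel for a square kernel at a given parallelism."""
    return kernel_size * kernel_size * parallelism

original = window_pixels(22, 8)   # 22x22 kernel, parallelism 8
serial = window_pixels(22, 1)     # same kernel after reducing the parallelism to 1
reduced = window_pixels(11, 1)    # 11x11 kernel, parallelism 1

print(original, serial, reduced)
print(original // reduced)        # reduction factor in parallel pixels
```

Reducing the parallelism lowers the number of pixels handled per clock cycle, so the achievable throughput should be checked against the requirements of the application.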



    Lastly, there are some other things that can be done to distribute the resource usage: in some arithmetic operators you can choose the ImplementationType.

    pasted-from-clipboard.png

    Here you can see that there is still some EmbeddedALU left, so it would be helpful not to use LUTs here.


    By applying all of these ideas I was able to reduce the FPGA resource usage from the previous 766% to 105% on the LUTs, and to below 100% on all others.

    pasted-from-clipboard.png

    The applet is attached; however, it won't produce the desired results, since the kernels are now 11x11 and I did not adapt any of them.

    You may take this as a reference to further improve your design.

    Best Regards

    Kevin

  • Hi Kevin,


    Thanks so much. Just one question: why can some operators specify the resource type (ImplementationType) and others not?

    And when ImplementationType = AUTO, what principle does VA use to allocate resources?


    Best

    Bingnan

  • Dear Bingnan Liu,


    In the case of "ImplementationType = AUTO", VisualApplets (VA) decides which resource is used. Changing the ImplementationType only needs to be discussed once the DRC reports an FPGA resource conflict or one becomes obvious. A good example would be running out of logic resources and moving several operators to ALUs/DSPs in order to save logic/LUTs.


    Only specific operators have the ImplementationType feature, mostly those that require a lot of resources and have an alternative implementation.


    Please do not hesitate to post a VA design here; we can go through it together and give some recommendations on beneficial modifications.


    Best regards,
