Template Matching - NCC

  • Dear Kevin,


    May I ask:

    - Where does the parallelism error come from? (I assume both the "Less" and "PixeltoVal" operators should be single I/O.)

    - Also, does the operator "CoordinateY" only read the pixel values along the Y coordinate and output them as a stream, so that it is a kind of line scanning?


    Best

    BL

    pasted-from-clipboard.png

  • Hi Bingnan,

    The Y-Coordinate operator gives you the Y coordinate of each pixel; as you correctly mentioned, it only counts in the Y direction. There is also an X-Coordinate operator which counts in the X direction.

    Please note that in your current design you are overwriting the image data coming from the camera, because the Y-Coordinate operator only gives you the pixel index information. If you want to use both the image data and the coordinates, I suggest inserting a Branch operator before it.

    Regarding your question about the parallelism: if you check the documentation of PixelToSignal, you will see the following table:

    pasted-from-clipboard.png

    Here you can see that the Input Link I requires a parallelism of 1.

    Here are some pointers that may help you:

    ParallelDN -> Set the parallelism down to 1.

    SplitParallel -> Split the parallel pixels into links that each have a parallelism of 1 and then process them further.

    It all depends on what exactly your use case is.
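
    As a rough picture of the difference, here is a small Python sketch. It is only a software analogy of the stream semantics; the tuple stream, function names, and values are illustrative, not VisualApplets API:

        # Rough software model of a parallel pixel stream: each clock cycle
        # carries a tuple of `par` pixels (here par = 4 for illustration).
        stream = [(10, 20, 30, 40), (50, 60, 70, 80)]

        def parallel_dn(stream, out_par=1):
            """ParallelDN-like behaviour: serialize the wide stream into
            narrower tuples; throughput drops, and so does resource usage."""
            flat = [p for cycle in stream for p in cycle]
            return [tuple(flat[i:i + out_par]) for i in range(0, len(flat), out_par)]

        def split_parallel(stream):
            """SplitParallel-like behaviour: fan the wide link out into one
            parallelism-1 link per lane, processed side by side."""
            return [list(lane) for lane in zip(*stream)]

        print(parallel_dn(stream))    # [(10,), (20,), ..., (80,)]
        print(split_parallel(stream)) # [[10, 50], [20, 60], [30, 70], [40, 80]]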


    Best Regards

    Kevin

  • pasted-from-clipboard.png

    Hi Kevin,


    thank you for the information. Attached is the output picture, which I want to convert into a trigger signal. My idea is to use a "line scan": the white dots (detected objects) move from left to right, and whenever a white dot passes, I output a signal using a thresholding operator (for example GreaterThan). How should I do this?

    Thank you in advance.


    Best

    Bingnan

  • Hi Bingnan,

    I think the operator you are searching for is called "SplitImage". In the module properties you can set the Height to "1"; the image will then be split into individual lines.

    I will attach an example design. Note that you need to adapt it to your input image stream and to your parallelism (it currently works with a parallelism of 2).
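
    To illustrate what the operator does, here is a loose NumPy model (not the actual applet; the toy image is made up):

        import numpy as np

        # SplitImage with Height = 1 conceptually turns an H x W image
        # into H separate 1 x W images (one per line).
        img = np.arange(24).reshape(4, 6)  # toy 4x6 image
        lines = [img[y:y + 1, :] for y in range(img.shape[0])]
        assert len(lines) == img.shape[0] and lines[0].shape == (1, 6)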

    Best Regards

    Kevin

    Note: This was not tested on any hardware.

  • Hi


    In this case, if for example the height of the image is N, do I then need to make N branches (parallelism)?

    Is there a way to do this like a "for" loop, e.g. for (i = 0; i < N; i++)?


    Best,

    B

  • Hi Bingnan,

    No, the parallelism usually applies to the "width". This means that if you have a parallelism of 8, 8 pixels are transported in parallel; these pixels are usually the ones next to each other in one line.

    If you set the Height in the module properties of the "SplitImage" operator to 1, it will divide an image with N lines into N images.

    The parallelism stays the same, so if you have a parallelism of 8 you will need to split it into 8 parallel links and then combine the logic with an OR operator at the end (see the sketch below).

    Something similar is done in the example design that I attached before.
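
    As a minimal software model of that structure (the threshold value and pixel data here are made up; in the design this corresponds to a comparison per lane plus an OR operator):

        # One image line arrives as cycles of 8 parallel pixels.
        THRESHOLD = 128  # hypothetical trigger level
        cycles = [(0, 0, 200, 0, 0, 0, 0, 0), (0, 0, 0, 0, 0, 0, 0, 0)]

        for cycle in cycles:
            # Per-lane comparison, then OR over the 8 lane results.
            trigger = any(pixel > THRESHOLD for pixel in cycle)
            print(trigger)  # True for the first cycle, False for the second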

    Best Regards,

    Kevin

  • Hi


    Attached is my design; while working on it, I had some questions:

    1. The operator IS_GreaterThan in the "Trigger" box has a bit width and parallelism of 8, so I had to split in parallel to 8. Is that because 8 bits is the size of the integer data?

    2. Should I add a ParallelDn after Split_Img?

    3. I did not add the selectROI operator, because I want to adjust the ROI manually in real time while monitoring the camera. Would that cause any problems, e.g. with resource allocation?


    I would be grateful if you could give any advice on my design. Thank you in advance.


    Best

    Bingnan

  • Dear Kevin,


    When I build it, the DRC reports no errors, but the build into a .hap fails. Attached are my design and the build log. Could you please check what the problem is?


    Thanks,

    Bingnan

  • Hi Bingnan,

    In the netlist generation step you can see that the hardware resources of the FPGA are exhausted.

    pasted-from-clipboard.png



    None of the resources should be over 100%.


    Below that you also find a list of operators and elements that require too many resources.

    Additionally, you can check in the top bar:
    pasted-from-clipboard.png

    There you will find a tabular view of the required resources, which can also be sorted.

    Lastly, you can right-click on each operator or H-Box and click on "FPGA Resources" to see how many resources are needed by that operator or H-Box. If the entry is grayed out, you need to run DRC2.


    If we take a look at the first resource listing, we can see that a lot of it is used in NCC/Sigma_R.

    pasted-from-clipboard.png




    The kernels are 22x22, which is quite large. Would it be possible to downsample the input images, and therefore the mask, to 11x11?

    Also, the input parallelism is 8. Combined with the kernel size, this means 22x22x8 pixels are computed in parallel. If you put a ParallelDN operator before the NCC H-Box, you can save a lot of resources.
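
    As a back-of-the-envelope estimate (assuming a ParallelDN down to 1; the counts are proportional work units, not actual LUT numbers):

        # Pixels processed in parallel scale with kernel area x parallelism.
        def parallel_pixels(kernel_size: int, parallelism: int) -> int:
            return kernel_size * kernel_size * parallelism

        before = parallel_pixels(22, 8)  # 3872 pixels computed in parallel
        after = parallel_pixels(11, 1)   # 121 after downsampling + ParallelDN
        print(before, after, before / after)  # a ~32x reduction in parallel work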



    Finally, there are some other things that could be done to distribute the resource usage: in some arithmetic operators you can choose the ImplementationType.

    pasted-from-clipboard.png

    Here you can see that there is still some EmbeddedALU capacity left, so it would help not to use LUTs here.


    By applying all the proposed ideas I was able to reduce the FPGA resource usage from the previous 766% down to 105% on the LUTs, and below 100% on all other resources.

    pasted-from-clipboard.png

    The applet is attached; however, it won't produce the desired results, since the kernels are now 11x11 and I did not adapt any of them.

    You may take this as a reference to further improve your design.

    Best Regards

    Kevin

  • Hi Kevin,


    Thanks so much. Just one question: why can some operators specify the resource type (ImplementationType) while others cannot?

    And when ImplementationType = AUTO, what is the principle by which VA allocates resources?


    Best

    Bingnan

  • Dear Bingnan Liu,


    In case of "ImplementType = AUTO" VisualApplets (VA) will decide which ressource will be used. Only when a FPGA ressource conflict is shown by DRC or is getting obvious a specific discussion on the used "ImplementationType" modification is getting necessary. A good example would be: running out of logic ressources and shifting several operators to ALU/DSPs in order to save Logic/LUTs.


    Only specific operators have the ImplementationType feature, mostly the ones that require a lot of resources and have an alternative implementation.


    Please do not hesitate to post a VA design here; we can go through it together and give some recommendations on how to make beneficial modifications.


    Best regards,

  • Dear all,


    when I run my design on the frame grabber, microDisplay shows this: pasted-from-clipboard.png


    I removed the "NCC detection part" only to grab the images, it runs a few seconds and reported "Timeout Error code: -2120".


    I also tried to build the .hap from /Example/VCX-QP/SingleCXP6x4AreaGray8; the result is this:

    pasted-from-clipboard.png



    The hardware I am using is an Optronis GmbH Cyclone-1HS-3500 camera and an mE5 VCX-QP frame grabber.


    Attached are my programs. Could you help me check them?


    Thanks

    Bingnan

  • Hi Bingnan,

    Sorry for the late reply.


    It is hard to do a remote check, but I assume the following: since you are using the Cyclone-1HS-3500 with a resolution of 1280x860, you should adapt your Maximum Image Width and your ImageBuffer accordingly. You can do so by clicking on the link after the camera operator. Furthermore, in the ImageBuffer operator, adapt the XLength to 1280.
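
    As a minimal sanity check of those settings (the helper function and its values are hypothetical; only the 1280x860 resolution is taken from above):

        CAM_WIDTH, CAM_HEIGHT = 1280, 860

        def check_link(max_image_width: int, buffer_xlength: int) -> None:
            # Both the link's maximum image width and the ImageBuffer XLength
            # must cover a full camera line, otherwise frames get clipped and
            # the acquisition can stall or time out.
            assert max_image_width >= CAM_WIDTH, "Maximum Image Width too small"
            assert buffer_xlength >= CAM_WIDTH, "ImageBuffer XLength too small"

        check_link(1280, 1280)  # passes for the Cyclone-1HS-3500 resolution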

    I hope this helps!

    Best Regards

    Kevin

  • Hi


    Is there a way to use a big template without downsampling the mask? I can accept a lower speed as a trade-off.


    Best

    Bingnan

  • Dear Carmen,


    May I ask what the purpose of increasing the fractional bits from 2 to 10 was? Was it to reserve more fractional bits in order to increase the accuracy?


    Best

    Bingnan

  • Dear all,


    I want to ask some questions that came up while implementing the program on the FPGA:


    • I first set the frame rate and ROI as we normally use with the official CXP6x4 image acquisition applet. After a few seconds, a "black area" appeared on the monitor, as the screenshot shows. When we lowered the frame rate dramatically, it disappeared. Is this because most of the memory was allocated to the image processing unit? pasted-from-clipboard.png
    • When I use the applet, I cannot adjust the output trigger signals in the red-circled area (signal width, delay, etc.) as I could with the acquisition applet. Is this because we use the GPIO of the frame grabber rather than the trigger board? How should I adjust the output signal?


    My basic idea is to let the program do "detection, then trigger" while at the same time the operator can monitor what is happening. The frame grabber model is the mE5 VCX QP. I would appreciate it if you could check my attached designs.



    Best wishes

    Bingnan

  • Hello Bingnan,

    I looked at your design, and the first thing I noticed was the ParallelDN right after the ImageBuffer at the beginning. I do not know how fast your camera is set; however, if you go down to a parallelism of 1, only one pixel is transferred per clock cycle at a rate of 125 MHz.

    If your camera has a resolution of 1280x860, you are transferring 1.101 megapixels per frame. Thus you will only be able to reach a frame rate of 113 FPS.

    I updated your design and removed the ParallelDN operator. Furthermore, I added an Overflow operator. This operator has a property called "OverflowOccured"; you can watch this parameter in microDisplay(X). If it is "1", an overflow occurred, meaning that your buffers are not large enough or that some other bottleneck exists in your design.

    Speaking of bottlenecks, one that is often overlooked is the DMA transfer rate, which is 1800 MB/s for the mE5 VCX QP. So the maximum you can achieve with your resolution is 1635 Hz (1800 MB/s / (1280*860 bytes)), assuming you are using the full DMA bandwidth. The PCIe interface can transport up to 128 bits in parallel, so with an 8-bit image you can use a parallelism of 16. In your design it was still 1, due to the ParallelDN. This means that only 102 FPS could be achieved.
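
    The 113 FPS and 1635 Hz ceilings can be reproduced with a quick calculation (Python sketch; the 125 MHz design clock and 8-bit pixels are assumptions taken from the numbers above):

        # Frame-rate ceilings for a 1280x860, 8-bit stream.
        W, H = 1280, 860
        pixels_per_frame = W * H                      # 1,100,800 pixels (~1.101 MP)

        CLOCK_HZ = 125e6
        fps_par1 = CLOCK_HZ * 1 / pixels_per_frame    # ~113 FPS at parallelism 1
        fps_par16 = CLOCK_HZ * 16 / pixels_per_frame  # ~1817 FPS at parallelism 16

        DMA_BYTES_PER_S = 1800e6                      # mE5 VCX QP DMA bandwidth
        fps_dma = DMA_BYTES_PER_S / pixels_per_frame  # ~1635 FPS DMA ceiling

        # The achievable frame rate is the smallest of the individual ceilings.
        print(min(fps_par1, fps_dma))   # parallelism 1: link limits to ~113 FPS
        print(min(fps_par16, fps_dma))  # parallelism 16: DMA limits to ~1635 FPS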



    Regarding your question to the triggers:

    The standard acquisition applet has these functionalities built in; when you are using your own custom applet, you are responsible for implementing them. You can find an example for the mE5 VCX QP in your VisualApplets installation folder:


    <VASINSTALLDIR>\Examples\Processing\Trigger\me5-MA-VCX-QP\Area


    You can copy the trigger functionality and the camera from there.



    Best Regards and a nice weekend


    Kevin

  • Thank you, Kevin, for your quick reply. I will test it on the board ASAP.

  • Hi Kevin,


    1. The trigger example seems to receive an external trigger signal and then trigger the camera. How can I make the output trigger signal controllable?


    NCC v2.4.1.va

    Attached is my revised design: I added a new box "SignalProcessing" before the "GPO" to let the user control the output signal's shape from microDisplay. Am I doing this right?


    2. With the "Overflow", is that means I can use a larger template? Or it is limited by the resources the "Overflow" controls transfer bits but no effect on this?


    Thank you in advance.


    Best

    Bingnan



  • Hi


    I tried your design, and no matter how I adjusted the frame rate and ROI, the "black zone" always appeared, as in the first screenshot. I checked the transfer speed: pasted-from-clipboard.png


    Do you have any idea why this is happening?


    As for the output signal, my last design seems to be adjustable from the software window.


    Best

    Bingnan