VQ8-CXP6D-DepthFromFocus

  • Hi.


    I am designing a DepthFromFocus example for the VQ8-CXP6D, but the applet I designed causes a buffer overflow in microDisplay. The maximum frame rate of the camera is 239 fps, but the overflow occurs at 96 fps or higher.


    Please check the designed applet for the problem.


    * Test environment

    - Camera: Adimec S-25A70

    - Frame Grabber: microEnable 5 VQ8-CXP6D

    - Image ROI: 2000x2000 @8Bit(mono8)


    Thanks.

  • Dear IhShin,


    In your design I can see that you are using the camera with 2 RAMs on the acquisition side.

    There you have chosen parallelism 32, while the processing itself targets a parallelism of 8.


    Please check whether your design works fine at a lower frame rate.

    For example 100 Hz:

    (2000 * 2000 * (100 Hz)) / (8 * (125 MHz)) = 40 %
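    The same load estimate can be scanned over several frame rates with a short Python sketch. It assumes the design processes `parallelism` pixels per cycle of the 125 MHz base design clock and ignores any stops caused by RAM, DMA or the camera; `pipeline_load` and `max_frame_rate` are hypothetical helper names.

```python
# Sketch: pipeline load of a VA design, assuming it processes `parallelism`
# pixels per clock cycle at `clock_hz` without any stops.

def pipeline_load(width: int, height: int, fps: float,
                  parallelism: int, clock_hz: float) -> float:
    """Fraction of the pipeline's pixel throughput used by the camera."""
    pixel_rate = width * height * fps        # pixels per second
    capacity = parallelism * clock_hz        # pixels per second
    return pixel_rate / capacity


def max_frame_rate(width: int, height: int,
                   parallelism: int, clock_hz: float) -> float:
    """Highest frame rate the pipeline could sustain (100 % load)."""
    return parallelism * clock_hz / (width * height)


if __name__ == "__main__":
    for fps in (96, 100, 150, 200, 239):
        load = pipeline_load(2000, 2000, fps, parallelism=8, clock_hz=125e6)
        print(f"{fps:4d} Hz -> {load:6.1%}")
    print(f"max: {max_frame_rate(2000, 2000, 8, 125e6):.0f} Hz")
```

    Under these assumptions, parallelism 8 would in theory sustain up to 250 Hz (95.6 % load at 239 Hz), so scanning the frame rate shows where the real design deviates from this ideal.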


    Increase the frame rate step by step and check whether the applet still runs fine.

    At a certain frame rate you will start observing a buffer overflow.


    You can use the same RAM approach as in the acquisition to increase the bandwidth of the EDoF calculation.

    That approach will use 2 RAM modules.

    Please make sure that you use the maximum possible parallelism for each image buffer (RAM) module to get the maximum possible bandwidth.


    Best regards,

  • Dear IhShin,


    Please let me know whether the mentioned approach is useful for you and what your target bandwidth looks like:

    Width * Height * BitPerPixel * Frequency
    2000 * 2000 * 8 bit * 100 Hz = 400 MB/s


    That defines the required parallelism for this bandwidth.
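    The formula can be written as a small Python helper. This sketch (hypothetical function name) also prints the result in MiB/s, because decimal megabytes (10^6 byte) and binary mebibytes (2^20 byte) are easily mixed up: 956 MB/s correspond to roughly 912 MiB/s.

```python
# Sketch: required image bandwidth = width * height * bits per pixel * frame rate.

MB = 1_000_000       # decimal megabyte
MiB = 1024 * 1024    # binary mebibyte


def bandwidth_bytes_per_s(width: int, height: int,
                          bits_per_pixel: int, fps: float) -> float:
    """Bytes per second delivered by the camera."""
    return width * height * bits_per_pixel / 8 * fps


if __name__ == "__main__":
    for fps in (96, 100, 239):
        bw = bandwidth_bytes_per_s(2000, 2000, 8, fps)
        print(f"{fps:4d} Hz: {bw / MB:7.1f} MB/s = {bw / MiB:7.1f} MiB/s")
```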


    Best regards,

  • Dear B.Ru,


    When checking at lower frame rates, the applet operates normally up to 96 Hz.


    target bandwidth:

    Width * Height * BitPerPixel * Frequency

    -> 2000 x 2000 x 8 bit x 239 Hz = 956 MB/s (about 912 MiB/s)


    If I use the 2-RAM-module approach for the EDoF calculation and change the processing parallelism to 16, the build runs for a long time without finishing.


    Thanks.



  • Dear IhShin,


    Your question asks for a bandwidth of:

    2000 * 2000 * (8 bit) * (239 Hz) = 956 MB / s

    coming from the camera, while you report a bandwidth limit at:

    2000 * 2000 * (8 bit) * (96 Hz) = 384 MB / s


    While your initial buffer uses 2 RAMs for 8 bit at parallelism 32 (receiving up to 20), the buffer approach inside the EDoF receives 16 bit at parallelism 32.


    Looking at the RAM specification of the frame grabber:


    Hardware Configuration microEnable 5 ironman:

    Resource                                            mE5VQ8-CXP6B / mE5VQ8-CXP6D
    Vision Processor                                    Xilinx Virtex6 XC6VLX240T FPGA
    LUT                                                 150720
    FlipFlop                                            301440
    Block RAM                                           832 x 18432 bit
    Embedded Arithmetic Logic Unit (DSP48)              768
    RAM                                                 4 x 256 MiB DDR3
    Data Width per RAM                                  128 bit
    Bandwidth per RAM                                   4 GB/s
    Base Design Clock                                   125 MHz
    Host Interface                                      PCIe x8 Gen2
    Host Interface (PCIe x8 Gen2) Bandwidth (theor.)    4 GByte/s per direction on PCIe bus
    Host Interface (PCIe x8 Gen2) Bandwidth (typ./max.) up to 3.6 GByte/s on PCIe bus

    Table 55. Hardware Configuration microEnable 5 ironman (Source)


    Let's look into the details:


    To get the maximum performance out of each RAM module, we have to use its full data width of 128 bit.

    Using the full interface, i.e. bit depth * parallelism = 128 bit, gives 128 bit * 125 MHz = 16 byte * 125 MHz = 2 GB/s per direction.

    4 GB/s = (1 + 1) * 2 GB/s: one share for writing the input and one for reading the output.
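    The 4 GB/s figure follows from the 128-bit data width and the 125 MHz base design clock. A minimal sketch, assuming one full word per clock cycle for writing and one for reading; `ram_bandwidth_per_direction` is a hypothetical helper name:

```python
# Sketch: bandwidth of one RAM module from the used interface width and the
# clock, assuming one full word per clock cycle for writing and one for reading.

RAM_DATA_WIDTH_BITS = 128   # data width per RAM (microEnable 5 ironman)


def ram_bandwidth_per_direction(bit_depth: int, parallelism: int,
                                clock_hz: float = 125e6) -> float:
    """Bytes per second in one direction (write or read)."""
    width_bits = bit_depth * parallelism
    if width_bits > RAM_DATA_WIDTH_BITS:
        raise ValueError(f"{width_bits} bit exceeds the RAM data width")
    return width_bits / 8 * clock_hz


if __name__ == "__main__":
    full = ram_bandwidth_per_direction(4, 32)   # 4 bit * 32 = 128 bit
    half = ram_bandwidth_per_direction(4, 16)   # only 64 of 128 bit used
    print(f"full width: {full / 1e9:.1f} GB/s per direction, "
          f"{2 * full / 1e9:.1f} GB/s write + read")
    print(f"half width: {half / 1e9:.1f} GB/s per direction")
```

    Using only half of the data width halves the bandwidth, which is why the maximum possible parallelism per RAM module matters.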


    Above we arrived at a bandwidth of

    956 MB / s

    At 8 bit per pixel this equals 956 MPixel/s. Inside the EDoF each pixel has 16 bit: 2000 * 2000 * (16 bit) * (239 Hz) = 1912 MB / s

    Explanation: the 16 bit consist of 2 intermediate values per pixel.


    Two RAM blocks are used within the EDoF:

    [Screenshot: the two RAM blocks inside the EDoF module]


    Each RAM here uses 4 bit at parallelism 32: 4 bit * 32 = 128 bit.
    The acquisition RAM modules use the same configuration.

    So the RAM interfacing in the EDoF is fine.
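    The check can be sketched in Python: compute the EDoF data rate at 16 bit per pixel and compare it with what two RAM modules with a 128-bit interface can write. This is a simplified sketch assuming the data is spread evenly over both modules and only the write direction is modeled; the function names are hypothetical.

```python
# Sketch: can two RAM modules absorb the EDoF data rate at 239 Hz?
# Assumes the data is distributed evenly over both modules and each module
# accepts one full-width word per 125 MHz clock cycle.

RAM_DATA_WIDTH_BITS = 128
CLOCK_HZ = 125e6


def required_rate(width: int, height: int, bits_per_pixel: int,
                  fps: float) -> float:
    """Bytes per second written into the buffer."""
    return width * height * bits_per_pixel / 8 * fps


def ram_write_capacity(num_rams: int, bit_depth: int,
                       parallelism: int) -> float:
    """Bytes per second the RAM modules can accept together."""
    width_bits = bit_depth * parallelism
    if width_bits > RAM_DATA_WIDTH_BITS:
        raise ValueError("interface wider than the RAM data width")
    return num_rams * width_bits / 8 * CLOCK_HZ


if __name__ == "__main__":
    need = required_rate(2000, 2000, 16, 239)   # 2 intermediate values per pixel
    have = ram_write_capacity(num_rams=2, bit_depth=4, parallelism=32)
    print(f"need {need / 1e6:.0f} MB/s, RAMs accept {have / 1e6:.0f} MB/s")
    print("OK" if need <= have else "buffer overflow expected")
```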


    Since your intended throughput of 956 MPixel/s is below parallelism * system clock = 8 * 125 MHz = 1 GPixel/s, the selected parallelism of 16 is a safe choice. No problem here either.
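    The parallelism check reduces to one calculation: the minimum parallelism is the pixel rate divided by the clock, rounded up. A sketch assuming the 125 MHz base design clock; the helper names are hypothetical.

```python
import math

# Sketch: minimum parallelism (pixels per clock cycle) for a given pixel rate.

CLOCK_HZ = 125e6   # base design clock of the microEnable 5 ironman


def min_parallelism(width: int, height: int, fps: float,
                    clock_hz: float = CLOCK_HZ) -> int:
    """Smallest parallelism whose throughput covers the pixel rate."""
    return math.ceil(width * height * fps / clock_hz)


def unused_capacity(width: int, height: int, fps: float, parallelism: int,
                    clock_hz: float = CLOCK_HZ) -> float:
    """Fraction of the processing capacity left unused (0.5 = 50 %)."""
    return 1 - width * height * fps / (parallelism * clock_hz)


if __name__ == "__main__":
    print(min_parallelism(2000, 2000, 239))   # 956 MPixel/s / 125 MHz -> 8
    print(f"{unused_capacity(2000, 2000, 239, 16):.1%} unused at parallelism 16")
```

    At 239 Hz the minimum is 8, so parallelism 16 leaves roughly half of the processing capacity unused.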


    From my point of view there is no issue within the VA design.

    That is good news, but it does not yet answer your question.


    There are two other details we have to look at now:

    DMA performance and the CXP camera interface.

    I assume and hope that the DMA performance of the ironman frame grabber in your system is not limited, but please double-check that.

    Limited PCIe performance could be the reason, because it would propagate stops into the design's data flow.

    The ironman provides PCIe x8 Gen2 with a theoretical 4 GB/s and 3.6 GB/s in practice, but it is possible that the mainboard's PCIe slot only supports Gen1 and/or fewer lanes than x8.

    One PCIe lane at Gen1 delivers 250 MB/s in theory (2.5 GT/s with 8b/10b encoding) and close to 200 MB/s in practice.

    One PCIe lane at Gen2 delivers 500 MB/s in theory (5 GT/s with 8b/10b encoding) and close to 400 MB/s in practice.


    Your design is correctly configured for PCIe x8 Gen2, as shown in the Applet Properties operator:
    [Screenshot: Applet Properties operator showing the PCIe x8 Gen2 configuration]


    You reported:

    2000 * 2000 * (8 bit) * (96 Hz) = 384 MB / s

    and 1 of 50 images (2 %) is additionally output on the second DMA, which adds to the output bandwidth:


    (2000 * 2000 * (8 bit) * (96 Hz)) * (1 + (1 / 50)) = 391.68 MB / s
    That is pretty close to 400 MB/s, which indicates a single PCIe lane at Gen2.
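    The same estimate as a small Python sketch: add the share of the second DMA to the image bandwidth and compare the result with the practical value of about 400 MB/s per Gen2 lane. The helper name and the lane calculation are illustrations under these assumptions, not a measurement:

```python
import math

# Sketch: total DMA bandwidth when 1 of every n images is also sent on a
# second DMA channel, compared with ~400 MB/s practical per PCIe Gen2 lane.

PRACTICAL_GEN2_LANE = 400e6   # bytes/s, approximate practical value per lane


def dma_bandwidth(width: int, height: int, bits_per_pixel: int, fps: float,
                  second_dma_every: int = 50) -> float:
    """Bytes per second over both DMA channels."""
    image_rate = width * height * bits_per_pixel / 8 * fps
    return image_rate * (1 + 1 / second_dma_every)


if __name__ == "__main__":
    observed = dma_bandwidth(2000, 2000, 8, 96)
    target = dma_bandwidth(2000, 2000, 8, 239)
    print(f"limit observed at 96 Hz: {observed / 1e6:.2f} MB/s")
    print(f"needed for 239 Hz:       {target / 1e6:.2f} MB/s")
    lanes = math.ceil(target / PRACTICAL_GEN2_LANE)
    print(f"Gen2 lanes needed in practice: {lanes}")
```

    Under these assumptions 239 Hz needs about 975 MB/s, i.e. more than two Gen2 lanes in practice: well within x8, but far beyond a single lane.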

    In microDiagnostics you can double-check the possible bandwidth of the grabber:

    The output will look like this:

    [Screenshot: example microDiagnostics DMA performance test output]


    The test is carried out with the applet that is currently flashed on the selected frame grabber.

    Since VA designs are not fully supported by this test, please flash the related acquisition applet.

    In your case it will be: Acq_QuadCXP6x1AreaGray8.dll


    In the performance diagram of the Acq_* applet with FG_GRAY (8 bit per pixel), you need to see a value at or above 1000 MB/s at 2048 on the X axis to reach your target bandwidth.


    What to do if the DMA performance is fine?


    The camera connection may be a second, external cause.

    You are using CXP, and the operator in your design supports 4 CXP6 links.


    That means up to 6.25 Gbit/s per link.

    Due to 8b/10b encoding this represents per link:

    (8 / 10) * (6.25 (Gbit / s)) = 5 Gbit/s = 625 MB / s

    For a quad link this would be:

    ((4 * 8) / 10) * (6.25 (Gbit / s)) = 20 Gbit/s = 2500 MB / s

    Since the interface is protocol based, you will not get the full bandwidth for image data.

    But it is at least the peak performance possible with CXP6.
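    The CXP arithmetic as a short sketch. It applies the 8b/10b factor to the line rate and converts bits to bytes; protocol overhead is deliberately not subtracted, so the result is the peak value. `cxp_bandwidth` is a hypothetical helper name.

```python
# Sketch: raw CoaXPress payload bandwidth after 8b/10b line encoding.
# Protocol overhead (packet headers, control traffic) is NOT subtracted,
# so the real image-data throughput is lower.

def cxp_bandwidth(links: int, gbit_per_s: float) -> float:
    """Bytes per second across all links after 8b/10b decoding."""
    data_bits_per_s = links * gbit_per_s * 1e9 * 8 / 10
    return data_bits_per_s / 8


if __name__ == "__main__":
    print(f"CXP-6 x1: {cxp_bandwidth(1, 6.25) / 1e6:.0f} MB/s")   # 625 MB/s
    print(f"CXP-6 x4: {cxp_bandwidth(4, 6.25) / 1e6:.0f} MB/s")   # 2500 MB/s
```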


    The VA design supports more speed options than CXP-6. For reference, the CoaXPress speed grades are:


    Speed grade   Bit rate       Max. cable length
    CXP-1         1.25 Gbit/s    up to 212 m
    CXP-2         2.5 Gbit/s     up to 185 m
    CXP-3         3.125 Gbit/s   up to 169 m
    CXP-5         5 Gbit/s       up to 102 m
    CXP-6         6.25 Gbit/s    up to 60 m
    CXP-10        10 Gbit/s      up to 40 m
    CXP-12        12.5 Gbit/s    up to 30 m

    Source of table: Wikipedia


    Some math again:

    ((4 * 8) / 10) * (1.25 (Gbit / s)) = 4 Gbit/s = 500 MB / s, where this peak bandwidth very likely carries about 400 MB/s of image data.

    The same applies to this configuration:

    ((1 * 8) / 10) * (5 (Gbit / s)) = 4 Gbit/s = 500 MB / s
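    To see which link configurations could carry the 239 Hz stream, the calculation can be repeated for the CXP speeds up to CXP-6 with 1, 2 or 4 links. This sketch assumes that about 80 % of the link bandwidth carries image data, derived from the estimate of roughly 400 MB/s of image data per 500 MB/s peak; that factor is an assumption, not a CoaXPress specification value, and the helper name is hypothetical.

```python
# Sketch: which CXP link configurations could carry the 239 Hz stream?
# ASSUMPTION: ~80 % of the raw link bandwidth is usable for image data
# (estimate "500 MB/s peak -> about 400 MB/s image data"), not a spec value.

SPEEDS_GBIT = {"CXP-1": 1.25, "CXP-2": 2.5, "CXP-3": 3.125,
               "CXP-5": 5.0, "CXP-6": 6.25}   # up to CXP-6 for a CXP6 grabber
IMAGE_DATA_SHARE = 0.8
TARGET = 2000 * 2000 * 239   # bytes/s at 8 bit per pixel (956 MB/s)


def usable_image_bandwidth(links: int, gbit_per_s: float) -> float:
    """Approximate image-data bytes per second over all links."""
    raw = links * gbit_per_s * 1e9 * 8 / 10 / 8   # bytes/s after 8b/10b
    return raw * IMAGE_DATA_SHARE


if __name__ == "__main__":
    for name, speed in SPEEDS_GBIT.items():
        for links in (1, 2, 4):
            bw = usable_image_bandwidth(links, speed)
            verdict = "enough" if bw >= TARGET else "too slow"
            print(f"{name} x{links}: ~{bw / 1e6:6.0f} MB/s  {verdict}")
```

    Under this assumption 4 x CXP-1 and 1 x CXP-5 both end up near 400 MB/s of image data, close to the observed limit, while 4 x CXP-6 leaves ample headroom.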


    To wrap up this already rather long story:

    Check the CXP configuration using the hardware dialog in GenICam Explorer or microDisplayX.

    The link topology tells you precisely which CXP configuration you are using.

    The documentation of the link topology dialog shows screenshots for GEV (GigE Vision), but the steps are the same for CXP.


    To sum up:

    Your VA design is correct; you should see the expected bandwidth.

    From my perspective, it is possible that the PCIe connection or the CXP topology is causing this.


    To me it is very likely that the PCIe Gen2 slot provides only a single lane.
    Please let me know what the DMA performance test (microDiagnostics) shows.


    If you need help interpreting your test results and relating them to the observed performance,
    the other members of the VA forum community and I will be glad to help.


    Thanks and best regards,