Synchronization issues when flipping an image twice

  • Porj_A(mE5_MA_VCX-QP_Single_proj_I) test video [Link]


    Porj_B(mE5_MA_VCX-QP_Single_proj) test video [Link]


    FFC_factor image [Link]



    Porj_A works at a line rate of 50000.


    When I add the Segment HierarchicalBox after the EdgexFilter HierarchicalBox (Porj_B), Porj_B no longer works at a line rate of 50000.


    How can I calculate how many buffers need to be added to Porj_B for synchronization?


    Thanks.


    Jesse

  • Dear Jesse,


    I looked at both of your designs.


    Porj_B / mE5_MA_VCX-QP_Single_proj.va includes the segmentation H-Box.

    Inside this H-Box, an additional RAM buffer is used.


    You are asking for the number or size of this buffer that is required to run the design at the expected bandwidth.


    In this case, the problem is not related to the buffer size.


    You are using 3 RAM modules on a marathon VCX-QP.

    The memory bandwidth inside the VCX-QP follows a shared memory concept.


    http://www.siliconsoftware.de/…t/device%20resources.html


    The implemented RAM operators work at:

    RAM operator   bit depth   parallelism (shared * 4)
    RAM1           8           8 (-> 32)
    RAM3           9           8 (-> 32)
    ffc_factor     64          1 (-> 4)


    Since all three operators now share the memory interface with link widths of only 64 / 72 / 64 bit (bit depth * parallelism), the design will not meet the expected performance.
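
    A small Python sketch of the link-width arithmetic behind the table, assuming link width = bit depth * parallelism and an ffc_factor bit depth of 64 (the value the 64 / 72 / 64 bit figures imply). The names OPERATORS, SPEEDUP and link_width are illustrative only and are not part of any VisualApplets API.

```python
# Link width = bit depth * parallelism (bits moved per clock cycle).
# Assumption: ffc_factor has a bit depth of 64, as the "64 / 72 / 64 bit" figures imply.
OPERATORS = {
    # name: (bit depth, current parallelism)
    "RAM1": (8, 8),
    "RAM3": (9, 8),
    "ffc_factor": (64, 1),
}
SPEEDUP = 4  # parallelism factor applied around each shared RAM operator


def link_width(bit_depth: int, parallelism: int) -> int:
    """Bits transferred per clock cycle on the operator's link."""
    return bit_depth * parallelism


current = {name: link_width(b, p) for name, (b, p) in OPERATORS.items()}
sped_up = {name: link_width(b, p * SPEEDUP) for name, (b, p) in OPERATORS.items()}

for name in OPERATORS:
    print(f"{name}: {current[name]} bit -> {sped_up[name]} bit")
```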


    Simply increase the parallelism around the RAM operators to enable a higher bandwidth.


    Example:

    If 2 RAMs need to handle 8 bit @ parallelism of 8, connect both with double parallelism:

    8 bit @ parallelism 16

    The two RAMs then share the parallelism of 16, so each one effectively gets a parallelism of 8.
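
    The sharing in this example can be sketched as a simple division, assuming the combined link parallelism is split evenly between the RAMs that share it (a simplification; the helper name per_ram_parallelism is hypothetical):

```python
def per_ram_parallelism(link_parallelism: int, num_rams: int) -> int:
    """Parallelism each RAM effectively gets when num_rams RAMs share one link.

    Simplifying assumption: the link parallelism is split evenly between the RAMs.
    """
    if link_parallelism % num_rams:
        raise ValueError("parallelism must divide evenly between the RAMs")
    return link_parallelism // num_rams


needed = 8         # each RAM needs 8 bit @ parallelism 8
link = needed * 2  # connect both with double parallelism: 8 bit @ parallelism 16
print(per_ram_parallelism(link, 2))  # 8
```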


    What you need to do now:


    Use a parallelism of 32 for RAM1 and RAM3.

    Use a parallelism of 2 for ffc_factor.


    The shared memory concept will then handle the bandwidth correctly.

    Since 3 RAMs need to share the same pixel rate in this design, we use a factor of 4 for the parallelism to speed up accordingly. A factor of 4 does not affect the image dimensions.


    LinkBandwidth = bit-depth * pixel-clock * parallelism


    Example:

    500 MB/s = 8 bit * 125 MHz * 4
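
    The LinkBandwidth formula as a small Python function. It divides by 8 to convert bit/s to byte/s and uses decimal MB (10^6 bytes); the function name is illustrative.

```python
def link_bandwidth_bytes_per_s(bit_depth: int, pixel_clock_hz: float, parallelism: int) -> float:
    """LinkBandwidth = bit-depth * pixel-clock * parallelism, converted from bit/s to byte/s."""
    return bit_depth * pixel_clock_hz * parallelism / 8


bw = link_bandwidth_bytes_per_s(8, 125e6, 4)
print(f"{bw / 1e6:.0f} MB/s")  # 500 MB/s
```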


    In case of a shared memory RAM operator:

    PARALLELup -> RAM -> PARALLELdn


    Increase the parallelism before the RAM and reduce it after the RAM again.

    The required increase depends on the number of RAMs and their bandwidth needs.
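
    To illustrate what PARALLELup -> RAM -> PARALLELdn does to the data layout, here is a minimal Python model. It assumes that increasing the parallelism by a factor k simply packs k consecutive clock cycles ("beats") into one wider beat and that PARALLELdn undoes this; the RAM is modelled as a plain list, and the timing and handshaking of the real operators are not modelled. The function names are hypothetical.

```python
from typing import List, Tuple

Beat = Tuple[int, ...]  # pixel values transferred in one clock cycle


def parallel_up(beats: List[Beat], factor: int) -> List[Beat]:
    """Pack `factor` consecutive beats into one wider beat (parallelism * factor)."""
    if len(beats) % factor:
        raise ValueError("number of beats must be a multiple of the factor")
    return [sum(beats[i:i + factor], ()) for i in range(0, len(beats), factor)]


def parallel_down(beats: List[Beat], factor: int) -> List[Beat]:
    """Split each wide beat back into `factor` narrower beats."""
    out: List[Beat] = []
    for beat in beats:
        if len(beat) % factor:
            raise ValueError("beat width must be a multiple of the factor")
        step = len(beat) // factor
        out.extend(beat[i:i + step] for i in range(0, len(beat), step))
    return out


# One 32-pixel line at parallelism 8 -> 4 beats of 8 pixels.
line = list(range(32))
beats = [tuple(line[i:i + 8]) for i in range(0, len(line), 8)]

wide = parallel_up(beats, 4)         # parallelism 32 around the RAM: 1 beat
stored = list(wide)                  # stand-in for the RAM buffer
restored = parallel_down(stored, 4)  # back to parallelism 8: 4 beats

print(len(beats), len(wide), restored == beats)  # 4 1 True
```

    The round trip keeps the pixel order intact; only the number of clock cycles per line changes (4 beats at parallelism 8, 1 beat at parallelism 32).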


    The correspondingly modified VA design is attached here:

    mE5_MA_VCX-QP_Single_Porj_AB_SpeedUp_BRudde.va


    Best regards,

  • Dear Bjorn


    I am not quite sure about the structure of the shared memory concept.


    From the Shared Memory Concept documentation:


    When a design utilizes all 4 RAM resources, each of the 4 RAM based operators can have up to 1.6 GB/s exclusive bandwidth, minus the efficiency factor of that particular operator.


    So a RAM's maximum bandwidth is 1.6 GB/s?


    RAM operator   bit depth   parallelism   LinkBandwidth (125 MHz)
    RAM1           8           8             1 GB/s
    RAM3           9           8             1.125 GB/s
    ffc_factor     64          1             1 GB/s
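
    The LinkBandwidth column can be reproduced with a short Python sketch, assuming a 125 MHz pixel clock and byte/s units. It only repeats the per-operator arithmetic and compares it with the quoted 1.6 GB/s figure; it does not model how the operators share the memory interface.

```python
PIXEL_CLOCK_HZ = 125e6
QUOTED_LIMIT = 1.6e9  # byte/s, per-operator figure quoted from the documentation

OPERATORS = {"RAM1": (8, 8), "RAM3": (9, 8), "ffc_factor": (64, 1)}  # (bit depth, parallelism)


def bandwidth(bit_depth: int, parallelism: int, clock_hz: float = PIXEL_CLOCK_HZ) -> float:
    """LinkBandwidth in byte/s: bit depth * pixel clock * parallelism / 8."""
    return bit_depth * parallelism * clock_hz / 8


for name, (bd, p) in OPERATORS.items():
    bw = bandwidth(bd, p)
    print(f"{name}: {bw / 1e9:.3f} GB/s (within {QUOTED_LIMIT / 1e9} GB/s: {bw <= QUOTED_LIMIT})")
```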



    Why will the design not meet the expected performance?




    From the Shared Memory Concept documentation:


    Due to the shared bandwidth architecture, the applet developer should utilize all 256 bits of the operator’s memory interface (RAM Data Width) to achieve maximal throughput through the memory interface when using multiple RAM based operators even though the single RAM operator needs less bandwidth on its input.


    RAM operator   bit depth   parallelism   link width (bit depth * parallelism)
    RAM1           8           32            256 bits
    RAM3           9           32            288 bits
    ffc_factor     64          4             256 bits

    The maximum RAM Data Width of the mE5 marathon VCX-QP is 512 bits.


    So should I set each RAM's bandwidth to 256 bits?
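
    The link widths in the second table, and how each compares with the 512-bit RAM Data Width, can be checked with a few lines of Python (illustrative only; the 512-bit value is taken from this post):

```python
RAM_DATA_WIDTH = 512  # bits, maximum RAM Data Width of the mE5 marathon VCX-QP (from the post)

PROPOSED = {"RAM1": (8, 32), "RAM3": (9, 32), "ffc_factor": (64, 4)}  # (bit depth, parallelism)

widths = {name: bd * p for name, (bd, p) in PROPOSED.items()}
for name, w in widths.items():
    print(f"{name}: {w} bit = {w / RAM_DATA_WIDTH:.0%} of {RAM_DATA_WIDTH} bit")
```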



    Thank you.


    Jesse

  • Dear Bjorn


    I tried reducing the image height to 512

    and changed these parameters in mE5_MA_VCX-QP_Single_Porj_AB_SpeedUp_BRudde.va:

    Module                                                 Parameter name   Value
    Process0/Capture/TrgBoxLine                            YLength          512
    Process0/EdgexFilter/RAM1                              YLength          512
    Process0/EdgexFilter/projection_v/get_last_line/value  Number           511
    Process0/EdgexFilter/ffc_factor                        YLength          512
    Process0/EdgexFilter/RAM3                              YLength          512
    Process0/Segment/get_last_line/value                   Number           511
    Process0/DMA_Source                                    Height           512
    Process0/DMA_Filter                                    Height           512
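
    Because several of these parameters must change together, a small helper like the following (hypothetical, not part of VisualApplets) can keep them consistent for a given height. It only reproduces the pattern in the table, YLength/Height = height and the get_last_line Number = height - 1, and assumes the 512 to 1024 height range given in this post; it does not apply anything to the applet.

```python
from typing import List, Tuple


def height_parameters(height: int) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: parameter values for one image height, following the table's pattern."""
    if not 512 <= height <= 1024:  # height range given in this thread
        raise ValueError("height must be between 512 and 1024")
    last_line = height - 1  # last line index (0-based), as in the table: 511 for a height of 512
    return [
        ("Process0/Capture/TrgBoxLine", "YLength", height),
        ("Process0/EdgexFilter/RAM1", "YLength", height),
        ("Process0/EdgexFilter/projection_v/get_last_line/value", "Number", last_line),
        ("Process0/EdgexFilter/ffc_factor", "YLength", height),
        ("Process0/EdgexFilter/RAM3", "YLength", height),
        ("Process0/Segment/get_last_line/value", "Number", last_line),
        ("Process0/DMA_Source", "Height", height),
        ("Process0/DMA_Filter", "Height", height),
    ]


for module, name, value in height_parameters(512):
    print(f"{module}  {name} = {value}")
```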


    The output data is not synchronized. Test video [Link]


    Can this design reduce the image height dynamically? [Height range: 512 to 1024]


    The attachment is the final version of the project.

    The line rate will go up to 76923.
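
    For reference, the time available per line follows directly from the line rate. A minimal sketch, assuming the line rate is given in lines per second:

```python
def line_period_us(line_rate_hz: float) -> float:
    """Time available per line, in microseconds, at the given line rate."""
    return 1e6 / line_rate_hz


for rate in (50000, 76923):
    print(f"{rate} lines/s -> {line_period_us(rate):.2f} us per line")
```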


    Thanks.


    Jesse

  • Dear Bjorn


    I tried setting the ffc_factor (RAM) buffer height to 512 and loading an image with a height of 512.


    Then I synchronized the ffc_factor height to the maximum. Now I can modify the image height dynamically!


    But I needed many attempts to get this result, and that takes a lot of time because each build takes about an hour.

    This is why I want to know how to calculate the required settings for the design.



    Jesse

  • Hi Jesse,


    To make things easier to understand:


    The mE5-MA-VCX-QP is using a 512 bit wide shared memory concept.

    This supports up to 12.8 GB/s for the mentioned platform.

    (VA documentation, Appendix: Resource table: RAM Bandwidth total (shared))
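
    As a quick consistency check, the two quoted figures (512 bit, 12.8 GB/s) imply how many full-width transfers per second the shared interface supports. This is plain arithmetic on the quoted numbers, not a documented clock value:

```python
SHARED_WIDTH_BITS = 512    # shared memory width quoted for the mE5-MA-VCX-QP
SHARED_BANDWIDTH = 12.8e9  # byte/s, "RAM Bandwidth total (shared)" as quoted

# bytes per transfer = 512 / 8 = 64; transfers per second = bandwidth / bytes per transfer
implied_rate_hz = SHARED_BANDWIDTH / (SHARED_WIDTH_BITS / 8)
print(f"{implied_rate_hz / 1e6:.0f} million 512-bit transfers per second")  # 200 million ...
```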


    If you use the maximum possible link width around all RAM operators, the bandwidth will be at its maximum:

    512 bit <= parallelism * bit-depth
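
    Read literally, the condition asks for the smallest parallelism whose link width reaches 512 bit. A Python sketch for the bit depths used in this thread (RAM1: 8, RAM3: 9, ffc_factor: 64); the function name is illustrative, and any further constraints VisualApplets places on the parallelism are not modelled:

```python
def min_parallelism(bit_depth: int, target_bits: int = 512) -> int:
    """Smallest integer parallelism with parallelism * bit_depth >= target_bits."""
    return -(-target_bits // bit_depth)  # integer ceiling division


for name, bd in {"RAM1": 8, "RAM3": 9, "ffc_factor": 64}.items():
    p = min_parallelism(bd)
    print(f"{name}: parallelism {p} -> {p * bd} bit")
```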


    You can try this once and check whether everything works.

    Then you will not need further re-synthesis runs.


    Please consider that additional factors may influence your system bandwidth:

    - Limited DMA performance due to the mainboard specification

    - Load balancing between the memory buffers:

      - write has a higher priority than read

      - all RAMs have the same priority
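
    The two priority rules can be illustrated with a toy arbiter in Python. This is only a model under stated assumptions (pending writes are always served before pending reads, and RAMs of equal priority are served round-robin); it is not the actual arbitration logic of the mE5 hardware, and the class name is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Grant = Tuple[str, str]  # (RAM name, "write" or "read")


class ToyArbiter:
    """Toy model: writes before reads; RAMs of equal priority served round-robin."""

    def __init__(self, rams: List[str]) -> None:
        self.order: Deque[str] = deque(rams)
        self.pending: Dict[str, Dict[str, int]] = {r: {"write": 0, "read": 0} for r in rams}

    def request(self, ram: str, kind: str) -> None:
        self.pending[ram][kind] += 1

    def grant(self) -> Optional[Grant]:
        for kind in ("write", "read"):  # write has higher priority than read
            for _ in range(len(self.order)):
                ram = self.order[0]
                self.order.rotate(-1)  # move this RAM to the back of the round-robin order
                if self.pending[ram][kind]:
                    self.pending[ram][kind] -= 1
                    return (ram, kind)
        return None


arbiter = ToyArbiter(["RAM1", "RAM3", "ffc_factor"])
for ram, kind in [("RAM1", "read"), ("RAM3", "read"), ("ffc_factor", "write"), ("RAM1", "write")]:
    arbiter.request(ram, kind)

grants: List[Grant] = []
while True:
    g = arbiter.grant()
    if g is None:
        break
    grants.append(g)
print(grants)
```

    Both writes are granted before either read, and within each priority level the RAMs take turns.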