CoefficientBuffer: Maximum memory size and bandwidth on marathon frame grabbers

  • The CoefficientBuffer operator does not reach the maximum available DRAM bandwidth and memory size of the frame grabber platform in every configuration. I listed the configurations and measured the corresponding bandwidth in hardware:


    Grabber         Configuration    Bandwidth      Max. Usable Size
    mE5-MA-VCL      Grabber Max       6400 MB/s     512 MiB
    mE5-MA-VCL      1 Link, Par 1      999 MB/s*    128 MiB
    mE5-MA-VCL      1 Link, Par 2     1524 MB/s     128 MiB
    mE5-MA-VCL      1 Link, Par 4     1524 MB/s     128 MiB
    mE5-MA-VCL      1 Link, Par 8     1524 MB/s     128 MiB
    mE5-MA-VCL      2 Link, Par 1     1999 MB/s*    256 MiB
    mE5-MA-VCL      2 Link, Par 2     3048 MB/s     256 MiB
    mE5-MA-VCL      2 Link, Par 4     3048 MB/s     256 MiB
    mE5-MA-VCL      2 Link, Par 8     3048 MB/s     256 MiB
    mE5-MA-VCL      4 Link, Par 1     3999 MB/s*    512 MiB
    mE5-MA-VCL      4 Link, Par 2     6097 MB/s     512 MiB
    mE5-MA-VCL      4 Link, Par 4     6097 MB/s     512 MiB
    mE5-MA-VCL      4 Link, Par 8     6097 MB/s     512 MiB
    mE5-MA-VCX-QP   Grabber Max      12800 MB/s     512 MiB
    mE5-MA-VCX-QP   1 Link, Par 1      999 MB/s*     64 MiB
    mE5-MA-VCX-QP   1 Link, Par 2     1525 MB/s      64 MiB
    mE5-MA-VCX-QP   1 Link, Par 4     1525 MB/s      64 MiB
    mE5-MA-VCX-QP   1 Link, Par 8     1525 MB/s      64 MiB
    mE5-MA-VCX-QP   4 Link, Par 1     3999 MB/s*    256 MiB
    mE5-MA-VCX-QP   4 Link, Par 2     6101 MB/s     256 MiB
    mE5-MA-VCX-QP   4 Link, Par 4     6101 MB/s     256 MiB
    mE5-MA-VCX-QP   4 Link, Par 8     6101 MB/s     256 MiB
    mE5-MA-VCX-QP   8 Link, Par 1     7999 MB/s*    512 MiB
    mE5-MA-VCX-QP   8 Link, Par 2    12201 MB/s     512 MiB
    mE5-MA-VCX-QP   8 Link, Par 4    12201 MB/s     512 MiB
    mE5-MA-VCX-QP   8 Link, Par 8    12201 MB/s     512 MiB


    *) Limited by Link configuration, not by DRAM

    Green = possible maximum obtained


    The size is valid for one operator. If you combine more operators you can use larger memory sizes.

    The bandwidth is the speed of a single operator. The operator always occupies the maximum DRAM bandwidth of the hardware, so if the bandwidth listed in the table is below the platform maximum, the operator "wastes" the difference: other DRAM operators cannot use the unused bandwidth.

    There is one exception: If the link limits the bandwidth there is some bandwidth left for other operators.
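    To pick a configuration from the table, a small lookup can help. The following is only a sketch: it copies the measured mE5-MA-VCL rows into a dictionary and returns the configurations that satisfy a required bandwidth and memory size. The names `VCL` and `candidates` are made up for this illustration and are not part of VisualApplets.

```python
# Measured mE5-MA-VCL values from the table above:
# (links, parallelism) -> (bandwidth in MB/s, max. usable size in MiB)
VCL = {
    (1, 1): (999, 128), (1, 2): (1524, 128), (1, 4): (1524, 128), (1, 8): (1524, 128),
    (2, 1): (1999, 256), (2, 2): (3048, 256), (2, 4): (3048, 256), (2, 8): (3048, 256),
    (4, 1): (3999, 512), (4, 2): (6097, 512), (4, 4): (6097, 512), (4, 8): (6097, 512),
}


def candidates(table, need_mb_s, need_mib):
    """Return all (links, parallelism) pairs that meet both requirements.

    The result is sorted, so configurations with fewer links come first.
    """
    hits = [
        cfg
        for cfg, (bandwidth, size) in table.items()
        if bandwidth >= need_mb_s and size >= need_mib
    ]
    return sorted(hits)


if __name__ == "__main__":
    # Example: 2500 MB/s and 200 MiB are needed.
    for links, par in candidates(VCL, 2500, 200):
        print(f"{links} Link, Par {par}")
```

    The first hit is the configuration with the fewest links, which is usually the easiest one to prepare coefficient files for.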

    In most cases you need only a single link at maximum speed. As you can see from the table above, the bandwidth of a single link is limited, so you need to use multiple links and combine the data into a single link. In this case the coefficient data has to be distributed over multiple files. Depending on how you combine the links (operators MergePixel, MergeParallel, InsertLine), the memory layout will differ.
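    As a rough illustration of what "distributed over multiple files" can mean, the sketch below splits one coefficient sequence into per-link chunks in two ways: value by value, and line by line. Which split matches which merge operator depends on your design and is an assumption here; the function names are hypothetical, and the actual CoefficientBuffer file format is not covered.

```python
from typing import List, Sequence


def split_interleaved(coeffs: Sequence[int], n_links: int) -> List[List[int]]:
    """Value-wise split: link k gets coeffs[k], coeffs[k + n_links], ...

    Assumption: this suits a merge that interleaves values from the links.
    """
    return [list(coeffs[k::n_links]) for k in range(n_links)]


def split_by_line(coeffs: Sequence[int], line_len: int, n_links: int) -> List[List[int]]:
    """Line-wise split: link k gets lines k, k + n_links, ...

    Assumption: this suits a merge that alternates whole lines between links.
    """
    out: List[List[int]] = [[] for _ in range(n_links)]
    for line_idx, start in enumerate(range(0, len(coeffs), line_len)):
        out[line_idx % n_links].extend(coeffs[start:start + line_len])
    return out


if __name__ == "__main__":
    data = list(range(8))
    print(split_interleaved(data, 2))  # one list per link
    print(split_by_line(data, 2, 2))   # one list per link
```

    Each returned list would then be written to the file for its link, in whatever format the operator expects.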


    The following screenshot shows a very simple link combination.


    pasted-from-clipboard.png


    Check out the VisualApplets documentation for memory configurations of the specific grabbers. LINK


    Johannes Trein
    Teamleader Applications and Development
    SiliconSoftware GmbH



  • Hi Johannes,


    After some time, and after finding your thread and a short sentence in the documentation stating that increasing the parallelism does not increase the bandwidth, I got my applet working with a CoefficientBuffer. However, I did not want to use more than one link, as this would make things much more complicated for the user. Previously I used (BitWidth = 16, Par = 64) and cast to (BitWidth = 8, Par = 16). I optimized it by starting the other way round (BitWidth = 64, Par = 16; see attached image). So my main question is why is the Bit Width not a part of the table above? Would you still think I should use more than one link?


    Why is it not possible to improve the operator by increasing the parallelism? I have to admit that I did not quite get the explanation in the documentation.


    Best regards,

    Theo



    forum.silicon.software/index.php?attachment/207/

    Files

    • Applet.PNG


  • Hi Theo

    So my main question is why is the Bit Width not a part of the table above? Would you still think I should use more than one link?

    The table already assumes a bit width of 64 to get to those values. I need to add more details to the post.
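    The link-limited rows in the table (marked with *) can be reproduced with simple arithmetic. The sketch below assumes a design clock of 125 MHz, which is consistent with the measured 999 MB/s for one 64-bit link at parallelism 1 but is not stated in this thread. The function names are made up for this illustration.

```python
DESIGN_CLOCK_MHZ = 125.0  # assumption, not stated in this thread


def link_rate_mb_s(bit_width: int, parallelism: int,
                   clock_mhz: float = DESIGN_CLOCK_MHZ) -> float:
    """Raw rate of one link in MB/s: bytes per clock cycle times clock in MHz."""
    return bit_width * parallelism / 8 * clock_mhz


def effective_rate_mb_s(bit_width: int, parallelism: int, dram_limit_mb_s: float,
                        clock_mhz: float = DESIGN_CLOCK_MHZ) -> float:
    """The lower of the raw link rate and the measured per-link DRAM limit."""
    return min(link_rate_mb_s(bit_width, parallelism, clock_mhz), dram_limit_mb_s)


if __name__ == "__main__":
    # 1524 MB/s is the measured 1-link limit on mE5-MA-VCL from the table.
    for par in (1, 2, 4, 8):
        print(par, effective_rate_mb_s(64, par, 1524))
```

    With a bit width of 64, parallelism 1 gives 1000 MB/s and is limited by the link. From parallelism 2 upwards the link could carry more than the DRAM delivers, so the measured value stays at the DRAM limit, which is why higher parallelism does not help.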


    I know the operator does not perform very well. Thank you for the feedback. I will forward it.


    Tip: Preparing the CoefficientBuffer input files is difficult if you are using multiple links. If you don't want to write software for testing, you can use the VisualApplets simulation to generate the coefficient files.


    An alternative to CoefficientBuffer is RamLUT.


    Johannes

