Posts by Pierre Chatelier

    But unlike LoadCoefficients, "UpdateROI" cannot be set to 0, so it is always set to 1.

    I think that here again, this is a documentation problem:

    I have just checked that the CoefficientBuffer ROI does seem to be taken into account (judging by the fps) after writing 1 into UpdateROI *even if the value is already 1*.

    Can you confirm?

    If so, I also strongly suggest updating the UpdateROI documentation to explain that!


    For instance, in my own GUI, since UpdateROI has only a single possible value, no GUI event is raised if I rewrite "1" in the associated NumericUpDown control, and thus nothing is propagated to Sgc_setIntegerValue(). That is how I got tricked!

    It means that I should call Sgc_setxxxValue() even if the new value is the same as the current one.
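
    A minimal sketch of that "always forward the write" rule. The ParameterWriter class and its write_to_device callback are hypothetical names used for illustration; the callback stands in for whatever wrapper the application puts around Sgc_setIntegerValue().

```python
from typing import Callable, List, Tuple


class ParameterWriter:
    """Forwards every write to the device, even when the value is unchanged."""

    def __init__(self, write_to_device: Callable[[str, int], None]) -> None:
        # write_to_device is a hypothetical stand-in for the real SDK call.
        self._write_to_device = write_to_device

    def set_value(self, name: str, value: int) -> None:
        # Deliberately no "skip if unchanged" shortcut: for trigger-like
        # parameters such as UpdateROI, the write itself is the event.
        self._write_to_device(name, value)


# Usage with a fake device that only records the writes it receives.
writes: List[Tuple[str, int]] = []
writer = ParameterWriter(lambda name, value: writes.append((name, value)))
writer.set_value("UpdateROI", 1)
writer.set_value("UpdateROI", 1)  # same value, still forwarded
```

    In a GUI, this means wiring the "apply" action to set_value() directly rather than to a value-changed event, which never fires when the control already shows 1.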

    I have got an additional question.

    If I want to set an ROI, I dynamically set the width and height of the:

    -camera Genicam parameters (mandatory)

    -applet ImageBuffer width and height (optional, but better bandwidth)

    -applet SelectROI width and height (optional, but better bandwidth)

    -applet CoefficientBuffer XLength and YLength (and offset) (required for Shading to be meaningful)

    and it kind of works, but...

    Regarding the bandwidth, since the "real" BufferWidth and BufferHeight of the CoefficientBuffer operator are static and set to 5120x5120 (actually 320x5120 because of the tricky use of the width when using 4x(64b@2x) per file), I have observed that my fps is limited by those hard-coded "full frame" dimensions.

    For instance, in 3520x2000, while it should be ~250fps (that is what I observe in an applet without shading), I am limited to 75fps here.
    I think that it would work if I built a specific version of the applet with a CoefficientBuffer operator adapted to 3520x2000 (i.e. 220x2000), but it is not very handy.
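
    The buffer sizes quoted above follow a fixed ratio of 16 between image width and CoefficientBuffer width (5120 → 320, 3520 → 220). A minimal sketch of that arithmetic, assuming the ratio stays at 16 for this 4x(64b@2x) layout; the function name and the constant are illustrative and not part of VisualApplets.

```python
# Assumption: ratio taken from the figures in the post (5120 / 320 = 16).
WIDTH_RATIO = 16


def coefficient_buffer_size(image_width: int, image_height: int,
                            width_ratio: int = WIDTH_RATIO) -> tuple:
    """Return (buffer_width, buffer_height) for a given image size."""
    if image_width % width_ratio != 0:
        raise ValueError("image width must be a multiple of the width ratio")
    # Only the width is divided; the height is used as-is.
    return image_width // width_ratio, image_height


full_frame = coefficient_buffer_size(5120, 5120)
roi = coefficient_buffer_size(3520, 2000)
print(full_frame, roi)
```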

    Am I right in my investigation? Is there another way to use a customizable ROI at a higher fps when using a CoefficientBuffer operator?

    Thanks for your help.
    Here is a summary of the information from this thread:

    -The current version of the Appendix.Device Resources states:

    "Due to the shared bandwidth architecture, the applet developer should utilize all 256 bits of the operator’s memory interface (RAM Data Width)"

    But the "256" here is just a specific case: the real "RAM Data Width" may differ (it is given in the same documentation for each board model) and is 512 for the MA-VCX-QP. That is why I first tried 64b@4x instead of 64b@8x for the output of the CoefficientBuffers.
    I suggest that you slightly reword that sentence of the doc.

    -For a board using shared memory, the doc does say that all RAM operators should use the same data width for performance, but I did not realize that this implied "artificially" parallelizing up my 8b@32x camera stream to 8b@64x for the InfiniteSource ImageBuffer storage. It is perfectly logical in hindsight, but not obvious at first. You may also want to emphasize that point in the documentation.

    -The current version of CoefficientBuffer (VA 3.1.2) has technical limitations that make it tricky to use at full bandwidth. Your discussion thread CoefficientBuffer: Maximum performance... is really important.
    I suggest that you include it in the VA documentation (or work on a new, less tricky, CoefficientBuffer operator!)

    -In the shared memory model, no performance gain will be observed by splitting a CoefficientBuffer into several CoefficientBuffer operators, as long as the full data width is properly used.
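
    The link formats mentioned in this summary all come from the same division: RAM data width divided by bit width per element. A minimal sketch, with an illustrative function name that is not part of the VA tooling:

```python
def parallelism_for_ram_width(ram_data_width_bits: int, bits_per_element: int) -> int:
    """Parallelism needed so that one link word fills the whole RAM data width."""
    if ram_data_width_bits % bits_per_element != 0:
        raise ValueError("RAM data width must be a multiple of the element width")
    return ram_data_width_bits // bits_per_element


# MA-VCX-QP (RAM Data Width 512): 8-bit pixels and 64-bit coefficient words.
print(parallelism_for_ram_width(512, 8))   # camera stream stored as 8b@Nx
print(parallelism_for_ram_width(512, 64))  # CoefficientBuffer output as 64b@Nx
# The 256-bit case quoted from the Appendix.
print(parallelism_for_ram_width(256, 64))
```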

    Success !

    Setting the CoefficientBuffers to 8x(64b@2x) is OK. Now I have my 50 fps. (and I have already made and tested the program to split my Coefficient .tif files into 128b block parts).
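
    A rough sketch of what such a splitting step could look like. It is not the program mentioned above: it works on raw bytes rather than .tif files, and it assumes that consecutive 128-bit blocks are simply dealt round-robin to the parts. The ordering actually expected by the CoefficientBuffer operators may differ, and the function name is hypothetical.

```python
from typing import List

BLOCK_BYTES = 16  # 128 bits, i.e. one 64b@2x word


def split_into_parts(data: bytes, n_parts: int,
                     block_bytes: int = BLOCK_BYTES) -> List[bytes]:
    """Deal consecutive fixed-size blocks of data round-robin into n_parts."""
    if len(data) % (block_bytes * n_parts) != 0:
        raise ValueError("data length must be a multiple of block size times part count")
    parts = [bytearray() for _ in range(n_parts)]
    for offset in range(0, len(data), block_bytes):
        block_index = offset // block_bytes
        # Assumption: block k goes to part k modulo n_parts.
        parts[block_index % n_parts] += data[offset:offset + block_bytes]
    return [bytes(p) for p in parts]


# Example: 256 bytes make 16 blocks, so each of the 8 parts gets 2 blocks.
sample = bytes(range(256))
sample_parts = split_into_parts(sample, 8)
```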

    But I still don't understand why SyncToMax is needed instead of SyncToMin since all image sizes are identical.

    Some other compilations are still going on, I will update the thread with other results to make an exhaustive report.


    The design with 8x(64b@2x) coefficients is not compiled yet; I just posted a screen capture of your applet while the compilation is running.

    This is just an informative report showing that, for now, I get figures similar to yours.

    To avoid overflow, I found that the speed limit is a period of 83950, which is ~1488Hz for the 125 MHz clock.
    On your machine you seem to achieve 1600Hz.
    As soon as I have the Shading applet compiled, I will report the figures here.
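
    The conversion between the period setting and the frame rate can be checked with a few lines, assuming the period is counted in ticks of the 125 MHz clock as stated above. The function names are illustrative.

```python
CLOCK_HZ = 125_000_000  # 125 MHz clock, as stated in the post


def period_to_hz(period_ticks: int, clock_hz: int = CLOCK_HZ) -> float:
    """Frequency obtained for a period expressed in clock ticks."""
    if period_ticks <= 0:
        raise ValueError("period must be positive")
    return clock_hz / period_ticks


def hz_to_period(frequency_hz: float, clock_hz: int = CLOCK_HZ) -> float:
    """Period in clock ticks needed to reach a given frequency."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return clock_hz / frequency_hz


measured_limit_hz = period_to_hz(83950)   # just under 1489 Hz
period_for_1600_hz = hz_to_period(1600)   # period needed to reach 1600 Hz
print(measured_limit_hz, period_for_1600_hz)
```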



    During the compilation, I can already give you some information:

    First, the performance of my board


    Second, the test of your applet:


    As you can see, it is around 1500fps rather than 1600 fps.

    I don't understand two things:

    -I couldn't run your applet under MicroDisplay without my camera being detected (if no camera is found, I cannot start the applet; the buttons are disabled)

    -The ROI(ImageBuffer) > FillLevel is at 75%, so the FIFO is full?!

    I have checked that just setting the input Imagebuffer to 512b/clock is not enough.



    A CXP camera can usually be configured either to be in "free run mode" or to be slaved to a trigger signal.

    You should look into its GenICam configuration.

    With MicroDisplay, you can use Tools > GenicamExplorer, and look for these nodes:

    Acquisition Control > Acquisition Mode (something like "Continuous" means "free run", and the camera uses its "Acquisition Control > Acquisition frame rate"). If you set the Acquisition Mode to "CoaXPress", the camera is slaved to triggers coming from the OptoTrigger, for instance. Some cameras may also offer "Single Frame" or "Line 0" when using native ports (not CoaXPress) for the trigger, as documented by the camera manufacturer.

    For some cameras, it will instead be "Acquisition Control > Trigger Mode" and "Acquisition Control > Trigger Source"


    I have a performance problem with a Shading applet on MA-VCX-QP, when adding CoefficientBuffer operators (inspired by the Shading example of VA install folder)
    I take care of the bandwidth and there should be no problem, but my design encounters a very low functional limit.

    The board is a Marathon MA-VCX-QP,

    The camera is a Mono8 5120x5120@80fps, but I only target 50 fps for this board.

    The targeted camera bandwidth is 1250MB/s ~ 1.22 GB/s
    After the CXPQuadCamera operator, the native 8b@32x is downcast to 8b@16x after the input InfiniteSource ImageBuffer.
    Then the VA design contains a shading algorithm.
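
    The targeted bandwidth follows directly from the camera format. A minimal sketch of the calculation, assuming Mono8 means one byte per pixel and that the MB and GB figures are binary multiples (2^20 and 2^30 bytes); the function name is illustrative.

```python
def data_rate_bytes_per_s(width: int, height: int, fps: int,
                          bytes_per_pixel: int = 1) -> int:
    """Raw image data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel


target_rate = data_rate_bytes_per_s(5120, 5120, 50)
target_rate_mib = target_rate / 2**20  # in units of 2^20 bytes per second
target_rate_gib = target_rate / 2**30  # in units of 2^30 bytes per second
print(target_rate, target_rate_mib, target_rate_gib)
```

    With decimal units the same rate is about 1311 MB/s, so it is worth stating which convention is used when comparing against the board documentation.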

    If I submit dummy constants to the shading algorithm instead of reading CoefficientBuffers, it works @50fps.

    Now, I want to read shading coefficients in a CoefficientBuffer.
    According to the "Shared memory" documentation of the MA-VCX-QP, the optimal bandwidth should be of width 256b, so I first configured the CoefficientBuffer to output 64b@4x, with proper CastToParallel/ImageFifo/ParallelDn to transform the input file from TIFF 16b to 16b@16x shading information.

    In that case, the design won't run at more than 20fps, which is far from what the MA-VCX-QP RAM bandwidth could sustain, even with shared memory.

    I tried several variants, boosting the CoefficientBuffer output to 64b@8x, or reducing the data to 8b@16x shading information. I tried adding additional ImageBuffers as FIFOs. I tried many things, but I always hit that ~20fps limit.

    I will soon post screenshots of the different failing designs.

    I definitely have the same problems as in this thread: Bandwidth statistics not available

    I have here a design that should work easily, even with shared memory. I am currently building a dozen different designs to show you that I can never achieve correct performance as soon as I add a CoefficientBuffer.
    I will post screen captures of all the different attempts on Monday.
    I hope there will be a solution.

    I think you can close this thread; a similar problem is now reported in MA-VCX-QP performance problem when using CoefficientBuffer

    I have a 5120x5120@80fps Mono8 camera.
    With the standard Acq_SingleCXP6x4AreaGray, I can run it without overflow at ~65fps, which is consistent with the MA-VCX-QP bandwidth.
    But if I try to build the simplest Visual Applet, even if simulation and compilation are OK, I only get less than 1 fps of corrupted output.

    Is there a specific trick to handle the fact that the MA-VCX-QP won't output more than 8b@16x, while the camera has a minimum parallelism of 20x?

    Link to the applet CXP-Mono8-5120x5120.VA


    In the above example, pixels are split between 2x4 bits, and an ImageFiFo has been added before the ParallelDn.

    According to the sample code of such an applet, they are not necessary, but I tried, since I have no clue yet.

    Does your answer also cover the following question:

    In that design, the bandwidth analysis is over 3000 MB/s (for the part where the info is available), which is quite good.

    However, when I use it on a 4672x3416@148fps camera, the output is OK at low frame rates (~100fps) and becomes corrupted near 122fps.

    That limit of 120 fps is ~1900 MB/s, which is well under 3000 MB/s.
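
    For reference, a short sketch of the numbers behind that comparison, assuming Mono8 at one byte per pixel and MB meaning 10^6 bytes (the unit used by the bandwidth analysis is not stated here). The function names are illustrative.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Size of one frame in bytes."""
    return width * height * bytes_per_pixel


def data_rate_mb_per_s(width: int, height: int, fps: float) -> float:
    """Data rate in MB/s (10^6 bytes per second) for a given frame rate."""
    return frame_bytes(width, height) * fps / 1e6


def max_fps_for_budget(budget_mb_per_s: float, width: int, height: int) -> float:
    """Frame rate at which the data rate would reach the given budget."""
    return budget_mb_per_s * 1e6 / frame_bytes(width, height)


observed_limit_mb = data_rate_mb_per_s(4672, 3416, 120)
fps_at_3000_mb = max_fps_for_budget(3000, 4672, 3416)
print(observed_limit_mb, fps_at_3000_mb)
```

    Under these assumptions, 3000 MB/s would correspond to roughly 188 fps at this resolution, which is why corruption near 122 fps points to a bottleneck that the bandwidth analysis does not show.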

    Is there any tool to find out where the bottleneck is? How can I find a strategy to fix that?

    The limitation does not come from the PC, since the simplest design (just camera->image buffer->DmaOutput) runs OK at 148fps.

    Do you mean that when using a camera ROI, all the following operators anywhere in the design must be adapted manually at run-time ?



    It is a pain that this cannot be factored into a single variable.

    I don't understand that limitation. Even if the ImageBuffer has an XLength greater than the real ROI, since it is smart enough to handle EOL and keep correct frame dimensions, why does it limit the performance?

    Yes I am using Microdisplay.
    Here is the doc: "This parameter is used to start loading of coefficient images into the buffer before the image acquisition starts. The loading is triggered by a write cycle of value one to this parameter. Writing value 0 does not cause the loading of the coefficient files."

    It is not clear that loading is triggered when changing the value from 0 to 1. I just thought that the parameter had to be 1 all the time, and that 0 had no meaning.

    Apart from that, case solved.

    I have a design that runs for a CXP camera, Mono8, 4672x3416@148fps, externally synchronized through CoaXPress on an OptoTrigger.

    This is a classic design: camera->split in low/high bytes->store in two Infinite imagebuffers->merge pixel->SelectROI operator->DmaOutput…

    The problem occurs when I set the camera to 1024x1000@1000fps. I also set the SelectROI Width/Height to 1024x1000

    In that case, which has a *lower* datarate than 4672x3416@148fps, the image buffers are filled at 75% and there are overflows. I have to reduce the sync signal to ~830fps to get a flawless behaviour.

    But I can make it work if I also manually set both image buffers XLength and YLength to 1024x1000.

    I would like to get rid of this manual operation, since I expect Image Buffers to automatically adapt themselves to smaller images.
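
    The claim that the small ROI has a lower data rate than the full frame can be checked quickly. A minimal sketch assuming Mono8 at one byte per pixel; the Mode class is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    """A camera operating mode: image size and frame rate."""
    width: int
    height: int
    fps: int

    def bytes_per_second(self, bytes_per_pixel: int = 1) -> int:
        return self.width * self.height * self.fps * bytes_per_pixel


full_frame = Mode(4672, 3416, 148)
small_roi = Mode(1024, 1000, 1000)

# Ratio below 1 means the small ROI carries less data per second.
ratio = small_roi.bytes_per_second() / full_frame.bytes_per_second()
print(full_frame.bytes_per_second(), small_roi.bytes_per_second(), ratio)
```

    The ROI mode carries well under half the data of the full-frame mode, so raw pixel throughput alone does not explain the overflows.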

    I tried different other strategies without success:

    -put the SelectROI before the Image buffers (it's even worse)

    -split the pixels into 4 image buffers of 2 bits instead of 2 image buffers of 4 bits (it does not change anything)

    -limit the camera parallelization to x20 instead of x32 (it's obviously worse, but I tried)