I have a performance problem with a Shading applet on the MA-VCX-QP when adding CoefficientBuffer operators (inspired by the Shading example in the VA install folder).
I have taken the bandwidth into account and there should be no problem, but my design hits a very low throughput limit.
The board is a Marathon MA-VCX-QP.
The camera is a Mono8 5120x5120@80fps, but I only target 50 fps for this board.
The targeted camera bandwidth is 5120 x 5120 x 50 bytes/s = 1250 MiB/s, i.e. about 1.22 GiB/s (1.31 GB/s in decimal units).
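For reference, the bandwidth figure can be checked with a quick calculation. This is only a sketch of the arithmetic: the function name is made up for illustration, and it counts raw pixel data only (no blanking or protocol overhead).

```python
def camera_bandwidth(width, height, fps, bytes_per_pixel=1):
    """Raw pixel data rate in bytes per second (hypothetical helper)."""
    return width * height * fps * bytes_per_pixel


# Mono8 camera, 5120x5120, targeted at 50 fps
rate = camera_bandwidth(5120, 5120, 50)

mib_per_s = rate / 2**20   # binary megabytes per second
gib_per_s = rate / 2**30   # binary gigabytes per second
mb_per_s = rate / 1e6      # decimal megabytes per second

print(f"{rate} B/s = {mib_per_s:.0f} MiB/s = {gib_per_s:.2f} GiB/s = {mb_per_s:.2f} MB/s")
```

At the camera's full 80 fps the same calculation gives 2000 MiB/s, which is why I only target 50 fps on this board.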
After the CXPQuadCamera operator, the native 8b@32x is downcast to 8b@16x after the input InfiniteSource ImageBuffer.
Then the VA design contains a shading algorithm.
If I feed dummy constants to the shading algorithm instead of reading from CoefficientBuffers, it works @50fps.
Now, I want to read the shading coefficients from a CoefficientBuffer.
According to the "Shared memory" documentation of the MA-VCX-QP, the optimal memory access width should be 256b, so I first configured the CoefficientBuffer to output 64b@4x, with the proper CastToParallel/ImageFifo/ParallelDn chain to transform the input file from TIFF 16b into 16b@16x shading information.
In that case, the design won't run at more than 20fps, which is far from what the MA-VCX-QP RAM bandwidth could sustain, even with shared memory.
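To make the numbers easier to compare, here is a rough sketch of the arithmetic behind these link formats. The helper names are hypothetical, the calculation assumes one coefficient per pixel, and it ignores memory access overhead and the real shared-memory bandwidth of the board, which I am not stating here.

```python
PIXELS_PER_FRAME = 5120 * 5120


def link_bits_per_clock(bit_width, parallelism):
    """Bits transported per clock cycle on a link such as 64b@4x."""
    return bit_width * parallelism


def min_clock_mhz(fps, parallelism):
    """Lowest clock (MHz) that moves one frame per period at this parallelism."""
    return PIXELS_PER_FRAME * fps / parallelism / 1e6


def coeff_read_mib_per_s(fps, coeff_bits):
    """Coefficient read rate in MiB/s, assuming one coefficient per pixel."""
    return PIXELS_PER_FRAME * fps * coeff_bits / 8 / 2**20


print(link_bits_per_clock(64, 4))    # CoefficientBuffer output, 64b@4x
print(link_bits_per_clock(16, 16))   # shading information, 16b@16x
print(min_clock_mhz(50, 16))         # pixel clock needed at parallelism 16
print(coeff_read_mib_per_s(50, 16))  # 16b coefficients at the 50 fps target
print(coeff_read_mib_per_s(20, 16))  # 16b coefficients at the observed 20 fps
```

So 64b@4x and 16b@16x carry the same 256 bits per clock, and the 16b coefficient stream at 50 fps is twice the camera data rate (2500 MiB/s versus 1250 MiB/s). This alone does not explain the limit, because the 8b variant behaves the same way.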
I tried several variants: boosting the CoefficientBuffer output to 64b@8x, or reducing the data to 8b@16x shading information. I also tried adding ImageBuffers as FIFOs. Whatever I try, I always hit that ~20fps limit.
I will post screenshots of the different failing designs soon.