If you don't detect any major mistake in my design, the case is closed for me.
Thanks for the detailed explanations, they will be useful.
But for the current OTSU design, I only compared the figures given by the bandwidth test of VA itself, not a real measurement. For this test, without the "WaitBuffer" I can set Mean and Peak to 800 MB/s and get a green light (801 gives a red light). With the "WaitBuffer", 640 for both Avg and Peak is the maximum.
I wondered why adding that ImageBuffer had such a huge effect on the simulated bandwidth.
Here is the new design: Otsu-optimized.zip
With the "Waitbuffer" (that avoids the deadlock), the simulation goes up to 640MB/s.
Without the "WaitBuffer" (but that did not work on a real board), the simulation went up to 800MB/s.
I would like to understand why, so that I can take that in account in future designs and expected performance when dimensioning the required resources.
This is the first time I have used SyncToMax; now I understand better what it does!
As you mentioned, I have replaced the "pixel replicator" instances by isFirstPixel+Register after a SyncToMax.
The algorithm still works, but the simulation shows the exact same bandwidth.
It does not seem to optimize the bandwidth. It even consumes a little more FPGA resources.
I tried to split the WaitBuffer into "low" and "high" bits, but it is still the same; it does not help increase the bandwidth.
I have added an ImageBuffer on the "original image" branch, before the last sync operator. It seems to remove the deadlock that was not shown by the simulation.
However, the bandwidth test now drops from 800 MB/s to 600 MB/s.
I finally succeeded.
For instance, in the example of the previous message, where I wanted to multiply each pixel of the histogram by the sum of all values in the histogram, I found out that I can:
-on an output branch of the histogram, perform a RowSum
-on that branch, use "RemovePixel" for pixels 0-254, and keep pixel 255
-replicate that last pixel 256 times
-sync with the regular histogram output
It seems to do exactly what I want.
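As a sanity check, the operator chain above can be modeled in plain Python (the function and variable names are mine, and this is only a software model of the data flow, not the VA implementation):

```python
def scale_hist_by_total(hist):
    # RowSum: running sum along the histogram, treated as one image row
    running = []
    acc = 0
    for v in hist:
        acc += v
        running.append(acc)
    # RemovePixel for pixels 0-254: only the last running-sum value
    # (pixel 255) survives, and it equals the total sum of the histogram
    total = running[-1]
    # replicate that pixel 256 times, sync with the regular histogram
    # output, then multiply
    return [v * total for v in hist]

hist = list(range(256))  # dummy histogram
out = scale_hist_by_total(hist)
```

Each output bin ends up equal to the original bin times the frame total, which is exactly the intended computation.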
Now, I have a design that works (I have implemented an adaptive binarization based on histogram analysis).
Simulation works, the visual result on dummy images is OK, and the bandwidth analysis passes up to 800 MB/s (for max and mean).
But guess what? Once compiled on the board, with a real camera, I just get no images. The DmaToPC never gets a frameIndex > 0.
The board is a MicroEnable5 VD8-PoCL
The camera runs at 1280x1024@50fps
How can I investigate where the problem occurs? Since the simulation works perfectly, I am very surprised.
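As a side note, a quick back-of-the-envelope check (assuming 8-bit mono pixels, since the design targets 8-bit images) shows that the camera's raw data rate is far below the 800 MB/s the bandwidth test passes at, so raw bandwidth should not be the cause:

```python
width, height, fps = 1280, 1024, 50
bytes_per_pixel = 1  # assumption: 8-bit mono pixels

rate_bytes = width * height * fps * bytes_per_pixel
rate_mb = rate_bytes / 1e6  # decimal MB/s, like the VA bandwidth test
print(f"camera data rate: {rate_mb:.1f} MB/s")  # about 65.5 MB/s
```

So the missing frames are more likely a synchronization or handshake issue than a throughput issue.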
However, I still don't understand something about synchronization.
Imagine (for the sake of simplicity, this is just an example) that I want to multiply each "pixel" of the histogram by the sum of all the values in the histogram.
I can make a FIFO of 256 values to store the incoming histogram values, while making a "framesum" on a parallel link to compute the sum of all histogram values.
But then, how can I apply the final sum to all the FIFO values? I can't find which operator to use to "block" the FIFO output for 255 values, so that once unblocked, the total sum is up to date and will be properly applied to the 256 stored values.
I guess I have to use IsLastPixel and RemovePixel, but I can't make it work.
I can't find a way to find the location of a min/max value in a data1D or data2D link.
For instance, imagine I want to implement an OTSU binarization on 8-bit images. The idea is to compute a histogram, then find the threshold that best separates the pixels into two classes. I could compute the 256 possible threshold separations in parallel, and select the one maximizing some formula. But I would need to detect a max and apply it to some branch select. Is it possible?
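For reference, the "formula" I have in mind is the textbook Otsu criterion: maximize the between-class variance over all 256 cut points. In plain Python (this is the software model I would like to reproduce with VA operators, not an existing VA block):

```python
def otsu_threshold(hist):
    """Return the threshold t maximizing the between-class separation
    w0 * w1 * (mu0 - mu1)^2, evaluated for every possible cut t,
    where class 0 holds the pixels with value <= t."""
    total = sum(hist)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0     # pixel count of class 0
    sum0 = 0   # intensity sum of class 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, no separation to measure
        mu0 = sum0 / w0
        mu1 = (total_sum - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# toy bimodal histogram: two populations at gray levels 50 and 200
hist = [0] * 256
hist[50] = 100
hist[200] = 100
t = otsu_threshold(hist)  # t separates the two modes (class 0 is <= t)
```

The incremental update of w0 and sum0 is what I would hope maps onto a RowSum-style running accumulation in the applet.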
Other use case: I perform a convolution with a kernel, and want to find the location of the maximum of the convolution result, to perform some work around that location.
Is it possible with standard operators ?
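In software terms, this second use case is just a convolution followed by an argmax over the 2D result. A minimal sketch of what I am trying to express with standard operators (plain Python, names are mine):

```python
def convolve_valid(img, kernel):
    # naive "valid" 2D sliding-window filter (correlation, which is
    # what most image-processing kernels mean in practice)
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(len(img) - kh + 1):
        row = []
        for x in range(len(img[0]) - kw + 1):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    acc += img[y + j][x + i] * kernel[j][i]
            row.append(acc)
        out.append(row)
    return out

def argmax2d(data):
    # location (row, col) of the first occurrence of the maximum
    best = (0, 0)
    for y, row in enumerate(data):
        for x, v in enumerate(row):
            if v > data[best[0]][best[1]]:
                best = (y, x)
    return best

img = [[0] * 5 for _ in range(5)]
img[3][3] = 10                      # single bright spot
box = [[1] * 3 for _ in range(3)]   # 3x3 box kernel
resp = convolve_valid(img, box)
loc = argmax2d(resp)                # first window covering the spot
```

The hard part in VA is not the convolution (there are kernel operators) but turning the argmax coordinates back into something a downstream branch can use.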
Hi, sorry for the delay.
A few more details :
-for my tests, I always used a zero offset of the ROI, so there is at least no start alignment problem
-"please let me know what you mean by FG_WIDTH" : actually you are right, my applets do not have a FG_WIDTH parameter. But my software has a generic behaviour which queries FG_WIDTH and FG_HEIGHT presence and modify them when possible, so that it works with standard applets as well. But you are right, this is not releveant for the current problem.
-"If you have a mismatch it looks like in your attached image" : we agree that it looks like an alignment problem, I am just puzzled that it occurs at a width of 1152 and not 1184, because if there was a misalignment, it should occur in both cases.
-I can't see the problem in VA simulation.
See the screenshot below. The Select_ROI is set to 2336x1728, while the image is 1140x1728 (I cannot use 1152 because of the x20 parallelism).
I have a very basic design ending with a Select_ROI just before the DMA output.
This is for a camera of max size 2336x1728, so the links max dimensions are set to 2336x1728
Usually, everything works well. On the software side, I always allocate buffers of size 2336x1728 for the acquisition image FIFO (even when using ROIs, see below).
If I set a ROI on the CAMERA, let's say 1184x1728, and also modify the FG_WIDTH of the applet to 1184, it still works, even if the Select_ROI is still configured at 2336x1728.
But if I go down from 1184 to 1152 (still in the camera and in FG_WIDTH), strange behaviour occurs and the image seems corrupted. To fix it, I also have to set the Select_ROI width to 1152.
I notice that 1152 is just below the half width of 2336 (1168), so something happens there.
Any clue?
Here are screenshots of correct image at 1184x1728 and corrupted at 1152x1728
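For reference, here is the arithmetic behind the two observations above in plain Python; the rule that a simulated width must be a multiple of the link parallelism is my assumption for why 1140 is the width I can actually use:

```python
parallelism = 20
max_width = 2336

def nearest_lower_width(target):
    # assumption: a simulated image width must be a multiple of
    # the link parallelism
    return target - target % parallelism

half = max_width // 2               # 1168, the half-width boundary
print(nearest_lower_width(1152))    # 1140, the width used in simulation
print(1152 < half < 1184)           # True: 1152 falls below the half
                                    # width, 1184 above it
```

So 1152 is indeed the first tested width that crosses below half of the 2336 link width, which is consistent with the corruption appearing exactly there.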
I have an applet design that uses "append image".
Is it possible, from the SiSo SDK, to send a command to the frame grabber, or to the applet, to "reset and clear" all the buffers, and specifically the "append image" operator, so that it restarts from the beginning of the concatenation?