# Posts by silverfly

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

Anyway, thank you very much.

It is worth mentioning that my question comes from the semi-global matching (SGM) algorithm [1] used in stereo matching, which may be of interest to you; it is a very common algorithm in stereo vision.

I also ran into a problem when designing a stereo matching algorithm. When a design contains a large number of operators (for example, several thousand, which is common when the disparity range of stereo matching is 128 or greater), VisualApplets becomes very sluggish or even exits abnormally. Is this a known problem?

[1] Hirschmüller H. Stereo processing by semiglobal matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(2): 328-341.
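For context, the pixel-to-pixel dependency in SGM comes from its path-cost recurrence: the aggregated cost of a pixel needs the finished aggregated cost of the previous pixel on the path. A minimal single-path (left-to-right) sketch in plain Python; the penalty values P1 and P2 are illustrative, not taken from the thread:

```python
# Minimal sketch of the SGM path-cost recurrence along one scanline
# (left-to-right path only). C[x][d] is the matching cost of pixel x
# at disparity d; L[x][d] is the aggregated cost. P1 and P2 are the
# usual small/large smoothness penalties (values here are illustrative).
def sgm_scanline(C, P1=1, P2=8):
    num_px, num_d = len(C), len(C[0])
    L = [row[:] for row in C]          # L[0] = C[0] at the path start
    for x in range(1, num_px):         # each pixel needs the FINISHED
        prev = L[x - 1]                # result of the previous pixel
        min_prev = min(prev)
        for d in range(num_d):
            best = min(
                prev[d],
                (prev[d - 1] + P1) if d > 0 else float("inf"),
                (prev[d + 1] + P1) if d < num_d - 1 else float("inf"),
                min_prev + P2,
            )
            L[x][d] = C[x][d] + best - min_prev
    return L
```

The inner `min` over the previous pixel's costs is exactly the feedback that rules out a plain feed-forward pipeline.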

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

Thank you for the exquisite double loop reference design.

Basically, you first decompose the two-dimensional image into line images, so that the processing results of the previous image line can be fed back to the current line through a loop; then each line image is decomposed into individual pixels to provide a pixel-level loop.

I will try to integrate the double loop method into my own design. Thank you very much!

In addition, since the loop is at the pixel level, the design can only work at a parallelism of one.

You achieved a processing speed of only 3.2 fps for a 1024x1024 image at a clock frequency of 125 MHz. Since the pixels are processed sequentially with no pipelining between them, the processing time of an image is the sum of the individual processing times of all pixels. By this calculation, the operation on each pixel (calculating the mean value) takes about 39 clock cycles. Is that reasonable?

The processing speed of 3.2 fps is not enough for my application. Do you have any additional optimization suggestions? For example, how is the clock fraction mentioned by Johannes Trein in the third reply implemented in VisualApplets?
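As a quick sanity check of that arithmetic (the ~39 figure appears when the 1024x1024 frame is rounded to one million pixels):

```python
# Rough check of the cycles-per-pixel estimate: with no pipelining
# between pixels, frame time = pixels * cycles_per_pixel / f_clk.
f_clk = 125e6          # design clock, Hz
fps = 3.2              # reported frame rate
pixels = 1024 * 1024   # frame size

cycles_per_frame = f_clk / fps
cycles_per_pixel = cycles_per_frame / pixels
print(round(cycles_per_pixel, 1))   # prints 37.3; ~39 if pixels is rounded to 1e6
```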

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

Thank you for taking the time to consider this problem.

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

"Recursive" is exactly what I want, and it is different from ordinary kernel filtering: operators such as FIRkernel or PixelNeighbor can only access the original image pixels, whereas I need the filtering result of the current pixel to be immediately available to the next pixel.

I would be very grateful if you could provide a basic design like this.

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

I would like to simplify the problem and make it more concrete. Assume that the op operation computes the average of the upper-left neighborhood of pixel P, that is, op(P) = average(A, B, C, D).

However, the values of A, B, C, and D are themselves the mean values of their respective upper-left neighborhoods, rather than the original image grey values.

Is it possible to provide such a simple VA design?
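A plain software reference of this recursive averaging may make the intended behavior unambiguous. It assumes (based on the sgm.JPG sketch) that A, B, C, D are the left neighbor and the three neighbors in the line above, and that border pixels keep their original grey value:

```python
# Recursive upper-left average: each output pixel is the mean of four
# ALREADY-PROCESSED neighbours (left, upper-left, upper, upper-right).
# Border pixels keep their original grey value. The neighbourhood
# layout is an assumption taken from the sgm.JPG sketch.
def recursive_mean(img):
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h):
        for x in range(1, w - 1):
            out[y][x] = (out[y][x - 1] + out[y - 1][x - 1]
                         + out[y - 1][x] + out[y - 1][x + 1]) / 4
    return out
```

Note that the loop body reads `out`, never `img`: that is the recursive dependency a FIRkernel-style operator cannot express.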

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

I thought about using the loop method to solve this problem, but there are two difficulties.

The first is that the granularity of the loop is a pixel, not an image: how do I solve the synchronization problem between the current pixel and the previous pixel?

The other is that when calculating P, I need not only pixel A but also pixels B, C, and D. Since the line buffer cannot provide a fixed delay, how can I locate pixels B, C, and D?

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

sgm.JPG

Let me explain in detail. Suppose I want to calculate an operation op for pixel P, but this calculation depends on the result of the same operation op for pixels A, B, C, and D. How can I deal with this problem?

Operators similar to FIRkernel can only deliver the result of the previous operation on pixels A, B, C, and D, not op.

• ## The calculation of the next pixel directly depends on the calculation result of the previous pixel

As mentioned in the subject, is it possible to provide a design example in which the calculation of a certain operation on the next pixel directly depends on the result of the same operation on the previous pixel?

This is quite different from, say, cascading two FIR filters or a DILATE followed by an ERODE, because the result obtained by operators such as FIRkernel is still the result of the previous stage. What I want is for the calculation result of one operation to be immediately available to the same operation on the next pixel.
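The distinction can be shown on a 1-D signal (a small illustrative sketch, not taken from any VisualApplets design): a cascaded FIR stage still reads previous-stage samples, while the recursive form reads its own freshly computed output:

```python
# Non-recursive (FIR-like): every output reads only INPUT samples,
# so even a second cascaded stage still sees previous-stage results.
def fir(x):
    return [x[0]] + [(x[n] + x[n - 1]) / 2 for n in range(1, len(x))]

# Recursive (IIR-like): each output reads the output just computed
# for the previous sample -- the dependency described in this thread.
def recursive(x):
    y = [x[0]]
    for n in range(1, len(x)):
        y.append((x[n] + y[n - 1]) / 2)
    return y
```

Cascading `fir` twice is still feed-forward and pipelines freely; `recursive` has a true loop-carried dependency, which is why it needs a feedback path in hardware.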

• ## How to serialize two image sequences using InsertImage operator?

Sorry, Björn, I have kept you waiting.

I am a bit confused about the results of our DMA test. Two logs have been uploaded; please take a look.

Also, as Johannes said, I tried to use an Action Command to start the second camera, but the settings in the firmware could not be edited.

• log.zip

• 捕获2.PNG

• ## Is there any possibility to use more than 4 operators that use off-chip RAM resources?

Is there any possibility to use more than 4 operators that use off-chip RAM resources? (At the physical FPGA level, not just the VisualApplets design level.)

We want to merge the images captured by the four cameras into one process for processing, which requires 4 buffers. But our subsequent processing needs additional buffers.

Taking the VQ4 frame grabber as an example, each off-chip RAM bank has 128 MB, and we do not actually need that much storage space. Is it possible to divide one physical DDR bank into more virtual banks, say 8, so that more RAM-based operators can be used?

• ## How to serialize two image sequences using InsertImage operator?

I watched your video and I fully understand what you mean.

But it just doesn't work on our machine; it can barely keep up even when both cameras run at 10 fps.

This is a bit weird. We will find out what is going on.

Thank you, Johannes.

• ## How to serialize two image sequences using InsertImage operator?

Hi, Carmen Z!

We don't have a license for the Parameters Library, but I am sure our address generation is correct. To increase the transmission bandwidth, the 4 parallel pixels of the camera input are merged into a kernel; we have checked this carefully.

We are testing the effect of the FrameBufferRandomRd operator on bandwidth. We tried using 8 kernels for the input images and a parallelism of 2 for address generation, and still could not reach the expected bandwidth. I wonder if you can address our concerns.

• ## How to serialize two image sequences using InsertImage operator?

We later ran experiments in microDisplayX. For sequence images generated by ImageSequence and then passed through InsertImage, timeout errors are still reported very quickly.

We are now building our design around the FrameBufferRandomRd operator. The design block diagram is shown in the screenshot below. We also verified the design in microDisplayX: although no timeout error is reported, the path after the InsertImage operator still does not reach the bandwidth we expect, and the two FrameBufferRandomRd operators quickly overflow (the two cameras being out of sync can be ruled out).

In addition, we cannot find the "StartAcquisition" parameter in GenICam, and the new version of microDisplayX seems unable to display 64-bit DmaToPC output.

• ## How to serialize two image sequences using InsertImage operator?

I need the entire image sequence; outputting only one image was just to make synthesis easier.

Thanks, I will try to use the FrameBufferRandomRd operator to build what I need instead.

By the way, the design I provided synthesizes successfully; it reports a timeout at runtime rather than timing errors during synthesis.

• ## How to serialize two image sequences using InsertImage operator?

As shown, I want to serialize the image sequence of the two cameras.

Why does the synthesized bitstream fail and always report a timeout?

• ## How to use PARALLELdn operator after RemoveImage gracefully to reduce parallelism without having to use frame buffer?

Thanks, Johannes.

I should consider the camera's peak transmission rate.

Thank you for your detailed guidance.

• ## How to use PARALLELdn operator after RemoveImage gracefully to reduce parallelism without having to use frame buffer?

Okay, thank you!

I still don't know how to calculate the size of the buffer. Could you illustrate with my example, where the clock frequency is 62.5 MHz?

In my understanding, after using the RemoveImage operator to reduce the parallelism, an ImageFifo close to the size of a full image frame is required anyway. Specifically, it has to hold about ImageSize × (1 − OutputParallelism / InputParallelism) pixels, which is an unaffordable resource overhead.
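A small simulation of the worst-case FIFO fill, under my own assumptions (the kept frame streams in back-to-back at 4 pixels per clock while the downstream drains at 1 pixel per clock, with no line gaps), suggests the FIFO must absorb roughly ImageSize × (1 − Out/In), i.e. three quarters of a frame in this example:

```python
# Simulate FIFO occupancy when one kept frame streams in at 4 px/clock
# while the downstream drains at 1 px/clock. Assumes back-to-back input
# with no line gaps (worst case); parameter names are illustrative.
def peak_fifo_fill(image_size, par_in=4, par_out=1):
    fill, peak = 0, 0
    clocks_in = image_size // par_in
    for _ in range(clocks_in):
        fill += par_in - par_out      # write 4, read 1 each clock
        peak = max(peak, fill)
    return peak

print(peak_fifo_fill(1024))   # prints 768, i.e. 1024 * (1 - 1/4)
```

After the input burst ends, the FIFO drains during the three removed frames, which is why RemoveImage followed by PARALLELdn can work at all if the buffer is large enough.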

• ## How to use PARALLELdn operator after RemoveImage gracefully to reduce parallelism without having to use frame buffer?

How to use PARALLELdn operator after RemoveImage gracefully to reduce parallelism without having to use frame buffer?

For example, the input parallelism is 4, and RemoveImage is used to delete three images from a continuous sequence of four, keeping only one. The remaining processing logic is actually sufficient at a parallelism of 1, so we use PARALLELdn to reduce the parallelism to 1 in order to save resources. But must we use a frame buffer for this? After all, even a single image cannot be transmitted in time once the parallelism is reduced to 1.

• ## VisualApplets 3.0 's bug for CastBitWidth Operator

There is no problem in VA 3.1.2! It is just a bug in that specific version (3.0).

• ## VisualApplets 3.0 's bug for CastBitWidth Operator

OK, thank you!