Posts by SWe

SWe · Oct 20th 2020

Thank you for mentioning that algorithm. I have heard of it several times.

I have a big design right now, too. VA is stable in my case; please make sure that you use the newest version, which is 3.2.1 right now. I have noticed that modifying links takes more time in big designs.

A workaround I use is to develop single parts of the big design in smaller designs, which speeds up development.

Aside from that: I wish you success with the implementation!

SWe · Oct 19th 2020

Dear,

the only thing that came to my mind was to decompose the single lines into a 0D stream, which would dramatically improve the inner loop latency (and therefore the bandwidth). Right now every pixel doubles every operator's latency, because the subsequent EoL and EoF flags are present.

The only problem, which I couldn't solve so far, is that I would need a multiplexer like InsertImage at 0D level.

Maybe you or someone else has an idea for that problem?

Best regards,

Simon

SWe · Oct 16th 2020

Dear,

I've built a basic example which applies that filtering. It is based on a double loop approach. The outer loop is for the feedback of the old line (B, C, D); the inner loop is pixel-based and feeds back the last value (A).

On hardware (ME5-VCL, 125 MHz) I only achieved 3.2 fps with a 1024x1024 px image. But maybe it's fast enough for your application.
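
As a rough cross-check of that figure, the frame rate can be converted into clock cycles per pixel. This is only a back-of-the-envelope sketch: it assumes one pixel is processed at a time and ignores blanking and DRAM overhead; the function name is made up for illustration.

```python
def cycles_per_pixel(clock_hz, fps, width, height):
    """Average number of clock cycles available per pixel (overhead ignored)."""
    return clock_hz / (fps * width * height)


# Values from the measurement above: 125 MHz, 3.2 fps, 1024x1024 px.
cpp = cycles_per_pixel(125e6, 3.2, 1024, 1024)
print(f"{cpp:.2f} clock cycles per pixel")
```

So the pixel feedback loop costs on the order of 37 clock cycles per pixel in this design, which is where the low frame rate comes from.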

How to test on hardware:

Build, Flash, Load in MicroDisplay
Set the timeout of your acquisition to a really high value
Search for Source/SimDimension and Source/P32ROI and set the test image dimensions
PixelsToLineImage and LineImageToOneLine have to be set to the image width, Output/AppendTo2D has to be your image height
Start continuous grabbing
Search for Source/Inject, Insert the file name of your test image to the parameter "ImageFile", Change "InjectFromFile" to yes
--> Now you have one test image in DRAM
Search for Source/Cam0_Loop1_Inject2, Set parameter "SelectSource" to 1
Search for Source/RUN, Set parameter "Mode" to High for continuous operation, or to Pulse for a single shot (Toggle this for more single shots).
Look at the output

I hope this helps.

Best regards,

Simon

SWe · Oct 16th 2020

Dear Lucy,

please have a look at the modified version.

CarmenZ and I modified it to achieve more bandwidth.

Most of the magic is done in "Par8_ROI_Extract". There are 2 LUTs with the pixel coordinates to extract from the image. The box "FrameBufferRandomRd_Par8" is a special approach to get more bandwidth while still being able to do random reads in DRAM. You can find this in the examples folder of VA: "Examples/Processing/Geometry/GeometricTransformation/GeometricTransformation_PixelReplicator.va". It's just a little bit stripped down. It is also really expensive; I hope the rest of your design still fits on the FPGA. If not, please tell me.

Some changes are in "subtract_bkgnd/ImageSequence", where the three ImageFIFOs and the InsertImage are replaced by a FrameMemoryRandomRd with an address generator which repeats the input image three times.
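
To illustrate the idea of the address generator, here is a minimal software sketch. It is not the VA implementation; it only shows the read address sequence, assuming the frame is stored linearly and read back three times in a row. The function name and the tiny 4x2 frame are made up for the example.

```python
def repeat_frame_addresses(width, height, repeats=3):
    """Yield linear read addresses that replay one stored frame `repeats` times."""
    frame_size = width * height
    for _ in range(repeats):
        # Each pass walks the whole frame from address 0 again.
        for address in range(frame_size):
            yield address


# Hypothetical 4x2 frame, read back three times.
addresses = list(repeat_frame_addresses(width=4, height=2))
```

Because the same memory content is simply read again, no additional FIFOs are needed to hold the copies.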

I hope this helps.

Best regards,

Simon

SWe · Oct 15th 2020

Dear Lucy,

I'll try to optimize it for you.

May I know where the deadlock exactly was? I'm curious about that.

Best regards,

Simon

SWe · Oct 14th 2020

Dear,

you are appending 4x4 frames, which results in an image height of 16. The simulation then stops:

Quote

(BRANCH) Process0\subtract_bkgnd\ImageSequence\CopySequence\I: Image height(16) exceeds the maximal link height(4)!

At runtime, the behaviour of exceeded link dimensions is undefined! So maybe that's your main problem.

Aside from this I don't see an obvious deadlock problem. All of your FIFOs are big enough.

I see a bandwidth problem, as you are using a DRAM-operator at parallelism 1.

What is the frame rate of your camera?

On ME5 platforms the maximum possible parallelism should be used, as stated in the manual (Appendix. Device Resources/Shared Memory Concept):

Quote

Due to the shared bandwidth architecture, the applet developer should utilize all 256 bits of the operator’s memory interface (RAM Data Width) to achieve maximal throughput through the memory interface when using multiple RAM based operators even though the single RAM operator needs less bandwidth on its input.

Best regards,

Simon

SWe · Oct 13th 2020

Hi,

I've got to admit that I made a mistake in my reasoning. Doing calculations on the previous line operations (B, C, D) is no problem.

But I haven't found a good way of calculating A yet.

I'll keep thinking about it, but this could take some time, and there is no guarantee of a solution.

Greetings

SWe · Oct 13th 2020

So you want to implement something that provides P_{0,0} = mean(P_{0,-1}, P_{-1,-1}, P_{-1,0}, P_{-1,1}), where P_{y-relative,x-relative} is the "recursive" mean at its position?
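
To make sure we mean the same thing, here is a small software reference model of that recursion. It is only a sketch under my own assumptions, which are not part of your description: the first line is copied from the input, the left neighbour is seeded with the input pixel at the start of each line, and missing neighbours at the left and right border are replaced by the pixel directly above. The mapping of B, C, D to the three pixels of the previous line is also assumed.

```python
def recursive_mean(image):
    """Reference model of P[y][x] = mean(A, B, C, D) on a list-of-lists image.

    A = P[y][x-1]                                   (previous pixel)
    B = P[y-1][x-1], C = P[y-1][x], D = P[y-1][x+1] (previous line)
    Assumptions: line 0 is copied from the input, A is seeded with the
    input pixel at x == 0, and B/D fall back to C at the image borders.
    """
    height, width = len(image), len(image[0])
    out = [[float(v) for v in row] for row in image]
    for y in range(1, height):
        prev = out[y - 1]
        for x in range(width):
            a = out[y][x - 1] if x > 0 else float(image[y][x])
            b = prev[x - 1] if x > 0 else prev[x]
            c = prev[x]
            d = prev[x + 1] if x < width - 1 else prev[x]
            out[y][x] = (a + b + c + d) / 4.0
    return out


result = recursive_mean([[0, 4], [8, 0]])
```

The dependency on `out[y][x - 1]` is the critical part: each pixel needs the finished result of its left neighbour, so the inner loop cannot simply be pipelined.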

If you want to, I can provide you a basic design.

From a system theoretic point of view this should converge to something like a (masked) Gaussian mask. Maybe you can evaluate this and create a single filter mask?

Greetings

Simon

SWe · Oct 13th 2020

Hi,

it is possible to use a loop for single lines for B, C and D, as Johannes mentioned. But your image dimensions have to be constant. For A I would use a Pixel-Neighbours-Operator to do the calculation once again. Just pass a Kernel with all used arguments to that operator.

As far as I can see, parallelism > 1 could work, depending on the operations you have to do.

Greetings,

Simon

Edit: A kernel could be difficult, but Pixel-Neighbours is an O-Type, so you can use several of them in parallel.

SWe · Oct 8th 2020

Dear Pier,

I would suggest using the "SignalGate" operator.

For the "Gate" input I would use an "RS-FlipFlop", which is set/reset by your GPIOs.
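
The intended behaviour can be sketched in software as follows. This is a plain behavioural model, not the VA operators themselves; in particular, I assume here that reset wins when set and reset are active at the same time, which you should check against the operator documentation.

```python
class RSFlipFlop:
    """Behavioural model of a set/reset latch (assumption: reset dominates)."""

    def __init__(self):
        self.q = False

    def step(self, set_, reset):
        if reset:
            self.q = False
        elif set_:
            self.q = True
        return self.q  # unchanged if neither input is active


def signal_gate(signal, gate):
    """Pass the signal only while the gate is open."""
    return bool(signal and gate)


# Example: GPIO "set" pulses at step 1, GPIO "reset" pulses at step 4.
flip_flop = RSFlipFlop()
trigger = [1, 1, 1, 1, 1, 1]
set_in = [0, 1, 0, 0, 0, 0]
reset_in = [0, 0, 0, 0, 1, 0]
gated = [signal_gate(t, flip_flop.step(s, r))
         for t, s, r in zip(trigger, set_in, reset_in)]
```

In this model the trigger only passes between the set pulse and the reset pulse.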

Best regards

Simon

Edit: For a defined number of Frames, use a "PulseCounter" and count it via "FrameStartToSignal"

SWe · Sep 23rd 2020

Hello Community,

similar to my big Sqrt problem (Thread), I encountered the need for wide multiplications in one of my designs.

In this post I want to show you the solution I worked out for that problem.

My solution is to split the multiplication into two multiplications and add them after the multiplications:

BigMult.PNG

For signed values, I take the absolute value and do a manual complement afterwards:

BigMult2.PNG

In this example only the lower link is signed. If both links are signed, do the absolute value and the negative check for the upper link, too. Feed "neg A xor neg B" into the lower port of "Complement".
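
For readers who cannot open the design, the arithmetic behind it can be sketched in software. This is a minimal sketch, not the VA design itself: it assumes that one operand is split into an upper and a lower part, and the function names and the 16-bit split point are chosen only for illustration.

```python
def split_multiply(a, b, low_bits=16):
    """Unsigned a * b built from two narrower multiplications and one addition."""
    mask = (1 << low_bits) - 1
    a_low = a & mask           # lower part of operand A
    a_high = a >> low_bits     # upper part of operand A
    # a * b = (a_high * b) * 2**low_bits + a_low * b
    return ((a_high * b) << low_bits) + a_low * b


def signed_split_multiply(a, b, low_bits=16):
    """Signed variant: multiply the magnitudes, then restore the sign."""
    negative = (a < 0) != (b < 0)   # "neg A xor neg B"
    magnitude = split_multiply(abs(a), abs(b), low_bits)
    return -magnitude if negative else magnitude


unsigned_result = split_multiply(0xFFFFFFFFFF, 0x12345678)
signed_result = signed_split_multiply(-123456789012, 987654)
```

Each partial product is narrower than the full product, which is the point of the split when the multiplier width is limited.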

I added a test design for you. Other options are possible: you may use the same approach as in the Sqrt example to work with pseudo floating point data.

Best regards

Simon

SWe · Sep 23rd 2020

Hello,

I did some optimization:

As the sqrt operation only allows shift increments of two, the adaptive shift operations can be done with 16 cases instead of 32
The "highest bit set" logic can be done via combinational logic

This leads to the following improvements:

Resource	Old	New
LUT	1218	789
Flip Flop	218	105

Best regards

Simon

SWe · Sep 16th 2020

Dear Pier,

thank you for the data.

I built an alternative approach for you. It's a little bit cheaper and less error-prone.

I cross-checked it against the simulation values of your design.

Best regards

Simon

SWe · Sep 16th 2020

Hello Community,

in my recent work I ran into a problem: Sqrt for values wider than 32 bits.

Just shifting right by a hard-coded number of bits is not a good idea, since small values may occur, too.

I want to share the solution I came up with.

What the design does:

Determine the position of the most significant set bit
Shift dynamically to the right -> "Anti fractional Bits", they have to be an even number, as the sqrt operation divides them by two
Perform the sqrt operation on the shifted value
Shift left by the "Anti fractional Bits" divided by two
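
The four steps above can be sketched in software like this. It is only an illustration of the arithmetic, not the VA design: `math.isqrt` stands in for the 32-bit sqrt operator, and the function name and the `sqrt_bits` parameter are made up for the example.

```python
import math


def wide_sqrt(value, sqrt_bits=32):
    """Approximate integer sqrt of a wide value using a narrow sqrt unit."""
    if value < 0:
        raise ValueError("value must be non-negative")
    # 1. Position of the most significant set bit (as a bit count).
    msb = value.bit_length()
    # 2. Dynamic right shift so the value fits into sqrt_bits,
    #    rounded up to an even number of bits.
    shift = max(0, msb - sqrt_bits)
    shift += shift & 1
    # 3. Narrow sqrt on the shifted value.
    root = math.isqrt(value >> shift)
    # 4. Shift left by half the number of removed bits.
    return root << (shift // 2)


exact_case = wide_sqrt(1 << 40)
wide_value = (1 << 50) + 12345
approx_case = wide_sqrt(wide_value)
```

Small values are not shifted at all and keep the full precision of the narrow sqrt; for wide values, the result loses at most the lowest `shift // 2` bits.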

Pro:

High precision

Con:

Extremely Expensive

It's just like the poor man's floating point

Please feel free to comment - maybe there is some ability to improve this.

Best regards

Simon

SWe · Sep 11th 2020

Dear Pier,

could you please post an image with example raw data of that camera's output?
I've read the document, but couldn't find a specification of how the data is packed (and transferred).

Do you know what pixel data format is used?

What ROI dimensions do you use for the triangulation?

Best regards

Simon

SWe · Sep 1st 2020

Thank you for your reply.

I can't find anything obvious that might cause your problems.

Have you double-checked with the standard acquisition applet?

Maybe there is a problem with your CL cabling, or your encoder debounce is too short.

Do you have simulation images?

Greetings,

Simon

SWe · Sep 1st 2020

Hi Arjun,

do you have a specific reason to use two SignalGate operators?

Do you use that for your tap sorting?

I wouldn't recommend that, because for the signal based paths there is no internal synchronisation.

If I'm right with tap sorting: what camera do you use?

Greetings,

Simon

SWe · Aug 27th 2020

Thank you!

In my opinion, you should do some preprocessing (maybe a closing) to make sure you pass only one (and the right!) blob to the subtraction loop. You could also select the right blob based on other parameters, such as area or bounding box.

Besides that, the design should work.

Greetings

Simon

SWe · Aug 27th 2020

Hey Mich,

I made some changes to your design. It should work now. You had created a deadlock with "InsertImage". To fix that, "CreateBlankImage" is now used to create an independent input for the dummy values. The loop now calculates the X and Y differences in parallel, as I moved it to a kernel.

Do you use blob analysis more often in your applications? I have a question online here, too; maybe you could help me with that? Link: BLOB-Features

If you have further questions, don't hesitate to ask.

Thanks and Greetings

Simon

SWe · Aug 27th 2020

Hey Mich,

I will have a look at it. Theo's idea seems really good in that case.

Do you have a simulation image for me?

Greetings

Simon

The calculation of the next pixel directly depends on the calculation result of the previous pixel

The calculation of the next pixel directly depends on the calculation result of the previous pixel

The calculation of the next pixel directly depends on the calculation result of the previous pixel

Deadlock somewhere

Deadlock somewhere

Deadlock somewhere

The calculation of the next pixel directly depends on the calculation result of the previous pixel

The calculation of the next pixel directly depends on the calculation result of the previous pixel

The calculation of the next pixel directly depends on the calculation result of the previous pixel

GigE area scan camera and image sequence

Multiplication for Inputs bigger than 32Bit "BigMult"

Sqrt operation for bit widths bigger than 32

Using a Teledyne Dalsa Genie Nano with Triangulation Firmware

Sqrt operation for bit widths bigger than 32

Using a Teledyne Dalsa Genie Nano with Triangulation Firmware

Image Acquisition from 8K Monochrome line scan camera - 8 tap mode - Scrambled image from Marathon VCL

Image Acquisition from 8K Monochrome line scan camera - 8 tap mode - Scrambled image from Marathon VCL

Storing blob data from previous frame

Storing blob data from previous frame

Storing blob data from previous frame