Posts by Johannes Trein

    Hello InShin


    The VisualApplets examples cover designs for Tap Geometry Sorting. See https://docs.baslerweb.com/vis…20Geometry%20Sorting.html


    However, there is no example for 2X2E. Therefore, I made a quick update of the 2XE design to 2X2E. At the moment I do not have a RAW image to verify that it works correctly, but I think it is a good starting point for you to check whether it works for you. Use the simulation for verification: just record a RAW image with microDisplayX and load it into the simulation source.


    See attached.


    Your design looks good as well, but it reduces the parallelism down to 1 in between. This will limit both the mean and the peak bandwidth to e.g. 125 MPixel/s.


    Johannes

    Hello Pierre


    thanks for this interesting question and good explanations.


    Let me start with the bad news. You wrote:

    "It is a high-performance query counter and related to Microsoft's query performance counter for Windows®." Does that mean that the timestamp is not generated on the FG but in the Host PC?

    Indeed, the timestamp is not generated by the FG. It is a timestamp of the host PC and not reliable enough to make any assumptions about the trigger pulses. In fact, the times at which DMA image transfers and such events are received can get totally mixed up.


    So you need to generate a reliable timestamp inside the frame grabber and use this one. You mentioned several options already. I think the simplest one is a large counter, i.e. 64 bit, counting at every clock cycle (operator PulseCounter @125MHz). Latch the counter value either with the image transfer by the camera (PixelToImage) or with the GPI1 signal (RemovePixel for all counter values except when a trigger is present) and transfer the results either using a second DMA channel or as an image trailer.
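
    On the host side, such a latched counter value can be turned into a time as in the following minimal Python sketch. It assumes the value arrives as a plain 64 bit integer and that the counter runs at 125 MHz as mentioned above. The function names are hypothetical and not part of any SDK.

```python
CLOCK_HZ = 125_000_000  # assumed design clock of the counter (125 MHz)
COUNTER_BITS = 64       # assumed counter width


def ticks_to_seconds(ticks: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a number of counter ticks into seconds."""
    return ticks / clock_hz


def tick_delta(earlier: int, later: int, bits: int = COUNTER_BITS) -> int:
    """Ticks between two latched values, tolerant of a single wrap-around."""
    return (later - earlier) % (1 << bits)


# Two hypothetical latched values, 125000 ticks apart
t0 = 1_000_000
t1 = t0 + 125_000
print(ticks_to_seconds(tick_delta(t0, t1)))  # 0.001
```

    At 125 MHz a 64 bit counter needs thousands of years to wrap, so the modulo is only a safety net. It becomes relevant if a narrower counter is used.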


    I hope that helps. Feel free to ask further, more detailed questions.


    Johannes

    Hello Sangrae Kim


    Note that the images need to be 8 bit per pixel to fit the 64 bit of CoefficientBuffer. A 16 bit per pixel image cannot be uploaded directly.

    I added a simulation only H-Box to the design to let you know how to generate the images.

    pasted-from-clipboard.png


    Open to view:

    pasted-from-clipboard.png


    Load any image to the SimulationSource module.

    Set the pixel alignment if your image is not a 16 bit image.

    Start the simulation and save the results of the two probes to image files.

    Set these image files in CoefficientBuffer and start the simulation again. The CoefficientBuffer will then use the images. Check the simulation probe of the design to verify your results.


    I've attached two sample images which can be loaded directly to the CoefficientBuffers.

    Let me know your feedback.


    See attached ZIP


    Best regards

    Johannes

    Hello Sangrae Kim,


    You will need the following memory bandwidth:


    3104 x 3088 * 30fps = 288 MPixel/s.


    To buffer the camera image (assume 16Bit per pixel) = 288 MPixel/s * 2 byte * 2 for read and write = 1150 MB/s

    For CoefficientBuffer you need 288MPixel/s * 2 images * 2 byte = 1150 MB/s

    So in total your design requires a memory bandwidth of 2.3 GByte/s. The marathon VCL has a theoretical total of 6400 MB/s, so we are good here.
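
    The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the numbers from this post. The function name is made up for illustration and 1 MB is taken as 10^6 bytes.

```python
def mb_per_s(width, height, fps, bytes_per_pixel, accesses):
    """Memory bandwidth in MB/s (1 MB = 1e6 bytes) for a given pixel stream."""
    return width * height * fps * bytes_per_pixel * accesses / 1e6


WIDTH, HEIGHT, FPS = 3104, 3088, 30
THEORETICAL_MB_S = 6400  # theoretical total of the marathon VCL, as stated above

pixel_rate = WIDTH * HEIGHT * FPS / 1e6                   # MPixel/s
image_buffer = mb_per_s(WIDTH, HEIGHT, FPS, 2, 2)         # 16 bit, write + read
coefficient_buffer = mb_per_s(WIDTH, HEIGHT, FPS, 2, 2)   # 16 bit, two images
total = image_buffer + coefficient_buffer

print(round(pixel_rate), round(image_buffer), round(total))  # 288 1150 2300
print(total < THEORETICAL_MB_S)                              # True
```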


    Operator ImageBuffer needs to be used at parallelism 16 to get the full performance. Otherwise you will not use all memory cells.


    CoefficientBuffer is a little tricky. In this thread you can see how CoefficientBuffer needs to be parameterized to get the maximum performance. LINK

    From the table we can figure out that the operator will only run in full performance when using four output links in parallel at a parallelism of two and a bit width of 64.


    pasted-from-clipboard.png



    See attached file (untested).

    Let me know if you have further questions.


    Johannes

    In addition to the great posts above for the use of fixed point arithmetic:


    Notation: u(5,3) means unsigned, 5 integer and 3 fractional bits.


    There is a simple rule for fixed point arithmetic.

    For additions or subtractions you need to use the same number of fractional bits at the input. The result will have the same number of fractional bits. Integer bits will increase by one.

    So u(5,3) + u(9,3) will become u(10,3)

    e.g. 1.5 + 2.25 = 3.75 --> 12 + 18 = 30
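
    The addition rule can be checked with a small Python sketch. The helper names are made up for illustration. The raw integers are simply the values scaled by 2^3.

```python
def to_fixed(value: float, frac_bits: int) -> int:
    """Scale a real value to its raw fixed point integer."""
    return round(value * (1 << frac_bits))


def to_float(raw: int, frac_bits: int) -> float:
    """Convert a raw fixed point integer back to a real value."""
    return raw / (1 << frac_bits)


FRAC = 3                      # both inputs use 3 fractional bits
a = to_fixed(1.5, FRAC)       # 12
b = to_fixed(2.25, FRAC)      # 18
s = a + b                     # 30, still 3 fractional bits
print(a, b, s, to_float(s, FRAC))  # 12 18 30 3.75

# Worst case: largest u(5,3) plus largest u(9,3) must fit into u(10,3)
max_sum = ((1 << (5 + 3)) - 1) + ((1 << (9 + 3)) - 1)
print(max_sum < (1 << (10 + 3)))   # True
```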


    For multiplications and divisions the number of fractional bits at the inputs can differ. For multiplications the resulting fractional bits will be the sum of the input fractional bits. The resulting integer bits will be the sum of the input integer bits.

    So u(5,3) * u(2,8) will become u(7,11)

    e.g. 1.5 * 2.25 = 3.375 --> 12 * 576 = 6912
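
    The multiplication rule can be verified the same way. This is a minimal Python sketch. The function name is made up for illustration.

```python
def fixed_mul(raw_a: int, frac_a: int, raw_b: int, frac_b: int):
    """Multiply two raw fixed point values; fractional bits add up."""
    return raw_a * raw_b, frac_a + frac_b


raw_a = round(1.5 * 2**3)    # 12  in u(5,3)
raw_b = round(2.25 * 2**8)   # 576 in u(2,8)
product, frac = fixed_mul(raw_a, 3, raw_b, 8)
print(product, frac, product / 2**frac)  # 6912 11 3.375

# Worst case: largest u(5,3) times largest u(2,8) must fit into u(7,11)
max_product = ((1 << 8) - 1) * ((1 << 10) - 1)
print(max_product < (1 << (7 + 11)))     # True
```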


    Johannes

    Hi Theo


    see some examples of the alternative implementation to ImageSequence as Björn mentioned with FrameBufferRandomRd in the HDR examples: https://docs.baslerweb.com/liv…gh%20Dynamic%20Range.html


    Copying and pasting the old ImageSequence operator into a CXP project is allowed, as the operator was available in earlier versions for CXP. But it is very likely that you will get a timing error. Moreover, the operator can only be used with a parallelism of four, which makes it very inefficient with the shared memory concept.


    Johannes

    Hello Sangrae Kim


    you will need to do the De-Bayer before the distortion correction. So your bandwidth will get 3 times higher.

    An implementation on VQ4-GE is possible but the bandwidth might be limited. So it totally depends on the bandwidth requirements and the distortion factor, i.e. whether the displacement is several pixels or just between 0 and 2 pixels.


    Johannes

    Hi Jesse


    I guess you get an overflow in your design.

    CXP6x2 can have at maximum 1200 MB/s. If you reduce the parallelism to 8 you can process 1000 MP/s at maximum. Even if there are gaps between the images you cannot process this burst speed.
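
    The numbers can be compared with a short Python sketch. Two assumptions are not stated in this post: a design clock of 125 MHz (which matches 8 pixels per clock giving 1000 MP/s) and 8 bit per pixel, so that 1 MB/s equals 1 MPixel/s. The function name is made up for illustration.

```python
CLOCK_MHZ = 125  # assumed design clock in MHz


def max_pixel_rate(parallelism: int, clock_mhz: int = CLOCK_MHZ) -> int:
    """Upper limit of a link in MPixel/s: pixels per clock times clock rate."""
    return parallelism * clock_mhz


camera_burst = 1200            # MB/s on CXP6x2, assumed 1 byte per pixel
link_limit = max_pixel_rate(8)
print(link_limit)                  # 1000
print(camera_burst > link_limit)   # True: the burst does not fit through the link
```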


    So I changed the order of your operators. First do the calculation and remove all unnecessary lines. Next place a FIFO, which now has to store a single line instead of a whole frame. After that you can reduce the parallelism if the mean camera bandwidth allows that.


    See attached.


    Johannes

    Hi


    what you are requesting is an IIR filter instead of an FIR filter. The problem here is that the current step has to be fully calculated before the next pixel can be processed. Any parallelism in a pipeline stage is therefore impossible, which makes this operation very slow.
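
    The difference can be illustrated in software with a minimal Python sketch. The first-order filter and its coefficient are just an example and not the filter from your request.

```python
def fir_average(x, taps=3):
    """FIR: every output depends on input samples only, so outputs are independent."""
    return [sum(x[max(0, n - taps + 1): n + 1]) / taps for n in range(len(x))]


def iir_first_order(x, alpha=0.5):
    """IIR: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].

    Each output needs the previous output, so the samples
    must be processed strictly one after another.
    """
    y = []
    prev = 0.0
    for sample in x:
        prev = alpha * sample + (1 - alpha) * prev
        y.append(prev)
    return y


signal = [0, 0, 8, 8, 8, 8]
print(iir_first_order(signal))  # [0.0, 0.0, 4.0, 6.0, 7.0, 7.5]
```

    In the FIR case all outputs could be computed at the same time. In the IIR case the loop carries `prev` from one sample to the next, which is the dependency that prevents parallelism.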


    A direct approach in VisualApplets does not exist but you can use loops for your requirement. Instead of the existing loop examples where lines or frames are processed you will need to process a single pixel inside the loop.


    Before going into detail you should be aware of the bandwidth limitation. Only a fraction of the FPGA clock rate will be possible as pixel rate.


    Johannes

    Hi Pier


    I added a Dummy FIFO with InfiniteSource = Enable to the design. See attached.


    Using SplitLine will create "End Of Line" markers which need to be inserted into the stream. As cameras cannot be stopped, DRC will throw an error. However, our AppendLine operator will remove exactly these markers and generate a gap. So we need a dummy FIFO to trick the DRC.


    Johannes

    Hi Pier,


    there is a very simple solution to do this. Use TrgBoxLine and control the sequence length instead of the image height. To do so you need to append all camera lines into a single very long line and convert to 1D. Now TrgBoxLine can select the desired lines, i.e. frames. Convert back to 2D with the original image width.


    See attached.


    There are much smoother options, but this is the simplest and easiest to use.


    Johannes

    Hi Pier


    as Simon wrote, the bit position depends on the data packing. But assuming a byte-by-byte order, you simply need to acquire using an 8 bit mono format and use CastParallel so that you get a 46 bit link.

    I would suggest using an AcquisitionApplets 8 bit Gray applet and setting the ROI width and height so that you can see the triangulation data. Check the first eight bytes to get an idea of where your data is located.


    Johannes

    While I was thinking about this again I figured out that my assumption is wrong.

    This will most likely generate blocking signals at the SYNC input and cause ImageBuffer overflows.

    The blocking signals will be generated at the second input. So PulseCounter is stopped while being unstoppable. So the implementation might cause a wrong synchronization but cannot explain the errors you see.


    Anyway, please comment on my first two statements:

    ... so you are saying the SignalGate operators switch between whole frames and not lines. So the error is very unlikely at this position. Even so, you have to be very careful that you get an even distribution of even and odd images for the buffers.


    Could you check the fill levels of all ImageBuffer operators during acquisition? Is the fill level of any operator > 50%?

    Johannes

    Hello Arjun


    so you are saying the SignalGate operators switch between whole frames and not lines. So the error is very unlikely at this position. Even so, you have to be very careful that you get an even distribution of even and odd images for the buffers.


    Could you check the fill levels of all ImageBuffer operators during acquisition? Is the fill level of any operator > 50%?


    I think there is an error in module DocID. PulseCounter will generate a pixel with every clock cycle. The SYNC operator will perform a 0D to 2D synchronization, so one pixel is required at both inputs. This will most likely generate blocking signals at the SYNC input and cause ImageBuffer overflows. However, you are saying it works sometimes, whereas I would assume it cannot work at all. So my assumption might be incomplete.

    pasted-from-clipboard.png


    I changed the design. See attached.


    Let us know your results.


    Johannes