Posts by Johannes Trein

    Hello Jayasuriya,


    We completely revised the C# SDK interface in the current runtime versions. I recommend using the current Runtime 5.6, which includes the C# interface. See subdirectory \SDKWrapper\CSharpWrapper

    See the documentation at http://www.siliconsoftware.de/…per/doc/CSharpWrapper.pdf


    You can find the full setup at https://silicon.software/file-…-v5-6-win64-with-applets/


    Let me know if you prefer the old version instead.


    BR

    Johannes

    Hello Jesse


    Thank you for your post. You have gained a very good understanding of VisualApplets.

    The platform microEnable 5 marathon (mE5-VCX-QP) has a shared memory concept.

    So these RAMs need a data width of more than 64*1 (RAM1) + 8*8 (RAM2) + 8*9 (RAM3) = 200 bit.

    Then I increase the parallelism to 32 before RAM1 and RAM2, and set the ffc_factor parallelism to 4.

    So these RAMs will share 256 bit. Then all RAMs have enough bandwidth. Is that right?

    You are using a mE5-MA-VCX-QP. The total bandwidth of this platform is 12.8 GB/s. To use the full bandwidth you need to use the full 512 bit = 64 bytes of each DRAM.

    Now let's have a look at your configuration:

    - RAM1:

    Required: 1200 MPixel/s (max. CXP6x2 speed)

    Use parallelism = 32

    --> 1200 MP/s / 32 * 64 byte * 2 = 4.8 GByte/s used in RAM1 (*2 because of read and write)

    If you use parallelism = 64 instead, you will only use 2.4 GByte/s in this RAM.

    - RAM2:

    Same as RAM1: 4.8 GByte/s. Because you are using 8 bit per pixel here, you cannot use parallelism 64.

    - ffc_factor:

    Required: 1200 MPixel/s. Because of 16 bit per pixel --> 2400 MB/s


    Unfortunately, operator CoefficientBuffer is inefficient in a configuration with only one link. See the explanations in this post: CoefficientBuffer: Maximum memory size and bandwidth on marathon frame grabbers

    pasted-from-clipboard.png


    So you need to change it to a configuration with 8 output links:


    Therefore the total required RAM bandwidth is

    RAM1: 2400 MB/s (at parallelism 64)

    RAM2: 4800 MB/s

    ffc_factor: 2400 MB/s


    Total = 9600 MB/s, which is less than the theoretical maximum of 12800 MB/s. Therefore the memory bandwidth is sufficient.
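    If you want to redo this calculation for other configurations, here is a minimal sketch of the arithmetic in C. The helper name and structure are mine; the formula itself (pixel rate / parallelism * DRAM word size, times 2 for read and write) is just the calculation from above.

        #include <stdio.h>

        /* Bandwidth used by one memory operator on the shared 512 bit
         * (= 64 byte) DRAM interface of the mE5-MA-VCX-QP: every access
         * transfers a full DRAM word, so the used bandwidth is
         * pixel_rate / parallelism * word_bytes, doubled for buffers
         * that are both written and read. */
        static double ram_bandwidth_mb(double mpixel_per_s, int parallelism,
                                       int word_bytes, int read_and_write)
        {
            return mpixel_per_s / parallelism * word_bytes *
                   (read_and_write ? 2 : 1);
        }

        int main(void)
        {
            double ram1 = ram_bandwidth_mb(1200, 64, 64, 1); /* 2400 MB/s */
            double ram2 = ram_bandwidth_mb(1200, 32, 64, 1); /* 4800 MB/s */
            double ffc  = 1200 * 2;            /* 16 bit = 2 byte/pixel    */
            printf("total: %.0f MB/s (limit: 12800 MB/s)\n", ram1 + ram2 + ffc);
            return 0;
        }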


    What if I move this design to the mE5VQ8-CXP6B? The mE5VQ8-CXP6B does not have the shared memory concept.

    The RAMs have independent data widths. So I do not need to modify the parallelism to 32, and all RAMs have enough bandwidth?

    On the mE5VQ8-CXP6B you also have a bandwidth of 3.2 GB/s for each of the four individual DRAMs. The data width is only 128 bit.

    RAM1: Parallelism 16

    RAM2: Parallelism 16 -> you will need to use two ImageBuffer operators in parallel

    ffc_factor: 4 outputs at parallelism 2


    This design uses 96% of the LUTs on the mE5_MA_VCX-QP. I want to be able to extend the applet in the future.

    I tried to reduce the parallelism to 4 and change the board frequency to 250 MHz (mE5_MA_VCX-QP_Dual_250MHz.va), but the compilation fails (CmopileError.PNG).

    If I want to increase the board frequency, are there any design details that need attention?

    You can change the FPGA clock for marathon frame grabbers, but we cannot guarantee that you will meet the timing requirements of the FPGA during the build process. In practice it will always work at 125 MHz. Up to 160 MHz you have a good chance of meeting the timing. Anything above that will most likely not work correctly.

    The DRAM will not get faster when you change the FPGA clock. It will only affect the processing speed between the operators, i.e. less parallelism is required.
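    As a rule of thumb (my own sketch, not an official formula): the parallelism a link needs scales inversely with the design clock, because one parallel word is transported per clock cycle.

        #include <math.h>
        #include <stdio.h>

        /* Minimum parallelism needed so that a link can carry a given
         * pixel rate at a given design clock (one parallel word per
         * clock cycle). */
        static int min_parallelism(double mpixel_per_s, double clock_mhz)
        {
            return (int)ceil(mpixel_per_s / clock_mhz);
        }

        int main(void)
        {
            /* 1200 MPixel/s, e.g. full CXP6x2 speed: */
            printf("125 MHz -> parallelism >= %d\n", min_parallelism(1200, 125));
            printf("160 MHz -> parallelism >= %d\n", min_parallelism(1200, 160));
            return 0;   /* prints 10 and 8 */
        }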


    In your case you need to save some FPGA resources. You are using 96% of the LUTs but only a few of the embedded ALUs.

    Here are some tricks to reduce LUT usage and make use of the ALUs instead:

    1. Use ADD operators instead of a FIR operator, e.g. for the mean filter (see the sketch after this list):

    pasted-from-clipboard.png


    2. Use the same idea for the Gauss and Laplace filters.

    pasted-from-clipboard.png


    3. Replace SCALE by CONST and MULT operators. MULT will use ALUs, SCALE will use LUTs.


    4. Use DIV at low parallelism
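    To illustrate trick 1: with power-of-two normalization, a mean filter is just a few additions and a shift, which maps to the embedded ALUs instead of LUT-based FIR logic. A small host-side reference model of the idea (my sketch, not VA code):

        #include <stdint.h>

        /* Reference model of a 4-tap mean filter built from ADD operators:
         * sum four neighbours, then divide by 4 with a shift. A generic FIR
         * operator with arbitrary coefficients costs far more LUTs than this
         * add-and-shift structure. */
        static uint8_t mean4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
        {
            return (uint8_t)(((uint16_t)a + b + c + d) >> 2);
        }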


    I hope this information will help you with your project.


    BR

    Johannes

    Hi Mike,


    today I was able to test the applet on real frame grabber hardware. The "rotate90_simple.va" runs fast enough for your application.

    See the following screenshot.

    We get 1024 x 1224 * 47 fps = 58 MPixel/s, which is already the theoretical maximum at parallelism 1. As you only need 20 fps, the applet is much faster than your requirements.

    pasted-from-clipboard.png


    Applet "rotate90_fast.va" can be used for faster inputs like non-downsampling or Camera Link inputs. I needed to increase the parallelism after the buffer to use the fast speed. So the "fast" file above is incomplete. I will update this file for others having the same request.


    Johannes

    Hi Mike


    if your camera runs at 2448 x 2048 * 20 fps with the 12 bit packed format, the bandwidth will be 150 MByte/s, which is more than the theoretical maximum of GigE Vision.
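    For reference, the arithmetic behind that number (12 bit packed means 1.5 bytes per pixel; 1 Gbit/s corresponds to 125 MByte/s raw):

        #include <stdio.h>

        int main(void)
        {
            /* 12 bit packed: two pixels share three bytes -> 1.5 byte/pixel */
            double mb_per_s = 2448.0 * 2048 * 20 * 1.5 / 1e6; /* ~150.4 MB/s */
            printf("required: %.1f MB/s, GigE raw limit: 125 MB/s\n", mb_per_s);
            return 0;
        }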

    Anyway I made two designs which fulfill your requirements.

    You can fully simulate the design and test the maximum bandwidth in hardware using the built-in pattern generator. You will need a Silicon Software mE4-VQ4GE FPGA frame grabber together with VisualApplets for testing. The applet can be adapted to other frame grabbers but needs some modifications because of their shared DDR3 memory, compared to the DDR2 on the microEnable IV.


    Note: DRAM becomes slow for non-linear write or read access. That's why such a seemingly simple task as a 90° rotation is difficult for both FPGA frame grabbers and standard PC systems. The "fast" implementation uses all four DRAMs to increase the bandwidth.

    At the moment I have no access to frame grabber hardware, so I could not measure the resulting bandwidth. Once I have access to the hardware I will do the measurement.


    pasted-from-clipboard.png


    BR

    Johannes

    Hi Jayasuriya


    Yes, the applet you have provided is working well! Let's consider a scenario: if I generate 200 images and one of them has a pixel with value 1 at its 100th row, then in the simulation this is separated into two images of height 100 each, where the pixel with value 1 occurs only in the first image. I want to set that particular pixel value to 1 for the next 10 images, i.e. for lines 101 to 110 of the 2nd image in the simulation, but in the output I get the pixel value 1 at the 100th row only. Could you please give me a solution where this SetToSequence also works between sets of images?

    It will work correctly in hardware. In the simulation the 1D protocol is treated as 2D images. Therefore, if you want to simulate a sequence of 200 frames, you need to set SetToSequence_To1D_LinesToSimulate = 200 to get a single 1D image.


    I'd like to mention one more thing: if the number of past images is much larger than 10, or dynamic, you should consider a loop operation. Check the rolling average examples in the VisualApplets documentation for this.


    Johannes

    Hi Jayasuriya

    the attached VA design should solve your task.

    It generates a random pattern representing your input images. Next, all small images are appended into a single image of infinite height, i.e. a 1D image. Now we can check whether one of the pixels in the same column of the last 10 rows was 1 and output 1 in that case.
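    If you want to verify the simulation output on the host, here is a minimal reference model of that check (my sketch, not the VA implementation; the image width and whether the current row counts toward the window are assumptions):

        #define W   64   /* image width (example value)        */
        #define WIN 10   /* number of rows taken into account  */

        /* out[c] of the current row is 1 if any of the last WIN rows
         * (including the current one) had a 1 in column c. history[]
         * holds a per-column count of rows remaining in the window. */
        static void process_row(const unsigned char in[W], unsigned char out[W],
                                int history[W])
        {
            for (int c = 0; c < W; c++) {
                if (in[c])
                    history[c] = WIN;     /* refresh the window     */
                out[c] = history[c] > 0;
                if (history[c] > 0)
                    history[c]--;         /* window ages by one row */
            }
        }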


    To simulate the design you need to set the simulation cycles to 100, which is the same value as in SetToSequence_To1D_LinesToSimulate.


    pasted-from-clipboard.png

    pasted-from-clipboard.png


    BR

    Johannes

    Hi Mike,


    image rotation is a task that cannot be solved with a single FPGA algorithm. Factors are the bandwidth, the FPGA and board generation, as well as the exact specifications of the rotation.

    Do you need a fixed rotation by 90, 180 or 270°? Or a variable rotation by e.g. 15°? Should the rotation angle be dynamic or static?


    We have different examples included in the VisualApplets example list. See the examples in http://www.siliconsoftware.de/…ric%20Transformation.html


    These examples can be adapted to rotation-only use. Depending on the bandwidth requirements, they work well for small rotation angles.

    For 90° rotations the implementation will always depend on the image dimensions and bandwidth requirements.


    I hope my post will give you some ideas for your further work on this.


    BR

    Johannes

    Hi Simon

    welcome to the forum. Before searching for a solution based on TCL, maybe another feature will help you: did you notice the new setting for the location of simulation image files?

    pasted-from-clipboard.png


    It is available from VA 3.1.0

    Maybe that'll help you.


    Other than that, you need to build absolute paths for all simulation images when adding them to the simulation. Note that you can use regular expressions within TCL.


    Johannes

    Hi Theo,


    we have a pretty elegant solution that uses restart markers within the JPEG stream to split it across multiple operators. We can fit up to 6 JPEG encoders into a design, which results in a data rate of 6 * 300 MB/s = 1800 MB/s on a mE5-MA-VCX-QP.


    The design is quite tricky but I am sure you'll understand it. We'll send you a VA design and an SDK example. Please give us a few days to finish it.


    Johannes

    Hi Mike

    welcome to the forum and thank you for the question. Others might have this question, too. I don't have my own example right now, but let me share the example of a colleague. Maybe he can add some more information.


    Files

    • GrabToCV.zip

      (5.84 kB)

    Hi Theo

    welcome to the forum and thank you for the question. You are right: a process without a DMA channel is started immediately after loading the applet (Fg_Init function) and cannot be reset or restarted. For processes including one or more DMA channels you can stop and start the acquisition (Fg_StopAcquire(Ex) and Fg_StartAcquire(Ex)) or use the "Play" and "Stop" buttons in microDisplay. See the documentation section Processes without DMAs / Trigger Processes.


    I can only imagine two solutions:

    1st: Add a dummy DMA operator to the process. It does not need to transfer any data if you keep the timeout long enough. However, in this solution you "waste" FPGA resources on the extra DMA.

    2nd: You mentioned this solution already: add reset signals to the inputs of the signal operators. You can conveniently do this by adding a TxSignalLink operator before each of the reset inputs. However, this solution only works for signal operators and won't work for operators in the 0D, 1D or 2D domains.


    In addition, you can wait until the pipeline has run empty, or transfer signals between processes using TxSignalLink and RxSignalLink.
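    For the DMA-based processes, a stop/restart sequence in the C interface looks roughly like the sketch below. It uses Fg_stopAcquire and Fg_AcquireEx as declared in fgrab_prototyp.h; treat the exact signatures as an assumption and check them against your runtime version.

        #include "fgrab_prototyp.h"   /* Silicon Software runtime header */

        /* Sketch: restart the processing of a DMA-based process by
         * stopping and restarting the acquisition on its DMA channel. */
        int restart_process(Fg_Struct *fg, unsigned int dmaIndex, dma_mem *mem)
        {
            if (Fg_stopAcquire(fg, dmaIndex) < 0)
                return -1;
            /* ... reconfigure applet parameters here if needed ... */
            if (Fg_AcquireEx(fg, dmaIndex, GRAB_INFINITE, ACQ_STANDARD, mem) < 0)
                return -1;
            return 0;
        }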


    We have not had this request before, but I will create a feature request so that it can be analyzed by our product management.


    Johannes

    The attached VA design is a simple example of color plane separation for RGB+IR input with separated outputs. The example is made for CL Medium 4-tap 8-bit cameras sending the data interleaved, i.e. R + G + B + IR in the four taps. It works for the JAI Sweep+ series or other RGB-IR cameras on the Silicon Software mE5-MA-VCL FPGA frame grabber but can easily be adapted to others.

    The example uses the straightforward solution. There are many other options, for example an implementation using only a single DRAM operator. The best solution depends on the actual requirements.
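    As a plain-C reference for what "interleaved in the four taps" means here (my sketch; the tap order R, G, B, IR is an assumption and depends on the camera):

        /* De-interleave one CL Medium 4-tap line: the camera delivers
         * R, G, B, IR for pixel 0, then R, G, B, IR for pixel 1, ... */
        static void split_rgbi(const unsigned char *taps, int width,
                               unsigned char *r, unsigned char *g,
                               unsigned char *b, unsigned char *ir)
        {
            for (int x = 0; x < width; x++) {
                r[x]  = taps[4 * x + 0];
                g[x]  = taps[4 * x + 1];
                b[x]  = taps[4 * x + 2];
                ir[x] = taps[4 * x + 3];
            }
        }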


    In this example we stretch a histogram. The histogram is calculated for the input image. The lowest and highest populated points of the histogram are considered as black and white. The input image is shifted by the black offset and scaled by the white gain so that the histogram covers the range 0 to 255. To eliminate noise and single pixels from the calculation, a minimum-number-of-pixels threshold is used.
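    A host-side reference model of this algorithm (my sketch based on the description above; the function name, the clamping details and the exact threshold comparison are assumptions):

        #include <stddef.h>
        #include <stdint.h>

        /* Histogram stretching: find the lowest and highest grey values
         * whose histogram count exceeds min_count (the noise threshold),
         * treat them as black and white, then shift by the black offset
         * and scale by the white gain into the range 0..255. */
        static void stretch_histogram(uint8_t *img, size_t n, uint32_t min_count)
        {
            uint32_t hist[256] = {0};
            for (size_t i = 0; i < n; i++)
                hist[img[i]]++;

            int black = 0, white = 255;
            while (black < 255 && hist[black] <= min_count) black++;
            while (white > 0   && hist[white] <= min_count) white--;
            if (white <= black)
                return;                     /* degenerate histogram */

            double gain = 255.0 / (white - black);
            for (size_t i = 0; i < n; i++) {
                int v = (int)((img[i] - black) * gain + 0.5);
                img[i] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
            }
        }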


    H-Box MakeBadImage deliberately creates a picture with a bad histogram.


    pasted-from-clipboard.png

    Operator RemovePixel requires many resources at high parallelisms. This is because a pixel to be removed can be at any position within the parallel word, which leads to a complex implementation full of barrel shifters.


    However, in many cases the number of remaining pixels is very low. If you want to remove 90% of the pixels anyway, you can implement a two-stage solution: first, remove all parallel words in which no pixel is left; second, remove the remaining unwanted pixels.

    The output is exactly the same as with a single RemovePixel operator, but it requires far fewer resources. The only difference is that the two-stage solution will use a new output parallelism.
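    A host-side reference model of the two-stage idea (my sketch; the parallelism, pixel type and mask representation are assumptions):

        #include <stdint.h>

        #define PAR 16   /* parallelism: pixels per parallel word (example) */

        /* Stage 1 drops parallel words whose keep-mask is all zero (cheap);
         * stage 2 compacts the surviving words, so the expensive
         * barrel-shifter logic only runs on the few words that are left. */
        static int compact(const uint16_t *pix, const uint8_t *keep,
                           int n_words, uint16_t *out)
        {
            int n_out = 0;
            for (int w = 0; w < n_words; w++) {
                int any = 0;                  /* stage 1: word empty?   */
                for (int i = 0; i < PAR; i++)
                    any |= keep[w * PAR + i];
                if (!any)
                    continue;                 /* drop the whole word    */
                for (int i = 0; i < PAR; i++) /* stage 2: compact       */
                    if (keep[w * PAR + i])
                        out[n_out++] = pix[w * PAR + i];
            }
            return n_out;                     /* number of kept pixels  */
        }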


    The attached design shows a little example with simulation data.

    pasted-from-clipboard.png

    The Silicon Software AcquisitionApplets have a built-in trigger sequencer and queue.

    If trigger inputs arrive faster than the minimum allowed output period permits, pulses are queued and delayed. This distorts the timing but keeps the trigger pulses synchronized, i.e. no pulse is lost. The example is made for a Silicon Software mE5-MA-VCL frame grabber but can easily be adapted to any other FPGA frame grabber.


    The attached example has the following functions:

    - external trigger input or software trigger

    - trigger pulse multiplication 1:N

    - trigger queue

    - trigger output period limitation

    - exsync output


    pasted-from-clipboard.png

    pasted-from-clipboard.png


    The implementation was tested with a logic analyzer. The following screenshot shows an input period of 5 ms. The applet is configured for a multiplication of 2 pulses with a minimum output period of 2.5 ms.

    pasted-from-clipboard.png


    If we increase the input pulse rate, the output will be delayed and the queue will fill with pulses.

    pasted-from-clipboard.png


    Once the input pulses stop or the gaps between them become sufficiently large, the queue compensates for the delay. In the example, 5 input pulses with a period of 2.9 ms generate 10 output pulses with a period of 2.5 ms.

    pasted-from-clipboard.png
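    The queue behaviour is easy to model on the host. A minimal discrete-time sketch (mine, not the applet code) that reproduces the logic-analyzer numbers above:

        #include <stdio.h>

        #define MULT       2     /* pulses generated per input trigger */
        #define MIN_PERIOD 2.5   /* minimum output period in ms        */

        /* Every input pulse is multiplied by MULT; each output pulse is
         * released as soon as the minimum output period has elapsed
         * since the previous one, otherwise it waits in the queue. */
        int main(void)
        {
            double in[] = {0.0, 2.9, 5.8, 8.7, 11.6}; /* 5 pulses, 2.9 ms */
            double next_allowed = 0.0;

            for (int p = 0; p < 5; p++) {
                for (int m = 0; m < MULT; m++) {
                    double t = in[p] > next_allowed ? in[p] : next_allowed;
                    printf("output pulse at %5.1f ms\n", t);
                    next_allowed = t + MIN_PERIOD;
                }
            }
            return 0;   /* prints 10 pulses spaced 2.5 ms apart */
        }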


    NOTE: This example does not implement trigger scaling.

    If you want to scale an input trigger, e.g. an encoder signal, by a multiplication and division factor, you need to measure the input period, scale it and generate the pulses accordingly. That is not the purpose of this example.

    To get the width of each line as a pixel value, use the attached sample. It counts the length of each line and outputs the width as a single pixel.

    Extension: if you only need the width of a frame, remove all lines except the one used for the measurement.

    Debugging library operators: if you just need to read the width of an image via a parameter, you can use the operators of the Debugging library instead.

    pasted-from-clipboard.png


    See attached VA design file.
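    As a minimal host-side reference of what the sample outputs (my sketch; the VA design does this in hardware on the streaming data):

        #include <stdio.h>

        /* For every input line, emit its pixel count as a single
         * output pixel value. */
        static void emit_line_widths(const int *line_lengths, int n_lines)
        {
            for (int l = 0; l < n_lines; l++)
                printf("line %d -> width pixel = %d\n", l, line_lengths[l]);
        }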