The RemovePixel operator requires many resources at high parallelism. This is because a pixel to be removed can be at any position within the parallel word, which leads to a complex implementation full of barrel shifters.
However, in many cases the number of remaining pixels is very low. If you want to remove, say, 90% of the pixels anyway, you can implement a two-stage solution. First, remove every parallel word in which no pixel is left. Second, remove the unwanted pixels that remain.
The output is exactly the same as with a single RemovePixel operator, but the design requires far fewer resources. The only difference is that the two-stage solution uses a new output parallelism.
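The equivalence of the two approaches can be checked with a small software model. The sketch below is not VisualApplets code and does not model hardware resources. It only assumes that each pixel carries a keep flag and that pixels are grouped into parallel words. All function names and the example data (32 pixels, parallelism 8, three pixels kept) are made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, bool]   # (value, keep_flag); hypothetical representation
Word = List[Pixel]         # one parallel word


def to_words(pixels: List[Pixel], parallelism: int) -> List[Word]:
    """Group a pixel stream into parallel words of the given width."""
    return [pixels[i:i + parallelism] for i in range(0, len(pixels), parallelism)]


def remove_pixels(words: List[Word]) -> List[int]:
    """Single-stage model: drop every pixel whose keep flag is not set."""
    return [value for word in words for value, keep in word if keep]


def drop_empty_words(words: List[Word]) -> List[Word]:
    """Stage 1: discard whole parallel words that contain no pixel to keep."""
    return [word for word in words if any(keep for _, keep in word)]


def two_stage(words: List[Word]) -> List[int]:
    """Stage 2 removes the remaining unwanted pixels from the surviving words."""
    return remove_pixels(drop_empty_words(words))


# Illustrative data: 32 pixels, only indices 3, 4 and 26 are kept.
keep_indices = {3, 4, 26}
pixels = [(i, i in keep_indices) for i in range(32)]
words = to_words(pixels, parallelism=8)

single_result = remove_pixels(words)
two_stage_result = two_stage(words)
surviving_words = len(drop_empty_words(words))

print(single_result == two_stage_result)   # both stages yield the same pixels
print(surviving_words, "of", len(words), "words reach the second stage")
```

In this model only the words that still contain a wanted pixel reach the per-pixel removal step, which is the reason the second stage can get by with far less work than a single removal step on the full stream.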
The attached design shows a small example with simulation data.