After some time, and after finding your thread and a short sentence in the documentation stating that increasing parallelism will not increase bandwidth, I managed to get my applet working with a CoefficientBuffer. However, I did not want to use more than one link, as this would make things much more complicated for the user. Previously I used (BitWidth = 16, Par = 64) and cast to (BitWidth = 8, Par = 16). I optimized it by starting the other way round (BitWidth = 64, Par = 16; see attached image). So my main question is: why is the bit width not part of the table above? Would you still recommend using more than one link?
Also, why is it not possible to improve the operator's bandwidth by increasing the parallelism? I have to admit that I did not quite get the explanation in the documentation.
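To make the numbers easier to compare, here is a small sketch of the link width per clock for the configurations mentioned above. It assumes that the data carried per clock cycle is simply BitWidth × Par; that is my own assumption for illustration, not something taken from the documentation, and the function name `bits_per_clock` is made up.

```python
def bits_per_clock(bit_width: int, parallelism: int) -> int:
    """Bits carried on one link per clock cycle.

    Assumption (not from the documentation): the link width is simply
    bit width multiplied by parallelism.
    """
    return bit_width * parallelism


# The three (BitWidth, Par) settings mentioned in this post.
configs = {
    "previous source": (16, 64),
    "after cast": (8, 16),
    "new source": (64, 16),
}

for name, (bit_width, par) in configs.items():
    print(f"{name}: BitWidth={bit_width}, Par={par} -> "
          f"{bits_per_clock(bit_width, par)} bits per clock")
```

Under that assumption, the previous and the new source settings both come out at the same number of bits per clock, which is part of why I am wondering how the bit width fits into the table.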