Compute the layer shifts
As explained in Compute the layer Q-formats, only the fully connected and convolutional layers allow the output Q-format to be defined independently of the input format.
Once you know the Q-formats of the input and output of a fully connected or convolutional layer, you can compute its bias shift and out shift. These shifts ensure that the result of the layer computation has the requested output Q-format.
If fi is the number of fractional bits of the input, fo of the output, fw of the weights and fb of the biases, then, because the product of an input and a weight has fi + fw fractional bits:
The bias shift is: (fi + fw) - fb
The out shift is: (fi + fw) - fo
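The two formulas above can be sketched in Python as follows. The function names layer_shifts and fc_neuron_q7 and the example Q-formats are assumptions chosen for illustration. They are not part of CMSIS-NN, and the neuron computation is a simplified model (no rounding) of how such shifts are typically applied, not the library's actual kernel.

```python
def layer_shifts(fi, fw, fb, fo):
    """Return (bias_shift, out_shift) from the numbers of fractional bits."""
    bias_shift = (fi + fw) - fb  # aligns the bias with the input * weight product
    out_shift = (fi + fw) - fo   # brings the accumulator to the output Q-format
    return bias_shift, out_shift


def fc_neuron_q7(inputs, weights, bias, bias_shift, out_shift):
    """Simplified single-neuron model (assumption: left shift on the bias,
    right shift on the accumulator, saturation to 8 bits, no rounding)."""
    acc = bias << bias_shift
    for x, w in zip(inputs, weights):
        acc += x * w
    out = acc >> out_shift
    return max(-128, min(127, out))


# Hypothetical formats: input with 5 fractional bits, weights and biases
# with 7, output with 4.
bias_shift, out_shift = layer_shifts(fi=5, fw=7, fb=7, fo=4)

# 1.0 (32 with fi=5) * 0.5 (64 with fw=7) + 0.5 (64 with fb=7)
result = fc_neuron_q7([32], [64], 64, bias_shift, out_shift)
```

With these example formats the shifts are 5 and 8, and result is 16, which represents 1.0 with 4 fractional bits. In this model a negative shift is not meaningful, so the chosen Q-formats should keep both values non-negative.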
This quantization approach does not reveal whether problems occur during the computations. For this reason, it is important to test the final behavior of the quantized network.
If you really want to know what is happening in the internal computations, such as possible saturations or sign inversions, the inference should be modified to keep track of the dynamic range of the internal computations.
This modification is easy to do in any language in which basic operations like addition and multiplication can be overloaded. However, you may need to write your own inference implementation and use the CMSIS-NN implementation with overloaded basic operations.
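As a minimal sketch of this idea, the Python classes below wrap integers so that every addition, multiplication and shift records the extreme values it produces. RangeTracker and Tracked are hypothetical names introduced here for illustration. They are not part of CMSIS-NN, and the 16-bit accumulator width is only an assumption used to provoke an overflow in the example.

```python
class RangeTracker:
    """Records the extreme values seen and counts results outside a signed range."""

    def __init__(self, bits):
        self.lo = -(1 << (bits - 1))
        self.hi = (1 << (bits - 1)) - 1
        self.min_seen = 0
        self.max_seen = 0
        self.overflows = 0

    def record(self, value):
        self.min_seen = min(self.min_seen, value)
        self.max_seen = max(self.max_seen, value)
        if not self.lo <= value <= self.hi:
            self.overflows += 1


class Tracked:
    """Integer wrapper whose overloaded operators report to a RangeTracker."""

    def __init__(self, value, tracker):
        self.value = int(value)
        self.tracker = tracker
        tracker.record(self.value)

    @staticmethod
    def _raw(other):
        return other.value if isinstance(other, Tracked) else int(other)

    def __add__(self, other):
        return Tracked(self.value + self._raw(other), self.tracker)

    __radd__ = __add__

    def __mul__(self, other):
        return Tracked(self.value * self._raw(other), self.tracker)

    __rmul__ = __mul__

    def __lshift__(self, n):
        return Tracked(self.value << n, self.tracker)

    def __rshift__(self, n):
        return Tracked(self.value >> n, self.tracker)


# Accumulate three q7 products while checking against a 16-bit range.
tracker = RangeTracker(bits=16)
acc = Tracked(0, tracker)
for x, w in zip([127, 127, 127], [127, 127, 127]):
    acc = acc + Tracked(x, tracker) * w
```

After this loop, tracker.max_seen is 48387 and tracker.overflows is 1, because the third accumulation exceeds the signed 16-bit range. Values are deliberately not wrapped or saturated here, so the recorded range shows how many bits the computation would really need.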