TFLite Quantization

lite/toco/graph_transformations/quantize.cc

call graph

GetArray -> check for minmax
ChooseQuantizationForOperatorInput
ChooseQuantizationForOperatorOutput

how to determine minmax

Some min/max values can be imported from the TensorFlow model (e.g. from FakeQuant nodes).

But they can also be calculated from a calibration dataset.

input

ChooseQuantizationForOperatorInput
  • bias is handled specially: it is quantized to int32 with zero point 0 and scale = input_scale * weight_scale
  • min/max is determined differently under quantization-aware training, i.e. it is not the numerical min/max of the data
    • GetOrComputeMinMax

    • todo: what is the actual method
  • ChooseQuantizationParams

    • scale = (rmax - rmin) / (qmax_double - qmin_double)

      r denotes the real (float) min/max; q denotes the min/max of the quantized type.

    • zero_point_from_min = qmin_double - rmin / scale
      zero_point_from_max = qmax_double - rmax / scale
      zero_point_from_min_error = std::abs(qmin_double) + std::abs(rmin / scale)
      zero_point_from_max_error = std::abs(qmax_double) + std::abs(rmax / scale)

      The actual zero point is chosen from the side with the smaller error. Per the comments in the TensorFlow source, this is a floating-point accuracy concern: the rounding error of each estimate is roughly proportional to the sum of the absolute values of its terms, so the estimate built from the smaller terms is preferred.

    • the zero point is then nudged (rounded and clamped) into the quantized range, so that it is a valid integer quantized value and the real value 0.0 is exactly representable (sketched below)
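
To make the above concrete, here is a minimal C++ sketch of the scale/zero-point selection. The struct and function names are mine, not the actual toco signatures; the formulas follow the notes above, and widening the range to include 0 is my assumption based on the usual requirement that real 0.0 be representable.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative names, not the real toco API.
struct QuantizationParams {
  double scale;
  int32_t zero_point;
};

// qmin/qmax: limits of the quantized type (e.g. 0..255 for uint8).
// rmin/rmax: observed float min/max.
QuantizationParams ChooseQuantizationParamsSketch(double rmin, double rmax,
                                                  int32_t qmin, int32_t qmax) {
  // Assumption: the range is widened to contain the real value 0.
  rmin = std::min(rmin, 0.0);
  rmax = std::max(rmax, 0.0);

  const double qmin_double = qmin;
  const double qmax_double = qmax;
  const double scale = (rmax - rmin) / (qmax_double - qmin_double);

  // Two estimates of the zero point, one from each end of the range.
  const double zero_point_from_min = qmin_double - rmin / scale;
  const double zero_point_from_max = qmax_double - rmax / scale;
  const double zero_point_from_min_error =
      std::abs(qmin_double) + std::abs(rmin / scale);
  const double zero_point_from_max_error =
      std::abs(qmax_double) + std::abs(rmax / scale);

  // Pick the estimate whose floating-point rounding error is smaller.
  const double zero_point_double =
      zero_point_from_min_error < zero_point_from_max_error
          ? zero_point_from_min
          : zero_point_from_max;

  // Nudge: round to an integer and clamp into [qmin, qmax] so the zero
  // point is a valid quantized value and real 0.0 is exactly representable.
  int32_t nudged_zero_point;
  if (zero_point_double < qmin_double) {
    nudged_zero_point = qmin;
  } else if (zero_point_double > qmax_double) {
    nudged_zero_point = qmax;
  } else {
    nudged_zero_point = static_cast<int32_t>(std::round(zero_point_double));
  }

  return {scale, nudged_zero_point};
}
```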

output

ChooseQuantizationForOperatorOutput

how to quantize

QuantizeBuffer:

  const auto inverse_scale = 1. / quantization_params.scale;
  // ...
  quantization_params.zero_point + inverse_scale * src_val

i.e. quantized = 1 / scale * float + zero_point
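
A self-contained sketch of that affine mapping, assuming uint8 as the quantized type; the function name is illustrative, not the actual QuantizeBuffer signature.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of the affine quantization described above:
// quantized = round(float / scale) + zero_point, clamped to [0, 255].
std::vector<uint8_t> QuantizeBufferSketch(const std::vector<float>& src,
                                          double scale, int32_t zero_point) {
  const double inverse_scale = 1.0 / scale;
  std::vector<uint8_t> dst(src.size());
  for (std::size_t i = 0; i < src.size(); ++i) {
    const double unclamped = zero_point + inverse_scale * src[i];
    const double clamped =
        std::min(255.0, std::max(0.0, std::round(unclamped)));
    dst[i] = static_cast<uint8_t>(clamped);
  }
  return dst;
}
```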

weight only quantize

  1. weight-only quantization is done during export, in lite/tools/optimize/quantize_weights.cc:QuantizeWeightsInternal
  2. quantizable input tensors are recorded by InsertQuantizableInputTensorsFromOperator
    1. GetWeightInputIndices: only a handful of ops are supported
  3. the weights are quantized symmetrically (see the sketch after this list)
  4. a Dequantize op is added where ops need float input
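
A rough sketch of what symmetric quantization means here, assuming int8 per-tensor quantization with the zero point fixed at 0; the names and the per-tensor granularity are my assumptions, not the exact QuantizeWeightsInternal code.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Symmetric per-tensor weight quantization: zero point is fixed at 0 and
// the scale maps the largest-magnitude weight onto +/-127 (int8).
struct SymmetricQuantResult {
  std::vector<int8_t> values;
  float scale;  // dequantized_weight = scale * quantized_weight
};

SymmetricQuantResult SymmetricQuantizeSketch(
    const std::vector<float>& weights) {
  float max_abs = 0.f;
  for (float w : weights) max_abs = std::max(max_abs, std::abs(w));

  const float scale = max_abs / 127.f;
  SymmetricQuantResult result;
  result.scale = scale;
  result.values.reserve(weights.size());
  for (float w : weights) {
    const float q = scale == 0.f ? 0.f : std::round(w / scale);
    result.values.push_back(
        static_cast<int8_t>(std::min(127.f, std::max(-127.f, q))));
  }
  return result;
}
```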

dataset quantize

  • disables weight-only quantization
  • an additional step based on the same in-memory model

lite/python/lite.py:_calibrate_quantize_model -> lite/tools/optimize/quantize_model.cc:QuantizeModel

  1. feed inputs to the interpreter
    1. lite/python/optimize/calibration_wrapper.cc
    2. min/max information for each tensor is recorded here (see the sketch after this list)
  2. QuantizeWeightsInputOutput
    1. weights are stored in constant buffers (lite/toco/model.h:Array::buffer)
    2. tensors are stored in the allocator
    3. operator support is defined in lite/tools/optimize/operator_property.cc:GetOperatorProperty
  3. ApplyConstraints
  4. QuantizeBiases
  5. SetInputAndOutputTypes
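
As a rough illustration of step 1, this is what recording min/max during calibration could look like: after each interpreter invocation, every tensor's values are folded into a running min/max, which later feeds the scale/zero-point computation. This is a conceptual sketch, not the actual calibration_wrapper.cc code, and all names are illustrative.

```cpp
#include <algorithm>
#include <limits>
#include <unordered_map>
#include <vector>

// Running min/max statistics for one tensor.
struct MinMax {
  float min = std::numeric_limits<float>::max();
  float max = std::numeric_limits<float>::lowest();
};

// Fold one invocation's tensor values into the per-tensor statistics.
void RecordTensorStats(int tensor_index, const std::vector<float>& values,
                       std::unordered_map<int, MinMax>& stats) {
  MinMax& mm = stats[tensor_index];
  for (float v : values) {
    mm.min = std::min(mm.min, v);
    mm.max = std::max(mm.max, v);
  }
}
```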

quantization-aware training

Here