lite/toco/graph_transformations/quantize.cc
call graph
GetArray -> check whether the array already has minmax
ChooseQuantizationForOperatorInput
ChooseQuantizationForOperatorOutput
how to determine minmax
Min/max values can be imported from the TensorFlow model.
But they can also be calculated from a dataset.
input
ChooseQuantizationForOperatorInput
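- bias is handled specially: its scale is the product of the input activation scale and the weights scale, and its zero point is 0
- min/max are determined differently in training, i.e. they are not simply the numerical min/max of the values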
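A minimal sketch of how a bias vector can be quantized, assuming the usual TFLite convention: bias scale = input activation scale * weights scale, zero point 0, stored as int32. The helper names and example scales are hypothetical, not TOCO code.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C++ std::round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_bias(bias, input_scale, weights_scale):
    """Hypothetical helper: quantize a float bias vector to int32 values.

    Assumption: bias scale = input scale * weights scale, bias zero point = 0.
    """
    bias_scale = input_scale * weights_scale
    int32_min, int32_max = -(2 ** 31), 2 ** 31 - 1
    quantized = []
    for b in bias:
        q = round_half_away(b / bias_scale)
        # Saturate into the int32 range.
        quantized.append(max(int32_min, min(int32_max, q)))
    return bias_scale, quantized


bias_scale, q_bias = quantize_bias([0.5, -0.25, 0.0], input_scale=0.05, weights_scale=0.01)
```

Using the product of the two scales means the int32 accumulator of input * weights and the quantized bias share one scale, so they can be added directly.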
GetOrComputeMinMax
- todo: what is the actual method
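Until the todo above is resolved, here is one plausible shape for this function, written as a hypothetical sketch: use the min/max recorded on the array if there is one; otherwise, for a constant array, fall back to the plain min/max of its buffer. Widening the range to contain 0 and guarding against a zero-width range are assumptions to check against quantize.cc.

```python
def get_or_compute_minmax(recorded_minmax, constant_buffer):
    """Hypothetical sketch of a GetOrComputeMinMax-style lookup.

    recorded_minmax: (min, max) tuple imported from the model, or None.
    constant_buffer: list of floats for a constant array, or None.
    """
    if recorded_minmax is not None:
        return recorded_minmax
    if constant_buffer is None:
        raise ValueError("no minmax recorded and the array is not constant")
    # Start at 0 so that the resulting range always contains 0 (assumption).
    lo, hi = 0.0, 0.0
    for v in constant_buffer:
        lo = min(lo, v)
        hi = max(hi, v)
    if lo == hi:
        # Only possible when every value is 0: widen to avoid a zero-width range.
        hi = 1.0
    return lo, hi
```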
ChooseQuantizationParams
scale = (rmax - rmin) / (qmax_double - qmin_double)
r -> float (real) min/max; q -> quantized value min/max
zero_point_from_min = qmin_double - rmin / scale
zero_point_from_max = qmax_double - rmax / scale
zero_point_from_min_error = std::abs(qmin_double) + std::abs(rmin / scale)
zero_point_from_max_error = std::abs(qmax_double) + std::abs(rmax / scale)
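The actual zero point is chosen from the side with the smaller error. This is for numerical stability: each candidate solves the affine mapping for one known pair, (rmin, qmin) or (rmax, qmax), and its floating-point error is roughly machine epsilon times the sum of the absolute values of its terms, so the side with the smaller terms is more accurate.
- the zero point is then nudged into the quantized range: clamped to [qmin, qmax] and rounded to an integer, so that the real value 0 is exactly representable as a quantized value (required e.g. for zero padding)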
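The ChooseQuantizationParams formulas above, put together as a small Python sketch. It re-expresses the notes rather than reproducing TOCO's code, assumes a uint8 target by default (qmin = 0, qmax = 255) and rmin <= 0 <= rmax, and uses round-half-away-from-zero to mimic C++ std::round.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C++ std::round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def choose_quantization_params(rmin, rmax, qmin=0, qmax=255):
    """Return (scale, zero_point) mapping [rmin, rmax] onto [qmin, qmax]."""
    if not (rmin <= 0.0 <= rmax):
        raise ValueError("the real range must contain 0")
    if rmin == rmax:
        return 0.0, 0  # degenerate range {0}
    qmin_d, qmax_d = float(qmin), float(qmax)
    scale = (rmax - rmin) / (qmax_d - qmin_d)

    # Two candidate zero points, one from each known (real, quantized) pair.
    zp_from_min = qmin_d - rmin / scale
    zp_from_max = qmax_d - rmax / scale
    zp_from_min_error = abs(qmin_d) + abs(rmin / scale)
    zp_from_max_error = abs(qmax_d) + abs(rmax / scale)
    zp = zp_from_min if zp_from_min_error < zp_from_max_error else zp_from_max

    # Nudge: clamp into [qmin, qmax], then round to an integer.
    if zp < qmin_d:
        return scale, qmin
    if zp > qmax_d:
        return scale, qmax
    return scale, round_half_away(zp)


scale, zero_point = choose_quantization_params(-1.0, 3.0)
```

For rmin = -1.0 and rmax = 3.0 on uint8, scale = 4/255 and both candidates give 63.75, which is nudged to the integer zero point 64.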
output
ChooseQuantizationForOperatorOutput
how to quantize
QuantizeBuffer computes, for each element:
const auto inverse_scale = 1. / quantization_params.scale;
quantization_params.zero_point + inverse_scale * src_val
i.e. quantized = round(float_val / scale + zero_point), cast to the quantized integer type
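A minimal sketch of the per-element formula, plus the inverse (dequantize) mapping real = scale * (quantized - zero_point). Saturating out-of-range results to [qmin, qmax] and the default uint8 range are assumptions of this sketch; the function names are hypothetical.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C++ std::round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize_buffer(values, scale, zero_point, qmin=0, qmax=255):
    """quantized = round(value / scale + zero_point), saturated to [qmin, qmax]."""
    if scale == 0.0:
        # Degenerate range: every (zero) value maps to the zero point.
        return [zero_point for _ in values]
    inverse_scale = 1.0 / scale
    out = []
    for v in values:
        scaled = zero_point + inverse_scale * v
        q = round_half_away(scaled)
        out.append(max(qmin, min(qmax, q)))  # assumption: saturate, not wrap
    return out


def dequantize(q_values, scale, zero_point):
    """Inverse mapping: real = scale * (quantized - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


q = quantize_buffer([0.0, 1.0, -1.0, 2.3, 1000.0, -1000.0], scale=0.5, zero_point=10)
```

Dequantizing brings 2.3 back as 2.5: the rounding error is bounded by scale / 2, and values outside the representable range are clipped.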
weight-only quantization
- weight-only quantization is done at export time, in
lite/tools/optimize/quantize_weights.cc:QuantizeWeightsInternal
- record quantizable inputs
InsertQuantizableInputTensorsFromOperator
GetWeightInputIndices
only a handful of ops are supported
- weights are quantized symmetrically (zero point 0).
- a Dequantize op is inserted before ops that need float inputs.
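A minimal sketch of symmetric weight quantization and the matching dequantize step. Per-tensor granularity and the narrow int8 range [-127, 127] are assumptions of this sketch; QuantizeWeightsInternal may differ in granularity or range.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero (like C++ std::round)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def symmetric_quantize(weights, num_bits=8):
    """Per-tensor symmetric quantization; the zero point is always 0."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits (narrow range, assumed)
    max_abs = max((abs(w) for w in weights), default=0.0)
    # An all-zero tensor gets scale 1.0 so that the division is well defined.
    scale = max_abs / qmax if max_abs > 0.0 else 1.0
    q = [max(-qmax, min(qmax, round_half_away(w / scale))) for w in weights]
    return scale, q


def dequantize_symmetric(q, scale):
    """What a Dequantize op computes for symmetric weights: real = scale * q."""
    return [scale * v for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
scale, q = symmetric_quantize(weights)
```

Because the zero point is fixed at 0, only one float (the scale) has to be stored per tensor, and dequantizing is a single multiply.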
dataset quantization
- disables weight-only quantization
- an extra step that operates on the same in-memory model
lite/python/lite.py:_calibrate_quantize_model
lite/tools/optimize/quantize_model.cc:QuantizeModel
- feeds the dataset inputs to the interpreter
lite/python/optimize/calibration_wrapper.cc
- min/max information is recorded here
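A hypothetical sketch of what recording min/max during calibration amounts to: keep a running min and max per tensor across all batches fed to the interpreter. The class and tensor names are made up, and the real calibrator may aggregate differently (for example with a moving average).

```python
class MinMaxCalibrator:
    """Hypothetical recorder: running min/max per tensor across batches."""

    def __init__(self):
        self.stats = {}  # tensor name -> (min, max)

    def record(self, name, values):
        """Merge the min/max of one batch of values into the running stats."""
        if not values:
            return
        lo, hi = min(values), max(values)
        if name in self.stats:
            old_lo, old_hi = self.stats[name]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        self.stats[name] = (lo, hi)


calibrator = MinMaxCalibrator()
# Pretend two calibration batches flowed through a tensor called "conv_out".
calibrator.record("conv_out", [0.1, 0.9, 0.4])
calibrator.record("conv_out", [-0.3, 0.5])
```

The resulting (min, max) per tensor is what the quantization parameters are then chosen from.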
- QuantizeWeightsInputOutput
- weights are stored in a constant buffer
lite/toco/model.h:array::buffer
- tensors are stored in the allocator
- operator support is defined via
lite/tools/optimize/operator_property.cc:GetOperatorProperty
- ApplyConstraints
- QuantizeBiases
- SetInputAndOutputTypes