test script
import tensorflow as tf

# frozen MobileNet v1 graph and its input/output tensor names
graph_def_file = "./mobilenet_v1_1.0_224/frozen_graph.pb"
input_arrays = ["input"]
output_arrays = ["MobilenetV1/Predictions/Softmax"]

# build a converter from the frozen graph and run the conversion
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays)
tflite_model = converter.convert()

with open("converted_model.tflite", "wb") as f:
    f.write(tflite_model)
graph transform call graph
This call graph traces how a TensorFlow model is converted into a TFLite model.
tensorflow/lite/python/lite.py:TFLiteConverter::convert
tensorflow/lite/python/convert.py:toco_convert_impl
tensorflow/lite/python/convert.py:toco_convert_protos -> calls toco_from_protos -> modified to start gdbserver instead (see the gdb setup section below)
Note:
- In convert.py, post-training quantization is triggered by _is_post_training_optimize
- In r1.14, it is triggered if any optimization is set (the specific optimization setting currently does not matter)
- or if only int8 ops are targeted.
- If a representative dataset is provided, the activations will also be quantized (see the sketch after these notes)
- this is performed in the last step, _calibrate_quantize_model
- this step also quantizes the weights; I suppose this helps reduce complexity.
- in weight-only mode, the inference input/output types need to be float
- I suppose this is caused by the lack of information on the data path. (todo)
- this information is transferred using toco_flags
- I think using an adaptive data structure would be good
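As an illustration of the notes above, a rough sketch of how these options surface in the r1.14 converter API; the calibration generator and its shapes are made up for illustration, and the exact representative_dataset form may differ between versions.
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file, input_arrays, output_arrays)

# setting any optimization enables post-training (weight) quantization
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# providing a representative dataset additionally calibrates and quantizes activations
def calibration_gen():  # hypothetical generator, shapes chosen for MobileNet v1
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]
converter.representative_dataset = tf.lite.RepresentativeDataset(calibration_gen)

tflite_model = converter.convert()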
tensorflow/lite/toco/python/toco_from_protos.py -> this will be called on the command line
pywrap_tensorflow::TocoConvert -> lite/toco/python/toco.i -> lite/toco/python/toco.h:TocoConvert
lite/toco/toco_tooling.cc:toco::Import
lite/toco/toco_tooling.cc:toco::Transform
lite/toco/toco_tooling.cc:toco::Export
lite/toco/toco_tooling.cc:toco::TransformWithStatus
toco::RunGraphTransformationsWithStatus
toco::GraphTransformationPass -> position of transformations
Some helpful gdb setup lines
Read more here
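For example, a sketch of the gdb side, assuming the gdbserver started in place of toco_from_protos listens on localhost:2000 and the binary was built with debug symbols; the port and symbols are assumptions, adjust to your setup.
# load the debugged binary's symbols first (e.g. start gdb with the binary or use the file command)
set breakpoint pending on
# attach to the gdbserver started in place of toco_from_protos
target remote localhost:2000
# break at the main stages of the conversion
break toco::Import
break toco::Transform
break toco::Export
continue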
Questions to be answered.
- In what way does the graph transformation change the graph?
- basically, it converts the graph to the tflite format.
- todo: review more possible optimizations here.
- How is the quantization range determined?
- How is each operation performed?
- They have optimized implementations in lite/kernels
- They do not seem to be GPU-optimized.
- Be aware of a possibly different interpreter / op_resolver.
variable regularization
# attach an L2 penalty to the conv kernel; TF adds it to the REGULARIZATION_LOSSES collection
regularizer = tf.contrib.layers.l2_regularizer(scale=0.1)
conv = tf.layers.conv2d(inputs, filters=64, kernel_size=3, kernel_regularizer=regularizer)  # inputs/filters/kernel_size are placeholders
# collect the regularization terms and add them to the base loss
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([base_loss] + reg_losses, name="loss")
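An equivalent shortcut is tf.losses.get_regularization_loss(), which returns the same collection already summed, so the manual tf.get_collection / tf.add_n step can be replaced by adding that single tensor to the base loss.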