Paddle-Lite is a lightweight, full-featured, easy-to-use, high-performance inference engine. Its light weight comes from representing the weights and activations of the neural network with fewer bits, which greatly reduces the model size and eases the limited-storage constraint of mobile devices, while its overall inference speed compares favorably with other frameworks.
In PaddleClas, we use Paddle-Lite to evaluate performance on mobile devices. In this section, we take the MobileNetV1 model trained on the ImageNet1k dataset as an example to introduce how to use Paddle-Lite to evaluate model speed on a mobile device (evaluated on SD855).
The trained model first needs to be exported as an inference model with tools/export_model.py; the specific command is as follows.

```shell
python tools/export_model.py -m MobileNetV1 -p pretrained/MobileNetV1_pretrained/ -o inference/MobileNetV1
```
Finally, the model file and params file are saved in inference/MobileNetV1.
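As a quick sanity check (assuming the default file names produced by the export step above), you can list the exported directory:

```shell
# Both files should be present after a successful export
ls inference/MobileNetV1
# expected: model  params
```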
Before downloading the benchmark binary, check the ARM version (CPU ABI) of the phone:

```shell
adb shell getprop ro.product.cpu.abi
```
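On a 64-bit device this typically prints the following (32-bit devices report armeabi-v7a instead); use it to choose the right benchmark binary below.

```
arm64-v8a
```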
If the ARM version is v8, download the v8 benchmark_bin file with the following command.

```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v8
```
If the ARM version is v7, the v7 benchmark_bin file should be downloaded instead; the command is as follows.

```shell
wget -c https://paddle-inference-dist.bj.bcebos.com/PaddleLite/benchmark_0/benchmark_bin_v7
```
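Before running the benchmark, you can confirm that the phone is visible to the PC; this is standard adb usage, not specific to Paddle-Lite.

```shell
adb devices
# the phone should be listed with the state "device"
```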
After the PC and mobile phone are successfully connected, use the following command to start the model evaluation.
```shell
sh deploy/lite/benchmark/benchmark.sh ./benchmark_bin_v8 ./inference result_armv8.txt true
```
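For reference, the general invocation pattern is shown below; the placeholder names are descriptive labels inferred from the parameter explanation that follows, not defined by the script itself.

```shell
sh deploy/lite/benchmark/benchmark.sh <benchmark_bin_path> <models_dir> <result_file> <optimize_before_eval: true|false>
```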
Here, ./benchmark_bin_v8 is the path of the benchmark binary file, ./inference is the path containing all the models to be evaluated, result_armv8.txt is the result file, and the final parameter true means that the model will be optimized before evaluation. Eventually, the result file result_armv8.txt will be saved in the current folder. The specific results are as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1 min = 30.89100 max = 30.73600 average = 30.79750

Threads=2 Warmup=10 Repeats=30
MobileNetV1 min = 18.26600 max = 18.14000 average = 18.21637

Threads=4 Warmup=10 Repeats=30
MobileNetV1 min = 10.03200 max = 9.94300 average = 9.97627
```
This is the inference latency of the model under different numbers of threads; the unit is ms. Taking a single thread as an example, the average latency of MobileNetV1 on SD855 is 30.79750 ms.
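Expressed as throughput, a 30.79750 ms single-thread latency corresponds to roughly 1000 / 30.79750 ≈ 32.5 frames per second.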
In Section 2.3, we mentioned that the model is optimized before evaluation; here you can optimize the model first, and then directly load the optimized model for speed evaluation.
Paddle-Lite provides multiple strategies to automatically optimize the original trained model, including quantization, subgraph fusion, hybrid scheduling, kernel optimization and so on. To make the optimization more convenient and easier to use, Paddle-Lite provides an opt tool that automatically completes the optimization steps and outputs a lightweight, optimal, executable Paddle-Lite model; it can be downloaded from the Paddle-Lite Model Optimization Page. Here we take macOS as the development environment, download the opt_mac model optimization tool, and use the following commands to optimize the model.
```shell
model_file="../MobileNetV1/model"
param_file="../MobileNetV1/params"
opt_models_dir="./opt_models"
mkdir ${opt_models_dir}
./opt_mac --model_file=${model_file} \
    --param_file=${param_file} \
    --valid_targets=arm \
    --optimize_out_type=naive_buffer \
    --prefer_int8_kernel=false \
    --optimize_out=${opt_models_dir}/MobileNetV1
```
Here, model_file and param_file are the paths of the exported model file and parameter file respectively. After the conversion succeeds, MobileNetV1.nb will be saved in opt_models.
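As a quick check (assuming the output directory used above), list the optimized model:

```shell
ls ./opt_models
# expected: MobileNetV1.nb
```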
Use the benchmark_bin file to load the optimized model for evaluation. The command is as follows.

```shell
bash benchmark.sh ./benchmark_bin_v8 ./opt_models result_armv8.txt
```
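After the run finishes, the result file can be printed directly on the PC:

```shell
cat result_armv8.txt
```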
Finally, the result saved in result_armv8.txt is shown as follows.
```
PaddleLite Benchmark
Threads=1 Warmup=10 Repeats=30
MobileNetV1_lite min = 30.89500 max = 30.78500 average = 30.84173

Threads=2 Warmup=10 Repeats=30
MobileNetV1_lite min = 18.25300 max = 18.11000 average = 18.18017

Threads=4 Warmup=10 Repeats=30
MobileNetV1_lite min = 10.00600 max = 9.90000 average = 9.96177
```
Taking a single thread as an example, the average latency of MobileNetV1 on SD855 is 30.84173 ms.
For more detailed parameter explanations and Paddle-Lite usage, please refer to the Paddle-Lite docs.