GPT2 speed-up using OpenVINO

If you have Python 3.9 installed you will need Python 3.7.9 for this project to work. Follow the instructions below when building for the first time (verified build on macOS):
```bash
brew install pyenv                              # for managing multiple Python versions on the machine
pip3 install virtualenv                         # virtual-environment maker; any other package works too
pyenv install 3.7.9                             # install the specific version
pyenv local 3.7.9                               # set the local (this folder) version to 3.7.9
export LOCAL_PY_VER_PATH=`pyenv which python3`  # set the path for convenience
echo $LOCAL_PY_VER_PATH                         # [opt.] check the path
$LOCAL_PY_VER_PATH -m venv .                    # use the path above to build a virtual environment in this folder
source bin/activate                             # activate the local env
pip3 install -r requirements.txt                # install run dependencies
```
When coming back to this project, simply activate the virtualenv and the rest will be ready for you:

```bash
source bin/activate
```
To get the model in ONNX format, first run convert.py; this should dump a gpt2.onnx file.
```bash
python3 convert.py
```
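The actual export lives in convert.py; for reference, a minimal sketch of such a GPT-2 → ONNX export could look like this (assuming HuggingFace transformers; input/output names and the opset are illustrative, not necessarily what convert.py uses):

```python
# Sketch of a GPT-2 -> ONNX export (assumes HuggingFace transformers;
# names and opset version are illustrative, see convert.py for the real script).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("Hello, world", return_tensors="pt")["input_ids"]
torch.onnx.export(
    model,
    (input_ids,),
    "gpt2.onnx",
    input_names=["input_ids"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "logits": {0: "batch", 1: "sequence"}},
    opset_version=11,
)
```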
For the next step you must first have OpenVINO installed on your system (download from here). Most of the requirements are already in my requirements.txt file, but you should also install the ones OpenVINO needs. After that, run the following commands to set up the environment variables:
```bash
export OPENVINO_FOLDER="path/to/openvino_2021"
cd $OPENVINO_FOLDER/bin
source setupvars.sh
cd $OPENVINO_FOLDER/deployment_tools/model_optimizer
pip3 install -r requirements.txt
pip3 install -r requirements_onnx.txt
```
If everything works correctly you will see an output like this:
```
[setupvars.sh] OpenVINO environment initialized
```
Now come back to this repo. Note that the OpenVINO environment setup works correctly only if you source setupvars.sh from the openvino_2021/bin folder. Next we run the mo_onnx.py script:
```bash
mo_onnx.py --help                     # to get the meanings of the arguments to be passed
mkdir full_precision half_precision   # full_precision is FP32, half_precision is FP16

# pass --data_type=FP32 with --output_dir=full_precision,
# or --data_type=FP16 with --output_dir=half_precision
mo_onnx.py --input_model gpt2.onnx \
    --data_type=FP32 \
    --output_dir=full_precision
```
If everything works correctly you should see 3 files in the full_precision/ folder:

```
gpt2.bin
gpt2.mapping
gpt2.xml
```
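A quick way to sanity-check the generated IR is to read it back with the Inference Engine Python API and print the network's inputs and outputs (a sketch, assuming the OpenVINO 2021.x openvino.inference_engine API):

```python
# Sketch: read the generated IR back and inspect its inputs/outputs
# (assumes the OpenVINO 2021.x Python API and the full_precision/ output dir).
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="full_precision/gpt2.xml",
                      weights="full_precision/gpt2.bin")
for name, info in net.input_info.items():
    print("input :", name, info.input_data.shape)
for name, data in net.outputs.items():
    print("output:", name, data.shape)
```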
To check that everything works fine, run the run.py script. You should start seeing the outputs; the following numbers are from a machine with this configuration:
- MacBook Pro (13-inch, 2020, Four Thunderbolt 3 ports)
- Processor: 2 GHz Quad-Core Intel Core i5
- Memory: 16 GB 3733 MHz LPDDR4X
- Graphics: Intel Iris Plus Graphics 1536 MB
The performance results are as follows (a 2x boost):
```
----------------------------------------------------------------------
Loading Pytorch model
:: Pytorch inference in 0.59065s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x12c531fb0>
:: OpenVino inference in 0.26206s
----------------------------------------------------------------------
```
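run.py lives in this repo; the OpenVINO half of the comparison it times boils down to something like this sketch (paths and the dummy input shape are assumptions; the shape must match the exported IR):

```python
# Sketch of the OpenVINO inference path that run.py times
# (assumes the 2021.x API; the dummy shape must match the exported IR).
import time
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="full_precision/gpt2.xml",
                      weights="full_precision/gpt2.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_name = next(iter(net.input_info))
output_name = next(iter(net.outputs))
input_ids = np.random.randint(0, 50257, size=(1, 127)).astype(np.int64)

tic = time.time()
res = exec_net.infer({input_name: input_ids})
print(f":: OpenVino inference in {time.time() - tic:.5f}s")
logits = res[output_name]
```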
In order to test generation capabilities, you can pass the --g flag and get the following results:
```
----------------------------------------------------------------------
Loading Pytorch model
Text shape: torch.Size([1, 127])
:: Pytorch inference in 0.46476s
----------------------------------------------------------------------
Testing generation
:: Pytorch generation took (40 steps): 17.663s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x130aaffb0>
:: OpenVino inference in 0.23262s
----------------------------------------------------------------------
Testing generation
:: OpenVino generation took (40 steps): 6.220s
----------------------------------------------------------------------
```
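The generation test is a token-by-token loop: each of the 40 steps runs a forward pass and appends the argmax token. A greedy-decoding sketch of this (names assumed; a fixed-shape IR needs padding or reshaping between steps, which the real script has to handle):

```python
import numpy as np

def greedy_generate(exec_net, input_name, output_name, prompt_ids, steps=40):
    """Greedy decoding sketch: run a forward pass, take the argmax of the
    last position's logits, append it, repeat. A fixed-shape IR needs
    padding or net.reshape() between steps; elided here for brevity."""
    ids = list(prompt_ids)
    for _ in range(steps):
        x = np.array([ids], dtype=np.int64)
        logits = exec_net.infer({input_name: x})[output_name]
        ids.append(int(np.argmax(logits[0, -1])))
    return ids
```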
When running on an AWS c5.12xlarge instance and batching the data to 128 samples per batch, we see a larger performance increase:
```
----------------------------------------------------------------------
Loading Pytorch model
Pytorch inference in 3.55126s
----------------------------------------------------------------------
Creating Inference Engine...
Loading network
Loading IR to the plugin...
exec_net: <openvino.inference_engine.ie_api.ExecutableNetwork object at 0x12c531fb0>
----------------------------------------------------------------------
OpenVino inference in 0.78668s
----------------------------------------------------------------------
```
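To run batched inference like this, the network can be reshaped to the larger batch before loading it onto the plugin (a sketch, continuing from the snippet above; 127 is the sequence length used in the single-sample run):

```python
# Sketch: reshape the IR to batch size 128 before loading onto the CPU plugin
# (assumes the 2021.x API; 127 is the sequence length used above).
net.reshape({input_name: (128, 127)})
exec_net = ie.load_network(network=net, device_name="CPU")
batch = np.random.randint(0, 50257, size=(128, 127)).astype(np.int64)
res = exec_net.infer({input_name: batch})
```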
That is a 5x boost. Using the OpenVINO benchmarking tool we saw even higher throughput: 134.29ms for the first inference and a 17ms average processing time across 3522 runs, a massive 209x speed improvement.
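For reference, the benchmarking tool ships with the OpenVINO install; an invocation against the FP32 IR looks roughly like this (the path inside $OPENVINO_FOLDER may differ between releases):

```bash
# benchmark_app ships under deployment_tools/tools/benchmark_tool in 2021.x
python3 $OPENVINO_FOLDER/deployment_tools/tools/benchmark_tool/benchmark_app.py \
    -m full_precision/gpt2.xml -d CPU
```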
This supports our hypothesis that larger CPU machines can take advantage of OpenVINO's performance in a super-linear fashion.