
GPU accelerated TensorFlow Lite / TensorRT applications.


This repository contains several applications that invoke DNN inference with the TensorFlow Lite GPU Delegate or TensorRT.

Target platforms: Linux PC / NVIDIA Jetson / Raspberry Pi.
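Each application loads a TFLite model, attaches a delegate, and runs inference on camera frames. The following is a minimal, hedged sketch of that pattern (not code from this repository) using the TensorFlow Lite C API and the GPU Delegate V2; "model.tflite" is a placeholder path.

/* Minimal sketch (not this repository's code): one inference with the
 * TensorFlow Lite C API and the GPU Delegate V2. "model.tflite" is a
 * placeholder path for your own model file.                            */
#include <stdio.h>
#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/gpu/delegate.h"

int main (void)
{
    TfLiteModel *model = TfLiteModelCreateFromFile ("model.tflite");
    if (model == NULL) { fprintf (stderr, "failed to load model\n"); return 1; }

    /* create the GPU delegate and register it with the interpreter options */
    TfLiteGpuDelegateOptionsV2 gpu_opts = TfLiteGpuDelegateOptionsV2Default ();
    TfLiteDelegate *delegate = TfLiteGpuDelegateV2Create (&gpu_opts);

    TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate ();
    TfLiteInterpreterOptionsAddDelegate (options, delegate);

    TfLiteInterpreter *interpreter = TfLiteInterpreterCreate (model, options);
    TfLiteInterpreterAllocateTensors (interpreter);

    /* fill TfLiteInterpreterGetInputTensor(interpreter, 0) with a
     * preprocessed camera frame here, then run the model */
    TfLiteInterpreterInvoke (interpreter);
    /* read results from TfLiteInterpreterGetOutputTensor(interpreter, 0) */

    TfLiteInterpreterDelete (interpreter);
    TfLiteInterpreterOptionsDelete (options);
    TfLiteGpuDelegateV2Delete (delegate);
    TfLiteModelDelete (model);
    return 0;
}

Such a program links against libtensorflowlite.so and libtensorflowlite_gpu_delegate.so, which are built in section 2 below.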

1. Applications

  • Lightweight Face Detection.
  • More accurate Face Detection.
    • TensorRT port is HERE
  • Detect faces and estimate their Age and Gender.
    • TensorRT port is HERE
  • Image Classification using MobileNet.
    • TensorRT port is HERE
  • Object Detection using MobileNet SSD.
    • TensorRT port is HERE
  • 3D Facial Surface Geometry estimation and face replacement.
  • Hair segmentation and recoloring.
  • 3D Handpose Estimation from single RGB images.
  • Eye position estimation by detecting the iris.
  • 3D Object Detection.
    • TensorRT port is HERE
  • Pose Estimation (upper body).
  • Pose Estimation.
    • TensorRT port is HERE
  • Single-Shot 3D Human Pose Estimation.
    • TensorRT port is HERE
  • Depth Estimation from single images.
    • TensorRT port is HERE
  • Assign semantic labels to every pixel in the input image.
  • Face parts segmentation based on BiSeNet V2.
  • Generate anime-style face images.
  • Transform photos into anime-style images.
  • Human portrait drawing by U^2-Net.
  • Create new artworks in artistic style.
  • Enhance low-light images to a great extent.
  • GAN model for image extrapolation.
  • Text detection from natural scenes.

2. How to Build & Run

2.1. Build for x86_64 Linux

2.1.1. setup environment
$ sudo apt install libgles2-mesa-dev 
$ mkdir ~/work
$ mkdir ~/lib
$
$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
2.1.2. build TensorFlow Lite library.
$ cd ~/work 
$ git clone https://github.com/terryky/tflite_gles_app.git
$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4.sh

(TensorFlow's configure script will start after a while. Answer the prompts according to your environment.)

$
$ ln -s tensorflow_r2.4 ./tensorflow
$
$ cp ./tensorflow/bazel-bin/tensorflow/lite/libtensorflowlite.so ~/lib
$ cp ./tensorflow/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so ~/lib
2.1.3. build an application.
$ cd ~/work/tflite_gles_app/gl2handpose
$ make -j4
2.1.4. run an application.
$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
$ cd ~/work/tflite_gles_app/gl2handpose
$ ./gl2handpose
2.2. Build for aarch64 Jetson Nano / Raspberry Pi (cross-build the TensorFlow Lite library on a host PC)

2.2.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_aarch64.sh

# If you want to build XNNPACK-enabled TensorFlow Lite, use the following script.
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.4/build_libtflite_r2.4_with_xnnpack_aarch64.sh

(TensorFlow's configure script will start after a while. Answer the prompts according to your environment.)
2.2.2. copy TensorFlow Lite libraries to the target Jetson / Raspberry Pi.
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/libtensorflowlite.so jetson@192.168.11.11:/home/jetson/lib
(HostPC)$ scp ~/work/tensorflow_r2.4/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so jetson@192.168.11.11:/home/jetson/lib
2.2.3. clone TensorFlow repository on the target Jetson / Raspberry Pi.
(Jetson/Raspi)$ cd ~/work
(Jetson/Raspi)$ git clone -b r2.4 https://github.com/tensorflow/tensorflow.git
(Jetson/Raspi)$ cd tensorflow
(Jetson/Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.2.4. build an application.
(Jetson/Raspi)$ sudo apt install libgles2-mesa-dev libdrm-dev
(Jetson/Raspi)$ cd ~/work 
(Jetson/Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose

# on Jetson
(Jetson)$ make -j4 TARGET_ENV=jetson_nano TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi without GPUDelegate (recommended)
(Raspi )$ make -j4 TARGET_ENV=raspi4

# on Raspberry Pi with GPUDelegate (low performance)
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2

# on Raspberry Pi with XNNPACK
(Raspi )$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=XNNPACK
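The TFLITE_DELEGATE build option selects which TensorFlow Lite delegate the app attaches to its interpreter. As a rough, hedged illustration (not this repository's actual code), attaching the XNNPACK delegate via the upstream TensorFlow Lite C API looks like this:

/* Illustrative sketch only: attach the XNNPACK delegate (NEON-optimized CPU
 * kernels) instead of the GPU delegate, using the TensorFlow Lite C API.   */
#include "tensorflow/lite/c/c_api.h"
#include "tensorflow/lite/delegates/xnnpack/xnnpack_delegate.h"

static TfLiteInterpreter *
create_interpreter_with_xnnpack (TfLiteModel *model, int num_threads)
{
    TfLiteXNNPackDelegateOptions xnn_opts = TfLiteXNNPackDelegateOptionsDefault ();
    xnn_opts.num_threads = num_threads;            /* e.g. 4 on Raspberry Pi 4 */

    TfLiteDelegate *delegate = TfLiteXNNPackDelegateCreate (&xnn_opts);

    TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate ();
    TfLiteInterpreterOptionsAddDelegate (options, delegate);

    TfLiteInterpreter *interpreter = TfLiteInterpreterCreate (model, options);
    TfLiteInterpreterOptionsDelete (options);

    /* note: the delegate must outlive the interpreter; release it with
     * TfLiteXNNPackDelegateDelete() after TfLiteInterpreterDelete().   */
    return interpreter;
}

A GPU_DELEGATEV2 build follows the same pattern, but creates the delegate with TfLiteGpuDelegateV2Create() as in the earlier sketch.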
2.2.5. run an application.
(Jetson/Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Jetson/Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Jetson/Raspi)$ ./gl2handpose
about VSYNC

On Jetson Nano, sync-to-vblank (VSYNC) is enabled by default to avoid tearing. To enable or disable VSYNC, run the app with one of the following commands.

# enable VSYNC (default).
(Jetson)$ export __GL_SYNC_TO_VBLANK=1; ./gl2handpose

# disable VSYNC. framerate improves, but tearing occurs.
(Jetson)$ export __GL_SYNC_TO_VBLANK=0; ./gl2handpose
2.3. Build for armv7l Raspberry Pi (cross-build the TensorFlow Lite library on a host PC)

2.3.1. build TensorFlow Lite library on Host PC.
(HostPC)$ wget https://github.com/bazelbuild/bazel/releases/download/3.1.0/bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ chmod 755 bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$ sudo ./bazel-3.1.0-installer-linux-x86_64.sh
(HostPC)$
(HostPC)$ mkdir ~/work
(HostPC)$ cd ~/work 
(HostPC)$ git clone https://github.com/terryky/tflite_gles_app.git
(HostPC)$ ./tflite_gles_app/tools/scripts/tf2.3/build_libtflite_r2.3_armv7l.sh

(TensorFlow's configure script will start after a while. Answer the prompts according to your environment.)
2.3.2. copy TensorFlow Lite libraries to the target Raspberry Pi.
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/libtensorflowlite.so pi@192.168.11.11:/home/pi/lib
(HostPC)$ scp ~/work/tensorflow_r2.3/bazel-bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so pi@192.168.11.11:/home/pi/lib
2.3.3. setup environment on Raspberry Pi.
(Raspi)$ sudo apt update
(Raspi)$ sudo apt upgrade
(Raspi)$ sudo apt install libgles2-mesa-dev libegl1-mesa-dev xorg-dev
2.3.4. clone TensorFlow repository on the target Raspberry Pi.
(Raspi)$ cd ~/work
(Raspi)$ git clone -b r2.3 https://github.com/tensorflow/tensorflow.git
(Raspi)$ cd tensorflow
(Raspi)$ ./tensorflow/lite/tools/make/download_dependencies.sh
2.3.5. build an application on the target Raspberry Pi.
(Raspi)$ cd ~/work 
(Raspi)$ git clone https://github.com/terryky/tflite_gles_app.git
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ make -j4 TARGET_ENV=raspi4  # disable GPUDelegate (recommended)

# enable GPUDelegate, but it causes low performance on Raspberry Pi 4.
(Raspi)$ make -j4 TARGET_ENV=raspi4 TFLITE_DELEGATE=GPU_DELEGATEV2
2.3.6. run an application on the target Raspberry Pi.
(Raspi)$ export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
(Raspi)$ cd ~/work/tflite_gles_app/gl2handpose
(Raspi)$ ./gl2handpose

For more detailed information, please refer to this article.

3. About Input video stream

Both live camera capture and video files are supported as input.

  • UVC (USB Video Class) camera capture is supported.

  • Use the v4l2-ctl command to configure the capture resolution.

    • The lower the resolution, the higher the framerate.
(Target)$ sudo apt-get install v4l-utils

# confirm current resolution settings
(Target)$ v4l2-ctl --all

# query available resolutions
(Target)$ v4l2-ctl --list-formats-ext

# set capture resolution (160x120)
(Target)$ v4l2-ctl --set-fmt-video=width=160,height=120

# set capture resolution (640x480)
(Target)$ v4l2-ctl --set-fmt-video=width=640,height=480
  • Currently, only the YUYV pixel format is supported.

    • If you see error messages like the following:
-------------------------------
 capture_devie  : /dev/video0
 capture_devtype: V4L2_CAP_VIDEO_CAPTURE
 capture_buftype: V4L2_BUF_TYPE_VIDEO_CAPTURE
 capture_memtype: V4L2_MEMORY_MMAP
 WH(640, 480), 4CC(MJPG), bpl(0), size(341333)
-------------------------------
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
ERR: camera_capture.c(87): pixformat(MJPG) is not supported.
...

please change your camera settings to use the YUYV pixel format with the following commands (a standalone C sketch for checking the current format programmatically appears at the end of this section):

$ sudo apt-get install v4l-utils
$ v4l2-ctl --set-fmt-video=width=640,height=480,pixelformat=YUYV --set-parm=30
  • To disable the camera
    • If your camera doesn't support YUYV, please run the apps in camera-disabled mode with the argument -x:
$ ./gl2handpose -x
  • FFmpeg (libav) video decode is supported.
  • If you want to use a recorded video file instead of a live camera, follow these steps:
# set up dependent libraries.
(Target)$ sudo apt install libavcodec-dev libavdevice-dev libavfilter-dev libavformat-dev libavresample-dev libavutil-dev

# build an app with the ENABLE_VDEC option
(Target)$ cd ~/work/tflite_gles_app/gl2facemesh
(Target)$ make -j4 ENABLE_VDEC=true

# run an app with a video file name as an argument.
(Target)$ ./gl2facemesh -v assets/sample_video.mp4
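
As a supplement to the v4l2-ctl commands above, the current capture format can also be checked programmatically through the standard V4L2 ioctl interface. This is a standalone sketch, not part of this repository; /dev/video0 is an assumed device path.

/* Standalone sketch (not from this repository): query the current V4L2
 * capture format and warn if it is not YUYV. /dev/video0 is an assumption. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

int main (void)
{
    int fd = open ("/dev/video0", O_RDWR);
    if (fd < 0) { perror ("open /dev/video0"); return 1; }

    struct v4l2_format fmt;
    memset (&fmt, 0, sizeof (fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

    if (ioctl (fd, VIDIOC_G_FMT, &fmt) < 0) { perror ("VIDIOC_G_FMT"); close (fd); return 1; }

    unsigned int fourcc = fmt.fmt.pix.pixelformat;
    printf ("current format: %ux%u 4CC(%c%c%c%c)\n",
            fmt.fmt.pix.width, fmt.fmt.pix.height,
            (int)(fourcc      ) & 0xff, (int)(fourcc >>  8) & 0xff,
            (int)(fourcc >> 16) & 0xff, (int)(fourcc >> 24) & 0xff);

    if (fourcc != V4L2_PIX_FMT_YUYV)
        printf ("not YUYV: switch the format with v4l2-ctl as shown above, or run the app with -x\n");

    close (fd);
    return 0;
}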

4. Tested platforms

You can select the platform by editing Makefile.env.

  • Linux PC (X11)
  • NVIDIA Jetson Nano (X11)
  • NVIDIA Jetson TX2 (X11)
  • RaspberryPi4 (X11)
  • RaspberryPi3 (Dispmanx)
  • Coral EdgeTPU Devboard (Wayland)

5. Performance of inference [ms]

Blazeface

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    10                    10
TensorFlow Lite                CPU int8    7                     7
TensorFlow Lite GPU Delegate   GPU fp16    70                    10
TensorRT                       GPU fp16    --                    ?

Classification (mobilenet_v1_1.0_224)

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    69                    50
TensorFlow Lite                CPU int8    28                    29
TensorFlow Lite GPU Delegate   GPU fp16    360                   37
TensorRT                       GPU fp16    --                    19

Object Detection (ssd_mobilenet_v1_coco)

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    150                   113
TensorFlow Lite                CPU int8    62                    64
TensorFlow Lite GPU Delegate   GPU fp16    980                   90
TensorRT                       GPU fp16    --                    32

Facemesh

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    29                    30
TensorFlow Lite                CPU int8    24                    27
TensorFlow Lite GPU Delegate   GPU fp16    150                   20
TensorRT                       GPU fp16    --                    ?

Hair Segmentation

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    410                   400
TensorFlow Lite                CPU int8    ?                     ?
TensorFlow Lite GPU Delegate   GPU fp16    270                   30
TensorRT                       GPU fp16    --                    ?

3D Handpose

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    116                   85
TensorFlow Lite                CPU int8    80                    87
TensorFlow Lite GPU Delegate   GPU fp16    880                   90
TensorRT                       GPU fp16    --                    ?

3D Object Detection

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    470                   302
TensorFlow Lite                CPU int8    248                   249
TensorFlow Lite GPU Delegate   GPU fp16    1990                  235
TensorRT                       GPU fp16    --                    108

Posenet (posenet_mobilenet_v1_100_257x257)

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    92                    70
TensorFlow Lite                CPU int8    53                    55
TensorFlow Lite GPU Delegate   GPU fp16    803                   80
TensorRT                       GPU fp16    --                    18

Semantic Segmentation (deeplabv3_257)

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    108                   80
TensorFlow Lite                CPU int8    ?                     ?
TensorFlow Lite GPU Delegate   GPU fp16    790                   90
TensorRT                       GPU fp16    --                    ?

Selfie to Anime

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    ?                     7700
TensorFlow Lite                CPU int8    ?                     ?
TensorFlow Lite GPU Delegate   GPU fp16    ?                     ?
TensorRT                       GPU fp16    --                    ?

Artistic Style Transfer

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    1830                  950
TensorFlow Lite                CPU int8    ?                     ?
TensorFlow Lite GPU Delegate   GPU fp16    2440                  215
TensorRT                       GPU fp16    --                    ?

Text Detection (east_text_detection_320x320)

Framework                      Precision   Raspberry Pi 4 [ms]   Jetson Nano [ms]
TensorFlow Lite                CPU fp32    1020                  680
TensorFlow Lite                CPU int8    378                   368
TensorFlow Lite GPU Delegate   GPU fp16    4665                  388
TensorRT                       GPU fp16    --                    ?

6. Related Articles

7. Acknowledgements
