Olof Astrand
10 min read · Sep 5, 2024

TI Deep Learning library and building TI Apache TVM for the BeagleY-AI

TIDL

Here I try to build Apache TVM for the BeagleY-AI myself. This article mostly documents things that might be fixed in the future; when that happens, I may delete the article as it becomes obsolete.

The most important ingredient is this, abbreviated TISDKR: ti-processor-sdk-rtos. However, it is missing the PDK. I am not sure if this is the latest version: https://software-dl.ti.com/jacinto7/esd/processor-sdk-rtos-jacinto7/07_03_00_07/exports/docs/pdk_jacinto_07_03_00_29/docs/userguide/jacinto/index_jacinto.html

Here are the Edge AI docs: https://software-dl.ti.com/jacinto7/esd/processor-sdk-linux-am67a/10_00_00/exports/edgeai-docs/common/sdk_overview.html

Download this

https://dr-download.ti.com/software-development/software-development-kit-sdk/MD-bA0wfI4X2g/10.00.00.05/ti-processor-sdk-rtos-j721e-evm-10_00_00_05.tar.gz
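
For example, with wget:

wget https://dr-download.ti.com/software-development/software-development-kit-sdk/MD-bA0wfI4X2g/10.00.00.05/ti-processor-sdk-rtos-j721e-evm-10_00_00_05.tar.gz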

We also need the Edge AI TIDL tools:

https://github.com/TexasInstruments/edgeai-tidl-tools.git

git checkout 10_00_04_00

Of course, we also need TVM itself.

Do not forget:

 git submodule update --init --recursive 

I will try using the tidl-j7 branch, but in the future there might be a better branch or tag.
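
Put together, the checkout might look like this (a sketch, assuming TI's fork and the tidl-j7 branch; adjust if a newer tag exists by the time you read this):

git clone https://github.com/TexasInstruments/tvm.git
cd tvm
git checkout tidl-j7
git submodule update --init --recursive
export TVM_HOME=$(pwd)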

Look for this,

src/runtime/contrib/tidl/c7x/7524_j722s/build/

Make sure that ${TVM_HOME}/src/runtime/contrib/tidl/c7x/7524_j722s exists in your checked-out tag/branch of Apache TVM:

ls ${TVM_HOME}/src/runtime/contrib/tidl/c7x/7524_j722s

This is the linker file for the C7x: /home/olof/tvm/python/tvm/src/runtime/contrib/tidl/c7x/c7x_tvm.cmd

The BeagleY-AI uses the C7x/7524, so this directory, or equivalent functionality, must exist in the TVM branch.

Another key component is src/runtime/rpc/rpc_endpoint.cc (built as part of tvm_runtime_static), which includes the itidl_ti.h file.

This is what it might look like when it is not set up correctly (with export PSDKR_PATH=${HOME}/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/):

make[1]: *** [CMakeFiles/Makefile2:731: src/runtime/contrib/tidl/c7x/CMakeFiles/tidl_c7x_libs.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 24%] Building CXX object CMakeFiles/tvm_runtime_static.dir/src/runtime/rpc/rpc_endpoint.cc.o
"tidl_api.c", line 26: fatal error: cannot open source file "itidl_ti.h"
1 catastrophic error detected in the compilation of "tidl_api.c".
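
If you see this, it is worth verifying that the header actually exists somewhere under your RTOS SDK. A quick check, assuming PSDKR_PATH is set as above:

# Look for the TIDL header that the build cannot find
find "${PSDKR_PATH}" -name itidl_ti.h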

From TI's video library:

Steps to reproduce

Download:
J721E PSDKR v10
J722S PSDKR v10
clang+llvm-10.0.0-x86_64-linux-gnu-ubuntu-18.04
ti-cgt-c7000_4.1.0.LTS

Prepare (see the sketch after this list):
extract and copy pdk_jacinto_10_00_00_27 from the J721E PSDKR to the J722S PSDKR
extract llvm
extract ti-cgt-c7000 and copy it to the J722S PSDKR
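
A rough sketch of the prepare steps, assuming all archives were downloaded to ${HOME}/ti (exact archive names may differ on your machine):

cd ${HOME}/ti
tar xf ti-processor-sdk-rtos-j721e-evm-10_00_00_05.tar.gz
tar xf ti-processor-sdk-rtos-j722s-evm-10_00_00_05.tar.gz
# The J722S SDK ships without a PDK, so borrow the one from the J721E SDK
cp -r ti-processor-sdk-rtos-j721e-evm-10_00_00_05/pdk_jacinto_10_00_00_27 \
      ti-processor-sdk-rtos-j722s-evm-10_00_00_05/pdk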

git clone https://github.com/TexasInstruments/tvm.git
check out the TVM branch TIDL_PSDK_10.0.2

Check the latest changes; things might have been fixed after this article was written (Sept 2024):
https://github.com/TexasInstruments/tvm/commits/tidl-j7/

Patch:
ti_build.sh, row 50: tidl_j7* is now under c7x-mma-tidl. It should look like this:
cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=YES -DUSE_MICRO=ON -DUSE_SORT=ON -DUSE_TIDL=ON -DUSE_LLVM="${LLVM_CONFIG}" -DHIDE_PRIVATE_SYMBOLS=ON -DUSE_TIDL_RT_PATH=$(ls -d ${PSDKR_PATH}/c7x-mma-tidl/arm-tidl/rt) -DUSE_TIDL_PSDKR_PATH=${PSDKR_PATH} -DUSE_CGT7X_ROOT=${TVM_DEPS_PATH}/ti-cgt-c7000_4.1.0.LTS ..
make -j6


tvm/CMakeLists.txt, row 595:
#add_dependencies(tvm tidl_c7x_libs_j784s4)
add_dependencies(tvm tidl_c7x_libs_j722s)
#add_dependencies(tvm tidl_c7x_libs_mv7504)

cp -r ti-processor-sdk-rtos-j721e-evm-10_00_00_05/pdk_jacinto_10_00_00_27 ti-processor-sdk-rtos-j722s-evm-10_00_00_05/pdk


sudo ln -s /home/olof/ti/edgeai-tidl-tools/tidl_tools/osrt_deps/tflite_2.12_x86_u22/tensorflow/tensorflow/ /usr/include/tensorflow
cd ~/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl

cd ~/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/
./sdk_builder/scripts/setup_psdk_rtos.sh --install_tidl_deps


export PSDKR_PATH=${HOME}/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/
export TIDL_TOOLS_PATH=${HOME}/ti/edgeai-tidl-tools/tidl_tools/
export TVM_DEPS_PATH=${HOME}/ti
export WORKSPACE=${HOME}/tvm
export EVM_IP=192.168.1.63
./ti_build.sh



# If you get compile errors like, tidl_onnxRtImport_core.h:68:10: fatal error: core/providers/tidl/tidl_execution_provider_common.h: No such file or directory
export TIDL_BUILD_ONNX_IMPORT_LIB=0
make


export TIDL_TOOLS_PATH=/home/olof/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl/tidl_tools
export USE_CGT7X_ROOT=/home/olof/ti/ti-cgt-c7000_4.1.0.LTS/

export TIDL_TOOLS_PATH=/home/olof/ti/edgeai-tidl-tools/tidl_tools/
export ARM64_GCC_PATH=/home/olof/ti/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/
export CGT7X_ROOT=/home/olof/ti/ti-cgt-c7000_4.1.0.LTS/
export PYTHONPATH=/home/olof/tvm/python/:$PYTHONPATH

Running ./sdk_builder/scripts/setup_psdk_rtos.sh

export SOC=j722s

# Short version of log ..
[psdk linux tisdk-adas-image-j722s-evm.tar.xz] Installing files ...
[psdk linux tisdk-adas-image-j722s-evm.tar.xz] Done
[psdk linux boot-adas-j722s-evm.tar.gz] Checking ...
[psdk linux boot-adas-j722s-evm.tar.gz] Installing files ...
[psdk linux boot-adas-j722s-evm.tar.gz] Done
export BIOS_VERSION=
export XDC_VERSION=
export GCC_ARCH64_VERSION=9.2-2019.12
export CGT_C6X_VERSION=
export CGT_C7X_VERSION=4.1.0.LTS
export CGT_ARM_VERSION=
export CGT_ARMLLVM_VERSION=3.2.2.LTS
export NDK_VERSION=
export NS_VERSION=
export SYSCONFIG_VERSION=1.21.0
export SYSCONFIG_BUILD=3587
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/opencv/opencv/zip/refs/tags/4.1.0 [following]
--2024-09-02 10:50:21-- https://codeload.github.com/opencv/opencv/zip/refs/tags/4.1.0
Resolving codeload.github.com (codeload.github.com)... 140.82.121.9
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘4.1.0.zip’

[Bulding opencv-4.1.0 libs for TIDL PC emualtion mode]
From https://github.com/dmlc/dmlc-core
...

cd sdk_builder/

cd ~/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl/ti_dl/test

export SOC=j722s
make TARGET_PLATFORM=PC

Then run make sdk.

Read more here, https://software-dl.ti.com/codegen/docs/tvm/tvm_tidl_users_guide/getting-started/index.html

Compiling your model,

export TIDL_TOOLS_PATH=/home/olof/ti/edgeai-tidl-tools/tidl_tools/
export ARM64_GCC_PATH=/home/olof/ti/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/
export CGT7X_ROOT=/home/olof/ti/ti-cgt-c7000_4.1.0.LTS/
export PYTHONPATH=/home/olof/tvm/python/:$PYTHONPATH


mkdir object-recognition
wget https://storage.googleapis.com/download.tensorflow.org/models/tflite/coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip
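
The archive should then presumably be unpacked into the directory we just created:

unzip coco_ssd_mobilenet_v1_1.0_quant_2018_06_29.zip -d object-recognition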

cp ${HOME}/tvm/tests/python/relay/ti_tests/*.py .
export TVM_HOME=${HOME}/tvm/

Here is the usage of compile_model.py:
usage: compile_model.py [-h] [--platform PLATFORM] [--target] [--host] [--tidl] [--notidl] [--c7x] [--noc7x] [--batch_size BATCH_SIZE] [model_name]

positional arguments:
model_name

options:
-h, --help show this help message and exit
--platform PLATFORM Compile model for which platform (J7, J721S2)
--target Compile for target
--host Compile for host (emulation)
--tidl Enable TIDL offload
--notidl Disable TIDL offload
--c7x Enable C7x code generation
--noc7x Disable C7x code generation
--batch_size BATCH_SIZE
Overwrite default batch size in the model, 0 means no overwrite

Output from running,

python compile_model.py  mv2_quant_tfl  --target   --platform J7  --c7x
TFLite model mv2_quant_tfl imported to Relay IR.
TIDL tools path is set. /home/olof/ti/edgeai-tidl-tools/tidl_tools/PC_dsp_test_dl_algo.out <CDLL '/home/olof/ti/edgeai-tidl-tools/tidl_tools/tidl_model_import_relay.so', handle 560d6b0e3d40 at 0x7f44af914670>
Generating subgraph boundary tensors for calibration...
Building graph on host for tensor data collection...
conv2d NHWC layout is not optimized for x86 with autotvm.
depthwise_conv2d NHWC layout is not optimized for x86 with autotvm.
(The two warnings above repeat for every conv2d/depthwise_conv2d layer in the network.)
Running graph on host for tensor data collection...
Importing subgraph into TIDL...
Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/olof/ti/model/TFLite_model/artifacts/mv2_quant_tfl_J7_target_tidl_c7x/tempDir/tidl_import_subgraph0.txt.qunat_stats_config.txt
Freeing memory for user provided Net
----------------------- TIDL Process with REF_ONLY FLOW ------------------------

# 0 . .. T 533.07 .... ..... ... .... .....


***************** Calibration iteration number 0 started ************************

Empty prototxt path, running calibration

~~~~~Running TIDL in PC emulation mode to collect Activations range for each layer~~~~~

Processing config file #0 : /home/olof/ti/model/TFLite_model/artifacts/mv2_quant_tfl_J7_target_tidl_c7x/tempDir/tidl_import_subgraph0.txt.qunat_stats_config.txt
Freeing memory for user provided Net
----------------------- TIDL Process with REF_ONLY FLOW ------------------------

# 0 . .. T 254.74 .... ..... ... .... .....

***************** Calibration iteration number 0 completed ************************

(Calibration iterations 1 and 2 repeat the same output, finishing with T 349.93 and T 355.45.)

Empty prototxt path, running calibration

------------------ Network Compiler Traces -----------------------------
NC running for device: 4
Running with OTF buffer optimizations
successful Memory allocation
successful Workload Creation
****************************************************
** ALL MODEL CHECK PASSED **
****************************************************

TIDL import of 1 Relay IR subgraphs succeeded.
TIDL artifacts are stored at artifacts/mv2_quant_tfl_J7_target_tidl_c7x
Building C7x tvm deployable module: generating c files...
Building C7x tvm deployable module: building... (log in c7x_deploy_mod.log)
make SILICON_VERSION=7100 TVM_ROOT=/home/olof/tvm/python/tvm TVM_C7X_ROOT=/home/olof/tvm/python/tvm/src/runtime/contrib/tidl/c7x QUIET= -C /home/olof/ti/model/TFLite_model/artifacts/mv2_quant_tfl_J7_target_tidl_c7x/tempDir -f /home/olof/tvm/python/tvm/src/runtime/contrib/tidl/c7x/Makefile.c7x_mod -j$(nproc)
Creating Arm wrapper tvm module...
Artifacts can be found at artifacts/mv2_quant_tfl_J7_target_tidl_c7x
compile_model succeeded: mv2_quant_tfl J7 target tidl c7x

As we can see, the SILICON version is wrong: SILICON_VERSION=7100, when it should probably be 7524_j722s.
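
Untested assumption on my part: the generated make step from the log above can probably be re-run by hand with the version overridden:

# Re-run the C7x module build with the (assumed) correct silicon version
make SILICON_VERSION=7524_j722s TVM_ROOT=/home/olof/tvm/python/tvm TVM_C7X_ROOT=/home/olof/tvm/python/tvm/src/runtime/contrib/tidl/c7x QUIET= -C /home/olof/ti/model/TFLite_model/artifacts/mv2_quant_tfl_J7_target_tidl_c7x/tempDir -f /home/olof/tvm/python/tvm/src/runtime/contrib/tidl/c7x/Makefile.c7x_mod -j$(nproc)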

This is what my tidl_tools looked like after a lot of compiling.

ls -la /home/olof/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl/tidl_tools

Some libraries are still missing.

Compare this with the precompiled ones from edgeai-tidl-tools.
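
One convenient way to compare the two directories (assuming both paths exist on your machine):

diff <(ls ~/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl/tidl_tools) <(ls ~/ti/edgeai-tidl-tools/tidl_tools)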

We try to build the missing libraries for emulation,
cd ~/ti/ti-processor-sdk-rtos-j722s-evm-10_00_00_05/c7x-mma-tidl/
export TARGET_PLATFORM=PC
make gv

Test on host

python infer_model.py mv2_quant_tfl
Running inference with deployable module in artifacts/mv2_quant_tfl_J7_host_tidl_noc7x ...
2024-09-06 20:44:27,632 INFO Could not find libdlr.so in model artifact. Using dlr from /home/olof/.local/lib/python3.10/site-packages/dlr/libdlr.so
artifacts/mv2_quant_tfl_J7_host_tidl_noc7x: inference execution finished
0 Inference results (top5):
[896, 404, 745, 658, 405]
[206, 195, 171, 161, 147]
Pass
infer_model succeeded: mv2_quant_tfl J7 host dlr tidl noc7x

scp *.py olof@192.168.1.xx:/home/olof/work/infer/

Also compile without TIDL or C7x support:

python compile_model.py mv2_quant_tfl --target --notidl --platform J7 --noc7x

Test on target

python infer_model.py  --platform J7  --notidl mv2_quant_tfl
Running inference with deployable module in artifacts/mv2_quant_tfl_J7_target_notidl_noc7x ...
2024-09-06 19:21:42,424 INFO Could not find libdlr.so in model artifact. Using dlr from /home/olof/work/infer/env/lib/python3.11/site-packages/dlr/libdlr.so
artifacts/mv2_quant_tfl_J7_target_notidl_noc7x: inference execution finished
0 Inference results (top5):
[896, 404, 745, 658, 405]
[205, 191, 171, 160, 146]
Pass
infer_model succeeded: mv2_quant_tfl J7 target dlr notidl noc7x

But this is not accelerated.

usage: infer_model.py [-h] [--platform PLATFORM] [--dlr] [--tvm] [--tidl] [--notidl] [--c7x] [--noc7x] [--batch_size BATCH_SIZE]
[model_name]

positional arguments:
model_name

options:
-h, --help show this help message and exit
--platform PLATFORM Compile model for which platform (J7, J721S2)
--dlr Use DLR runtime for inference
--tvm Use TVM runtime for inference
--tidl Enable TIDL offload
--notidl Disable TIDL offload
--c7x Enable C7x code generation
--noc7x Disable C7x code generation
--batch_size BATCH_SIZE
Overwrite default batch size in the model, 0 means no overwrite
python infer_model.py  --platform J7 mv2_quant_tfl
Running inference with deployable module in artifacts/mv2_quant_tfl_J7_target_tidl_noc7x ...
2024-09-06 18:57:45,331 INFO Could not find libdlr.so in model artifact. Using dlr from /home/olof/work/infer/env/lib/python3.11/site-packages/dlr/libdlr.so
[18:57:45] /home/olof/work/infer/neo-ai-dlr/src/dlr.cc:343: Error: [18:57:45] /home/olof/work/infer/neo-ai-dlr/3rdparty/tvm/src/runtime/library_module.cc:123: Binary was created using {tidl} but a loader of that name is not registered. Available loaders are const_loader, metadata, VMExecutable. Perhaps you need to recompile with this runtime enabled.
Stack trace:
[bt] (0) /home/olof/work/infer/env/lib/python3.11/site-packages/dlr/libdlr.so(+0x110140) [0xffff87ce0140]
[bt] (1) /home/olof/work/infer/env/lib/python3.11/site-packages/dlr/libdlr.so(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x74) [0xffff87c050bc]
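
The message itself hints at the cause: the libdlr.so that was loaded was built without the TIDL runtime module registered, which is presumably why the DLR package gets copied from the SDK target filesystem below. To check which DLR library Python actually picks up and whether it mentions tidl at all (a rough diagnostic, not a verified one):

python -c "import dlr; print(dlr.__file__)"
strings /home/olof/work/infer/env/lib/python3.11/site-packages/dlr/libdlr.so | grep -i tidl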

More variables

These need to be set before compiling the model.

export ARM64_GCC_PATH="${HOME}/ti/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/"
export TIARMCGT_LLVM_ROOT=${HOME}/ti/ti-cgt-armllvm_3.2.2.LTS
export TOOLCHAIN_PATH_GCC_ARCH64="${HOME}/ti/arm-gnu-toolchain-13.2.Rel1-x86_64-aarch64-none-linux-gnu/"
export TIDL_TOOLS_PATH=/home/olof/ti/edgeai-tidl-tools/tidl_tools/
export PYTHONPATH=/home/olof/tvm/python/:$PYTHONPATH
python compile_model.py mv2_quant_tfl --host

Here is some information if you want to do C++ inference.

Here are some late changes to make it usable with J7, https://github.com/TexasInstruments/tvm/commit/a69e5e48b79cc549fe341a8f64bc9f34cbdd4eb0

To run inference on target,

scp -r ti-processor-sdk-rtos-j722s-evm-10_00_00_05/targetfs/usr/lib/python3.12/site-packages/dlr/* olof@192.168.1.xx:/home/olof/work/infer/env/lib/python3.11/site-packages/dlr

git clone  https://github.com/TexasInstruments/neo-ai-dlr
cd neo-ai-dlr
git submodule update --init --recursive
mkdir build
cd build
cmake ..
make -j 8
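
To use this build from Python, the bindings can then be installed from the repository's python directory (a sketch, assuming the standard neo-ai-dlr layout):

# From the build directory, install the Python bindings on top of the new runtime
cd ../python
python3 setup.py install --user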

cp

Interesting question,

https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1331589/tda4vh-q1-unable-to-run-the-compiled-model-with-c7x_codegen-set-to-1-correctly-in-tda4vh/5077472#5077472

As you can see, this article is a work in progress and will be updated.
