ONNXRunTime

ONNXRunTime provides unofficial Julia bindings for onnxruntime. It exposes both a low-level interface that mirrors the official C-API and a high-level interface.

Contributions are welcome.

Usage

The high-level API works as follows:


julia> import ONNXRunTime as ORT

julia> path = ORT.testdatapath("increment2x3.onnx"); # path to a toy model

julia> model = ORT.load_inference(path);

julia> input = Dict("input" => randn(Float32,2,3))
Dict{String, Matrix{Float32}} with 1 entry:
  "input" => [1.68127 1.18192 -0.474021; -1.13518 1.02199 2.75168]

julia> model(input)
Dict{String, Matrix{Float32}} with 1 entry:
  "output" => [2.68127 2.18192 0.525979; -0.135185 2.02199 3.75168]
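As a quick sanity check, the same call can be run as a script. This is a minimal sketch that uses only the functions shown above; the claim that the toy model adds 1 to every element is an assumption inferred from the printed values, not from the model's documentation.

```julia
import ONNXRunTime as ORT

# Toy model shipped with the package (same file as in the REPL session above).
path = ORT.testdatapath("increment2x3.onnx")
model = ORT.load_inference(path)

x = randn(Float32, 2, 3)
y = model(Dict("input" => x))["output"]

# Assumption: the model increments each element by 1, as the REPL output suggests.
matches = y ≈ x .+ 1
println("output size: ", size(y), ", matches x .+ 1: ", matches)
```

Inputs and outputs are addressed by name through ordinary dictionaries, so the result is indexed with the output name "output".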

GPU usage requires the CUDA and cuDNN packages, and the CUDA runtime must be set to 12.0 or a later 12.x version. To set this up, run

pkg> add CUDA cuDNN

julia> import CUDA

julia> CUDA.set_runtime_version!(v"12.0")

Then GPU inference is simply

julia> import CUDA, cuDNN

julia> ORT.load_inference(path, execution_provider=:cuda)

CUDA provider options can be specified with the provider_options keyword:

julia> ORT.load_inference(path, execution_provider=:cuda,
                          provider_options=(;cudnn_conv_algo_search=:HEURISTIC))

Memory allocated by a model is released automatically once the model goes out of scope and the garbage collector finalizes the model object. It can also be released immediately with release(model).
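When memory should be freed at a known point rather than whenever the garbage collector runs, the call can be wrapped in try/finally. The following is a sketch under the assumption that release is reachable as ORT.release (qualified access to the function mentioned above); the result is reduced to a plain number before the model is released, so nothing depends on the model afterwards.

```julia
import ONNXRunTime as ORT

path = ORT.testdatapath("increment2x3.onnx")
model = ORT.load_inference(path)

total = try
    y = model(Dict("input" => zeros(Float32, 2, 3)))["output"]
    sum(y)  # reduce to a scalar while the model is still alive
finally
    # Free the model's memory now instead of waiting for garbage collection.
    ORT.release(model)
end

println(total)
```

The model object should not be used again after release has been called.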

The low-level API mirrors the official C-API. The above example looks like this:

using ONNXRunTime.CAPI
using ONNXRunTime: testdatapath

api = GetApi();
env = CreateEnv(api, name="myenv");
so = CreateSessionOptions(api);
path = testdatapath("increment2x3.onnx");
session = CreateSession(api, env, path, so);
mem = CreateCpuMemoryInfo(api);
input_array = randn(Float32, 2,3)
input_tensor = CreateTensorWithDataAsOrtValue(api, mem, vec(input_array), size(input_array));
run_options = CreateRunOptions(api);
input_names = ["input"];
output_names = ["output"];
inputs = [input_tensor];
outputs = Run(api, session, run_options, input_names, inputs, output_names);
output_tensor = only(outputs);
output_array = GetTensorMutableData(api, output_tensor);

Alternatives

Use the onnxruntime Python bindings via PyCall.jl.
ONNX.jl
ONNXNaiveNASflux.jl

Complements

ONNXLowLevel.jl cannot run inference but can be used to investigate, create, or manipulate ONNX files.

Breaking Changes in Version 0.4

CUDA.jl support has changed from version 3 to versions 4 and 5.
Support for Julia versions older than 1.9 has been dropped. This allows the conditional GPU support to move from the Requires package to a package extension. As a consequence, the ONNXRunTime GPU support can now be precompiled and the CUDA.jl versions can be properly controlled via compat entries.

Setting the CUDA Runtime Version in Tests

GPU tests that use ONNXRunTime must depend on and import CUDA and cuDNN. In addition, a supported CUDA runtime version must be selected, which can be somewhat tricky to set up for tests.

First, some background. CUDA.set_runtime_version!(v"12.0") effectively does two things:

Add a LocalPreferences.toml file containing

[CUDA_Runtime_jll]
version = "12.0"

In Project.toml, add

[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
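The LocalPreferences.toml part of this can be reproduced with the TOML standard library alone, which is handy when the file has to be generated for a test environment. This is a minimal sketch: write_cuda_runtime_preference is a hypothetical helper, not part of CUDA.jl or ONNXRunTime, and it writes the file directly rather than calling CUDA.set_runtime_version!.

```julia
using TOML

# Hypothetical helper (not provided by CUDA.jl or ONNXRunTime): writes or
# updates the CUDA_Runtime_jll version entry in dir/LocalPreferences.toml.
function write_cuda_runtime_preference(dir::AbstractString; version::AbstractString = "12.0")
    file = joinpath(dir, "LocalPreferences.toml")
    # Keep any preferences that are already stored in the file.
    prefs = isfile(file) ? TOML.parsefile(file) : Dict{String,Any}()
    section = get!(prefs, "CUDA_Runtime_jll", Dict{String,Any}())
    section["version"] = version
    open(file, "w") do io
        TOML.print(io, prefs)
    end
    return file
end

# Demonstrate on a temporary directory instead of a real project.
dir = mktempdir()
file = write_cuda_runtime_preference(dir)
print(read(file, String))
```

Passing the project's top directory or its test directory instead of the temporary directory would place the file where the steps in this section expect it.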

If your test environment is defined by a test target in the top-level Project.toml, you need to

Add a LocalPreferences.toml in your top directory with the same contents as above.

Add CUDA_Runtime_jll to the extras section of Project.toml.
Add CUDA_Runtime_jll to the test target of Project.toml.

If your test environment is defined by a Project.toml in the test directory, you instead need to

Add a test/LocalPreferences.toml file with the same contents as above.

Add CUDA_Runtime_jll to the extras section of test/Project.toml.
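A small check can confirm that a Project.toml declares CUDA_Runtime_jll where the steps above ask for it. The sketch below uses only the TOML standard library; cuda_runtime_declared is a hypothetical helper, and the embedded Project.toml fragment is an illustrative example of the test-target layout (entries for CUDA, cuDNN, and other dependencies are left out).

```julia
using TOML

# Hypothetical helper: true if a parsed Project.toml lists CUDA_Runtime_jll
# in [extras] and, when a test target exists, in that target as well.
function cuda_runtime_declared(project::AbstractDict)
    in_extras = haskey(get(project, "extras", Dict()), "CUDA_Runtime_jll")
    targets = get(project, "targets", Dict())
    # A test/Project.toml has no test target, so only [extras] is checked then.
    in_target = !haskey(targets, "test") || "CUDA_Runtime_jll" in targets["test"]
    return in_extras && in_target
end

# Illustrative fragment of a top-level Project.toml with a test target.
project = TOML.parse("""
[extras]
CUDA_Runtime_jll = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["Test", "CUDA_Runtime_jll"]
""")

println(cuda_runtime_declared(project))
```

For a real project, TOML.parsefile("Project.toml") or TOML.parsefile("test/Project.toml") would supply the dictionary instead of the inline string.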