TensorFlow C++ Guide (Extensive Edition)

Everything you need to build, link, deploy & optimize TensorFlow models in modern C++

1 ▪ Why run TensorFlow from C++?

The C++ API exposes TensorFlow’s execution engine without the Python interpreter overhead, ideal for real-time systems, embedded devices, game engines, high-frequency trading, and any latency-critical pipeline. Production stacks often train in Python and export a SavedModel; inference then lives in a C++ micro-service, desktop app, or firmware image.

2 ▪ TensorFlow variants you might link against

  1. libtensorflow_cc — full runtime for desktop/server. Since TF 2.19 Google no longer publishes ready-made binaries; extract the .so/.dll from the Python wheel or build from source.
  2. TensorFlow Lite (TFLite) — stripped-down for mobile/IoT; interpreters as small as 500 kB.
  3. TensorFlow Serving — gRPC/REST micro-service with hot model-reload and batching. C++ core, configurable with custom sources and servables.

3 ▪ Building TensorFlow C++ from source

3.1 Prerequisites

  1. Bazel ≥ 7.0 (match TF’s .bazelversion)
  2. C++17-capable compiler (GCC 12 / Clang 16 / MSVC 19.36)
  3. Python 3.X — required by build scripts only
  4. Optional CUDA 12 & cuDNN 9 for GPU builds

3.2 Linux / macOS one-liner

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
./configure          # answer CUDA/XLA prompts
bazel build //tensorflow:libtensorflow_cc.so

After a successful build, copy bazel-bin/tensorflow/*.so* (or *.dylib) into your tool-chain; building //tensorflow:install_headers additionally populates bazel-bin/tensorflow/include with the matching headers.

3.3 Windows notes

Install Bazel, add it to %PATH%, and run python ./configure.py. MSVC ≥ 2022 is required. See TensorFlow’s tested Windows configurations for the exact Bazel and compiler versions.

3.4 GPU flags

./configure                  # set: Build with CUDA support? Y
bazel build --config=cuda //tensorflow:libtensorflow_cc.so

From TF 2.18 onward Bazel downloads matching CUDA/cuDNN/NCCL if paths aren’t found.

4 ▪ Linking TensorFlow into a C++ application

4.1 Plain g++/clang

g++ -std=c++17 app.cpp \
  -I/path/to/tensorflow/include \
  -L/path/to/tensorflow/lib \
  -ltensorflow_cc -ltensorflow_framework \
  -pthread -O3
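
A quick way to confirm that the include and library paths above are wired up correctly is a minimal version check (a small sketch; TF_VERSION_STRING is a macro from tensorflow/core/public/version.h):

#include <iostream>
#include <tensorflow/core/public/version.h>

int main() {
  // Prints the TensorFlow version the binary was compiled against.
  std::cout << "Linked against TensorFlow " << TF_VERSION_STRING << '\n';
}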

4.2 CMake example

cmake_minimum_required(VERSION 3.16)
project(demo LANGUAGES CXX)
find_package(TensorflowCC REQUIRED)  # via a custom Find module or pkg-config
add_executable(demo main.cpp)
target_link_libraries(demo TensorflowCC::TensorflowCC)

4.3 TensorFlow as an external Bazel repository

Add the following to WORKSPACE:

local_repository(
    name = "org_tensorflow",
    path = "/absolute/path/to/tensorflow",
)

then depend on @org_tensorflow//tensorflow:tensorflow_cc in your BUILD targets.

5 ▪ Loading & executing a SavedModel

#include <iostream>
#include <string>
#include <utility>
#include <vector>

#include <tensorflow/cc/saved_model/loader.h>
#include <tensorflow/core/framework/tensor.h>

int main() {
  tensorflow::SavedModelBundle bundle;
  TF_CHECK_OK(tensorflow::LoadSavedModel(
      {/*SessionOptions*/}, {/*RunOptions*/},
      "models/resnet50", {"serve"}, &bundle));

  tensorflow::Tensor input(tensorflow::DT_FLOAT,
                           {1, 224, 224, 3});
  auto *data = input.flat<float>().data();
  /* fill data … */

  std::vector<std::pair<std::string, tensorflow::Tensor>> feeds = {
    {"serving_default_input_1:0", input}};
  std::vector<tensorflow::Tensor> fetches;

  TF_CHECK_OK(bundle.session->Run(
      feeds, {"StatefulPartitionedCall:0"}, {}, &fetches));

  std::cout << fetches[0].DebugString() << '\n';
}

Inspect tensor names with saved_model_cli show --dir model --all.

5.1 GraphDef workflow

Older pipelines export a .pb graph; load with ReadBinaryProto & NewSession.
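
A minimal sketch of that legacy path; the file name models/frozen_graph.pb and the node name output_node:0 are placeholders:

#include <memory>
#include <vector>

#include <tensorflow/core/framework/graph.pb.h>
#include <tensorflow/core/platform/env.h>
#include <tensorflow/core/public/session.h>

int main() {
  // Read the serialized GraphDef from disk.
  tensorflow::GraphDef graph_def;
  TF_CHECK_OK(tensorflow::ReadBinaryProto(tensorflow::Env::Default(),
                                          "models/frozen_graph.pb", &graph_def));

  // Create a session and install the graph into it.
  std::unique_ptr<tensorflow::Session> session(
      tensorflow::NewSession(tensorflow::SessionOptions()));
  TF_CHECK_OK(session->Create(graph_def));

  // Run with explicit feed/fetch tensor names.
  std::vector<tensorflow::Tensor> outputs;
  TF_CHECK_OK(session->Run({/* feeds */}, {"output_node:0"}, {}, &outputs));
}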

5.2 Parallel & pinned-thread inference

tensorflow::SessionOptions opts;
opts.config.set_inter_op_parallelism_threads(1);
opts.config.set_intra_op_parallelism_threads(4);

Adjust for NUMA or real-time scheduling.
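
The options are passed straight to the loader; a short fragment reusing opts above and the model path and tag from section 5:

tensorflow::SavedModelBundle bundle;
TF_CHECK_OK(tensorflow::LoadSavedModel(
    opts, {/*RunOptions*/}, "models/resnet50", {"serve"}, &bundle));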

6 ▪ TensorFlow Lite C++ inference

Build the TFLite library and its minimal example with bazel build -c opt //tensorflow/lite/examples/minimal:minimal (a CMake build is also supported). The interpreter is roughly 700 kB when stripped. Use the tflite::InterpreterBuilder API, enable the NNAPI or Metal delegates where available, and for MCUs consider tflite-micro.
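
A hedged sketch of the interpreter flow, assuming a model.tflite file with a single float input and a single float output:

#include <memory>

#include <tensorflow/lite/interpreter.h>
#include <tensorflow/lite/kernels/register.h>
#include <tensorflow/lite/model.h>

int main() {
  // Memory-map the flatbuffer model from disk.
  auto model = tflite::FlatBufferModel::BuildFromFile("model.tflite");
  if (!model) return 1;

  // Build an interpreter with the built-in op resolver.
  tflite::ops::builtin::BuiltinOpResolver resolver;
  std::unique_ptr<tflite::Interpreter> interpreter;
  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter || interpreter->AllocateTensors() != kTfLiteOk) return 1;

  // Fill the first input tensor, run, and read the first output tensor.
  float* in = interpreter->typed_input_tensor<float>(0);
  in[0] = 0.5f;  /* fill the rest as needed */
  if (interpreter->Invoke() != kTfLiteOk) return 1;
  float result = interpreter->typed_output_tensor<float>(0)[0];
  (void)result;
}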

7 ▪ Advanced techniques

7.1 Custom ops / kernels

Implement an OpKernel, register it with REGISTER_KERNEL_BUILDER, compile the result into a shared object, and load it at run time (e.g. with TF_LoadLibrary from the C API or tf.load_op_library from Python).
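
A minimal sketch of a CPU kernel; the DoubleValue op name is purely illustrative, not an existing TensorFlow op:

#include <tensorflow/core/framework/op.h>
#include <tensorflow/core/framework/op_kernel.h>

using namespace tensorflow;

// Register an op that doubles each element of a float tensor.
REGISTER_OP("DoubleValue")
    .Input("x: float")
    .Output("y: float");

class DoubleValueOp : public OpKernel {
 public:
  explicit DoubleValueOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  void Compute(OpKernelContext* ctx) override {
    const Tensor& x = ctx->input(0);
    Tensor* y = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, x.shape(), &y));
    auto in = x.flat<float>();
    auto out = y->flat<float>();
    for (int i = 0; i < in.size(); ++i) out(i) = 2.0f * in(i);
  }
};

REGISTER_KERNEL_BUILDER(Name("DoubleValue").Device(DEVICE_CPU), DoubleValueOp);

Compile this with the same flags as in section 4.1 into a shared object and load it before the first session run.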

7.2 XLA & ahead-of-time (AOT) compilation

Add --config=xla when building TF to enable JIT fusion; for AOT, export a frozen graph and run tfcompile to emit a self-contained static library plus header, with no runtime interpreter overhead.

7.3 Profiling & debugging

  1. TF Profiler (Chrome Trace view)
  2. Set the environment variable TF_CPP_MIN_LOG_LEVEL=0 for verbose logs
  3. Use LD_PRELOAD=libjemalloc.so to inspect allocation hotspots

7.4 Multi-GPU & device placement

tensorflow::ConfigProto* cfg = &opts.config;
cfg->mutable_gpu_options()->set_allow_growth(true);
cfg->set_log_device_placement(true);

Pin ops to /device:GPU:1 by wrapping them in a tf.device scope when exporting the model.

8 ▪ TensorFlow Serving in C++

Build the server with bazel build //tensorflow_serving/model_servers:tensorflow_model_server, then start it:

tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=eyes \
  --model_base_path=/srv/models/eyes

Integrate a C++ gRPC client via the generated prediction_service.grpc.pb.h.
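
A hedged client sketch; the gRPC port 8500 (the default), the model name eyes, and the input key input_1 are assumptions that must match your server and the model’s SignatureDef:

#include <iostream>

#include <grpcpp/grpcpp.h>
#include "tensorflow_serving/apis/prediction_service.grpc.pb.h"

int main() {
  // Connect to the model server's gRPC endpoint.
  auto channel = grpc::CreateChannel("localhost:8500",
                                     grpc::InsecureChannelCredentials());
  auto stub = tensorflow::serving::PredictionService::NewStub(channel);

  // Build a Predict request for the model served above.
  tensorflow::serving::PredictRequest request;
  request.mutable_model_spec()->set_name("eyes");

  tensorflow::TensorProto input;
  input.set_dtype(tensorflow::DT_FLOAT);
  input.mutable_tensor_shape()->add_dim()->set_size(1);
  input.add_float_val(0.5f);
  (*request.mutable_inputs())["input_1"] = input;

  // Issue the RPC and check the status.
  tensorflow::serving::PredictResponse response;
  grpc::ClientContext context;
  grpc::Status status = stub->Predict(&context, request, &response);
  if (!status.ok()) {
    std::cerr << "Predict failed: " << status.error_message() << '\n';
    return 1;
  }
  std::cout << "Received " << response.outputs().size() << " output tensor(s)\n";
}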

Serving’s internal APIs (e.g., Source, Loader) allow custom batching or dynamic preprocess graphs.

9 ▪ Troubleshooting & FAQ

  1. Undefined symbol errors — usually a -D_GLIBCXX_USE_CXX11_ABI mismatch; compile TF and your app with the same value.
  2. Segfault on first Run — forgot to keep Tensor objects alive; they are ref-counted and must out-live the session call.
  3. Inexplicable Bazel link errors — clean the cache with bazel clean --expunge and rebuild.
  4. GPU kernel not found — the model uses ops newer than the runtime provides; rebuild TF against a matching version or re-export the model with ops the runtime supports.

10 ▪ Quick reference cheatsheet

# Inspect SavedModel
saved_model_cli show --dir model --all

# Extract C++ libs from the Python wheel (Linux)
python -m pip download tensorflow==2.19.0
unzip tensorflow-2.19.0-*.whl 'tensorflow/**/libtensorflow_framework*.so*'

# Build the minimal TF Lite example
bazel build -c opt //tensorflow/lite/examples/minimal:minimal

# Turn on verbose VLOG output at run time
TF_CPP_MIN_VLOG_LEVEL=1  ./my_app