Set up a reliable Python environment for TensorFlow 2.19
(a) Python ≥ 3.9 (b) pip ≥ 23 (c) CUDA 12.x + cuDNN 9 (needed only for GPU builds)
# Terminal
python -m pip install --upgrade pip
python -m pip install tensorflow==2.19.0
Note ▸ TensorFlow wheels have been built against NumPy 2.x since release 2.18, so 2.19 runs on NumPy 2.0 out of the box while remaining compatible with NumPy 1.26 environments.
# Linux / Windows via WSL2 (GPU build)
python -m pip install "tensorflow[and-cuda]==2.19.0"
Ensure nvidia-smi shows a CUDA 12.x-capable driver; the [and-cuda] extra installs CUDA and cuDNN as pip wheels, so a system-wide nvcc is not required.
import tensorflow as tf
print(tf.__version__) # 2.19.0
print(tf.config.list_physical_devices("GPU"))
Understand tensors, operations, and eager execution
import tensorflow as tf
a = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
b = tf.random.normal(shape=(2, 2))
c = tf.add(a, b) # add
d = tf.matmul(a, b) # matmul
All ops execute immediately thanks to eager mode (default since TF 2.0). Wrap performance-critical code in @tf.function to stage a graph.
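As a minimal sketch of graph staging (the function name scaled_sum is illustrative), decorating a Python function with @tf.function traces it into a graph on the first call, which later calls with the same input signature reuse:
import tensorflow as tf

@tf.function
def scaled_sum(a, b):
    # Traced once into a graph; subsequent calls skip Python overhead
    return tf.reduce_sum(a * 2.0 + b)

a = tf.random.normal((1000,))
b = tf.random.normal((1000,))
print(scaled_sum(a, b))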
x = tf.Variable(3.0)
with tf.GradientTape() as tape:  # record ops for autodiff
    y = x**2 + 2*x + 1
dy_dx = tape.gradient(y, x)  # dy/dx = 2x + 2 = 8.0 at x = 3
Build, train, evaluate, and save neural networks
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input((28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax")
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# train_ds / val_ds: tf.data datasets yielding (image, label) batches
model.fit(train_ds, epochs=5,
          validation_data=val_ds,
          callbacks=[tf.keras.callbacks.TensorBoard("logs")])
model.save("cnn.keras")  # native Keras format
restored = models.load_model("cnn.keras")
The native .keras format (available since TF 2.12) is the recommended portable format.
Stream and transform large datasets efficiently
files = tf.data.Dataset.list_files("images/*.jpg")

def decode(path):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    return tf.image.resize(img, (224, 224))

ds = (files.map(decode, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(32)
           .prefetch(tf.data.AUTOTUNE))
Cache ↦ Shuffle ↦ Batch ↦ Prefetch; use AUTOTUNE and TFRecord for maximum throughput.
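A sketch of the TFRecord route (the file name and feature keys are illustrative): serialize each example once at preparation time, then stream the file back through tf.data at training time.
# Write: serialize (image_bytes, label) pairs into a TFRecord file
def make_example(image_bytes, label):
    feats = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feats))

with tf.io.TFRecordWriter("train.tfrecord") as w:
    for path, label in [("images/cat.jpg", 0)]:  # illustrative pairs
        w.write(make_example(tf.io.read_file(path).numpy(), label).SerializeToString())

# Read: parse records back into tensors
spec = {"image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64)}
ds = (tf.data.TFRecordDataset("train.tfrecord")
        .map(lambda r: tf.io.parse_single_example(r, spec),
             num_parallel_calls=tf.data.AUTOTUNE))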
Full control via GradientTape & optimizers
optimizer = tf.keras.optimizers.Adam(1e-3)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

for epoch in range(10):
    for x, y in train_ds:
        with tf.GradientTape() as t:
            pred = model(x, training=True)
            loss = loss_fn(y, pred)
        grads = t.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
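To recover graph-mode speed inside this loop, the per-batch work can be staged with @tf.function; a minimal sketch (the train_step name is illustrative):
@tf.function
def train_step(x, y):
    # Traced once, then executed as a graph on every batch
    with tf.GradientTape() as t:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    grads = t.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for epoch in range(10):
    for x, y in train_ds:
        loss = train_step(x, y)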
Scale workloads across GPUs, TPUs, or multiple hosts
strategy = tf.distribute.MirroredStrategy()  # replicates across all local GPUs
with strategy.scope():
    model = build_model()  # your builder
    model.compile(...)
resolver = tf.distribute.cluster_resolver.TPUClusterResolver("grpc://...")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)
tf.distribute.experimental.ParameterServerStrategy targets heterogeneous clusters where compute and memory resources differ, and supports elastic, asynchronous training on-prem or in Kubernetes; for homogeneous multi-host GPU fleets, use MultiWorkerMirroredStrategy (sketch below).
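A minimal multi-host sketch, assuming two workers reachable at illustrative host:port addresses; every worker runs the same script with its own task index in TF_CONFIG:
import json, os
import tensorflow as tf

# Illustrative two-worker cluster; set "index" to 1 on the second host
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},
    "task": {"type": "worker", "index": 0},
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()
with strategy.scope():
    model = build_model()  # same builder as above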
Serve models on servers, browsers, and mobile devices
Export a SavedModel and run the TF-Serving Docker image; REST + gRPC endpoints auto-generated.
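A sketch of that path (directory and model names are illustrative); Keras 3 models export an inference-only SavedModel via model.export:
model.export("serving/cnn/1")  # version subdirectory expected by TF-Serving

# Terminal
docker run -p 8501:8501 \
  --mount type=bind,source=$(pwd)/serving/cnn,target=/models/cnn \
  -e MODEL_NAME=cnn -t tensorflow/serving
# REST endpoint: http://localhost:8501/v1/models/cnn:predict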
# A .keras file is not a SavedModel: load the model, then convert it directly
model = tf.keras.models.load_model("cnn.keras")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
Convert with tensorflowjs_converter and run directly in WebGL/WebGPU-accelerated browsers.
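A sketch, assuming the tensorflowjs pip package is installed and a SavedModel was exported as above (paths are illustrative):
# Terminal
pip install tensorflowjs
tensorflowjs_converter --input_format=tf_saved_model \
    serving/cnn/1 web_model/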
Profile, debug, and optimise TensorFlow workflows
Capture traces with tf.profiler.experimental.start(); visualise kernels, memory, and device utilisation.
Enable tf.debugging.set_log_device_placement(True) to trace op placement; use check_numerics to catch NaNs.
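A combined sketch of both tools (the log directory and step count are illustrative; train_step is the function from the custom loop above):
import tensorflow as tf

tf.debugging.set_log_device_placement(True)  # log which device runs each op
tf.debugging.enable_check_numerics()         # raise on NaN/Inf tensors

tf.profiler.experimental.start("logs/profile")
for x, y in train_ds.take(10):               # profile a few representative steps
    train_step(x, y)
tf.profiler.experimental.stop()
# Inspect the trace in TensorBoard's Profile tab: tensorboard --logdir logs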
Continue mastering TensorFlow beyond this guide
https://www.tensorflow.org/ – tutorials, API reference, and model garden.
1. Vision transformer fine-tuning
2. Multimodal text-image retrieval
3. On-device speech keyword spotting
4. GNNs for social-graph recommendations
5. Diffusion models for image generation