1 · Fundamentals of Digital Audio

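Digital audio represents continuous sound‑pressure variations as a discrete sequence of samples that computers and microprocessors can process, store, copy, and transmit. Two parameters set the resolution of linear PCM (Pulse‑Code Modulation): the sample rate, which bounds the highest representable frequency, and the bit depth, which sets the amplitude resolution. A complete stream format also states the channel count, sample encoding, and byte order: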

#include <stdint.h>

// One stereo frame: 48 kHz, 24-bit signed-integer, little-endian, 2 channels
struct Sample24 {
	int32_t L : 24;   // left-channel sample
	int32_t R : 24;   // right-channel sample
};

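Note ▸ Bit‑field layout is implementation‑defined, but on most ABIs each 24‑bit field occupies its own 32‑bit storage unit, so the struct above takes 8 bytes in memory. Media files such as WAV instead store tightly packed 3‑byte samples.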
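Because the in‑memory and on‑disk layouts differ, code that reads or writes packed files has to convert between them. A minimal sketch, assuming signed 24‑bit little‑endian samples as used in WAV; unpack_s24le() and pack_s24le() are hypothetical helper names, not a library API:

```c
#include <stdint.h>

/* Decode one packed 24-bit little-endian signed sample (3 bytes) into a
   32-bit integer, sign-extending from bit 23. */
static int32_t unpack_s24le(const uint8_t b[3])
{
    uint32_t u = (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
    if (u & 0x800000u)        /* negative sample: fill the top byte with ones */
        u |= 0xFF000000u;
    /* unsigned-to-signed conversion of out-of-range values is
       implementation-defined before C23; mainstream compilers wrap */
    return (int32_t)u;
}

/* Encode the low 24 bits of a sample as 3 little-endian bytes. */
static void pack_s24le(int32_t s, uint8_t b[3])
{
    uint32_t u = (uint32_t)s;
    b[0] = (uint8_t)(u & 0xFF);
    b[1] = (uint8_t)((u >> 8) & 0xFF);
    b[2] = (uint8_t)((u >> 16) & 0xFF);
}
```

At 48 kHz, a packed stereo stream of this kind occupies 48 000 × 2 × 3 = 288 000 bytes per second.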

2 · Signal Path: ADC → DSP → DAC

2.1 Analog‑to‑Digital Conversion (ADC)

Microphone voltage is amplified, anti‑aliased by a low‑pass filter, and then sampled by an ADC. The ADC’s time‑base must be phase‑locked to the system word‑clock.

2.2 Digital Signal Processing (DSP)

Typical real‑time processing stages:

  1. High‑pass / Low‑pass filtering.
  2. Dynamics processing (compression, limiting).
  3. Time‑domain effects (delay, reverb).
  4. Resampling for format conversion.

2.3 Digital‑to‑Analog Conversion (DAC)

The processed bit‑stream is clocked into a DAC, reconstructed by a smoothing filter, and sent to line‑level outputs or power‑amps.

3 · Core Concepts & Maths

3.1 Nyquist–Shannon Theorem

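A band‑limited signal whose highest frequency is fmax can be perfectly reconstructed from its samples when sampleRate > 2 × fmax. Half the sample rate, sampleRate / 2, is called the Nyquist frequency: the highest frequency the sampled signal can represent.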
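As a worked example, writing f_s for the sample rate and f_N for the Nyquist frequency, the condition and two common rates give:

```latex
\begin{aligned}
f_N &= \frac{f_s}{2}, \qquad f_{\max} < f_N \\
f_s = 44.1\,\text{kHz} &\;\Rightarrow\; f_N = 22.05\,\text{kHz} \\
f_s = 48\,\text{kHz} &\;\Rightarrow\; f_N = 24\,\text{kHz}
\end{aligned}
```

Both rates leave some room above the roughly 20 kHz upper limit of human hearing for the anti‑alias filter's transition band.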

3.2 Aliasing

If spectral content above Nyquist is not removed before sampling, it folds back as false low‑frequency components (aliases). Proper anti‑alias filtering is therefore mandatory.

3.3 Quantisation Error & Dither

Mapping a continuous amplitude to discrete levels introduces quantisation error, heard as noise. Adding very low‑level random noise (dither) before quantising randomises that error and decorrelates it from the signal; low‑level distortion is replaced by a steady noise floor, which improves subjective linearity at low volumes.

4 · PCM Formats & Audio Codecs

4.1 Linear PCM Containers

Container   Typical extension   Features
WAVE        .wav                RIFF‑based; little‑endian; ubiquitous on Windows
AIFF        .aif / .aiff        Big‑endian; popular on macOS and in pro DAWs
CAF         .caf                Large‑file support; metadata‑rich; Apple Core Audio
RF64        .wav                WAV beyond 4 GB via 64‑bit size fields (EBU Tech 3306)
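The WAVE layout in the table can be made concrete by writing the classic 44‑byte header for integer PCM. A minimal sketch, assuming a single fmt chunk followed directly by the data chunk; wav_header() and the put_* helpers are hypothetical names. Microsoft's documentation recommends the WAVE_FORMAT_EXTENSIBLE variant for more than two channels or more than 16 bits per sample, which this sketch does not emit:

```c
#include <stdint.h>
#include <string.h>

static void put_u16le(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }

static void put_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Fill a 44-byte canonical WAVE header for integer PCM (format tag 1). */
static void wav_header(uint8_t h[44], uint32_t sample_rate, uint16_t channels,
                       uint16_t bits, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits / 8)); /* bytes per frame */
    memcpy(h + 0, "RIFF", 4);
    put_u32le(h + 4, 36u + data_bytes);            /* bytes after this field */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_u32le(h + 16, 16);                         /* fmt chunk body length */
    put_u16le(h + 20, 1);                          /* integer PCM */
    put_u16le(h + 22, channels);
    put_u32le(h + 24, sample_rate);
    put_u32le(h + 28, sample_rate * block_align);  /* byte rate */
    put_u16le(h + 32, block_align);
    put_u16le(h + 34, bits);
    memcpy(h + 36, "data", 4);
    put_u32le(h + 40, data_bytes);
}
```

All multi‑byte fields are written little‑endian, as RIFF requires, and the header is followed by data_bytes bytes of interleaved samples. Because both size fields are 32‑bit, this layout caps files near 4 GB, the limit RF64 removes.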

4.2 Compressed Codecs

Note ▸ Lossless ≠ uncompressed: FLAC typically reduces file size by 30–60 %, yet decoding restores the original PCM samples bit for bit.

5 · Digital Audio Interfacing Protocols

5.1 Chip‑level Serial Buses

5.2 Professional & Consumer Links

6 · Clocking, Jitter & Sync

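All digital‑audio devices in a system must agree on sample rate and clock phase, usually by locking to a common word clock. An unstable clock manifests as jitter: timing errors in the sampling instants that inject noise and distortion.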
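How much noise jitter injects depends on how fast the signal changes. A commonly used estimate, assuming a full‑scale sine of frequency f_in and random clock jitter with RMS value σ_t, bounds the achievable signal‑to‑noise ratio:

```latex
\begin{aligned}
\mathrm{SNR}_{\text{jitter}} &= -20\log_{10}\!\left(2\pi f_{\text{in}}\,\sigma_t\right)\ \text{dB} \\
f_{\text{in}} = 20\,\text{kHz},\ \sigma_t = 1\,\text{ns} &\;\Rightarrow\; -20\log_{10}\!\left(1.257\times10^{-4}\right) \approx 78\,\text{dB}
\end{aligned}
```

That is about 20 dB short of the roughly 98 dB (6.02 × 16 + 1.76) that ideal 16‑bit quantisation allows for a full‑scale sine, which is why high‑quality converter clocks target jitter in the picosecond range.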

7 · Low‑Level Programming APIs

Platform         Primary API        Notes
Linux            ALSA               PCM and MIDI; snd_pcm_readi() / snd_pcm_writei()
macOS / iOS      Core Audio         High‑level AVAudioEngine; low‑level AudioUnit
Windows          WASAPI / ASIO      WASAPI shared or exclusive mode; ASIO bypasses the Windows mixer (KMixer on XP‑era systems) for low latency
Cross‑platform   JACK, PortAudio    JACK: callback‑driven real‑time routing graph (used by Ardour, REAPER on Linux); PortAudio: portable callback‑based I/O library
Web              Web Audio API      Node‑graph processing inside browsers; AudioWorklet for custom DSP

Buffers are typically delivered via a real‑time audioCallback() that must return before the next block is due, i.e. within blockSize / sampleRate seconds; otherwise the stream underruns and you hear glitches.

8 · Latency, Buffers & Throughput

Total round‑trip latency ≈ ADC‑block + processing + DAC‑block. Typical buffer sizes:

Note ▸ Smaller buffers lower latency but increase the number of CPU wake‑ups per second, which means more context switches and a higher risk of underruns on low‑power devices.
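The buffer terms in the latency estimate are frames divided by sample rate. A minimal sketch; buffer_ms() and round_trip_ms() are hypothetical helpers, and the estimate ignores converter filter delay and extra driver buffering, which real systems add:

```c
#include <stddef.h>

/* Duration of one buffer in milliseconds. */
static double buffer_ms(size_t frames, double sample_rate)
{
    return 1000.0 * (double)frames / sample_rate;
}

/* Round-trip estimate: input buffer + processing time + output buffer. */
static double round_trip_ms(size_t in_frames, size_t out_frames,
                            double processing_ms, double sample_rate)
{
    return buffer_ms(in_frames, sample_rate) + processing_ms
         + buffer_ms(out_frames, sample_rate);
}
```

For example, 128‑frame buffers at 48 kHz with 1 ms of processing give roughly 2.67 + 1 + 2.67 ≈ 6.3 ms.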

9 · Essential DSP Building Blocks

10 · MIDI, OSC & Control Data

While audio streams carry raw PCM or encoded frames, control messages (MIDI, OSC, …) operate at far lower bandwidths. They handle note events, parameter automation, and device discovery.

// Send middle C (note 60) at maximum velocity on channel 1.
// sendMIDI() stands for whatever output call your MIDI library provides.
sendMIDI(0x90 /* NoteOn ch1 */, 60, 127);

MIDI 2.0 and MPE (MIDI Polyphonic Expression) add higher resolution and per‑note expression for modern synths and DAWs.

11 · Glossary of Key Terms

Bit Depth
Resolution of each sample’s amplitude (16‑bit ≈ 96 dB dynamic range).
Block Size
Number of frames per processing chunk delivered to the audio callback.
Endianness
Byte‑order of multibyte values; PCM may be LE (WAV) or BE (AIFF).
Frame
One sample per channel at a single instant; e.g. a stereo frame = one L + one R sample.
Interleaved
Channels stored frame by frame (LRLRLR…), as opposed to planar layout (LLLL…RRRR…).
Jitter
Time‑base deviation of the sampling clock.
Latency
Delay from input to output through the digital chain.
Nyquist frequency
Half the sample‑rate; highest representable tone without aliasing.
Oversampling
Processing at n × sample‑rate to ease filter design and reduce noise.
Sample
Single numeric measurement of signal amplitude at a discrete time point.