Web Audio API and Digital Audio Theory

Digital Audio Theory

Fundamentals of sound, sampling, and quantization.

Figure: analog vs. digital representation of a sine wave at increasing sample rates. The analog waveform (left) is smooth, while the digital version (right) stores only discrete sample points; higher sample rates capture more detail.

Digital audio converts an analog sound pressure wave (with continuous frequency and amplitude) into a series of discrete samples by measuring the amplitude at a fixed sample rate. Each sample’s amplitude is then quantized to a finite resolution (bit depth). Higher bit depths allow more precise amplitude values, reducing quantization noise.

Sound waves have two key properties: frequency (pitch) and amplitude (loudness). A sine wave is a pure tone at a single frequency. Other basic waves include square, sawtooth, and triangle waves, each composed of a different mix of sine-wave harmonics. Frequency is measured in hertz (cycles per second), and amplitude correlates with perceived volume. In digital audio, frequencies are limited to the Nyquist range and amplitudes are limited by quantization.

According to the Nyquist theorem, to capture a frequency f without aliasing the sample rate must be at least 2×f. For example, a 44.1 kHz sample rate can capture frequencies up to 22.05 kHz, just above the human hearing limit. Signals above the Nyquist frequency are aliased (folded) into lower frequencies. In practice, analog signals are low-pass filtered before sampling to remove ultrasonic content and prevent aliasing.
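
As a rough numerical sketch (plain JavaScript, outside the Web Audio API), sampling and quantizing a 1 kHz sine wave at 44.1 kHz and 16-bit resolution might look like this; the loop evaluates the sine at each sample instant and rounds to the nearest quantization level:

// Sample 1 ms of a 1 kHz sine wave at a 44.1 kHz sample rate,
// then quantize each sample to 16-bit resolution.
const sampleRate = 44100;   // samples per second
const frequency = 1000;     // Hz
const bitDepth = 16;
const levels = 2 ** (bitDepth - 1); // quantization levels per polarity

const samples = [];
for (let n = 0; n < sampleRate / 1000; n++) {
  const t = n / sampleRate; // time of this sample in seconds
  const amplitude = Math.sin(2 * Math.PI * frequency * t);
  const quantized = Math.round(amplitude * (levels - 1)) / (levels - 1);
  samples.push(quantized);
}
// Nyquist limit for this sample rate:
console.log(sampleRate / 2); // 22050 Hz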

AudioContext and Audio Graph

The core audio context and routing graph.

The AudioContext is the primary object in Web Audio. It represents an audio-processing graph in which AudioNodes are connected together. You create one with new AudioContext(). All sound generation and processing occurs within this context. The final output of the graph is an AudioDestinationNode (the context’s destination), which typically corresponds to the device’s speakers. Other nodes (oscillators, filters, etc.) are connected into chains that eventually lead to the destination.

const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
const gain = audioCtx.createGain();
osc.connect(gain);
gain.connect(audioCtx.destination);
osc.start();

Each AudioNode has a fixed number of inputs/outputs. Sources (like OscillatorNode) have no inputs and one output; processing nodes (like GainNode, BiquadFilterNode) typically have one input and one output; the destination has one input and no outputs. Audio flows from sources through any chain of processing nodes to the destination. The graph can have forks (one output to multiple inputs) and merges (multiple sources into one) using the same connect() method for each connection.
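
For instance, a minimal sketch of fan-out and fan-in (assuming an existing audioCtx; the node choices are arbitrary):

// Fan-out: one source feeding two parallel chains.
const source = audioCtx.createOscillator();
const dry = audioCtx.createGain();
const filtered = audioCtx.createBiquadFilter();
source.connect(dry);
source.connect(filtered);

// Fan-in: both chains merged into a single master gain.
const master = audioCtx.createGain();
dry.connect(master);
filtered.connect(master);
master.connect(audioCtx.destination);
source.start();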

OscillatorNode

Generating waveforms (basic tones).

The OscillatorNode generates a periodic waveform (tone) at a set frequency. It has no inputs and one output. Key properties (AudioParam objects unless noted):

  • type: waveform shape string; standard values are "sine" (default), "square", "sawtooth", and "triangle". The value "custom" is set automatically when a user-defined waveform is applied with setPeriodicWave() and cannot be assigned directly.
  • frequency: AudioParam in Hz. Default value is 440 (standard A4).
  • detune: AudioParam in cents for fine-tuning pitch. Default is 0 (no detune).

Methods (from AudioScheduledSourceNode):

  • oscillator.start(when): starts the oscillator at time when (in seconds, on the same timeline as audioCtx.currentTime). If when is omitted, 0, or already in the past, the oscillator starts immediately.
  • oscillator.stop(when): stops the oscillator at time when. After stopping, the oscillator cannot be restarted.
  • oscillator.setPeriodicWave(wave): use a custom periodic waveform defined by a PeriodicWave instead of a standard type.
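
For instance, a custom waveform can be built from harmonic coefficients with audioCtx.createPeriodicWave() and applied via setPeriodicWave(); the coefficient values below are arbitrary, and an existing audioCtx is assumed:

// Define a custom waveform from its first few harmonics.
// real/imag are cosine/sine coefficients; index 0 is the DC term and is ignored.
const real = new Float32Array([0, 1, 0.5, 0.25]);
const imag = new Float32Array([0, 0, 0, 0]);
const wave = audioCtx.createPeriodicWave(real, imag);

const osc = audioCtx.createOscillator();
osc.setPeriodicWave(wave); // osc.type becomes "custom" automatically
osc.connect(audioCtx.destination);
osc.start();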

Example: create and play a 440 Hz square wave:

const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
osc.type = "square";
osc.frequency.setValueAtTime(440, audioCtx.currentTime);
osc.connect(audioCtx.destination);
osc.start();

GainNode

Volume control.

The GainNode applies a volume change (gain) to the audio passing through it. It has one input and one output. It has one key property:

  • gain: an AudioParam (a-rate) that multiplies each sample’s amplitude. Default value is 1.0 (unity gain).

No methods beyond standard AudioNode connect/disconnect. When changing gain dynamically, use audio-param automation (e.g. gainNode.gain.linearRampToValueAtTime(...)) to avoid clicks.

Example: fade in volume from 0 to 1 over 2 seconds:

const gainNode = audioCtx.createGain();
gainNode.gain.setValueAtTime(0.0, audioCtx.currentTime);
gainNode.gain.linearRampToValueAtTime(1.0, audioCtx.currentTime + 2.0);
// connect source -> gain -> destination
source.connect(gainNode);
gainNode.connect(audioCtx.destination);

BiquadFilterNode

Standard audio filter.

The BiquadFilterNode represents a second-order (biquad) filter. It has one input and one output. It can implement lowpass, highpass, bandpass, shelving, peaking, notch, etc., depending on its type. Key properties (AudioParam unless noted):

  • type: string enum, filter type. Valid values include "lowpass", "highpass", "bandpass", "lowshelf", "highshelf", "peaking", "notch", "allpass". Default is "lowpass".
  • frequency: AudioParam, cutoff or center frequency in Hz.
  • Q: AudioParam, quality factor (bandwidth of the filter). Default is 1.0.
  • gain: AudioParam in dB, used for shelving/peaking filters.
  • detune: AudioParam in cents applied to the frequency.

No special methods. Example: apply a low-pass at 1000 Hz:

const filter = audioCtx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.setValueAtTime(1000, audioCtx.currentTime);
source.connect(filter);
filter.connect(audioCtx.destination);
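
As a further hedged sketch, a peaking boost uses the gain and Q parameters as well (the +3 dB boost at 2 kHz is arbitrary, and source is assumed to be an existing source node):

const peak = audioCtx.createBiquadFilter();
peak.type = "peaking";
peak.frequency.setValueAtTime(2000, audioCtx.currentTime); // center frequency
peak.Q.setValueAtTime(2, audioCtx.currentTime);            // narrower band
peak.gain.setValueAtTime(3, audioCtx.currentTime);         // +3 dB boost
source.connect(peak);
peak.connect(audioCtx.destination);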

DynamicsCompressorNode

Dynamic range compression.

DynamicsCompressorNode reduces the dynamic range of audio to prevent clipping and even out loud peaks. It has one input and one output. Relevant AudioParam properties (k-rate):

  • threshold: dB above which compression starts.
  • knee: dB range above threshold where compression gradually increases.
  • ratio: amount of input level change, in dB, required for 1 dB of output change (e.g., a ratio of 4 means the input must rise 4 dB above the threshold for the output to rise 1 dB).
  • attack: time (s) to reduce gain by 10 dB once threshold is exceeded.
  • release: time (s) to increase gain by 10 dB after signal falls below threshold.
  • reduction: read-only float, current dB reduction applied.

No unique methods. Example: set a gentle compression:

const compressor = audioCtx.createDynamicsCompressor();
compressor.threshold.setValueAtTime(-24, audioCtx.currentTime);
compressor.ratio.setValueAtTime(4, audioCtx.currentTime);
source.connect(compressor);
compressor.connect(audioCtx.destination);
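
The read-only reduction value can be polled to monitor how much gain reduction is currently being applied; a simple (assumed) monitoring sketch:

// Log the current gain reduction (in dB) roughly once per second.
setInterval(() => {
  console.log(`reduction: ${compressor.reduction.toFixed(1)} dB`);
}, 1000);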

AnalyserNode

Real-time audio data for visualization.

The AnalyserNode provides real-time frequency and time-domain analysis data. It has one input and one output, passing the audio through unchanged. Main parameters:

  • fftSize: FFT window size (power of 2); default 2048. The frequencyBinCount equals fftSize/2, the number of data points in the spectrum.
  • minDecibels, maxDecibels: minimum and maximum dB values for scaling frequency data.
  • smoothingTimeConstant: smoothing factor for averaging frequency data.

Key methods to obtain data:

  • getByteTimeDomainData(array): copies the current waveform (byte values 0–255) into a Uint8Array.
  • getByteFrequencyData(array): copies the current frequency spectrum (byte values 0–255) into a Uint8Array.
  • Float versions (getFloatTimeDomainData, getFloatFrequencyData) copy into a Float32Array.

Example: draw time-domain waveform (oscilloscope) on a canvas:

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
const bufferLength = analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);

function draw() {
  requestAnimationFrame(draw);
  analyser.getByteTimeDomainData(dataArray);
  // draw dataArray onto canvas...
}
draw();

MediaStreamAudioSourceNode

Microphone or stream input.

MediaStreamAudioSourceNode is an AudioNode that uses audio from a MediaStream (for example, from getUserMedia()). It has no inputs and one output. You create it with audioCtx.createMediaStreamSource(stream), where stream comes from the WebRTC/Media Capture APIs. It exposes the original stream via the mediaStream property. Typically you connect it to other nodes (e.g., GainNode) and finally to the destination to hear the live input.

navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  const micSource = audioCtx.createMediaStreamSource(stream);
  micSource.connect(audioCtx.destination);
});

AudioDestinationNode

The final output node (speakers).

The AudioDestinationNode represents the final output of the audio graph (e.g. the speakers or headphones). It has exactly one input and zero outputs (you can’t connect any node after it). You access it via audioCtx.destination. Its maxChannelCount property indicates the maximum number of channels the hardware supports. Example use: oscillator.connect(audioCtx.destination) sends the oscillator directly to the speakers.

const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
osc.connect(audioCtx.destination);
osc.start();

AudioWorkletNode

Custom DSP modules.

AudioWorkletNode allows custom audio processing via a user-defined processor class. You define an AudioWorkletProcessor (in a separate JavaScript module) and register it. Then load it into the context (audioCtx.audioWorklet.addModule("my-processor.js")) and create an AudioWorkletNode by name. Key features:

  • workletNode.port is a MessagePort for two-way communication between main thread and processor.
  • workletNode.parameters is an AudioParamMap of parameters declared by the processor (via static parameterDescriptors).

No unique methods beyond AudioNode. Example (white noise generator):

// noise-processor.js
class NoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const output = outputs[0];
    for (const channel of output) {
      for (let i = 0; i < channel.length; i++) {
        channel[i] = Math.random() * 2 - 1;
      }
    }
    return true;
  }
}
registerProcessor("noise-processor", NoiseProcessor);

// main script
await audioCtx.audioWorklet.addModule("noise-processor.js");
const noiseNode = new AudioWorkletNode(audioCtx, "noise-processor");
noiseNode.connect(audioCtx.destination);
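
A hedged sketch of the communication features: the main thread can post messages over port (the noise processor above does not register a handler, so the onmessage line is shown only as a hypothetical), and parameters exposes any declared AudioParams:

// Main thread: send a message to the processor.
noiseNode.port.postMessage({ type: "setLevel", value: 0.5 });

// Processor side (hypothetical) -- inside the AudioWorkletProcessor constructor:
//   this.port.onmessage = (event) => { /* react to event.data */ };

// Parameters declared via static parameterDescriptors appear here:
console.log(noiseNode.parameters); // AudioParamMap (empty for the noise processor above)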

OfflineAudioContext

Non-real-time rendering (batch).

OfflineAudioContext is like AudioContext but renders audio as fast as possible into a buffer instead of playing it live. You specify the number of channels, the length (in frames), and the sample rate on creation. Once all nodes are connected and events scheduled, call offlineCtx.startRendering(), which returns a Promise. When it resolves, you get an AudioBuffer with the rendered audio.

Example: mix an oscillator offline and then play the result:

const offlineCtx = new OfflineAudioContext(2, 44100 * 5, 44100);
const osc = offlineCtx.createOscillator();
osc.connect(offlineCtx.destination);
osc.start();
offlineCtx.startRendering().then(buffer => {
  const newSrc = audioCtx.createBufferSource();
  newSrc.buffer = buffer;
  newSrc.connect(audioCtx.destination);
  newSrc.start();
});

Audio Routing and Scheduling

Connecting nodes and precise timing.

The Web Audio API’s audio graph is directed: audio flows from source nodes through any number of processing nodes to the destination. Use node.connect(targetNode) to connect an output to another node’s input, or to an AudioParam to modulate it. Fan-out (one output to multiple inputs) and fan-in (multiple outputs to one input) are supported by calling connect() on each desired link. For example:

const lfo = audioCtx.createOscillator();
const gainA = audioCtx.createGain();
const gainB = audioCtx.createGain();
lfo.connect(gainA.gain);
lfo.connect(gainB.gain);
lfo.start();
// lfo now modulates two different gain parameters simultaneously

Scheduling: all audio timing is based on audioCtx.currentTime. You can schedule precise playback and parameter changes. For example, osc.start(audioCtx.currentTime + 0.5) starts the oscillator 0.5 seconds in the future. AudioParam automation methods (setValueAtTime, linearRampToValueAtTime, etc.) let you change values over time. This allows sample-accurate envelopes and sequences. Real-time applications (drum machines, sequencers) rely on this precise timing. Keep the graph as simple as possible to minimize latency; for instance, place a GainNode last so volume changes take effect immediately.
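
For instance, a short three-note sequence can be scheduled ahead of time against the context clock (pitches and timings here are arbitrary; an existing audioCtx is assumed):

// Schedule three notes, 0.5 s apart, each lasting 0.4 s.
const now = audioCtx.currentTime;
[440, 550, 660].forEach((freq, i) => {
  const osc = audioCtx.createOscillator();
  osc.frequency.setValueAtTime(freq, now + i * 0.5);
  osc.connect(audioCtx.destination);
  osc.start(now + i * 0.5);
  osc.stop(now + i * 0.5 + 0.4);
});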

Web MIDI and Polyphony

MIDI input and multi-voice synthesis.

The Web MIDI API enables access to MIDI devices. Call navigator.requestMIDIAccess() (in a secure context) which returns a Promise that resolves to a MIDIAccess object. You can then enumerate midi.inputs and listen for input.onmidimessage events. A MIDI "Note On" message has a status byte (0x90 + channel) and two data bytes: note number and velocity. A typical handler might do:

input.onmidimessage = (event) => {
  const [status, note, velocity] = event.data;
  const isNoteOn = (status & 0xF0) === 0x90;
  // handle note on/off...
};

To synthesize sounds, map the MIDI note (0–127) to frequency (Hz). For equal-tempered tuning with A4=440Hz, use 440 * 2^((note-69)/12). For polyphony, maintain one oscillator (and a gain) per active note. On note-on, create an OscillatorNode and a GainNode, set osc.frequency.value and gain.gain.value (e.g., velocity/127), connect them, and osc.start(). Store them in a map keyed by note. On note-off, stop and disconnect that voice. Example:

let voices = {};
navigator.requestMIDIAccess().then((midi) => {
  midi.inputs.forEach(input => {
    input.onmidimessage = (msg) => {
      const [status, note, velocity] = msg.data;
      const isNoteOn = (status & 0xF0) === 0x90;
      const freq = 440 * Math.pow(2, (note - 69) / 12);
      if (isNoteOn && velocity > 0) {
        const osc = audioCtx.createOscillator();
        const gain = audioCtx.createGain();
        osc.frequency.setValueAtTime(freq, audioCtx.currentTime);
        gain.gain.setValueAtTime(velocity / 127, audioCtx.currentTime);
        osc.connect(gain);
        gain.connect(audioCtx.destination);
        osc.start();
        voices[note] = { osc, gain };
      } else {
        const voice = voices[note];
        if (voice) {
          voice.osc.stop(audioCtx.currentTime);
          voice.osc.disconnect();
          voice.gain.disconnect();
          delete voices[note];
        }
      }
    };
  });
});

This handles multiple notes for a simple polyphonic synth driven by MIDI.

Real-Time Processing and Visualization

Scheduling and data visualization.

Web Audio supports precise real-time scheduling. All timing is based on audioCtx.currentTime. For example, oscillator.start(audioCtx.currentTime + 1.0) schedules playback exactly 1 second in the future. AudioParam automation methods (setValueAtTime, linearRampToValueAtTime, etc.) allow you to create envelopes and LFOs with sample-level accuracy. Such scheduling enables rhythm and sequencer applications. According to MDN, this design allows developers to target specific samples even at high sample rates.
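
As an illustration, a simple attack/release envelope on a GainNode built from those automation methods (timings are arbitrary; audioCtx and source are assumed to exist):

const env = audioCtx.createGain();
const t = audioCtx.currentTime;
// Exponential ramps cannot target 0, so a tiny value stands in for silence.
env.gain.setValueAtTime(0.0001, t);
env.gain.exponentialRampToValueAtTime(1.0, t + 0.05);   // ~50 ms attack
env.gain.exponentialRampToValueAtTime(0.0001, t + 1.0); // decay back down by 1 s
source.connect(env);
env.connect(audioCtx.destination);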

For visualization, use an AnalyserNode. In an animation loop (e.g., using requestAnimationFrame), call getByteTimeDomainData or getByteFrequencyData on the analyser to obtain audio data arrays, then draw them onto a <canvas>. Because AnalyserNode passes audio through unchanged, visualization runs in parallel without affecting playback. This is how waveform oscilloscopes or spectrograms are implemented in real time.
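
A frequency-spectrum loop looks much like the time-domain example earlier, but reads getByteFrequencyData instead (canvas drawing itself is omitted; an existing audioCtx is assumed):

const analyser = audioCtx.createAnalyser();
analyser.fftSize = 256;
const freqData = new Uint8Array(analyser.frequencyBinCount); // 128 bins
// (connect a source node to the analyser for it to receive audio)

function drawSpectrum() {
  requestAnimationFrame(drawSpectrum);
  analyser.getByteFrequencyData(freqData);
  // each freqData[i] (0–255) is the magnitude of one frequency bin;
  // draw them as bars on a <canvas>...
}
drawSpectrum();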

Spatial Audio (Panner and Reverb)

3D positioning and convolution reverb.

The API provides spatialization nodes. PannerNode simulates a point audio source in 3D space. It has one input and one output. Important properties include: positionX/Y/Z (AudioParams) for the source location, and orientationX/Y/Z for its facing direction. It also has parameters like panningModel (e.g. "HRTF"), distanceModel (how volume falls off with distance), and coneInnerAngle/coneOuterAngle for directional volume cones. To use, connect a source through a PannerNode before the destination: source.connect(panner); panner.connect(audioCtx.destination);.
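
A hedged sketch of positioning a source slightly to the listener's right (coordinate values are arbitrary; audioCtx and source are assumed to exist):

const panner = audioCtx.createPanner();
panner.panningModel = "HRTF";
panner.distanceModel = "inverse";
panner.positionX.setValueAtTime(2, audioCtx.currentTime);  // 2 units to the right
panner.positionY.setValueAtTime(0, audioCtx.currentTime);
panner.positionZ.setValueAtTime(-1, audioCtx.currentTime); // slightly in front
source.connect(panner);
panner.connect(audioCtx.destination);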

ConvolverNode applies convolution with an impulse response (IR) buffer to achieve reverberation. It has one input and one output. Its buffer property holds an AudioBuffer containing a recorded impulse response (e.g. of a real room). The normalize boolean (default true) controls whether the IR is scaled. To use it, fetch and decode an IR audio file and assign it: convolver.buffer = irBuffer. Example:

const convolver = audioCtx.createConvolver();
fetch("impulse-response.wav")
  .then(res => res.arrayBuffer())
  .then(data => audioCtx.decodeAudioData(data))
  .then(irBuffer => {
    convolver.buffer = irBuffer;
    source.connect(convolver);
    convolver.connect(audioCtx.destination);
  });

This adds realistic reverb to the source using the convolution IR.