Fundamentals of sound, sampling, and quantization.
Figure: analog vs. digital representation of a sine wave at increasing sample rates. In the analog (left) plots the waveform is smooth, while the digital (right) plots store only discrete sample points, illustrating that higher sample rates capture more detail.
Digital audio converts an analog sound pressure wave (with continuous frequency and amplitude) into a series of discrete samples by measuring the amplitude at a fixed sample rate. Each sample’s amplitude is then quantized to a finite resolution (bit depth). Higher bit depths allow more precise amplitude values, reducing quantization noise.
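To make the effect of bit depth concrete, here is a minimal JavaScript sketch (not part of the Web Audio API; the quantize helper is illustrative only) that rounds a sample in the range −1 to 1 to a given bit depth:
// Illustrative helper: quantize a sample in [-1, 1] to a given bit depth.
function quantize(sample, bits) {
  const step = 2 / Math.pow(2, bits);      // size of one quantization interval
  return Math.round(sample / step) * step; // snap to the nearest level
}
console.log(quantize(0.12345678, 16)); // very close to the input
console.log(quantize(0.12345678, 4));  // coarse: large quantization error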
Sound waves have two key properties: frequency (perceived as pitch) and amplitude (perceived as loudness). A sine wave is a pure tone at a single frequency. Other basic waves include square, sawtooth, and triangle waves, each built from a different mix of harmonics (sine components at integer multiples of the fundamental). Frequency is measured in hertz (cycles per second), and amplitude correlates with perceived volume. In digital audio, frequencies are limited to the Nyquist range and amplitudes are limited by quantization. According to the Nyquist theorem, capturing a frequency f without aliasing requires a sample rate of at least 2×f; a 44.1 kHz sample rate can therefore capture frequencies up to 22.05 kHz, just above the limit of human hearing. Signals above the Nyquist frequency are aliased (folded) into lower frequencies, so in practice analog signals are low-pass filtered before sampling to remove ultrasonic content and prevent aliasing.
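As a quick worked example of that folding behaviour, the sketch below (plain JavaScript; the formula follows from the Nyquist discussion above) computes the frequency a tone lands on after sampling:
// Frequency observed after sampling a tone of f Hz at a rate of fs Hz.
// Tones above the Nyquist frequency (fs / 2) fold back into the audible band.
function aliasedFrequency(f, fs) {
  return Math.abs(f - fs * Math.round(f / fs));
}
console.log(aliasedFrequency(20000, 44100)); // 20000 — below Nyquist, unchanged
console.log(aliasedFrequency(30000, 44100)); // 14100 — folded down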
The core audio context and routing graph.
The AudioContext is the primary object in Web Audio. It represents an audio-processing graph in which AudioNodes are connected together; you create one with new AudioContext(). All sound generation and processing occurs within this context. The final output of the graph is an AudioDestinationNode (the context’s destination), which typically corresponds to the device’s speakers. Other nodes (oscillators, filters, etc.) are connected into chains that eventually lead to the destination.
const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
const gain = audioCtx.createGain();
osc.connect(gain);
gain.connect(audioCtx.destination);
osc.start();
Each AudioNode has a fixed number of inputs and outputs. Sources (like OscillatorNode) have no inputs and one output; processing nodes (like GainNode, BiquadFilterNode) typically have one input and one output; the destination has one input and no outputs. Audio flows from sources through any chain of processing nodes to the destination. The graph can have forks (one output to multiple inputs) and merges (multiple sources into one input) using the same connect() method for each connection.
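For instance, here is a minimal sketch (variable names are illustrative) showing both a fork and a merge with plain connect() calls:
const ctx = new AudioContext();
const oscA = ctx.createOscillator();
const oscB = ctx.createOscillator();
const dryGain = ctx.createGain();
const mixBus = ctx.createGain();
// Fork: one output feeding two different nodes.
oscA.connect(dryGain);
oscA.connect(mixBus);
// Merge: a second source into the same input; inputs are summed.
oscB.connect(mixBus);
dryGain.connect(ctx.destination);
mixBus.connect(ctx.destination);
oscA.start();
oscB.start();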
Generating waveforms (basic tones).
The OscillatorNode generates a periodic waveform (tone) at a set frequency. It has no inputs and one output. Key properties (AudioParam objects unless noted):
- type: waveform shape string; standard values are "sine" (default), "square", "sawtooth", and "triangle". The value "custom" indicates a user-defined waveform and is set automatically when setPeriodicWave() is used (it cannot be assigned directly).
- frequency: AudioParam in Hz. Default value is 440 (standard A4).
- detune: AudioParam in cents for fine-tuning pitch. Default is 0 (no detune).
Methods (start and stop are inherited from AudioScheduledSourceNode):
- oscillator.start(when): starts the oscillator at time when (in seconds, on the same timeline as audioCtx.currentTime). If when is omitted or 0, it starts immediately.
- oscillator.stop(when): stops the oscillator at time when. After stopping, the oscillator cannot be restarted.
- oscillator.setPeriodicWave(wave): use a custom periodic waveform defined by a PeriodicWave instead of a standard type.
Example: create and play a 440 Hz square wave:
const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
osc.type = "square";
osc.frequency.setValueAtTime(440, audioCtx.currentTime);
osc.connect(audioCtx.destination);
osc.start();
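For a custom waveform, a sketch along these lines should work (the harmonic coefficients are arbitrary, chosen only for illustration):
const real = new Float32Array([0, 0, 1]); // cosine terms: DC, fundamental, 2nd harmonic
const imag = new Float32Array([0, 1, 0]); // sine terms
const wave = audioCtx.createPeriodicWave(real, imag);
const customOsc = audioCtx.createOscillator();
customOsc.setPeriodicWave(wave); // type now reports "custom"
customOsc.connect(audioCtx.destination);
customOsc.start();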
Volume control.
The GainNode applies a volume change (gain) to the audio passing through it. It has one input, one output, and one key property:
- gain: an a-rate AudioParam that multiplies each sample’s amplitude. Default value is 1.0 (unity gain).
There are no methods beyond the standard AudioNode connect/disconnect. When changing gain dynamically, use audio-param automation (e.g. gainNode.gain.linearRampToValueAtTime(...)) to avoid clicks.
Example: fade in volume from 0 to 1 over 2 seconds:
const gainNode = audioCtx.createGain();
gainNode.gain.setValueAtTime(0.0, audioCtx.currentTime);
gainNode.gain.linearRampToValueAtTime(1.0, audioCtx.currentTime + 2.0);
// connect source -> gain -> destination
source.connect(gainNode);
gainNode.connect(audioCtx.destination);
Standard audio filter.
The BiquadFilterNode represents a second-order (biquad) filter. It has one input and one output. Depending on its type, it can implement lowpass, highpass, bandpass, shelving, peaking, notch, and allpass filters. Key properties (AudioParam unless noted):
- type: string enum, the filter type. Valid values are "lowpass", "highpass", "bandpass", "lowshelf", "highshelf", "peaking", "notch", and "allpass". Default is "lowpass".
- frequency: AudioParam, cutoff or center frequency in Hz.
- Q: AudioParam, quality factor (controls the filter’s bandwidth). Default is 1.0.
- gain: AudioParam in dB, used for shelving/peaking filters.
- detune: AudioParam in cents applied to the frequency.
Apart from getFrequencyResponse(), which reports the filter’s response at given frequencies, there are no special methods. Example: apply a low-pass at 1000 Hz:
const filter = audioCtx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.setValueAtTime(1000, audioCtx.currentTime);
source.connect(filter);
filter.connect(audioCtx.destination);
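To inspect the resulting curve, getFrequencyResponse() can be sampled at a handful of frequencies; a sketch, reusing the filter from the example above (the chosen frequencies are arbitrary):
const freqs = new Float32Array([100, 500, 1000, 2000, 8000]);
const mags = new Float32Array(freqs.length);
const phases = new Float32Array(freqs.length);
filter.getFrequencyResponse(freqs, mags, phases);
freqs.forEach((f, i) => console.log(`${f} Hz -> magnitude ${mags[i].toFixed(3)}`));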
Audio compressor (limiter).
DynamicsCompressorNode reduces the dynamic range of audio to prevent clipping and even out loud peaks. It has one input and one output. Relevant AudioParam properties (k-rate):
- threshold: level in dB above which compression starts.
- knee: dB range above the threshold over which compression is applied gradually.
- ratio: input change, in dB, needed for a 1 dB change in output (e.g., a ratio of 4 means a 4 dB increase above the threshold yields a 1 dB increase in output).
- attack: time (s) to reduce gain by 10 dB once the threshold is exceeded.
- release: time (s) to increase gain by 10 dB after the signal falls below the threshold.
- reduction: read-only float, the gain reduction in dB currently being applied.
No unique methods. Example: set a gentle compression:
const compressor = audioCtx.createDynamicsCompressor();
compressor.threshold.setValueAtTime(-24, audioCtx.currentTime);
compressor.ratio.setValueAtTime(4, audioCtx.currentTime);
source.connect(compressor);
compressor.connect(audioCtx.destination);
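To see how hard the compressor is working, the read-only reduction value can be polled in an animation loop; a sketch, reusing the compressor above:
function meterReduction() {
  // reduction is a negative number of decibels (0 means no compression is applied).
  console.log(`gain reduction: ${compressor.reduction.toFixed(1)} dB`);
  requestAnimationFrame(meterReduction);
}
meterReduction();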
Real-time audio data for visualization.
The AnalyserNode provides real-time frequency- and time-domain analysis data. It has one input and one output and passes the audio through unchanged. Main parameters:
- fftSize: FFT window size (a power of 2); default 2048. frequencyBinCount equals fftSize/2, the number of data points in the spectrum.
- minDecibels, maxDecibels: minimum and maximum dB values used to scale frequency data.
- smoothingTimeConstant: smoothing factor for averaging frequency data over time.
Key methods to obtain data:
- getByteTimeDomainData(array): copies the current waveform (byte values 0–255) into a Uint8Array.
- getByteFrequencyData(array): copies the current frequency spectrum (byte values 0–255) into a Uint8Array.
- The float variants (getFloatTimeDomainData, getFloatFrequencyData) copy into a Float32Array.
Example: draw a time-domain waveform (oscilloscope) on a canvas:
const analyser = audioCtx.createAnalyser();
analyser.fftSize = 2048;
const bufferLength = analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
function draw() {
  requestAnimationFrame(draw);
  analyser.getByteTimeDomainData(dataArray);
  // draw dataArray onto canvas...
}
draw();
Microphone or stream input.
MediaStreamAudioSourceNode is an AudioNode that uses audio from a MediaStream (for example, one obtained from getUserMedia()). It has no inputs and one output. You create it with audioCtx.createMediaStreamSource(stream), where stream comes from the WebRTC/Media Capture APIs. It exposes the original stream via its mediaStream property. Typically you connect it to other nodes (e.g., a GainNode) and finally to the destination to hear the live input.
navigator.mediaDevices.getUserMedia({ audio: true }).then(stream => {
  const micSource = audioCtx.createMediaStreamSource(stream);
  micSource.connect(audioCtx.destination);
});
The final output node (speakers).
The AudioDestinationNode represents the final output of the audio graph (e.g. the speakers or headphones). It has exactly one input and zero outputs (you can’t connect any node after it). You access it via audioCtx.destination. Its maxChannelCount property indicates the maximum number of channels the hardware supports. Example: oscillator.connect(audioCtx.destination) sends the oscillator directly to the speakers.
const audioCtx = new AudioContext();
const osc = audioCtx.createOscillator();
osc.connect(audioCtx.destination);
osc.start();
Custom DSP modules.
AudioWorkletNode allows custom audio processing via a user-defined processor class. You define an AudioWorkletProcessor (in a separate JavaScript module) and register it, load the module into the context (audioCtx.audioWorklet.addModule("my-processor.js")), and then create an AudioWorkletNode by name. Key features:
- workletNode.port is a MessagePort for two-way communication between the main thread and the processor.
- workletNode.parameters is an AudioParamMap of parameters declared by the processor (via static parameterDescriptors).
There are no unique methods beyond AudioNode. Example (white noise generator):
// noise-processor.js
class NoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    const output = outputs[0];
    for (const channel of output) {
      for (let i = 0; i < channel.length; i++) {
        channel[i] = Math.random() * 2 - 1;
      }
    }
    return true;
  }
}
registerProcessor("noise-processor", NoiseProcessor);

// main script
await audioCtx.audioWorklet.addModule("noise-processor.js");
const noiseNode = new AudioWorkletNode(audioCtx, "noise-processor");
noiseNode.connect(audioCtx.destination);
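To illustrate port and parameters, here is a sketch of a processor that declares one parameter and listens on its message port; the module name, class name, and "amount" parameter are all hypothetical, and source stands for any connected node:
// gain-worklet.js (hypothetical module)
class SimpleGainProcessor extends AudioWorkletProcessor {
  static get parameterDescriptors() {
    return [{ name: "amount", defaultValue: 1, minValue: 0, maxValue: 1 }];
  }
  constructor() {
    super();
    this.port.onmessage = (e) => console.log("from main thread:", e.data);
  }
  process(inputs, outputs, parameters) {
    const input = inputs[0];
    const output = outputs[0];
    const amount = parameters.amount; // 1 or 128 values per render quantum
    for (let ch = 0; ch < output.length; ch++) {
      for (let i = 0; i < output[ch].length; i++) {
        const a = amount.length > 1 ? amount[i] : amount[0];
        output[ch][i] = (input[ch] ? input[ch][i] : 0) * a;
      }
    }
    return true;
  }
}
registerProcessor("simple-gain", SimpleGainProcessor);

// main script
await audioCtx.audioWorklet.addModule("gain-worklet.js");
const gainWorklet = new AudioWorkletNode(audioCtx, "simple-gain");
gainWorklet.parameters.get("amount").setValueAtTime(0.5, audioCtx.currentTime);
gainWorklet.port.postMessage("hello");
source.connect(gainWorklet);
gainWorklet.connect(audioCtx.destination);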
Non-real-time rendering (batch).
OfflineAudioContext is like AudioContext but renders audio as fast as possible into a buffer instead of playing it live. You specify the number of channels, the length (in sample frames), and the sample rate on creation. Once all nodes are connected and events are scheduled, call offlineCtx.startRendering(), which returns a Promise; when it resolves, you get an AudioBuffer with the rendered audio.
Example: mix an oscillator offline and then play the result:
const offlineCtx = new OfflineAudioContext(2, 44100 * 5, 44100);
const osc = offlineCtx.createOscillator();
osc.connect(offlineCtx.destination);
osc.start();
offlineCtx.startRendering().then(buffer => {
  const newSrc = audioCtx.createBufferSource();
  newSrc.buffer = buffer;
  newSrc.connect(audioCtx.destination);
  newSrc.start();
});
Connecting nodes and precise timing.
The Web Audio API’s audio graph is directed: audio flows from source nodes through any number of processing nodes to the destination. Use node.connect(targetNode) to connect an output to another node’s input, or to an AudioParam to modulate it. Fan-out (one output to multiple inputs) and fan-in (multiple outputs to one input) are supported by calling connect() on each desired link. For example:
const lfo = audioCtx.createOscillator();
const gainA = audioCtx.createGain();
const gainB = audioCtx.createGain();
lfo.connect(gainA.gain);
lfo.connect(gainB.gain);
lfo.start();
// lfo now modulates two different gain parameters simultaneously
Scheduling: all audio timing is based on audioCtx.currentTime. You can schedule precise playback and parameter changes. For example, osc.start(audioCtx.currentTime + 0.5) starts the oscillator 0.5 seconds in the future. AudioParam automation methods (setValueAtTime, linearRampToValueAtTime, etc.) let you change values over time, allowing sample-accurate envelopes and sequences.
Real-time applications (drum machines, sequencers) rely on this precise timing. Keep the graph as simple as possible to minimize latency; for instance, place a GainNode last in the chain so volume changes take effect immediately.
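As an example of such automation, a short attack/decay envelope might be sketched like this (the times are illustrative, and osc is an OscillatorNode created as in the earlier examples):
const env = audioCtx.createGain();
const now = audioCtx.currentTime;
env.gain.setValueAtTime(0, now);
env.gain.linearRampToValueAtTime(1.0, now + 0.01);       // 10 ms attack
env.gain.exponentialRampToValueAtTime(0.001, now + 0.5); // decay to near silence
osc.connect(env);
env.connect(audioCtx.destination);
osc.start(now);
osc.stop(now + 0.5);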
MIDI input and multi-voice synthesis.
The Web MIDI API enables access to MIDI devices. Call navigator.requestMIDIAccess() (in a secure context), which returns a Promise that resolves to a MIDIAccess object. You can then enumerate midi.inputs and listen for input.onmidimessage events. A MIDI "Note On" message has a status byte (0x90 + channel) and two data bytes: the note number and the velocity. A typical handler might do:
input.onmidimessage = (event) => {
  const [status, note, velocity] = event.data;
  const isNoteOn = (status & 0xF0) === 0x90;
  // handle note on/off...
};
To synthesize sounds, map the MIDI note number (0–127) to a frequency in Hz. For equal-tempered tuning with A4 = 440 Hz, use 440 * 2^((note - 69) / 12). For polyphony, maintain one oscillator (and a gain) per active note. On note-on, create an OscillatorNode and a GainNode, set osc.frequency.value and gain.gain.value (e.g., velocity / 127), connect them, and call osc.start(). Store them in a map keyed by note. On note-off, stop and disconnect that voice. Example:
let voices = {};
navigator.requestMIDIAccess().then((midi) => {
  midi.inputs.forEach(input => {
    input.onmidimessage = (msg) => {
      const [status, note, velocity] = msg.data;
      const isNoteOn = (status & 0xF0) === 0x90;
      const freq = 440 * Math.pow(2, (note - 69) / 12);
      if (isNoteOn && velocity > 0) {
        const osc = audioCtx.createOscillator();
        const gain = audioCtx.createGain();
        osc.frequency.setValueAtTime(freq, audioCtx.currentTime);
        gain.gain.setValueAtTime(velocity / 127, audioCtx.currentTime);
        osc.connect(gain);
        gain.connect(audioCtx.destination);
        osc.start();
        voices[note] = { osc, gain };
      } else {
        const voice = voices[note];
        if (voice) {
          voice.osc.stop(audioCtx.currentTime);
          voice.osc.disconnect();
          voice.gain.disconnect();
          delete voices[note];
        }
      }
    };
  });
});
This handles multiple notes for a simple polyphonic synth driven by MIDI.
Scheduling and data visualization.
Web Audio supports precise real-time scheduling. All timing is based on audioCtx.currentTime. For example, oscillator.start(audioCtx.currentTime + 1.0) schedules playback exactly 1 second in the future. AudioParam automation methods (setValueAtTime, linearRampToValueAtTime, etc.) allow you to create envelopes and LFOs with sample-level accuracy. Such scheduling enables rhythm and sequencer applications; according to MDN, this design allows developers to target specific samples even at high sample rates.
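A common pattern is a look-ahead scheduler that queues events a short way into the future on the audio clock while a timer keeps the queue topped up; a sketch (the tempo and look-ahead values are illustrative):
const tempo = 120;                      // beats per minute
let nextBeatTime = audioCtx.currentTime;
function scheduler() {
  // Schedule every beat that falls within the next 100 ms of audio time.
  while (nextBeatTime < audioCtx.currentTime + 0.1) {
    const click = audioCtx.createOscillator();
    click.frequency.value = 1000;
    click.connect(audioCtx.destination);
    click.start(nextBeatTime);
    click.stop(nextBeatTime + 0.05);
    nextBeatTime += 60 / tempo;
  }
  setTimeout(scheduler, 25);            // re-run well before the look-ahead window empties
}
scheduler();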
For visualization, use an AnalyserNode. In an animation loop (e.g., using requestAnimationFrame), call getByteTimeDomainData or getByteFrequencyData on the analyser to obtain audio data arrays, then draw them onto a <canvas>. Because the AnalyserNode passes audio through unchanged, visualization runs in parallel without affecting playback. This is how waveform oscilloscopes and spectrograms are implemented in real time.
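A frequency-domain counterpart to the earlier oscilloscope example might look like the following sketch, which assumes the analyser from above and a <canvas> element on the page:
const canvas = document.querySelector("canvas");
const canvasCtx = canvas.getContext("2d");
const freqData = new Uint8Array(analyser.frequencyBinCount);
function drawSpectrum() {
  requestAnimationFrame(drawSpectrum);
  analyser.getByteFrequencyData(freqData);
  canvasCtx.clearRect(0, 0, canvas.width, canvas.height);
  const barWidth = canvas.width / freqData.length;
  freqData.forEach((value, i) => {
    const barHeight = (value / 255) * canvas.height;
    canvasCtx.fillRect(i * barWidth, canvas.height - barHeight, barWidth, barHeight);
  });
}
drawSpectrum();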
3D positioning and convolution reverb.
The API provides spatialization nodes. PannerNode simulates a point audio source in 3D space. It has one input and one output. Important properties include positionX/Y/Z (AudioParams) for the source location and orientationX/Y/Z for its facing direction. It also has parameters like panningModel (e.g. "HRTF"), distanceModel (how volume falls off with distance), and coneInnerAngle/coneOuterAngle for directional volume cones. To use it, connect a source through a PannerNode before the destination: source.connect(panner); panner.connect(audioCtx.destination).
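A slightly fuller sketch, placing the source a few units to the listener’s right with HRTF panning (the coordinates are illustrative, and source is any connected node):
const panner = audioCtx.createPanner();
panner.panningModel = "HRTF";
panner.distanceModel = "inverse";
panner.positionX.setValueAtTime(3, audioCtx.currentTime); // 3 units to the right
panner.positionY.setValueAtTime(0, audioCtx.currentTime);
panner.positionZ.setValueAtTime(0, audioCtx.currentTime);
source.connect(panner);
panner.connect(audioCtx.destination);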
ConvolverNode applies convolution with an impulse response (IR) buffer to achieve reverberation. It has one input and one output. Its buffer property holds an AudioBuffer containing a recorded impulse response (e.g. of a real room). The normalize boolean (default true) controls whether the IR is scaled. To use it, fetch and decode an IR audio file and assign it: convolver.buffer = irBuffer. Example:
const convolver = audioCtx.createConvolver();
fetch("impulse-response.wav")
  .then(res => res.arrayBuffer())
  .then(data => audioCtx.decodeAudioData(data))
  .then(irBuffer => {
    convolver.buffer = irBuffer;
    source.connect(convolver);
    convolver.connect(audioCtx.destination);
  });
This adds realistic reverb to the source using the convolution IR.