
These aren't edge cases—they're the kinds of problems that delay campaigns and erode client trust. Commercial audio mixing is a discipline where technical precision and creative intent have to coexist, sometimes within the space of five seconds.
This guide covers the core audio elements, essential mixing techniques, US broadcast loudness standards, the post-production workflow, and the most common mistakes that send mixes back for revision.
TL;DR
- Voiceover/dialogue sits at the top of the sound hierarchy—always protect its clarity
- The US broadcast standard targets -24 LKFS (ATSC A/85), mandated by the CALM Act
- Over-compression backfires on air: broadcast normalization strips the added gain and leaves only the distortion
- Stems (dialogue-only, music-only, SFX-only) are required deliverables, not optional extras
- Final QC should check loudness compliance and playback translation across multiple speaker types
What Is Audio Mixing for TV Commercials?
Audio mixing for TV commercials is the process of combining voiceover, dialogue, music, and sound effects into a single, broadcast-ready soundtrack where every element serves the brand message without competing with the others.
As postPerspective defines it, audio post-production is "the process of adding or changing sound to picture", which covers voiceover recording, sound design, music editing, and final mixing to a moving image. For commercials specifically, it also means mixing to broadcast LKFS targets in both 5.1 and stereo, a compliance requirement that sits alongside the creative work.
How Commercial Mixing Differs from Other Formats
Film mixing allows a slow build—90 minutes of storytelling gives the sound team room to breathe. Music mixing is about serving the song, not a brand message or compliance deadline.
Commercial audio has none of that latitude. A :30 spot compresses what a feature film spreads across two hours into a single scene. Every layer must earn its place in the first second it appears.
Common spot lengths each impose their own timing constraints:
- :05 and :10 — pure brand recall, nearly every second carries audio impact
- :15 — one clear message, tightly scored
- :30 — the standard format, balancing narrative and compliance
- :60 — more room to breathe, but still no tolerance for filler
The underlying challenge is consistent across all of them: more creative and technical decisions per second, with immediate emotional impact required and no room for gradual development.
The Core Audio Elements of a TV Commercial
Voiceover and On-Camera Dialogue
Dialogue and voiceover sit at the top of the sound hierarchy because they carry the brand message and the call to action. EBU research on speech intelligibility identifies three primary factors affecting clarity: balance with background sounds, speech-to-noise ratio, and interference from other audio elements.
The challenge is playback variation. EBU notes that built-in flat-screen TV speakers often have "rather mediocre reproduction properties," while home theater setups introduce room simulation, bass boost, and spatial processing that can undermine dialogue clarity regardless of how well the mix was crafted.
A well-built mix accounts for all of these environments from the start — not as an afterthought during delivery.
Background Music
Music is the emotional driver. It tells the viewer how to feel about the product before they consciously process the words. The choice between royalty-free library music and custom composed tracks affects more than budget—it affects mixing flexibility.
Stem delivery makes a concrete difference in what's possible at the mix stage:
- Stems (separate files for melody, rhythm, pads, etc.) let a mixer adjust individual layers around voiceover without touching the full track
- Full stereo files lock the music as-is — no layer-level control once it's been rendered
- Requesting stems from music providers is worth doing regardless of whether the track is library or commissioned
Sound Effects and Ambience
SFX adds realism and punctuation to product moments—a soda pour, a car door closing, a product click. The choice between library SFX, custom-recorded Foley, and custom sound design scales with budget and ambition, but the right effect can transform a generic moment into something memorable.
Ambience and room tone serve a different function: they're the acoustic glue between edit cuts. When dialogue is assembled from multiple takes recorded in the same space, slight variations in background noise create audible gaps. Filling those gaps with matching room tone is a foundational dialogue editing step, not an optional polish.
The Sound Hierarchy
The standard priority order for commercial audio:
- Dialogue and voiceover — always intelligible, always present
- Music — supports emotional tone without masking VO
- Sound effects — texture and punctuation below the music layer
- Ambience — acoustic glue at the foundation

This hierarchy can be broken deliberately—a product-reveal moment where music surges to the front can be a powerful creative choice. When it's intentional, the mix typically signals the shift with a brief duck of the VO or a defined moment in the music — not a gradual drift that happens because levels weren't checked.
Essential Mixing Techniques for Commercial Audio
EQ for Clarity
EQ carves frequency space so each element can be heard without masking others. In practice for commercials:
- High-pass filter music tracks to clear low-mid frequencies where dialogue lives
- Add presence boost to VO (typically in the 2–5 kHz range) to help it cut through a busy mix
- Cut competing frequencies between music and dialogue rather than boosting VO into a fight it can't win
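The high-pass step in the first bullet can be illustrated with a minimal first-order filter. This is a sketch under stated assumptions, not a substitute for a DAW equalizer: the 120 Hz corner frequency and the test tones are illustrative values that do not come from the text, and the helper names are hypothetical.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate=48000):
    """First-order high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def tone(freq_hz, seconds=0.5, sample_rate=48000):
    """Full-scale sine test tone."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

CUTOFF_HZ = 120.0  # assumed corner frequency, not a value from the text

low = tone(60.0)      # stands in for low-end energy in a music bed
high = tone(2000.0)   # stands in for content in the VO presence range
low_ratio = rms(one_pole_highpass(low, CUTOFF_HZ)) / rms(low)
high_ratio = rms(one_pole_highpass(high, CUTOFF_HZ)) / rms(high)
print(f"60 Hz kept: {low_ratio:.2f}, 2 kHz kept: {high_ratio:.2f}")
```

The low tone loses more than half its level while the 2 kHz tone passes almost untouched, which is the behavior the filter is used for on a music bed.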
Compression for Dynamic Control
Compression manages the gap between the loudest and quietest moments in a track, keeping dialogue intelligible whether the scene is quiet or kinetic.
The trap: over-compression. NAB CALM best-practices guidance specifically recommends keeping inline compression settings "as light as possible" in loudness-management workflows. Heavy compression creates a flat, fatiguing sound—and creates problems downstream when broadcast normalization processes the file.
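To see why heavier settings flatten a mix, it helps to look at the static gain curve of a compressor. The sketch below assumes a simple hard-knee model with no attack or release; the threshold, ratios, and input levels are illustrative assumptions rather than recommended settings.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static hard-knee curve: above the threshold, level rises 1/ratio as fast."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# Assumed example levels and settings, chosen only for illustration.
QUIET_DB, LOUD_DB = -30.0, -6.0   # 24 dB between quietest and loudest moment
THRESHOLD_DB = -18.0

def output_range_db(ratio):
    """Gap between the loud and quiet moments after compression."""
    return (compress_db(LOUD_DB, THRESHOLD_DB, ratio)
            - compress_db(QUIET_DB, THRESHOLD_DB, ratio))

light = output_range_db(2.0)   # a light 2:1 setting
heavy = output_range_db(8.0)   # a heavy 8:1 setting
print(light, heavy)  # 18.0 13.5
```

With these assumed numbers, the 2:1 setting keeps 18 dB of the original 24 dB range, while 8:1 leaves 13.5 dB.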
Level Balancing and Automation
Set levels with a clear anchor point: the VO track establishes the reference, and music and SFX are brought underneath it. Consistent gain staging from the start of the session prevents clipping and distortion from accumulating as the mix builds.
Rather than static fader positions, automation gives the engineer precise moment-to-moment control over how the mix evolves:
- Duck music under critical VO lines
- Swell it back up at the emotional payoff
- Ride SFX levels to punctuate a product moment
Without automation, the mix sits flat. With it, the audio tracks the story beat by beat.
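The ducking move described above amounts to a gain envelope on the music track. Here is a minimal sketch of that envelope, assuming linear ramps in dB; the -9 dB depth, the 0.25 second ramp, and the VO timings are hypothetical values, and `duck_gain_db` is an illustrative helper rather than any DAW's API.

```python
def duck_gain_db(t, vo_regions, duck_db=-9.0, ramp=0.25):
    """Music gain in dB at time t (seconds).

    0 dB away from voiceover, duck_db while a VO line plays,
    with linear ramps of `ramp` seconds on either side.
    """
    gain = 0.0
    for start, end in vo_regions:
        if start <= t <= end:
            depth = 1.0
        elif start - ramp < t < start:
            depth = (t - (start - ramp)) / ramp      # ramping down into the line
        elif end < t < end + ramp:
            depth = ((end + ramp) - t) / ramp        # ramping back up after it
        else:
            depth = 0.0
        gain = min(gain, duck_db * depth)
    return gain

# Assumed VO line timings for a :15 spot, in seconds.
vo_lines = [(2.0, 6.5), (12.0, 14.0)]

print(duck_gain_db(0.0, vo_lines))    # music at full level before the VO
print(duck_gain_db(1.875, vo_lines))  # halfway down the ramp
print(duck_gain_db(4.0, vo_lines))    # fully ducked under the line
```

In a real session the same shape is drawn as fader automation or produced by a sidechain; the point of the sketch is only that the music level is a function of where the VO sits on the timeline.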
Panning and Mono Compatibility
Place sounds within the stereo field to mirror the visual action and create a sense of space. For broadcast, three rules govern how you place those elements:
- Center channel for all VO and dialogue — ATSC A/85 requires this to prevent loudness build-up in stereo downmixes
- Verify stereo downmix — NAB guidance mandates loudness checks across both the stereo mix and the mono sum
- Test on consumer speakers — a mix that loses VO clarity in mono fails broadcast standards regardless of how it sounds on studio monitors
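The mono check can be approximated numerically. The sketch below assumes a simple (L + R) / 2 fold-down and uses RMS as a rough level measure, not a BS.1770 loudness reading; the test tone is illustrative.

```python
import math

def rms_db(samples):
    """RMS level in dB relative to full scale; -inf for silence."""
    r = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(r) if r > 0 else float("-inf")

def mono_sum(left, right):
    """Simple mono fold-down: average of the two channels."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

n = 4800
tone = [0.5 * math.sin(2.0 * math.pi * i / 48) for i in range(n)]

# Centered element: identical in both channels, survives the fold-down intact.
centered = mono_sum(tone, tone)

# Polarity-flipped element: sounds wide in stereo, cancels completely in mono.
flipped = mono_sum(tone, [-s for s in tone])

print(rms_db(tone), rms_db(centered), rms_db(flipped))
```

A centered voiceover keeps its level in mono, while anything that relies on out-of-phase content between left and right can drop sharply or vanish, which is why the fold-down is worth checking before delivery.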
Broadcast Loudness Standards and Compliance
The CALM Act
The Commercial Advertisement Loudness Mitigation (CALM) Act was enacted on December 15, 2010, with FCC rules becoming effective December 13, 2012. It mandates that TV stations, cable operators, and satellite TV providers transmit commercials at the same average volume as surrounding programming — a direct response to consumer complaints about jarring volume jumps between programs and ads.
The FCC relies on consumer complaints to identify potential violations. As of a 2025 Federal Register notice, no forfeitures or fines have been issued for CALM Act violations, but network rejection of non-compliant audio is a practical enforcement mechanism that matters far more in day-to-day delivery.
The -24 LKFS Target
The US broadcast standard, ATSC A/85, targets -24 LKFS as the integrated loudness for a commercial. LKFS (Loudness, K-weighted, relative to Full Scale) is equivalent to LUFS — both measure program loudness using the ITU-R BS.1770 algorithm, which accounts for how humans perceive volume over time rather than just peak levels.
The European equivalent is EBU R128 at -23 LUFS, 1 LU louder than the US target, so region matters when delivering internationally.
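Because LKFS and LUFS are on a dB-like scale, the correction needed to reach a target is a simple subtraction. A minimal sketch, assuming the integrated loudness has already been measured with a BS.1770 meter; the -21.5 reading is a made-up example and the helper names are hypothetical.

```python
ATSC_A85_TARGET_LKFS = -24.0   # US target named in the text
EBU_R128_TARGET_LUFS = -23.0   # European target named in the text

def gain_to_target(measured, target=ATSC_A85_TARGET_LKFS):
    """Static gain in dB that moves a measured integrated loudness onto the target."""
    return target - measured

def db_to_linear(gain_db):
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

measured = -21.5  # assumed meter reading for a mix that runs hot
offset = gain_to_target(measured)
print(offset)  # -2.5

# Re-targeting a compliant US mix for an R128 delivery needs +1 dB.
print(gain_to_target(ATSC_A85_TARGET_LKFS, EBU_R128_TARGET_LUFS))  # 1.0
```

A gain change of this kind shifts the integrated reading by the same number of LU, but true-peak limits still need to be rechecked after any upward adjustment.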
Why Peak-Chasing Backfires
The old commercial audio strategy was simple: push levels as high as possible to sound louder than competing spots. Broadcast loudness normalization makes that approach actively counterproductive.
Normalization applies a single gain change to bring the overall level to the -24 LKFS target; it does not alter dynamic range. That means an over-compressed mix won't sound louder on air. It will land at the same perceived loudness as everything else, with these results:
- Crushed dynamics that flatten the emotional impact of music and VO
- Baked-in distortion that normalization cannot remove after the fact
- No loudness advantage despite the sacrifice in audio quality

The goal isn't maximum level — it's maximum clarity within the loudness budget.
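The effect can be demonstrated with a toy signal. This sketch makes two loud assumptions: RMS stands in for integrated loudness (it is not a BS.1770 measurement), and hard clipping stands in for heavy limiting. It shows that a static normalization gain equalizes level but leaves the crest factor (peak-to-RMS ratio) of the crushed mix exactly where it was.

```python
import math

def rms(x):
    return math.sqrt(sum(s * s for s in x) / len(x))

def crest_db(x):
    """Peak-to-RMS ratio in dB, a rough indicator of remaining dynamics."""
    return 20.0 * math.log10(max(abs(s) for s in x) / rms(x))

def normalize(x, target_rms):
    """Apply one static gain so the signal lands on target_rms."""
    g = target_rms / rms(x)
    return [s * g for s in x]

n = 4800
# A tone with a slow swell, standing in for a dynamic mix.
clean = [(0.2 + 0.6 * i / n) * math.sin(2.0 * math.pi * i / 48) for i in range(n)]
# "Peak-chasing": boost by 4x, then hard clip at 0.3.
crushed = [max(-0.3, min(0.3, 4.0 * s)) for s in clean]

TARGET_RMS = 0.1  # arbitrary stand-in for the loudness target
clean_n = normalize(clean, TARGET_RMS)
crushed_n = normalize(crushed, TARGET_RMS)

print(f"level after normalization: {rms(clean_n):.3f} vs {rms(crushed_n):.3f}")
print(f"crest factor: clean {crest_db(clean_n):.1f} dB, crushed {crest_db(crushed_n):.1f} dB")
```

Both versions end up at the same level, but the crushed one keeps its flattened crest factor: the gain came off, the clipping did not.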
Stems and Technical Deliverables
Stems (also called splits) are separate mix files for individual components, delivered alongside the full stereo mix:
- Dialogue/VO only
- Music only
- Sound effects only
Networks and agencies require stems because they enable versioning without a full re-mix — replacing music for a localized spot, adjusting VO for a regional cut, or making compliance adjustments years after the original session.
Standard technical specs for North American broadcast delivery (per XR, Corus, and ESPN specifications):
| Spec | Standard |
|---|---|
| Sample rate | 48 kHz |
| Bit depth | 16-bit or 24-bit PCM |
| Format | Stereo interleaved |
| Loudness target | -24 LKFS |
| Sync tone | 2-pop (1 kHz, 1 frame, 2 seconds before first frame) |
Confirm the exact requirements with the receiving network or dubbing house before export, as specs can vary by distributor.
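The container-level rows of the table can be checked automatically before upload. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; `check_wav_spec` and `make_test_wav` are hypothetical helper names, and loudness itself cannot be verified from the header and still needs a BS.1770 meter.

```python
import io
import wave

def check_wav_spec(fileobj):
    """Return a list of problems against the spec table (empty list = pass)."""
    problems = []
    with wave.open(fileobj, "rb") as w:
        if w.getframerate() != 48000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 48000")
        if w.getsampwidth() not in (2, 3):  # 2 bytes = 16-bit, 3 bytes = 24-bit
            problems.append(f"bit depth is {w.getsampwidth() * 8}-bit, expected 16 or 24")
        if w.getnchannels() != 2:
            problems.append(f"{w.getnchannels()} channel(s), expected stereo")
    return problems

def make_test_wav(rate, width_bytes, channels, frames=480):
    """Build a short silent WAV in memory so the check can be exercised."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(width_bytes)
        w.setframerate(rate)
        w.writeframes(bytes(frames * width_bytes * channels))
    buf.seek(0)
    return buf

good = check_wav_spec(make_test_wav(48000, 3, 2))
bad = check_wav_spec(make_test_wav(44100, 2, 1))
print(good)
print(bad)
```

For a real deliverable, pass an opened file (or a path string) to `check_wav_spec` instead of the in-memory test buffer.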
The TV Commercial Audio Mixing Workflow
Pre-Production Through Picture Lock
Audio post begins before the mix session. Pre-production alignment on music direction, voiceover tone, and sound design approach prevents expensive revisions later. For companies like Blare Video, which manage commercial production from initial shoot through post-production, this alignment happens during the planning phase—when music licensing, voiceover casting, and sound design direction are addressed before a frame is shot.
After picture lock, the video editor exports an OMF or AAF file that transfers the edited timeline, including all audio regions and edits, into a DAW environment. Before the mix session begins, verify that the timecode frame rate in the audio session matches the video reference file exactly. A frame rate mismatch causes sync drift that compounds across the length of the spot.
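The size of the drift is easy to estimate. This sketch assumes the most common mismatch, a session running at 24 fps against a 23.976 fps (24000/1001) reference; the function name is hypothetical and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def sync_drift_seconds(duration_s, session_fps, reference_fps):
    """Drift after duration_s when frames counted at session_fps
    are played back at reference_fps."""
    frames = duration_s * session_fps
    return frames / reference_fps - duration_s

FPS_24 = Fraction(24)
FPS_23_976 = Fraction(24000, 1001)

drift = sync_drift_seconds(30, FPS_24, FPS_23_976)
print(float(drift) * 1000, "ms over a :30")
print(float(drift * FPS_23_976), "frames at the reference rate")
```

Under these assumptions the drift is 30 ms by the end of a :30, roughly three quarters of a frame, and it doubles on a :60.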
The Mix Session
A typical commercial mix session follows this track organization:
- Voiceover/dialogue tracks — cleaned, edited, consistent room tone
- Music tracks — edited to fit the spot's timing, ideally in stems
- SFX tracks — placed to picture, level-set beneath music
- Ambience tracks — continuous room tone and environmental beds

Build to the -24 LKFS target using a proper loudness meter throughout, not just at the end. Check sync against picture at the start of the session and again after any significant edits.
Final Delivery and QC
The finished mix is delivered as:
- Full stereo mix
- Dialogue stem
- Music stem
- SFX stem
- 2-pop sync tone at the head of each file
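For reference, a 2-pop is one frame of 1 kHz tone placed two seconds before the first frame of picture. A minimal sketch of generating it follows; the -20 dBFS level is an assumption (the source does not specify one), the function names are hypothetical, and at fractional rates such as 29.97 fps the frame length is rounded to the nearest sample.

```python
import math

def two_pop(fps, sample_rate=48000, level_dbfs=-20.0):
    """One frame of 1 kHz sine. level_dbfs is an assumed level, not from the text."""
    n = round(sample_rate / fps)          # samples in one frame
    amp = 10 ** (level_dbfs / 20)
    return [amp * math.sin(2.0 * math.pi * 1000 * i / sample_rate) for i in range(n)]

def pop_start_sample(first_frame_sample, sample_rate=48000):
    """Sample index where the pop begins: 2 seconds before first frame of picture."""
    return first_frame_sample - 2 * sample_rate

pop = two_pop(24)
print(len(pop))                   # samples in one 24 fps frame at 48 kHz
print(pop_start_sample(480000))   # picture starts at 10 s, pop starts at 8 s
```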
Before sending to the network or dubbing house, run a translation check across multiple playback systems—studio monitors, a laptop speaker, and a small TV or phone. What sounds balanced on monitors can expose problems on cheaper speakers, particularly in the mid-range where dialogue and music compete. A final integrated LUFS reading confirms the mix meets broadcast spec before it leaves your hands.
Common Mistakes to Avoid
Three errors cause the majority of rejected or underperforming commercial mixes:
Over-compressing for perceived loudness — Broadcast normalization eliminates any loudness advantage while leaving the sonic damage intact: crushed dynamics, listener fatigue, distortion. Mix for quality at the target level, not to fight the normalization system.
Burying the voiceover under music — The most common brand-damaging error in commercial audio. When music levels aren't managed around VO lines, the message disappears. Use automation to duck music under every voiceover line — static fader positions won't hold the balance as delivery changes.
Skipping loudness QC before delivery — Submitting a mix without measuring integrated LUFS against the -24 LKFS target risks network rejection, delayed air dates, and downstream costs. Run a full loudness check with a proper ITU-R BS.1770 meter and review on multiple speaker systems before delivery.
Frequently Asked Questions
Do TVs automatically adjust volume for commercials?
No, not the TV itself. Since the CALM Act's FCC rules took effect in 2012, commercial loudness has been managed at the transmission level. TV stations, cable operators, and other MVPDs such as satellite providers are required to transmit commercials at the same average loudness as the surrounding programming, so viewers shouldn't need to touch the volume dial.
How do you make audio for TV commercials?
Commercial audio is built through post-production across three stages:
- Dialogue and voiceover are recorded, cleaned, and edited
- Music and sound effects are sourced and timed to picture
- Everything is mixed in a DAW to broadcast loudness standards
The finished product is delivered as a full stereo mix plus stems.
What is audio mixing?
Audio mixing combines individual tracks — dialogue, music, and sound effects — into a single balanced soundtrack. A mixer adjusts levels, EQ, compression, and panning so each element sits clearly in the mix and serves the intended listening experience.
What are stems in a TV commercial audio mix?
Stems (also called splits) are separate audio files for each mix component—dialogue-only, music-only, and SFX-only—delivered alongside the full stereo mix. They allow agencies and networks to make edits, localizations, or compliance adjustments without returning to a full re-mix session.
What loudness standard should TV commercials follow in the US?
US broadcast commercials must comply with ATSC A/85, targeting an integrated loudness of -24 LKFS/LUFS measured per ITU-R BS.1770. This standard is mandated by the CALM Act to ensure commercials are not perceived as louder than surrounding programming.


