Dubbed or additional voices layered over picture: multiple tracks in the mix for narration, inner thoughts, or crowd ambience. Demands clean studio isolation.
Voice-over stacking means layering several audio tracks on top of each other, a technique that becomes crucial in editing and the final mix. Instead of a single voice over the image, you have several at once: a character's inner thoughts, documentary narration, ambient voices from a crowd recorded separately. Each track is recorded in isolation, mixed individually, and then combined into the final stem. This only works if each voice is recorded cleanly, without room-tone bleed and without stray noises that pile up once the tracks overlap.
The biggest challenge lies in the studio setup. Each recording needs acoustics that are as dry as possible so that diffuse room tone does not add up later and turn the mix to mud. Many projects go wrong at this point: they record all voices one after another in the same studio and then wonder why the mix sounds like a crowded cocktail party. Better: spread the sessions over several days with different voice actors in different rooms, or at least vary the distance to the wall. That gives each voice a slightly different acoustic signature, which adds depth later and keeps the mix from sounding flat and blurry.
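The way room tone "adds up" can be made concrete with a little arithmetic. Uncorrelated noise sources combine by power, so stacking N tracks with equal noise floors raises the combined floor by 10*log10(N) dB. The sketch below assumes every track is mixed at unity gain and uses a hypothetical floor of -60 dBFS per track; the function name is illustrative, not taken from any tool.

```python
import math


def summed_noise_floor_db(floors_db):
    """Combine uncorrelated noise floors (given in dB) by summing their powers."""
    total_power = sum(10 ** (level / 10) for level in floors_db)
    return 10 * math.log10(total_power)


# Hypothetical figures: three voice tracks, each carrying room tone at -60 dBFS.
single = -60.0
stacked = summed_noise_floor_db([single] * 3)
print(f"one track: {single:.1f} dBFS, three stacked: {stacked:.1f} dBFS")
```

Three equal floors come out roughly 4.8 dB above a single one, which is why bleed that seems harmless on one track becomes audible once several layers play together.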
In editing, this means each voice-over layer gets its own track. You edit the tracks independently, place pauses, and shape each volume envelope for drama or focus. An inner-thought layer, for example, is set lower when visual action dominates and raised when a still image leaves room for introspection. Documentary narration usually sits higher in the mix because it carries most of the information. Ambient voices, such as a murmuring crowd behind a scene, are placed low as a presence layer. In the final stem mix, all tracks are combined into a voice-over stem, which the dub mix takes over unchanged.
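The per-track envelope and the final summing step can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular DAW implements automation: the breakpoint format `(time_s, level_db)`, the function names, and the levels chosen for narration, thoughts, and crowd are all assumptions made for the example.

```python
def db_to_gain(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def envelope_level_db(breakpoints, t):
    """Linearly interpolate the level at time t from (time_s, level_db) breakpoints.

    Breakpoints are assumed to be sorted by time.
    """
    if t <= breakpoints[0][0]:
        return breakpoints[0][1]
    for (t0, d0), (t1, d1) in zip(breakpoints, breakpoints[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return d1
            return d0 + (t - t0) / (t1 - t0) * (d1 - d0)
    return breakpoints[-1][1]


def mix_stem(tracks, sample_rate):
    """Sum tracks, each a (samples, breakpoints) pair, into one voice-over stem."""
    length = max(len(samples) for samples, _ in tracks)
    stem = [0.0] * length
    for samples, breakpoints in tracks:
        for i, sample in enumerate(samples):
            level_db = envelope_level_db(breakpoints, i / sample_rate)
            stem[i] += sample * db_to_gain(level_db)
    return stem


# Hypothetical session: a tiny sample rate keeps the numbers easy to inspect.
sample_rate = 4
narration = ([1.0] * 8, [(0.0, -6.0)])               # sits highest, constant
thoughts = ([1.0] * 8, [(0.0, -18.0), (2.0, -9.0)])  # raised as the picture calms
crowd = ([1.0] * 8, [(0.0, -24.0)])                  # low presence layer
stem = mix_stem([narration, thoughts, crowd], sample_rate)
```

Keeping the envelope as data separate from the samples mirrors the editing workflow: the recording stays untouched while levels are reshaped per scene.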
A practical example: a documentary about a detective. You have his inner thoughts as voice-over, the narration of a third person, and quiet background sound from a police office, all three layered on top of each other. Each has its own spatiality and depth, but none competes with the others for space. You only achieve this through consistent isolation during recording and careful panning, reverb, and EQ in the dub. Sloppy recordings without room control leave you with three voices that blur into an unintelligible mess.
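As a small illustration of the panning part, the sketch below uses a constant-power pan law, one common way to place a mono voice in the stereo field without changing its perceived loudness. The text does not say which pan law is used, so this is an assumption, and the pan positions chosen for the three layers are hypothetical.

```python
import math


def constant_power_pan(sample, pan):
    """Split a mono sample into (left, right) for pan in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, 1.0 is hard right.
    """
    angle = (pan + 1.0) * math.pi / 4.0
    return sample * math.cos(angle), sample * math.sin(angle)


# Hypothetical placement for the detective example.
narration_lr = constant_power_pan(1.0, 0.0)   # narration in the center
thoughts_lr = constant_power_pan(1.0, -0.3)   # inner thoughts slightly left
office_lr = constant_power_pan(1.0, 0.6)      # office background off to the right
```

With this law the squared left and right gains always sum to one, so moving a layer sideways separates it from the others without making it louder or quieter.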