FFmpeg Volume Filter eval=frame: A 4-Second Silent Bug
For four seconds at the start of every podcast episode, the intro music was not playing. RMS at minus infinity. Voice came in at second four, faded up from nothing. ffmpeg returned exit code zero. Nothing in the logs hinted at a problem. The bug was one keyword in the filter graph: eval=frame.
Quick Take
- The volume filter defaults to eval=once, evaluated one time at filter init
- At init time, the t variable (frame time) is undefined, expressions return NaN, output goes silent
- For any time-varying volume envelope, add :eval=frame to re-evaluate per audio frame
- The default is documented but easy to miss, especially when the filter “succeeds” silently
- Verify with ffmpeg -ss N -t 0.5 -af astats -f null - at sample timestamps
The symptom that took two days to spot
The episode mixer had been working for months. After a refactor of the intro-music sidechain, the first 3.5 to 4 seconds of every output file went silent. Voice still landed at second four (where it was supposed to, after a delayed sidechain mix). Music never played at all. Output WAV looked normal, MP3 encoded cleanly, file size was right. The mixer printed [mix] ✓ Intro-Music because the ffmpeg command exited zero.
I caught it only because a listener noticed the missing intro and complained. Then I measured:
for t in 0 1 2 3 4 5; do
  rms=$(ffmpeg -ss "$t" -t 0.5 -i episode.mp3 -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1 | awk -F: '{print $2}')
  echo "t=${t}s: RMS=$rms"
done
# t=0s: RMS= -inf
# t=1s: RMS= -inf
# t=2s: RMS= -inf
# t=3s: RMS= -inf
# t=4s: RMS= -19.755297
# t=5s: RMS= -14.619059
Four seconds of perfect digital silence at the top of the file. The sidechain mix, the delay-and-amix logic, the limiter chain, all of them had executed. Something inside the filter graph was producing zero gain on the music input from t=0 to t≈4.
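The same spot check can be scripted so it runs after every render. A minimal sketch in Python, assuming ffmpeg is on PATH; the helper names parse_rms_db and rms_at are hypothetical and not taken from the pipeline's code. The parser reads the first "RMS level dB" line that astats prints, which is the line head -1 selects in the shell loop above.

```python
import re
import subprocess

# Matches astats lines such as "RMS level dB: -19.755297" or "RMS level dB: -inf".
_RMS_RE = re.compile(r"RMS level dB:\s*(-?inf|-?\d+(?:\.\d+)?)")


def parse_rms_db(astats_log: str) -> float:
    """Return the first RMS level (dB) found in ffmpeg's astats log output."""
    match = _RMS_RE.search(astats_log)
    if match is None:
        raise ValueError("no 'RMS level dB' line in astats output")
    return float(match.group(1))  # float("-inf") is valid Python


def rms_at(path: str, start_s: float, window_s: float = 0.5) -> float:
    """Measure RMS over a short window, mirroring the shell loop above."""
    cmd = ["ffmpeg", "-hide_banner", "-ss", str(start_s), "-t", str(window_s),
           "-i", path, "-af", "astats", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_rms_db(result.stderr)  # ffmpeg writes its log to stderr
```

Calling rms_at("episode.mp3", t) for t in range(6) would reproduce the table above; a result of float("-inf") is the digital-silence case.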
The expression that should have worked
The intro-music sidechain uses a volume envelope to ramp music: full volume for the first four seconds (solo intro), duck under the voice hook, swell back, fade out. That envelope is a piecewise expression:
vol_expr = (
    f"if(lt(t,{s_ramp_start:.2f}),1.0,"   # solo intro: 1.0
    f"if(lt(t,{s:.2f}),1.0-{drop:.4f}*(t-{s_ramp_start:.2f})/{r:.2f},"
    f"if(lt(t,{s+h:.2f}),{d},"   # ducked under voice
    f"if(lt(t,{s+h+sw:.2f}),{d}+({1.0-d:.2f})*(t-{s+h:.2f})/{sw:.2f},"
    f"if(lt(t,{s+h+sw+fade_s:.2f}),1.0*(1-(t-{s+h+sw:.2f})/{fade_s:.2f}),0)))))"
)
flt = f"[1:a]atrim=0:{total},aformat=sample_rates=48000:channel_layouts=stereo,volume='{vol_expr}'[m_base];..."
The expression itself is correct. For t < 3.6 it returns 1.0, full music. For 3.6 ≤ t < 4.0 it ramps down to the duck level. For 4.0 ≤ t < hook_end it holds at duck_base. And so on. I tested the expression with concrete values and the math is right.
But that math never executed. The filter applied gain zero from the very first sample.
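The piecewise envelope can be modelled in plain Python to sanity-check the branches. This is a sketch under stated assumptions: s_ramp_start and s follow the 3.6 s and 4.0 s values mentioned above, while h, sw, fade_s and d are made-up illustration values, and drop and r are assumed to be 1.0 - d and s - s_ramp_start. It models the expression only, not FFmpeg's evaluator.

```python
import math


def envelope(t, s_ramp_start=3.6, s=4.0, h=6.0, sw=2.0, fade_s=3.0, d=0.25):
    """Plain-Python model of the nested if(lt(t, ...)) gain expression.

    h, sw, fade_s and d are illustration values, not the pipeline's.
    """
    drop = 1.0 - d            # assumed: depth of the duck
    r = s - s_ramp_start      # assumed: duration of the ramp down
    if t < s_ramp_start:
        return 1.0                                        # solo intro
    if t < s:
        return 1.0 - drop * (t - s_ramp_start) / r        # ramp down
    if t < s + h:
        return d                                          # ducked under voice
    if t < s + h + sw:
        return d + (1.0 - d) * (t - (s + h)) / sw         # swell back
    if t < s + h + sw + fade_s:
        return 1.0 * (1 - (t - (s + h + sw)) / fade_s)    # fade out
    return 0.0                                            # final else branch


for t in (0.0, 3.8, 5.0, 11.0, 20.0, math.nan):
    print(f"t={t}: gain={envelope(t):.3f}")
```

With a NaN time value every comparison is false, so this model falls through to the final 0.0 branch. That is how IEEE 754 comparisons behave in Python; whether FFmpeg's expression evaluator takes the same path at init time is an assumption, not something this sketch proves.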
The default that bit me, eval=once
Run the local ffmpeg help on the volume filter:
$ ffmpeg -h filter=volume
volume AVOptions:
volume <string> set volume adjustment expression (default "1.0")
eval <int> specify when to evaluate expressions (default once)
once 0 eval volume expression once
frame 1 eval volume expression per-frame
The (default once) is right there. By default, the volume filter evaluates its expression one time at filter initialization, not per audio frame. At init time there is no frame yet, so the t variable (frame presentation time) is NaN rather than a timestamp. A time-dependent expression therefore cannot take its intended branches, and whatever value that single evaluation produces is applied to every subsequent frame. In this pipeline that value was zero gain. Output: silence. Exit code: zero. No warning.
The fix is one keyword:
flt = f"[1:a]atrim=0:{total},aformat=...,volume='{vol_expr}':eval=frame[m_base];..."
eval=frame forces re-evaluation per output frame, with t populated from the frame’s presentation time. The expression now does what its author intended.
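To keep the option from being dropped in a later refactor, the filter string can be built in one place. A small sketch; volume_filter is a hypothetical helper, not a function from mix_audio.py.

```python
def volume_filter(expr: str, per_frame: bool = True) -> str:
    """Build a volume filter string, quoting the expression for the filtergraph.

    Single quotes keep the commas inside the expression from being read
    as filter separators. per_frame=True appends eval=frame.
    """
    if "'" in expr:
        raise ValueError("expression must not contain single quotes")
    flt = f"volume='{expr}'"
    if per_frame:
        flt += ":eval=frame"
    return flt


print(volume_filter("if(lt(t,3.6),1.0,0.5)"))
# volume='if(lt(t,3.6),1.0,0.5)':eval=frame
```

Defaulting per_frame to True is a design choice for this sketch: a literal gain still works with per-frame evaluation, while a time-varying expression does not work without it.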
Reproducer, before and after
I isolated the bug to confirm eval=frame was the only change needed. The reproducer uses pure-silence “voice” so no sidechain ducking happens; the music should play at full volume from second zero.
# Before: silent volume=expr without eval=frame
ffmpeg -y -i music.mp3 \
-af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,0.5)'" \
/tmp/before.wav
# Measure RMS at t=0 to t=5:
for t in 0 1 2 3 4 5; do
  ffmpeg -ss "$t" -t 0.5 -i /tmp/before.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# All return -inf. Filter applied gain 0 throughout.
# After: same expression with :eval=frame
ffmpeg -y -i music.mp3 \
-af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,0.5)':eval=frame" \
/tmp/after.wav
for t in 0 1 2 3 4 5; do
  ffmpeg -ss "$t" -t 0.5 -i /tmp/after.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# t=0..3: ~-22 dB (full music). t=4..5: ~-28 dB (half volume per the expression).
Three lines of git diff in mix_audio.py::mix_intro_sidechain. Zero dependencies. Done.
Why eval=once is even the default
The fast-path argument: for volume=0.5 (a literal numeric), evaluating once at init is faster than per-frame, and the most common volume-filter use is a literal gain. Per-frame evaluation costs one expression-tree walk per audio frame, not per sample. On a 48 kHz stream with 1024-sample frames (the frame size depends on the decoder and upstream filters), that is about 47 evaluations per second, which is cheap but not free.
So ffmpeg defaults to fast for the common case. Time-varying expressions are the less common case, and the user is expected to opt in. The cost of getting it wrong is silent output that exits cleanly.
This is a documented footgun. The volume filter docs at ffmpeg.org/ffmpeg-filters.html#volume mention eval and its values, but the silent-output failure mode is not called out as a side effect. I read the docs and missed it. Local ffmpeg -h filter=volume does say (default once) next to eval. I missed that too. The lesson is generalizable: when an ffmpeg filter takes an expression with a time variable, check whether eval=frame is needed before trusting silent success.
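That check can be automated with a small lint that runs before the command is launched. A hedged sketch: needs_frame_eval and check_volume_filter are hypothetical names, and the variable list covers only t, pts and n from the volume filter's documented variables, so extend it for your own expressions.

```python
import re

# Per-frame variables of the volume filter that this sketch looks for.
# The list is deliberately short; it is an assumption, not a complete set.
_TIME_VARS = ("t", "pts", "n")
_TIME_VAR_RE = re.compile(
    r"(?<![A-Za-z0-9_])(?:" + "|".join(_TIME_VARS) + r")(?![A-Za-z0-9_(])"
)


def needs_frame_eval(expr: str) -> bool:
    """True if the expression references a variable that changes per frame."""
    return _TIME_VAR_RE.search(expr) is not None


def check_volume_filter(expr: str, filter_str: str) -> None:
    """Raise if a time-varying expression is used without eval=frame."""
    if needs_frame_eval(expr) and "eval=frame" not in filter_str:
        raise ValueError("time-varying volume expression without eval=frame")
```

The lookbehind and lookahead keep function names such as lt( and longer identifiers such as startt or tb from being mistaken for the bare variable t.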
The general lesson for ffmpeg expressions
A few filters in ffmpeg accept time-varying expressions, and several of them expose a similar eval knob. The option's values and default vary by filter, so check ffmpeg -h filter=NAME on your build before assuming it exists. The failure mode I keep hitting is the same: silent zero output, exit code zero. The list I have hit personally:
- volume (this article): default once, expressions with t need eval=frame
- astreamselect and streamselect: time-based switching expressions need explicit eval=frame
- pan: matrix coefficients can be expressions; same pattern
In each case, the verification command is the same shape: measure the output at expected timestamps and check whether the filter actually did what you asked. Never trust silent ffmpeg success on a time-varying filter graph.
What to do
If you are using volume= with any expression containing t, add :eval=frame. If you are not sure whether your filter needs it, run a quick RMS sample at a few timestamps and compare to expected. The check takes 30 seconds and catches a class of silent failures that exit code zero will not.
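The RMS comparison can be reduced to a pass/fail gate. A minimal sketch: find_silent_spots is a hypothetical helper, and the -60 dB floor is an arbitrary threshold chosen for illustration, not a value from the pipeline.

```python
import math


def find_silent_spots(levels_db, floor_db=-60.0):
    """Return the timestamps whose measured RMS is not above floor_db.

    levels_db maps timestamp (seconds) -> RMS in dB, e.g. parsed from astats.
    "not db > floor_db" also catches -inf and NaN readings.
    """
    return sorted(t for t, db in levels_db.items() if not db > floor_db)


# Values from the measurement earlier in the article.
measured = {0: -math.inf, 1: -math.inf, 2: -math.inf, 3: -math.inf,
            4: -19.755297, 5: -14.619059}
print(find_silent_spots(measured))  # [0, 1, 2, 3]
```

A render step could fail the build whenever this list contains a timestamp where music is expected to be audible.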
What I Actually Use
- ffmpeg 6.1.1-3ubuntu5+esm7 on Ubuntu 24.04 ARM64 (DGX Spark)
- volume='if(lt(t,3.6),1.0,...)':eval=frame in the intro-music sidechain
- RMS verification command in scripts/mix_audio.py after every render
- Companion fix for the global loudnorm leading-silence: see Part 4
Related in this series
This article is Part 3 of Voxtral Pipeline Discoveries (May 2026):
- Part 1: Voxtral 4B Open-Checkpoint: The Encoder is Gated. The architectural constraint behind this pipeline.
- Part 2: Voxtral Chunk Strategy. 30 to 38 percent render-time savings with whole-turn rendering.
- Part 3 (this article). The volume filter eval=frame footgun.
- Part 4: Per-Segment Loudness for Multi-Speaker TTS. The companion loudnorm footgun in the same pipeline.