FFmpeg Volume Filter eval=frame: A 4-Second Silent Bug

May 7, 2026 6 min read

For four seconds at the start of every podcast episode, the intro music was not playing. RMS at minus infinity. Voice came in at second four, faded up from nothing. ffmpeg returned exit code zero. Nothing in the logs hinted at a problem. The bug was one missing option in the filter graph: eval=frame.

Quick Take

The volume filter defaults to eval=once, evaluated one time at filter init

At init time, t (frame time) is NaN, so every lt(t, ...) comparison is false and a piecewise envelope collapses to its final else branch; here that branch is 0, and the output goes silent

For any time-varying volume envelope, add :eval=frame to re-evaluate per audio frame

The default is documented but easy to miss, especially when the filter “succeeds” silently

Verify with ffmpeg -ss N -t 0.5 -i OUTPUT -af astats -f null - at sample timestamps

The symptom that took two days to spot

The episode mixer had been working for months. After a refactor of the intro-music sidechain, the first 3.5 to 4 seconds of every output file went silent. Voice still landed at second four (where it was supposed to, after a delayed sidechain mix). Music never played at all. Output WAV looked normal, MP3 encoded cleanly, file size was right. The mixer printed [mix] ✓ Intro-Music because the ffmpeg command exited zero.

I caught it only because a listener noticed the missing intro and complained. Then I measured:

for t in 0 1 2 3 4 5; do
  rms=$(ffmpeg -ss $t -t 0.5 -i episode.mp3 -af astats -f null - 2>&1 \
        | grep "RMS level dB" | head -1 | awk -F: '{print $2}')
  echo "t=${t}s: RMS=$rms"
done
# t=0s: RMS= -inf
# t=1s: RMS= -inf
# t=2s: RMS= -inf
# t=3s: RMS= -inf
# t=4s: RMS= -19.755297
# t=5s: RMS= -14.619059

Four seconds of perfect digital silence at the top of the file. The sidechain mix, the delay-and-amix logic, and the limiter chain had all executed. Something inside the filter graph was applying zero gain to the music input; the silence ended at t≈4 only because that is where the voice comes in.

The expression that should have worked

The intro-music sidechain uses a volume envelope to ramp music: full volume for the first four seconds (solo intro), duck under the voice hook, swell back, fade out. That envelope is a piecewise expression:

vol_expr = (
    f"if(lt(t,{s_ramp_start:.2f}),1.0,"           # solo intro: 1.0
    f"if(lt(t,{s:.2f}),1.0-{drop:.4f}*(t-{s_ramp_start:.2f})/{r:.2f},"
    f"if(lt(t,{s+h:.2f}),{d},"                    # ducked under voice
    f"if(lt(t,{s+h+sw:.2f}),{d}+({1.0-d:.2f})*(t-{s+h:.2f})/{sw:.2f},"
    f"if(lt(t,{s+h+sw+fade_s:.2f}),1.0*(1-(t-{s+h+sw:.2f})/{fade_s:.2f}),0)))))"
)
flt = f"[1:a]atrim=0:{total},aformat=sample_rates=48000:channel_layouts=stereo,volume='{vol_expr}'[m_base];..."

The expression itself is correct. For t < 3.6 it returns 1.0, full music. For 3.6 ≤ t < 4.0 it ramps down to the duck level. For 4.0 ≤ t < s+h (the end of the hook) it holds at the duck level d. And so on. I tested the expression with concrete values and the math is right.
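The piecewise shape is easier to check outside ffmpeg. Below is a minimal Python sketch of the same envelope. The breakpoints 3.6 and 4.0 come from the description above; the hook length, swell, fade, and duck level are placeholder values chosen for illustration, and the relations r = s - s_ramp_start and drop = 1 - d are assumptions about how the pipeline derives them.

```python
def envelope(t, s_ramp_start=3.6, s=4.0, h=6.0, sw=1.5, fade_s=3.0, d=0.25):
    """Gain at time t (seconds), mirroring the nested if(lt(t, ...)) expression.

    h, sw, fade_s and d are hypothetical values, not the pipeline's settings.
    """
    r = s - s_ramp_start   # assumed: ramp lasts from s_ramp_start to s
    drop = 1.0 - d         # assumed: ramp falls from 1.0 to the duck level
    if t < s_ramp_start:
        return 1.0                                       # solo intro
    if t < s:
        return 1.0 - drop * (t - s_ramp_start) / r       # ramp down
    if t < s + h:
        return d                                         # ducked under voice
    if t < s + h + sw:
        return d + (1.0 - d) * (t - (s + h)) / sw        # swell back
    if t < s + h + sw + fade_s:
        return 1.0 * (1 - (t - (s + h + sw)) / fade_s)   # fade out
    return 0.0                                           # after the fade


if __name__ == "__main__":
    for t in (0.0, 3.8, 5.0, 10.5, 13.0, 20.0):
        print(f"t={t:5.1f}s gain={envelope(t):.3f}")
```

Note the final branch: once every comparison fails, the envelope returns 0.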

But that math never executed. The filter applied gain zero from the very first sample.

The default that bit me, eval=once

Run the local ffmpeg help on the volume filter:

$ ffmpeg -h filter=volume
volume AVOptions:
   volume   <string>  set volume adjustment expression (default "1.0")
   eval     <int>     specify when to evaluate expressions (default once)
     once    0        eval volume expression once
     frame   1        eval volume expression per-frame

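The (default once) is right there. By default, the volume filter evaluates its expression one time at filter initialization, not per audio frame. At that point no frame has arrived, so t (the frame's timestamp in seconds) is NaN; the filter documentation notes that with eval=once only sample_rate and tb are available and every other variable evaluates to NAN. A comparison against NaN is false, so every lt(t, ...) in the envelope fails and the nested if() falls through to its final else branch, which is 0. That constant zero gain is then applied to every frame. Output: silence. Exit code: zero. No warning.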
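The way a NaN timestamp can end up as a clean zero gain is easy to see in a toy model. The sketch below is not ffmpeg's evaluator; it assumes only that lt() and if() behave like the C comparisons they are built on, where any comparison against NaN is false. The function names are made up for the illustration.

```python
import math


def lt(a, b):
    # Like a C comparison: anything compared with NaN is false, giving 0.0.
    return 1.0 if a < b else 0.0


def if_(cond, then, otherwise):
    # A nonzero condition selects the first branch, otherwise the second.
    return then if cond else otherwise


def toy_envelope(t):
    # Simplified stand-in for the production envelope: full, ducked, then 0.
    return if_(lt(t, 3.6), 1.0, if_(lt(t, 4.0), 0.5, 0.0))


if __name__ == "__main__":
    print(toy_envelope(1.0))       # normal timestamp, first branch
    print(toy_envelope(3.8))       # normal timestamp, second branch
    print(toy_envelope(math.nan))  # NaN timestamp falls through to 0.0
```

With a NaN input no branch condition is ever true, so the result is whatever sits in the last else position. An envelope that ends in a fade to 0 therefore collapses to silence.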

The fix is one keyword:

flt = f"[1:a]atrim=0:{total},aformat=...,volume='{vol_expr}':eval=frame[m_base];..."

eval=frame forces re-evaluation for each incoming audio frame, with t populated from the frame’s presentation time. The expression now does what its author intended.
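One way to keep this from regressing is to build the filter string through a small helper that adds eval=frame whenever the expression mentions t. The helper below is a hypothetical sketch, not part of mix_audio.py, and it checks only for the t variable; other per-frame variables such as n or pts would need their own patterns.

```python
import re

# Matches a standalone t, but not the t inside lt(), gt(), pts or startt.
_USES_T = re.compile(r"(?<![A-Za-z0-9_])t(?![A-Za-z0-9_(])")


def volume_filter(expr: str) -> str:
    """Return a volume filter string, adding eval=frame if expr references t."""
    flt = f"volume='{expr}'"
    if _USES_T.search(expr):
        flt += ":eval=frame"
    return flt


if __name__ == "__main__":
    print(volume_filter("0.5"))
    print(volume_filter("if(lt(t,3.6),1.0,0.5)"))
```

A literal gain keeps the default single evaluation, so the common case pays nothing extra.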

Reproducer, before and after

I isolated the bug to confirm eval=frame was the only change needed. The reproducer uses pure-silence “voice” so no sidechain ducking happens; the music should play at full volume from second zero.

# Before: volume=expr without eval=frame.
# The trailing else branch (0) mirrors the production envelope, which also ends in 0.
# atrim caps the stream at 70 s, so with eval=frame that branch is never reached.
ffmpeg -y -i music.mp3 \
  -af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,if(lt(t,70),0.5,0))'" \
  /tmp/before.wav

# Measure RMS at t=0 to t=5:
for t in 0 1 2 3 4 5; do
  ffmpeg -ss "$t" -t 0.5 -i /tmp/before.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# All return -inf. Filter applied gain 0 throughout.

# After: same expression with :eval=frame
ffmpeg -y -i music.mp3 \
  -af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,if(lt(t,70),0.5,0))':eval=frame" \
  /tmp/after.wav

for t in 0 1 2 3 4 5; do
  ffmpeg -ss "$t" -t 0.5 -i /tmp/after.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# t=0..3: ~-22 dB (full music). t=4..5: ~-28 dB (half volume per the expression).

Three lines of git diff in mix_audio.py::mix_intro_sidechain. Zero dependencies. Done.

Why eval=once is even the default

The fast-path argument: for volume=0.5 (a literal numeric), evaluating once at init is faster than per-frame, and the most common volume-filter use is a literal gain. Per-frame evaluation costs one expression-tree walk per audio frame, and a frame is a block of samples, not a single sample. At 48 kHz with 1,024-sample frames, for example, that is about 47 evaluations per second, which is cheap but not free.

So ffmpeg defaults to fast for the common case. Time-varying expressions are the less common case, and the user is expected to opt in. The cost of getting it wrong is silent output that exits cleanly.
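To put a number on the per-frame cost: the expression runs once per audio frame, and an audio frame carries a block of samples. The sketch below is plain arithmetic; the frame sizes 1024 and 1152 are common codec frame sizes used here as illustrative assumptions, since the size that actually reaches the volume filter depends on the decoder and the filters upstream.

```python
def evals_per_second(sample_rate: int, samples_per_frame: int) -> float:
    """Expression evaluations per second when evaluating once per audio frame."""
    return sample_rate / samples_per_frame


if __name__ == "__main__":
    for frame_size in (1024, 1152):  # assumed, illustrative frame sizes
        rate = evals_per_second(48_000, frame_size)
        print(f"{frame_size} samples/frame: {rate:.1f} evaluations per second")
```

The same arithmetic sets the envelope's time resolution: the gain can only change at frame boundaries, about every 21 ms with 1,024-sample frames at 48 kHz.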

This is a documented footgun. The volume filter docs at ffmpeg.org/ffmpeg-filters.html#volume describe eval and its values, and note that with eval=once every variable except sample_rate and tb evaluates to NAN, but the silent-output failure mode is not called out as a consequence. I read the docs and missed it. Local ffmpeg -h filter=volume does say (default once) next to eval. I missed that too. The lesson is generalizable: when an ffmpeg filter takes an expression with a time variable, check whether eval=frame is needed before trusting silent success.

The general lesson for ffmpeg expressions

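Several ffmpeg filters accept expressions that reference time, and a number of them expose an eval option. The defaults vary by filter, so check each one with ffmpeg -h filter=NAME rather than assuming. Whenever a time-dependent setting is evaluated only once, the failure mode is the same: wrong output, exit code zero. The filters I have looked at:

volume (this article): default once, expressions with t need eval=frame
astreamselect and streamselect: no eval option; map is a fixed mapping that changes at runtime only through filter commands (for example via asendcmd or sendcmd), not through a t expression
pan: no eval option; channel gains are numeric constants, so time-varying panning has to be built another way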

In each case, the verification command is the same shape: measure the output at expected timestamps and check whether the filter actually did what you asked. Never trust silent ffmpeg success on a time-varying filter graph.

What to do

If you are using volume= with any expression containing t, add :eval=frame. If you are not sure whether your filter needs it, run a quick RMS sample at a few timestamps and compare to expected. The check takes 30 seconds and catches a class of silent failures that exit code zero will not.
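The RMS spot check is simple to automate after each render. The sketch below wraps the same ffmpeg astats command used earlier in this article; the function names and the -60 dB floor are hypothetical choices for illustration, and the parser reads only the first "RMS level dB" line, matching the head -1 in the shell loop.

```python
import re
import subprocess

_RMS_LINE = re.compile(r"RMS level dB:\s*(-?inf|-?\d+(?:\.\d+)?)")


def parse_rms_db(astats_log: str) -> float:
    """Return the first 'RMS level dB' value found in astats log output."""
    match = _RMS_LINE.search(astats_log)
    if match is None:
        raise ValueError("no 'RMS level dB' line in astats output")
    return float(match.group(1))  # float() accepts '-inf'


def measure_rms_db(path: str, start: float, duration: float = 0.5) -> float:
    """Run astats over one window of the file and return its RMS in dB."""
    cmd = ["ffmpeg", "-hide_banner", "-ss", str(start), "-t", str(duration),
           "-i", path, "-af", "astats", "-f", "null", "-"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_rms_db(result.stderr)  # ffmpeg logs to stderr


def assert_audible(path: str, timestamps, floor_db: float = -60.0) -> None:
    """Raise if any sampled window is quieter than floor_db (assumed threshold)."""
    for t in timestamps:
        rms = measure_rms_db(path, t)
        if rms < floor_db:
            raise RuntimeError(f"{path}: RMS {rms} dB at t={t}s is below {floor_db} dB")


if __name__ == "__main__":
    assert_audible("episode.mp3", [0, 1, 2, 3, 4, 5])
```

Raising instead of printing means a silent intro fails the render step rather than passing with a green check mark.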

What I Actually Use

ffmpeg 6.1.1-3ubuntu5+esm7 on Ubuntu 24.04 ARM64 (DGX Spark)

volume='if(lt(t,3.6),1.0,...)':eval=frame in the intro-music sidechain

RMS verification command in scripts/mix_audio.py after every render

Companion fix for the global loudnorm leading-silence: see Part 4

This article is Part 3 of Voxtral Pipeline Discoveries (May 2026):

Part 1: Voxtral 4B Open-Checkpoint: The Encoder is Gated. The architectural constraint behind this pipeline.
Part 2: Voxtral Chunk Strategy. 30 to 38 percent render-time savings with whole-turn rendering.
Part 3 (this article). The volume filter eval=frame footgun.
Part 4: Per-Segment Loudness for Multi-Speaker TTS. The companion loudnorm footgun in the same pipeline.

Flow

ffmpeg volume filter, eval=once vs eval=frame

Time-varying volume expressions in audio mixing

Symptom: first 4 seconds silent, RMS minus infinity

Filter graph: volume='if(lt(t,3.6),1.0,...)' with t variable

Default behavior: eval=once, expression evaluated at filter init

Root cause: t is NaN at init, so the envelope falls through to its final 0 gain

Fix: add :eval=frame to re-evaluate per audio frame
