
FFmpeg Volume Filter eval=frame: A 4-Second Silent Bug

For four seconds at the start of every podcast episode, the intro music was not playing. RMS at minus infinity. Voice came in at second four, faded up from nothing. ffmpeg returned exit code zero. Nothing in the logs hinted at a problem. The bug was one keyword in the filter graph: eval=frame.

Quick Take

  • The volume filter defaults to eval=once, evaluated one time at filter init
  • At init time, the t variable (frame time) is undefined, expressions return NaN, output goes silent
  • For any time-varying volume envelope, add :eval=frame to re-evaluate per audio frame
  • The default is documented but easy to miss, especially when the filter “succeeds” silently
  • Verify with ffmpeg -ss N -t 0.5 -i FILE -af astats -f null - at sample timestamps

The symptom that took two days to spot

The episode mixer had been working for months. After a refactor of the intro-music sidechain, the first 3.5 to 4 seconds of every output file went silent. Voice still landed at second four (where it was supposed to, after a delayed sidechain mix). Music never played at all. Output WAV looked normal, MP3 encoded cleanly, file size was right. The mixer printed [mix] ✓ Intro-Music because the ffmpeg command exited zero.

I caught it only because a listener noticed the missing intro and complained. Then I measured:

for t in 0 1 2 3 4 5; do
  rms=$(ffmpeg -ss $t -t 0.5 -i episode.mp3 -af astats -f null - 2>&1 \
        | grep "RMS level dB" | head -1 | awk -F: '{print $2}')
  echo "t=${t}s: RMS=$rms"
done
# t=0s: RMS= -inf
# t=1s: RMS= -inf
# t=2s: RMS= -inf
# t=3s: RMS= -inf
# t=4s: RMS= -19.755297
# t=5s: RMS= -14.619059

Four seconds of perfect digital silence at the top of the file. The sidechain mix, the delay-and-amix logic, the limiter chain, all of them had executed. Something inside the filter graph was producing zero gain on the music input from t=0 to t≈4.
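The same probe can be scripted from Python so it runs after every render. This is a minimal sketch, not the original mixer code: the helper names `parse_rms_db` and `probe_rms_db` are hypothetical, and it assumes `ffmpeg` is on the PATH and that astats reports lines of the form `RMS level dB: <value>`, as the shell loop above relies on.

```python
import re
import subprocess

# Matches astats lines such as "[Parsed_astats_0 @ 0x...] RMS level dB: -19.755297"
_RMS_LINE = re.compile(r"RMS level dB:\s*(-?inf|-?\d+(?:\.\d+)?)")


def parse_rms_db(stderr_text):
    """Return the first 'RMS level dB' value found in astats output, or None."""
    match = _RMS_LINE.search(stderr_text)
    if match is None:
        return None
    # float() accepts "-inf", so digital silence comes back as negative infinity.
    return float(match.group(1))


def probe_rms_db(path, start_s, window_s=0.5):
    """Run astats on a short window of `path` and return its RMS level in dB."""
    cmd = [
        "ffmpeg", "-hide_banner",
        "-ss", str(start_s), "-t", str(window_s),
        "-i", path,
        "-af", "astats", "-f", "null", "-",
    ]
    # ffmpeg writes filter statistics to stderr, not stdout.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return parse_rms_db(result.stderr)
```

Like `head -1` in the shell version, the parser takes the first RMS line it sees. Keeping the parser separate from the subprocess call lets it be tested on captured log text without running ffmpeg.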

The expression that should have worked

The intro-music sidechain uses a volume envelope to ramp music: full volume for the first four seconds (solo intro), duck under the voice hook, swell back, fade out. That envelope is a piecewise expression:

vol_expr = (
    f"if(lt(t,{s_ramp_start:.2f}),1.0,"           # solo intro: 1.0
    f"if(lt(t,{s:.2f}),1.0-{drop:.4f}*(t-{s_ramp_start:.2f})/{r:.2f},"
    f"if(lt(t,{s+h:.2f}),{d},"                    # ducked under voice
    f"if(lt(t,{s+h+sw:.2f}),{d}+({1.0-d:.2f})*(t-{s+h:.2f})/{sw:.2f},"
    f"if(lt(t,{s+h+sw+fade_s:.2f}),1.0*(1-(t-{s+h+sw:.2f})/{fade_s:.2f}),0)))))"
)
flt = f"[1:a]atrim=0:{total},aformat=sample_rates=48000:channel_layouts=stereo,volume='{vol_expr}'[m_base];..."

The expression itself is correct. For t < 3.6 it returns 1.0, full music. For 3.6 ≤ t < 4.0 it ramps down to the duck level. For 4.0 ≤ t < s+h (the end of the voice hook) it holds at the duck level d. The swell and the fade-out follow the same pattern. I tested the expression with concrete values and the math is right.
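One way to check the math independently of ffmpeg is to mirror the piecewise expression in plain Python. The sketch below is not the mixer's code. It assumes the ramp ends exactly at s (so r = s - s_ramp_start) and lands on the duck level (so drop = 1 - d). Only 3.6 and 4.0 come from the text above; the other parameter values are placeholders.

```python
def envelope(t, s_ramp_start, s, h, sw, fade_s, d):
    """Plain-Python mirror of the piecewise volume expression."""
    r = s - s_ramp_start   # ramp-down duration (assumption: the ramp ends at s)
    drop = 1.0 - d         # assumption: the ramp lands exactly on the duck level
    if t < s_ramp_start:
        return 1.0                                    # solo intro
    if t < s:
        return 1.0 - drop * (t - s_ramp_start) / r    # ramp down
    if t < s + h:
        return d                                      # ducked under voice
    if t < s + h + sw:
        return d + (1.0 - d) * (t - (s + h)) / sw     # swell back
    if t < s + h + sw + fade_s:
        return 1.0 * (1 - (t - (s + h + sw)) / fade_s)  # fade out
    return 0.0


# 3.6 and 4.0 are from the article; h, sw, fade_s and d are illustrative only.
params = dict(s_ramp_start=3.6, s=4.0, h=6.0, sw=2.0, fade_s=3.0, d=0.25)

for t in (0.0, 3.8, 5.0, 11.0, 13.5, 20.0):
    print(f"t={t:5.1f}s gain={envelope(t, **params):.3f}")
```

With these placeholder values, the gain is 1.0 during the solo intro, 0.25 while ducked, and 0.0 after the fade.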

But that math never executed. The filter applied gain zero from the very first sample.

The default that bit me, eval=once

Run the local ffmpeg help on the volume filter:

$ ffmpeg -h filter=volume
volume AVOptions:
   volume   <string>  set volume adjustment expression (default "1.0")
   eval     <int>     specify when to evaluate expressions (default once)
     once    0        eval volume expression once
     frame   1        eval volume expression per-frame

The (default once) is right there. The volume filter, by default, evaluates its expression one time at filter initialization, not per audio frame. At init time, the t variable (frame presentation time) is undefined; the expression evaluates to NaN; ffmpeg casts NaN to zero gain; every subsequent frame gets multiplied by zero. Output: silence. Exit code: zero. No warning.

The fix is one keyword:

flt = f"[1:a]atrim=0:{total},aformat=...,volume='{vol_expr}':eval=frame[m_base];..."

eval=frame forces re-evaluation per output frame, with t populated from the frame’s presentation time. The expression now does what its author intended.
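To keep the keyword from being forgotten again, the filter string can be built by a small helper that appends eval=frame whenever the expression references a per-frame variable. This is a hedged sketch: `volume_filter` is a hypothetical helper, not part of mix_audio.py, and the variable list is a subset that should be checked against the volume filter documentation for your build.

```python
import re

# Volume-filter variables that change from frame to frame.
# Assumption: a subset only; verify against the volume filter documentation.
_TIME_VARS = ("t", "pts", "n", "nb_consumed_samples")
_TIME_VAR_RE = re.compile(r"\b(?:" + "|".join(_TIME_VARS) + r")\b")


def volume_filter(expr):
    """Build a volume filter spec, adding eval=frame for time-varying expressions."""
    spec = f"volume='{expr}'"
    # \b keeps function names such as lt() or gt() from matching the variable t.
    if _TIME_VAR_RE.search(expr):
        spec += ":eval=frame"
    return spec


print(volume_filter("0.5"))
print(volume_filter("if(lt(t,3.6),1.0,0.5)"))
```

A literal gain keeps the cheap default, and any expression that mentions t gets per-frame evaluation without the caller having to remember it.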

Reproducer, before and after

I isolated the bug to confirm eval=frame was the only change needed. The reproducer uses pure-silence “voice” so no sidechain ducking happens; the music should play at full volume from second zero.

# Before: silent volume=expr without eval=frame
ffmpeg -y -i music.mp3 \
  -af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,0.5)'" \
  /tmp/before.wav

# Measure RMS at t=0 to t=5:
for t in 0 1 2 3 4 5; do
  ffmpeg -ss $t -t 0.5 -i /tmp/before.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# All return -inf. Filter applied gain 0 throughout.

# After: same expression with :eval=frame
ffmpeg -y -i music.mp3 \
  -af "atrim=0:70,aformat=sample_rates=48000:channel_layouts=stereo,volume='if(lt(t,3.6),1.0,0.5)':eval=frame" \
  /tmp/after.wav

for t in 0 1 2 3 4 5; do
  ffmpeg -ss $t -t 0.5 -i /tmp/after.wav -af astats -f null - 2>&1 \
    | grep "RMS level dB" | head -1
done
# t=0..3: ~-22 dB (full music). t=4..5: ~-28 dB (half volume per the expression).

Three lines of git diff in mix_audio.py::mix_intro_sidechain. Zero dependencies. Done.

Why eval=once is even the default

The fast-path argument: for volume=0.5 (a literal numeric), evaluating once at init is faster than per-frame, and the most common volume-filter use is a literal gain. Per-frame evaluation costs one expression-tree walk per audio frame. A frame is a block of samples, not a single sample, so the count depends on the frame size: with 1024-sample frames (an assumed, typical size) a 48 kHz stream needs about 47 evaluations per second (48000 / 1024 ≈ 46.9), which is cheap but not free.

So ffmpeg defaults to fast for the common case. Time-varying expressions are the less common case, and the user is expected to opt in. The cost of getting it wrong is silent output that exits cleanly.

This is a documented footgun. The volume filter docs at ffmpeg.org/ffmpeg-filters.html#volume mention eval and its values, but the silent-output failure mode is not called out as a side effect. I read the docs and missed it. Local ffmpeg -h filter=volume does say (default once) next to eval. I missed that too. The lesson is generalizable: when an ffmpeg filter takes an expression with a time variable, check whether eval=frame is needed before trusting silent success.

The general lesson for ffmpeg expressions

A few filters in ffmpeg accept time-varying expressions. They all have a similar eval knob. The defaults vary, but the failure mode is the same: silent zero output, exit code zero. The list I have hit personally:

In each case, the verification command is the same shape: measure the output at expected timestamps and check whether the filter actually did what you asked. Never trust silent ffmpeg success on a time-varying filter graph.

What to do

If you are using volume= with any expression containing t, add :eval=frame. If you are not sure whether your filter needs it, run a quick RMS sample at a few timestamps and compare to expected. The check takes 30 seconds and catches a class of silent failures that exit code zero will not.
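The comparison step can be automated as well. The sketch below flags timestamps whose measured RMS falls at or below a floor. The function name `silent_windows` and the -90 dB threshold are assumptions made for illustration; the sample data is the measurement from the broken render earlier in this article.

```python
import math


def silent_windows(samples, floor_db=-90.0):
    """Return timestamps whose RMS is at or below floor_db, or not finite."""
    return [
        t for t, rms_db in samples
        if not math.isfinite(rms_db) or rms_db <= floor_db
    ]


# (timestamp in seconds, RMS in dB) pairs measured on the broken render.
measured = [
    (0, -math.inf), (1, -math.inf), (2, -math.inf), (3, -math.inf),
    (4, -19.755297), (5, -14.619059),
]

silent = silent_windows(measured)
if silent:
    print(f"[mix] silent windows at t={silent}")
```

Failing the render when the list is non-empty turns a clean exit code into a real success signal.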

What I Actually Use

  • ffmpeg 6.1.1-3ubuntu5+esm7 on Ubuntu 24.04 ARM64 (DGX Spark)
  • volume='if(lt(t,3.6),1.0,...)':eval=frame in the intro-music sidechain
  • RMS verification command in scripts/mix_audio.py after every render
  • Companion fix for the global loudnorm leading-silence: see Part 4

This article is Part 3 of Voxtral Pipeline Discoveries (May 2026).

Flow: ffmpeg volume filter, eval=once vs eval=frame (time-varying volume expressions in audio mixing)

  1. Symptom: first 4 seconds silent, RMS minus infinity
  2. Filter graph: volume='if(lt(t,3.6),1.0,...)' with t variable
  3. Default behavior: eval=once, expression evaluated at filter init
  4. Root cause: t undefined at init, NaN cast to zero gain
  5. Fix: add :eval=frame, re-evaluate per audio frame