@yuhuan417

Description

Add setpts=N, which reassigns the timestamp (PTS) of every frame to its sequential frame number after -skip_frame. This removes the time gaps and makes the stream continuous.

It makes subsequent filters (like thumbnail or reverse) treat the sparse frames as a standard, tightly packed video, preventing buffer overflows and segmentation faults.
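
To illustrate the effect on timestamps, here is a minimal Python sketch. It is a simplified model of how a constant-rate resampler such as fps=12 with round=down fills time gaps, not FFmpeg's actual implementation. The function names are hypothetical, the time base 1/600 is taken from the verbose log in this thread, and the two keyframe timestamps (0 and 600 ticks, i.e. 0.0 s and 1.0 s) are assumed from the pts_time values reported by the thumbnail filter. The sketch only shows how the batch sizes seen by thumbnail=12 change; it does not explain the crash itself.

```python
from fractions import Fraction
from math import floor


def fps_resample(in_pts, time_base, fps):
    """Return, for each output frame, the index of the input frame it shows.

    Simplified model: each input timestamp is rounded down onto the output
    grid. A frame is repeated until the next frame's slot, dropped if the
    next frame lands in the same slot, and the last frame is emitted once.
    """
    slots = [floor(Fraction(p) * time_base * fps) for p in in_pts]
    out = []
    for i, slot in enumerate(slots):
        if i + 1 < len(slots):
            repeat = slots[i + 1] - slot  # 0 means the frame is dropped
        else:
            repeat = 1
        out.extend([i] * repeat)
    return out


def setpts_n(in_pts):
    """Model of setpts=N: the new PTS is the frame's sequential index."""
    return list(range(len(in_pts)))


def batches(frames, size):
    """Split frames into consecutive groups of at most `size` items."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


TIME_BASE = Fraction(1, 600)  # tb:1/600 in the verbose log
FPS = 12
sparse_pts = [0, 600]  # assumed: two keyframes at 0.0 s and 1.0 s

before = fps_resample(sparse_pts, TIME_BASE, FPS)
after = fps_resample(setpts_n(sparse_pts), TIME_BASE, FPS)

print([len(b) for b in batches(before, 12)])  # [12, 1]
print([len(b) for b in batches(after, 12)])   # [1]
```

In this model, the original timestamps give a batch of 12 followed by a batch of 1, while the renumbered timestamps give a single batch of 1 with one input frame dropped. This is in line with the "set of 12 images" / "set of 1 images" lines and the "2 frames in, 1 frames out; 1 frames dropped" line in the logs.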

Fixes #24254

How Has This Been Tested?

Ran the new ffmpeg command line in a Docker container to confirm there is no segmentation fault.
Ran the immich thumbnail job on missing files; no errors in the log.
Re-ran the immich thumbnail job on missing files; no new jobs to run.
Ran the tests in server/src/services/media.service.spec.ts; all passed.
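
The first check above could be automated along these lines. This is a hedged sketch, not part of the PR: the helper names are hypothetical, and it relies on the POSIX convention that Python's subprocess module reports a process killed by a signal as a negative signal number, while an intermediate shell typically reports 128 plus the signal number.

```python
import signal
import subprocess


def died_from_segfault(returncode):
    """True if a return code indicates the process was killed by SIGSEGV.

    subprocess reports death by signal as a negative signal number on POSIX;
    a shell in between typically reports 128 + signal number instead.
    """
    segv = int(signal.SIGSEGV)
    return returncode in (-segv, 128 + segv)


def run_and_check(argv):
    """Run a command and return (returncode, segfaulted)."""
    proc = subprocess.run(argv, capture_output=True)
    return proc.returncode, died_from_segfault(proc.returncode)
```

Calling run_and_check with the argument list of the ffmpeg command under test would return the exit code together with a flag indicating whether the process segfaulted.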

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

Used Gemini to help debug and explain ffmpeg parameters; the code was modified by hand.
...

@mertalev
Member

mertalev commented Jan 6, 2026

Do you have an example video that fails with the previous command and succeeds with this?

@yuhuan417
Author

> Do you have an example video that fails with the previous command and succeeds with this?

Yes, specific example videos are attached to issue #24254.

Link to one of them:
https://github.com/user-attachments/assets/54ef1691-8f3c-41dd-89db-753ef82ba039

@yuhuan417
Author

Previous command and log:

# /usr/bin/ffmpeg -skip_frame nointra -sws_flags accurate_rnd+full_chroma_int -i "/myphoto/2025-05/jintong/2025-05-17 12.58.41.MOV" -y -fps_mode vfr -frames:v 1 -update 1 -v verbose -vf 'fps=12:start_time=0:eof_action=pass:round=down,thumbnail=12,select=gt(scene\,0.1)-eq(prev_selected_n\,n)+isnan(prev_selected_n)+gt(n\,20),trim=end_frame=2,reverse' /tmp/1.jpeg
ffmpeg version 7.1.1-Jellyfin Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 14 (Debian 14.2.0-19)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto=auto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-opencl --enable-libdrm --enable-libxml2 --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libharfbuzz --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libsvtav1 --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-vaapi --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.101 / 61. 19.101
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[aist#0:1/pcm_s16le @ 0x55bb284b5500] Guessed Channel Layout: mono
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/myphoto/2025-05/jintong/2025-05-17 12.58.41.MOV':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2025-05-17T04:59:15.000000Z
    com.apple.quicktime.location.accuracy.horizontal: 13.328273
    com.apple.quicktime.live-photo.auto: 1
    com.apple.quicktime.full-frame-rate-playback-intent: 0
    com.apple.quicktime.live-photo.vitality-score: 1.000000
    com.apple.quicktime.live-photo.vitality-scoring-version: 0
    com.apple.quicktime.location.ISO6709: +40.0291+116.3122+055.303/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 12 Pro
    com.apple.quicktime.software: 18.4.1
    com.apple.quicktime.creationdate: 2025-05-17T12:58:41+0800
    com.apple.quicktime.content.identifier: 186B093E-F74C-413B-B2F4-F617EBAD320B
  Duration: 00:00:02.27, start: 0.000000, bitrate: 11880 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main), 1 reference frame (hvc1 / 0x31637668), yuvj420p(pc, smpte170m/smpte432/bt709, left), 1920x1440, 11128 kb/s, 23.81 fps, 29.97 tbr, 600 tbn (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]
        encoder         : HEVC
      Side data:
        Frame cropping: 88/88/66/66
  Stream #0:1[0x2](und): Audio: pcm_s16le (lpcm / 0x6D63706C), 44100 Hz, mono, s16, 705 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Audio
        vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
  Stream #0:3[0x4](und): Data: none (mebx / 0x7862656D), 27 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
  Stream #0:4[0x5](und): Data: none (mebx / 0x7862656D), 43 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
[out#0/image2 @ 0x55bb284e1240] No explicit maps, mapping streams automatically...
[vost#0:0/mjpeg @ 0x55bb284b5dc0] Created video stream from input stream 0:0
[Parsed_thumbnail_1 @ 0x55bb284b4300] batch size: 12 frames
[Parsed_fps_0 @ 0x55bb284d8280] 0 frames in, 0 frames out; 0 frames dropped, 0 frames duplicated.
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native))
[vost#0:0/mjpeg @ 0x55bb284b5dc0] Starting thread...
[vf#0:0 @ 0x55bb284c2cc0] Starting thread...
[vist#0:0/hevc @ 0x55bb284b5380] [dec:hevc @ 0x55bb284e03c0] Starting thread...
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x55bb284b0240] Starting thread...
Press [q] to stop, [?] for help
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x55bb284b0240] EOF while reading input
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x55bb284b0240] Terminating thread with return code 0 (success)
[vist#0:0/hevc @ 0x55bb284b5380] [dec:hevc @ 0x55bb284e03c0] Decoder thread received EOF packet
[vist#0:0/hevc @ 0x55bb284b5380] [dec:hevc @ 0x55bb284e03c0] Decoder returned EOF, finishing
[Parsed_thumbnail_1 @ 0x7fe924001200] batch size: 12 frames
[graph -1 input from stream 0:0 @ 0x7fe92400ef40] w:1920 h:1440 pixfmt:yuvj420p tb:1/600 fr:30000/1001 sar:0/1 csp:smpte170m range:pc
[crop @ 0x7fe92400f740] w:1920 h:1440 sar:0/1 -> w:1744 h:1308 sar:0/1
[Parsed_fps_0 @ 0x7fe924001300] Set first pts to (in:0 out:0) from start time 0.000000
[Parsed_fps_0 @ 0x7fe924001300] fps=12/1
[graph -1 input from stream 0:0 @ 0x7fe92400ef40] video frame properties congruent with link at pts_time: 0
[vist#0:0/hevc @ 0x55bb284b5380] [dec:hevc @ 0x55bb284e03c0] Terminating thread with return code 0 (success)
[Parsed_thumbnail_1 @ 0x7fe924001200] frame id #0 (pts_time=0.000000) selected from a set of 12 images
[Parsed_thumbnail_1 @ 0x7fe924001200] frame id #0 (pts_time=1.000000) selected from a set of 1 images
Segmentation fault

New command and log:

# /usr/bin/ffmpeg -skip_frame nointra -sws_flags accurate_rnd+full_chroma_int -i "/myphoto/2025-05/jintong/2025-05-17 12.58.41.MOV" -y -fps_mode vfr -frames:v 1 -update 1 -v verbose -vf 'setpts=N,fps=12:start_time=0:eof_action=pass:round=down,thumbnail=12,select=gt(scene\,0.1)-eq(prev_selected_n\,n)+isnan(prev_selected_n)+gt(n\,20),trim=end_frame=2,reverse' /tmp/1.jpeg
ffmpeg version 7.1.1-Jellyfin Copyright (c) 2000-2025 the FFmpeg developers
  built with gcc 14 (Debian 14.2.0-19)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto=auto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-opencl --enable-libdrm --enable-libxml2 --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libharfbuzz --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libsvtav1 --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-vaapi --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      59. 39.100 / 59. 39.100
  libavcodec     61. 19.101 / 61. 19.101
  libavformat    61.  7.100 / 61.  7.100
  libavdevice    61.  3.100 / 61.  3.100
  libavfilter    10.  4.100 / 10.  4.100
  libswscale      8.  3.100 /  8.  3.100
  libswresample   5.  3.100 /  5.  3.100
  libpostproc    58.  3.100 / 58.  3.100
[aist#0:1/pcm_s16le @ 0x557a4b0dc500] Guessed Channel Layout: mono
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/myphoto/2025-05/jintong/2025-05-17 12.58.41.MOV':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    creation_time   : 2025-05-17T04:59:15.000000Z
    com.apple.quicktime.location.accuracy.horizontal: 13.328273
    com.apple.quicktime.live-photo.auto: 1
    com.apple.quicktime.full-frame-rate-playback-intent: 0
    com.apple.quicktime.live-photo.vitality-score: 1.000000
    com.apple.quicktime.live-photo.vitality-scoring-version: 0
    com.apple.quicktime.location.ISO6709: +40.0291+116.3122+055.303/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 12 Pro
    com.apple.quicktime.software: 18.4.1
    com.apple.quicktime.creationdate: 2025-05-17T12:58:41+0800
    com.apple.quicktime.content.identifier: 186B093E-F74C-413B-B2F4-F617EBAD320B
  Duration: 00:00:02.27, start: 0.000000, bitrate: 11880 kb/s
  Stream #0:0[0x1](und): Video: hevc (Main), 1 reference frame (hvc1 / 0x31637668), yuvj420p(pc, smpte170m/smpte432/bt709, left), 1920x1440, 11128 kb/s, 23.81 fps, 29.97 tbr, 600 tbn (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]
        encoder         : HEVC
      Side data:
        Frame cropping: 88/88/66/66
  Stream #0:1[0x2](und): Audio: pcm_s16le (lpcm / 0x6D63706C), 44100 Hz, mono, s16, 705 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Audio
        vendor_id       : [0][0][0][0]
  Stream #0:2[0x3](und): Data: none (mebx / 0x7862656D), 0 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
  Stream #0:3[0x4](und): Data: none (mebx / 0x7862656D), 27 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
  Stream #0:4[0x5](und): Data: none (mebx / 0x7862656D), 43 kb/s (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Metadata
[out#0/image2 @ 0x557a4b108240] No explicit maps, mapping streams automatically...
[vost#0:0/mjpeg @ 0x557a4b0dcdc0] Created video stream from input stream 0:0
[Parsed_thumbnail_2 @ 0x557a4b0da0c0] batch size: 12 frames
[Parsed_fps_1 @ 0x557a4b0db300] 0 frames in, 0 frames out; 0 frames dropped, 0 frames duplicated.
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> mjpeg (native))
[vost#0:0/mjpeg @ 0x557a4b0dcdc0] Starting thread...
[vf#0:0 @ 0x557a4b0e9cc0] Starting thread...
[vist#0:0/hevc @ 0x557a4b0dc380] [dec:hevc @ 0x557a4b0fe440] Starting thread...
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240] Starting thread...
Press [q] to stop, [?] for help
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240] EOF while reading input
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240] Terminating thread with return code 0 (success)
[vist#0:0/hevc @ 0x557a4b0dc380] [dec:hevc @ 0x557a4b0fe440] Decoder thread received EOF packet
[vist#0:0/hevc @ 0x557a4b0dc380] [dec:hevc @ 0x557a4b0fe440] Decoder returned EOF, finishing
[Parsed_thumbnail_2 @ 0x7f94e80012c0] batch size: 12 frames
[graph -1 input from stream 0:0 @ 0x7f94e800f800] w:1920 h:1440 pixfmt:yuvj420p tb:1/600 fr:30000/1001 sar:0/1 csp:smpte170m range:pc
[crop @ 0x7f94e8010000] w:1920 h:1440 sar:0/1 -> w:1744 h:1308 sar:0/1
[crop @ 0x7f94e8010000] TB:0.001667 FRAME_RATE:29.970030 SAMPLE_RATE:nan
[Parsed_fps_1 @ 0x7f94e80026c0] Set first pts to (in:0 out:0) from start time 0.000000
[Parsed_fps_1 @ 0x7f94e80026c0] fps=12/1
[graph -1 input from stream 0:0 @ 0x7f94e800f800] video frame properties congruent with link at pts_time: 0
[vist#0:0/hevc @ 0x557a4b0dc380] [dec:hevc @ 0x557a4b0fe440] Terminating thread with return code 0 (success)
[Parsed_thumbnail_2 @ 0x7f94e80012c0] frame id #0 (pts_time=0.000000) selected from a set of 1 images
Output #0, image2, to '/tmp/1.jpeg':
  Metadata:
    major_brand     : qt  
    minor_version   : 0
    compatible_brands: qt  
    com.apple.quicktime.content.identifier: 186B093E-F74C-413B-B2F4-F617EBAD320B
    com.apple.quicktime.location.accuracy.horizontal: 13.328273
    com.apple.quicktime.live-photo.auto: 1
    com.apple.quicktime.full-frame-rate-playback-intent: 0
    com.apple.quicktime.live-photo.vitality-score: 1.000000
    com.apple.quicktime.live-photo.vitality-scoring-version: 0
    com.apple.quicktime.location.ISO6709: +40.0291+116.3122+055.303/
    com.apple.quicktime.make: Apple
    com.apple.quicktime.model: iPhone 12 Pro
    com.apple.quicktime.software: 18.4.1
    com.apple.quicktime.creationdate: 2025-05-17T12:58:41+0800
    encoder         : Lavf61.7.100
  Stream #0:0(und): Video: mjpeg, 1 reference frame, yuvj420p(pc, smpte170m/smpte432/bt709, progressive, left), 1744x1308, q=2-31, 200 kb/s, 12 fps, 12 tbn (default)
      Metadata:
        creation_time   : 2025-05-17T04:59:15.000000Z
        handler_name    : Core Media Video
        vendor_id       : [0][0][0][0]
        encoder         : Lavc61.19.101 mjpeg
      Side data:
        cpb: bitrate max/min/avg: 0/0/200000 buffer size: 0 vbv_delay: N/A
[out#0/image2 @ 0x557a4b108240] Starting thread...
[vf#0:0 @ 0x557a4b0e9cc0] All consumers returned EOF
[Parsed_fps_1 @ 0x7f94e80026c0] 2 frames in, 1 frames out; 1 frames dropped, 0 frames duplicated.
[vf#0:0 @ 0x557a4b0e9cc0] Terminating thread with return code 0 (success)
[AVIOContext @ 0x7f94e8001e00] Statistics: 57648 bytes written, 0 seeks, 1 writeouts
[vost#0:0/mjpeg @ 0x557a4b0dcdc0] Encoder thread received EOF
[vost#0:0/mjpeg @ 0x557a4b0dcdc0] Terminating thread with return code 0 (success)
[out#0/image2 @ 0x557a4b108240] All streams finished
[out#0/image2 @ 0x557a4b108240] Terminating thread with return code 0 (success)
[out#0/image2 @ 0x557a4b108240] Output file #0 (/tmp/1.jpeg):
[out#0/image2 @ 0x557a4b108240]   Output stream #0:0 (video): 1 frames encoded; 1 packets muxed (57648 bytes); 
[out#0/image2 @ 0x557a4b108240]   Total: 1 packets (57648 bytes) muxed
[out#0/image2 @ 0x557a4b108240] video:56KiB audio:0KiB subtitle:0KiB other streams:0KiB global headers:0KiB muxing overhead: unknown
frame=    1 fps=0.0 q=7.5 Lsize=N/A time=00:00:00.08 bitrate=N/A speed=0.403x    
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240] Input file #0 (/myphoto/2025-05/jintong/2025-05-17 12.58.41.MOV):
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240]   Input stream #0:0 (video): 54 packets read (3155427 bytes); 2 frames decoded; 0 decode errors; 
[in#0/mov,mp4,m4a,3gp,3g2,mj2 @ 0x557a4b0d7240]   Total: 54 packets (3155427 bytes) demuxed
[AVIOContext @ 0x557a4b0dfc80] Statistics: 3322595 bytes read, 3 seeks

@DaCHack

DaCHack commented Jan 17, 2026

@mertalev Did you have a chance to review the code? I have a similar issue and hope this PR could solve it.

@yuhuan417 changed the title from "feat: add setpts=N filter before fps=12 in thumbnail generation" to "fix: add setpts=N filter before fps=12 in thumbnail generation" on Jan 29, 2026

Development

Successfully merging this pull request may close these issues.

SIGSEGV crash in ffmpeg when processing short videos for thumbnail generation