---
name: video-content-extractor
description: "Extract key frames from MP4 videos at configurable intervals, run Tesseract OCR, and generate structured Markdown reports with video metadata and timestamped text transcripts."
category: media-processing
risk: safe
source: community
source_repo: 274326424/video-content-extractor
source_type: community
date_added: "2026-06-06"
author: 274326424
tags: [video, ocr, ffmpeg, tesseract, frame-extraction, media]
tools: [codex]
---

# Video Content Extractor

## Overview

This skill extracts key frames from MP4 video files at a configurable time interval, runs OCR on each frame, and generates a structured Markdown report. The report includes video metadata (duration, resolution, codecs) and a frame-by-frame OCR transcript with timestamp references.

The skill is designed for Codex CLI and requires FFmpeg and Tesseract OCR to be installed on the local machine.

## When to Use This Skill

- Use when you need to extract text content from video presentations, lectures, or screencasts.
- Use when you want to create searchable transcripts from video files without embedded subtitles.
- Use when you need to analyze video content programmatically and generate structured summaries.
- Use when the user asks to "read what is on screen" or "extract the content from this video."

## How It Works

### Step 1: Analyze Video Metadata

The skill uses ffprobe to extract video metadata: duration, resolution, frame rate, codec information, and file size.
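
As a rough illustration of this step, the sketch below asks ffprobe for JSON output and picks out the fields listed above. The function names (`build_ffprobe_cmd`, `parse_metadata`, `probe`) and the returned dictionary keys are hypothetical and are not taken from the skill's script; only the ffprobe options themselves are standard.

```python
import json
import subprocess


def build_ffprobe_cmd(video_path: str) -> list[str]:
    """Build an ffprobe command that prints container and stream info as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        video_path,
    ]


def parse_metadata(ffprobe_json: str) -> dict:
    """Pick out the fields the report needs (key names are illustrative)."""
    data = json.loads(ffprobe_json)
    fmt = data.get("format", {})
    streams = data.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), {})
    audio = next((s for s in streams if s.get("codec_type") == "audio"), {})

    # ffprobe reports the frame rate as a fraction string such as "30000/1001".
    num, _, den = video.get("r_frame_rate", "0/1").partition("/")
    denominator = float(den or 1)
    fps = float(num) / denominator if denominator else 0.0

    return {
        "duration_s": float(fmt.get("duration", 0.0)),
        "size_bytes": int(fmt.get("size", 0)),
        "resolution": f"{video.get('width', 0)}x{video.get('height', 0)}",
        "fps": round(fps, 3),
        "video_codec": video.get("codec_name"),
        "audio_codec": audio.get("codec_name"),
    }


def probe(video_path: str) -> dict:
    """Run ffprobe (must be on PATH) and return the parsed metadata."""
    result = subprocess.run(
        build_ffprobe_cmd(video_path),
        capture_output=True, text=True, check=True,
    )
    return parse_metadata(result.stdout)
```

Keeping command construction and JSON parsing separate from the `subprocess` call makes the parsing logic testable on machines without FFmpeg installed.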

### Step 2: Extract Key Frames

Using FFmpeg, the skill captures frames at the configured interval (default: every 30 seconds). Each frame is saved as a timestamped JPEG image.
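
One way to implement time-based capture is to seek to each timestamp and write a single frame. Whether the skill does this or uses a single FFmpeg call with an `fps` filter is not stated here, so treat the following as a sketch under that assumption. The helper names and the `frame_00h01m30s.jpg` naming pattern are hypothetical.

```python
import subprocess
from pathlib import Path


def frame_timestamps(duration_s: float, interval_s: int = 30) -> list[int]:
    """Return capture times in whole seconds: 0, interval, 2*interval, ..."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Very short videos still get one frame at t=0.
    return list(range(0, int(duration_s), interval_s)) or [0]


def frame_name(ts: int) -> str:
    """Encode the timestamp in the file name (naming scheme is illustrative)."""
    hours, rem = divmod(ts, 3600)
    minutes, seconds = divmod(rem, 60)
    return f"frame_{hours:02d}h{minutes:02d}m{seconds:02d}s.jpg"


def build_ffmpeg_cmd(video_path: str, ts: int, out_dir: str) -> list[str]:
    """Build an FFmpeg command that grabs one JPEG frame at `ts` seconds."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-ss", str(ts),        # seek to the timestamp before decoding
        "-i", video_path,
        "-frames:v", "1",      # write a single video frame
        "-q:v", "2",           # JPEG quality scale (lower means better)
        "-y",                  # overwrite an existing frame file
        str(Path(out_dir) / frame_name(ts)),
    ]


def extract_frames(video_path: str, duration_s: float,
                   out_dir: str, interval_s: int = 30) -> list[Path]:
    """Extract one frame per interval; requires ffmpeg on PATH."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for ts in frame_timestamps(duration_s, interval_s):
        cmd = build_ffmpeg_cmd(video_path, ts, out_dir)
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written
```

Passing the arguments as a list (no shell) keeps file names with spaces or special characters from being interpreted by a shell.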

### Step 3: OCR Text Recognition

Each extracted frame is processed by Tesseract OCR. If the default page segmentation mode (PSM) returns no meaningful text, the skill retries the frame with fully automatic page segmentation.
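
The retry logic can be sketched as below. The exact PSM values the skill uses are not given in this document, so the sketch assumes a first pass with `--psm 6` (single uniform block of text) and a fallback to `--psm 3` (fully automatic page segmentation without orientation detection). The "meaningful text" test is also a hypothetical heuristic.

```python
import subprocess
from typing import Callable


def run_tesseract(image_path: str, lang: str, psm: int) -> str:
    """Call the tesseract CLI and return the recognized text from stdout."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def is_meaningful(text: str, min_chars: int = 3) -> bool:
    """Assumed heuristic: at least `min_chars` letters or digits."""
    return sum(ch.isalnum() for ch in text) >= min_chars


def ocr_with_fallback(
    image_path: str,
    lang: str = "chi_sim+eng",
    first_psm: int = 6,       # assumption, not confirmed by the skill
    fallback_psm: int = 3,    # fully automatic page segmentation, no OSD
    run: Callable[[str, str, int], str] = run_tesseract,
) -> str:
    """OCR one frame, retrying with the fallback PSM if the result is empty."""
    text = run(image_path, lang, first_psm)
    if not is_meaningful(text):
        text = run(image_path, lang, fallback_psm)
    return text.strip()
```

The `run` parameter exists only so the fallback logic can be exercised with a stub instead of a real Tesseract installation.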

### Step 4: Generate Markdown Report

All extracted data is assembled into a structured Markdown document.
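
The report layout is not specified in this document. The sketch below shows one plausible structure: a metadata list followed by one section per frame. The heading names, the `(no text detected)` placeholder, and the function names are all assumptions.

```python
def format_timestamp(seconds: int) -> str:
    """Format a number of seconds as HH:MM:SS."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_report(title: str, metadata: dict,
                  frames: list[tuple[int, str, str]]) -> str:
    """Assemble a Markdown report.

    `frames` holds (timestamp_seconds, image_path, ocr_text) tuples.
    """
    lines = [f"# {title}", "", "## Video Metadata", ""]
    lines += [f"- {key}: {value}" for key, value in metadata.items()]
    lines += ["", "## Frame Transcripts", ""]
    for ts, image_path, text in frames:
        lines += [
            f"### {format_timestamp(ts)}",
            "",
            f"![frame]({image_path})",
            "",
            text or "(no text detected)",
            "",
        ]
    return "\n".join(lines).rstrip() + "\n"
```

A caller would write the returned string to a file such as `lecture.md`, next to the directory that holds the extracted frames.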

## Examples

### Example 1: Basic Extraction

Agent prompt:

Use the video-content-extractor skill to extract content from lecture.mp4

This generates `lecture.md` and a `lecture_frames/` directory.

### Example 2: Custom Interval

Parameters: `video_path`, `output_dir`, `interval` (seconds), `lang`

Extract every 60 seconds with English-only OCR:

`python scripts/extract_video.py recording.mp4 ./output 60 eng`

### Example 3: Bilingual Content

Extract every 15 seconds with the default Chinese + English OCR:

`python scripts/extract_video.py lecture.mp4 . 15 chi_sim+eng`

## Best Practices

- Use shorter intervals (10-15s) for fast-paced content with frequent text changes.
- Use longer intervals (30-60s) for presentation slides or slow lectures to reduce duplicate frames.
- For Chinese content, ensure the Tesseract Chinese language pack (`chi_sim`) is installed.

## Limitations

- Requires FFmpeg and Tesseract OCR to be installed and accessible via PATH.
- Tesseract OCR accuracy depends on video quality, text size, and font clarity.
- Does not extract audio or perform speech-to-text transcription.
- Frame extraction is time-based (not scene-change-based), which may produce near-duplicate frames.
- Large videos with short intervals can generate many frames - ensure sufficient disk space.
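
To judge the last point before a run, the frame count can be estimated from the video duration and the interval. The sketch below is plain arithmetic; the average JPEG size passed to `estimate_disk_mib` is something you would have to measure for your own videos, and the 200 KiB figure in the usage lines is purely illustrative.

```python
import math


def estimate_frames(duration_s: float, interval_s: int) -> int:
    """Approximate number of frames for a time-based extraction."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return max(1, math.ceil(duration_s / interval_s))


def estimate_disk_mib(frame_count: int, avg_frame_kib: float) -> float:
    """Approximate disk usage in MiB for a given average frame size."""
    return frame_count * avg_frame_kib / 1024


# A two-hour video sampled every 10 seconds versus every 30 seconds.
short_interval = estimate_frames(2 * 3600, 10)   # 720 frames
long_interval = estimate_frames(2 * 3600, 30)    # 240 frames

# Assumed average size of 200 KiB per frame, for illustration only.
print(short_interval, estimate_disk_mib(short_interval, 200))
print(long_interval, estimate_disk_mib(long_interval, 200))
```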

## Security and Safety Notes

- This skill only reads video files and writes extracted frames and Markdown reports.
- It does NOT send any data over the network - all processing is local.
- FFmpeg and Tesseract are invoked with fixed, pre-vetted arguments.
- The skill does not modify or delete the original video file.

## Common Pitfalls

- Problem: Tesseract returns garbled text
  Solution: Ensure the correct language pack is installed. Run `tesseract --list-langs` to verify.

- Problem: FFmpeg fails with "not found"
  Solution: Make sure FFmpeg is on PATH. Run `ffmpeg -version` to verify.

- Problem: OCR is slow on large videos
  Solution: Increase the interval parameter to reduce frames processed.
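
The first two checks can also be automated before a run. The sketch below is a hypothetical preflight helper and is not part of the skill's script. It looks up the tools on PATH and parses the output of `tesseract --list-langs`, assuming that output starts with a "List of available languages" header line followed by one language code per line.

```python
import shutil
import subprocess


def missing_tools(tools=("ffmpeg", "ffprobe", "tesseract")) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def parse_langs(list_langs_output: str) -> set[str]:
    """Parse `tesseract --list-langs` output, skipping the header line."""
    lines = [ln.strip() for ln in list_langs_output.splitlines() if ln.strip()]
    return {ln for ln in lines if not ln.lower().startswith("list of")}


def missing_langs(lang_spec: str, installed: set[str]) -> list[str]:
    """Return the parts of a spec such as 'chi_sim+eng' that are not installed."""
    return [lang for lang in lang_spec.split("+") if lang not in installed]


def installed_langs() -> set[str]:
    """Query Tesseract for its installed language packs."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Some builds print the list to stderr, so both streams are parsed.
    return parse_langs(result.stdout + result.stderr)
```

Calling `missing_tools()` first avoids a confusing `FileNotFoundError` from `installed_langs()` when Tesseract itself is not installed.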

## Related Skills

- @media-summarizer - For summarizing video content using visual and audio cues.
- @document-ocr - For OCR on static images or scanned documents without video processing.