Upload any video and let AI transcribe, analyse, and extract the most compelling moments — complete with captions, hashtags, and your watermark. Ready to post in minutes.
No manual editing required. The agent handles everything — from speech recognition to clip selection and social caption generation.
Drop any movie or video file directly from your browser, or select one already on the server. Chunked upload supports files up to 15 GB with zero data loss on refresh.
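The resumable part of a chunked upload comes down to planning stable byte ranges the client can replay. A minimal sketch of that planning step, with illustrative names (`CHUNK_SIZE`, `plan_chunks` are not the app's actual API):

```python
# Hypothetical sketch of chunk planning for a resumable uploader.
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per chunk (illustrative size)

def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[tuple[int, int]]:
    """Return (start, end) byte ranges covering the file; end is exclusive.
    Because the ranges are deterministic, a client can persist the index of
    the last acknowledged chunk and resume from it after a refresh."""
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]
```

Deterministic ranges are what make the refresh lossless: the server only ever re-receives chunks it never acknowledged.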
Whisper transcribes the audio while scene detection maps the structure. An LLM then picks the most engaging, spoiler-free moments that stand alone without full movie context.
Each clip is encoded with your watermark, merged with your end card, and packaged with AI-generated captions, descriptions, and 20 trending hashtags — ready for any platform.
Every step of the clip creation pipeline is automated, configurable, and production-ready.
LLM-powered clip picker avoids the climax, twists, and plot reveals. Selects engaging moments from the safe 10–70% timeline window.
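The spoiler guard itself is simple arithmetic: a candidate moment only qualifies if it sits entirely inside the 10–70% stretch of the runtime. A sketch of that filter (function name is illustrative):

```python
def in_safe_window(start_s: float, end_s: float, duration_s: float,
                   lo: float = 0.10, hi: float = 0.70) -> bool:
    """True if the candidate moment falls entirely inside the 10-70% band
    of the timeline, skipping cold opens and anything near the climax."""
    return start_s >= lo * duration_s and end_s <= hi * duration_s
```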
Generates a punchy hook caption, a 2–3 sentence description, and 15–20 trending hashtags for every clip. Download as JSON or ZIP.
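Before the JSON download is offered, LLM output needs light normalisation so every clip ships with clean, capped hashtags. A sketch of that packaging step, assuming illustrative field names:

```python
import json

def package_clip_metadata(hook: str, description: str, hashtags: list[str],
                          limit: int = 20) -> str:
    """Normalise hashtags (dedupe case-insensitively, ensure a '#' prefix,
    cap at 20) and emit the JSON document offered for download.
    Field names here are illustrative, not the app's actual schema."""
    seen, tags = set(), []
    for tag in hashtags:
        tag = "#" + tag.lstrip("#").strip()
        if tag != "#" and tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return json.dumps({"hook": hook, "description": description,
                       "hashtags": tags[:limit]}, indent=2)
```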
Add text, logo, or movie-name watermarks. Auto-detects letterbox black bars and places overlays within the actual content area at the right scale.
Drop an end_merge_video.mp4 into the folder and every clip gets it appended — auto-scaled to match the clip's resolution and frame rate.
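Matching an end card to an arbitrary clip usually means scale, pad, retime, and SAR reset in one filter chain. A hedged sketch of the FFmpeg invocation such a step might build (argument builder and names are illustrative):

```python
def end_card_encode_args(end_card: str, out: str,
                         width: int, height: int, fps: float) -> list[str]:
    """ffmpeg arguments that re-encode the end card to the clip's exact
    resolution and frame rate so the two streams can be concatenated.
    Letterboxes via pad when the aspect ratios differ."""
    vf = (f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
          f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,fps={fps},setsar=1")
    return ["ffmpeg", "-y", "-i", end_card, "-vf", vf,
            "-c:v", "libx264", "-c:a", "aac", out]
```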
Redis-backed sessions give each browser tab its own isolated state. Open the app in two tabs and process different videos simultaneously.
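Per-tab isolation comes down to namespacing state under a tab-specific key with an expiry. A minimal sketch, assuming a redis-py-style client and illustrative key names:

```python
import json

SESSION_TTL_S = 24 * 60 * 60  # 24-hour TTL; Redis drops the key afterwards

def session_key(tab_id: str) -> str:
    """Each browser tab carries its own id, so its state lives under its own key."""
    return f"session:{tab_id}"

def save_session(redis_client, tab_id: str, state: dict) -> None:
    """SETEX stores value and TTL atomically; redis-py exposes it as setex()."""
    redis_client.setex(session_key(tab_id), SESSION_TTL_S, json.dumps(state))
```

Because nothing is shared between keys, two tabs processing different videos never see each other's progress.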
Three Whisper modes: CPU (base), GPU large-v3, and faster-whisper with real-time per-segment progress. CUDA memory is released after each job.
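faster-whisper yields segments lazily, so per-segment progress can be reported as decoding happens; after the job, VRAM can be released by dropping the model and calling `torch.cuda.empty_cache()`. A sketch of the progress wrapper (names illustrative; `segments` is any iterable of objects with `.start`/`.end`/`.text`, the shape faster-whisper yields):

```python
def iter_with_progress(segments, total_duration: float, report=print):
    """Wrap a lazy segment generator and report progress as each segment
    is decoded, using the segment end time against the audio duration."""
    for seg in segments:
        pct = min(100.0, 100.0 * seg.end / total_duration)
        report(f"{pct:5.1f}%  [{seg.start:7.2f}-{seg.end:7.2f}] {seg.text}")
        yield seg
```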
Plug in any provider for clip selection and caption generation. Or run fully offline with Ollama — no API key required.
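A pluggable provider only needs to expose one `complete(prompt)` shape; the offline path can then talk to Ollama's local `/api/generate` endpoint. A sketch using only the standard library (model name and URL are illustrative defaults, not the app's configuration):

```python
import json
import urllib.request

def ollama_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint; stream=False asks
    for one JSON reply instead of a token stream."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_complete(prompt: str, model: str = "llama3",
                    url: str = "http://localhost:11434") -> str:
    """Fully offline completion against a local Ollama server; no API key.
    Any cloud provider can slot in behind the same complete(prompt) shape."""
    body = json.dumps(ollama_payload(model, prompt)).encode()
    req = urllib.request.Request(f"{url}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```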
Built for real video workflows. Handles edge cases like non-square pixel ratios, missing audio streams, Windows MAX_PATH limits, and long filenames.
Detects black bars and places overlays only within the visible content area.
End card is automatically re-encoded to match the clip's exact resolution and frame rate.
Refresh the page at any step — progress, uploads, and results are fully restored.
Queue multiple videos and process them all simultaneously in background threads.
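The queue-plus-workers shape behind that is standard library territory: a thread pool draining a job queue. A minimal sketch of such a runner (class and names are illustrative, not the app's internals):

```python
import queue
import threading

class JobRunner:
    """Minimal background job runner: daemon workers drain a shared queue,
    the same shape a threaded video pipeline can use."""
    def __init__(self, workers: int = 2):
        self.q = queue.Queue()
        self.results = {}
        for _ in range(workers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            job_id, fn, args = self.q.get()
            try:
                self.results[job_id] = fn(*args)
            finally:
                self.q.task_done()  # lets wait() unblock even on failure

    def submit(self, job_id, fn, *args):
        self.q.put((job_id, fn, args))

    def wait(self):
        self.q.join()
```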
Open-source stack that runs on your own hardware — no cloud required.
State-of-the-art speech recognition for accurate transcription in any language.
Lossless extraction, filter-complex watermarking, and SAR-normalised concat.
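SAR normalisation matters because FFmpeg's concat filter refuses inputs whose sample aspect ratios differ; resetting every stream with `setsar=1` first avoids that. A sketch of building such a filter graph (function name is illustrative):

```python
def concat_filter(n_inputs: int) -> str:
    """filter_complex string that normalises each input's sample aspect
    ratio (setsar=1) before concat, so a clip and its end card always join
    cleanly regardless of how each was encoded."""
    chains = "".join(f"[{i}:v]setsar=1[v{i}];" for i in range(n_inputs))
    inputs = "".join(f"[v{i}][{i}:a]" for i in range(n_inputs))
    return f"{chains}{inputs}concat=n={n_inputs}:v=1:a=1[v][a]"
```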
Server-side session storage with 24-hour TTL keeps every tab's state independent.
Threaded background job runner, chunked uploads, and a clean REST API.
Open the app, upload your first video, and get AI-generated clips with captions in minutes. No account required.