> pipeline/stages
how it works
Six stages, fully automated. We run them on our infrastructure, so you never install anything. Human oversight is optional at each stage.
- 01 topic-research
Telegram messages, RSS feeds, and Tavily web research ingested and clustered by topic. AI picks the editorially strongest, most diverse cluster.
telegram api, tavily, gpt-4o-mini clustering
- 02 script-generation
A LangGraph critique loop drafts the script, then self-critiques for banned modifiers, journalistic accuracy, and narrative arc. Prompts live in Django, editable via UI.
langgraph, gpt-4.1, critique loop (3 iterations)
- 03 visual-selection
For each segment, 5 image candidates fetched from DuckDuckGo. gpt-4o-mini (vision) picks best fit given full narrative context.
duckduckgo images, gpt-4o-mini vision, disk cache
- 04 voice-synthesis
OpenAI TTS (tts-1-hd, onyx) generates audio per segment. ffmpeg strips leading silence and normalizes loudness to −16 LUFS, measured per EBU R128.
openai tts-1-hd, ffmpeg loudnorm, silenceremove
- 05 video-assembly
Two paths: ffmpeg ken-burns slideshow for landscape long-form, or Remotion React compositions for vertical shorts.
ffmpeg, remotion, react compositions
- 06 captions-and-delivery
Whisper transcribes audio to word-level SRT. Captions burned in via ffmpeg. The full production folder ships to your Google Drive or Dropbox — final landscape + vertical MP4s, full segmented script (JSON), word-level caption timings, per-segment images (with vision-model reasoning), per-segment voiceover audio (mastered + raw), thumbnail PNG, metadata JSON with suggested title / description / tags. You can re-cut, repurpose images on socials, translate captions, or just upload the MP4 — the assets are yours.
whisper verbose_json, google drive / dropbox sync, full run-folder export
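For anyone re-cutting or translating captions from the exported word-level timings, here is a minimal sketch of how per-word timestamps can be turned into SRT cues. It assumes each entry looks like `{"word": ..., "start": ..., "end": ...}` with times in seconds, which is the shape Whisper's `verbose_json` uses for word timestamps; the exact layout of the exported timings file may differ. The function names `srt_timestamp` and `words_to_srt` are illustrative, not part of the pipeline.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict]) -> str:
    """Build SRT text with one cue per word.

    Assumption: each entry has "word", "start", and "end" keys,
    with start/end given in seconds.
    """
    cues = []
    for index, w in enumerate(words, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(w['start'])} --> {srt_timestamp(w['end'])}\n"
            f"{w['word'].strip()}\n"
        )
    # Each cue ends with a newline, so joining on "\n" leaves
    # the blank line that SRT expects between cues.
    return "\n".join(cues)
```

Grouping several words into one cue (for example, three or four words per line) is a small change to the loop and often reads better on vertical shorts.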
> founder beta — 5 spots
$297/mo, locked forever for the first 5 customers. Price goes to $497/mo after that. Lifetime spot ($4,997 one-time + $97/mo) also available, limited to 3.
We reply within 48h via @yps_support_bot on Telegram. Email backup for invoicing + formal correspondence.