@optima-chat/video-translate-tools

Version 1.0.9 • Published 4 weeks ago • Licence MIT • Deps 0 • Size 789 kB

Path-B styled subtitle rendering + BGM ducking for HeyGen-translated videos. Pairs with the gen video-translate CLI from @optima-chat/optima-gen-cli.

gen video-translate --video-url $URL --lang $LANG -o ./out
curl <caption_url> -o ./out/caption.srt
video-translate render-ass --srt ./out/caption.srt --lang $TAG --out ./out/subs.ass
video-translate mux        --raw ./out/translate_*.mp4 --ass ./out/subs.ass [--bgm ./bgm.wav] -o ./final.mp4

Install (in optima-ai-shell Dockerfile)

RUN apt-get update && apt-get install -y fonts-noto-core && \
    npm install -g @optima-chat/video-translate-tools@latest && \
    mkdir -p /usr/share/fonts/video-translate && \
    cp -r "$(video-translate fonts-dir)"/. /usr/share/fonts/video-translate/ && \
    fc-cache -f

CLI Reference

render-ass

Parse SRT → ASS path-B styled subtitles.

Flag                   Required  Description
--srt <path>           yes       HeyGen SRT file (BOM auto-stripped)
--lang <en|ms|vi|th>   yes       Target language. Determines font + max chars per line
--out <path>           yes       Output ASS path
--translations <path>  no        Save / read JSON intermediate. If the file exists it is used as-is; hand-edit it to mark a keyword as **word** for highlighting
--style <name>         no        Subtitle style preset (default classic). See Styles. An unknown value warns and falls back to classic

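
As a rough illustration of the **word** convention, one way a marker could be turned into an ASS inline override is a single substitution. This is a sketch under assumptions, not the package's code: the function name highlight_keywords and the pink value &HB469FF& (ASS colours are written in BGR order) are made up for the example, and the JSON layout of the --translations file is not documented here. In ASS, \3c sets the outline colour and \r resets to the line's style.

```shell
# Hypothetical helper: wrap each **word** marker in an ASS outline-colour override.
# The colour value is an illustrative pink, not the preset's real value.
highlight_keywords() {
  # \\ emits a literal backslash and \& a literal ampersand in the sed replacement.
  printf '%s\n' "$1" | sed -E 's/\*\*([^*]+)\*\*/{\\3c\&HB469FF\&}\1{\\r}/g'
}
```

Calling highlight_keywords 'big **sale** today' prints big {\3c&HB469FF&}sale{\r} today; text without markers passes through unchanged.
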
mux

Burn ASS onto HeyGen mp4. Optional BGM ducking.

Flag                Required  Description
--raw <path>        yes       HeyGen-output mp4 (has lip-synced voice)
--ass <path>        yes       ASS subtitle file
--out <path>        yes       Final mp4
--bgm <path>        no        If set, mix BGM under voice via asplit sidechain ducking
--fonts-dir <path>  no        Override fonts dir (defaults to bundled packages/video-translate-tools/fonts/)

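
For readers unfamiliar with sidechain ducking, the idea behind --bgm can be sketched with plain ffmpeg: split the voice track, use one copy as the compressor's key signal so the BGM dips whenever someone speaks, then mix the other copy back in. The following is a minimal sketch, not the filter graph the tool actually builds: the function names, stream labels, encoder choices and the threshold/ratio/attack/release numbers are all assumptions.

```shell
# Build a filter graph: input 0 = HeyGen mp4 (video + voice), input 1 = BGM.
# Assumes the ASS path contains no characters that need filter escaping.
build_duck_graph() {
  local ass=$1
  # printf reuses '%s' for every argument, so the pieces are concatenated.
  printf '%s' \
    "[0:a]asplit=2[voice][key];" \
    "[1:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=400[ducked];" \
    "[voice][ducked]amix=inputs=2:duration=first[aout];" \
    "[0:v]ass=${ass}[vout]"
}

# Burn subtitles and mix the ducked BGM in one ffmpeg pass (illustrative encoders).
mux_with_bgm() {
  local raw=$1 ass=$2 bgm=$3 out=$4
  ffmpeg -y -i "$raw" -i "$bgm" \
    -filter_complex "$(build_duck_graph "$ass")" \
    -map "[vout]" -map "[aout]" -c:v libx264 -c:a aac "$out"
}
```

sidechaincompress takes the signal to be compressed first (the BGM) and the key second (the voice copy), which is why asplit is needed: the voice has to feed both the compressor and the final mix.
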
fonts-dir

Prints the absolute path of the bundled fonts directory. The Dockerfile uses it to register the fonts.

Languages

Tag  Font       Max chars/line  Source
en   Bangers    32              npm bundle (this pkg)
ms   Bangers    32              npm bundle (this pkg)
vi   Noto Sans  30              apt-get install fonts-noto-core
th   Sarabun    42              npm bundle (this pkg)

Thai uses a higher limit (42 chars/line) because Thai is written without spaces between words; with a tighter limit, smartSplit would have to cut mid-word.
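
To see why missing spaces matter, here is a small bash sketch of a greedy line splitter. It is a hypothetical stand-in, not the package's smartSplit: the name greedy_split and its exact rules are assumptions. It wraps at spaces where it can, and falls back to a hard cut when a single token exceeds the limit, which is the situation unspaced text creates.

```shell
# Hypothetical stand-in for smartSplit (bash): wrap greedily at spaces,
# and hard-cut any single token that is longer than the limit.
greedy_split() {
  local max=$1 line="" word
  local -a words
  read -r -a words <<< "$2"        # split the text on whitespace
  for word in "${words[@]}"; do
    # Flush the current line if appending this word would overflow it.
    if [ -n "$line" ] && [ $(( ${#line} + 1 + ${#word} )) -gt "$max" ]; then
      printf '%s\n' "$line"
      line=""
    fi
    # No space to break on: the only option left is a cut inside the token.
    while [ "${#word}" -gt "$max" ]; do
      printf '%s\n' "${word:0:max}"
      word=${word:max}
    done
    if [ -n "$line" ]; then line="$line $word"; else line=$word; fi
  done
  if [ -n "$line" ]; then printf '%s\n' "$line"; fi
}
```

greedy_split 10 "hello big world" prints hello big and world on separate lines, while greedy_split 4 "abcdefghij" has no spaces to work with and prints abcd, efgh, ij. A larger limit makes that forced cut less frequent.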

Styles

--style <name> picks a preset. The default, classic, is the original look, so callers that don't pass --style see no change. A style only changes colour / outline / shadow / keyword treatment + the Latin (en/ms) font; the per-language font fallback above always applies, so th/vi never render as tofu (missing-glyph boxes).

name               en/ms font    vi font    Look
classic (default)  Bangers       Noto Sans  White + black outline, pink-outline keyword
pop-soft           Bangers       Noto Sans  classic + soft drop shadow (depth)
pop-3d             Bangers       Noto Sans  Magenta hard 3D offset shadow + yellow keyword
pop-hl             Bangers       Noto Sans  classic + bright-yellow filled keyword
anton              Anton         Anton      Tall condensed bold + soft shadow + yellow keyword
luckyguy           Luckiest Guy  Noto Sans  Rounded comic + magenta 3D shadow + yellow keyword

Bundled display fonts (fonts/): Bangers, Anton, Luckiest Guy. Anton is the only one of these that covers Vietnamese diacritics cleanly; the all-caps fonts (Bangers, Luckiest Guy) render vi as ugly mixed case, so vi falls back to Noto Sans for those styles.
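
The style and language tables combine into a simple lookup, sketched below. The function name pick_font is hypothetical, and the th branch assumes Thai always uses Sarabun regardless of style, which the tables imply but do not state outright. Unknown style names land in the default branch, mirroring the documented fallback to classic.

```shell
# Hypothetical lookup: which font a (style, lang) pair resolves to.
pick_font() {
  local style=$1 lang=$2
  case "$lang" in
    th) echo "Sarabun" ;;                  # assumption: same for every style
    vi)
      case "$style" in
        anton) echo "Anton" ;;             # covers Vietnamese diacritics
        *)     echo "Noto Sans" ;;         # all-caps display fonts fall back
      esac ;;
    en|ms)
      case "$style" in
        anton)    echo "Anton" ;;
        luckyguy) echo "Luckiest Guy" ;;
        *)        echo "Bangers" ;;        # classic, pop-*, and unknown names
      esac ;;
    *) echo "unsupported lang: $lang" >&2; return 1 ;;
  esac
}
```

For example, pick_font luckyguy vi prints Noto Sans, while pick_font luckyguy en prints Luckiest Guy.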

System deps

  • Node ≥ 20
  • ffmpeg with libass (apt-get install ffmpeg on Ubuntu)
  • For Vietnamese subs: fonts-noto-core apt pkg (standalone Noto Sans family)
  • For Thai subs: nothing extra to install; Sarabun-Regular.ttf is bundled in this npm package

Note: fonts-noto-cjk does NOT provide standalone Noto Sans — it ships only Noto Sans CJK * families. Use fonts-noto-core.
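
A quick way to confirm the right package is installed is to ask fontconfig for an exact family match. The sketch below is an assumption-laden helper, not part of this package: family_listed and check_vi_font are made-up names. It matches the family name exactly, so Noto Sans CJK JP does not count as Noto Sans.

```shell
# Reads fc-list family output on stdin; succeeds only on an exact family match.
# fc-list separates alternative family names with commas, so split them first.
family_listed() {
  tr ',' '\n' | grep -Fxq -- "$1"
}

# Hypothetical check for the Vietnamese font requirement.
check_vi_font() {
  if fc-list : family | family_listed "Noto Sans"; then
    echo "Noto Sans found"
  else
    echo "Noto Sans missing: install fonts-noto-core" >&2
    return 1
  fi
}
```

Running check_vi_font after fc-cache -f in the image build would surface a missing font at build time rather than in rendered subtitles.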

Scope

v1 ships subtitle rendering + BGM-ducked muxing only. Voice override (custom voice selection from the HeyGen library) is deferred to v2; the default flow uses HeyGen's auto-clone of the original speaker. See SPEC v3.1 §Scope.

License

MIT