@optima-chat/video-translate-tools
Path-B styled subtitle rendering + BGM ducking for HeyGen-translated videos.
Pairs with the gen video-translate CLI from @optima-chat/optima-gen-cli.
gen video-translate --video-url $URL --lang $LANG -o ./out
curl <caption_url> -o ./out/caption.srt
video-translate render-ass --srt ./out/caption.srt --lang $TAG --out ./out/subs.ass
video-translate mux --raw ./out/translate_*.mp4 --ass ./out/subs.ass [--bgm ./bgm.wav] -o ./final.mp4
Install (in optima-ai-shell Dockerfile)
RUN apt-get install -y fonts-noto-core && \
npm install -g @optima-chat/video-translate-tools@latest && \
cp -r $(video-translate fonts-dir)/. /usr/share/fonts/video-translate/ && \
fc-cache -f
CLI Reference
render-ass
Parse an SRT file into path-B styled ASS subtitles.
| Flag | Required | Description |
|---|---|---|
| `--srt <path>` | yes | HeyGen SRT file (BOM auto-stripped) |
| `--lang <en\|ms\|vi\|th>` | yes | Target language. Determines font and max chars per line |
| `--out <path>` | yes | Output ASS path |
| `--translations <path>` | no | Save / read JSON intermediate. If the file exists it is used as-is; hand-edit it to wrap a word as `**word**` for keyword (KW) highlight |
| `--style <name>` | no | Subtitle style preset (default `classic`). See Styles. An unknown value warns and falls back to `classic` |
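The SRT-to-ASS step can be pictured with a minimal sketch. It is not the package's implementation: the `Cue` shape, every function name, the `Default` style name, and the pink outline colour are assumptions made for illustration. Only the BOM stripping and the `**word**` keyword marker come from the flag descriptions; `\3c`, `\r`, and `\N` are standard ASS override codes.

```typescript
// Hypothetical cue shape for illustration; not the package's actual types.
interface Cue {
  start: string; // ASS timestamp, H:MM:SS.cc
  end: string;
  text: string;
}

// Drop a leading byte order mark (U+FEFF) if the file has one.
function stripBom(s: string): string {
  return s.charCodeAt(0) === 0xfeff ? s.slice(1) : s;
}

// Convert an SRT timestamp (HH:MM:SS,mmm) to ASS (H:MM:SS.cc, centiseconds).
function srtTimeToAss(t: string): string {
  const m = /^(\d+):(\d{2}):(\d{2})[,.](\d{3})$/.exec(t.trim());
  if (m === null) throw new Error("bad SRT timestamp: " + t);
  const centis = Math.floor(Number(m[4]) / 10);
  return Number(m[1]) + ":" + m[2] + ":" + m[3] + "." + (centis < 10 ? "0" : "") + centis;
}

// Split the file into blank-line separated blocks and read the timing line of each.
function parseSrt(raw: string): Cue[] {
  const blocks = stripBom(raw).replace(/\r\n/g, "\n").trim().split(/\n{2,}/);
  const cues: Cue[] = [];
  for (const block of blocks) {
    const lines = block.split("\n");
    const i = lines.findIndex((l) => l.indexOf("-->") >= 0);
    if (i < 0) continue; // skip blocks without a timing line
    const parts = (lines[i] ?? "").split("-->");
    cues.push({
      start: srtTimeToAss(parts[0] ?? ""),
      end: srtTimeToAss(parts[1] ?? ""),
      text: lines.slice(i + 1).join("\\N"), // \N is the ASS hard line break
    });
  }
  return cues;
}

// Assumed keyword colour: a pink outline, written in ASS &HBBGGRR& order.
const KEYWORD_TAG = "\\3c&HB469FF&";

// Turn **word** markers into an outline-colour override, then reset with \r.
function highlightKeywords(text: string): string {
  return text.replace(
    /\*\*(.+?)\*\*/g,
    (_match: string, word: string) => "{" + KEYWORD_TAG + "}" + word + "{\\r}",
  );
}

// Emit one ASS event line using a style assumed to be named "Default".
function toDialogue(cue: Cue): string {
  return "Dialogue: 0," + cue.start + "," + cue.end + ",Default,,0,0,0,," + highlightKeywords(cue.text);
}
```

Feeding each parsed cue through `toDialogue` gives the body of the `[Events]` section; the `[Script Info]` and `[V4+ Styles]` headers, which carry the per-language font and the style preset, are left out of this sketch.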
mux
Burn ASS onto HeyGen mp4. Optional BGM ducking.
| Flag | Required | Description |
|---|---|---|
| `--raw <path>` | yes | HeyGen-output mp4 (has lip-synced voice) |
| `--ass <path>` | yes | ASS subtitle file |
| `--out <path>` | yes | Final mp4 |
| `--bgm <path>` | no | If set, mix BGM under the voice via asplit sidechain ducking |
| `--fonts-dir <path>` | no | Override fonts dir (defaults to the bundled `packages/video-translate-tools/fonts/`) |
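For readers unfamiliar with sidechain ducking, the sketch below builds one plausible ffmpeg argument list for the two cases (with and without `--bgm`). It is an illustration only: `MuxOptions`, `buildMuxArgs`, the stream labels, the compressor settings, and the AAC output codec are assumptions, not what `mux` actually runs. The filters themselves (`ass`, `asplit`, `sidechaincompress`, `amix`) are standard ffmpeg filters. Paths are assumed to contain no characters that need escaping inside a filter graph (`:`, `,`, `'`).

```typescript
// Hypothetical option shape mirroring the mux flags above.
interface MuxOptions {
  raw: string;
  ass: string;
  out: string;
  bgm?: string;
  fontsDir?: string;
}

function buildMuxArgs(o: MuxOptions): string[] {
  // Burn subtitles with the libass-backed `ass` filter; fontsdir is optional.
  const burn = o.fontsDir !== undefined
    ? "ass=" + o.ass + ":fontsdir=" + o.fontsDir
    : "ass=" + o.ass;

  if (o.bgm === undefined) {
    // No BGM: re-encode video with subtitles, copy the voice track untouched.
    return ["-y", "-i", o.raw, "-vf", burn, "-c:a", "copy", o.out];
  }

  const graph = [
    "[0:v]" + burn + "[v]",
    // Split the voice: one copy is heard, the other drives the compressor.
    "[0:a]asplit=2[voice][key]",
    // Compress the BGM whenever the key (voice) is loud. Values are assumed.
    "[1:a][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked]",
    // Mix the voice with the ducked BGM; stop when the voice input ends.
    "[voice][ducked]amix=inputs=2:duration=first[aout]",
  ].join(";");

  return [
    "-y", "-i", o.raw, "-i", o.bgm,
    "-filter_complex", graph,
    "-map", "[v]", "-map", "[aout]",
    "-c:a", "aac", o.out,
  ];
}
```

The resulting array can be passed to `ffmpeg` through any process-spawning API. The voice is split rather than reused because a labelled ffmpeg stream can only be consumed once, and here it is needed both as the sidechain key and in the final mix.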
fonts-dir
Print bundled fonts directory absolute path. Used by Dockerfile to register fonts.
Languages
| Tag | Font | Max chars/line | Source |
|---|---|---|---|
| `en` | Bangers | 32 | npm bundle (this pkg) |
| `ms` | Bangers | 32 | npm bundle (this pkg) |
| `vi` | Noto Sans | 30 | `apt-get install fonts-noto-core` |
| `th` | Sarabun | 42 | npm bundle (this pkg) |
Thai uses 42 chars/line because Thai has no spaces between words — smartSplit would otherwise cut mid-word.
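To make the per-language limit concrete, here is a rough sketch of a greedy line splitter. It is not the package's `smartSplit`; `LangTag`, `LANG_RULES`, `wrapLine`, and `wrapForLang` are hypothetical names, and the hard-cut fallback for tokens longer than the limit is an assumed behaviour. The font and limit values are copied from the table above.

```typescript
type LangTag = "en" | "ms" | "vi" | "th";

// Font and per-line limit for each language tag, as listed in the table.
const LANG_RULES: Record<LangTag, { font: string; maxChars: number }> = {
  en: { font: "Bangers", maxChars: 32 },
  ms: { font: "Bangers", maxChars: 32 },
  vi: { font: "Noto Sans", maxChars: 30 },
  th: { font: "Sarabun", maxChars: 42 },
};

// Greedy wrap on whitespace. A token longer than maxChars has no break
// opportunity, so it is cut at the limit (this is the mid-word cut that
// unspaced text such as Thai would hit with a small limit).
function wrapLine(text: string, maxChars: number): string[] {
  const lines: string[] = [];
  let current = "";
  for (const word of text.split(/\s+/).filter((w) => w.length > 0)) {
    let rest = word;
    while (rest.length > maxChars) {
      if (current.length > 0) {
        lines.push(current);
        current = "";
      }
      lines.push(rest.slice(0, maxChars));
      rest = rest.slice(maxChars);
    }
    if (current.length === 0) {
      current = rest;
    } else if (current.length + 1 + rest.length <= maxChars) {
      current += " " + rest;
    } else {
      lines.push(current);
      current = rest;
    }
  }
  if (current.length > 0) lines.push(current);
  return lines;
}

// Wrap using the limit registered for the target language.
function wrapForLang(text: string, lang: LangTag): string[] {
  return wrapLine(text, LANG_RULES[lang].maxChars);
}
```

With spaced text the splitter only breaks between words. With unspaced text every break is a hard cut, so a larger limit means fewer cuts per subtitle, which is the trade-off the Thai setting reflects.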
Styles
--style <name> picks a preset. The default, classic, is the original look, so callers that don't pass --style are unchanged. A style only changes colour / outline / shadow / keyword treatment and the Latin (en/ms) font; the per-language font fallback above always applies, so th/vi never render as tofu (missing-glyph boxes).
| name | en/ms font | vi font | Look |
|---|---|---|---|
| `classic` (default) | Bangers | Noto Sans | White + black outline, pink-outline keyword |
| `pop-soft` | Bangers | Noto Sans | classic + soft drop shadow (depth) |
| `pop-3d` | Bangers | Noto Sans | Magenta hard 3D offset shadow + yellow keyword |
| `pop-hl` | Bangers | Noto Sans | classic + bright-yellow filled keyword |
| `anton` | Anton | Anton | Tall condensed bold + soft shadow + yellow keyword |
| `luckyguy` | Luckiest Guy | Noto Sans | Rounded comic + magenta 3D shadow + yellow keyword |
Bundled display fonts (fonts/): Bangers, Anton, Luckiest Guy. Anton is the only display font that covers Vietnamese diacritics cleanly; the all-caps fonts (Bangers, Luckiest Guy) render vi as ugly mixed-case, so vi falls back to Noto Sans for those styles.
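The style and font rules above amount to a small lookup, sketched here. `SubtitleLang`, `STYLE_FONTS`, and `resolveFont` are hypothetical names, and the sketch assumes Thai uses Sarabun under every style, which the Languages table implies but the Styles table does not state. The unknown-style fallback to classic follows the `--style` flag description.

```typescript
type SubtitleLang = "en" | "ms" | "vi" | "th";

// Latin (en/ms) and Vietnamese fonts per style, copied from the Styles table.
const STYLE_FONTS: Record<string, { latin: string; vi: string }> = {
  classic: { latin: "Bangers", vi: "Noto Sans" },
  "pop-soft": { latin: "Bangers", vi: "Noto Sans" },
  "pop-3d": { latin: "Bangers", vi: "Noto Sans" },
  "pop-hl": { latin: "Bangers", vi: "Noto Sans" },
  anton: { latin: "Anton", vi: "Anton" },
  luckyguy: { latin: "Luckiest Guy", vi: "Noto Sans" },
};

interface ResolvedFont {
  style: string;
  font: string;
  warning?: string; // set only when the requested style was unknown
}

function resolveFont(style: string, lang: SubtitleLang): ResolvedFont {
  const known = Object.prototype.hasOwnProperty.call(STYLE_FONTS, style);
  const name = known ? style : "classic";
  const fonts = STYLE_FONTS[name] ?? { latin: "Bangers", vi: "Noto Sans" };

  // Assumption: Thai always uses Sarabun, whatever the style.
  const font = lang === "th" ? "Sarabun" : lang === "vi" ? fonts.vi : fonts.latin;

  if (known) return { style: name, font };
  return { style: name, font, warning: 'unknown style "' + style + '", using classic' };
}
```

Returning the warning as data instead of printing it keeps the lookup pure; a caller would decide whether to log it.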
System deps
- Node ≥ 20
- ffmpeg with libass (`apt-get install ffmpeg` on Ubuntu)
- For Vietnamese subs: `fonts-noto-core` apt pkg (standalone `Noto Sans` family)
- For Thai subs: bundled in this npm package (`Sarabun-Regular.ttf`)
Note: fonts-noto-cjk does NOT provide standalone Noto Sans — it ships only Noto Sans CJK * families. Use fonts-noto-core.
Scope
v1 ships subtitle rendering + BGM-ducked muxing only. Voice override (custom voice selection from HeyGen library) is deferred to v2 — default flow uses HeyGen auto-clone of original speaker. See SPEC v3.1 §Scope.
License
MIT