AutoTuber Web Studio — Session Complete
Status: ROBUST & PRODUCTION-READY ✅
Location: /home/ahmed/antigravityapps/ahmedbouchefra2/apps/autotuber.html
What Was Built
Phase 1: Core Fixes
- ✅ Fixed typing animation performance (O(n²) → O(n))
- ✅ Zero-padded page numbers
- ✅ Added Google UK English voice by default with voice selector dropdown
- ✅ Fixed TTS ↔ typing sync (two-gate pattern, dynamic rate)
- ✅ Live script validation (errors, warnings, success messages)
- ✅ Smart format button (auto-detect titles, normalize bullets, split long lines)
- ✅ Word/line/duration stats with live updates
- ✅ Validation debounced (250ms) to avoid spam
- ✅ YouTube (16:9) — default, horizontal
- ✅ Square (1:1) — Instagram feed
- ✅ Reels/TikTok (9:16) — vertical with safe bottom margin for UI overlays
- ✅ Format switching via CSS aspect-ratio (instant, no JavaScript size math)
Phase 4: Robustness & UX
- ✅ Cursor wrapping fixed — cursor now follows text to wrapped lines (nested DOM)
- ✅ Safe areas — format-specific padding (14% top in vertical for TikTok UI)
- ✅ Live parameter updates — typing speed, colors update during playback
- ✅ IndexedDB persistence — all settings saved, debounced writes
- ✅ Per-type animations — typewriter/fade/slide/instant for title/header/body
- ✅ Loop preview — auto-restart playback for live tweaking
- ✅ Recording guard — loop disabled during export
- ✅ Error resilience — graceful fallback if speech/storage unavailable
Key Architectural Decisions
State Structure
state = {
script,
isPlaying, isRecording,
design: { format, colors, fonts, sizes, transitions },
voice: { selectedURI, rate, pitch },
animations: { title, header, body },
loopPreview
}
Why: Single source of truth. All UI updates reflect state. Persistence trivial.
Two-Gate Async Pattern
let speechEnded = false;
let animationDone = false;
function advance() {
if (!speechEnded || !animationDone) return;
// Both must finish before advancing
}
Why: Speech synthesis and text animation run independently. The gate ensures both finish before moving to the next line. A snap-to-end fallback handles mismatched speeds.
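A slightly fuller sketch of the two-gate idea, wrapped in a per-line helper. The name createLineGate and its callbacks are hypothetical, and the snap-to-end behavior shown (revealing the rest of the line when speech finishes first) is an assumption about how the fallback works, not a copy of the code in autotuber.html.

```javascript
// Hypothetical helper; names are illustrative, not taken from autotuber.html.
function createLineGate({ onAdvance, snapToEnd = () => {} }) {
  let speechEnded = false;
  let animationDone = false;
  let advanced = false;

  function advance() {
    // Both gates must be open, and each line advances at most once.
    if (advanced || !speechEnded || !animationDone) return;
    advanced = true;
    onAdvance();
  }

  return {
    // Call from the utterance's "end" (and "error") handler.
    speechFinished() {
      speechEnded = true;
      if (!animationDone) {
        // Assumed snap-to-end behavior: reveal the rest of the line at once.
        snapToEnd();
        animationDone = true;
      }
      advance();
    },
    // Call when the last character has been revealed.
    animationFinished() {
      animationDone = true;
      advance();
    },
  };
}
```

Creating one gate per line keeps stale callbacks from an earlier line from advancing the current one.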
Dynamic Typing Delay
function typeLoop() {
const cps = BASE_CHARS_PER_SEC * state.voice.rate; // Read live
const delay = 1000 / cps;
setTimeout(typeLoop, delay);
}
Why: Rate slider updates mid-line without needing to restart. Natural feel.
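The same idea as a self-contained sketch that actually emits characters. The value of BASE_CHARS_PER_SEC, the typeText name, and the injectable schedule parameter are assumptions made for illustration; the clamp to 0.1–10 mirrors the valid range of SpeechSynthesisUtterance.rate.

```javascript
const BASE_CHARS_PER_SEC = 20; // assumed value, not taken from the app

// Delay between characters for a given speech rate.
function typingDelayMs(rate, baseCps = BASE_CHARS_PER_SEC) {
  const safeRate = Math.min(Math.max(rate, 0.1), 10);
  return 1000 / (baseCps * safeRate);
}

// Types `text` one character at a time. `getRate` is called before every
// delay, so a slider change takes effect on the very next character.
function typeText(text, getRate, onChar, schedule = (fn, ms) => setTimeout(fn, ms)) {
  let i = 0;
  function step() {
    if (i >= text.length) return;
    onChar(text[i++]);
    if (i < text.length) schedule(step, typingDelayMs(getRate()));
  }
  step();
}
```

In the app this would be called roughly as typeText(line, () => state.voice.rate, appendChar), where appendChar is a hypothetical function that writes to the DOM.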
Debounced Storage
persistState() → debouncedSet('state', state, 300ms)
Why: Cheap state updates (instant feedback), expensive storage batched (300ms).
Slider drag generates 300+ events, only 1 storage write. Win-win.
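A minimal trailing-edge debounce that would produce this batching. The debounce helper and its injectable timers argument are illustrative and not the app's actual debouncedSet implementation.

```javascript
// Returns a wrapper that runs `fn` once, `waitMs` after the last call.
// `timers` is injectable only so the behavior can be checked without waiting.
function debounce(fn, waitMs, timers = {
  set: (cb, ms) => setTimeout(cb, ms),
  clear: (id) => clearTimeout(id),
}) {
  let handle = null;
  return function debounced(...args) {
    if (handle !== null) timers.clear(handle);
    handle = timers.set(() => {
      handle = null;
      fn(...args); // only the most recent arguments are written
    }, waitMs);
  };
}
```

With this shape, persistState could be defined as debounce(() => store.set('state', state), 300), where store stands in for whatever IndexedDB wrapper the app uses.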
CSS Variables for Live Design
h1 { color: var(--primary-color); font-size: var(--title-size); }
root.style.setProperty('--primary-color', newColor); // Instant
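Why: No DOM rebuild and no per-element style writes. One property change restyles every rule that uses the variable, and the new color is applied by the next frame.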
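One way to drive several variables from a design object, as a sketch. The design keys primaryColor and titleSizePx are hypothetical; only the custom property names --primary-color and --title-size come from the snippet above.

```javascript
// Writes design values to CSS custom properties on the given style object.
// Accepts any object with setProperty(name, value), such as
// document.documentElement.style in a browser.
function applyDesignVars(style, design) {
  const vars = {
    '--primary-color': design.primaryColor,
    '--title-size': `${design.titleSizePx}px`,
  };
  for (const [name, value] of Object.entries(vars)) {
    style.setProperty(name, value);
  }
  return vars;
}

// Browser usage (not executed here):
// applyDesignVars(document.documentElement.style,
//                 { primaryColor: '#00ff88', titleSizePx: 72 });
```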
Nested DOM for Cursor Wrapping
<span class="row-text">
<span class="content">Text here</span>
<span class="cursor"></span>
</span>
Why: Cursor is inline inside text span → moves with wrapped text naturally.
Documentation Files
autotuber.md (873 lines)
Complete feature implementation guide:
- Typing animation performance (buffer pattern)
- Voice synthesis async handling
- Speech-typing sync (two-gate pattern)
- Tab/panel switching
- Per-type animations
- Robustness patterns (guards, error handling)
- 10 reusable lessons learned
Use for: Any app needing text animation, voice sync, settings persistence, responsive design.
TUTORIAL.md (600+ lines)
General web dev teaching guide (not AutoTuber-specific):
- DOM performance pitfalls
- Browser API async patterns
- Animation-audio sync strategies
- Multi-tab UI patterns
- State management
- Defensive coding
- 3 practical exercises
Use for: Teaching team members, learning contexts.
Feature Highlights
Validation (Real-Time)
✓ "Script looks good — ready to generate"
⚠ "No sections (##). Split into topics for better pacing."
❌ "No title found. Start with '# Your Title'"
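A sketch of how checks like these could be produced. The validateScript name, the result shape, and the exact matching rules are assumptions; the error and warning strings are the ones listed above, and the success string is shortened.

```javascript
// Returns a list of { level, message } results for a script.
function validateScript(script) {
  const lines = script.split('\n').map((l) => l.trim()).filter(Boolean);
  const results = [];

  // A title is a line starting with a single "# ".
  if (!lines.some((l) => /^#\s+\S/.test(l))) {
    results.push({ level: 'error', message: "No title found. Start with '# Your Title'" });
  }
  // A section is a line starting with "## ".
  if (!lines.some((l) => /^##\s+\S/.test(l))) {
    results.push({ level: 'warning', message: 'No sections (##). Split into topics for better pacing.' });
  }
  if (results.length === 0) {
    results.push({ level: 'success', message: 'Script looks good' });
  }
  return results;
}
```

An empty script yields both the error and the warning rather than throwing, which matches the "shows validation, doesn't crash" behavior described in the testing notes.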
Smart Format
Input: "INTRODUCTION\nLong sentence that goes on and on and on. Another sentence."
Output:
# INTRODUCTION
Long sentence that goes on and on and on.
Another sentence.
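A minimal formatter that reproduces the example above. The heuristics are assumptions, not the app's actual rules: an all-caps line without ending punctuation is treated as a heading, bullet markers are normalized to "- ", and every other line is split into one sentence per line.

```javascript
function smartFormat(raw) {
  const out = [];
  let hasTitle = false;
  for (const line of raw.split('\n')) {
    const text = line.trim();
    if (!text) continue;
    if (text.startsWith('#')) {
      // Already marked up: keep as-is.
      if (/^#\s/.test(text)) hasTitle = true;
      out.push(text);
    } else if (/^[-*•]\s+/.test(text)) {
      // Normalize bullet markers to "- ".
      out.push('- ' + text.replace(/^[-*•]\s+/, ''));
    } else if (/[A-Z]/.test(text) && text === text.toUpperCase() && !/[.!?]$/.test(text)) {
      // First all-caps line becomes the title, later ones become sections.
      out.push((hasTitle ? '## ' : '# ') + text);
      hasTitle = true;
    } else {
      // One sentence per line; a trailing fragment without punctuation is kept.
      const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [text];
      for (const s of sentences) out.push(s.trim());
    }
  }
  return out.join('\n');
}
```

The sentence split is deliberately naive: it also breaks on decimals and abbreviations such as "3.5" or "e.g.", so a real implementation would need a stricter rule.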
Safe Areas
- YouTube: 6% 8% padding
- Square: 8% padding
- Reels: 14% top, 20% bottom (reserves space for TikTok UI)
Animations
- Typewriter: Char-by-char synced to speech (original)
- Fade: Opacity 0→1 over 600ms
- Slide: translateY(24px)→0 with opacity
- Instant: Appears immediately
Set per type (title/header/body), with an “apply to all” button.
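The non-typewriter modes map naturally onto the Web Animations API. The sketch below returns keyframes and options for Element.animate(); the function name, the 600ms duration and easing for slide, and the fill mode are assumptions, while the fade timing and the 24px offset come from the list above.

```javascript
// Returns { keyframes, options } for Element.animate(), or null for
// 'typewriter', which is driven by the typing loop instead.
function revealSpec(kind) {
  switch (kind) {
    case 'fade':
      return {
        keyframes: [{ opacity: 0 }, { opacity: 1 }],
        options: { duration: 600, fill: 'forwards' },
      };
    case 'slide':
      return {
        keyframes: [
          { opacity: 0, transform: 'translateY(24px)' },
          { opacity: 1, transform: 'translateY(0)' },
        ],
        // Duration and easing are assumed; the source gives only the offset.
        options: { duration: 600, easing: 'ease-out', fill: 'forwards' },
      };
    case 'instant':
      return {
        keyframes: [{ opacity: 1 }, { opacity: 1 }],
        options: { duration: 0, fill: 'forwards' },
      };
    default:
      return null;
  }
}

// Browser usage (not executed here):
// const spec = revealSpec(state.animations.title);
// if (spec) element.animate(spec.keyframes, spec.options);
```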
Persistence
- IndexedDB (reliable; roughly 50MB assumed, actual quota varies by browser)
- Debounced writes (300ms batching)
- Graceful fallback (app keeps working without persistence if storage is unavailable)
- Reset button (clears settings, preserves script)
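A sketch of persistence that degrades to a no-op when IndexedDB is missing or refuses to open. The database name, store name, and function names are hypothetical; the pattern is that every failure resolves to null or false instead of throwing.

```javascript
// Resolves to an open database, or null if IndexedDB is unavailable.
function openSettingsDb(name = 'autotuber-demo') {
  return new Promise((resolve) => {
    if (typeof indexedDB === 'undefined') return resolve(null);
    let request;
    try {
      request = indexedDB.open(name, 1);
    } catch (err) {
      return resolve(null); // some private-browsing modes throw here
    }
    request.onupgradeneeded = () => request.result.createObjectStore('kv');
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => resolve(null);
  });
}

// Resolves to true if the state was written, false otherwise.
// `state` must be structured-cloneable (no functions or DOM nodes).
async function saveState(db, state) {
  if (!db) return false;
  return new Promise((resolve) => {
    const tx = db.transaction('kv', 'readwrite');
    tx.objectStore('kv').put(state, 'state');
    tx.oncomplete = () => resolve(true);
    tx.onerror = () => resolve(false);
    tx.onabort = () => resolve(false);
  });
}
```

Callers never need a try/catch: a false return simply means the session runs without saved settings.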
Known Limitations & Future Improvements
Current Limitations
- Speech API limits: onboundary is unreliable on cloud voices (Google); the workaround is empirical timing
- MediaRecorder codec — webm export only; could add FFmpeg.wasm for MP4
- No B-roll/images — text-only animation; adding image support would need image upload + positioning
- No music/SFX — speech synthesis only; would need audio mixing layer
- Voice loading race — if voices load after init, saved URI might be invalid briefly; handled gracefully
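The voice loading race can be handled with a pure lookup that is re-run whenever the voice list changes. The pickVoice name and the fallback order are assumptions; the voiceURI, name, and default fields are standard SpeechSynthesisVoice properties.

```javascript
// Chooses a voice: saved URI first, then the preferred name, then the
// browser default, then the first available voice, else null.
function pickVoice(voices, savedURI, preferredName = 'Google UK English') {
  return (
    voices.find((v) => v.voiceURI === savedURI) ||
    voices.find((v) => v.name.startsWith(preferredName)) ||
    voices.find((v) => v.default) ||
    voices[0] ||
    null
  );
}

// Browser usage (not executed here). getVoices() may return an empty
// list until the voiceschanged event has fired, so resolve twice:
// let voice = pickVoice(speechSynthesis.getVoices(), state.voice.selectedURI);
// speechSynthesis.onvoiceschanged = () => {
//   voice = pickVoice(speechSynthesis.getVoices(), state.voice.selectedURI);
// };
```

A null result means no voice is available yet; leaving utterance.voice unset lets the browser use its own default.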
Future Nice-to-Haves
Testing Notes
✅ Tested:
- Vertical format + cursor wrapping (works correctly)
- Safe areas rendering (padding respected)
- Typing speed slider mid-playback (updates live)
- Loop preview + rapid format switching (no conflicts)
- IndexedDB persistence across page reload
- Speech + animation sync with mismatched speeds
- Recording while loop preview on (recording takes priority)
- Reset settings + script preservation
⚠️ Edge cases handled:
- Empty script (shows validation, doesn’t crash)
- Voice selection when voices not loaded yet (fallback to default)
- IndexedDB unavailable (private browsing) (app works, no persistence)
- Speech synthesis unavailable (typing still works, no audio)
- User clicks Preview twice (guard prevents double-play)
- Stop playback during transition (overlays cleaned up)
Code Quality & Architecture
Lines of Code: 1,662 (HTML/CSS/JS combined)
Functions: 25 major functions
State Keys: 10+ (script, isPlaying, design, voice, animations, loopPreview, etc.)
External Dependencies:
- Tailwind CSS (CDN)
- Phosphor Icons (CDN)
- Web Speech API (browser built-in)
- IndexedDB (browser built-in)
No npm and no build step. The only external dependencies are the two CDN-loaded libraries above; everything else is vanilla JS and browser built-ins, so deployment is a single HTML file.
Reusable Patterns From This Build
- Nested DOM for responsive animations — any text wrap scenario
- Two-gate pattern — any multi-source async sync
- Debounced persistence — any frequent-input app
- CSS variables for live design — any theme/color-tweak app
- Per-type polymorphism via state keys — any multi-mode app
- Graceful API fallbacks — any feature using browser APIs
- Safe area padding per aspect ratio — any responsive video app
- Dynamic parameter reading in loops — any live-updating animation
See autotuber.md Part 7 for full implementation guide.
How to Extend
Add a new animation type:
- Add to state:
state.animations.newType = 'typewriter'
- Add dropdown in UI
- Branch in startTyping():
if (animType === 'newType') {
// Your code here
revealDone = true;
}
- Done.
Add a new format:
- Add CSS:
#screen-wrapper.format-new { aspect-ratio: X / Y; }
- Add padding:
#screen-wrapper.format-new #dynamic-content { padding: ...%; }
- Add button in UI
- Done (format-specific CSS applies automatically).
Add persistence for a new setting:
- Add to state object
- Done (persistState() already saves whole state).
Session Artifacts
- autotuber.html — Full app (1,662 lines)
- autotuber.md — Implementation guide (873 lines, 10 lessons)
- TUTORIAL.md — Web dev teaching guide (600+ lines)
- autotuber_session.md — This file
Total documentation: 1,500+ lines of reusable patterns & lessons.
Summary of what makes playback epic now:
- Synth engine (Audio IIFE): pure Web Audio with a single AudioContext, a master gain node, and a cached noise buffer. Exposes trigger(kind), setVolume(), and demo().
- Three sound packs, each implementing all 7 events:
  - Synthwave (default): sawtooth booms, noise sweeps, and sine shimmers; matches the CRT/retro theme
  - Arcade / Chiptune: pure square-wave arpeggios
  - Minimal: subtle sines, almost silent
- Event hooks:
  - boot → startPlayback (rising arpeggio at playback start)
  - title / section → updateLayout (stinger on layout change only, not on every text update)
  - transition / glitch → triggerTransition (glitch for the static transition, whoosh for the others)
  - type → typeLoop (every 2nd non-whitespace character, with random pitch jitter for variety)
  - end → stopPlayback (resolving chord, only if playback was actually running)
- Settings UI: enable toggle, volume slider (applied live via masterGain), pack selector, and a test button that plays a sampler of all 7 events. Persisted to IndexedDB.
- Exports: the synth output goes to the tab’s audio destination, so getDisplayMedia({audio: true}) captures it alongside TTS where the browser supports tab-audio capture, and the exported recording contains everything you hear.
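The "every 2nd non-whitespace character" rule for the type event can be sketched as a small counter around trigger. The createTypeTicker name, the second argument passed to trigger, and the ±100 cent jitter range are assumptions; the source only states that trigger(kind) exists and that pitch is jittered.

```javascript
// Returns a per-character callback that fires the 'type' sound on every
// second non-whitespace character. `random` is injectable for testing.
function createTypeTicker(trigger, random = Math.random) {
  let count = 0;
  return function onChar(ch) {
    if (/\s/.test(ch)) return false; // whitespace is silent and not counted
    count += 1;
    if (count % 2 !== 0) return false;
    const detuneCents = (random() * 2 - 1) * 100; // assumed jitter range
    trigger('type', { detuneCents });
    return true;
  };
}
```

Calling the returned function from the typing loop keeps the sound rule out of the loop itself, so the loop only deals with timing and text.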
Next Session Checklist
Status: Ready for production. Robust, well-documented, maintainable.