AutoTuber Web Studio — Session Complete

Status: ROBUST & PRODUCTION-READY ✅

Location: /home/ahmed/antigravityapps/ahmedbouchefra2/apps/autotuber.html

What Was Built

Phase 1: Core Fixes

✅ Fixed typing animation performance (O(n²) → O(n))
✅ Zero-padded page numbers
✅ Added Google UK English voice by default with voice selector dropdown
✅ Fixed TTS ↔ typing sync (two-gate pattern, dynamic rate)

Phase 2: Validation & Smart Format

✅ Live script validation (errors, warnings, success messages)
✅ Smart format button (auto-detect titles, normalize bullets, split long lines)
✅ Word/line/duration stats with live updates
✅ Validation debounced (250ms) to avoid spam

Phase 3: Video Format Support

✅ YouTube (16:9) — default, horizontal
✅ Square (1:1) — Instagram feed
✅ Reels/TikTok (9:16) — vertical with safe bottom margin for UI overlays
✅ Format switching via CSS aspect-ratio (instant, no JavaScript layout math)

Phase 4: Robustness & UX

✅ Cursor wrapping fixed — cursor now follows text to wrapped lines (nested DOM)
✅ Safe areas — format-specific padding (14% top in vertical for TikTok UI)
✅ Live parameter updates — typing speed, colors update during playback
✅ IndexedDB persistence — all settings saved, debounced writes
✅ Per-type animations — typewriter/fade/slide/instant for title/header/body
✅ Loop preview — auto-restart playback for live tweaking
✅ Recording guard — loop disabled during export
✅ Error resilience — graceful fallback if speech/storage unavailable

Key Architectural Decisions

State Structure

state = {
    script,
    isPlaying, isRecording,
    design: { format, colors, fonts, sizes, transitions },
    voice: { selectedURI, rate, pitch },
    animations: { title, header, body },
    loopPreview
}

Why: Single source of truth. Every UI update is derived from state, and persistence is trivial because the whole object is saved at once.

Two-Gate Async Pattern

let speechEnded = false;
let animationDone = false;

function advance() {
    if (!speechEnded || !animationDone) return;
    // Both must finish before advancing
}

Why: Speech synthesis and text animation run independently. The gate ensures both have finished before moving to the next line. A snap-to-end fallback handles mismatched speeds.
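A minimal sketch of the two-gate pattern as a reusable helper. The names `createLineGate`, `markSpeechEnded`, and `markAnimationDone` are hypothetical and not taken from autotuber.html; the extra `fired` flag is an assumption added so duplicate events cannot advance twice.

```javascript
// Hypothetical helper: names are illustrative, not from the app.
function createLineGate(onBothDone) {
    let speechEnded = false;
    let animationDone = false;
    let fired = false;

    function advance() {
        // Both gates must be open, and we advance at most once per line.
        if (fired || !speechEnded || !animationDone) return;
        fired = true;
        onBothDone();
    }

    return {
        markSpeechEnded() { speechEnded = true; advance(); },
        markAnimationDone() { animationDone = true; advance(); },
    };
}

// Usage: create one gate per line.
let advanced = 0;
const gate = createLineGate(() => { advanced += 1; });
gate.markAnimationDone();   // typing finished first: nothing happens yet
gate.markSpeechEnded();     // speech finished last: advance fires
gate.markSpeechEnded();     // duplicate event: ignored
console.log(advanced);      // 1
```

In the browser, `markSpeechEnded` would typically be called from the utterance's `onend` and `onerror` handlers, so a failed utterance cannot leave the gate closed forever.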

Dynamic Typing Delay

function typeLoop() {
    const cps = BASE_CHARS_PER_SEC * state.voice.rate;  // Read live
    const delay = 1000 / cps;
    setTimeout(typeLoop, delay);
}

Why: The delay is recomputed on every tick, so moving the rate slider takes effect mid-line without restarting playback. Natural feel.
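A fuller sketch of the same loop, assuming a `BASE_CHARS_PER_SEC` of 15 (the app's real constant is not stated here) and a hypothetical `typeText` wrapper. The scheduler is injectable only so the loop can be exercised without real timers.

```javascript
// Assumed value for illustration; the app's real constant may differ.
const BASE_CHARS_PER_SEC = 15;
const state = { voice: { rate: 1.0 } };

// The delay is derived from state on every call and never cached.
function nextDelayMs() {
    const cps = BASE_CHARS_PER_SEC * state.voice.rate;
    return 1000 / cps;
}

// Types `text` one character at a time. `schedule` defaults to setTimeout.
function typeText(text, onChar, onDone, schedule = setTimeout) {
    let index = 0;
    function typeLoop() {
        if (index >= text.length) { onDone(); return; }
        onChar(text[index]);                 // one character per tick
        index += 1;
        schedule(typeLoop, nextDelayMs());   // rate is read live here
    }
    typeLoop();
}
```

Because `nextDelayMs()` runs inside the loop, a change to `state.voice.rate` between two characters changes the very next delay.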

Debounced Storage

function persistState() { debouncedSet('state', state, 300); }  // 300 ms debounce

Why: State updates are cheap and give instant feedback, while storage writes are expensive, so writes are batched behind a 300 ms debounce. A slider drag can fire 300+ events and still produce a single storage write.
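One way to get this behavior is a trailing-edge debounce. The sketch below is generic; `debounce` and `writeToStorage` are hypothetical stand-ins (the latter replaces the real IndexedDB write) and are not the app's own helpers.

```javascript
// Generic trailing-edge debounce: only the last call in a burst runs.
function debounce(fn, waitMs) {
    let timer = null;
    return function debounced(...args) {
        clearTimeout(timer);
        timer = setTimeout(() => { timer = null; fn(...args); }, waitMs);
    };
}

// Hypothetical storage writer standing in for the IndexedDB put.
let writes = 0;
let lastSaved = null;
function writeToStorage(snapshot) { writes += 1; lastSaved = snapshot; }

const state = { voice: { rate: 1.0 } };
const persistState = debounce(() => writeToStorage(JSON.stringify(state)), 300);

// Simulate a slider drag: 300 updates in one burst.
for (let i = 1; i <= 300; i += 1) {
    state.voice.rate = 0.5 + i / 300;   // cheap in-memory update
    persistState();                     // reschedules the single pending write
}
```

The snapshot is taken when the timer fires, not when `persistState()` is called, so the one write that happens always contains the latest state.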

CSS Variables for Live Design

h1 { color: var(--primary-color); font-size: var(--title-size); }

root.style.setProperty('--primary-color', newColor);  // Instant

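Why: No DOM rebuild and no per-element style writes. One property update restyles everything that references the variable, and the change is painted on the next frame.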
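A small sketch of pushing a whole design object into CSS variables in one pass. `toCssVar` and `applyDesign` are hypothetical helpers, and the camelCase-to-kebab-case naming convention is an assumption chosen to match `--primary-color` and `--title-size` above.

```javascript
// Converts a camelCase key such as "primaryColor" to "--primary-color".
function toCssVar(key) {
    return '--' + key.replace(/[A-Z]/g, (ch) => '-' + ch.toLowerCase());
}

// `root` is any object exposing style.setProperty,
// for example document.documentElement in the browser.
function applyDesign(root, design) {
    for (const [key, value] of Object.entries(design)) {
        root.style.setProperty(toCssVar(key), String(value));
    }
}

// Browser usage (not executed here):
// applyDesign(document.documentElement, { primaryColor: '#00ff88', titleSize: '64px' });
```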

Nested DOM for Cursor Wrapping

<span class="row-text">
    <span class="content">Text here</span>
    <span class="cursor"></span>
</span>

Why: Cursor is inline inside text span → moves with wrapped text naturally.

Documentation Files

autotuber.md (873 lines)

Complete feature implementation guide:

Typing animation performance (buffer pattern)
Voice synthesis async handling
Speech-typing sync (two-gate pattern)
Tab/panel switching
Per-type animations
Robustness patterns (guards, error handling)
10 reusable lessons learned

Use for: Any app needing text animation, voice sync, settings persistence, responsive design.

TUTORIAL.md (600+ lines)

General web dev teaching guide (not AutoTuber-specific):

DOM performance pitfalls
Browser API async patterns
Animation-audio sync strategies
Multi-tab UI patterns
State management
Defensive coding
3 practical exercises

Use for: Teaching team members, learning contexts.

Feature Highlights

Validation (Real-Time)

✓ "Script looks good — ready to generate"
⚠ "No sections (##). Split into topics for better pacing."
❌ "No title found. Start with '# Your Title'"
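The three outcomes above could come from a check along these lines. This is a sketch, not the app's validator: `validateScript` and the `code` values are hypothetical, and the UI is assumed to map each code to the message text shown above.

```javascript
// Returns findings as { level, code }. Codes are illustrative placeholders.
function validateScript(script) {
    const lines = script.split('\n').map((line) => line.trim());
    const hasTitle = lines.some((line) => /^#\s+\S/.test(line));     // "# Title"
    const hasSection = lines.some((line) => /^##\s+\S/.test(line));  // "## Section"

    const findings = [];
    if (!hasTitle) findings.push({ level: 'error', code: 'NO_TITLE' });
    if (!hasSection) findings.push({ level: 'warning', code: 'NO_SECTIONS' });
    if (findings.length === 0) findings.push({ level: 'success', code: 'OK' });
    return findings;
}
```

An empty script simply reports a missing title and missing sections, which matches the "shows validation, doesn't crash" behavior.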

Smart Format Button

Input: "INTRODUCTION\nLong sentence that goes on and on and on. Another sentence."

Output:

# INTRODUCTION
Long sentence that goes on and on and on.
Another sentence.
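A heuristic sketch that reproduces the example above. The rules and the 50-character threshold are assumptions, and `smartFormat` is a hypothetical name; the app's actual detection logic may differ.

```javascript
// Heuristic formatter; the rules and maxLen are illustrative assumptions.
function smartFormat(text, maxLen = 50) {
    const out = [];
    let hasTitle = false;
    for (const raw of text.split('\n')) {
        const line = raw.trim();
        if (line === '') continue;
        if (line.startsWith('#')) {                  // existing heading: keep
            if (/^#\s/.test(line)) hasTitle = true;
            out.push(line);
            continue;
        }
        if (/^[*•]\s+/.test(line)) {                 // normalize bullets to "- "
            out.push(line.replace(/^[*•]\s+/, '- '));
            continue;
        }
        const isAllCaps = /[A-Z]/.test(line) && line === line.toUpperCase();
        if (!hasTitle && isAllCaps && line.length <= maxLen) {
            out.push('# ' + line);                   // first short ALL-CAPS line
            hasTitle = true;
            continue;
        }
        if (line.length > maxLen) {                  // split at sentence ends
            out.push(...line.split(/(?<=[.!?])\s+/));
        } else {
            out.push(line);
        }
    }
    return out.join('\n');
}
```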

Safe Areas Per Format

YouTube: 6% 8% padding
Square: 8% padding
Reels: 14% top, 20% bottom (reserves space for TikTok UI)

Animations

Typewriter: Char-by-char synced to speech (original)
Fade: Opacity 0→1 over 600ms
Slide: translateY(24px)→0 with opacity
Instant: Appears immediately

Each animation is selectable per line type (title/header/body), plus an “apply to all” button.
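A sketch of how per-type selection and the apply-to-all action can both fall out of the `state.animations` keys. `animationFor` and `applyToAll` are hypothetical names, and the typewriter fallback for unknown line types is an assumption.

```javascript
const ANIMATION_TYPES = ['typewriter', 'fade', 'slide', 'instant'];

const state = {
    animations: { title: 'typewriter', header: 'fade', body: 'typewriter' },
};

// Looks up the animation for a line type, falling back to typewriter.
function animationFor(lineType) {
    return state.animations[lineType] ?? 'typewriter';
}

// "Apply to all": write the same animation into every per-type slot.
function applyToAll(type) {
    if (!ANIMATION_TYPES.includes(type)) {
        throw new RangeError('Unknown animation type: ' + type);
    }
    for (const key of Object.keys(state.animations)) {
        state.animations[key] = type;
    }
}
```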

Persistence

IndexedDB (reliable; quota varies by browser and is far larger than these settings need)
Debounced writes (300ms batching)
Graceful fallback (app keeps working without persistence if storage is unavailable)
Reset button (clears settings, preserves script)
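The graceful fallback can be sketched as a wrapper that always keeps an in-memory copy and treats the real backend as optional. `createSettingsStore` and the `backend` shape (async `get`/`set`) are hypothetical; a thin IndexedDB helper is assumed to sit behind it in the browser.

```javascript
// Hypothetical wrapper: `backend` is any object with async get/set methods.
// Pass null when storage is unavailable (for example in private browsing).
function createSettingsStore(backend) {
    const memory = new Map();   // in-memory fallback, lost on reload
    const store = {
        lastError: null,        // most recent backend failure, for diagnostics
        async set(key, value) {
            memory.set(key, value);
            if (!backend) return;
            try {
                await backend.set(key, value);
            } catch (err) {
                store.lastError = err;   // keep running without persistence
            }
        },
        async get(key) {
            if (backend) {
                try {
                    const value = await backend.get(key);
                    if (value !== undefined) return value;
                } catch (err) {
                    store.lastError = err;
                }
            }
            return memory.get(key);
        },
    };
    return store;
}
```

Callers never see a rejected promise from storage, so the UI code path is identical whether or not persistence works.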

Known Limitations & Future Improvements

Current Limitations

Speech API limits — onboundary unreliable on cloud voices (Google); workaround: empirical timing
MediaRecorder codec — webm export only; could add FFmpeg.wasm for MP4
No B-roll/images — text-only animation; adding image support would need image upload + positioning
No background music: audio is limited to speech synthesis and synthesized sound effects; music would need an audio mixing layer
Voice loading race — if voices load after init, saved URI might be invalid briefly; handled gracefully
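The voice loading race can be handled with a pure fallback chain that is re-run whenever the voice list changes. `pickVoice` is a hypothetical helper; the order of fallbacks (saved URI, then a Google UK English voice, then the browser default, then the first voice) is an assumption based on the defaults described in this file.

```javascript
// `voices` is shaped like the array returned by speechSynthesis.getVoices():
// each entry has voiceURI, name, and default properties.
function pickVoice(voices, savedURI) {
    return (
        voices.find((v) => v.voiceURI === savedURI) ||
        voices.find((v) => v.name.startsWith('Google UK English')) ||
        voices.find((v) => v.default) ||
        voices[0] ||
        null   // no voices yet: caller should retry later
    );
}
```

In the browser this would be called once at init and again from a `voiceschanged` listener on `speechSynthesis`, so a saved URI that was briefly unresolvable is picked up as soon as the list arrives.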

Future Nice-to-Haves

MP4 export (FFmpeg.wasm transcode)
Image/B-roll support with CSS layout
Background music with ducking (Web Audio API)
Chapter markers export (SRT timestamps)
AI script generation from topic
Thumbnail generator
Multi-language subtitle export
Custom font upload

Testing Notes

✅ Tested:

Vertical format + cursor wrapping (works correctly)
Safe areas rendering (padding respected)
Typing speed slider mid-playback (updates live)
Loop preview + rapid format switching (no conflicts)
IndexedDB persistence across page reload
Speech + animation sync with mismatched speeds
Recording while loop preview on (recording takes priority)
Reset settings + script preservation

⚠️ Edge cases handled:

Empty script (shows validation, doesn’t crash)
Voice selection when voices not loaded yet (fallback to default)
IndexedDB unavailable (private browsing) (app works, no persistence)
Speech synthesis unavailable (typing still works, no audio)
User clicks Preview twice (guard prevents double-play)
Stop playback during transition (overlays cleaned up)
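Two of these guards, the double-play guard and the recording guard on loop preview, can be sketched with plain state flags. The function bodies are illustrative assumptions; only the function names `startPlayback` and `stopPlayback` and the state keys appear elsewhere in this file, and `onPlaybackFinished` is hypothetical.

```javascript
const state = { isPlaying: false, isRecording: false, loopPreview: true };
let starts = 0;   // counts real playback starts, for demonstration only

function startPlayback() {
    if (state.isPlaying) return false;   // guard: a second click is a no-op
    state.isPlaying = true;
    starts += 1;
    return true;
}

function stopPlayback() {
    state.isPlaying = false;
}

// Hypothetical hook called when the last line has finished.
function onPlaybackFinished() {
    stopPlayback();
    // Recording guard: never loop while an export is in progress.
    if (state.loopPreview && !state.isRecording) startPlayback();
}
```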

Code Quality & Architecture

Lines of Code: 1,662 (HTML/CSS/JS combined)
Functions: 25 major functions
State Keys: 10+ (script, isPlaying, design, voice, animations, loopPreview, etc.)
External Dependencies:

Tailwind CSS (CDN)
Phosphor Icons (CDN)
Web Speech API (browser built-in)
IndexedDB (browser built-in)

No npm and no build step. Application logic is pure vanilla JS; the only external assets are the two CDN includes above, so deployment is zero-friction.

Reusable Patterns From This Build

Nested DOM for responsive animations — any text wrap scenario
Two-gate pattern — any multi-source async sync
Debounced persistence — any frequent-input app
CSS variables for live design — any theme/color-tweak app
Per-type polymorphism via state keys — any multi-mode app
Graceful API fallbacks — any feature using browser APIs
Safe area padding per aspect ratio — any responsive video app
Dynamic parameter reading in loops — any live-updating animation

See autotuber.md Part 7 for full implementation guide.

How to Extend

Add a new animation type:

Add to state: state.animations.newType = 'typewriter'
Add dropdown in UI

Branch in startTyping():

if (animType === 'newType') {
    // Your code here
    revealDone = true;
}

Done.

Add a new format:

Add CSS: #screen-wrapper.format-new { aspect-ratio: X / Y; }
Add padding: #screen-wrapper.format-new #dynamic-content { padding: ...%; }
Add button in UI
Done (format-specific CSS applies automatically).

Add persistence for a new setting:

Add to state object
Done (persistState() already saves whole state).

Session Artifacts

autotuber.html — Full app (1,662 lines)
autotuber.md — Implementation guide (873 lines, 10 lessons)
TUTORIAL.md — Web dev teaching guide (600+ lines)
autotuber_session.md — This file

Total documentation: 1,500+ lines of reusable patterns & lessons.

Playback Sound Effects

Synth engine (Audio IIFE): pure Web Audio with a single AudioContext, a master gain node, and a cached noise buffer. Exposes trigger(kind), setVolume(), and demo().

Three sound packs, each implementing 7 events:

Synthwave (default): sawtooth booms, noise sweeps, and sine shimmers; matches the CRT/retro theme
Arcade / Chiptune: pure square-wave arpeggios
Minimal: subtle sines, almost silent

Event hooks:

boot → startPlayback (rising arpeggio at playback start)
title / section → updateLayout (stinger on layout change only, not every text update)
transition / glitch → triggerTransition (glitch for the static transition, whoosh for others)
type → typeLoop (every 2nd non-whitespace char, random pitch jitter for variety)
end → stopPlayback (resolving chord; only if playback was actually running)

Settings UI: enable toggle, volume slider (applies live via masterGain), pack selector, and a test button that plays a sampler of all 7 events. Persisted to IndexedDB.

Exports: because the synth output goes to the tab’s audio destination, getDisplayMedia({audio: true}) captures it alongside TTS, so the exported video contains everything you hear.
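The gating rule for the type sound (every 2nd non-whitespace character, with random pitch jitter) can be sketched independently of Web Audio. `createTypeTicker`, the 880 Hz base pitch, and the ±10% jitter are assumptions for illustration; the returned frequency would be handed to whatever oscillator the active sound pack uses.

```javascript
// Returns a function that maps each typed character to a frequency in Hz,
// or null when no sound should play for that character.
function createTypeTicker(baseHz = 880, jitter = 0.1, random = Math.random) {
    let count = 0;   // non-whitespace characters seen so far
    return function onChar(ch) {
        if (/\s/.test(ch)) return null;        // whitespace never ticks
        count += 1;
        if (count % 2 !== 0) return null;      // only every 2nd visible char
        const factor = 1 + (random() * 2 - 1) * jitter;   // within ±jitter
        return baseHz * factor;
    };
}
```

Injecting `random` keeps the jitter testable; in the app, `Math.random` would be used as-is.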

Next Session Checklist

Run app locally (open autotuber.html in browser)
Test all features (preview, export, format switching, settings)
Consider feature requests from users (music? images? MP4?)
Use autotuber.md as reference for building similar features in other apps

Status: Ready for production. Robust, well-documented, maintainable.