User Manual

A comprehensive guide to using Voiceworks Toolkit for Japanese language study.

Getting Started

After installing the userscript, browse to the platform. The toolkit activates automatically and you'll notice several enhancements right away:

  • A new sidebar menu with Radio Mode and Playlist Mode toggles
  • Keyboard shortcuts for playback control
  • Infinite scroll replacing pagination
  • Translated tags and UI elements (when translation services are enabled)
  • Progress tracking checkmarks on work cards

Most features work out of the box. AI-powered features need light setup in Settings: Whisper requires a one-time model download, and semantic search requires a Jina API key.

Enhanced interface with subtitles, furigana, and audio visualizer

Settings & Configuration

Access the settings panel from the Settings page. The toolkit adds its own configuration sections alongside the native settings.

General Settings

  • Language: Switch between English, Chinese, and Japanese UI localization
  • SFW Mode: Hide all images and thumbnails for use in public environments
  • Infinite Scroll: Toggle automatic page loading on scroll

AI Settings

  • Whisper Model: Choose the transcription model size (tiny, small, medium) and quantization level. Larger models are more accurate but use more memory
  • Whisper Language: Set the default transcription language (Japanese, English, Chinese, Korean, etc.)
  • Translation Settings: Configure web translation behavior and caching controls
  • Jina API Key: Enter your free Jina API key to enable semantic vector search

Playback Settings

  • Shuffle: Enable or disable shuffle for Radio and Playlist modes
  • Auto-advance: Automatically move to the next work when the current one finishes

Learner Mode

Learner Mode displays dual-language subtitles during playback, designed for immersion-based Japanese study.

How It Works

  1. Navigate to any voicework with available subtitle files (LRC format) or enable Whisper for live transcription
  2. The primary line shows Japanese text (kana/kanji)
  3. The secondary line shows English translation, blurred by default
  4. Hover over the English line or press B to reveal the translation

Subtitle Sources

Learner Mode uses subtitles from these sources, in priority order:

  1. LRC files: Pre-existing lyric files bundled with the voicework
  2. Whisper transcription: Live speech-to-text when enabled (overrides static subtitles)
  3. Cached transcripts: Previously transcribed content is loaded instantly
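The priority order above can be sketched as a small selection function. This is an illustrative sketch, not the toolkit's actual API; the function and field names are invented:

```javascript
// Hypothetical sketch of the subtitle source priority described above.
// Live Whisper output overrides static subtitles when transcription is on.
function pickSubtitleSource({ whisperEnabled, liveTranscript, lrcFile, cachedTranscript }) {
  if (whisperEnabled && liveTranscript) return { kind: "whisper", cues: liveTranscript };
  if (lrcFile) return { kind: "lrc", cues: lrcFile };
  if (cachedTranscript) return { kind: "cache", cues: cachedTranscript };
  return null; // no subtitles available for this track
}
```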

Chinese Content

When Chinese-language subtitles are detected (CJK text without kana), they are automatically translated to Japanese via remote translation, so learners studying Japanese always see their target language on the primary line.
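The "CJK text without kana" heuristic can be expressed with two Unicode range checks. A minimal sketch (the function name is hypothetical):

```javascript
// Flag a subtitle line as likely Chinese: it contains CJK ideographs
// but no hiragana or katakana, so it cannot be ordinary Japanese prose.
function looksChinese(line) {
  const hasKana = /[\u3040-\u30ff]/.test(line); // hiragana + katakana blocks
  const hasCjk = /[\u4e00-\u9fff]/.test(line);  // unified CJK ideographs
  return hasCjk && !hasKana;
}
```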

Tip: Use playback speed controls ([ and ] keys) to slow down audio during study. The lead-time setting in the player lets you read ahead before each line is spoken.

Learner Mode subtitles with blur toggle

Live Transcription (Whisper)

The live transcription feature uses the Whisper speech recognition model to generate subtitles in real-time from any audio being played.

Enabling Transcription

  1. Click the microphone icon in the player controls
  2. On first use, the Whisper model will download (~150 MB). This is cached for future use
  3. Once loaded, transcription begins automatically as audio plays

Model Selection

Choose the appropriate model size in Settings based on your hardware:

  • Tiny: Fastest, lowest memory, good for real-time use on modest hardware
  • Small: Better accuracy, recommended for systems with WebGPU support
  • Medium: Best accuracy, requires a capable GPU and sufficient memory

Caching

Transcripts are automatically cached per-track with a 90-day TTL. When you revisit a previously transcribed voicework, subtitles appear instantly without re-running the model.
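The TTL check amounts to comparing the stored timestamp against the current time. A minimal sketch assuming a Map-like store (the real toolkit persists to browser storage, and these names are illustrative):

```javascript
// Per-track transcript cache with a 90-day time-to-live.
const TTL_MS = 90 * 24 * 60 * 60 * 1000; // 90 days in milliseconds

function cacheTranscript(store, trackId, cues, now = Date.now()) {
  store.set(trackId, { cues, savedAt: now });
}

function getCachedTranscript(store, trackId, now = Date.now()) {
  const entry = store.get(trackId);
  if (!entry) return null;
  if (now - entry.savedAt > TTL_MS) { // expired: evict and re-transcribe
    store.delete(trackId);
    return null;
  }
  return entry.cues;
}
```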

Exporting

Download transcripts in standard subtitle formats:

  • LRC: Lyric format, compatible with most music players and study tools
  • VTT: WebVTT format, standard for web video subtitles
  • SRT: SubRip format, widely supported by media players and Anki

Download buttons appear in the file tree and flat view after transcription completes.
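The three formats differ mainly in timestamp syntax: LRC uses [mm:ss.xx], SRT uses HH:MM:SS,mmm, and VTT uses HH:MM:SS.mmm. A sketch of the conversions (not the toolkit's actual export code):

```javascript
// Convert a cue start time in seconds to each export format's timestamp.
function toLrcTime(sec) {
  const m = Math.floor(sec / 60);
  const s = (sec % 60).toFixed(2).padStart(5, "0");
  return `[${String(m).padStart(2, "0")}:${s}]`; // [mm:ss.xx]
}
function toSrtTime(sec) {
  const pad = (n, w = 2) => String(n).padStart(w, "0");
  const h = Math.floor(sec / 3600);
  const m = Math.floor((sec % 3600) / 60);
  const s = Math.floor(sec % 60);
  const ms = Math.round((sec % 1) * 1000);
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`; // HH:MM:SS,mmm
}
function toVttTime(sec) {
  return toSrtTime(sec).replace(",", "."); // WebVTT uses a dot separator
}
```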

Translation System

The toolkit includes two translation mechanisms that work together to provide comprehensive English translation across the interface.

Neural Translation (Web Pipeline)

Translation uses a web pipeline with host rotation, retry/backoff, in-flight deduplication, and shared caching. It translates:

  • Player track titles (Japanese/Chinese to English)
  • Content tags across all pages
  • Work card titles in grid and list views
  • Circle and voice actor names

No local translation model download is required. Cached translations are reused across features for faster repeat lookups.

Interface Translation (Static)

A built-in translation map converts static UI elements (buttons, menus, sort options, labels) to English. This works immediately without any model download and covers the complete interface.

Performance

Translation requests are batched in an 8ms coalescing window and deduplicated to prevent redundant work. Single-text requests (like the currently playing track title) are prioritized over batch operations for lower visible latency.
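The coalescing-window pattern can be sketched as follows; this is a generic illustration of the technique, with `translateBatch` standing in for the real network call:

```javascript
// Coalesce translate() calls made within the same window into one batch,
// and deduplicate identical texts so each string is requested once.
function makeBatcher(translateBatch, windowMs = 8) {
  const pending = new Map(); // text -> { promise, resolve }
  let timer = null;
  return function translate(text) {
    if (pending.has(text)) return pending.get(text).promise; // dedup
    let resolve;
    const promise = new Promise((r) => (resolve = r));
    pending.set(text, { promise, resolve });
    if (!timer) {
      timer = setTimeout(async () => {
        timer = null;
        const batch = [...pending.entries()];
        pending.clear();
        const results = await translateBatch(batch.map(([t]) => t)); // one call
        batch.forEach(([, entry], i) => entry.resolve(results[i]));
      }, windowMs);
    }
    return promise;
  };
}
```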

Semantic Search

Semantic search lets you find voiceworks by meaning rather than exact keywords. This is particularly useful for finding content by theme or topic when you don't know the exact Japanese title.

  1. Enter your Jina API key in Settings (free tier available)
  2. Click the search icon in the header to open the Semantic Search dialog
  3. Type your query in any language
  4. Results are ranked by semantic similarity to your query

The vector index builds automatically in the background. You can also trigger manual indexing from the dialog. Embeddings are stored locally in IndexedDB and persist across sessions.
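Ranking by semantic similarity typically means comparing embedding vectors with cosine similarity. A minimal sketch, assuming the embeddings (which the toolkit obtains from the Jina API) are plain numeric arrays:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank an index of { id, vec } entries against a query embedding.
function rankBySimilarity(queryVec, index) {
  return index
    .map(({ id, vec }) => ({ id, score: cosine(queryVec, vec) }))
    .sort((x, y) => y.score - x.score);
}
```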

Advanced Search

The Advanced Search panel provides structured filtering:

  • Tags: Filter by content tags with AND/OR logic
  • Circle: Search by creator/circle name
  • Voice Actor: Filter by voice actor
  • Date Range: Limit results to a time period
  • Rating & Price: Set minimum rating or price filters

Search history is saved for quick re-use of frequent queries.
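The AND/OR tag logic reduces to "every" versus "some" over the selected tags. A sketch (function and parameter names are illustrative):

```javascript
// Does a work match the selected tags under the chosen combination mode?
function matchesTags(workTags, selectedTags, mode /* "AND" | "OR" */) {
  const have = new Set(workTags);
  return mode === "AND"
    ? selectedTags.every((t) => have.has(t)) // all selected tags required
    : selectedTags.some((t) => have.has(t)); // any selected tag suffices
}
```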

Tag Filters

Click any tag on a work card or detail page to instantly filter by it. Active filters appear as removable chips and persist across navigation. Multi-tag filtering is supported, so you can click additional tags to narrow results.

Playback Modes

Radio Mode

Radio Mode provides continuous shuffled playback across your library, ideal for extended immersion sessions.

  1. Toggle Radio Mode from the sidebar menu
  2. Playback begins automatically with a randomly selected voicework
  3. When one work finishes, the next is selected randomly
  4. A short history buffer prevents frequent repetition
  5. Health-checking and auto-recovery keep the stream running through interruptions

Playback state persists across page refreshes. Radio Mode is mutually exclusive with Playlist Mode.
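The history-buffered shuffle can be sketched as below. This is a generic illustration; the buffer length is an assumed value, not the toolkit's actual setting:

```javascript
const HISTORY_SIZE = 5; // assumed buffer length for illustration

// Pick a random work id, avoiding anything in the recent-history buffer.
function pickNext(ids, history, rand = Math.random) {
  const candidates = ids.filter((id) => !history.includes(id));
  const pool = candidates.length ? candidates : ids; // everything recent: allow repeats
  const next = pool[Math.floor(rand() * pool.length)];
  history.push(next);
  if (history.length > HISTORY_SIZE) history.shift(); // keep the buffer short
  return next;
}
```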

Playlist Mode

Playlist Mode enables sequential playback through curated collections.

  1. Open the Playlist Discovery panel from the sidebar
  2. Browse or search community-curated playlists
  3. Activate a playlist to begin sequential playback
  4. Use the forward/back controls in the player bar to navigate between works

Playlists auto-advance to the next voicework when the current one finishes.

Media Viewer

The Media Viewer provides a lightbox gallery for images and video files bundled with voiceworks.

Supported Formats

  • Images: JPG, PNG, GIF, WebP
  • Video: MP4, WebM, MOV, AVI, MKV
  • Documents: PDF, TXT, SRT

Controls

  • Click any media file to open the lightbox
  • Use arrow keys or swipe to navigate between files
  • Press ESC to close
  • Enable slideshow mode for auto-advance

Keyboard Shortcuts

All shortcuts are disabled when focus is in a text input field.

Key              Action                             Context
Space / K        Play / Pause                       Player
M                Mute / Unmute                      Player
F                Fullscreen toggle                  Player
Left Arrow       Seek back 5 seconds                Player
Right Arrow      Seek forward 5 seconds             Player
Shift + Left     Seek back 30 seconds               Player
Shift + Right    Seek forward 30 seconds            Player
Up Arrow         Volume up 5%                       Player
Down Arrow       Volume down 5%                     Player
[                Decrease playback speed            Player
]                Increase playback speed            Player
0–9              Jump to 0%–90% of track            Player
B                Toggle English subtitle blur       Learner Mode
J                Toggle Japanese subtitles          Learner Mode
ESC              Close lightbox / Exit fullscreen   Media Viewer
Left / Right     Previous / Next image              Media Viewer
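The "disabled in text inputs" guard is a common pattern: check the event target before dispatching. A sketch of one way to do it, not necessarily the toolkit's exact implementation:

```javascript
// Return false when the keystroke lands in an editable element,
// so shortcuts never fire while the user is typing.
function shouldHandleShortcut(target) {
  const tag = (target.tagName || "").toUpperCase();
  if (tag === "INPUT" || tag === "TEXTAREA") return false;
  if (target.isContentEditable) return false;
  return true;
}
```

In a real handler this would be called first, e.g. `document.addEventListener("keydown", (e) => { if (!shouldHandleShortcut(e.target)) return; ... })`.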

Backup & Restore

Export all your settings and preferences to a JSON file for safekeeping or transfer to another browser.

Export

  1. Open Settings
  2. Click Export Settings
  3. Save the downloaded JSON file

Import

  1. Open Settings
  2. Click Import Settings
  3. Select your previously exported JSON file
  4. Settings are applied immediately

Note: The export includes preferences, feature toggles, and configuration values. It does not include cached data (transcripts, translations, embeddings) as these are regenerated automatically.
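One safe way to apply an imported file is to merge it over current defaults, keeping only recognized keys of the expected type. This is a hypothetical sketch; the key names are invented, not the toolkit's actual settings schema:

```javascript
// Merge imported settings over defaults, ignoring unknown keys and
// values whose type does not match the default (guards against stale files).
function importSettings(defaults, imported) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(imported)) {
    if (key in defaults && typeof value === typeof defaults[key]) {
      merged[key] = value;
    }
  }
  return merged;
}
```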

FAQ

Does the toolkit send my data to external servers?

Whisper transcription runs on-device, while translation and semantic embeddings use external APIs. In practice, external calls include Google Translate endpoints for translation and Jina embeddings for semantic search when you provide an API key. Results are cached locally in your browser.

What browsers are supported?

Any modern Chromium-based browser (Chrome, Edge, Brave) or Firefox with Tampermonkey installed. WebGPU support (Chrome 113+, Edge 113+) is recommended for best Whisper performance, but the toolkit falls back to WASM automatically.

How much disk space do AI assets use?

The Whisper model is the main local download, at roughly 150 MB depending on model choice and quantization. Translation does not require a local model download. Cached transcripts, translations, and embeddings also use browser storage over time.

Can I use custom Whisper models?

Yes. The model selection field in Settings accepts arbitrary Hugging Face model IDs. Type a custom model ID (e.g., from the onnx-community namespace) and the worker will attempt to load it. Suggested models are provided in a dropdown for convenience.

Why are translations slow on first load?

First requests may be slower because the service is warming caches and making fresh network calls. After that, shared translation caching makes repeated text much faster.

How do I report bugs or request features?

Open an issue on the GitHub Issues page. Include your browser version, GPU model (if AI-related), and steps to reproduce.