transcriber¶

Voice transcription and audio processing

Transcriber provides voice-to-text transcription using the OpenAI Whisper model, enabling audio input for applications. It is designed to be fast and accurate, with multi-language support.

What It Does¶

Audio Transcription - Convert speech to text
Speech Recognition - Identify the language being spoken
Audio Processing - Handle various formats
Multi-Language - Support 99+ languages
Accuracy Metrics - Confidence scores per segment

Key Capabilities¶

Transcription¶

Live Audio - Real-time transcription
File Upload - Process pre-recorded files
Format Support - MP3, WAV, M4A, and other common formats
Speaker Diarization - Distinguish multiple speakers
Timestamps - Word-level timing

Language Support¶

99+ Languages - Automatic detection
Code-Switching - Mix languages in one file
Accents - Handles various accents
Domain Specific - Technical term handling

Quality Features¶

Confidence Scores - Per-word accuracy
Punctuation - Automatic sentence formatting
Capitalization - Smart casing
Noise Handling - Robust to background noise

Integration¶

REST API - Simple HTTP interface
WebSocket - Real-time streaming
Webhook - Async processing
MCP Tool - Available in herald
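
As a rough illustration of the REST interface, the sketch below prepares an HTTP request against the local address listed on this page. The endpoint path `/transcribe`, the raw-bytes request body, and the `Content-Type` value are assumptions made for illustration only; the actual routes and payload format are not documented here.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8019"  # local address given on this page
TRANSCRIBE_PATH = "/transcribe"     # hypothetical endpoint path, not confirmed


def build_transcribe_request(
    audio_bytes: bytes, content_type: str = "audio/mpeg"
) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying raw audio bytes."""
    return urllib.request.Request(
        url=BASE_URL + TRANSCRIBE_PATH,
        data=audio_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Placeholder bytes stand in for real audio data.
request = build_transcribe_request(b"\x00\x01")
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it. The response format is not specified on this page, so parsing is left out.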

Accessing transcriber¶

URL: http://127.0.0.1:8019

Commands:

python manage.py transcribe --file audio.mp3
python manage.py transcribe --url https://example.com/audio.wav
python manage.py transcribe --language en
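
The commands above can also be driven from a script. The following is a minimal sketch that assembles the same command line using only the flags shown (`--file`, `--url`, `--language`). Combining `--language` with an input flag, and the helper names `build_transcribe_command` and `run_transcribe`, are assumptions for illustration.

```python
import subprocess
import sys
from typing import List, Optional


def build_transcribe_command(
    file: Optional[str] = None,
    url: Optional[str] = None,
    language: Optional[str] = None,
) -> List[str]:
    """Assemble the manage.py invocation from the flags listed on this page."""
    if (file is None) == (url is None):
        raise ValueError("pass exactly one of file or url")
    command = [sys.executable, "manage.py", "transcribe"]
    if file is not None:
        command += ["--file", file]
    else:
        command += ["--url", url]
    if language is not None:
        # Assumption: --language can be combined with an input flag.
        command += ["--language", language]
    return command


def run_transcribe(**kwargs: Optional[str]) -> str:
    """Run the command and return whatever it writes to standard output."""
    completed = subprocess.run(
        build_transcribe_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For example, `run_transcribe(file="audio.mp3", language="en")` would need to be called from the project directory that contains `manage.py`.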

Common Use Cases¶

Transcribe Meeting Audio¶

Convert a meeting recording to searchable text with timestamps.
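
To illustrate what "searchable text with timestamps" could look like in practice, here is a small sketch that searches a list of transcript segments for a keyword. The segment shape (a dict with `start`, `end`, and `text` keys, times in seconds) is an assumption; the actual output format of transcriber is not described on this page.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]  # assumed shape: start, end, text


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def search_segments(segments: List[Segment], query: str) -> List[str]:
    """Return timestamped lines for segments whose text contains the query."""
    needle = query.lower()
    return [
        f"[{format_timestamp(float(seg['start']))}] {seg['text']}"
        for seg in segments
        if needle in str(seg["text"]).lower()
    ]


meeting = [
    {"start": 0.0, "end": 4.2, "text": "Welcome everyone."},
    {"start": 75.5, "end": 80.0, "text": "The budget review is next."},
]
print(search_segments(meeting, "budget"))
# prints ['[00:01:15] The budget review is next.']
```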

Real-Time Transcription¶

Live speech-to-text for presentations.

Multi-Language Support¶

Transcribe interviews in multiple languages.

Accessibility¶

Generate captions for videos.
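
One common caption format is SubRip (`.srt`). The sketch below converts timestamped segments into SRT text. As before, the segment shape (dicts with `start`, `end`, and `text`, times in seconds) is an assumption rather than the documented output of transcriber.

```python
from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]  # assumed shape: start, end, text


def srt_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used by SRT files."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        cues.append(f"{index}\n{start} --> {end}\n{str(seg['text']).strip()}\n")
    return "\n".join(cues)


print(to_srt([{"start": 0.0, "end": 2.5, "text": " Hello. "}]))
```

The resulting string can be written to a file with an `.srt` extension and loaded alongside the video in most players.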

Troubleshooting¶

Low transcription accuracy¶

Try cleaner audio or slower speech, and re-check the language selection.

Timeout on long files¶

Use streaming mode for files over 30 minutes.
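
A caller could check the duration before deciding which mode to use. The sketch below reads the length of a WAV file with Python's standard `wave` module and compares it to the 30-minute threshold mentioned above. It handles WAV input only, and the helper names are hypothetical; how streaming mode is actually selected is not covered on this page.

```python
import wave

STREAMING_THRESHOLD_SECONDS = 30 * 60  # the 30-minute limit mentioned above


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def should_stream(duration_seconds: float) -> bool:
    """True when the audio is longer than the 30-minute threshold."""
    return duration_seconds > STREAMING_THRESHOLD_SECONDS
```

For example, `should_stream(wav_duration_seconds("meeting.wav"))` returns `True` for a 45-minute recording and `False` for a 10-minute one.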

Unsupported audio format¶

Convert to WAV or MP3 first.
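
One way to do the conversion is with the `ffmpeg` command-line tool, assuming it is installed. The sketch below builds and runs a conversion to WAV. The choice of 16 kHz mono output is an assumption that tends to suit speech models, not a documented requirement of transcriber, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(source: str) -> List[str]:
    """Build an ffmpeg command that writes a WAV file next to the source."""
    source_path = Path(source)
    if source_path.suffix.lower() == ".wav":
        raise ValueError("source is already a WAV file")
    target = str(source_path.with_suffix(".wav"))
    # -y overwrites an existing target, -ar sets the sample rate,
    # -ac sets the channel count (1 = mono).
    return ["ffmpeg", "-y", "-i", source, "-ar", "16000", "-ac", "1", target]


def convert_to_wav(source: str) -> str:
    """Run the conversion and return the path of the new WAV file."""
    command = build_ffmpeg_command(source)
    subprocess.run(command, check=True)
    return command[-1]
```

Calling `convert_to_wav("talk.ogg")` would produce `talk.wav` in the same directory, which can then be passed to the `--file` option.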

herald - Exposes the transcriber tool in its marketplace
All realms - Can use transcriber for audio processing