transcriber¶
Voice transcription and audio processing
Transcriber provides voice-to-text transcription using the OpenAI Whisper model, enabling audio input for applications. It is designed for fast, accurate transcription across many languages.
What It Does¶
- Audio Transcription - Convert speech to text
- Speech Recognition - Identify the language being spoken
- Audio Processing - Handle various formats
- Multi-Language - Support 99+ languages
- Accuracy Metrics - Confidence scores per segment
Key Capabilities¶
Transcription¶
- Live Audio - Real-time transcription
- File Upload - Process pre-recorded files
- Format Support - MP3, WAV, M4A, and other common formats
- Speaker Diarization - Multiple speakers
- Timestamps - Word-level timing
Language Support¶
- 99+ Languages - Automatic detection
- Code-Switching - Mix languages in one file
- Accents - Handles various accents
- Domain Specific - Technical term handling
Quality Features¶
- Confidence Scores - Per-word accuracy
- Punctuation - Automatic sentence formatting
- Capitalization - Smart casing
- Noise Handling - Robust to background noise
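Per-word confidence scores are most useful when they drive a review step. The sketch below marks uncertain words so a human can check them. The `Word` shape, the 0.0 to 1.0 score range, and the 0.6 threshold are assumptions for illustration; the actual response format of transcriber is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical shape of one transcribed word (not confirmed by the docs)."""
    text: str
    confidence: float  # assumed range: 0.0 (no confidence) to 1.0 (certain)


def flag_low_confidence(words, threshold=0.6):
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


def mark_transcript(words, threshold=0.6):
    """Join words into text, wrapping uncertain ones in [?...] for review."""
    return " ".join(
        f"[?{w.text}]" if w.confidence < threshold else w.text
        for w in words
    )


words = [Word("deploy", 0.97), Word("the", 0.99), Word("kubelet", 0.41)]
print(mark_transcript(words))  # deploy the [?kubelet]
```

A threshold like this usually needs tuning per domain, since technical terms tend to score lower than everyday vocabulary.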
Integration¶
- REST API - Simple HTTP interface
- WebSocket - Real-time streaming
- Webhook - Async processing
- MCP Tool - Available in herald
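As a rough illustration of the REST interface, the sketch below builds (but does not send) a JSON `POST` request with the standard library. Only the base URL comes from this page. The `/api/transcribe` path and the `url`, `language`, and `callback_url` fields are hypothetical placeholders; check the service's actual API before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8019"   # address listed under "Accessing transcriber"
TRANSCRIBE_PATH = "/api/transcribe"  # hypothetical path, not confirmed by the docs


def build_transcribe_request(audio_url, language=None, callback_url=None):
    """Build a POST request asking the service to transcribe a remote file.

    All payload field names are assumptions made for this sketch.
    """
    payload = {"url": audio_url}
    if language is not None:
        payload["language"] = language
    if callback_url is not None:
        # Assumed webhook field for asynchronous processing.
        payload["callback_url"] = callback_url
    return urllib.request.Request(
        BASE_URL + TRANSCRIBE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_transcribe_request("https://example.com/audio.wav", language="en")
print(req.get_method(), req.full_url)

# Sending is left commented out because the endpoint is an assumption:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     result = json.load(resp)
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running service.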
Accessing transcriber¶
URL: http://127.0.0.1:8019
Commands:
python manage.py transcribe --file audio.mp3
python manage.py transcribe --url https://example.com/audio.wav
python manage.py transcribe --language en
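The commands above can also be driven from Python. This is a minimal sketch that uses only the `--file`, `--url`, and `--language` flags shown above. It assumes that `--language` may be combined with a source flag and that the command prints its result to standard output, neither of which this page confirms. `build_transcribe_command` and `run_transcribe` are illustrative helper names.

```python
import subprocess
import sys


def build_transcribe_command(file=None, url=None, language=None):
    """Build the argument list for `manage.py transcribe`."""
    if (file is None) == (url is None):
        raise ValueError("pass exactly one of file or url")
    cmd = [sys.executable, "manage.py", "transcribe"]
    if file is not None:
        cmd += ["--file", file]
    else:
        cmd += ["--url", url]
    if language is not None:
        # Assumption: --language can accompany --file or --url.
        cmd += ["--language", language]
    return cmd


def run_transcribe(**kwargs):
    """Run the command from the project directory and return its stdout."""
    result = subprocess.run(
        build_transcribe_command(**kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


print(build_transcribe_command(file="audio.mp3", language="en")[1:])
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.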
Common Use Cases¶
Transcribe Meeting Audio¶
Convert a meeting recording to searchable text with timestamps.
Real-Time Transcription¶
Live speech-to-text for presentations.
Multi-Language Support¶
Transcribe interviews in multiple languages.
Accessibility¶
Generate captions for videos.
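One way to turn timestamped output into captions is to write a SubRip (`.srt`) file. The sketch below assumes segments arrive as `(start_seconds, end_seconds, text)` tuples, which is a hypothetical shape; adapt the unpacking to whatever transcriber actually returns.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}"
        )
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


segments = [(0.0, 2.5, "Hello."), (2.5, 4.0, " Welcome to the demo. ")]
print(to_srt(segments))
```

Most video players and hosting platforms accept SRT files, so the result can be attached to a video without further conversion.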
Troubleshooting¶
Low transcription accuracy¶
Try cleaner audio or slower speech, and re-check the language selection.
Timeout on long files¶
Use streaming mode for files over 30 minutes.
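To decide up front whether a recording crosses the 30-minute mark, its duration can be checked before submission. This sketch handles WAV files only, using Python's standard `wave` module. How streaming mode is then selected is not described on this page, so the helper only reports the decision.

```python
import wave

STREAMING_THRESHOLD_SECONDS = 30 * 60  # the 30-minute limit mentioned above


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def should_stream(path):
    """True if the file is longer than the 30-minute threshold."""
    return wav_duration_seconds(path) > STREAMING_THRESHOLD_SECONDS
```

For compressed formats such as MP3 or M4A, the duration has to come from another tool, because `wave` reads WAV files only.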
Unsupported audio format¶
Convert to WAV or MP3 first.
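A common way to do the conversion is with the `ffmpeg` command-line tool, which must be installed separately. The sketch below builds and runs such a command. Downmixing to mono at 16 kHz is an optional choice made here because Whisper resamples audio to 16 kHz internally; it is not a documented requirement of transcriber. The helper names are illustrative.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source, target_ext=".wav"):
    """Build an ffmpeg command converting `source` to mono 16 kHz audio."""
    source = Path(source)
    target = source.with_suffix(target_ext)
    if target == source:
        raise ValueError("source already has the target extension")
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", str(source), # input file
        "-ac", "1",        # one audio channel (mono)
        "-ar", "16000",    # 16 kHz sample rate
        str(target),
    ]


def convert(source, target_ext=".wav"):
    """Run the conversion and return the path of the new file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_ffmpeg_command(source, target_ext)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


print(build_ffmpeg_command("talk.m4a"))
```

Calling `convert("talk.m4a")` would write `talk.wav` next to the original, which can then be passed to the `--file` option.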
Related Realms¶
- herald - Exposes transcriber tool in marketplace
- All realms - Can use transcriber for audio processing