Whisper Transcription
B-Roll Me can transcribe video audio using OpenAI's Whisper model when YouTube captions aren't available. You can choose between running Whisper locally on your device or using the OpenAI Whisper API. Configure your preferred method in Settings > Transcription.
Transcription Methods
B-Roll Me supports two Whisper transcription methods. Choose the one that fits your workflow:
| Method | Local Whisper | OpenAI Whisper API |
|---|---|---|
| Cost | Free | Billed per minute of audio by OpenAI |
| Privacy | Fully on-device, no data sent externally | Audio sent to OpenAI servers |
| Setup | Download model (75 MB – 1.5 GB) | OpenAI API key required, no model download |
| Performance | Metal-accelerated on Apple Silicon | Cloud-based, consistent speed |
| File limit | No limit | Max 25 MB per file |
Select your preferred method in Settings > Transcription.
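The trade-offs in the table above can be expressed as a small decision helper. This is a minimal sketch, not B-Roll Me's logic: `TranscriptionJob` and `choose_method` are hypothetical names. The 25 MB cap is read conservatively as 25 × 10^6 bytes, and preferring Local Whisper when a model is already downloaded is a design choice of this example.

```python
from dataclasses import dataclass

# Assumed cap for the OpenAI Whisper API: 25 MB, read conservatively as
# 25 * 10**6 bytes (this sketch does not know whether 10**6 or 2**20 applies).
API_MAX_BYTES = 25 * 1000 * 1000


@dataclass
class TranscriptionJob:  # hypothetical type, not part of B-Roll Me
    audio_bytes: int       # size of the audio to transcribe
    confidential: bool     # content must stay on this device
    has_api_key: bool      # OpenAI key set in Settings > API Keys
    has_local_model: bool  # a Whisper model is already downloaded


def choose_method(job: TranscriptionJob) -> str:
    """Return "local", "api", or "download-model" from the table's trade-offs."""
    api_ruled_out = (
        job.confidential                      # Privacy row: audio would leave the device
        or job.audio_bytes > API_MAX_BYTES    # File limit row: max 25 MB per file
        or not job.has_api_key                # Setup row: API key required
    )
    if api_ruled_out:
        return "local" if job.has_local_model else "download-model"
    # Both methods work; prefer the free, on-device option when a model exists.
    return "local" if job.has_local_model else "api"
```

For example, a 30 MB file with no downloaded model yields `"download-model"`, because the API's file limit rules it out.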
When Whisper Is Used
During the search phase, B-Roll Me fetches YouTube captions for each video. Some videos don't have captions available. When this happens:
- If Auto-transcribe is enabled in Settings, Whisper automatically transcribes the audio using your selected method.
- The resulting transcript is used for keyword matching, just like YouTube captions.
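The caption-first fallback described above can be sketched as follows. This is a minimal illustration, not B-Roll Me's implementation: `get_transcript`, `fetch_captions`, and `transcribe` are hypothetical names, and returning `None` when Auto-transcribe is off is an assumption, since this page does not say what happens in that case.

```python
from typing import Callable, Optional


def get_transcript(
    video_id: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical caption fetcher
    transcribe: Callable[[str], str],                # hypothetical Whisper hook (local or API)
    auto_transcribe: bool,                           # the Auto-transcribe setting
) -> Optional[str]:
    """Return text for keyword matching: YouTube captions first, Whisper as fallback."""
    captions = fetch_captions(video_id)
    if captions:
        return captions  # captions exist, so Whisper is never invoked
    if auto_transcribe:
        return transcribe(video_id)  # no captions: fall back to Whisper
    return None  # assumption: no transcript when Auto-transcribe is off
```

Passing the fetcher and transcriber in as callables keeps the fallback order independent of which Whisper method is selected.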
OpenAI Whisper API
With this method, B-Roll Me sends audio to OpenAI's servers for transcription, so no model download is required.
- Requires an OpenAI API key configured in Settings > API Keys.
- Audio is converted to MP3 and sent to OpenAI's `whisper-1` model.
- Maximum file size is 25 MB per audio file.
- Billed per minute of audio at OpenAI's standard rates.
- Audio is processed on OpenAI's servers, so do not use this method if your content is confidential.
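A minimal sketch of the API path, using the official `openai` Python SDK (v1 or later), which provides `client.audio.transcriptions.create(model="whisper-1", file=...)`. The helper names are hypothetical, and B-Roll Me's actual MP3 bitrate is not documented here, so the bitrates below are assumptions used only to show how the 25 MB cap translates into audio length.

```python
import os

# OpenAI's upload cap is 25 MB; reading it as 25 * 10**6 bytes is the
# conservative interpretation and an assumption of this sketch.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000


def max_minutes_at_bitrate(kbps: int) -> float:
    """Longest constant-bitrate MP3, in minutes, that fits under the cap."""
    bytes_per_second = kbps * 1000 / 8
    return MAX_UPLOAD_BYTES / bytes_per_second / 60


def transcribe_with_api(mp3_path: str) -> str:
    """Send one MP3 to whisper-1 and return the transcript text (hypothetical helper)."""
    size = os.path.getsize(mp3_path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{mp3_path} is {size} bytes; the API accepts at most 25 MB")
    from openai import OpenAI  # lazy import: the size helpers work without the SDK
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(mp3_path, "rb") as audio:
        result = client.audio.transcriptions.create(model="whisper-1", file=audio)
    return result.text
```

At an assumed 128 kbps, `max_minutes_at_bitrate(128)` is 1562.5 seconds, or about 26 minutes; at 64 kbps it doubles to about 52 minutes. Longer audio would need to be split or sent through Local Whisper, which has no file limit.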
Local Whisper
Local Whisper runs entirely on your machine — no cloud API calls required for transcription.
- Completely free and private: audio never leaves your device.
- Requires downloading a Whisper model (75 MB – 1.5 GB depending on model size).
- Metal-accelerated on Apple Silicon Macs for significantly faster transcription.
Available Models (Local)
Choose a model based on your accuracy/speed tradeoff. Smaller models are faster but less accurate:
| Model | Size | Speed vs Accuracy |
|---|---|---|
| tiny.en | ~75 MB | Fastest, lowest accuracy |
| base.en | ~142 MB | Fast, decent accuracy |
| small.en | ~466 MB | Good balance |
| medium.en | ~1.5 GB | High accuracy, slower |
| large-v3-turbo-q5_0 | ~1.1 GB | Best accuracy, quantized for efficiency |
Downloading Models
Models must be downloaded before use. Go to Settings > Transcription, select a model, and click Download. The model is saved locally and can be deleted later to free space.
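Before clicking Download, it can help to confirm there is room for the model. This is a minimal sketch, assuming decimal megabytes, the approximate sizes from the model table, and a hypothetical 200 MB safety margin; `models_dir` stands in for wherever B-Roll Me stores models, which this page does not specify.

```python
import shutil

# Approximate download sizes from the model table, in decimal megabytes.
MODEL_SIZES_MB = {
    "tiny.en": 75,
    "base.en": 142,
    "small.en": 466,
    "medium.en": 1500,
    "large-v3-turbo-q5_0": 1100,
}


def models_that_fit(free_bytes: int, headroom_mb: int = 200) -> list[str]:
    """Models whose size plus a safety margin (hypothetical) fits in free_bytes."""
    return [
        name
        for name, size_mb in MODEL_SIZES_MB.items()
        if (size_mb + headroom_mb) * 1_000_000 <= free_bytes
    ]


def can_download(model: str, models_dir: str = ".") -> bool:
    """Check free space on the volume that holds models_dir."""
    free = shutil.disk_usage(models_dir).free
    return model in models_that_fit(free)
```

With 500 MB free, `models_that_fit(500_000_000)` returns `["tiny.en", "base.en"]`; deleting an unused model raises the free space and widens the list.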
Apple Silicon Acceleration
On Apple Silicon Macs (M1/M2/M3/M4), local Whisper runs with Metal acceleration for significantly faster transcription. On Intel Macs and Windows, it runs on the CPU, which is slower but still functional.
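The hardware check described above might look like this. It is a minimal sketch using Python's standard `platform` module; `whisper_backend` is a hypothetical helper, and the `"metal"`/`"cpu"` labels are illustrative rather than B-Roll Me settings. One caveat: a process running under Rosetta on Apple Silicon reports `x86_64`, so this check would fall back to CPU in that case.

```python
import platform


def whisper_backend(system: str = "", machine: str = "") -> str:
    """Return "metal" on Apple Silicon Macs, otherwise "cpu" (hypothetical helper)."""
    system = system or platform.system()     # e.g. "Darwin", "Windows", "Linux"
    machine = machine or platform.machine()  # "arm64" on native Apple Silicon macOS
    if system == "Darwin" and machine == "arm64":
        return "metal"
    return "cpu"  # Intel Macs and Windows: slower, but functional
```

Accepting `system` and `machine` as optional arguments lets the decision be checked for other platforms without running on them.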
Tips
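- Most YouTube videos have captions. Whisper is mainly needed for less popular content, unlisted videos, or non-English content. Note that the `.en` local models are English-only; for non-English audio, use `large-v3-turbo-q5_0` or the OpenAI Whisper API.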
- If privacy matters, use Local Whisper — audio never leaves your machine.
- For local transcription, start with `base.en` for a good speed/accuracy balance on most machines.
- If you have an Apple Silicon Mac with 16+ GB RAM, `large-v3-turbo-q5_0` provides excellent accuracy with Metal acceleration.
- Use the OpenAI Whisper API if you want fast transcription without downloading a model and don't mind the per-minute cost.
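The model tips above reduce to a few rules. This is a minimal sketch, not a B-Roll Me feature: `recommend_local_model` is a hypothetical helper. Its `english` flag relies on the fact that the `.en` Whisper models are English-only, leaving `large-v3-turbo-q5_0` as the only multilingual option in the model table.

```python
def recommend_local_model(apple_silicon: bool, ram_gb: int, english: bool = True) -> str:
    """Pick a starting local model following the tips (hypothetical helper)."""
    # The ".en" models are English-only; large-v3-turbo-q5_0 is the only
    # multilingual model in the table, so non-English audio needs it.
    if not english:
        return "large-v3-turbo-q5_0"
    # Apple Silicon with 16+ GB RAM: the larger model runs well under Metal.
    if apple_silicon and ram_gb >= 16:
        return "large-v3-turbo-q5_0"
    # Default for most machines: a good speed/accuracy balance.
    return "base.en"
```

On an 8 GB Intel Mac transcribing English audio, this returns `"base.en"`; on a 16 GB Apple Silicon Mac, it returns `"large-v3-turbo-q5_0"`.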