Speech to Text (Offline)
Transcribe English speech to text — record from your microphone or upload an audio file. The Whisper tiny.en model runs entirely in your browser via Transformers.js.
Up to 10 minutes. Audio never leaves your device.
How It Works
Capture audio
Record from your microphone via the MediaRecorder API or drop in an existing audio file. Audio stays in browser memory.
Decode + resample
AudioContext.decodeAudioData converts any codec your browser supports to raw PCM; an OfflineAudioContext then resamples it to 16 kHz mono, the sample rate Whisper was trained on.
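As a rough illustration of this step, here is a minimal TypeScript sketch built on the standard Web Audio API. It is not the tool's actual source: the names `resampledLength` and `toMono16k` are made up for the example, and it assumes the input is a Blob or File that the browser can decode.

```typescript
const TARGET_RATE = 16_000; // Hz, the rate stated on this page

// Number of output samples needed to hold `durationSeconds` of audio at `rate`.
// Clamped to 1 because an OfflineAudioContext cannot have a length of zero.
function resampledLength(durationSeconds: number, rate: number = TARGET_RATE): number {
  return Math.max(1, Math.ceil(durationSeconds * rate));
}

// Browser-only: decode a recorded or uploaded clip, then render it through a
// one-channel, 16 kHz OfflineAudioContext to get mono PCM samples.
async function toMono16k(clip: Blob): Promise<Float32Array> {
  const ctx = new AudioContext();
  try {
    const decoded = await ctx.decodeAudioData(await clip.arrayBuffer());
    const offline = new OfflineAudioContext(1, resampledLength(decoded.duration), TARGET_RATE);
    const source = offline.createBufferSource();
    source.buffer = decoded;
    source.connect(offline.destination);
    source.start();
    const rendered = await offline.startRendering();
    return rendered.getChannelData(0);
  } finally {
    await ctx.close();
  }
}
```

Connecting the decoded buffer to a one-channel offline context lets the browser handle both the downmix to mono and the sample-rate conversion in a single render.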
On-device Whisper
Whisper tiny.en runs inside a Web Worker on WebGPU (or WASM). Clips longer than 30 seconds are split into overlapping 30-second chunks, and the per-chunk transcripts are stitched into a single transcript.
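One way a page might choose between the two backends is to look for the `gpu` property that WebGPU-capable browsers expose on `navigator`. The sketch below is an assumption about how such a check could look, not the tool's own code. `pickBackend` is a hypothetical helper, and the strings it returns match the `device` values that Transformers.js accepts ("webgpu" and "wasm").

```typescript
type Backend = "webgpu" | "wasm";

// Hypothetical helper: pass the navigator object, or undefined outside a browser.
function pickBackend(nav: object | undefined): Backend {
  // The presence of navigator.gpu only shows that the WebGPU API is exposed.
  // It does not guarantee a usable adapter, so a fuller check would also await
  // navigator.gpu.requestAdapter() and fall back to "wasm" when it yields null.
  return nav !== undefined && "gpu" in nav ? "webgpu" : "wasm";
}
```

In a page or worker this would be called as `pickBackend(navigator)`.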
Copy or download
The transcript renders below the audio player. Copy to clipboard or download as a .txt file — all in the browser, no server involved.
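Both actions can be done with standard browser APIs and no server round trip. The following is a minimal sketch under that assumption, not the tool's actual code; the function names and the default file name `transcript.txt` are hypothetical.

```typescript
// Wrap the transcript in a plain-text Blob.
function transcriptBlob(text: string): Blob {
  return new Blob([text], { type: "text/plain;charset=utf-8" });
}

// Browser-only: trigger a download through a temporary object URL.
function downloadTranscript(text: string, filename: string = "transcript.txt"): void {
  const url = URL.createObjectURL(transcriptBlob(text));
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}

// Browser-only: the async Clipboard API needs a secure context (HTTPS or localhost).
async function copyTranscript(text: string): Promise<void> {
  await navigator.clipboard.writeText(text);
}
```

The file is assembled from memory in the tab, so nothing is requested from a server when you click download.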
Why Use a Private, Offline Speech-to-Text Tool?
Cloud transcription services such as Google Speech-to-Text, Otter, and Rev send every recording to a remote server. For confidential interviews, internal meetings, voice memos, customer calls, or any audio covered by GDPR, HIPAA, or your employer's data-handling policy, that upload is a real risk. An on-device speech recogniser removes the upload entirely — your audio is decoded, resampled, and transcribed in the same browser tab and never traverses the network.
How to Use the Speech-to-Text Tool
- Choose Record to capture audio from your microphone, or Upload file to bring in an existing audio file.
- If recording: click Start recording, speak clearly, and click Stop recording when finished. The maximum recording length is 10 minutes per clip.
- If uploading: click Choose audio file and pick any supported format (MP3, WAV, M4A, OGG, WebM, FLAC, AAC).
- Click Transcribe. On first use you'll be asked to confirm a one-time ~75 MB model download.
- Read the transcript that appears below the audio player.
- Click Copy to copy the text, or Download .txt to save it as a plain-text file.
Supported Audio Formats
Any audio format your browser can decode is accepted. The most common formats are listed below:
| Format | Typical extension | Notes |
|---|---|---|
| MPEG Audio | .mp3 | Universally supported; lossy compression. |
| WAV (PCM) | .wav | Uncompressed; best fidelity but largest files. |
| MPEG-4 / AAC | .m4a, .aac | Default for iOS voice memos and Apple devices. |
| Ogg Vorbis | .ogg | Common in open-source recording tools. |
| WebM / Opus | .webm | Default for in-browser MediaRecorder output. |
| FLAC | .flac | Lossless compression; large but high quality. |
Key Features
- Runs 100% in your browser — audio is never uploaded to any server; transcription happens on your device.
- Free with no sign-up — no API key, no usage limits, no account, no watermarks.
- Record or upload — capture audio live via the MediaRecorder API, or transcribe existing files in any common format.
- Automatic chunked inference — recordings longer than 30 seconds are split into overlapping 30-second windows (5-second stride) so each chunk fits Whisper's training window without breaking words across chunk boundaries.
- Offline after first load — the model downloads once and is cached in your browser's IndexedDB storage.
- Copy or download — get the transcript on the clipboard or as a plain-text file in one click.
- Open-source model — OpenAI Whisper tiny.en is released under the MIT License.
Best Use Cases
- Confidential meetings — transcribe internal discussions without exposing the audio to a cloud provider.
- Journalist interviews — protect source confidentiality by keeping recordings on a single device.
- Medical and legal dictation — first-draft transcripts for HIPAA / GDPR / attorney-client privileged content.
- Voice memos to text — convert quick thoughts into searchable text without using a cloud service.
- Lecture and podcast notes — generate a reference transcript for your own learning material.
- Accessibility — produce written records of audio content for hearing-impaired colleagues or students.
Accuracy and Limitations
Whisper tiny.en is the smallest member of OpenAI's Whisper family — a ~75 MB English-only model. It is dramatically smaller than the multi-GB Whisper Large model that powers many cloud transcription services. Expect strong results on clear speech recorded in a quiet environment, and weaker results on heavy accents, overlapping speakers, very noisy backgrounds, or highly technical jargon. For high-stakes transcripts always have a human reviewer check the output. The trade-off is privacy and zero cost: nothing you say is ever transmitted off your device, and there are no per-minute charges.
Frequently Asked Questions
Is my audio uploaded to your servers?
No. Audio is processed entirely inside your browser using OpenAI Whisper via Transformers.js. Recorded audio and uploaded files are decoded in memory, resampled to 16 kHz, and passed to the Whisper model running in a Web Worker on your device. Nothing is sent to or stored on any server.
Why is the first transcription slow?
The first time you use the tool, your browser downloads the Whisper tiny.en model (~75 MB) and saves it to your browser’s IndexedDB storage. After that one-time download, the model loads from local storage instead of the network, so later runs start much faster and the tool works even offline.
Which AI model is used?
OpenAI Whisper tiny.en — the English-only variant of OpenAI’s open-source automatic speech recognition model. The ONNX build is hosted by the onnx-community project and runs through Transformers.js. Whisper was released under the MIT License.
Which languages are supported?
This release uses Whisper tiny.en, the English-only variant. Multilingual support via the multilingual Whisper checkpoints (e.g. the multilingual whisper-tiny) is on the roadmap.
What audio formats can I upload?
Any audio format your browser can decode: typically MP3, WAV, M4A/AAC, OGG, WebM (Opus), and FLAC. The tool uses AudioContext.decodeAudioData for decoding, then resamples to 16 kHz mono via an OfflineAudioContext before running inference.
How long can the audio be?
Up to 10 minutes per clip. Audio longer than 30 seconds is automatically split into overlapping 30-second chunks (with 5-second stride) so each chunk fits Whisper’s training window, and the resulting per-chunk transcripts are stitched together into a single output.
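To make the arithmetic concrete, the sketch below computes where each window would start and end. It assumes the convention of Hugging Face speech-recognition pipelines, where `chunk_length_s` is the window length and `stride_length_s` is the context kept on each side of a window, so consecutive windows start 30 - 2 x 5 = 20 seconds apart. The tool's own code is not shown on this page, and `planChunks` and `ChunkPlan` are hypothetical names.

```typescript
interface ChunkPlan {
  start: number;       // first sample index (inclusive)
  end: number;         // last sample index (exclusive)
  leftStride: number;  // leading samples used only as context (0 for the first chunk)
  rightStride: number; // trailing samples used only as context (0 for the last chunk)
}

function planChunks(
  totalSamples: number,
  sampleRate: number = 16_000,
  chunkSeconds: number = 30,
  strideSeconds: number = 5,
): ChunkPlan[] {
  const windowSize = sampleRate * chunkSeconds;
  const stride = sampleRate * strideSeconds;
  const jump = windowSize - 2 * stride; // distance between window starts
  if (jump <= 0) throw new RangeError("stride must be less than half the chunk length");

  const chunks: ChunkPlan[] = [];
  let offset = 0;
  while (true) {
    const isFirst = offset === 0;
    const isLast = offset + windowSize >= totalSamples;
    chunks.push({
      start: offset,
      end: Math.min(offset + windowSize, totalSamples),
      leftStride: isFirst ? 0 : stride,
      rightStride: isLast ? 0 : stride,
    });
    if (isLast) return chunks;
    offset += jump;
  }
}
```

Under these assumptions a 70-second clip yields three windows (0 to 30 s, 20 to 50 s, 40 to 70 s), and a full 10-minute clip yields 30.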
Does this work offline?
Yes. Once the Whisper model is downloaded on your first use, it is saved to your browser’s IndexedDB. You can then transcribe audio without an internet connection.
Why does the browser ask for microphone permission?
Recording uses the browser’s MediaRecorder API on a microphone stream obtained through getUserMedia, and browsers only hand out that stream after an explicit permission prompt. Audio captured from your microphone stays in browser memory only; it is never transmitted off your device.
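For readers curious what that looks like in code, here is a minimal sketch of an in-memory recording with the standard `getUserMedia` and `MediaRecorder` APIs. It is an illustration rather than the tool's source. The names `recordClip`, `recordingTimeLeft`, and `stopRequested` are hypothetical, and the 10-minute cap is taken from the limit stated on this page.

```typescript
const MAX_RECORDING_MS = 10 * 60 * 1000; // 10-minute cap per clip

// Milliseconds of recording time remaining, never negative.
function recordingTimeLeft(elapsedMs: number): number {
  return Math.max(0, MAX_RECORDING_MS - elapsedMs);
}

// Browser-only: records until `stopRequested` resolves or the cap is reached,
// then returns the clip as an in-memory Blob.
async function recordClip(stopRequested: Promise<void>): Promise<Blob> {
  // This call is what triggers the browser's microphone permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const parts: Blob[] = [];
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) parts.push(event.data);
  };
  const finished = new Promise<void>((resolve) => {
    recorder.onstop = () => resolve();
  });
  const stop = () => {
    // Calling stop() on an inactive recorder throws, so check the state first.
    if (recorder.state !== "inactive") recorder.stop();
  };
  recorder.start();
  const timer = setTimeout(stop, MAX_RECORDING_MS);
  void stopRequested.then(stop);
  await finished;
  clearTimeout(timer);
  stream.getTracks().forEach((track) => track.stop()); // release the microphone
  return new Blob(parts, { type: recorder.mimeType });
}
```

The recorded chunks are collected in an array and joined into one Blob, so the audio only ever exists in the tab's memory.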
How does on-device transcription compare to cloud services like Otter or Rev?
On-device transcription is private and free with no rate limits, but a 75 MB browser model is far smaller than the models that cloud services such as Otter, Rev, and Google Speech-to-Text can run on server hardware. For clear speech the accuracy is good; for heavily accented or noisy recordings, cloud services may still be stronger. Choose on-device when privacy matters most.
Privacy & Security