Setup Whisper
I wanted to generate chapter markers from a devlog audio recording using OpenAI’s Whisper, and figured I’d run it locally. Whisper is Python-based, and I’m on Arch. What could go wrong?
Turns out… not much, but it still took a few hops.
Choosing the Right Setup
I already had Python installed, but rather than littering system Python or managing a bunch of ad hoc virtualenvs, I decided to do it properly — with Poetry.
| |
So far so good.
PyTorch + CUDA: the PyPy Pitfall
My first attempt to install torch, torchvision, and torchaudio failed in a
confusing way — no versions found at all. The clue was in the command: I’d
accidentally run it with pip-pypy3. PyTorch doesn’t build wheels for PyPy.
CPython only.
Sorting Out Python Versions
My system Python was 3.13. PyTorch had just released 3.13 wheels for torch,
but not yet for torchaudio — version mismatch. I used pyenv to install 3.12
instead:
| |
Updated pyproject.toml:
| |
And re-pointed Poetry:
| |
Poetry ignored me the first time because 3.13 was still hardcoded. After recreating the environment and verifying the version, I was ready.
PEP 668: the “Externally Managed” False Alarm
Even inside the Poetry shell, Arch’s patched Python threw a
--break-system-packages error. This check is meant to protect system Python —
but it was firing inside a fully isolated Poetry environment. Safe to ignore. I
added the flag:
| |
Worked perfectly.
The Result
| |
Transcribed 2.5 hours of audio, timestamped segments ready for chapter generation. All local, GPU-accelerated, isolated from system Python, and repeatable.
In Summary
If you’re on Arch and want Whisper with CUDA:
- Use
poetry+pyenv - Set Python to 3.12 (not 3.13)
- Install torch with
--break-system-packagesand thecu121index - Whisper just works
A bit of fiddling up front, but now it’s a solid local tool — one less cloud dependency to think about.