gophercast
GopherCast turns any machine on your local network into a synchronized speaker. You run one server, every other device connects as a client, and they all start playing audio at precisely the same moment. No drift, no desync, no buffering banners.
Think of it as a personal audio player spread across multiple devices in sync, built in Go over plain WebSockets, with a terminal UI.
Table of Contents
- How It Works
- Features
- Platform Support
- Prerequisites
- Installation
- Usage
- Synchronization Model
- Single-Machine Multi-Terminal Mode
- Why WebSocket over WebRTC or UDP?
- Architecture
- Configuration
- Logs
- Development
- Known Limitations
- License
How It Works
At its core, GopherCast is a streaming pipeline over WebSocket:
- The server decodes MP3 files to raw PCM audio using go-mp3.
- PCM frames are paced to real time and broadcast to all connected clients as WebSocket binary messages.
- Each client receives the raw audio bytes and feeds them directly to its local audio output via oto/v3.
- Before playback begins, the server sends a start_at_ns Unix-nanosecond timestamp to every client. Clients sleep until that wall-clock time, then begin playing in unison.
Because clients receive raw PCM rather than compressed audio, there is no per-client decoding step.
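To get a feel for the numbers involved, here is a tiny Go sketch of the chunk timing. The 4096-byte chunk size and the 16-bit stereo 44.1 kHz format come from the Architecture section below; this is not GopherCast's actual pacing code, just the arithmetic behind it.

```go
package main

import (
	"fmt"
	"time"
)

// Stream format as described in the Architecture section:
// 16-bit signed LE, stereo, 44.1 kHz, 4096-byte chunks.
const (
	sampleRate     = 44100
	channels       = 2
	bytesPerSample = 2
	chunkBytes     = 4096
)

// chunkDuration returns how much wall-clock time one PCM chunk represents,
// which is the interval the server must pace its broadcasts to.
func chunkDuration() time.Duration {
	frames := chunkBytes / (channels * bytesPerSample) // 1024 stereo frames
	return time.Duration(frames) * time.Second / sampleRate
}

func main() {
	fmt.Println(chunkDuration()) // ~23ms per 4096-byte chunk
}
```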
Features
- Synchronized multi-device playback: all clients start within ~1ms of each other on an NTP-synced LAN
- YouTube support: provide a video URL or full playlist URL and GopherCast downloads and converts audio via yt-dlp before serving
- Bubble Tea TUI: interactive terminal UI for source selection, download progress, track picking, lobby management, and live stream status
- Local directory streaming: point it at a folder of MP3 files and go
- Raw PCM file output: connect with --output recording.pcm to capture the audio stream to disk instead of playing it
- Lobby phase: clients join before playback starts, and mid-stream joins are explicitly rejected so latecomers never desync the room
- Playlist shuffle: the --random flag randomizes track order
- Structured JSON logs: written to ~/.gophercast/logs/ so they never interfere with the TUI
Platform Support
| Platform | Status          | Notes                                                                             |
| -------- | --------------- | --------------------------------------------------------------------------------- |
| Linux    | Fully supported | Tested on Fedora/Ubuntu via ALSA/PipeWire                                          |
| macOS    | Should work     | oto supports CoreAudio, not regularly tested                                       |
| Windows  | Likely broken   | oto's Windows backend has known issues with the PCM pipeline used here, untested   |
GopherCast relies on oto/v3 for audio output. On Linux it requires either ALSA or PipeWire. On macOS it uses CoreAudio directly with no extra dependencies.
Prerequisites
System audio library (Linux only)
Fedora / RHEL / CentOS:
sudo dnf install alsa-lib-devel
Debian / Ubuntu:
sudo apt install libasound2-dev
yt-dlp (required for YouTube features)
pip install yt-dlp
# or
pipx install yt-dlp
yt-dlp must be available on your PATH. If you only stream local files, you do not need it.
Go 1.24+
Download from go.dev/dl or use your system package manager.
Installation
Build from source
git clone https://github.com/shricodev/gophercast
cd gophercast
make build
The binary is written to bin/gophercast. You can copy it anywhere on your PATH:
sudo cp bin/gophercast /usr/local/bin/
Run directly
make run
Usage
GopherCast has two subcommands: serve and play.
Serving Audio
The serve command launches the interactive TUI. From there you pick a source, optionally select tracks, open a lobby, and start streaming.
gophercast serve
You can also pre-configure the source with flags to skip the source-selection step:
Flags:
-d, --dir-to-mp3 string Path to a directory containing MP3 files
-y, --youtube string YouTube video URL
-p, --yt-playlist string YouTube playlist URL
--random Shuffle track order (works with --dir-to-mp3 and --yt-playlist)
Stream a local music directory:
gophercast serve --dir-to-mp3 ~/Music
Stream a YouTube video:
gophercast serve --youtube "https://youtube.com/watch?v=..."
Stream a YouTube playlist in random order:
gophercast serve --yt-playlist "https://youtube.com/playlist?list=..." --random
Once the TUI opens, the flow is:
- Select a source (if not provided via flags)
- Wait for any downloads to complete
- Choose whether to auto-select all tracks or pick them manually
- The lobby opens. Share the server address shown on screen with your clients
- When everyone has connected, press Enter to begin streaming
Connecting as a Client
On each device that should play audio, run:
gophercast play --host <ip> --port 8080
Flags:
--host string Server IP address
--port int Server port (default: 8080)
-n, --name string Display name shown in the server lobby (default: hostname)
-o, --output string Write raw PCM to this file instead of playing audio
--latency int Override auto-detected audio latency in milliseconds
Connect with a custom name:
gophercast play --host <ip> --port 8080 --name "Kitchen Speaker"
Capture audio to a file instead of playing:
gophercast play --host <ip> --port 8080 --output session.pcm
Override latency if your device is consistently out of sync:
gophercast play --host <ip> --port 8080 --latency 120
The client must connect before the server host starts playback. Once streaming begins, any new connection attempt is rejected with a clear message.
Synchronization Model
Getting multiple speakers to start at the same millisecond is the central challenge GopherCast solves. Here is how:
Initial sync (wall-clock alignment)
When the host starts playback, the server calculates a future start time:
target = now + lead_time + max_client_latency
Each client receives a personalized start_at_ns (Unix nanoseconds) adjusted for its individual audio pipeline latency. A client with a 150ms pipeline latency gets an earlier timestamp than one with a 50ms latency, so the actual sound exits both speakers at the same wall-clock moment. The default lead time is 750ms for the first track and 500ms for subsequent tracks (the audio sink is already warm).
This assumes all machines are NTP-synchronized. On a typical LAN, NTP keeps clocks within 1ms of each other, which is well within the threshold of human perception (~20ms for audio onset differences).
Continuous drift correction

[!NOTE] The audio kept drifting when multiple clients stayed connected for a long time. I wasn't familiar with the issue, and Sonnet suggested the cause; this part was completely vibe-coded with Sonnet 4.6. I still need to review the implementation more closely.

Even with a perfect start, clocks drift. GopherCast clients measure their own drift every 50 audio frames (~1.15 seconds) by comparing samples written to wall-clock elapsed time. If drift exceeds 2ms:
- Behind (real time is ahead of playback): trim a few samples from the end of the current chunk
- Ahead (playback is ahead of real time): duplicate a few samples at the end of the current chunk
The correction is capped at 32 samples (~0.7ms) per interval to prevent audible glitches. This keeps all clients aligned across playlists that run for hours.
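The capped trim/duplicate decision can be sketched as follows. The names and the sign convention are my assumptions, not the actual drift corrector in client/:

```go
package main

import "fmt"

const (
	sampleRate    = 44100
	driftLimit    = 2 * sampleRate / 1000 // 2ms of drift expressed in samples (~88)
	maxCorrection = 32                    // max samples adjusted per interval (~0.7ms)
)

// correction returns how many samples to trim (negative) or duplicate
// (positive) in the current chunk. drift > 0 means playback is behind
// real time; drift < 0 means playback is ahead.
func correction(drift int) int {
	switch {
	case drift > driftLimit: // behind: trim samples to catch up
		return -min(drift, maxCorrection)
	case drift < -driftLimit: // ahead: duplicate samples to slow down
		return min(-drift, maxCorrection)
	default:
		return 0 // within the 2ms tolerance, leave the chunk alone
	}
}

func main() {
	fmt.Println(correction(500), correction(-500), correction(10)) // -32 32 0
}
```

The cap matters more than the exact threshold: correcting at most 32 samples (~0.7ms) per interval keeps each adjustment well below audibility while still converging faster than clocks realistically drift.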
Single-Machine Multi-Terminal Mode
GopherCast is designed for multiple physical devices, but you can run both the server and multiple clients on the same machine. Each client terminal opens its own audio output via oto, which most systems will mix together through the audio server (PipeWire, PulseAudio, CoreAudio).
The practical effect is that the same audio plays from multiple output streams simultaneously, amplifying the overall volume through software mixing. This is not the intended use case, and audio quality may degrade at higher client counts due to mixing artifacts, but it works.
# Terminal 1: start the server
gophercast serve
# Terminal 2, 3, 4...: connect as clients on the same machine
gophercast play --host 127.0.0.1 --port 8080 --name "instance-2"
gophercast play --host 127.0.0.1 --port 8080 --name "instance-3"
Note that oto creates a new audio pipeline per client process. Whether the OS mixes them cleanly depends on your audio subsystem. PipeWire and PulseAudio handle this well. Direct ALSA access may not.
Why WebSocket over WebRTC or UDP?
Both WebRTC and RTP/UDP would technically work, but they solve problems GopherCast simply does not have. WebRTC is designed for peer-to-peer meshes, and the open internet. RTP/UDP's no-head-of-line-blocking advantage is basically irrelevant on a LAN where packet loss is near zero. Honestly, I have never worked with WebRTC before and I am not comfortable with it yet, so reaching for it on a personal project felt like the wrong call. WebSocket keeps it simple, ships control and audio on one connection, and uses a library I actually know. Good enough for a LAN tool.
[!NOTE] If I ever get comfortable with WebRTC or RTP/UDP and one of them proves practical enough, this might migrate to that transport. No plans for now, though.
Architecture
Streaming Pipeline
MP3 file
|
| go-mp3 decodes to raw PCM
v
PCM chunks (4096 bytes = ~23ms at 44.1kHz)
|
| server paces output to real time
| wall-clock drift correction
v
WebSocket binary frames
[4B seq] [8B sample_offset] [4096B PCM payload]
|
| broadcast to all clients
v
Client audio buffer
|
| waits until start_at_ns
v
oto audio output (16-bit signed LE, stereo, 44.1kHz)
Server State Machine
lobby --> playing --> stopped
- lobby: accepts new client connections, waits for host to start playback
- playing: streams audio, rejects new connections
- stopped: playlist ended or host quit
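The connection gate falls out of this state machine directly. A minimal Go sketch (type and function names are illustrative, not GopherCast's actual code):

```go
package main

import "fmt"

// serverState models the lobby -> playing -> stopped lifecycle above.
type serverState int

const (
	stateLobby serverState = iota
	statePlaying
	stateStopped
)

// canAccept reports whether a new client connection is allowed: only the
// lobby phase accepts joins, which is why mid-stream joins are rejected.
func canAccept(s serverState) bool { return s == stateLobby }

func main() {
	fmt.Println(canAccept(stateLobby), canAccept(statePlaying)) // true false
}
```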
Wire Protocol
Control messages travel as JSON over WebSocket text frames using an envelope format:
{ "type": "start_playback", "data": { ... } }
Message types: hello, server_state, client_list, start_playback, stop_playback, track_change, reject
Audio frames travel as WebSocket binary frames with a 12-byte header:
| 4 bytes: sequence number |
| 8 bytes: sample offset |
| N bytes: raw PCM payload |
Package Layout
cmd/ Cobra CLI commands: root, serve, play
server/ AudioServer, hub, WebSocket handlers, broadcaster, streamer
client/ AudioClient, AudioSink interface, drift corrector
tui/ Bubble Tea TUI (model, view, keys, messages, server handlers)
pkg/
protocol/ Wire protocol types, frame format, marshal helpers
types/ Core domain types: Track, Playlist, Path, Source
internal/
downloader/ yt-dlp wrapper with concurrent worker pool
playlist/ Directory scanner, builds Playlist from MP3 files
logger/ Structured JSON logger (log/slog) with file output
testutil/ Assertion helpers for table-driven tests
Configuration
GopherCast uses a YAML config file at $HOME/.gophercast.yaml. It is optional and all settings can be passed as flags instead.
# $HOME/.gophercast.yaml
You can also point to a different config file:
gophercast --config /path/to/config.yaml serve
Logs
Logs are written in structured JSON format to:
~/.gophercast/logs/YYYY/MM/DD.json
They are intentionally written to a file rather than stdout so they never corrupt the Bubble Tea TUI display. Each log entry includes the timestamp, level, message, and relevant structured fields such as client_id, event, port, and error.
To follow logs in real time while the server is running:
tail -f ~/.gophercast/logs/$(date +%Y/%m/%d).json | jq # make sure to have jq installed
Development
make build # Build binary to bin/gophercast
make run # Build and run
make serve # Build and serve ~/Music as default directory
make fmt # Format all Go code
make test # Run all tests
make test-verbose # Run tests with verbose output
make clean # Remove compiled binary
# Run a single test by name
go test -v -run TestFunctionName ./path/to/package/...
External tool dependency
The downloader tests mock yt-dlp calls via the context cancellation path. If you want to run an end-to-end download test manually, ensure yt-dlp is installed and on your PATH.
Adding a new audio sink
Implement the AudioSink interface in client/:
type AudioSink interface {
Init(sampleRate, channels int) error
Write(data []byte)
Close()
Latency() time.Duration
}
Pass your implementation to NewAudioClient and it will receive the raw PCM stream.
Known Limitations
- Mid-stream joins are not supported. If you connect after the host has started playback, the server will reject your connection. You must connect during the lobby phase.
- MP3 only. The server uses go-mp3, which decodes only MP3 files. Other formats (FLAC, AAC, OGG) are not supported without additional decoder plumbing.
- LAN only. The synchronization model relies on NTP-aligned clocks. Over the internet, clock skew and variable latency would break the wall-clock sync approach. There is no NAT traversal.
- Windows and macOS are untested. oto's Windows backend may work, but the project has not been tested there. ALSA-specific build tags are not present, so a build attempt on Windows might succeed or fail depending on the oto version.
- One server per session. There is no multi-room support. Each gophercast serve instance handles one playlist and one group of clients.
License
Licensed under the Apache License, Version 2.0.
Copyright 2025 [email protected]
