WebSockets: When Request-Response Is Not Enough
HTTP is a request-response protocol. A client sends a request, a server sends a response, the transaction ends. For most of the web, this is fine. For some use cases, it’s the wrong model entirely.
Consider a chat application. User A sends a message. User B needs to see it immediately - without refreshing the page or polling the server every few seconds. With request-response HTTP, you have two options: short polling (client asks “any new messages?” repeatedly on a timer) or long polling (client makes a request, server holds it open until there’s something to return, then the client makes another request). Both feel like workarounds because they are.
WebSockets exist to solve this problem cleanly.
How the Connection Works
A WebSocket connection starts as an HTTP request. The client sends a special Upgrade header requesting a protocol switch:
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
The server, if it supports WebSockets, responds with 101 Switching Protocols:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
From this point, the TCP connection that was carrying HTTP is repurposed. Both client and server can send messages at any time, in either direction, without the overhead of a new HTTP request for each message. The connection stays open until explicitly closed.
This is the key difference from HTTP: the connection is persistent and bidirectional. Server-initiated messages are native, not a workaround.
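The Sec-WebSocket-Accept value in the response above is not arbitrary: RFC 6455 defines it as the base64-encoded SHA-1 hash of the client's key concatenated with a fixed GUID. A minimal sketch of that derivation, where the function name computeAcceptKey is only an illustrative choice:

```javascript
// CommonJS form; in an ES module use: import { createHash } from 'node:crypto';
const { createHash } = require('node:crypto');

// Fixed GUID that RFC 6455 defines for the opening handshake.
const WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Derive the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return createHash('sha1')
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest('base64');
}

console.log(computeAcceptKey('dGhlIHNhbXBsZSBub25jZQ=='));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```

The check proves that the server understood the WebSocket handshake rather than replaying a cached HTTP response. It is not an authentication mechanism.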
The Wire Format
WebSocket messages are sent as “frames.” Each frame has a small header (2-14 bytes) followed by the payload. The header indicates:
- Whether this is the final fragment of a message (WebSocket supports message fragmentation)
- The opcode (text, binary, ping, pong, close)
- Whether the payload is masked (client-to-server messages are always masked; server-to-client messages are not)
- The payload length
Text frames carry UTF-8 encoded text. Binary frames carry arbitrary bytes. Ping and pong frames are for keep-alive - the client or server sends a ping and expects a pong back.
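The framing overhead is minimal compared to HTTP. An HTTP request includes headers that can be hundreds of bytes for every interaction. A WebSocket frame carrying a short message (up to 125 bytes of payload) adds 2 bytes of header from the server and 6 from the client, where the extra 4 bytes are the masking key.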
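To make the header fields concrete, here is a rough sketch of a decoder for a single frame. The helper name parseFrame and the shape of its return value are assumptions made for illustration. It expects one complete frame in a Buffer and does not handle partial reads or reassemble fragmented messages, which a real library must do.

```javascript
// Opcode values defined by RFC 6455.
const OPCODES = {
  0x0: 'continuation', 0x1: 'text', 0x2: 'binary',
  0x8: 'close', 0x9: 'ping', 0xa: 'pong',
};

// Hypothetical helper: decode one complete frame held in a Buffer.
function parseFrame(buf) {
  const fin = (buf[0] & 0x80) !== 0;      // final-fragment flag
  const opcode = OPCODES[buf[0] & 0x0f];  // frame type
  const masked = (buf[1] & 0x80) !== 0;   // set on client-to-server frames
  let length = buf[1] & 0x7f;             // 7-bit length, or a marker value
  let offset = 2;

  if (length === 126) {                   // 16-bit extended length follows
    length = buf.readUInt16BE(offset);
    offset += 2;
  } else if (length === 127) {            // 64-bit extended length follows
    length = Number(buf.readBigUInt64BE(offset));
    offset += 8;
  }

  let payload;
  if (masked) {
    const mask = buf.subarray(offset, offset + 4);
    offset += 4;
    // Copy the payload, then XOR each byte with the 4-byte masking key.
    payload = Buffer.from(buf.subarray(offset, offset + length));
    for (let i = 0; i < payload.length; i++) payload[i] ^= mask[i % 4];
  } else {
    payload = buf.subarray(offset, offset + length);
  }
  return { fin, opcode, masked, headerBytes: offset, payload };
}

// A masked text frame containing "Hello" (the example from RFC 6455).
const frame = parseFrame(Buffer.from(
  [0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58]));
console.log(frame.opcode, frame.headerBytes, frame.payload.toString('utf8'));
// prints "text 6 Hello"
```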
Building a WebSocket Server
Most web frameworks have WebSocket support. Here’s the pattern in Node.js with the ws library:
import { WebSocketServer, WebSocket } from 'ws';
const wss = new WebSocketServer({ port: 8080 });
wss.on('connection', (ws, request) => {
  console.log('Client connected');
  ws.on('message', (data) => {
    // A malformed payload would otherwise throw inside the handler
    let message;
    try {
      message = JSON.parse(data.toString());
    } catch {
      return; // Ignore messages that are not valid JSON
    }
    // Handle message
    // Broadcast to all connected clients.
    // Spread first so a client cannot override the type field.
    const payload = JSON.stringify({ ...message, type: 'message' });
    wss.clients.forEach((client) => {
      if (client.readyState === WebSocket.OPEN) {
        client.send(payload);
      }
    });
  });
  ws.on('close', () => {
    console.log('Client disconnected');
  });
  ws.on('error', (error) => {
    console.error('WebSocket error:', error);
  });
});
On the client:
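let retries = 0;
function connect() {
  const ws = new WebSocket('wss://example.com/chat');
  ws.addEventListener('open', () => {
    retries = 0; // Reset the backoff once a connection succeeds
    ws.send(JSON.stringify({ type: 'join', room: 'general' }));
  });
  ws.addEventListener('message', (event) => {
    const data = JSON.parse(event.data);
    displayMessage(data); // displayMessage is application code defined elsewhere
  });
  ws.addEventListener('close', (event) => {
    // Attempt reconnection with exponential backoff: 1s, 2s, 4s, ... capped at 30s
    const delay = Math.min(30000, 1000 * 2 ** retries);
    retries++;
    setTimeout(connect, delay);
  });
}
connect();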
Handling State at Scale
A single WebSocket server process can handle thousands of concurrent connections. The problem arises when you run multiple server instances: user A is connected to server 1 and user B is connected to server 2. A message from user A to user B needs to cross from server 1 to server 2.
The standard solution: a message broker between server instances. Redis pub/sub is commonly used. When server 1 receives a message, it publishes it to Redis. All server instances (including server 2) subscribe to the relevant channel and forward to their connected clients.
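// Using Redis pub/sub with ioredis
// roomId, ws, and broadcastToRoom come from the surrounding connection-handling code
import Redis from 'ioredis';
// A connection in subscriber mode cannot issue other commands, so use two
const publisher = new Redis();
const subscriber = new Redis();
// Register the handler once, outside the subscribe callback,
// so it is not added again for every subscription
subscriber.on('message', (channel, message) => {
  // Channel names have the form "room:<roomId>"
  const targetRoom = channel.slice('room:'.length);
  // Broadcast to all clients on this server connected to this room
  broadcastToRoom(targetRoom, message);
});
// Subscribe to room channel
subscriber.subscribe(`room:${roomId}`, (err, count) => {
  if (err) console.error('Subscribe failed:', err);
});
// When receiving a message from a connected client
ws.on('message', (data) => {
  publisher.publish(`room:${roomId}`, data.toString());
});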
Socket.IO, a popular real-time library for Node.js that uses WebSockets as its main transport, has adapters for Redis and other brokers that handle this automatically.
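The Redis example calls broadcastToRoom without defining it. One possible shape for that per-server bookkeeping is sketched below. The rooms map and the joinRoom and leaveRoom helpers are assumptions made for illustration, and a socket is taken to be any object with a readyState and a send method.

```javascript
// roomId -> Set of sockets connected to THIS server process.
const rooms = new Map();

const OPEN = 1; // readyState value of an open WebSocket

// Hypothetical helper: call when a client joins a room.
function joinRoom(roomId, socket) {
  if (!rooms.has(roomId)) rooms.set(roomId, new Set());
  rooms.get(roomId).add(socket);
}

// Hypothetical helper: call on leave or when the socket closes.
function leaveRoom(roomId, socket) {
  const members = rooms.get(roomId);
  if (!members) return;
  members.delete(socket);
  if (members.size === 0) rooms.delete(roomId); // avoid leaking empty rooms
}

// Send a message to local members only; returns how many sockets received it.
function broadcastToRoom(roomId, message) {
  let delivered = 0;
  for (const socket of rooms.get(roomId) ?? []) {
    if (socket.readyState === OPEN) {
      socket.send(message);
      delivered++;
    }
  }
  return delivered;
}
```

Each server only knows about its own sockets. The broker is what carries a message to the other servers, which then run the same local broadcast.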
Authentication
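WebSocket connections start as HTTP requests, so cookie-based authentication works naturally - cookies are sent with the upgrade request. Token-based authentication (JWTs) is less direct: the browser WebSocket API does not let you set custom request headers, so the token typically travels in the connection URL or in the first message after connecting. Prefer the first message where you can, since URLs tend to end up in server and proxy logs: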
// Send auth token in first message after connection
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({ type: 'auth', token: getAuthToken() }));
});
On the server, don’t trust unauthenticated connections with any data. Validate authentication early and close the connection if it fails.
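On the server side, the first-message check might look something like the sketch below. The function checkAuthMessage and the verifyToken callback are hypothetical names. Token verification itself, such as checking a JWT signature, is left to whatever library the application already uses. Close code 1008 is the "policy violation" code from RFC 6455.

```javascript
const POLICY_VIOLATION = 1008; // close code defined by RFC 6455

// Decide what to do with the first message on a new connection.
// `verifyToken` is a hypothetical callback returning a user object or null.
function checkAuthMessage(raw, verifyToken) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch {
    return { ok: false, code: POLICY_VIOLATION, reason: 'malformed auth message' };
  }
  if (message === null || message.type !== 'auth' || typeof message.token !== 'string') {
    return { ok: false, code: POLICY_VIOLATION, reason: 'auth required' };
  }
  const user = verifyToken(message.token);
  return user
    ? { ok: true, user }
    : { ok: false, code: POLICY_VIOLATION, reason: 'invalid token' };
}
```

With the ws library, a failed result would typically be followed by ws.close(result.code, result.reason). It is also worth setting a timer that closes connections that never send an auth message at all.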
When to Use WebSockets
WebSockets are the right choice when:
- The server needs to push data to the client without a client request (live notifications, real-time scores, collaborative cursors)
- High-frequency bidirectional messaging (chat, multiplayer game state, collaborative editing)
- Latency is critical and HTTP overhead per message is significant
WebSockets are overkill when:
- You need occasional server-to-client updates (use Server-Sent Events - simpler, HTTP-based, one-direction)
- Updates are infrequent (polling every 30 seconds is simpler to implement and operate)
- The data is inherently request-response (use HTTP, that’s what it’s for)
Server-Sent Events (SSE) deserve a mention: they are an HTTP-based mechanism for server-to-client push, simpler than WebSockets and supported natively in browsers. If you only need server-to-client messages (dashboards, live feeds, notifications), SSE is often the better choice.
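Part of what makes SSE simpler is its wire format: plain text lines in an ordinary HTTP response served with the text/event-stream content type. As a rough illustration, a server could format each event like this (formatSseEvent is a hypothetical helper name):

```javascript
// Format one Server-Sent Event. Each field is a "name: value" line,
// and a blank line marks the end of the event.
function formatSseEvent({ event, id, data }) {
  const lines = [];
  if (event) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // Multi-line data needs one "data:" line per line of text.
  for (const line of String(data).split('\n')) lines.push(`data: ${line}`);
  return lines.join('\n') + '\n\n';
}

console.log(JSON.stringify(formatSseEvent({ event: 'score', data: '{"home":2}' })));
// prints "event: score\ndata: {\"home\":2}\n\n"
```

In the browser, new EventSource(url) consumes such a stream and reconnects on its own after a drop.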
Reliability Concerns
WebSocket connections drop. Networks hiccup, load balancers time out idle connections, mobile devices switch from WiFi to cellular. Every WebSocket client needs reconnection logic with exponential backoff.
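A minimal sketch of the delay calculation, assuming a 1 second base and a 30 second cap (both arbitrary defaults). It adds random jitter so that clients that were disconnected at the same moment do not all reconnect at the same moment. The random parameter exists only to make the function testable.

```javascript
// Delay in milliseconds before reconnect attempt `attempt` (0-based).
// Uses "full jitter": a random value between 0 and the capped exponential delay.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Ceilings grow 1s, 2s, 4s, 8s, 16s, then stay at the 30s cap.
for (let attempt = 0; attempt < 6; attempt++) {
  console.log(attempt, backoffDelay(attempt));
}
```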
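Load balancers and reverse proxies need to be configured for WebSocket traffic: they must pass the Upgrade handshake through and use idle timeouts longer than your heartbeat interval. An established WebSocket stays on the backend that accepted it, so sticky sessions (affinity) matter mainly when a library falls back to HTTP long polling, as Socket.IO can, or when reconnecting clients must return to a server that holds their state. Standard HTTP load balancing settings can close WebSocket connections unexpectedly.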
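Heartbeats (periodic ping/pong messages) keep connections alive through firewalls and proxies that close idle TCP connections, and they let each side detect a peer that has silently gone away. Browsers answer pings automatically but cannot send protocol-level pings from JavaScript, so the server usually drives the heartbeat. Most WebSocket server libraries support this, but it needs to be configured.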
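One common way to implement a server-side heartbeat is a periodic sweep, sketched below. It assumes each socket exposes ping() and terminate(), as sockets from the ws library do, plus an isAlive flag that the application maintains itself. The function name heartbeatSweep and the 30 second interval in the comments are illustrative choices.

```javascript
// Run one heartbeat round over a collection of sockets.
// Returns the number of sockets that were terminated.
function heartbeatSweep(clients) {
  let terminated = 0;
  for (const socket of clients) {
    if (socket.isAlive === false) {
      socket.terminate();     // no pong since the previous round: drop it
      terminated++;
      continue;
    }
    socket.isAlive = false;   // flipped back to true by the pong handler
    socket.ping();
  }
  return terminated;
}

// Wiring with the ws library would look roughly like this:
//   wss.on('connection', (ws) => {
//     ws.isAlive = true;
//     ws.on('pong', () => { ws.isAlive = true; });
//   });
//   setInterval(() => heartbeatSweep(wss.clients), 30000);
```

terminate() is used instead of close() because a dead peer will never complete the closing handshake.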
The operational complexity of WebSockets is real: you have connection state on servers, reconnection handling on clients, and message ordering to think about (TCP gives you ordering per connection, but what happens across reconnects?). These are solvable problems, but they’re problems you don’t have with request-response HTTP.
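One way to approach the ordering question across reconnects is to number messages on the server and let a reconnecting client say which number it saw last. The sketch below is only one possible design. The ReplayBuffer class, its capacity default, and the convention of returning null when a full resync is needed are all assumptions, and it keeps state in a single process.

```javascript
// Keeps the most recent messages so a reconnecting client can catch up.
class ReplayBuffer {
  constructor(capacity = 100) {
    this.capacity = capacity;
    this.nextSeq = 1;
    this.entries = []; // oldest first
  }

  // Assign the next sequence number and remember the message.
  append(payload) {
    const entry = { seq: this.nextSeq++, payload };
    this.entries.push(entry);
    if (this.entries.length > this.capacity) this.entries.shift();
    return entry;
  }

  // Messages after `lastSeq`, or null if some were already evicted,
  // in which case the client has to reload its state from scratch.
  since(lastSeq) {
    const oldest = this.entries.length ? this.entries[0].seq : this.nextSeq;
    if (lastSeq + 1 < oldest) return null;
    return this.entries.filter((entry) => entry.seq > lastSeq);
  }
}

const buffer = new ReplayBuffer(2);
buffer.append('a');
buffer.append('b');
buffer.append('c'); // evicts 'a'
console.log(buffer.since(1)); // the entries for 'b' and 'c'
console.log(buffer.since(0)); // null: message 1 is gone, resync needed
```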