Logging That's Actually Useful: Structured Logs and Log Levels
It’s 2am. Something is broken in production. The alert fired, the on-call engineer is awake, and the first thing they’re going to do is check the logs.
If those logs are a wall of concatenated strings with no consistent format, no context about what was happening, and no clear distinction between “this is informational” and “this is why the thing broke”, the logs are not helping. They’re just a very expensive way to store confusion.
Good logging is an operational discipline. It requires thinking about what information you’ll need when you’re tired, stressed, and can’t reproduce the issue locally.
The problem with print-style logs
Most logging starts like this:
print(f"Processing order {order_id} for user {user_id}")
print(f"Payment failed: {error_message}")
print(f"Order {order_id} complete")
Or the slightly more sophisticated version:
logger.info("Processing order " + str(order_id) + " for user " + str(user_id))
logger.error("Payment failed: " + error_message)
These logs are readable by a human looking at one request in isolation. They are nearly useless at scale.
If you want to find all failed payments for a specific user in the last hour, you’re running a regex over a text file. If you want to know the average time between “processing” and “complete” for orders, you’re parsing strings. If your log aggregator wants to alert on elevated error rates, it needs to parse the severity out of the log text.
Structured logging solves this by making the log a data record instead of a string.
What structured logging looks like
Instead of building a string, you log a structured object. The logger serializes it to JSON (or another structured format):
{
"timestamp": "2026-04-18T14:23:01.234Z",
"level": "INFO",
"message": "Processing order",
"order_id": "ord_8472",
"user_id": "usr_129",
"service": "payment-service",
"trace_id": "4bf92f3577b34da6"
}
Now every field is queryable. Your log aggregation platform - whether that’s Datadog, Elasticsearch, Loki, or CloudWatch - can index on user_id, filter on level, group by service, and join with other records by trace_id. The difference in operational capability is significant.
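To make "queryable" concrete, here is a minimal sketch of the kind of filter an aggregation platform runs for you, written against newline-delimited JSON. The sample records and the `find_records` helper are hypothetical and exist only for illustration; a real platform would use its own query language and indexes.

```python
import json

# Hypothetical sample of newline-delimited JSON log records.
RAW_LOGS = """\
{"timestamp": "2026-04-18T14:23:01.234Z", "level": "INFO", "message": "Processing order", "order_id": "ord_8472", "user_id": "usr_129"}
{"timestamp": "2026-04-18T14:23:02.101Z", "level": "ERROR", "message": "Payment failed", "order_id": "ord_8472", "user_id": "usr_129"}
{"timestamp": "2026-04-18T14:23:02.500Z", "level": "ERROR", "message": "Payment failed", "order_id": "ord_9001", "user_id": "usr_555"}
"""


def find_records(raw, **criteria):
    """Return parsed records whose fields equal every given criterion."""
    records = (json.loads(line) for line in raw.splitlines() if line.strip())
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# "All failed payments for a specific user" becomes a field comparison, not a regex.
failed_for_user = find_records(RAW_LOGS, level="ERROR", user_id="usr_129")
```

The point is that the question is expressed in terms of fields (`level`, `user_id`) rather than in terms of how a message string happened to be worded.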
In Python, using structlog or the standard logging library with a JSON formatter:
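import structlog
# Output is JSON once structlog.processors.JSONRenderer() is configured;
# the default configuration renders console-style key=value pairs.
log = structlog.get_logger()
log.info("processing_order", order_id="ord_8472", user_id="usr_129")
log.error("payment_failed", order_id="ord_8472", error_code="insufficient_funds", amount=99.99)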
In Node.js, pino or winston with JSON output:
const pino = require('pino');
const logger = pino({ level: 'info' });
logger.info({ orderId: 'ord_8472', userId: 'usr_129' }, 'Processing order');
logger.error({ orderId: 'ord_8472', errorCode: 'insufficient_funds', amount: 99.99 }, 'Payment failed');
The message is still a human-readable string, but the context is structured data attached to it.
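For the standard-library route mentioned above, a JSON formatter can be sketched in a few lines. This is one possible shape, not the only one: the `JsonFormatter` class, the `build_logger` helper, and the convention of passing fields under `extra={"context": {...}}` are all assumptions made for this example.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as one JSON object per line."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "timestamp": created.isoformat(timespec="milliseconds"),
            "level": record.levelname,
            "message": record.getMessage(),
            "service": self.service,
        }
        # Fields passed as extra={"context": {...}} are merged into the top level.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


def build_logger(name, stream, service="payment-service"):
    """Attach a JSON-formatting handler to a named logger (illustrative helper)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


stream = io.StringIO()  # stands in for stdout so the output can be inspected
logger = build_logger("orders", stream)
logger.info(
    "Processing order",
    extra={"context": {"order_id": "ord_8472", "user_id": "usr_129"}},
)
record = json.loads(stream.getvalue())
```

In a real service the handler would write to stdout or a file instead of a `StringIO`; the in-memory stream is used here only so the emitted line can be parsed back and checked.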
Log levels: what they actually mean
Log levels exist to communicate severity and to let you filter noise. The standard set is:
DEBUG - Detailed information for diagnosing problems during development. Should never be enabled in production by default. “Entering function X with params Y” is DEBUG. If you’re logging every database query, that’s DEBUG.
INFO - Normal operational events. Things that are supposed to happen: request received, order placed, user logged in. The story of your application operating normally. INFO logs should read like a transaction log - meaningful, not noisy.
WARN - Something unexpected happened, but the application recovered and the request succeeded. A retry succeeded after one failure. A value was missing and a default was used. Something that should be investigated if it happens frequently, but isn’t an error in isolation.
ERROR - Something failed and the application could not recover from it for this request. A payment failed. A database write failed. An external API returned 500. This is what alerts should be watching.
FATAL/CRITICAL - The application cannot continue. A required configuration is missing. The database connection pool is exhausted. These usually precede a crash or shutdown.
A quick test for each level: if an ERROR pages you at 2am, does every line logged at that level deserve the page? If you grep for WARN and find hundreds of lines about expected edge cases, those should be DEBUG. If you search for INFO and can’t find the important events because they’re buried in noise, some DEBUG is being logged at INFO.
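The filtering role of levels can be seen in a small sketch using Python's standard `logging` module, which names the WARN and FATAL levels `WARNING` and `CRITICAL`. The `emit_all_levels` helper and the messages are hypothetical; the only thing that changes between the two calls is the threshold.

```python
import io
import logging


def emit_all_levels(threshold):
    """Log one line at each level and return the level names that got through."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))
    logger = logging.getLogger(f"demo.{threshold}")
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(threshold)

    logger.debug("entering charge_card with amount=99.99")
    logger.info("order placed")
    logger.warning("payment gateway retry succeeded after 1 failure")
    logger.error("payment failed")
    logger.critical("required configuration missing, shutting down")
    return stream.getvalue().split()


production = emit_all_levels(logging.INFO)    # DEBUG is dropped
development = emit_all_levels(logging.DEBUG)  # everything is kept
```

With the threshold at INFO, the DEBUG line never reaches the handler, which is why mislabeling noisy lines as INFO defeats the filter.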
What context to include
Structured logs are only as useful as the fields you put in them. Some fields belong in every log line:
Timestamp - With millisecond precision. UTC, not local time.
Level - The severity as a string or normalized value.
Service - The name of the service emitting the log. Critical in a distributed system where logs from multiple services go to the same aggregator.
Request ID / Trace ID - A unique identifier for the request or operation. This is what lets you find all the log lines for one specific request across multiple services. If you don’t have this, correlating logs across a failure is painful.
User ID or session ID - When it’s available and relevant, attach identity. “Payment failed” is much less useful than “Payment failed for user usr_129”.
Correlation IDs from upstream systems - If your service is called by another service, propagate their trace ID. This enables distributed tracing.
For errors specifically, include the full stack trace and any context that helps reproduce the issue: input values, the operation being performed, the error code from the external system.
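As a sketch of what an error record with a stack trace and reproduction context might look like, the example below uses the standard library's `logger.exception`, which logs at ERROR and attaches the active exception. The `ErrorJsonFormatter` class, the `charge` function, and the field names are assumptions made for illustration.

```python
import io
import json
import logging
import traceback


class ErrorJsonFormatter(logging.Formatter):
    """JSON formatter that adds the exception type and stack trace when present."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        payload.update(getattr(record, "context", {}))
        if record.exc_info:
            payload["error_type"] = record.exc_info[0].__name__
            payload["stack_trace"] = "".join(
                traceback.format_exception(*record.exc_info)
            )
        return json.dumps(payload)


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(ErrorJsonFormatter())
logger = logging.getLogger("payments.errors")
logger.handlers = [handler]
logger.propagate = False


def charge(order_id, amount):
    # Hypothetical stand-in for a call to an external payment provider.
    raise TimeoutError("gateway did not respond")


try:
    charge("ord_8472", 99.99)
except TimeoutError:
    # Record what was being attempted and with which inputs.
    # A real handler would also fail the request or re-raise here.
    logger.exception(
        "payment_failed",
        extra={"context": {"order_id": "ord_8472", "amount": 99.99, "operation": "charge"}},
    )

entry = json.loads(stream.getvalue())
```

The resulting record carries the operation, its inputs, the exception type, and the full trace in one queryable object.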
What to avoid
Don’t log sensitive data. Passwords, tokens, credit card numbers, and personal information have no place in logs. This is both a security concern and often a compliance requirement. Review your log fields carefully.
Don’t log inside tight loops without thought. INFO-level logs inside a loop that executes 10,000 times per request are not INFO. They’re a throughput problem.
Don’t use logging as a substitute for proper error handling. Catching an exception, logging it, and continuing as if nothing happened hides bugs. Log errors where they’re meaningful, but also handle them properly.
Don’t write logs that only make sense to the person who wrote the code. “Got here” is not a log message. “Null value encountered” without indicating which field or what was being processed is not useful. Write for the engineer who will read this log at 2am with no context.
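For the first point, one common safeguard is to mask known-sensitive fields before they reach the logger. The sketch below is a deliberately simple, key-name-based approach; the `SENSITIVE_KEYS` set and the `redact` helper are hypothetical, and a denylist like this only catches fields whose names you anticipated, so it complements a review of log fields rather than replacing it.

```python
# Hypothetical denylist of field names whose values must never be logged.
SENSITIVE_KEYS = {"password", "token", "card_number", "authorization"}


def redact(fields, sensitive=SENSITIVE_KEYS):
    """Return a copy of fields with sensitive values masked, recursing into nested dicts."""
    cleaned = {}
    for key, value in fields.items():
        if key.lower() in sensitive:
            cleaned[key] = "[REDACTED]"
        elif isinstance(value, dict):
            cleaned[key] = redact(value, sensitive)
        else:
            cleaned[key] = value
    return cleaned


safe = redact(
    {
        "user_id": "usr_129",
        "password": "hunter2",
        "payment": {"card_number": "4111111111111111", "amount": 99.99},
    }
)
```

Applying this at a single choke point, such as a formatter or a logging processor, is usually more reliable than remembering to call it at every log site.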
Setting up log context
One pattern that significantly reduces repetition is binding context to the logger so you don’t have to pass it with every call. Most structured logging libraries support this:
# Bind request context at the start of a request handler
log = log.bind(request_id=request.id, user_id=request.user.id, endpoint=request.path)
# Every subsequent log call automatically includes the bound context
log.info("processing_request")
log.info("fetching_user_data")
log.error("database_timeout", table="orders", timeout_ms=5000)
In a web framework, this typically lives in middleware - bind the request ID when the request comes in, and every log call made through that bound logger (or through a context-local equivalent) carries it automatically.
The operational payoff
Good logs let you answer questions you didn’t know you’d need to ask. Not just “what went wrong” but “how often does this happen”, “which users are affected”, “did this start after the last deploy”, “is this correlated with a specific endpoint”.
Structured, well-leveled logs with consistent context are the minimum viable foundation for understanding what your application is doing. Everything else in observability - metrics, tracing, alerts - builds on top of having legible records of what happened.
The engineer who writes the logs and the engineer who reads them in a crisis are often the same person. Write for that person.