Docker from Scratch: Images, Containers, and Registries


Most engineers use Docker without really understanding what it is. They copy a Dockerfile from a tutorial, run docker build and docker run, and it works. When it doesn’t, they paste the error into a search engine and try things until it does.

This article is about understanding what’s actually happening - so you can debug it, optimize it, and make good decisions about how you use it.

The Problem Docker Solves

Before Docker, deploying an application meant getting a server into a state where your code would run: the right OS, the right language runtime version, the right system libraries, the right environment variables. This state was usually documented in a README, partially automated with shell scripts, and routinely wrong on any machine other than the one it was written on.

The classic move was to provision a server, SSH in, and manually install things until the app started. Then hope nobody else touched the server. This is sometimes called “snowflake infrastructure” - every server is unique, fragile, and impossible to recreate exactly.

Docker solves this by making the environment part of the artifact. Instead of deploying code and hoping the server is configured right, you deploy a container image - a complete package containing your code and everything it needs to run.

What an Image Actually Is

A Docker image is not a virtual machine. It’s not a snapshot of a running OS. It’s a stack of filesystem layers.

When you write a Dockerfile:

FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["node", "server.js"]

Each instruction that modifies the filesystem creates a new layer. The FROM instruction starts from an existing base image (which is itself a stack of layers). RUN npm ci adds a layer containing node_modules. COPY . . adds a layer containing your source code.

You can see this with:

docker history my-app:latest

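Each line is a build step. Instructions that change the filesystem (RUN, COPY, ADD) produce layers; metadata-only instructions such as CMD show up as 0B entries. Layers are content-addressed - they’re identified by a hash of their contents. If two images share a common base, those layers are stored once on disk and referenced by both.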
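To make content addressing concrete, here is a minimal Python sketch of a layer store. It is a toy model, not Docker’s storage driver: LayerStore, layer_digest, and the byte strings standing in for layer tarballs are all invented for illustration.

```python
import hashlib


def layer_digest(content: bytes) -> str:
    """Identify a layer by the SHA-256 hash of its bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


class LayerStore:
    """Toy layer store (hypothetical): one entry per unique digest."""

    def __init__(self):
        self.blobs = {}  # digest -> layer bytes

    def add_image(self, layers):
        """Store each layer once and return the image's list of digests."""
        digests = []
        for content in layers:
            digest = layer_digest(content)
            self.blobs.setdefault(digest, content)  # no-op if already stored
            digests.append(digest)
        return digests


store = LayerStore()
base = b"alpine root filesystem"  # stand-in for a real layer tarball
app_a = store.add_image([base, b"service A files"])
app_b = store.add_image([base, b"service B files"])

# Both images reference the same base digest; the store holds three blobs, not four.
print(app_a[0] == app_b[0], len(store.blobs))
```

Because the key is derived from the bytes themselves, identical layers collapse into one entry without any extra bookkeeping.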

This is why layer order matters in a Dockerfile. Docker caches each layer. If a layer hasn’t changed, Docker reuses the cached version. If layer N changes, all layers after N are invalidated and rebuilt.

This is why the Dockerfile above runs COPY package*.json and npm ci before COPY . . (the source copy). Your source code changes on nearly every build, but your dependencies change far less often. By installing dependencies before copying source, you get a cache hit on npm ci whenever package.json and package-lock.json are unchanged.

Reverse the order and every build reinstalls all dependencies from scratch.
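The invalidation rule can be modelled in a few lines of Python. This is a simplified sketch, assuming each step’s cache key is a hash of its parent’s key, its instruction text, and a fingerprint of the files it reads. Docker’s real cache keys are more involved, and the function names and fingerprints here are hypothetical.

```python
import hashlib


def cache_keys(steps):
    """Return one cache key per step; each key depends on every step before it.

    `steps` is a list of (instruction, input_fingerprint) pairs. The fingerprint
    stands in for a checksum of the files the instruction reads.
    """
    keys, parent = [], ""
    for instruction, fingerprint in steps:
        material = f"{parent}|{instruction}|{fingerprint}".encode()
        parent = hashlib.sha256(material).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(old_steps, new_steps):
    """Instructions whose cache key differs between two builds."""
    old, new = cache_keys(old_steps), cache_keys(new_steps)
    return [step[0] for step, a, b in zip(new_steps, old, new) if a != b]


# Dependencies first: a source edit only invalidates the final COPY.
good_before = [("COPY package*.json ./", "deps-v1"), ("RUN npm ci", ""), ("COPY . .", "src-v1")]
good_after = [("COPY package*.json ./", "deps-v1"), ("RUN npm ci", ""), ("COPY . .", "src-v2")]

# Source first: the same edit invalidates the install step as well.
bad_before = [("COPY . .", "src-v1"), ("RUN npm ci", "")]
bad_after = [("COPY . .", "src-v2"), ("RUN npm ci", "")]

print(rebuilt_steps(good_before, good_after))
print(rebuilt_steps(bad_before, bad_after))
```

Chaining the parent key into each step is what makes a change at step N ripple through every later step.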

Containers: Images in Motion

An image is static - it’s a description of a filesystem state. A container is what you get when you run an image.

When Docker starts a container, it creates a thin writable layer on top of the image’s read-only layers. All changes the container makes go into this writable layer. The underlying image is never modified.

This is why you can run fifty containers from the same image simultaneously without them interfering with each other - they all share the same read-only layers but each has its own writable layer.
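The shared read-only layers plus a private writable layer behave much like Python’s collections.ChainMap, where lookups fall through a stack of dicts and writes always land in the first one. The sketch below is an analogy only: the paths and contents are made up, and real union filesystems also handle deletions, which this model ignores.

```python
from collections import ChainMap

# Read-only image layers, top layer first: path -> file contents (illustrative).
image_layers = [
    {"/app/server.js": "console.log('hi')"},  # source layer
    {"/etc/os-release": "Alpine"},            # base image layer
]


def start_container(layers):
    """Stack a fresh, empty writable dict on top of the shared image layers."""
    return ChainMap({}, *layers)


c1 = start_container(image_layers)
c2 = start_container(image_layers)

c1["/tmp/cache"] = "written by c1"        # lands in c1's writable layer only
c1["/etc/os-release"] = "modified in c1"  # shadows the image file for c1 only

print(c1["/etc/os-release"])  # c1 sees its own copy
print(c2["/etc/os-release"])  # c2 still sees the image's file
print("/tmp/cache" in c2)     # c1's write is invisible to c2
```

Dropping c1 discards only its first dict; the dicts in image_layers are untouched, which mirrors what happens to the image when a container is removed.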

When a container is removed (docker rm), the writable layer is deleted with it. Any data written inside the container is gone. Stopping and restarting a container keeps its writable layer; removing it does not. This is intentional - containers are meant to be ephemeral. If you need data to outlive the container, mount storage from outside it:

docker run -v /host/path:/container/path my-app

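Strictly speaking, a host path on the left of -v creates a bind mount; a name instead of a path (for example -v app-data:/container/path) creates a named volume that Docker manages. Either way, the directory lives outside the container’s filesystem and gets mounted in. It survives container removal and can be shared between containers.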

What Actually Runs in a Container

Docker containers use Linux kernel features - specifically namespaces and cgroups - to isolate processes.

Namespaces give each container its own view of certain system resources. A container gets its own network namespace (its own network interfaces, its own IP address), its own PID namespace (processes inside the container see their own process tree starting at PID 1), its own mount namespace (mount points are isolated), and others.
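On Linux, the namespaces a process belongs to are visible as symlinks under /proc/<pid>/ns. The Python sketch below lists them. The helper name current_namespaces is made up, and it returns an empty dict where /proc is unavailable (macOS or Windows, for example).

```python
import os


def current_namespaces(pid="self"):
    """Map namespace type -> identifier for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return {}  # not Linux, or /proc is not mounted
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return result


for name, ident in current_namespaces().items():
    print(f"{name:20} {ident}")
```

Run on the host and again inside a container, the identifiers for entries such as pid, net, and mnt should differ, while two processes in the same container share them.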

Cgroups (control groups) limit how much of the host’s resources a container can use: CPU, memory, block I/O, and the number of processes.

This is different from a virtual machine. A VM virtualizes hardware and runs its own full OS kernel. A container runs on the host kernel directly, using namespaces and cgroups to isolate it from other processes. This is why containers typically start in a fraction of a second and add very little overhead compared to VMs - there’s no second kernel to boot.

It also means containers are only as isolated as the kernel allows. A container escape is a real attack vector: code running inside a container exploits a kernel vulnerability to break out onto the host. This is why running your application as root inside the container is a bad idea:

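# Don't do this - with no USER instruction, the process runs as root
CMD ["node", "server.js"]

# Do this - create an unprivileged user and switch to it
# (addgroup -S / adduser -S is the Alpine/BusyBox syntax)
RUN addgroup -S app && adduser -S app -G app
USER app
CMD ["node", "server.js"]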

Registries

An image lives on your local machine after you build it. To use it somewhere else - a staging server, a production cluster, a colleague’s laptop - you need to push it to a registry.

A registry is a content-addressed store for Docker images. Docker Hub is the default public one. ECR (Amazon), Artifact Registry (Google, the successor to GCR), and ACR (Azure) are common private options. You can also run your own.

The push/pull protocol is efficient because of layers. When you push an image, Docker checks which layers already exist in the registry and only uploads the ones that are new. Pull works the same way - only layers you don’t have locally are downloaded.
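Conceptually, a push is a set difference over layer digests. The Python sketch below models only that idea. The digests are placeholders, layers_to_upload is a made-up helper, and a real client asks the registry about each layer over HTTP rather than consulting a local set.

```python
def layers_to_upload(image_layers, registry_layers):
    """Return the digests the registry is missing, in image order."""
    return [digest for digest in image_layers if digest not in registry_layers]


# Placeholder digests, not real ones.
registry = {"sha256:base", "sha256:deps"}              # already in the registry
image = ["sha256:base", "sha256:deps", "sha256:src"]   # layers of the local image

first_push = layers_to_upload(image, registry)
registry.update(first_push)                            # upload completes
second_push = layers_to_upload(image, registry)

print(first_push)   # only the source layer needs to travel
print(second_push)  # pushing again uploads nothing
```

Pull is the same calculation with the roles swapped: the local layer store plays the part of the registry set.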

Image names encode the registry location:

registry.example.com/team/app-name:tag
^                    ^    ^         ^
registry host        org  name      version

nginx:latest is shorthand for docker.io/library/nginx:latest. The registry host defaults to Docker Hub, the org to library (official images), and the tag to latest.
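The defaulting rules can be sketched as a small Python function. This is a simplified approximation, not Docker’s actual reference parser: it assumes the first path component is a registry host only when it contains a dot or a colon or is exactly localhost, and the name normalize_image_ref is invented for this example.

```python
def normalize_image_ref(ref: str) -> str:
    """Expand a short image name using the Docker Hub defaults (simplified)."""
    # Split off a digest (@sha256:...) first, then a tag from the last segment.
    name, _, digest = ref.partition("@")
    tag = ""
    if ":" in name.rsplit("/", 1)[-1]:
        name, tag = name.rsplit(":", 1)

    # The first component counts as a registry host only if it looks like one.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        host, path = first, rest
    else:
        host, path = "docker.io", name

    # Official images on Docker Hub live under the "library" org.
    if host == "docker.io" and "/" not in path:
        path = "library/" + path

    if digest:
        suffix = (":" + tag if tag else "") + "@" + digest
    else:
        suffix = ":" + (tag or "latest")
    return f"{host}/{path}{suffix}"


for short in ["nginx", "nginx:latest", "team/app", "localhost:5000/app",
              "registry.example.com/team/app-name:tag"]:
    print(f"{short:45} {normalize_image_ref(short)}")
```

The localhost:5000/app case shows why the tag has to be taken from the last path segment only: a colon in the first component is a port, not a tag.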

latest is a tag like any other - it’s not automatically the newest version. It points to whatever image the maintainer most recently tagged as latest. Relying on latest in production makes your deployments non-reproducible: the same deploy can pull a different image tomorrow. Pin to a specific tag, ideally a digest:

node:20.11.1-alpine
node@sha256:4a92b3c3...  # immutable - content-addressed

Multi-Stage Builds

One common mistake is shipping your build environment as your final image. A Node.js app built on the full node:20 image carries compilers, build tools, and dev dependencies even though production only needs the runtime and the built output.

Multi-stage builds let you separate the build environment from the runtime environment:

# Stage 1: build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime
FROM node:20-alpine AS runtime
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]

The final image contains only what’s needed to run. The builder stage - with all its build tools, dev dependencies, and intermediate artifacts - is left out of it. The result is a smaller image that is faster to push and pull and has a smaller attack surface.

The Mental Model

Think of Docker as three parts:

Images are blueprints. Immutable, layered, content-addressed. They live in registries.

Containers are instances. Ephemeral, isolated processes running from an image. They share the host kernel, isolated by namespaces and cgroups.

Registries are distribution. How images get from where they’re built to where they run.

Most Docker confusion comes from conflating these - treating containers as persistent (they’re not), treating latest as a version (it’s a pointer), treating images as VMs (they’re layered filesystems).

Once you see the model clearly, the behavior stops being surprising.


