Git Internals: What Actually Happens When You Commit


Most developers use Git by memorizing commands. git commit, git branch, git merge, git rebase - you learn what they do, not what they are. This works fine until something goes wrong, at which point you’re guessing at commands and hoping for the best.

The thing that makes Git behavior predictable is understanding its data model. Once you see what a commit actually is, branches stop being mysterious, rebasing makes sense, and the “detached HEAD” warning becomes obvious rather than alarming.

Everything Is an Object

Git stores all its data as objects in .git/objects/. There are four types: blobs, trees, commits, and tags. The first three are what matter for understanding commits.

A blob stores file contents. Just the contents - no filename, no permissions, just bytes. Every version of every file you’ve ever committed is stored as a blob.

A tree represents a directory. It maps filenames to blobs (for files) or other trees (for subdirectories), along with permissions.

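A commit points to a tree (representing your project’s root directory at that point in time), references zero or more parent commits (none for the very first commit, one for an ordinary commit, two or more for a merge), and stores the author, the committer, their timestamps, and your commit message.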

Every object is identified by its SHA-1 hash - a 40-character hex string computed from a short header (the object’s type and size) followed by its contents. This is why Git is content-addressed: the same content always produces the same hash.
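To make the hashing concrete, here is a minimal Python sketch of how an object ID can be computed for a SHA-1 repository. The function name git_object_id is made up for this illustration; the header layout (type, a space, the size in bytes, a NUL byte) is the part that matters.

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0" + body, as Git does in SHA-1 repositories."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# Identical content always yields the identical ID, whatever the filename.
first = git_object_id("blob", b"hello world\n")
second = git_object_id("blob", b"hello world\n")
print(first == second)   # True
print(len(first))        # 40

# Changing a single byte produces an unrelated ID.
print(first == git_object_id("blob", b"hello world!\n"))  # False
```

If you have Git at hand, piping the same bytes into git hash-object --stdin should print the same ID as this function returns for a blob.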

Inspecting Real Objects

You can examine any object with git cat-file:

# Look at the most recent commit
git cat-file -p HEAD

Output:

tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 9fceb02d0ae598e95dc970b74767f19372d61af8
author Milan Maksimovic <milan@example.com> 1713398400 +0200
committer Milan Maksimovic <milan@example.com> 1713398400 +0200

Add user authentication endpoint

Now look at the tree it points to:

git cat-file -p 4b825dc642cb6eb9a060e54bf8d69288fbee4904

Output:

100644 blob a8c3f8e2... README.md
100644 blob b1f2c3d4... package.json
040000 tree d3e8f4a1... src

And follow the src tree:

git cat-file -p d3e8f4a1...

Output:

100644 blob c9d2e1f0... auth.js
100644 blob f8a7b6c5... index.js

This is the complete structure: a commit points to a tree, which points to blobs and other trees, all the way down to individual file contents.
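The same commit-to-tree-to-blob walk can be sketched in Python against an in-memory object store. This is a simplified model, not Git's implementation: the store dictionary stands in for .git/objects/, the helper names (put, put_tree, put_commit, list_files) and the author identity are hypothetical, and the file contents are placeholders.

```python
import hashlib

store = {}  # object id -> (type, body); stand-in for .git/objects/

def put(obj_type, body):
    """Store an object under the SHA-1 of its header plus body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = (obj_type, body)
    return oid

def put_tree(entries):
    """entries: (mode, name, oid) tuples. Directories sort as if named 'name/'."""
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode()
    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
    return put("tree", body)

def put_commit(tree, parents, message):
    who = "Example Author <author@example.com> 0 +0000"  # hypothetical identity
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return put("commit", ("\n".join(lines) + "\n").encode())

def read_tree(oid):
    """Parse a tree body back into (mode, name, oid) tuples."""
    obj_type, body = store[oid]
    assert obj_type == "tree"
    entries, i = [], 0
    while i < len(body):
        space = body.index(b" ", i)
        nul = body.index(b"\x00", space)
        mode, name = body[i:space].decode(), body[space + 1:nul].decode()
        entries.append((mode, name, body[nul + 1:nul + 21].hex()))
        i = nul + 21  # skip the 20 raw hash bytes
    return entries

def list_files(tree_oid, prefix=""):
    """Walk a tree recursively, yielding (path, blob id) pairs."""
    for mode, name, oid in read_tree(tree_oid):
        if mode == "40000":
            yield from list_files(oid, prefix + name + "/")
        else:
            yield prefix + name, oid

src = put_tree([("100644", "auth.js", put("blob", b"// auth\n")),
                ("100644", "index.js", put("blob", b"// index\n"))])
root = put_tree([("100644", "README.md", put("blob", b"# readme\n")),
                 ("40000", "src", src),
                 ("100644", "package.json", put("blob", b"{}\n"))])
commit = put_commit(root, [], "Add user authentication endpoint")

# Start from the commit, read its "tree" line, then walk down to the blobs.
commit_body = store[commit][1].decode()
root_from_commit = commit_body.split("\n", 1)[0].split(" ")[1]
paths = [path for path, _ in list_files(root_from_commit)]
print(paths)  # ['README.md', 'package.json', 'src/auth.js', 'src/index.js']
```

Note that the stored tree uses the mode 40000 for directories; git cat-file -p pads it to 040000 for display.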

A Commit Is a Snapshot, Not a Diff

This is the key insight. When you run git commit, Git doesn’t store what changed since the last commit. It stores the complete state of your project - every file, every directory - as a tree of objects.

If 10 out of 1000 files changed, Git creates new blobs for the 10 changed files and new trees for the directories that contain them, up to the root. The 990 unchanged files are represented in the new snapshot by pointers to the same blobs that existed before, and any directory with no changes reuses its old tree outright. No duplication - the content is shared.
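The sharing can be counted in a small Python model. This is a sketch under simplifying assumptions: store is an in-memory stand-in for the object database, write_tree is a hypothetical helper, every file gets mode 100644, and the directory layout (990 files under docs/, 10 under src/) is invented for the example.

```python
import hashlib

store = {}  # object id -> body

def put(obj_type, body):
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    oid = hashlib.sha1(header + body).hexdigest()
    store[oid] = body  # writing the same content twice is a no-op
    return oid

def write_tree(files):
    """files: dict of 'dir/name' paths -> bytes. Returns the root tree id.
    Simplified: entries are sorted by plain name, which is enough here."""
    entries, subdirs = {}, {}
    for path, content in files.items():
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, {})[rest] = content
        else:
            entries[head] = ("100644", put("blob", content))
    for name, sub in subdirs.items():
        entries[name] = ("40000", write_tree(sub))
    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for name, (mode, oid) in sorted(entries.items())
    )
    return put("tree", body)

# First snapshot: 1000 files with distinct contents.
files = {f"docs/page{i}.md": f"page {i}\n".encode() for i in range(990)}
files.update({f"src/mod{i}.py": f"module {i}\n".encode() for i in range(10)})
first = write_tree(files)
objects_after_first = len(store)

# Second snapshot: edit only the 10 files under src/.
changed = dict(files)
for i in range(10):
    changed[f"src/mod{i}.py"] = f"module {i} v2\n".encode()
second = write_tree(changed)
new_objects = len(store) - objects_after_first

print(objects_after_first)  # 1003: 1000 blobs + trees for docs, src and the root
print(new_objects)          # 12: 10 blobs + a new src tree + a new root tree
```

The docs tree is not rewritten at all: its contents hash to the same ID, so the second root tree simply points at the existing object.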

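git diff is not how Git stores data. It’s computed on the fly by comparing two trees. The storage model is snapshots. The display model is diffs. Don’t confuse them. (Packfiles do delta-compress objects on disk to save space, but that is an optimization beneath the object model: every object still reads back as a full snapshot.)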
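As a rough illustration of "diff as a comparison of two snapshots", the Python sketch below flattens each snapshot into a path-to-blob-ID mapping and compares the IDs. The names blob_id, snapshot and diff_snapshots are hypothetical, and the status letters A, M and D are borrowed from Git's name-status output; real Git compares trees recursively and can skip a whole subtree when its hash is unchanged.

```python
import hashlib

def blob_id(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()

def snapshot(files):
    """Flatten a {path: content} mapping into {path: blob id}."""
    return {path: blob_id(content) for path, content in files.items()}

def diff_snapshots(old, new):
    """Derive added/modified/deleted paths purely from two snapshots."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("A", path))
        elif path not in new:
            changes.append(("D", path))
        elif old[path] != new[path]:
            changes.append(("M", path))
    return changes

before = snapshot({"README.md": b"v1\n", "src/index.js": b"x\n", "old.txt": b"bye\n"})
after = snapshot({"README.md": b"v2\n", "src/index.js": b"x\n", "src/auth.js": b"new\n"})

changes = diff_snapshots(before, after)
print(changes)  # [('M', 'README.md'), ('D', 'old.txt'), ('A', 'src/auth.js')]
```

Nothing about the change was stored anywhere: the list is derived entirely from the two snapshots, and src/index.js drops out because its blob ID is the same on both sides.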

What a Branch Actually Is

Open .git/refs/heads/main:

cat .git/refs/heads/main
# 9fceb02d0ae598e95dc970b74767f19372d61af8

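A branch is a text file containing a single commit hash. That’s it. When you make a new commit on main, Git updates this file to point to the new commit. When you create a new branch, Git creates a new file with the same hash. (Git sometimes packs refs into a single .git/packed-refs file, for example after git gc, so the individual file may be missing; git rev-parse main prints the hash either way.)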

This is why branches in Git are cheap. Creating a branch copies one 40-character string. There’s no copying of files, no duplicating history. Just a new pointer.
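A small Python model of that operation, run against a throwaway directory rather than a real repository. The function create_branch is hypothetical and handles loose ref files only; real Git also validates the branch name, writes a reflog entry, and consults packed refs.

```python
import tempfile
from pathlib import Path

def create_branch(git_dir: Path, new_name: str, start: str) -> None:
    """Model of `git branch <new_name> <start>`: copy one hash into a new file."""
    heads = git_dir / "refs" / "heads"
    commit_hash = (heads / start).read_text().strip()
    (heads / new_name).write_text(commit_hash + "\n")

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)
    # The hash used in the example above.
    (heads / "main").write_text("9fceb02d0ae598e95dc970b74767f19372d61af8\n")

    create_branch(git_dir, "feature", "main")

    main_hash = (heads / "main").read_text().strip()
    feature_hash = (heads / "feature").read_text().strip()
    branch_names = sorted(p.name for p in heads.iterdir())

print(branch_names)               # ['feature', 'main']
print(main_hash == feature_hash)  # True
print(len(feature_hash))          # 40
```

The cost of the new branch is one small file, regardless of how many commits or files the project has.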

What HEAD Is

HEAD is a pointer to the current branch (or directly to a commit in “detached HEAD” state):

cat .git/HEAD
# ref: refs/heads/main

When you git checkout a branch, HEAD gets updated to point to that branch. When you make a commit, the current branch pointer advances to the new commit - and HEAD follows automatically because it points to the branch.

“Detached HEAD” happens when HEAD points directly to a commit hash instead of a branch name. You end up there when you git checkout a specific commit hash, a tag, or a remote-tracking branch such as origin/main. Making commits in this state works fine - new commits are created - but because no branch pointer is advancing, you can “lose” those commits when you check out something else. Git will warn you about this.

# Detach HEAD at a specific commit
git checkout 9fceb02

cat .git/HEAD
# 9fceb02d0ae598e95dc970b74767f19372d61af8
# (a hash, not a branch reference)
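The two states of HEAD can be told apart with a few lines of Python. This sketch assumes loose ref files and a hypothetical helper named resolve_head; it runs against a temporary directory that imitates the layout of .git, not against a real repository.

```python
import tempfile
from pathlib import Path

def resolve_head(git_dir: Path):
    """Return (branch ref or None, commit hash) for the current HEAD."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        return ref, (git_dir / ref).read_text().strip()
    return None, head  # detached: HEAD holds the hash itself

commit = "9fceb02d0ae598e95dc970b74767f19372d61af8"

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(commit + "\n")

    # Attached: HEAD names a branch, and the branch names the commit.
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    # Detached: HEAD names the commit directly.
    (git_dir / "HEAD").write_text(commit + "\n")
    detached = resolve_head(git_dir)

print(attached[0])                # refs/heads/main
print(detached[0])                # None
print(attached[1] == detached[1]) # True
```

Both states resolve to the same commit. The difference is only whether a branch sits in between, which determines whether a new commit has a branch pointer to advance.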

How Merging and Rebasing Work

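Merge creates a new commit with two parents - the tip of the current branch and the tip of the branch being merged. The commit graph now has a node with two parent pointers, one into each line of history, and the existing commits are left untouched. History is preserved exactly as it happened. (The exception is a fast-forward: if the current branch has no commits of its own since the two diverged, Git simply moves the branch pointer forward and creates no merge commit.)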

Rebase replays your commits on top of another branch. It takes each commit in your branch, computes the diff it introduced, and applies that diff on top of the target branch - creating new commits with new hashes. The result is a linear history, but the commits are not the same objects. They have new hashes because they have new parent pointers, and usually new trees and committer timestamps as well.

This is why rebasing rewrites history, and why you should not rebase commits that have already been pushed and shared with others. The “same” commit is now a different object with a different hash. Anyone who based work on the original commit now has divergent history.
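The effect of a parent pointer on a commit's identity can be shown with a short Python sketch. The function commit_id, the fixed author line, and the 40-character tree IDs are all placeholders invented for the illustration; only the shape of the commit body follows the git cat-file output shown earlier.

```python
import hashlib

WHO = "Example Author <author@example.com> 0 +0000"  # hypothetical identity

def commit_id(tree, parents, message):
    """Hash a commit body built from a tree id, parent ids and a message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {WHO}", f"committer {WHO}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# Placeholder tree ids; in a real repository these come from tree objects.
T_BASE, T_MAIN, T_FEATURE, T_COMBINED = ("1" * 40, "2" * 40, "3" * 40, "4" * 40)

base = commit_id(T_BASE, [], "base")
main_tip = commit_id(T_MAIN, [base], "main work")
feature_tip = commit_id(T_FEATURE, [base], "feature work")

# Merge: one new commit with two parents; feature_tip itself is untouched.
merge = commit_id(T_COMBINED, [main_tip, feature_tip], "Merge feature")

# Rebase: the "same" change replayed with main_tip as its parent.
rebased = commit_id(T_COMBINED, [main_tip], "feature work")

# Even with an identical tree and message, a different parent changes the id.
same_tree_new_parent = commit_id(T_FEATURE, [main_tip], "feature work")

print(rebased != feature_tip)               # True
print(same_tree_new_parent != feature_tip)  # True
print(merge != rebased)                     # True
```

Because the parent IDs are part of the hashed body, there is no way to move a commit onto a new parent and keep its hash, which is the whole reason a rebased branch diverges from copies that others already pulled.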

The Practical Value

Understanding this model makes several things click:

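git reset --hard <hash> moves the current branch pointer to that hash, then resets the index and working tree to match, discarding uncommitted changes. The branch move itself is no magic - it’s just updating a file.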

git cherry-pick takes a commit, computes its diff, and applies that diff as a new commit on the current branch. Same content, new hash, new parent.

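git stash creates commits that aren’t pointed to by any branch: normally two, one for the index and one for the working tree, plus a third with --include-untracked. They are referenced from refs/stash instead. Your working state becomes an object in the graph, temporarily invisible but recoverable.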

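Lost commits aren’t really lost. Git keeps objects around for a while even after nothing references them (by default, reflog entries for unreachable commits are kept for 30 days). git reflog shows where HEAD has pointed, and git checkout <hash> gets you back to any commit whose hash you can find. From there, git branch <name> <hash> gives it a branch again.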
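A toy Python model of the pointer mechanics behind a reset and a reflog recovery. Everything here is hypothetical: the commit IDs c1 to c3, the move_branch and reachable helpers, and the list used as a reflog. It tracks only pointers, not the index or working tree, and it never garbage-collects.

```python
# commit id -> parent id (None for the root commit); ids are placeholders
commits = {"c1": None, "c2": "c1", "c3": "c2"}
refs = {"main": "c3"}
reflog = ["c1", "c2", "c3"]  # every position the branch has pointed at

def move_branch(name, target):
    """The pointer part of a reset: overwrite the ref and log the move."""
    refs[name] = target
    reflog.append(target)

def reachable(tip):
    """Follow parent pointers from a tip back to the root."""
    seen = []
    while tip is not None:
        seen.append(tip)
        tip = commits[tip]
    return seen

move_branch("main", "c1")           # like resetting main back to c1
visible = reachable(refs["main"])   # what history now shows
still_stored = "c3" in commits      # the objects themselves were not deleted

# Recovery: look up the previous position in the reflog and point a branch at it.
refs["rescue"] = reflog[-2]
recovered = reachable(refs["rescue"])

print(visible)       # ['c1']
print(still_stored)  # True
print(recovered)     # ['c3', 'c2', 'c1']
```

The reset changed which commits are reachable from main, not which commits exist, so a single new pointer is enough to bring the whole chain back.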

The data model is a directed acyclic graph of immutable objects. All the commands are operations on that graph. Once you see the graph, the commands become obvious.


