Cryptography Basics Every Developer Should Know


There’s a rule in cryptography that most developers have heard: don’t roll your own crypto. It’s good advice, but it leaves out something important. If you don’t understand the tools you’re using, you’ll misuse them in ways that are just as bad as implementing them yourself - and much harder to spot.

Using MD5 to hash passwords. Storing symmetric keys next to the data they encrypt. Using ECB mode for block cipher encryption. None of these require implementing a cipher from scratch. They’re all mistakes made by developers using the right library in the wrong way.

The goal here isn’t mathematical rigor. It’s the minimum mental model needed to use cryptographic tools correctly.

Hashing

A cryptographic hash function takes input of any length and produces a fixed-length output. The same input always produces the same output. Because the output is fixed-length, collisions (two inputs producing the same hash) must exist, but for a secure hash function they should be computationally infeasible to find.

Hashing is one-way. You cannot recover the original input from the hash. This makes it useful for:

  • Verifying data integrity - hash a file, store the hash, later re-hash to confirm nothing changed
  • Storing passwords - store the hash, not the password
  • Generating fingerprints or content-addressed identifiers

Common hash functions: SHA-256, SHA-3, BLAKE2. These are general-purpose cryptographic hash functions.
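As a minimal sketch of the integrity-check use case, the snippet below hashes some bytes with SHA-256 from Python’s standard hashlib module and compares digests later. The helper name sha256_hex and the sample byte strings are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


original = b"release-1.0 contents"
stored_digest = sha256_hex(original)  # saved when the data is published

# Later: re-hash what was received and compare against the stored digest.
received = b"release-1.0 contents"
tampered = b"release-1.0 contents!"

print(len(stored_digest))                     # 64 hex characters = 256 bits
print(sha256_hex(received) == stored_digest)  # True
print(sha256_hex(tampered) == stored_digest)  # False
```

A bare hash only helps if the stored digest itself is trustworthy. If an attacker can replace both the data and the digest, the comparison proves nothing.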

When not to use MD5 and SHA-1: both have known, practical collision attacks, so don’t use them where collision resistance matters. They’re still acceptable for non-security uses like detecting accidental corruption or keying a hash table, but not for anything you’re relying on for security.

Password hashing is different

General-purpose hash functions are fast by design. For passwords, that’s a problem - a fast hash function lets an attacker try billions of passwords per second with a GPU.

Password hashing needs to be slow. Use algorithms designed for this:

  • bcrypt - the classic choice, deliberately slow, has been battle-tested for decades
  • Argon2 - the winner of the Password Hashing Competition, better tuning options (Argon2id is the commonly recommended variant)
  • scrypt - memory-hard, harder to parallelize on ASICs

All three have a “work factor” or “cost” parameter you can tune to keep them slow as hardware gets faster. A bcrypt cost factor of 12 takes on the order of a few hundred milliseconds on typical server hardware - that’s fine for a login flow, but it cuts an attacker’s guess rate by many orders of magnitude compared with a single round of SHA-256.

If you’re storing passwords with SHA-256 or MD5, you are storing passwords insecurely.
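Here is a rough sketch of slow, salted password hashing using scrypt, chosen only because it ships in Python’s standard library (hashlib.scrypt); bcrypt and Argon2 need third-party packages. The cost parameters and the function names hash_password and verify_password are illustrative assumptions, not tuned recommendations. The salt is a random per-password value, so two users with the same password get different hashes.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumptions, tune for your hardware):
# n=2**14, r=8, p=1 uses roughly 16 MiB of memory per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; neither is the password itself."""
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

In a real system the cost parameters would be stored alongside the salt and digest so they can be raised later without invalidating existing hashes.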

Symmetric encryption

Symmetric encryption uses the same key to encrypt and decrypt. It’s fast and suitable for encrypting large amounts of data.

The standard choice today is AES-GCM (Advanced Encryption Standard in Galois/Counter Mode). There are a few things to get right:

The key - must be random, must be kept secret. 256 bits is standard. The security of symmetric encryption is entirely in the key.

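The IV/nonce - most modern modes require a unique value per encryption operation. In GCM, this is called a nonce (“number used once”). It doesn’t need to be secret, but it must be unique for a given key. Reusing a nonce with the same key in GCM catastrophically breaks security - the two ciphertexts leak the XOR of their plaintexts, and an attacker can recover the authentication subkey and forge messages. Generate a fresh 96-bit nonce randomly for each encryption, and rotate the key well before encrypting around 2^32 messages under it so that random nonces stay unlikely to collide.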
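To see why nonce reuse is so damaging, consider the toy sketch below. It is deliberately NOT AES-GCM: it is a made-up counter-mode stream cipher built from SHA-256, used only because GCM also encrypts by XORing the plaintext with a keystream derived from the key and nonce. All names here (toy_keystream, toy_encrypt, the sample messages) are hypothetical, and nothing in it should be used for real encryption.

```python
import hashlib
import os


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter). Not a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # XOR with the keystream; applying it twice decrypts.
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = os.urandom(32)
nonce = os.urandom(12)

p1 = b"pay alice 100 dollars"
p2 = b"pay mallory 9 dollars"
c1 = toy_encrypt(key, nonce, p1)
c2 = toy_encrypt(key, nonce, p2)  # mistake on purpose: same key AND same nonce

# The keystream cancels out: an eavesdropper learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))  # True

# If one plaintext is known or guessable, the other falls out directly.
recovered = xor_bytes(leak, p1)
print(recovered == p2)  # True
```

With a fresh nonce per message the two keystreams differ and the XOR of the ciphertexts reveals nothing useful.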

Authenticated encryption - GCM provides both confidentiality (you can’t read the plaintext without the key) and authenticity (you can detect if the ciphertext was tampered with). The authenticity comes from an authentication tag computed using multiplication in a Galois field, which is where the “G” in GCM comes from. Always use an authenticated mode - CBC without authentication is vulnerable to tampering attacks.

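import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Example inputs (placeholder values for illustration)
plaintext = b"secret message"
associated_data = b"user-id:42"  # authenticated but not encrypted

key = AESGCM.generate_key(bit_length=256)
nonce = os.urandom(12)  # 96 bits, random, unique per operation
aead = AESGCM(key)

# The nonce is not secret; store or send it alongside the ciphertext.
ciphertext = aead.encrypt(nonce, plaintext, associated_data)

# Raises cryptography.exceptions.InvalidTag if anything was tampered with.
decrypted = aead.decrypt(nonce, ciphertext, associated_data)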

The associated data is optional (pass None to omit it) - it’s additional context that gets authenticated but not encrypted (like a user ID or record type). If the associated data supplied on decryption doesn’t match what was used for encryption, decryption fails with an authentication error.

Asymmetric encryption

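Asymmetric encryption uses a key pair: a public key and a private key. The two are mathematically linked: what is encrypted with the public key can only be decrypted with the private key, and a signature made with the private key can be checked with the public key.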

The public key can be shared with anyone. The private key must be kept secret.

Two main uses:

Encryption: Encrypt with the recipient’s public key. Only the recipient (with the private key) can decrypt. Useful for sending something to someone whose public key you have, without needing a shared secret.

Signing: Sign with your private key. Anyone with your public key can verify the signature. This proves the message came from you and wasn’t tampered with.

RSA and Elliptic Curve Cryptography (ECC) are the main families. ECC (specifically Ed25519 for signing, X25519 for key exchange) is generally preferred for new systems - smaller keys, faster operations, and fewer implementation footguns than RSA.

Asymmetric encryption is slower than symmetric encryption and not designed for encrypting large amounts of data directly. In practice, asymmetric crypto is used to exchange or establish a symmetric key, and then symmetric crypto does the actual data encryption. This is exactly what TLS does.

Signatures and verification

A digital signature proves that a piece of data came from the holder of a private key and hasn’t been modified.

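The signing process: the signing algorithm takes your private key and the data (usually hashing the data first) and produces a signature. For RSA this is often described as “encrypting the hash with the private key”, but that is only a loose analogy and doesn’t describe schemes like ECDSA or Ed25519.

The verification process: the verification algorithm takes the public key, the data, and the signature, and reports whether the signature is valid. If it is, the data is authentic and unmodified.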

This is used everywhere: software packages are signed so you can verify they came from the publisher. JWTs can be signed with asymmetric keys so a service can verify claims without having access to the signing key. Git commits can be signed.

A signature is not encryption. A signed message is not confidential - it’s just verifiable.
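A short sketch of signing and verifying with Ed25519, assuming the same third-party cryptography package used in the AES-GCM example is installed. The message contents and the is_valid helper are made up for illustration.

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The signer keeps private_key secret and publishes public_key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"version=1.0;artifact=app.tar.gz"  # placeholder content
signature = private_key.sign(message)


def is_valid(public_key, signature: bytes, message: bytes) -> bool:
    """Return True if the signature matches the message, False otherwise."""
    try:
        public_key.verify(signature, message)  # raises on mismatch
        return True
    except InvalidSignature:
        return False


print(len(signature))                                    # 64 bytes
print(is_valid(public_key, signature, message))          # True
print(is_valid(public_key, signature, message + b"!"))   # False
```

Note that the message travels in the clear: anyone can read it, and anyone with the public key can check it.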

How TLS puts it together

When your browser connects to a site over HTTPS, TLS uses all of the above:

  1. The server presents a certificate containing its public key. The certificate is signed by a Certificate Authority (CA) whose root certificate your browser trusts.

  2. Your browser verifies the certificate chain - the CA’s signature proves the server is who it claims to be.

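  3. The client and server perform a key exchange (using asymmetric crypto - Diffie-Hellman or an elliptic curve variant) to agree on a shared session key. In modern TLS each side contributes a fresh, temporary key pair, and the server signs the handshake with the private key matching its certificate. No private key is ever sent over the network, but both sides end up with the same symmetric key.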

  4. All subsequent communication is encrypted with an authenticated symmetric cipher, typically AES-GCM or ChaCha20-Poly1305, using that symmetric session key.

The asymmetric crypto handles identity verification and key establishment. The symmetric crypto handles the actual data encryption. This is the standard hybrid approach.
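The key exchange in step 3 can be illustrated with a toy Diffie-Hellman sketch using plain Python integers. The prime, the generator, and the single SHA-256 step used to turn the shared number into a key are assumptions chosen for readability. Real TLS uses standardized groups or curves such as X25519 and a proper key derivation function, so treat this purely as a demonstration of the idea.

```python
import hashlib
import secrets

# Toy parameters for illustration only, far too weak for real use.
P = 2**127 - 1  # a Mersenne prime
G = 3

# Each side picks a secret exponent and never sends it.
client_private = secrets.randbelow(P - 3) + 2
server_private = secrets.randbelow(P - 3) + 2

# Only these public values cross the network.
client_public = pow(G, client_private, P)
server_public = pow(G, server_private, P)

# Each side combines its own secret with the other side's public value.
client_shared = pow(server_public, client_private, P)
server_shared = pow(client_public, server_private, P)


def derive_session_key(shared: int) -> bytes:
    """Hash the shared number into a 32-byte key (stand-in for a real KDF)."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


print(client_shared == server_shared)  # True
print(derive_session_key(client_shared) == derive_session_key(server_shared))  # True
```

An eavesdropper sees only P, G, and the two public values; recovering the shared secret from those is the hard problem the scheme relies on. Without authentication (the certificate and signature in steps 1 and 2), an active attacker could still sit in the middle, which is why TLS combines both.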

A TLS certificate proves that the public key belongs to a specific domain - that’s it. The CA is asserting “we verified this domain, and this is their public key.” It says nothing about the character of the site owner or the content of the site. HTTPS means the channel is encrypted; it doesn’t mean the site is trustworthy.

A few practical rules

Use well-maintained libraries. In Python: cryptography. In Node.js: the built-in crypto module or libsodium-wrappers. Don’t implement primitives yourself.

Use high-level APIs when available. Many libraries offer higher-level abstractions that make it harder to misuse the primitives - like libsodium’s secretbox and box, which choose the algorithms and mode for you (some bindings, such as PyNaCl, will also generate the nonce if you don’t supply one).

Generate keys and nonces with a cryptographically secure random number generator. Not Math.random(). Not random.random(). Use os.urandom() or the secrets module in Python, crypto.randomBytes() in Node.js, or the equivalent in your language.
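In Python, for instance, the standard secrets module wraps the operating system’s secure generator. A minimal sketch (the variable names are illustrative):

```python
import secrets

key = secrets.token_bytes(32)            # 256-bit symmetric key
nonce = secrets.token_bytes(12)          # 96-bit nonce for AES-GCM
reset_token = secrets.token_urlsafe(32)  # URL-safe text token from 32 random bytes

print(len(key), len(nonce))  # 32 12
print(len(reset_token))      # 43
```

The random module, by contrast, is a deterministic generator meant for simulations; its output can be predicted once enough of it has been observed.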

Store secrets outside your code. Keys, credentials, and tokens don’t belong in source code. Environment variables, a secrets manager, or a dedicated vault - not a config file committed to the repository.
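One possible shape for this, sketched under assumptions: the key is provisioned as a base64 string in an environment variable, here given the hypothetical name APP_ENCRYPTION_KEY, and the application refuses to start without it. The function name and the 32-byte length check are illustrative choices.

```python
import base64
import os


def load_key_from_env(var_name: str = "APP_ENCRYPTION_KEY") -> bytes:
    """Read a base64-encoded 256-bit key from the environment.

    The variable name is a hypothetical example; use whatever your
    deployment tooling or secrets manager injects.
    """
    encoded = os.environ.get(var_name)
    if encoded is None:
        # Fail loudly rather than falling back to a hard-coded default key.
        raise RuntimeError(f"{var_name} is not set")
    key = base64.b64decode(encoded, validate=True)
    if len(key) != 32:
        raise ValueError(f"{var_name} must decode to exactly 32 bytes")
    return key
```

At startup the application would call load_key_from_env() once and pass the result to its encryption layer, so the key never appears in the repository.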

Understand what you’re protecting and against what. Encryption at rest protects against someone reading your database backup. Encryption in transit protects against network eavesdropping. Neither protects against a compromised application server. Threat modeling - knowing who the attacker is and what they can do - tells you which tools are relevant.

Cryptography is one of those fields where the gap between “using it” and “using it correctly” is large. The concepts here won’t make you a cryptographer, but they’ll help you recognize the wrong tool and ask better questions when something doesn’t feel right.


