Understanding Cryptographic Primitives

Cryptographic Building Blocks

Any piece of software that performs cryptographic operations or implements complex cryptographic protocols like TLS relies on cryptographic primitives. Cryptographic primitives are, from the perspective of a software developer, atomic cryptographic operations.

Hash functions (MD5, SHA-1, SHA-256, SHA-512, etc.) are some of the most widely known cryptographic primitives. Other examples include symmetric encryption schemes like AES, asymmetric schemes like public key systems, and message authentication codes like HMACs.
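
As a rough illustration of what a hash function does, here is a minimal sketch using Python's standard hashlib module. The input strings are made up for the example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

first = sha256_hex(b"hello world")
second = sha256_hex(b"hello world!")

# SHA-256 always yields 32 bytes (64 hex characters), whatever the input size.
print(len(first), len(second))

# A one-character change in the input gives an unrelated-looking digest.
print(first)
print(second)
```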

An effective cryptosystem must ensure three things: confidentiality, integrity, and authenticity. Confidentiality means no one but the intended recipient(s) can make any sense of the data. Integrity means that no one has tampered with the data since it was encrypted. Authenticity means the parties communicating with each other can verify each other’s identity.

Cryptographic primitives provide for each of these three guarantees.

Encryption and Confidentiality

Confidentiality is typically provided by an encryption algorithm like AES (Advanced Encryption Standard), a block cipher that is most commonly used with 128-bit or 256-bit keys.

AES is a block cipher, which means it operates on fixed-size chunks of data. Both AES-128 and AES-256 operate on 16-byte blocks, though their key sizes differ. AES-128 uses 128-bit (16-byte) secret keys and AES-256 uses 256-bit (32-byte) secret keys. (Symmetric keys are often loosely called private keys; "secret key" avoids confusion with the private half of a public/private key pair.) Both versions of AES are considered to be very secure and are widely used in industry and government.
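
To make the sizes concrete, here is a small sketch that generates random keys of the right length with Python's standard secrets module. The helper name new_aes_key is hypothetical, and the sketch only produces key material; it does not perform AES encryption.

```python
import secrets

AES_BLOCK_BYTES = 16  # the block size is the same for every AES key size

def new_aes_key(bits: int) -> bytes:
    """Generate a random secret key of the given AES key size."""
    if bits not in (128, 192, 256):
        raise ValueError("AES keys are 128, 192, or 256 bits long")
    return secrets.token_bytes(bits // 8)

key_128 = new_aes_key(128)
key_256 = new_aes_key(256)

# Key length in bytes for AES-128 and AES-256 respectively.
print(len(key_128), len(key_256))
```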

The product of AES encryption is called a ciphertext. A ciphertext is an encrypted form of data and is akin to binary white noise. Without the secret key used to encrypt the data, a ciphertext’s content is indiscernible, which protects the confidentiality of the data.

Integrity and Signatures

Using an HMAC (Hash-based Message Authentication Code) is one way to verify the integrity of data. HMAC algorithms use a hash function like SHA-256 or SHA-512 to combine a secret key with data, producing a fixed-length authentication tag.

HMACs are a great way to protect the integrity of a ciphertext. Assuming the recipient possesses the secret key used to compute the HMAC, the recipient can recompute the HMAC using that key and the ciphertext. If the HMAC computed by the recipient matches the HMAC provided by the sender, the recipient can be confident, as long as the key has stayed secret, that the ciphertext has not been altered since the sender computed the HMAC.

It goes without saying that the secrecy of the key used to “sign” the ciphertext (i.e. to compute an HMAC over the ciphertext and include the resulting hash with it) is paramount.
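
The compute-and-compare flow described above can be sketched with Python's standard hmac module. The function names and the sample ciphertext bytes are made up for illustration; the sketch assumes both sides already share the key.

```python
import hashlib
import hmac
import secrets

def compute_tag(key: bytes, ciphertext: bytes) -> bytes:
    """Sender side: HMAC-SHA256 over the ciphertext."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()

def verify_tag(key: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recipient side: recompute the HMAC and compare in constant time."""
    expected = compute_tag(key, ciphertext)
    return hmac.compare_digest(expected, tag)

key = secrets.token_bytes(32)                    # shared secret key
ciphertext = b"\x8f\x02 pretend this is ciphertext"
tag = compute_tag(key, ciphertext)

tampered = ciphertext[:-1] + b"X"                # flip the last byte

print(verify_tag(key, ciphertext, tag))          # untouched data verifies
print(verify_tag(key, tampered, tag))            # altered data does not
```

hmac.compare_digest is used instead of == so that the comparison time does not leak how many leading bytes of the tag matched.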

Authenticity

Authenticity is a more complex matter. As it turns out, establishing rock solid proof of a remote computer’s identity on the Internet is damn hard. We should probably explain some nuances up front.

Authenticity can be provided in two ways. One way is using symmetric cryptographic primitives like HMACs and AES. In this simplistic model, two people that know each other can share the secret key outside of the computer system by, say, telling each other over the phone or in person. If no one else knows the secret, then data encrypted with AES and authenticated with an HMAC using that key can only have come from one of those two people, and only they can read it.

The other way is to use asymmetric cryptography techniques. Using asymmetric cryptographic primitives - we’ll call them certificates - allows us to establish the identity of a remote computer that we are unable to communicate with over the phone or in person. A computer that presents a signed certificate is much like a person providing a government-issued identification card to prove their identity.

A signed certificate demonstrates that a third party signing authority - trusted by both the presenter of the certificate and the receiver of the certificate - has vouched for the identity of the entity presenting the certificate in an unforgeable fashion.

Since certificates are the most common method used to prove and verify identities on the Internet, we’ll cover them first.

Asymmetric Cryptographic Certificates

Background

What you need to know about cryptographic public/private key pairs:

  • Keys are generated in pairs: one private key and one public key
  • Data signed with a private key (often loosely described as “encrypted with the private key”) can only be verified with the corresponding public key
  • Data encrypted with a public key can only be decrypted with the corresponding private key
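
To see the two directions in action, here is a toy sketch using textbook RSA with tiny primes. The numbers are chosen purely for illustration; real keys are thousands of bits long and are always used with a padding scheme, so treat this as a demonstration of the math only.

```python
# Textbook RSA with tiny primes. Illustration only, not secure.
p, q = 61, 53
n = p * q                  # public modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (n, e) is the public key
d = pow(e, -1, phi)        # private exponent: (n, d) is the private key (Python 3.8+)

def apply_public(value: int) -> int:
    return pow(value, e, n)

def apply_private(value: int) -> int:
    return pow(value, d, n)

message = 65               # must be smaller than n

# Encrypt with the public key, decrypt with the private key.
ciphertext = apply_public(message)
recovered = apply_private(ciphertext)

# Sign with the private key, verify with the public key.
signature = apply_private(message)
checked = apply_public(signature)

print(recovered == message, checked == message)
```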

X.509 Certificate Introduction

X.509 certificates are ubiquitous.

You know that little green lock that shows up next to HTTPS URLs on many of your favorite websites? That comes from a signed X.509 certificate. It is arguably one of the most publicly visible manifestations of cryptography in the world.

Certificate Contents

An X.509 certificate contains many different fields but many of them are unused and unsupported by most software. For the sake of simplicity, we are only going to discuss the ones we care about:

  • Issuer
  • Validity
    • Not before
    • Not after
  • Subject (FQDN or server’s hostname)
  • Subject public key information
    • Algorithm
    • Public key
  • Signature algorithm
  • Signature

The Issuer is the name of the signing Certificate Authority (CA). More on that in a moment.

The Validity is pretty self-explanatory: the certificate is not valid before the “Not before” date and the certificate is not valid after the “Not after” date, establishing a specific period of time during which the certificate is to be considered valid.
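
The validity check amounts to a simple date comparison. A minimal sketch follows; the function name and the dates are made up for the example.

```python
from datetime import datetime, timezone
from typing import Optional

def is_within_validity(not_before: datetime,
                       not_after: datetime,
                       now: Optional[datetime] = None) -> bool:
    """Return True if `now` falls inside the certificate's validity window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return not_before <= now <= not_after

# Hypothetical validity period of one year.
not_before = datetime(2024, 1, 1, tzinfo=timezone.utc)
not_after = datetime(2025, 1, 1, tzinfo=timezone.utc)

during = datetime(2024, 6, 1, tzinfo=timezone.utc)
expired = datetime(2025, 6, 1, tzinfo=timezone.utc)

print(is_within_validity(not_before, not_after, during))
print(is_within_validity(not_before, not_after, expired))
```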

The Subject Name is typically the hostname that will be used to resolve the server that will be presenting this certificate. Well known examples would be something like www.facebook.com, www.google.com, or www.tnichols.org.

The Public Key Information includes the algorithm the public key is meant to be used with (e.g. RSA) and the public key itself.

The Signature Algorithm field specifies the algorithm used to generate the signature and indicates the hashing function used as well as the asymmetric encryption algorithm used (e.g. RSA).

The signature is produced by hashing the portion of the certificate that is to be signed (everything except the signature value itself) and then having the signing Certificate Authority encrypt that hash using their private key. Operating systems, applications, and browsers have trusted stores containing the certificates, and with them the public keys, of many trusted third party CAs like Comodo, DigiCert, Symantec (formerly Verisign), GoDaddy, and others.

Checking the Signature

When a web server presents a certificate signed by a legitimate, widely trusted CA, the browser looks up the CA’s public key in its (or the operating system’s) trusted certificate store based on the Issuer field in the presented X.509 certificate. The browser can decrypt the encrypted hash (that is, the signature), re-hash the signed portion of the certificate, and check that the resulting hash is identical to the hash that was decrypted using the CA’s public key, thus proving that the certificate has indeed been signed by a trusted Certificate Authority.
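
The hash-then-compare check can be sketched with a toy RSA key pair standing in for the CA's keys. Everything here is an assumption made for illustration: the key is far too small to be secure, no padding scheme is used, and the certificate is just a made-up byte string representing the bytes that the signature covers.

```python
import hashlib

# Toy CA key pair built from two known Mersenne primes. Illustration only.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                                  # CA public modulus
E = 65537                                  # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))          # CA private exponent (Python 3.8+)

def digest_as_int(signed_bytes: bytes) -> int:
    """Hash the signed portion of the certificate, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(signed_bytes).digest(), "big") % N

def ca_sign(signed_bytes: bytes) -> int:
    """CA side: 'encrypt' the hash with the private key."""
    return pow(digest_as_int(signed_bytes), D, N)

def browser_verify(signed_bytes: bytes, signature: int) -> bool:
    """Browser side: 'decrypt' the signature with the CA public key and compare hashes."""
    return pow(signature, E, N) == digest_as_int(signed_bytes)

cert_body = b"subject=www.example.org;issuer=Example Toy CA"
signature = ca_sign(cert_body)

forged_body = b"subject=www.evil.example;issuer=Example Toy CA"

print(browser_verify(cert_body, signature))    # genuine certificate
print(browser_verify(forged_body, signature))  # altered certificate
```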

Public Key Infrastructure (PKI)

All of this infrastructure, all the certificate authorities, all the signing mechanisms, all the certificates - the whole kit and caboodle - is known collectively as Public Key Infrastructure (PKI).

Here’s a little more on how Certificate Authorities actually work.

Assumptions

The big assumption being made here is that these CAs are actually trusted by everyone. Should they be? No, not in my opinion. Not in the opinion of many other security professionals, either. As it turns out, it’s a hard problem to solve. One that I worked on quite a bit during my time in an Applied Computer Security PhD program, as a matter of fact. But this is a discussion for another time.

Assuming we trust these CAs - and we generally do - to verify the identity of every person requesting a signed certificate for a given hostname, we can trust that the machine presenting the certificate is actually the machine we intended to communicate with to begin with.

It is the responsibility of the widely trusted CAs to verify, when Joe Blow requests a signed certificate for www.google.com, that Joe Blow does, in fact, run www.google.com. But how do they do this?

Domain Validation

The cheap way out that most of them take is a method called Domain Validation. When I request a certificate for www.tnichols.org, the CA will typically provide me with a long hash and ask me to set a DNS record for a subdomain using the hash. I would have to create an A record for <hash>.tnichols.org. Once I have done that, I notify the CA and they perform a DNS lookup of my domain using the hash as the subdomain. If the hostname resolves, the CA can reasonably assume that I, the requester, control the DNS records for the domain that I am requesting a signed certificate for. This, in many cases, is considered sufficient proof of identity.
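
The challenge flow can be sketched as a small simulation. Nothing below talks to real DNS or a real CA: the zone dictionary stands in for the requester's DNS records, and every function name, the domain example.org, and the address are hypothetical.

```python
import secrets

def issue_challenge() -> str:
    """CA side: generate a long random value that the requester cannot guess."""
    return secrets.token_hex(16)

def publish_record(zone: dict, token: str, domain: str, address: str) -> None:
    """Requester side: create a record for <token>.<domain> in their DNS zone."""
    zone[f"{token}.{domain}"] = address

def ca_validates(zone: dict, token: str, domain: str) -> bool:
    """CA side: look up <token>.<domain>; resolving means the requester controls DNS."""
    return f"{token}.{domain}" in zone

zone = {}                      # simulated DNS zone for example.org
token = issue_challenge()

before = ca_validates(zone, token, "example.org")
publish_record(zone, token, "example.org", "192.0.2.10")
after = ca_validates(zone, token, "example.org")

print(before, after)
```

The check is only as strong as the requester's control over DNS: anyone who can edit the zone can pass it.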

Conclusion

Certificates signed by Certificate Authorities are the most common way that trust is established on the Internet. It’s how your browser knows that you are actually browsing amazon.com servers when you visit www.amazon.com.

Does this completely obviate the need for establishing authenticity through symmetric cryptographic primitives? Absolutely not. Symmetric cryptographic primitives are still very widely used to provide authenticity guarantees. Only, they play more of a role in complex protocols like TLS.

We won’t touch on TLS here; that’s an article for another day.

It’s still worth covering how symmetric cryptographic primitives provide authenticity, though.

Symmetric Cryptographic Authenticity

In order to understand how to authenticate encrypted data using symmetric cryptographic primitives, we need to understand some of the more common modes of operation of block ciphers like AES.

Common Block Cipher Modes

Electronic Codebook (ECB)

This is a naive and insecure mode. The encryption algorithm is applied independently to each block and each block is decrypted independently. Ciphertexts produced by block ciphers using ECB mode can reveal repeating patterns in plaintext. In this mode, identical plaintext input in two separate blocks would produce two identical ciphertexts. This is not ideal from a security perspective. You should never use ECB in production.
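
The pattern leak is easy to demonstrate. Python's standard library has no AES, so the sketch below uses a toy Feistel cipher built from HMAC as a stand-in block cipher. That cipher is an assumption made for illustration and is not secure; only the ECB structure around it is the point.

```python
import hashlib
import hmac

BLOCK = 16  # bytes per block, matching the AES block size

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for AES. Not secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        mask = hmac.new(key, bytes([i]) + right, hashlib.sha256).digest()[:8]
        left, right = right, _xor(left, mask)
    return left + right

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """ECB: every block is encrypted on its own, with no chaining."""
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    return b"".join(
        toy_encrypt_block(key, plaintext[i:i + BLOCK])
        for i in range(0, len(plaintext), BLOCK)
    )

key = b"k" * 16
plaintext = b"ATTACK AT DAWN!!" * 2      # two identical 16-byte blocks
ciphertext = ecb_encrypt(key, plaintext)

# True: identical plaintext blocks produce identical ciphertext blocks.
print(ciphertext[:BLOCK] == ciphertext[BLOCK:])
```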

Cipher Block Chaining (CBC)

Oftentimes, we see AES used in cipher block chaining mode. In this mode, each plaintext block is combined (XORed) with the ciphertext of the previous block before it is encrypted; the first block, which has no predecessor, is combined with a random initialization vector instead. This effectively “cascades” randomness down through the cipher block chain as encryption is iteratively performed.
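
Here is a minimal sketch of the chaining itself. As Python's standard library has no AES, a toy Feistel cipher built from HMAC stands in for the block cipher. That stand-in is an assumption for illustration and is not secure, and padding is left out, so the plaintext must be a whole number of blocks.

```python
import hashlib
import hmac
import secrets

BLOCK = 16  # bytes per block, matching the AES block size

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _mask(key: bytes, i: int, half: bytes) -> bytes:
    return hmac.new(key, bytes([i]) + half, hashlib.sha256).digest()[:8]

def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel cipher standing in for AES. Not secure."""
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _mask(key, i, right))
    return left + right

def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Inverse of toy_encrypt_block: undo the rounds in reverse order."""
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _mask(key, i, left)), left
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of the block size")
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # Mix in the previous ciphertext block (the IV for the first block).
        previous = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], previous))
        out.append(previous)
    return b"".join(out)

def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    previous, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(toy_decrypt_block(key, block), previous))
        previous = block
    return b"".join(out)

key = secrets.token_bytes(16)
iv = secrets.token_bytes(BLOCK)          # fresh random IV for every message
plaintext = b"ATTACK AT DAWN!!" * 2      # two identical 16-byte blocks
ciphertext = cbc_encrypt(key, iv, plaintext)

print(ciphertext[:BLOCK] != ciphertext[BLOCK:])        # repetition is hidden
print(cbc_decrypt(key, iv, ciphertext) == plaintext)   # round trip works
```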

When using CBC, the developer must also compute an HMAC of the data. The integrity of the data must be checked prior to attempting decryption in order to avoid chosen ciphertext attacks. Since the HMAC must be verified prior to attempting decryption, it follows that the HMAC must be computed over the ciphertext, not the plaintext. It’s usually worthwhile to include all other data required for the encryption/decryption processes in the HMAC for the sake of ensuring the integrity of the entire package.

You will often see the following components in a packed, encrypted blob that was encrypted using AES-{128,256} in CBC mode:

  • Initialization vector for the encryption algorithm
  • Salt used during the key derivation/stretching process (when appropriate)
  • Ciphertext
  • HMAC over all of the above
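
Packing and checking such a blob might look like the sketch below. The field lengths and function names are assumptions made for illustration, and the ciphertext is treated as opaque bytes since the encryption step itself is out of scope here.

```python
import hashlib
import hmac
import secrets

IV_LEN, SALT_LEN, TAG_LEN = 16, 16, 32   # assumed field sizes for this sketch

def pack(mac_key: bytes, iv: bytes, salt: bytes, ciphertext: bytes) -> bytes:
    """Concatenate IV, salt, and ciphertext, then append an HMAC over all of it."""
    body = iv + salt + ciphertext
    tag = hmac.new(mac_key, body, hashlib.sha256).digest()
    return body + tag

def unpack_verified(mac_key: bytes, blob: bytes):
    """Verify the HMAC first; only hand back the fields if it matches."""
    if len(blob) < IV_LEN + SALT_LEN + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("HMAC mismatch: refusing to decrypt")
    iv = body[:IV_LEN]
    salt = body[IV_LEN:IV_LEN + SALT_LEN]
    ciphertext = body[IV_LEN + SALT_LEN:]
    return iv, salt, ciphertext

mac_key = secrets.token_bytes(32)
iv = secrets.token_bytes(IV_LEN)
salt = secrets.token_bytes(SALT_LEN)
ciphertext = secrets.token_bytes(48)     # placeholder for real CBC output

blob = pack(mac_key, iv, salt, ciphertext)
print(unpack_verified(mac_key, blob) == (iv, salt, ciphertext))
```

Because the IV and salt are covered by the HMAC, an attacker who swaps either one is caught before any decryption is attempted.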

Authenticated Encryption Schemes

“Authenticated Encryption” (AE) and “Authenticated Encryption with Associated Data” (AEAD) refer to a set of block cipher modes that meld authentication with encryption.

Galois/Counter Mode (GCM)

Galois/Counter Mode (GCM) is a sophisticated, efficient, and widely used authenticated encryption mode of operation for block ciphers. Because its blocks can be processed independently, it can leverage hardware pipeline optimizations that CBC encryption, where each block depends on the previous one, cannot. This results in significantly better performance and efficiency for GCM as compared to CBC. If you aren’t using something like HMACs to protect the integrity and authenticity of your data, GCM is a great choice. Some may argue that using GCM to begin with is better than CBC combined with an HMAC.

Tying it all together

TLS is a great example of a cryptographic protocol that combines these primitives to achieve strong confidentiality, integrity, and authenticity guarantees for communication over an untrusted and potentially hostile network.

TLS uses public/private key pairs to establish identity and exchange symmetric secret keys that are subsequently used for encryption, integrity checks, and authenticity checks.
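
As a small closing sketch, Python's standard ssl module shows these pieces from the client's point of view. The default context requires a valid certificate chain and a matching hostname. The helper fetch_certificate_fields is a hypothetical name; it needs network access, so it is defined here but not called.

```python
import socket
import ssl

# The default client context loads the system's trusted CA certificates.
context = ssl.create_default_context()

# It insists on a certificate that chains to a trusted CA...
print(context.verify_mode == ssl.CERT_REQUIRED)
# ...and on a certificate whose subject matches the hostname we asked for.
print(context.check_hostname)

def fetch_certificate_fields(hostname: str, port: int = 443) -> dict:
    """Connect over TLS and return a few of the certificate fields discussed above."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    return {name: cert.get(name)
            for name in ("subject", "issuer", "notBefore", "notAfter")}
```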