This is a somewhat lengthy article intended to help anyone taking their first steps into encrypting sensitive data in a regulated environment, such as one that must meet PCI DSS requirements. The hope is that this is an effective stepping stone into the dry, dry world of encryption standards and compliance.
As part of some recent work on a proposal for a PCI DSS compliant solution I found myself having to become intimately acquainted with the concepts and standards for protecting data. My initial foray into this world was met with what felt like an impenetrable wall of esoteric information. I had a few terms to get me started on my research. I knew that the solution had been designed to integrate with a ‘Key Management System’ and to use tiered encryption keys: a ‘Data Encryption Key’ for encrypting data and a ‘Key Encryption Key’ for encrypting the Data Encryption Key, both used with symmetric ciphers such as AES-256. Now I’m usually pretty good at research (ahem! Googling *cough*) but I struggled to find any clear, easily digestible information on how these concepts all hung together. Wikipedia wasn’t much help and there was a lot of ambiguity between the various articles provided by vendors that only served to hinder a fledgling student of this subject. At any rate, I ploughed on and after a few late nights of reading through extensive, lengthy, dry product briefs and standards documents I managed to wrap my head around the problem space. This whole experience drove me to promise myself that I would record this knowledge in a simple form for posterity. So here we go…
What Problem Domain?
Alright, so where to begin?! Let’s start with the basics… The first problem I encountered was in trying to understand what *this* is even called! Surely once I knew what the problem domain is commonly called, research would be so much easier. If only! Starting with the good ol’ Wikipedia material on ‘Key Management’ didn’t turn up anything particularly useful. I knew we were looking at using an external Key Management Service (KMS) such as AWS KMS, so looking at the documentation there I found this problem space referred to as ‘Envelope Encryption’. Interestingly this terminology is also used by Google Cloud KMS. Oddly however, more ‘classic’ non-vendor sources such as Wikipedia don’t have any reference to this as established terminology; is ‘Envelope Encryption’ a vendor-specific term? It wouldn’t surprise me if it was, especially given the confusion it raises with PKCS envelopes in the PKI space. Searching for Envelope Encryption does however turn up a Wikipedia article on ‘Key Encapsulation’, which refers us back to the concepts of asymmetric PKI. GAH! 😩 Even worse than that, some OWASP info I found on the subject referred to this as ‘Tiered Encryption’. Makes sense but nowhere else seems to use that term. Finally, further digging in Wikipedia turned up ‘Key Wrap’ as a concept that seems to describe the problem quite well, even referring to NIST SP 800-38F, the standard covering ‘Key Wrapping’ (including the AES Key Wrap mode) and the use of ‘Key Encryption Keys’. Turns out this also aligns with PCI, ISO and IETF. Phew!
So, we’re dealing with Key Wrapping. Good, let’s go.
Gimme the freaking concepts already!
I’ll set the scene with the most fundamental tool we need to use: symmetric encryption. Protecting data at rest is typically achieved using ‘symmetric encryption’, i.e. one single secret key for encryption and the same key for decryption. It is more than likely that we’re talking about the NIST approved AES (Rijndael) block cipher to perform the cryptographic operations on our sensitive data. If you’re a fellow Microsoft stack developer you’ll probably be using one of the following APIs:
- CryptoAPI – Also known as CAPI, now deprecated in favour of CNG
- Cryptography API: Next Generation – Also known as CNG, available since Windows Vista. AES is accessed here via BCryptEncrypt, using a key created on a BCRYPT_AES_ALGORITHM provider whose BCRYPT_CHAINING_MODE property selects the mode (for example BCRYPT_CHAIN_MODE_GCM)
- AesCryptoServiceProvider – Wraps CAPI, also now obsolete in favour of AesCng
- AesCng – Wraps CNG
- AesManaged – A completely managed/.NET version
- All these APIs are detailed here: https://docs.microsoft.com/en-us/dotnet/standard/security/cryptography-model
I hope to cover off the differences in the Microsoft Cryptographic APIs in a future post. For now if you are not sure what to use then read up on the various sources above but you’ll probably want to just stick with CNG in your preferred programming model and you should be fine.
Using most crypto APIs is a fairly well documented and relatively simple process so we’ll assume you’re not doing anything too crazy and get straight onto key management.
The saying goes that encryption is easy and key management is very, very hard. As I’m sure you are aware, if we only have one secret key for encrypting and decrypting our data then we’d better make jolly well certain that we’re handling that key carefully.
The problem at the root of Key Wrapping is how an information system should store its sensitive data at rest (i.e. on disk; in a filesystem or in a database, etc.) while ensuring Confidentiality, Integrity and Availability (the CIA triad). This is different from other common problem domains of encryption such as transmission and identity (PKI, PKCS, signing, etc.) and as such different concepts apply here.
The ‘wrapping’ part refers to the fact that we want to use two types of keys to protect our data. Specifically when talking about symmetric data encryption we’ll want a data encryption key to protect the data and we’ll also want a key encryption key to protect the data encryption key. For this document I’ll use the terminology DEK (data encryption key) and KEK (key encryption key) as per the terminology accepted by NIST.
KEK, MEK, DEK? What the feck?
It’s worth treading carefully in this space and making sure wires don’t get crossed when talking about the different keys. For instance, Microsoft frequently uses the DEK terminology for the data encryption key, but uses the term Master Key in its DPAPI and SQL TDE models to refer to the KEK, while AWS KMS uses the term Customer Master Key for the KEK. Where this gets confusing is when you go back to standards such as NIST that use the term Master Key for something quite different, so it is worth always being aware of your frame of reference when researching in this space. Notably, Google Cloud KMS also uses the NIST style DEK/KEK terminology.
Why bother wrapping?
We need a DEK to encrypt our data, that is inescapable. Furthermore application design best practices dictate that it is worth keeping the DEK close to our data so that a) we can encrypt and decrypt our data without sending the sensitive data outside of our sovereignty (ideally without sending it beyond our application scope), and b) so that we can encrypt and decrypt our data without external dependencies and without the cost of network overheads (resilience, performance). But if we simply keep the unprotected DEK next to the data it protects then anyone who gets the data will be able to decrypt it.
This is where key wrapping comes in. By encrypting the DEK at rest we can keep the DEK close to its subjects and keep it secure and so we use a KEK to protect the DEK. To ensure then that we don’t have the same issue with an unprotected KEK we turn to a tamper proof and standards compliant key management tool such as a Hardware Security Module or a Key Management Service such as AWS KMS.
Your application should never see the KEK and so all of that key management and all of the complexity that comes with it is outsourced to standards compliant (PCI, FIPS, ISO) suppliers. Instead, our application requests a DEK from the KMS or HSM, which returns the DEK in both encrypted and unencrypted form. We store the encrypted form and use the unencrypted form in a transient process (I’ll cover in-memory DEK protection in a future post), disposing of it when we’re done encrypting. We then call the KMS to decrypt the DEK again at a later time when we need to decrypt the data. In short, key wrapping enables us to decouple key management responsibilities from our application’s data encryption requirements.
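To make that flow concrete, here is a minimal sketch of it. Everything in it is an assumption for illustration: `FakeKms`, `generate_data_key`, `toy_encrypt` and `toy_decrypt` are hypothetical names, the KMS is just an in-memory object, and the cipher is a stand-in built from HMAC-SHA256 purely so the example runs on the Python standard library. In a real system you would use AES (for example AES-GCM via a vetted library or CNG) and a real HSM/KMS.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode. A stand-in for a real cipher such as AES."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Stand-in authenticated encryption (encrypt-then-MAC). NOT for production use."""
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    body = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Verify the tag first, then decrypt. Raises ValueError on tampering."""
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed integrity check")
    stream = _keystream(enc_key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, stream))


class FakeKms:
    """Hypothetical in-memory stand-in for an HSM/KMS. The KEK never leaves it."""

    def __init__(self) -> None:
        self._kek = secrets.token_bytes(32)

    def generate_data_key(self) -> Tuple[bytes, bytes]:
        """Return a fresh DEK in both plaintext and wrapped (encrypted) form."""
        dek = secrets.token_bytes(32)
        return dek, toy_encrypt(self._kek, dek)

    def decrypt(self, wrapped_dek: bytes) -> bytes:
        """Unwrap a previously issued DEK."""
        return toy_decrypt(self._kek, wrapped_dek)


kms = FakeKms()

# Encrypt: use the plaintext DEK transiently, persist only the wrapped DEK.
plaintext_dek, wrapped_dek = kms.generate_data_key()
stored_row = {
    "wrapped_dek": wrapped_dek,
    "ciphertext": toy_encrypt(plaintext_dek, b"4111 1111 1111 1111"),  # test card number
}
del plaintext_dek  # drops our reference only; real code needs proper in-memory hygiene

# Decrypt later: ask the KMS to unwrap the DEK, then decrypt locally.
dek = kms.decrypt(stored_row["wrapped_dek"])
recovered = toy_decrypt(dek, stored_row["ciphertext"])
```

The point to notice is that the application only ever stores `wrapped_dek` and the ciphertext, and the only network call on the decrypt path is the one that unwraps the DEK.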
For further reading on this I’ll point you to the documentation for AWS KMS as this explains the concepts perfectly clearly. And don’t forget, AWS uses the term Customer Master Key – or CMK – to refer to the KEK!
The final concept that your solution will need to consider is key rotation. ‘Key Rotation’ refers to the process of continually changing your encryption keys. This is a process that should be factored into the design of your solution and for the most part this should be completely automated and securely out of reach of human eyes. There should however also be provisions for manual intervention in response to security incidents.
Before we complete the discussion on key rotation we must first cover the inescapably esoteric concept of Cryptographic Periods (or Cryptoperiods). A cryptoperiod is the amount of time that an encryption key should ‘live’. It is not enough to have an encryption key and keep it safe. A key won’t last forever. At some point it will become too weak or compromised to serve its purpose. This could be due to anything from the risk of someone discovering the key to the fact that computers will eventually become powerful enough to break the key’s protection. Cryptoperiods are there to manage the risk of compromised encryption. There are a number of key points to be aware of when dealing with cryptoperiods.
First of all, the timespan is usually calculated starting not from days or hours but from a number of cryptographic operations. So if you want to know how long a key should live in terms of elapsed time then you should work out how many encryptions it can be used for (i.e. how many rows of data the key can be used to encrypt) and extrapolate from there.
The calculation for a cryptoperiod must account for a number of factors including the key type, the sensitivity of the data, the amount of time that the data originator requires access to the data, the amount of time that the data recipient requires access to the data as well as environmental factors from the operating environment (how secure is the server, operating system, application?) right up to staff turnover. As a rough guide, for a symmetric data encryption key protecting hundreds of records you could theoretically keep the data encryption key for as long as 3 years. At higher volumes of data you could be getting down to weeks.
I wish I could give even just an example calculation here but as far as I can tell this is an intentionally arbitrary concept used by standards such as PCI, FIPS and NIST to force a thought process and internal discussions. There are rough guidelines – such as the aforementioned weeks-to-years for data encryption keys – and as long as you adhere to these and show your working out then you should be OK.
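The one part that can be made mechanical is the extrapolation from an operation budget to elapsed time. The sketch below shows only that arithmetic. The function name, the budget of one million encryptions and the one-year cap are all made-up assumptions, not figures from any standard; your own numbers have to come out of the risk discussion described above.

```python
from datetime import timedelta


def cryptoperiod_from_operations(
    max_operations: int, operations_per_day: float, hard_cap: timedelta
) -> timedelta:
    """Elapsed time until a key uses up its operation budget, capped by policy."""
    if max_operations <= 0 or operations_per_day <= 0:
        raise ValueError("budget and rate must be positive")
    by_volume = timedelta(days=max_operations / operations_per_day)
    return min(by_volume, hard_cap)


# Illustrative numbers only: 1,000,000 encryptions per DEK and a 1-year cap.
cap = timedelta(days=365)
quiet_system = cryptoperiod_from_operations(1_000_000, 500, cap)     # volume alone would allow 2000 days
busy_system = cryptoperiod_from_operations(1_000_000, 50_000, cap)   # volume alone allows 20 days
print(quiet_system.days, busy_system.days)  # prints "365 20"
```

The low-volume system is limited by the policy cap while the high-volume system is limited by its operation budget, which is the same years-versus-weeks shape as the rough guidelines.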
On avoiding re-encryption, I have come across a number of instances where it has been suggested that you may be able to negate the need to re-encrypt historical data with new DEKs by reducing the amount of data covered by a DEK to as low as 1:1. In theory this does make sense but having discussed this with a PCI QSA it is a non-starter if PCI DSS compliance is your goal. You either re-encrypt your data every 5 years as an absolute maximum, preferably within 1-3 years, or you delete it within that time.
One thing is absolutely clear however and that is at the end of a cryptoperiod the key should be securely destroyed and any data protected with that key should be re-encrypted with a new key or the data should itself be securely destroyed, the latter being most preferable if at all possible (datensparsamkeit).
Finally, as an FYI, there is some mention of the concept on Wikipedia but it is not very helpful. If you want in-depth detail on the subject then you are best turning to NIST and the indispensable SP 800-57 publication. That is a very dry and prolonged read but a necessary one in this matter; it is even directly referenced by PCI DSS 3.2.
And so we return to key rotation…
Key Rotation reprise
Once you know how long you are going to keep your keys you can implement your key rotation policies. Generally speaking these policies will be different for your DEK and your KEK. Your KEK may only require rotation every year, while you will likely require a new DEK every ‘X’ encryptions as per your cryptoperiod calculation, with any long-term records re-encrypted under a new DEK at an interval somewhere between a few weeks and a few years. For both KEK and DEK the process is similar: you first create a new key, use that new key to re-encrypt whatever the old key protected (the data for a DEK, the stored DEKs for a KEK), then dispose of the old key. Where the processes differ of course is how and when they are triggered. For your DEK you will likely have to count the number of encryptions it is involved in and renew it when that count exceeds a threshold, while also scanning for historical records that are in need of re-encryption. Your KEK on the other hand will/should be held in an HSM or KMS, and this may or may not cycle your KEK automatically. It may be that you need to count your DEKs and request a new KEK on a threshold, or you may need to handle an event message from the HSM/KMS that notifies you when a KEK is being cycled and then update your stored (encrypted) DEK material.
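As a rough illustration of the count-and-renew trigger for a DEK, here is a minimal sketch. `DekRotator`, `ActiveDek` and the threshold of 3 are hypothetical and chosen only to keep the example small. It deliberately leaves out wrapping the new DEK with the KEK, persistence and concurrency, all of which a real implementation would need.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ActiveDek:
    key_id: int
    key: bytes = field(repr=False)  # keep key material out of logs and reprs
    encryptions: int = 0


class DekRotator:
    """Hands out the current DEK and replaces it after max_encryptions uses."""

    def __init__(self, max_encryptions: int) -> None:
        if max_encryptions < 1:
            raise ValueError("max_encryptions must be at least 1")
        self._max = max_encryptions
        self._next_id = 1
        self.retired_ids = []  # DEKs whose data is now due for re-encryption
        self.current = self._new_dek()

    def _new_dek(self) -> ActiveDek:
        dek = ActiveDek(key_id=self._next_id, key=secrets.token_bytes(32))
        self._next_id += 1
        return dek

    def dek_for_next_encryption(self) -> ActiveDek:
        """Return the DEK to use, rotating first if the budget is spent."""
        if self.current.encryptions >= self._max:
            self.retired_ids.append(self.current.key_id)
            self.current = self._new_dek()
        self.current.encryptions += 1
        return self.current


rotator = DekRotator(max_encryptions=3)
used = [rotator.dek_for_next_encryption().key_id for _ in range(7)]
print(used)                 # [1, 1, 1, 2, 2, 2, 3]
print(rotator.retired_ids)  # [1, 2]
```

Note that a retired DEK is still needed for decryption until every record it protects has been re-encrypted or deleted, which is why the sketch only records the retired IDs rather than destroying anything.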
One useful pattern to aid your future self is to store metadata about the data encryption context alongside your DEKs. Every row of data encrypted by a DEK will of course need to have a reference to that DEK so that your application knows which DEK to use for decryption. Over time the size and type of the DEK used by your application will likely change to accommodate enhancements in encryption APIs, and along with this you would expect the ciphers used to change as computing power grows. Consider what will happen if you keep your protected data for long periods of time. The longer you keep your data, the more likely you will be to have to update ciphers, such as moving from AES-128 to AES-256 (AES key sizes stop at 256 bits) or to a new algorithm altogether. To help deal with this, your application will benefit from having a record of exactly how each piece of data was encrypted. This can be stored alongside your DEK material as metadata and used by the application to make decisions about how to use the encrypted data and when to update it.
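A sketch of what that metadata might look like, and how an application could use it to decide when data needs updating, is below. The field names, the policy shape and the algorithm labels are all assumptions made up for illustration; the right set of fields depends on your own ciphers and your KMS.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class DekMetadata:
    dek_id: str        # referenced by every row this DEK encrypted
    kek_id: str        # which KEK wrapped this DEK
    algorithm: str     # how the data was encrypted, e.g. "AES-256-GCM"
    key_bits: int
    created_at: datetime


@dataclass(frozen=True)
class CryptoPolicy:
    allowed_algorithms: frozenset
    min_key_bits: int
    max_age: timedelta


def reasons_to_reencrypt(meta: DekMetadata, policy: CryptoPolicy, now: datetime) -> List[str]:
    """Return every reason the data under this DEK should be re-encrypted."""
    reasons = []
    if meta.algorithm not in policy.allowed_algorithms:
        reasons.append("algorithm no longer allowed")
    if meta.key_bits < policy.min_key_bits:
        reasons.append("key too short")
    if now - meta.created_at > policy.max_age:
        reasons.append("cryptoperiod expired")
    return reasons


policy = CryptoPolicy(frozenset({"AES-256-GCM"}), min_key_bits=256, max_age=timedelta(days=365))
now = datetime(2024, 1, 1, tzinfo=timezone.utc)

old = DekMetadata("dek-0001", "kek-2019", "AES-128-CBC", 128,
                  datetime(2019, 6, 1, tzinfo=timezone.utc))
new = DekMetadata("dek-0420", "kek-2023", "AES-256-GCM", 256,
                  datetime(2023, 9, 1, tzinfo=timezone.utc))

print(reasons_to_reencrypt(old, policy, now))  # all three reasons
print(reasons_to_reencrypt(new, policy, now))  # []
```

Because the policy lives apart from the records, tightening it later (say, dropping an algorithm from the allowed set) immediately flags every DEK that no longer complies.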
Crypto means ‘Cryptography’
Just needed to take this chance to get this point in: ‘Crypto’ means ‘Cryptography’. Anyone who tries to tell you otherwise is a shyster (I think they call them influencers these days) and they’re trying to sell you something I promise you don’t need or want.