Hi there. In the last lesson, we looked at asymmetric cryptography and public and private keys. For this lesson, we're going to move on to hash functions and cryptographic hash functions. After this lesson you will be able to define hash functions and explain the properties of cryptographic hash functions. So what is a hash function? A hash function maps data to other data: it takes an arbitrarily long piece of data and maps it to data of a fixed length. Let me show you what I mean with an example. If you take the string of bits that represents the string "Hello world" and you put it as input into a hash function, out comes a piece of data with a fixed length, in this case 16 bytes. Or take another string, for example "qwyjibo". You put the bits representing that word into the same hash function, and the resulting output has the same length as what we saw in the "Hello world" case, but it's a totally different set of bits. Now, hash functions by themselves don't guarantee that the output is unique: two different inputs can produce the same hash. So you need extra constraints, and that's where cryptographic hash functions come in. Cryptographic hash functions have the following properties. The first property is that the same message always results in the same hash. The next property is that you can't get the message back from a hash unless you try all possible messages. The third property is that two different messages should not result in the same hash value. In other words, no hash collisions. Because the output has a fixed length, collisions must exist, but it should be infeasible in practice to find one.
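Before going on to the remaining properties, here is a minimal sketch of what we have seen so far, using SHA-256 from Python's standard `hashlib` module. Note that SHA-256 produces 32 bytes rather than the 16 bytes in the example above, and that the helper name `sha256_hex` is just something I made up for illustration.

```python
import hashlib


def sha256_hex(message: bytes) -> str:
    """Return the SHA-256 digest of `message` as a hex string (hypothetical helper)."""
    return hashlib.sha256(message).hexdigest()


hello_hash = sha256_hex(b"Hello world")
qwyjibo_hash = sha256_hex(b"qwyjibo")
hello_again = sha256_hex(b"Hello world")

# Fixed length: inputs of different lengths give 32 bytes, i.e. 64 hex characters.
print(len(hello_hash), len(qwyjibo_hash))

# Same message, same hash.
print(hello_hash == hello_again)

# Different messages give different hashes here.
print(hello_hash != qwyjibo_hash)
```

The two inputs have different lengths, but both digests have the same length, which is the fixed-size mapping described above.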
Another property is that it should be relatively fast to compute. The last property is that if you generate a hash from a message, then modify that message even just slightly and generate another hash from the modified message, you should not see any relationship between the old and the new hash. An example of a cryptographic hash function is SHA-256, and you can get a little bit more detail if you read the book by Ferguson, Schneier and Kohno. So we just looked at hash functions and cryptographic hash functions. Next we're going to chat about message authentication codes. A message authentication code, or MAC, is a tag, effectively a piece of data that always has the same length in bits. It's used to determine whether a message actually came from the sender that we expected. Now you need to do something special here by using message numbers to ensure proper ordering of messages, but that is a detail that I won't get into any further right now. The MAC also lets you determine whether the message has been tampered with. An example of an algorithm like this is HMAC. HMAC uses a cryptographic hash function such as SHA-256 as a building block, and that's why we were talking about cryptographic hash functions earlier. So the way this works is that the sender has some plain text message and a MAC algorithm. They put this plain text message as input into the MAC algorithm, and they also use a private key as input, meaning a secret key shared only with the receiver. The resulting output is a fixed size MAC value. It's always the same length in bits, and for practical purposes different messages give different MAC values. Then the sender takes the plain text message and concatenates it with that fixed size MAC value. They take that concatenated piece of data and send it as input into an encryption algorithm, together with a private key for this encryption operation, and out comes the resulting cipher text. This resulting cipher text is what gets sent to the receiver.
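The MAC step on the sender's side can be sketched with Python's standard `hmac` and `hashlib` modules. This is only a sketch under some assumptions: the names `make_tag` and `sender_payload` and the demo key are made up for illustration, and the encryption step is left out because the standard library has no encryption algorithm to put there.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # HMAC-SHA-256 tags are always 32 bytes


def make_tag(mac_key: bytes, message: bytes) -> bytes:
    """Compute the fixed size MAC value for `message` (hypothetical helper)."""
    return hmac.new(mac_key, message, hashlib.sha256).digest()


def sender_payload(mac_key: bytes, message: bytes) -> bytes:
    """Concatenate the plain text message with its MAC value.

    In the scheme described above, this payload would then be encrypted
    before it is sent. That step is omitted in this sketch.
    """
    return message + make_tag(mac_key, message)


# Demo key for illustration only; a real key would be random and secret.
mac_key = b"demo-mac-key-not-for-real-use"
message = b"transfer 10 coins"
payload = sender_payload(mac_key, message)

print(len(make_tag(mac_key, message)))  # tag length, same for any message
print(len(payload) - len(message))      # the payload grew by exactly one tag
```

Both printed numbers are the tag length: the MAC value has the same size no matter how long the message is.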
So I want to make a note that the private key for the MAC operation and the private key for the encryption operation are two separate keys, and both are derived using an algorithm from a single secure channel key. Now I'm going to add an additional mention here that this ordering, authenticate then encrypt, which is what we basically talked about here, is discussed in Ferguson, Schneier and Kohno's book. When they talk about this ordering of authenticate then encrypt, they also talk about the other way, which is to order the operations by first encrypting and then authenticating. There are several arguments for and against each of these orderings. In this discussion, we will follow the book's lead and talk about authenticate then encrypt, so authenticating first and encrypting second. They mention that its benefit is that it is simple and that it has enough security under their so-called practical paranoia model, which means it is enough for practical purposes. Now, on the receiver side, they take the cipher text that they've received and decrypt it using the same private key that the sender used for encryption, and they get as the output the plain text plus the fixed size MAC value. They take the plain text and this fixed size MAC value and put them as input into a MAC verification algorithm, and they also give the MAC verification algorithm the private key for the MAC. The resulting output of that operation is a yes-or-no answer to the question, "Was it tampered with?" If yes, then discard the plain text message. If not, it's okay to use the plain text, and now we can do further processing with it. So in summary, we know that the message that the receiver has received has not been tampered with and, most likely, that it was sent by the sender.
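Here is a rough sketch of the key derivation and of the receiver's verification step, using HMAC-SHA-256 from Python's standard library. The derivation shown, HMAC over a label string, is my own assumption for illustration and not necessarily the algorithm the book uses. The names `derive_key` and `verify_payload` are hypothetical, and decryption is left out, so the payload here is the already decrypted plain text plus MAC value.

```python
import hashlib
import hmac

TAG_LEN = hashlib.sha256().digest_size  # HMAC-SHA-256 tags are 32 bytes


def derive_key(channel_key: bytes, label: bytes) -> bytes:
    """Derive a purpose-specific key from the channel key (assumed scheme)."""
    return hmac.new(channel_key, label, hashlib.sha256).digest()


def verify_payload(mac_key: bytes, payload: bytes):
    """Return the plain text if the MAC value checks out, otherwise None."""
    if len(payload) < TAG_LEN:
        return None
    message, tag = payload[:-TAG_LEN], payload[-TAG_LEN:]
    expected = hmac.new(mac_key, message, hashlib.sha256).digest()
    # compare_digest compares in constant time to avoid timing leaks.
    return message if hmac.compare_digest(tag, expected) else None


# Demo channel key for illustration only.
channel_key = b"demo-channel-key-not-for-real-use"
enc_key = derive_key(channel_key, b"encryption")      # not used in this sketch
mac_key = derive_key(channel_key, b"authentication")

message = b"transfer 10 coins"
payload = message + hmac.new(mac_key, message, hashlib.sha256).digest()
tampered = b"transfer 99 coins" + payload[-TAG_LEN:]

print(verify_payload(mac_key, payload))   # b'transfer 10 coins'
print(verify_payload(mac_key, tampered))  # None
```

The tampered payload keeps the old MAC value but changes the message, so verification fails and the receiver discards it.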
So now I'm going to say a little bit more about what I hinted at earlier about the sender. You need message numbers from the sender in order to really make sure that each message came from the sender you expected, at the time you expected. The reason I bring this up is that if you don't have a message number that tags each message, somebody could sit between the sender and the receiver, eavesdrop on the messages and collect them. They don't necessarily have to decrypt the messages. What they could do is replay those messages to the receiver. This is what's called a replay attack. What you need in order to defend yourself against replay attacks in this sort of scenario is a message number, with a starting point that the sender and the receiver both agree on when they begin communicating with each other. Each message is tagged with its number, so that the receiver knows that the sender actually sent it at this point in the conversation. So, we just looked at message authentication codes, and now we can wrap up. In conclusion, we talked about the building blocks of cryptography for encryption, which is used to keep pieces of data secret. We also talked about the concepts behind cryptographic hash functions and behind message authentication codes, which are used for detecting tampering and checking authenticity. Now we know ways to mitigate spoofing, tampering, and some information leaks in our threat models. Again, if you want a little more detail on the math and theory behind what we talked about today, see the excellent text Cryptography Engineering: Design Principles and Practical Applications by Ferguson, Schneier and Kohno. Thank you.
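To make the replay defense a bit more concrete, here is a minimal sketch in Python. It assumes the message number is an 8-byte counter that is covered by an HMAC-SHA-256 tag, and that the receiver only accepts numbers higher than the last one it saw. The names `tag_message` and `Receiver`, the packet layout, and the demo key are all made up for illustration, and encryption is left out.

```python
import hashlib
import hmac
import struct

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA-256
NUM_LEN = 8                             # assumed size of the message number


def tag_message(mac_key: bytes, number: int, message: bytes) -> bytes:
    """Build number || message || MAC, where the MAC covers number and message."""
    body = struct.pack(">Q", number) + message
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


class Receiver:
    """Accepts each message number at most once, in increasing order."""

    def __init__(self, mac_key: bytes, start: int = 0):
        self.mac_key = mac_key
        self.last_seen = start  # starting point agreed with the sender

    def accept(self, packet: bytes):
        """Return the message if it is authentic and new, otherwise None."""
        if len(packet) < NUM_LEN + TAG_LEN:
            return None
        body, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
        expected = hmac.new(self.mac_key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return None  # tampered with, or made with the wrong key
        (number,) = struct.unpack(">Q", body[:NUM_LEN])
        if number <= self.last_seen:
            return None  # replayed or out-of-order message
        self.last_seen = number
        return body[NUM_LEN:]


mac_key = b"demo-mac-key-not-for-real-use"  # illustration only
receiver = Receiver(mac_key)
packet = tag_message(mac_key, 1, b"pay Bob 10 coins")

print(receiver.accept(packet))  # b'pay Bob 10 coins'
print(receiver.accept(packet))  # None, the replayed copy is rejected
```

Because the number is included in the MAC, an eavesdropper cannot simply change the number on a captured message without the verification failing.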