Welcome back. In this group of lessons we will explore the process of checking inputs, or validation and verification: how it works, and why it is critical to your code. Let's get started.

Welcome to module two, lesson one. Validation and verification simply mean ensuring that your program works as it's supposed to and that it handles strange things correctly. What we're going to explore is why this is necessary, how to do it, and how much you should be doing.

Next slide. So why is checking necessary? Well, people make mistakes. People are not perfect. Sometimes people simply don't understand a certain aspect of the program that they're using, and so they guess. Sometimes people really don't care. They're just running the program to complete a task and they couldn't care less about how it works or what they should give it. They just expect it to work. And sometimes it's both. A standard rule of robust programming is phrased in a derogatory way: assume maximum stupidity. But what it really is saying is assume that the user doesn't know or doesn't care about what your program expects or how it should run. They only care about the end result. And so they may have things set up in ways you don't expect. And your program has to be able to handle those things.

Now I want to make a distinction between deliberate attacks and mistakes. Attacks really are mistakes that are made quite deliberately in order to compromise the program in some way. The difference between compensating for mistakes and compensating for attacks lies in the nature of what happens. Typically, mistakes, or problems that are not deliberate, arise in the main parts of the program, the parts that most people use. There's something wrong there or they do something wrong. With attacks, attackers deliberately try to exploit the parts of the program that are the least used, because that's probably where the most errors in the code occur or the least validation occurs.
And as a result it's a good point to exploit. I've often heard the argument, "But no one would ever do that." If you read the newspapers, you know that that's not true. There are a lot of very nasty people out there, a lot of criminals out there. And while the people you're working with may not be that way (in fact, they're virtually certain not to be, otherwise you probably wouldn't be working with them), accounts can be compromised and attackers can break into systems. As a result, you still need to protect the system by writing good programs.

Finally, the program you write for site A may wind up at sites B, C, and D, with very different environments. I can speak from experience here: a very small function that I wrote when I was a first-year graduate student turned out to go into a very widely distributed program in the third year. And I didn't know about it until I was debugging something with that program. So even if you're writing something for a particular site, assume that it's going to go to places other than that.

So let's go to the next slide. The next question I asked was, where do you check? Where should you look for potential problems? The basic rule is, if you control a component, check what's coming in, check what's going out, and check the environment of the component. That way, if something goes wrong in the component, you can handle it rather than having the caller or the client handle it. That way it's under your control. And in general, handle the problem or the validation as close to the resource you're protecting as possible. You'll see some examples of this later on.

If you control two components and you control the connection between them, then really all you have to do is check the input to the first and the output from the second.
You don't have to check the input to the second or the output from the first, because you control the channel, and so you can make sure that what's sent from the first is received at the second. In most cases you don't control that channel, though. So even if you have two components that are linked together, it's always a good idea to check the inputs and outputs of both. Also, you may control the component now, but someone later on may come along and have to add a special case or need to debug something, in which case they may not understand the controls and assumptions you've made about the output going to the input. And so you need to be careful of that. So in general it's a good idea to check everything again.

And the goal of this checking is to make sure that whatever happens or whatever exists won't affect the results of the program in a harmful way, okay? Figure out what you want from the components, figure out what dependencies the component has, and check the dependencies to be sure that they are what you think, that the values you're getting from the dependencies are reasonable. Check the inputs, because you are not sure what you're getting there. And then validate that the outputs you're getting are reasonable and correct.

Now there are times where you can't check certain actions because of the nature of what you're doing. In that case, check to see what happens when those actions occur. And if what happens is not what you want, if it's not within the realm of acceptability, then undo it or block it or give an error return. And the rule of fail-safe defaults applies here. Don't look for bad things, look for good things. And the reason for this is that sometimes what's bad changes. I'll give you a classic example of this in a little bit.

Now, the next slide talks a little bit about what can cause problems. Obviously, if there's a problem, you're handling the data or the control flow inappropriately. We're focusing on the data here.
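Since the focus is on data, here is one way the rule just described, look for good things rather than bad things, might be sketched in C. This is only an illustration, not something from the slides: the function name `is_valid_name`, the length limit, and the allowed character set are all assumptions made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit and character set, chosen only for illustration. */
#define MAX_NAME_LEN 32
static const char ALLOWED[] = "abcdefghijklmnopqrstuvwxyz0123456789_";

/* Accept the name only if everything about it is known to be good.
 * Anything not explicitly allowed is rejected by default. */
bool is_valid_name(const char *name)
{
    if (name == NULL)
        return false;

    size_t len = strlen(name);
    if (len == 0 || len > MAX_NAME_LEN)
        return false;

    /* strspn returns the length of the leading run of characters that
     * all appear in ALLOWED; the name is good only if that run covers
     * the entire string. */
    return strspn(name, ALLOWED) == len;
}
```

With these assumptions, a call such as `is_valid_name("alice_01")` is accepted and `is_valid_name("alice; rm -rf /")` is rejected. The check never lists the bad characters, so if the definition of "bad" changes later, the check does not have to change with it.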
And if something goes wrong, it typically means that there was insufficient checking. Or you looked through the data, and the data did not contain anything that was bad. The problem here is that what's defined as bad may change. So what you should be doing is looking through the data to be sure that it is good.

Also, sometimes checking is not complete or consistent. One example is an inconsistency in the parameters of a call, where what the caller passes is not what the function expects. The canonical example of this is when a function calls a square root routine and passes in an integer. The square root routine, of course, expects a double. So in that case, unless you have the prototype declared, there's no conversion. And what you get out is complete garbage.

The network also poses very interesting problems, because it's sometimes very hard to validate things. For example, when you have a source address on a connection, how do you know that source address is the actual address and that it's not someone spoofing the packet? Typically, you don't. That's why it's important to do some sort of authentication if the source is important.
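Going back to the square root example, here is a minimal sketch of what declaring the prototype buys you. To keep it self-contained it uses a stand-in routine, `my_sqrt`, rather than the library's square root, and a hypothetical wrapper, `checked_sqrt`, that also validates the input; both names are assumptions made for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Prototype: tells the compiler that my_sqrt takes a double, so an int
 * argument is converted to double at the call site. Without a prototype
 * in scope, older C compilers passed the int unconverted and the routine
 * misread it as a double. */
double my_sqrt(double x);

/* Hypothetical wrapper: check the input, call the routine, and report
 * success or failure instead of returning garbage. */
bool checked_sqrt(int n, double *result)
{
    if (result == NULL || n < 0)
        return false;        /* no real square root for negative input */

    *result = my_sqrt(n);    /* int n is converted to double here */
    return true;
}

/* Stand-in square root using Newton's iteration (assumes x >= 0). */
double my_sqrt(double x)
{
    if (x == 0.0)
        return 0.0;

    double guess = (x > 1.0) ? x : 1.0;
    for (int i = 0; i < 64; i++)
        guess = 0.5 * (guess + x / guess);
    return guess;
}
```

The caller passes an integer, the prototype makes the conversion to double happen, and the wrapper checks what comes in before the routine ever sees it. Current C compilers generally warn about or reject a call to an undeclared function, but the habit of keeping the declaration and the call consistent still applies.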