Welcome back. In this group of lessons we will explore
the process of checking inputs, or validation and verification: how it works,
and why it is critical to your code. Let's get started. Welcome to module two lesson one. Validation and verification are simply
ensuring that your program works as it's supposed to and
it handles strange things correctly. What we're going to explore is why
this is necessary, how to do it, and how much you should be doing. Next slide, so why is checking necessary? Well, people make mistakes. People are not perfect. Sometimes people when they're doing
things simply don't understand a certain aspect of the program that
they're using, and so they guess. Sometimes people really don't care. They're just running the program
to complete a task and they couldn't care less about how it
works or what they should give it. They just expect it to work. And sometimes it's both. A standard rule of robust
programming is phrased in a derogatory way:
"assume maximum stupidity." But what it really is saying is:
assume that the user doesn't know or doesn't care about what your program
expects or how it should run. They only care about the end result. And so they may have things set
up in ways you don't expect. And your program has to be
able to handle those things. Now I want to make a distinction between
deliberate attacks and mistakes. Attacks really are mistakes that are made
quite deliberately in order to compromise the program in some way. The difference between compensating for
mistakes and compensating for attacks lies in the nature
of what happens. Typically mistakes or
problems that are not deliberate arise in the main parts of the program,
the parts that most people use. There's something wrong there or
they do something wrong. With attacks, attackers deliberately try
to exploit the parts of the program that are the least used because that's probably
where the most errors in the code occur or the least validation occurs. And as a result it's
a good point to exploit. And I've often heard the argument,
"But no one would ever do that." If you read the newspapers,
you know that that's not true. There are a lot of very nasty people out
there, a lot of criminals out there. The people you work
with, and the people who use your programs, may not be that way. In fact, they're virtually certain not
to be; otherwise you probably wouldn't be working with them. But accounts can be compromised and attackers
can break into systems, and as a result you still need to protect the system
by writing good programs. Finally, the program you write for
site A may wind up in sites B, C, and D with very different environments. I can speak from experience here,
a very small function that I wrote when I was a graduate student
in the first year turned out to go into a very widely distributed
program in the third year. And I didn't know about it until I was
debugging something with that program. So even if you're writing something for
a particular site assume that it's going to
go to places other than that. So let's go to the next slide. The next question I asked
was where do you check? Where should you look for
potential problems? The basic rule is: if you control
a component, check what's coming in, check what's going out, and check
the environment of the component. That way, if something goes wrong in the
component, you can handle it rather than having the caller or the client handle it. That way it's under your control. And in general, do the validation, or handle the problem, as close to the resource
you're protecting as possible. And you'll see some
examples of this later on. If you control two components and
you control the connection between them, then really all you have to do is
check the input to the first and the output from the second. You don't have to check
the input to the second or the output from the first, because
you control the channel and so you can make sure that what's sent
from the first is received at the second. In most cases you don't
control that channel, though. So it's always a good idea, even if you have two components that are
linked together, to check the inputs and outputs of both. Also, you may control the component now, but someone later on may come along and
have to add a special case or need to debug something, in which case
they may not understand the controls and assumptions you've made about
the output going to the input. And so you need to be careful of that. So in general it's a good idea
to check everything again. And the goal of this checking is to
make sure that whatever happens or whatever exists won't affect the results
of the program in a harmful way, okay? Figure out what you want from the
component, figure out what dependencies the component has, and check the dependencies
to be sure that they are what you think and that the values you're getting from
the dependencies are reasonable. Check the inputs, because you are not
sure what you're getting there. And then validate that the outputs you're
getting are reasonable and correct. Now there are times where you
can't check certain actions, because of the nature
of what you're doing. In that case, check to see what
happens when those actions occur. And if what happens is not what you want, it's not within the realm
of acceptability, then undo it or block it or
give an error return. And the rule of fail-safe
defaults applies here: don't look for bad things,
look for good things. And the reason for
this is that sometimes what's bad changes. I'll give you a classic
example of this in a little bit. Now, the next slide talks a little
bit about what can cause problems. And obviously, if there's a problem,
you're handling the data or the control flow inappropriately. We're focusing on the data here. And if something goes wrong, it typically means that there
was insufficient checking, or you looked through the data and the data did not contain
anything that was bad. The problem here is that what's
defined as bad may change. So what you should be
doing is looking through the data to be sure that it is good. Also, sometimes checking is
not complete or consistent. For example, the parameters a caller passes
may not be consistent with what the called function expects. The canonical example of this is
when a function calls a square root routine and passes in an integer. The square root routine, of course, expects a double. So in that case, unless you have the
prototype declared, there's no conversion. And what you get out is complete garbage. The network also poses
very interesting problems, because it's very hard
sometimes to validate things. For example, when you have a source
address on a network connection, how do you know that the source
address is the actual address and it's not someone spoofing the packet? Typically, you don't. That's why it's important to do
some sort of authentication if the source is important.
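
As a rough illustration of two of the points above (look for good input rather than bad input, and authenticate the source when the source matters), here is a minimal Python sketch. It is not from the lecture. The shared key, the command names, and the function names `sign` and `accept_message` are all hypothetical, and the sketch assumes the sender and receiver have already agreed on a secret key by some other means.

```python
import hashlib
import hmac

# Hypothetical shared secret, for illustration only. A real system would
# load this from protected storage rather than hard-coding it.
SHARED_KEY = b"example-key-for-illustration-only"

# Fail-safe default: list what is known to be good. Anything else is refused,
# so a new kind of "bad" input never needs a new check.
ALLOWED_COMMANDS = {"status", "start", "stop"}


def sign(payload: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Return the HMAC-SHA256 tag a legitimate sender would attach."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def accept_message(claimed_source: str, payload: bytes, tag: bytes,
                   key: bytes = SHARED_KEY) -> str:
    """Return the validated command, or raise ValueError.

    claimed_source is deliberately ignored when deciding whether to trust
    the message: an address can be spoofed, a keyed tag is much harder to forge.
    """
    # Check 1: authenticate the sender using the keyed tag.
    expected = sign(payload, key)
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")

    # Check 2: the payload must decode cleanly before it is interpreted.
    try:
        command = payload.decode("ascii")
    except UnicodeDecodeError:
        raise ValueError("payload is not ASCII") from None

    # Check 3: accept only commands that are on the known-good list.
    if command not in ALLOWED_COMMANDS:
        raise ValueError("command not allowed")
    return command
```

In this sketch the claimed source address plays no part in the decision; only the keyed tag does. A message that authenticates correctly is still rejected unless its content is on the known-good list.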