Here's an example of a function with a buffer
overflow that can affect things. Its goal is to take two strings, merge them, or more precisely
concatenate them, and then give the result to
a function called doTask, which will do something
with it. Presumably the strings you're putting
in are, for example, a command and an argument, or
something like that. Now, the buffer in which the concatenation is stored
is called buffer, and it's 256 characters long. You're also
passing in the lengths of the character strings
in the buffers. Technically, by the way, this is bad: you should compute those lengths rather than take them as parameters. But we'll go with this
because it's easier. The first thing
that happens is that it adds the two
lengths together and says, hey, if the sum is greater than or
equal to 256, I'm done. In that scenario, you are going
to cause an overflow, and that's why it
returns minus one. Otherwise, what it
does is use the function memcpy to copy the first argument
into the buffer and then the second argument where the first one left off. And so now, you've
got a nice task. So you give
that buffer to doTask. What happens here if I call
this with len1 being, say, 250 and len2 being minus five? The sum is 245, so
all of a sudden, I've bypassed the check. But memcpy's length argument is unsigned, so that minus five is
going to be treated as a very large positive number. Now I'm doing a huge copy, and I've caused
a buffer overflow, as you can see right here. Now I should point out that this may or may not be
a security problem, because if the program you're trying to break into
is one that you're running, it only has your rights. So you're not going to get anything out of it on
this particular system. However, it certainly is a robustness
issue, because it will almost certainly cause
your program to crash. So we should avoid
these in any case. Now here are a couple
more bugs. These are a little
bit more subtle. This first one is used in
RPC, remote procedure call, which is a way to execute functions or
procedures on a remote system. If you look at this one, it takes in some information in the parameter list
and sets up a bunch of functions, then computes
a variable called node size, which is the product of two of
the variables that are stored internally, and then passes that to malloc to
allocate that much space. The problem, of course, is what happens if node size is too big? Well, if it wraps around, it's
going to become negative or much too small. So how do we make node size fall within
the acceptable ranges? The answer is you use checks. There's no check here that node size will not
overflow. Why might it? Well, let's say c, which is computed
earlier (in fact, it's from a parameter
that's passed in), is two to the 31st, and we're dealing with
a 32-bit machine. el size is the size
of an element, which in this case would probably be four. So when I compute node size now, I get two to the 31st
times four, which is two to the 33rd. That doesn't fit in 32 bits, so the result wraps around to zero because of
the integer overflow. Now it turns out the code
I just showed you is used in a lot of
privileged servers, including a version of Kerberos. And there are a number of
other things you can do as well to mess it up
involving overflows, such as making the multiplication for node size
overflow, and so forth. So the impact here is that you
can do a number of things. You may be able to execute arbitrary code on
the remote system, or you may be able simply
to deny service, which is typically
very effective. And it turns out that in
the authentication program Kerberos, the key
distribution mechanism, the key distribution
center, keeps its information in memory because it should
be able to respond quickly. As a result, you have problems. Here's another example
of sign punning. This is a routine simply to get the name
of the peer that's on the other end of
my network connection. And as you can see here,
it's fairly straightforward: it copies in the data
that you're getting and checks to see whether or
not there's an error. If there isn't, then it
updates everything and returns the error associated
with that particular buffer. In this case, it would be none. So what's the problem? Well, if you go back and look at the slide, what happens if, for
example, len is negative? If I can set len to a negative value, the result of MIN will be negative, and now, when you do the copy out
to write things out, that's going to be interpreted
as an unsigned number, so you write out a whole lot more
than you would expect. This, by the way, was actual
code in a specific kernel, and you could copy up to four gigabytes of
kernel memory into user space. This includes passwords
and things like that, things you really don't
want people to know. So what are the rules
of thumb here? Check for potential
overflow before you use
the value in an operation. By the way, don't say if
x is greater than MAXINT. If x would have been
greater than MAXINT, it's already been wrapped around to a more acceptable value, so that test can never succeed. What you should do instead, if you want to know
whether or not operating on a number
will cause overflow, is divide the value that
you're worried about into MAXINT and compare the result with the other operand. For a multiplication of positive numbers, if one is greater than MAXINT divided by the other, the product will overflow. And there's an exercise
later on that has you work through this. The other thing you should do is check your library functions. Make sure that you
have prototypes. The reason for this is that the prototypes will
do coercion for you. But also, be sure that
you're clear on when something is to be signed and
when it's to be unsigned. Check for signed values and unsigned values
in your libraries. The reason is that if you
don't have a prototype, which will do coercion, then when you pass something
in that's signed and the function interprets it as unsigned,
you get a mess. You get a huge value, as I showed you earlier. On the other hand, if both
are signed, then you're fine. So what you should do is look at the library functions and the system calls and such that you're using, and make sure
you understand exactly what they're doing
and how they're doing it. The other thing you can do is use the defined types
whenever possible. In some languages, for example, you can restrict an Int to
be positive. That's fine. Just know the
underlying base types, because that will help you determine whether
or not the move is appropriate and, if
not, when it would be. You know, in a way,
this is a lot like the taint and untaint work
that's being done. The whole idea of taint is that when you read from
an untrusted source, you mark the input as tainted. And as it propagates
through the system, everything that it comes in
contact with also gets marked as tainted. And so by looking at how taint flows occur
within the program, we can get a good idea of how data flows around the system. The way to avoid all of this, of course, is to just
check your input. When numbers are expected, check them for size, and do it digit by digit, and don't forget about the sign. You should do this for any conversion of
a string to a number. And scanf and sscanf, while tempting, don't do
any overflow checking. So you want to avoid them.
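
To make those last rules of thumb concrete, here is a minimal sketch in C, under some assumptions. The helper names parse_nonneg_int and checked_mul are hypothetical and are not from the slides. The first helper uses the standard strtol function rather than a hand-written digit-by-digit loop, then checks the sign and the size. The second divides into INT_MAX (the standard C name for what was called MAXINT earlier) before it multiplies, and it only handles non-negative operands.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: convert text to a non-negative int.
 * Returns 0 and stores the value in *out on success, -1 on bad input. */
int parse_nonneg_int(const char *text, int *out)
{
    char *end;
    long value;

    assert(text != NULL && out != NULL);

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')   /* no digits at all, or trailing junk */
        return -1;
    if (errno == ERANGE)               /* value did not even fit in a long */
        return -1;
    if (value < 0 || value > INT_MAX)  /* wrong sign, or too big for an int */
        return -1;

    *out = (int)value;
    return 0;
}

/* Hypothetical helper: multiply two non-negative ints without overflowing.
 * The check divides into INT_MAX BEFORE multiplying; testing the product
 * afterwards would be too late, because it would already have wrapped.
 * Returns 0 and stores the product in *out on success, -1 otherwise. */
int checked_mul(int a, int b, int *out)
{
    assert(out != NULL);

    if (a < 0 || b < 0)                /* this sketch only handles sizes */
        return -1;
    if (b != 0 && a > INT_MAX / b)     /* a * b would exceed INT_MAX */
        return -1;

    *out = a * b;
    return 0;
}
```

With these, a call such as parse_nonneg_int("-5", &n) returns -1 instead of handing a negative length to the rest of the program, and checked_mul(INT_MAX / 2 + 1, 4, &n) returns -1 instead of letting the product wrap around.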