Here's an example of a function with a buffer overflow. Its goal is to take two strings, concatenate them, and then give the result to a function called doTask, which will do something with it; presumably the strings you're putting in are, for example, a command and an argument, or something like that. The buffer in which the concatenation is stored is called buffer, and it's 256 characters long. You're also passing in the lengths of the two character strings. Technically, by the way, this is bad: you should compute the lengths yourself rather than trust the caller's values. But we'll go with this because it's easier. The first thing that happens is that the function adds the two lengths together and says, hey, if the sum is greater than or equal to 256, I'm done. In that scenario, you are going to cause an overflow, and that's why it returns minus one. Otherwise, it uses memcpy to copy the first string into the buffer and then copies the second string in where the first one left off. So now you've got a nice task, and you give that buffer to doTask. What happens here if I call this with len1 being, say, 250 and len2 being minus five? The sum is 245, so the check passes. But memcpy treats its length argument as unsigned, so that minus five is going to be treated as a very large positive number. All of a sudden, I've bypassed the check, I'm doing a huge copy, and I've caused a buffer overflow. Now I should point out that this may or may not be a security problem, because if the program you're trying to break into runs with only your own rights, you're not going to get anything out of it on this particular system. However, it is certainly a robustness issue, because it will almost certainly cause your program to crash. So we should avoid these in any case.

Now here are a couple more bugs. These are a little more subtle. The first one is used in RPC, remote procedure call, which is a way to execute functions or procedures on a remote system.
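To make the concatenation example concrete, here is a minimal C sketch of it. The names concat_unsafe and concat_checked, and the do-nothing doTask stub, are hypothetical stand-ins for the slide's code, and the lengths are assumed to be plain ints, as the discussion implies. The unsafe version shows how len1 = 250 and len2 = -5 slip past the sum check; the checked version rejects negative lengths first and then compares against the remaining space without ever adding the two lengths.

```c
#include <string.h>

#define BUFSIZE 256

/* Hypothetical placeholder for the lecture's doTask(); it ignores its input. */
static void doTask(const char *buf)
{
    (void)buf;
}

/* Flawed version, following the lecture's description. The sum check is
   done in signed int, but memcpy takes a size_t, so len2 = -5 turns into
   a huge unsigned length. Shown only to illustrate the bug. */
int concat_unsafe(const char *s1, int len1, const char *s2, int len2)
{
    char buffer[BUFSIZE];

    if (len1 + len2 >= BUFSIZE)      /* 250 + (-5) = 245, so this passes */
        return -1;
    memcpy(buffer, s1, len1);
    memcpy(buffer + len1, s2, len2); /* -5 converts to SIZE_MAX - 4 */
    buffer[len1 + len2] = '\0';
    doTask(buffer);
    return 0;
}

/* Checked version: reject negative lengths, then test the remaining space
   without computing len1 + len2, so the check itself cannot overflow.
   One byte is kept for the terminating NUL. */
int concat_checked(const char *s1, int len1, const char *s2, int len2)
{
    char buffer[BUFSIZE];

    if (len1 < 0 || len2 < 0)
        return -1;
    if (len1 > BUFSIZE - 1 || len2 > BUFSIZE - 1 - len1)
        return -1;
    memcpy(buffer, s1, (size_t)len1);
    memcpy(buffer + len1, s2, (size_t)len2);
    buffer[len1 + len2] = '\0';
    doTask(buffer);
    return 0;
}
```

With these definitions, concat_checked("ab", 2, "cd", 2) returns 0, while a call with len1 = 250 and len2 = -5 returns -1 before any copying happens. concat_unsafe is there only to show the flaw and should never see untrusted lengths. Computing the lengths inside the function with strlen() would avoid trusting the caller's values at all.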
If you look at the RPC routine, it takes in some information in the parameter list, sets up a bunch of variables, and then computes a variable called nodesize, which is the product of two of the variables stored internally, and passes that to malloc to allocate that much space. The problem, of course, is what happens if nodesize is too big? If the multiplication wraps around, the result is much smaller than intended, or negative if it's treated as signed. So how do we make sure nodesize falls within an acceptable range? The answer is that you use checks, and there's no check here that nodesize will not overflow. Why might it? Well, let's say c, which is computed earlier from a parameter that's passed in, is two to the 31st, and we're dealing with a 32-bit machine. elsize is the size of an element, which in this case would probably be four. So when I compute nodesize, I get two to the 31st times four, which is two to the 33rd, and on a 32-bit machine that wraps around to zero because of the integer overflow. Now it turns out the code I just showed you is used in a lot of privileged servers, including a version of Kerberos. And once the multiplication for nodesize overflows, there are a number of other things you can do to exploit it, since the code then works with a buffer far smaller than it expects. So the impact here is that you can do a number of things. You may be able to execute arbitrary code on the remote system, or you may simply be able to deny service to it, which is typically very effective. In the Kerberos authentication system, for example, the key distribution center keeps its data in memory so that it can respond quickly, so an attack like this causes serious problems.

Here's another example of sign punning. This is the routine that gets the name of the peer on the other end of my network connection. As you can see, it's fairly straightforward: it copies in the data that you're getting and checks to see whether or not there's an error.
If there isn't, it updates everything and returns the error associated with that particular buffer, which in this case would be none. So what's the problem? If you look at the slide, what happens if, for example, len is negative? If I can set len to a negative value, the MIN will be negative, and when you do the copyout to write things out, that value is going to be interpreted as an unsigned number, so you write out a whole lot more than you would expect. This, by the way, was actual code in a real kernel, and you could copy up to four gigabytes of kernel memory into user space. That memory includes passwords and other things you really don't want people to know.

So what are the rules of thumb here? Check for potential overflow before you use a value in an operation. By the way, don't test whether x is greater than MAXINT after the operation. If the operation overflowed, x has already wrapped around to a value that looks acceptable, so that test can never catch it. What you should do instead, if you want to know whether operating on a number will cause overflow, is divide MAXINT by the value you're worried about and see what the result tells you; for a multiplication, divide MAXINT by one operand and compare the result with the other. There's an exercise later on that has you work through this. The other thing you should do is check your library functions. Make sure that you have prototypes, because the prototypes will do coercion for you. But also, be sure you're clear on when something is supposed to be signed and when it's supposed to be unsigned, and check which values in your libraries are signed and which are unsigned. The reason is that if you don't have a prototype to do the coercion, and you pass in something that's signed and the function interprets it as unsigned, you get a mess: a huge value, as I showed you earlier. On the other hand, if both are signed, then you're fine.
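The rules of thumb above can be sketched in C against the two earlier bugs. This is a minimal sketch under stated assumptions: mul_overflows, alloc_array, and safe_copy_len are hypothetical names, not the actual RPC or kernel code; c and elsize follow the RPC example, and nodesize is assumed to be an unsigned int, so UINT_MAX plays the role of MAXINT. The multiplication check divides before it multiplies, so it never performs the overflowing operation, and the length helper rejects a negative len before a MIN-style clamp can pass it on to an unsigned size.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* True if c * elsize would exceed UINT_MAX. UINT_MAX / elsize is the
   largest count that can be multiplied by elsize without wrapping, so we
   compare against it instead of doing the multiplication first. */
static bool mul_overflows(unsigned int c, unsigned int elsize)
{
    return elsize != 0 && c > UINT_MAX / elsize;
}

/* Allocation step modeled on the RPC example (hypothetical name).
   Returns NULL if c * elsize would overflow or if malloc fails. */
void *alloc_array(unsigned int c, unsigned int elsize)
{
    /* Testing "nodesize > UINT_MAX" after multiplying cannot work: an
       unsigned int never exceeds UINT_MAX, and on a 32-bit machine
       2^31 * 4 has already wrapped to 0. Check first. */
    if (mul_overflows(c, elsize))
        return NULL;
    unsigned int nodesize = c * elsize;
    return malloc(nodesize);
}

/* Sketch of the getpeername fix (hypothetical helper): refuse a negative
   caller-supplied len before taking the minimum, so it can never reach an
   unsigned length such as copyout's. Returns the number of bytes to copy,
   or -1 for invalid input. */
int safe_copy_len(int len, int available)
{
    if (len < 0 || available < 0)
        return -1;
    return len < available ? len : available;
}
```

For example, alloc_array(UINT_MAX / 4 + 1, 4) returns NULL instead of allocating a tiny buffer, and safe_copy_len(-5, 16) returns -1 rather than a length that would become huge once converted to unsigned.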
So what you should do is look at the library functions and system calls you're using and make sure you understand exactly what they're doing and how they're doing it. The other thing you can do is use defined types whenever possible. In some languages, for example, you can restrict an integer type to be positive. That's fine; just know the underlying base types, because that will help you determine whether or not a conversion is appropriate, and if not, when it would be.

In a way, this is a lot like the taint and untaint work that's being done. The whole idea of taint is that when you read from an untrusted source, you mark the input as tainted. As it propagates through the system, everything it comes in contact with also gets marked as tainted. So by looking at how taint flows within the program, we can get a good idea of how data flows around the system. The way to avoid all of this, of course, is to just check your input. When numbers are expected, check them for size, do it digit by digit, and don't forget about the sign. You should do this for any conversion of a string to a number. And scanf and sscanf, while tempting, don't do any overflow checking, so you want to avoid them.
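Checking a number digit by digit might look like the following minimal sketch. parse_int is a hypothetical helper, not a library function; it accepts an optional sign followed only by decimal digits, and before each step it checks that mag * 10 + d cannot exceed the limit, using the same divide-first idea as the MAXINT rule of thumb. The sign is handled explicitly because INT_MIN has one more unit of magnitude than INT_MAX. The standard library's strtol(), which reports overflow by setting errno to ERANGE, is another option; sscanf with %d gives no such guarantee.

```c
#include <limits.h>

/* Parse a decimal int from s, checking every digit for overflow and
   handling the sign explicitly. Returns 0 and stores the value in *out on
   success; returns -1 (leaving *out untouched) on empty input, stray
   characters, or a value outside [INT_MIN, INT_MAX]. */
int parse_int(const char *s, int *out)
{
    int negative = 0;
    unsigned long long mag = 0;

    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        s++;
    }
    if (*s == '\0')
        return -1;                      /* no digits at all */

    /* Largest magnitude allowed for this sign. */
    const unsigned long long limit =
        negative ? (unsigned long long)INT_MAX + 1 : (unsigned long long)INT_MAX;

    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return -1;                  /* not a digit */
        unsigned int d = (unsigned int)(*s - '0');
        /* Check BEFORE updating: mag * 10 + d <= limit exactly when
           mag <= (limit - d) / 10. */
        if (mag > (limit - d) / 10)
            return -1;                  /* would overflow */
        mag = mag * 10 + d;
    }

    if (negative)
        *out = (mag == (unsigned long long)INT_MAX + 1) ? INT_MIN : -(int)mag;
    else
        *out = (int)mag;
    return 0;
}
```

A caller would write something like `if (parse_int(input, &value) != 0) { /* reject the input */ }`, treating any nonzero return as bad input rather than trying to salvage a partial value.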