Welcome to video one, the second video of section 4.2. Here we're going to talk
about metacharacters. A metacharacter is simply something that is
interpreted specially. The dollar sign, for example,
indicates a variable to the command interpreter
on Linux and Unix systems. As you can see,
there are a number of other special characters. Note especially
the backslash, which is used to remove the special meaning of, or
demetacharacterize, I guess, the next character. So if you have
backslash dollar sign, the shell will always interpret
that as a literal dollar sign. Now, the ones at the top
are respected by all shells. But some shells do
things a little bit differently or have
extra characters. For example, in C Shell, the exclamation point
typically means history. In some shells, you can use printf-style functions to have the shell do certain things, for example, print the second argument to the shell script as
a hexadecimal number. So know which shell
you're using and what the characters are
that have special meaning. Notice, by the way, that the star at the top
simply expands in the shell
to match all files. What's particularly
interesting about that is that in editor
regular expressions, star means repeat
the previous character or pattern zero or more times, which is very different from what the star means to the
command interpreter. Here the arrow sign, the less-than sign,
is redirection. The semicolon is
the command separator. The back ticks say, run the command between
the back ticks and replace what's between
the back ticks including the back
ticks themselves of course, with the output. So for example, the top
line would simply send me@here.com
the password file, and then print here.com. So that entire line would appear to the Shell as here.com. You can also have it
do strange editing. For example, in the second line,
you're receiving input. What this input
particularly says is, "Delete everything,
beginning with line one and going to
the first blank line". The third command deals with remote execution on a host
and I'm using remexec here. That's not a real command, but you can replace it with SSH or a couple of others
that I'll show you, which are older but are archetypal examples
of programming errors. In this case, the remote host will get the command echo
followed by the arguments, and notice the backslash
in front of the back tick. That means the program that is running the remexec
command, in other words, the host you're typing this on, is to treat those back
ticks just as characters. But the remote host, the one called host here, will get the command
echo followed by "back tick, mail me@h.com, less-than sign, /etc/passwd, semicolon, hi, back tick". No backslashes. So me@h.com will get a copy
of the password file and the entire thing between the back ticks
will be replaced by hi. So the remote host
will simply print hi. If you've ever heard of
the term command injection, this is an example of it. On the next slide and
the following ones, we're going to talk
a little bit about how to handle things like this. What are some of the problems
that might arise? The first one is, who does
the checking: the client, the server, someone else? The canonical example
of this is a program that thankfully no longer
exists called rexd. Rexd was a daemon that would take input from the net
and execute it. The problem was that rexd assumed that the client did all
of the checking. So the client would
authenticate that the user had the authority
to run the command on the remote system and that the user is who
they claimed to be. So it would check authentication
and authorization. So rexd itself said, "Oh gee, everything's already done, so I just have to run the command". The problem of course is
what happens if I write my own client that does no checking: the other clients'
checking doesn't help. Here's one that is
even more direct. In many Linux and Unix systems, there used to be a set
of commands called YP. It stood for Yellow Pages. Then it got changed to NIS, for Network Information Service. The basic idea was to centralize authentication information
into one database, and then things like
YP password and NIS password would use
that database to validate users. This was common when you
had many workstations in one particular area or for
one particular set of people. Instead of having each one have their own
authentication database, you put it on a central node
and then they could query it. There are a number of advantages
here, of course. Well, the way
the password information was stored was in a file with
fields separated by colons, and the records were
separated by new lines. So each user had one record and there were seven fields
in each record. The first one being the username, the second one being
the hashed password, not in the clear
but it was hashed, the third being the UID, the fourth being
the GID and so forth, the fifth one being
identifying information, and the sixth one if I remember correctly was the home directory, and the seventh the shell. In any case, when you log in, your password was checked
against this database, and then you were assigned
the UID and the GID that were in the database in the particular record
associated with you. Well, these were
separated by colons. Okay. So when you changed information from a machine other than
the central server, the client would have to send
a message over the network. So initially, the server did no checking
of what was sent. So when you ran the program, for example to change
your information, which was ypchfn for change firstname or
change information, what you would do
is type what you wanted in that field
that identified you, then put in a colon, then put it in a home directory, then put in a colon,
then put in shell, then enter a new line. Then you would put in a name followed by an empty
password field, followed by a UID of say zero to give you administrator
privileges and so forth. So now there was a beginning of a second line that gave you system administration
privileges without a password. In this case, what you would simply do is embed the newline. Since the server didn't check whether there was
a colon or a new line, it would just write the data into the database
without a problem. The next slide shows
this in more detail. There's an example
of a password file. The hash is the zbcdef part. The field that ypchfn changes is the part
that says Matt Bishop. So we call ypchfn and when
it asks for your new name, we put in the line
that's shown down there. Now, the important part
here is the Ctrl+V, Ctrl+J. Ctrl+V says, take
the next character literally, robbed of any meaning
to the shell. The Ctrl+J is a newline, which normally would
terminate the entry. But since it's preceded
directly by Ctrl+V, it's treated just as a character just like
the letter a or b. Then, if you look following that, you have MR which
is a new username, and then the two colons. That second field, the one between the colon is where
the password hash goes. Since there's nothing
there, there's no password for the account. So after this is
run by the server, if you take a look
at the next part after the change you have, you will see that there
are now two lines, the second line picking up where the first line left off, instead
of that one single line. So now what I do is I log
into the system as MR. I won't get a prompt
asking for a password because there is none, and I'll get root. Okay. So how do you handle this? Well, what they did
first of all was change the client to disallow colons
and new lines in the field. So even if you did what
I showed you before, ypchfn would say, "Hey, no. I'm sorry, you have
a colon there. I'm not going to accept this." So does the client run
with any privileges? It turned out, no. So what people did was they wrote their own client to send over whatever they wanted, and the server made
the assumption that the client was doing the checking.
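
To make the attack concrete, here is a minimal Python sketch of it. This is not the actual NIS code: `naive_set_gecos` is a hypothetical stand-in for a server that writes the fifth field with no checking, and the record contents (user `bishop`, UID 1001, the home directory, and so on) are invented for illustration.

```python
def naive_set_gecos(passwd_text, user, new_gecos):
    """Hypothetical unchecked server: splice new_gecos into field five as-is."""
    out = []
    for record in passwd_text.split("\n"):
        fields = record.split(":")
        if fields[0] == user and len(fields) == 7:
            fields[4] = new_gecos  # no check for colons or newlines
            record = ":".join(fields)
        out.append(record)
    return "\n".join(out)


# One record, seven colon-separated fields (values are made up).
before = "bishop:zbcdef:1001:100:Matt Bishop:/home/bishop:/bin/sh"

# The "new name" sent by a home-made client: it finishes the real record,
# embeds a newline, then starts a record with an empty hash and UID 0.
payload = "Matt Bishop:/home/bishop:/bin/sh\nmr::0:0:Mister Root"

after = naive_set_gecos(before, "bishop", payload)
records = after.split("\n")
for r in records:
    print(r)
```

In this sketch the database ends up with two records, and the second one picks up the home directory and shell that were left over from the first.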
Bad assumption. The second time,
they did it right: they fixed the server. That's a good example of why you want to do verification and validation as close to the resources you're
protecting as possible. Because the server
controls access to the database and deals
directly with the database. The server dealing directly
with the big database, that's where you do the checking. Doing it at the client
is nice because it will save you sending
things over the network. But doing it at
the server is critical, because whatever you've
got is going into that database and you
want to make sure what goes into
that database is right. So this leads to a couple of requirements that are
shown on the next slide. Know what the server
expects. In ypchfn, it expects a well-formed
fifth field. That means no colons,
no new lines, no other special characters that your system may
interpret wrongly. Rexd expected an
authorized command from an authenticated user.
It got neither. So the servers really should
expect anything at all. In both cases, the server
should have assumed that the client either lied or made a mistake and the server
should have done the checking. Hence the rule which
you see on this slide, validate as close to the resource being
protected as you can.
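
As a rough illustration of that rule, here is a minimal Python sketch of the check living in the server, right before the write to the database. The names `validate_gecos`, `server_set_gecos`, and `is_rejected` are hypothetical, the sample record is invented, and the set of rejected characters (colon plus all ASCII control characters) is an assumption about what "well-formed" should mean here, not what the real fix did.

```python
def validate_gecos(value):
    """Reject anything that could break the colon/newline record format."""
    for ch in value:
        # A colon ends a field and a newline ends a record; other control
        # characters are refused too, as a conservative assumption.
        if ch == ":" or ord(ch) < 0x20 or ord(ch) == 0x7F:
            raise ValueError(f"illegal character {ch!r} in field")
    return value


def server_set_gecos(passwd_text, user, new_gecos):
    """Hypothetical server routine: validate first, then write field five."""
    new_gecos = validate_gecos(new_gecos)  # checked here, whatever the client did
    out = []
    for record in passwd_text.split("\n"):
        fields = record.split(":")
        if fields[0] == user and len(fields) == 7:
            fields[4] = new_gecos
            record = ":".join(fields)
        out.append(record)
    return "\n".join(out)


def is_rejected(value):
    """Helper for the demo: True if the server would refuse this value."""
    try:
        validate_gecos(value)
    except ValueError:
        return True
    return False


db = "bishop:zbcdef:1001:100:Matt Bishop:/home/bishop:/bin/sh"
print(is_rejected("Matt Bishop:/home/bishop:/bin/sh\nmr::0:0:Mister Root"))
print(server_set_gecos(db, "bishop", "M. Bishop"))
```

A client can still run the same check to save a trip over the network, but in this sketch nothing reaches the database without passing through `validate_gecos` on the server.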