Oktane20: How Web Authentication Works
Sara Daqiq: All right. Hey everyone, my name is Sara, and I am a developer at Okta. Today I'm going to be talking to you guys about web authentication and how it works. Let's talk about what we are going to cover in general today. Our agenda is: I'm going to briefly go over how we would create a web application, then we're going to talk about data flow, that means HTTP request and response. We're going to talk about hashing, we're going to talk about SSL, and sessions and cookies, and lastly we're going to talk about OpenID Connect.
Let's look at what a client side would look like. Imagine you go to your favorite restaurant, you order a meal. The server is going to get your order, is going to go over to the kitchen, and then is going to ask the kitchen to prepare the food. Now what happens is that the kitchen is going to process the food, and if they feel like they need resources, they're going to go to the fridge and get those resources. The same applies to a web application. You have a front end, which is equivalent to a table in a restaurant. You give it the data, or you request data. It's going to go over the internet, and then the server is going to process that data, and if it needs more data, or more information, all it needs to do is ask the database for the data that was saved before. All good, so we have created some kind of analogy for our web application here.
Now let's look at each page and see how they are communicating the data. This is the register.html page. It's a plain HTML file, it has a form, and the form has a bunch of labels and inputs in it. The label is the name of the value you are requesting from the user. Then you have an input, and the input is going to hold the information that you are asking the user to enter. The other most important thing about this form is that it has an action, or URL, /register. What that means is that once the form has collected the data, it's going to be sending that data over to the /register endpoint. On my server side I need to create a route that will receive the data at this /register endpoint and process it.
It does the same thing with my Signin.html. In my Signin.html all I need to do is to have labels and inputs, and also the most important thing in here is /login. That is where, when somebody submits the sign in form, all the information that the form has collected is going to go: to the /login endpoint. We have kind of breezed through what a client side would look like. For the server side, depending on the framework that you use, or depending on the language that you use, you are going to create your server. Then at the same time you can connect it to the database. Sorry I have a technical difficulty here. Okay, so you can connect to your database with your favorite language as well.
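To make the hand-off from the form to the server a little more concrete, here is a minimal sketch in Python of what a server-side route might do with a submitted form body. The field names (`username`, `password`) and the `handle_request` helper are assumptions for illustration only; the talk does not show its server code, and a real application would normally use a web framework rather than this hand-rolled routing.

```python
from urllib.parse import parse_qs


def parse_form_body(body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body into a flat dict."""
    fields = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per field; a simple form sends one value each.
    return {name: values[0] for name, values in fields.items()}


def handle_request(method: str, path: str, body: str) -> tuple:
    """Hypothetical router: map the form's action URL to a server-side action."""
    routes = {
        ("POST", "/register"): "register",
        ("POST", "/login"): "login",
    }
    action = routes.get((method, path))
    if action is None:
        return 404, {}
    return 200, {"action": action, "fields": parse_form_body(body)}
```

For example, `handle_request("POST", "/register", "username=sara&password=p%40ss")` routes to the register action with the decoded fields, while an unknown path yields a 404.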
Now your database is going to have some files, so your database is going to have collections. Either it's going to be tables, or it's going to be some kind of collections depending on the database. Your database is going to do that and save... For example, if you have users' information, the user information is going to go in one table, and then if you have credit card information, the credit card information is going to be collected to another table. All right? We are going to do that, and then, finally, once we created the collection, our system is kind of complete.
Let's look at the data flow. How does the data flow through the internet from the client to the server? Imagine your family has finally convinced you to sign up for Facebook. So you go to Facebook, you add your information here to register, and then Facebook is going to create an HTTP request. The HTTP request is going to have a content type, and then it's going to have an object that will have all the details that Facebook is requesting from you. Now what it's going to do is, Facebook is going to kind of create this letter, and put it in an envelope and send it through the internet to the URL that we created earlier. Now if everything is processed correctly in the server, we can go to the database and check it, and our values are going to look like this. So the database has assigned an ID to the user information, and then it saves all the information about the user that was passed from the front end. The one thing that's kind of odd here is the password. We have saved the password in plain text.
Before we are going to figure out how to not save the password in plain text, let's take a look at some security consideration on passwords. One thing I can do is to make sure that the passwords that I am saving are not the passwords that I've seen before, so that means a dictionary password. If your password is a word that has been found in dictionary, the problem with that is that it's prone to a dictionary attack. Hackers can just compile all the words that are in dictionary, and then keep repeatedly submitting each one of them as your password until they get to the right password, so you don't want to do that. You also make sure that as a developer when you ask for a user's password, the input field should have a type password. That's going to make it safe for the users to enter password, and it just turns into a unreadable character. There's a minimum requirement for passwords of eight characters, so you don't want to have less character password than eight characters. You want to also check against the common passwords. What I mean by that is that if I have a password that the user has created, I'm going to make sure that I compare it against the common passwords, or database that I have for the common passwords.
So far so good. Now what else do we need to do? We need to make sure we limit the password attempts. If a person keeps entering the wrong password, you need to make sure that after a limited number of attempts their account is locked out. We also want to know that password complexity is not considered important anymore, actually. NIST is a standards organization that kind of produces the guidelines for authentication, and what they suggested previously was to require complex passwords. For example, you have upper case, lower case, you have periods, special characters, and things like that. Now that led to a situation where the user would just find one password, and then keep reusing it over and over again across different platforms. If they were forced to change it, they would just increment a character or a value in the password. So they do not require us to have complex passwords any more.
Now re-authentication before changing the password is important. Imagine your friend gets into your account and they're trying to change your password. If they don't know your current password they cannot change it, so you would like to make sure that each time a user tries to change their password they're required to re-authenticate. Lastly, safe error handling. What I mean by this is that you should not give very specific information about what is wrong when the user enters the wrong password. For example, if the user's password is 10 characters and the user only types five characters, your error should not say, "Hey, your password is 10 characters and not five characters." If you give them detailed errors, a hacker could just use those errors to their benefit and keep adjusting their guess until it's the right password.
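The password considerations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a complete policy: the tiny `COMMON_PASSWORDS` set stands in for a real common-password database, the limit of five failed attempts is an arbitrary example value, and the `LoginThrottle` class keeps its counters in memory only.

```python
# Stand-in for a real list of common or leaked passwords (assumption).
COMMON_PASSWORDS = {"password", "12345678", "qwertyuiop", "letmein123"}
MIN_LENGTH = 8            # minimum length mentioned in the talk
MAX_FAILED_ATTEMPTS = 5   # example value, not from the talk

# One generic message, so the error never reveals what exactly was wrong.
GENERIC_LOGIN_ERROR = "Invalid username or password."


def password_problems(candidate: str) -> list:
    """Return the reasons a new password should be rejected (empty if fine)."""
    problems = []
    if len(candidate) < MIN_LENGTH:
        problems.append("too short")
    if candidate.lower() in COMMON_PASSWORDS:
        problems.append("too common")
    # Deliberately no upper/lower/symbol rules: complexity is not required.
    return problems


class LoginThrottle:
    """Count failed logins per user and lock the account after too many."""

    def __init__(self, max_failures: int = MAX_FAILED_ATTEMPTS):
        self.max_failures = max_failures
        self.failures = {}

    def is_locked(self, username: str) -> bool:
        return self.failures.get(username, 0) >= self.max_failures

    def record_failure(self, username: str) -> None:
        self.failures[username] = self.failures.get(username, 0) + 1

    def record_success(self, username: str) -> None:
        # A successful login resets the counter.
        self.failures.pop(username, None)
```

A login handler would call `is_locked` before checking credentials and respond with `GENERIC_LOGIN_ERROR` on any failure, whatever the cause.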
Okay, so we have covered general security consideration for password. Now we are going to go back to our hashing. So what is hashing first? I will have a specific set of characters. I am going to ask for number, the length of hash I want, and then send both of those datas to a hash function. At the end the hash function is going to return me a value, it's seemingly random characters. Now if I change this character in any way shape or form... Right now I've added a exclamation point at the end of this characters. I give the same amount of length that I want, and goes through the hash function, the hash is going to be totally different. So the only thing that the hash function guarantees is that whenever you add the exact number of characters inside the hash function as an argument, and you specify the length of the hash characters that you want to get, it's always going to produce the exact result. However, if you change the input very minutely, it's going to change the hash drastically.
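The two properties just described, same input gives the same hash and a tiny change gives a completely different hash, are easy to see with Python's standard `hashlib`. This sketch uses BLAKE2b only because it lets the caller pick the output length, which matches the "length of hash I want" in the description; the talk does not name a specific hash function, and this is a demonstration of hashing in general, not a recommendation for storing passwords.

```python
import hashlib


def short_hash(text: str, length_bytes: int = 16) -> str:
    """Hash text with BLAKE2b and return length_bytes of output as hex."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=length_bytes)
    return digest.hexdigest()


if __name__ == "__main__":
    # Same input, same length: always the same hash.
    print(short_hash("hello"))
    print(short_hash("hello"))
    # One extra character: a completely different hash of the same length.
    print(short_hash("hello!"))
```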
So far so good. Now what is hashing, and what are some characteristics of it? It's not encryption. Encryption, we're going to learn about it later. It is also a one way process, so it's very, very easy to take some characters and get a hash of them, but it's very, very hard, and it takes a lot of computational power, to take a hash and try to go back and get the characters from it. Now there are hash collisions. Because you can take a whole textbook and put it in the hash function and get a fixed length of characters, maybe 15 or 61 or something, and then you can take a single word, put it through the hash function, and still get the same hash length, there is a possibility of two inputs producing the same hash. Collisions become far less likely if the length of the hash that you're requesting is longer. People also use a salt. What that is, is that for each password there is a unique value that you add at the beginning, and then you hash that combination, so two users with the same password still end up with different hashes.
We also need to take care of something called rainbow tables. What that means is that there are hackers that have already stolen the hashed passwords of these big companies. For example, there are hacker attacks every day, and they steal the hashed passwords of all these companies. Now they have the power and the resources, some of them, to reverse engineer what the passwords for those hash values are. What happens is that at some point there is a database full of passwords and their hashes, and what you can do is, when your server creates the hash of the password, you check that hash against this leaked database as well, to make sure it's not in a database that has been leaked by hackers.
Now password truncation. To just reduce some complexity, some engineers will truncate. If a user is maybe setting a 15 character password, they only use the first 10 characters of the password to make a hash of it. Now this approach has been deployed in production at big companies, and you don't want to do that, because that's going to introduce more risk when you're doing password hashing. Okay, now that we have covered this, when we deploy our hashing, our database is going to look like this. Our password is not going to be plain text any more, it's going to be in the form of a hash.
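As a rough sketch of storing a salted hash instead of the plain-text password, the following uses PBKDF2 from Python's standard library. The choice of PBKDF2, the 16-byte salt, and the iteration count are assumptions for illustration; the talk does not say which algorithm its example uses, and production systems typically rely on a vetted password-hashing library and current tuning guidance. Note that the full password is hashed, with no truncation.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (assumption)


def hash_password(password: str, salt: bytes = None) -> tuple:
    """Return (salt, digest) for the full, untruncated password."""
    if salt is None:
        salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The server would store the salt and digest in the user's row and call `verify_password` at login; because each salt is random, two users with the same password get different stored values.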
So we have talked about how to securely save passwords, how to hash them and how to safely handle them in the server, but what about the connection between the client and the server? We know that it goes through the internet and routers. How does that work? How do we keep it secure? That is where SSL comes in. I'm pretty sure you guys have heard of HTTPS, right? HTTPS is a secure form of communication and it's basically HTTP plus SSL. The first thing that SSL does is it encrypts data. What is encrypting data? Remember we talked about how Facebook creates a letter that has a header, and then it has all the values that you're sending. Now what happens is that your client and your server, before they communicate any data, will talk to each other, and they're going to agree on an encryption process. Then your client side is going to take this letter, encrypt the data by putting it through the function that they have agreed on, and then it's going to get gibberish characters. That is going to be sent over the internet to your server, your server is going to decrypt the data, and then it's going to get the letter. So, instead of sending the plain text they are sending the encrypted information.
SSL also does something else. It's called identification. When the client and the server talks, they don't know if they are who they say they are, so they have somebody else that both of them trust. That's called certification authority. The certification authority has already established trust with the client, and they have already established trust with the server. Now, if the client is not sure, or want to check if the server is who they say they are, the client is going to go to certification authority and confirm the server's identity. The same goes for the server. If the server wants to test on who the client is, the server is going to go to the certification authority and confirm the client's authentication process. SSL, one, does our authentication or encryption, the other thing that it does is the identification.
Now let's go back to our data flow. I am trying to sign up for Facebook, I put my user information in there, and then Facebook is going to create for me kind of a letter. It has a header the content type, it has a metadata about the user information, and then it has the user information itself. This letter is now put in an envelope that is secure and sealed, instead of earlier which was kind of a see-through envelope. Then it is sent over to internet to the URL that we have requested before. So far so good. One thing that we haven't discussed so far is communication between server and the database. We have talked about communication between client side and the server side, but how do you securely send data between server and database?
There are many ways, for lack of better term, people can screw up, but one of them that is kind of very popular is called SQL injection. What I mean by that is, let's say in your server you have a username that is going to get the user information, and you have another value, it's called user password that will get passwords information. You plug them in and say, hey, whenever this username is added you match those values together, you kind of add them together, [inaudible 00:18:24] them, and then at the end of the day it's going to be like this. So if I'm putting my username and password, my username and password is going to look like this, and that's how the authentication works.
So what happens is, if this is the case, there is a way for a hacker or malicious person to manipulate the data that you have in your database from the client side form. What they could do is that they could have a username variable, and then they could just kind of form something like whatever the username is, or zero equals zero. Then whatever the password is, or zero equals zero. Now no matter what the username and the password are, zero equals zero is always going to be true, and if that is the case you have already given access to a person without them verifying the username and password, and they can access the user profile. So that's one of the ways a very small misstep, or a very small oversight, in our authentication process can have drastic impacts.
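The "or zero equals zero" trick can be reproduced against an in-memory SQLite database. This is a minimal sketch: the `users` table, its columns, and the sample row are invented for the demonstration, and the talk does not say which database or query its example uses. The unsafe version glues user input into the SQL text; the safe version passes it as parameters, so the input is treated as data and never as SQL.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one hypothetical user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
    conn.execute("INSERT INTO users VALUES (?, ?)", ("sara", "abc123"))
    return conn


def login_unsafe(conn, username: str, password_hash: str) -> bool:
    # VULNERABLE: user input is concatenated straight into the SQL text.
    query = (
        "SELECT COUNT(*) FROM users WHERE username = '" + username
        + "' AND password_hash = '" + password_hash + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn, username: str, password_hash: str) -> bool:
    # Parameterized query: the driver keeps the input separate from the SQL.
    query = "SELECT COUNT(*) FROM users WHERE username = ? AND password_hash = ?"
    return conn.execute(query, (username, password_hash)).fetchone()[0] > 0
```

With the input `x' OR '0'='0` in both fields, `login_unsafe` reports a successful login even though no such user exists, while `login_safe` rejects it.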
So once we have covered this, let's talk about something else that's called sessions and cookies that you guys have heard a lot. What am I trying to solve in sessions and cookies? When I'm requesting a resource, I'm going to be asked to provide my username and password, that's cool, right? Now if I request another resource at this point in time I'm also asked for a username and password. It gets very tedious. Whenever I need a user information, I'm going to be asked about username and password, I have to add it in. So I want to reduce how many times the client or the server is asking for my username and password. The first thing I do is when I first require a resource, I put in my username and password, then the next time I require a resource, somehow I want the client to remember, and the server, to remember that I have already logged in before, and I do not want to provide my username and password. So, how do we do that?
I'm going to walk through an analogy with you guys in here. Let's say John go to a conference and he needs to check in his hotel room. Goes to receptionist, provides his ID, receptionist authenticates him, then receptionist creates a profile for him. The profile has an ID, now receptionist is going to get that ID that is in the profile, mount it in a card, a hotel room key card, and then is going to give it to John. Now what happened is that the card that John has, he can use that to go to receptionist again without giving his ID, that's how he authenticates. He can use that card to open the door at the hotel, as long as his stay is valid. The card that John has is nothing more than just a few values that's important, but all the information that John requires from the hotel, for example how many times he has ordered in, what is his credit card, and information like that, is not saved in the card. It is saved in a session, in a profile in the receptionist's desk.
The card that John has on himself, that's called the cookie. The client has a card, or a random string of characters, that's only valid for this profile. Now the profile that the receptionist saves on John, that's called a session. So we can talk about this in terms of a web application. I have tried to access a resource, I log in, and as soon as I log in the server is going to send me a token, a cookie. Then I save that cookie inside my browser. Then whenever I need to access any other resource, all I need to do is, in my HTTP request (remember, each connection I have with the server is going to be through an HTTP request), send that cookie back. In order for us to be safely handling that cookie, it's very important to consider a few security considerations. One of them is HttpOnly. Because your cookie is handled in the browser, you want to make sure that the extensions that you have in the browser, or any other JavaScript file, cannot access this cookie, so that's why you want only HTTP to access this cookie. You can also turn on the Secure flag. What that means is that unless the connection is through HTTPS, you do not want to communicate the cookie and you do not want to accept the cookie.
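Here is a minimal sketch of the key-card idea in Python: the session (the profile at the receptionist's desk) lives in a server-side dictionary, and the cookie carries only a random identifier with the HttpOnly and Secure flags set. The cookie name `session_id` and the in-memory `SESSIONS` store are assumptions for illustration; a real server would use its framework's session support and a persistent store.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side session store: session id -> profile data (in memory only).
SESSIONS = {}


def create_session(username: str) -> str:
    """Create the server-side profile and return its random identifier."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"username": username}
    return session_id


def build_set_cookie_header(session_id: str) -> str:
    """Build the Set-Cookie header value with HttpOnly and Secure flags."""
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["httponly"] = True  # hidden from page JavaScript
    cookie["session_id"]["secure"] = True    # only sent over HTTPS
    cookie["session_id"]["path"] = "/"
    return cookie["session_id"].OutputString()


def lookup_session(cookie_header: str):
    """Find the session for the Cookie header the browser sent back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)
```

The cookie itself holds nothing about the user; everything the server knows stays in `SESSIONS`, and an unknown or missing identifier simply finds no session.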
Ephemeral means that as soon as the browser closes, you normally want to make sure that the cookie is deleted. Lastly, SameSite: what this means is that the cookie is only sent when the request comes from the same site that provided the cookie. Some of the attacks or some of the security vulnerabilities would be... Let's go back to John. John goes to the receptionist, authenticates himself, and then the receptionist creates him a file, or a profile. The receptionist gets him the key, now John goes to a bar and meets some other person named Eric. John and Eric are having a really good time, and then John says, "Hey, there's a lobby in my hotel if you want to continue our discussion." So they go to the hotel, and John is letting Eric in... The only person who could let Eric use the hotel's resources is John, because he is staying in the hotel with that token, or with his key.
Now what happens is that when Eric leaves, John finds out that resources have been stolen, let's say from the hotel, or from him. Let's say his wallet is stolen. Now it's important to know in this scenario that John did let Eric come into the hotel and vouched for him, and Eric did not break in. John used his credentials, which would be the card, to do this, so that's called a cross-site request forgery attack, where somebody else tricks you into using your own authentication to get access to resources. For example, I'm pretty sure most of us have had those spam emails. One of them could be, "Hey, check out this cute video of a dog." But then as soon as you click on that, an HTTP request is formed, and if you're logged into your account, for example, the HTTP request is saying, "Okay, send $200 to Eric," or something like that. The attacker disguises himself as you, and makes you do something so that he gains resources.
Now that we have covered cross-site request forgery, how can we avoid it? What we can do is that, when you try to log in, you go from the browser to the server and send the information, and the server is going to send you back a CSRF token. Whenever you communicate with the server again, you are going to have to send that CSRF token back. All you need to do in terms of coding it is, in the HTML form, you will have an input of type hidden, and then you will have the CSRF token value. So now that we have covered that, let's talk about OpenID Connect. It's a very interesting topic to explore so I'm going to go with an example. Imagine you were trying to go to Yosemite with your friends, and one of your friends says, "Hey, I'm going to DJ since you're busy driving. I'm going to DJ, and what's your password for your phone?"
Now, you don't want your friend to see all the details that you have in your phone, so you cannot give them your password, or you want to give them your password, but there are some implication to come with that. The same thing happens when you do that with websites. Since 2007 there are so many websites that needs your authentication through someone else. If you have opened Amazon for example, it can say, "Hey, log in with Okta, or log in with Google, log in with Facebook." Amazon is saying, "You don't have to give me your information, you have your information in Google, just give them permission so I can go get your information from Google." They're sending information between each other, so that's what we're trying to solve. We're trying to solve how to send your information between two... How do you give permission to send your information from Google to Amazon?
So what are the issues if you just give them your password? What happens if you give Amazon your password to your Google account? That doesn't seem like a nice thing to do or a healthy thing to do in security wise. First of all, there are some websites that could store your password in plain text, so that's not cool. They will have access to your full account, you cannot give them access, as remember your friend, when you guys were going to Yosemite, if you give them your password, you're not going to be able to stop them from accessing certain resources. Lastly, they can change your password, therefore revoking your right for your own account, so in this case Amazon is probably not going to do that, but Amazon can change your Gmail password, and then you don't have access to your Gmail account.
Okay, so how do we solve it? There's another thing here as well. Remember how we talked about server, and how server is the handling of authentication. What happens is that the server sometimes gets overwhelmed. The server is like, "Okay, am I going to do my job, or handle authentication?" Because handling authentication becomes super complex super fast, so the architects are like, "Okay how about we are going to create another server that only handles authentication. It doesn't do anything, only handles authentication," so kind of separation of concerns. Then you want your server to do whatever it does best, to do their own job.
For that, the architects create an authorization server that you talk to over the internet. There are companies that provide that authorization server, [inaudible 00:30:28] Okta, but there are also open source products that will provide authentication, equipped with probably most of the state of the art technology, and they can save that data for you. They will handle all kinds of authentication, and once the authentication is done, this authorization server is going to send an ID token, which is a token we will discuss later, but it's going to identify the user that they have already authenticated, or an access token, to the server. The server now decodes that ID token or access token, checks if the user has the access they were looking for, and then does its thing, whatever it needs to do. So that way you separate the authentication and authorization concerns.
Let's look in a little bit of detail at how this happens. Heroku.com is a hosting platform, so when I go to heroku.com, it is a plain www.heroku.com. As soon as I click log in, what happens is that I'm going to be redirected to id.heroku.com. So as you can see they are different. I'm going to put my username and password there, then I'm going to be redirected to dashboard.heroku.com/callback. They, ID and dashboard, are going to communicate again, and the dashboard is going to take that code and exchange it for an ID token and access token. Then once the dashboard has the ID token and access token, it can decode the ID token and get the user information, whatever it needs. As you can see, I go to Heroku, and Heroku redirects me. It doesn't do any authentication itself, it redirects you to another server that's called id.heroku.com. That server does the authentication process and sends you to the last URL that we need to go to, which is dashboard.heroku.com. I have a video of it that will show it in a bit more detail, so if you check the URL at the top, it will show you that right now it's heroku.com, and as soon as I click log in, it is id.heroku.com. I click log in, and suddenly it's going to go to callback, so dashboard.heroku.com/callback, and after an exchange of the code for the ID token I'm going to be redirected back to the dashboard.
So we are going to do that, we are going to check that in a little bit more detail here. So what happened was that I logged in, and then my authorization server redirected me. I got to the authorization server, now the response type is code. I'm asking authorization server to give the other URL code. Then I'm asking them to give them the profile information as well. I can ask them more information, I can say, "What is the address? What is the email?" Whatever information I want to get them. The scope narrows down the responsibilities that a person could have. For example, in case of one of your guy's friend in the road trip in Yosemite, you can reduce the scope to see only Spotify. You say, "Okay, give this friend a token, and this friend can only access Spotify and nothing else." So it limits the resources and separates the kind of roles.
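The redirect to the authorization server, with `response_type=code` and a scope that limits what is being asked for, can be sketched as building a URL and then reading the code from the callback. Everything concrete here is a placeholder assumption: the `id.example.com` and `dashboard.example.com` URLs and the client ID are invented, and the `state` parameter (a standard OAuth 2.0 parameter used to tie the callback to the original request) is not mentioned in the talk.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Return (url, state) for sending the browser to the authorization server."""
    state = secrets.token_urlsafe(16)  # random value echoed back in the callback
    params = {
        "response_type": "code",    # ask for a code, not the tokens themselves
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),  # e.g. "openid profile"
        "state": state,
    }
    return authorize_endpoint + "?" + urlencode(params), state


def read_callback_code(callback_url: str, expected_state: str) -> str:
    """Extract the code from the callback URL after checking the state."""
    query = parse_qs(urlparse(callback_url).query)
    returned_state = query.get("state", [""])[0]
    if not secrets.compare_digest(
        returned_state.encode("utf-8"), expected_state.encode("utf-8")
    ):
        raise ValueError("state mismatch")
    return query["code"][0]
```

The application would then exchange the returned code for the ID token and access token in a server-to-server request, which is omitted here because it needs a real authorization server.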
Now the exchange with code and ID token will happen, so the authorization server is going to send a code to the dashboard, and then the dashboard is going to send a code back and get an ID token. Now the dashboard is going to digest that ID token, and either give you access and give you information... Again, now the server, or my normal dashboard.com, has information about you, and it knows how much access do you have. Okay? So far so good. Because this token is also using session, and also because this token, it needs to be secure, you also need to put all the flags that we discussed there. Maybe you determine the age, you want to have a [inaudible 00:35:06] secure inside and all of that. So now that we have covered that, let's look at a little bit of what these tokens, as we talk about this. We talked about how the authorization server is sending you a token, ID token and access token. What does that entail?
We have an ID token and an access token. The ID token has identifying information about you, so it's like a card, say your driver's license. It has your name, your date of birth and things like that, that's identifying information about you. Now an access token will not have identifying information about you, but it says, "This person has access to this room." So imagine you go to some garage or warehouse, and all you are carrying is a badge that says, "Whoever has this badge has access to garage number 456." Something like that. So that's called an access token. They're client side storage. Earlier we were talking about cookies and sessions, and cookies were just a random string of characters that would mean nothing if there was no session on the server side, but in this scenario the token is self-contained. All the information you need is already in the token, and you can store it on the client side, but it needs to be kept safe.
Lastly, it's in the form of a JWT. So, let's look at what a JWT is. A JWT is just a format of data. It's a self-contained JSON object, so it's a way you store data. It's digitally signed, which helps with the authentication process. Then lastly it is encoded, so it's not encrypted, it's not hashed, it's encoded, and that just makes it easy to read the data. This is what a JWT looks like. As you can see, it has a header, the top part, which has metadata about the values at the bottom, and then it has a payload. The payload has the information that we have requested, so all the scopes and the data you have requested. Then it has a signature. The signature is used to verify whether the data has been tampered with or not. I can decode the ID token, and it's going to look something like this. Depending on the values that I have requested, I'm going to have more scopes or fewer scopes. As you can see, it has identifying information about me.
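To show that a JWT really is just three base64url-encoded parts (header, payload, signature), here is a minimal sketch that builds a demo token and then decodes and verifies it using only the Python standard library. It assumes an HS256 signature with a shared secret purely to stay self-contained; ID tokens from real providers are commonly signed with an asymmetric algorithm, and a real application would use a maintained JWT library that also checks claims such as expiry, issuer, and audience.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(raw: bytes) -> str:
    """Base64url without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def make_demo_token(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature for a demo HS256 token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url_encode(json.dumps(header).encode("utf-8"))
        + "."
        + b64url_encode(json.dumps(payload).encode("utf-8"))
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256)
    return signing_input + "." + b64url_encode(signature.digest())


def decode_and_verify(token: str, secret: bytes) -> dict:
    """Check the signature, then return the decoded payload."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature mismatch: token was tampered with")
    return json.loads(b64url_decode(payload_b64))
```

Anyone can read the payload by decoding it, which is why it is "encoded, not encrypted"; what the signature adds is that a changed payload no longer verifies.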
Okay, so let's look at how this helps. How does it compare to my given password? What happens is that at this time you can revoke the token whenever you want. You can say, "Hey, I don't want you to have access to my phone any more, I'm going to revoke the token." You can also extend the life of the token, so you guys are still in the trip four hours later, and the token was about to expire, you extend the life of it. Lastly separate the roles, so as we talked about, you don't want your friend to have access to your other sources or other data points, so you separate the roles.
So let's summarize what I've covered so far, and then we'll go from there. We talked about how to securely store passwords, and how the data communication between the client and the server works. We also talked about sessions and cookies, and how they're connected together. We also talked about OpenID Connect, and why it is important, why it is necessary to have OpenID Connect. OpenID Connect is not a stand alone protocol, and it does not replace sessions and cookies, but it uses them as well, to minimize access and increase security.
Well then, so that being said, this is just scratching the surface of the authentication process, and over time these things will change. We have IoT, we are increasingly adding communication between devices and between people, so everything keeps updating and changing. But the point is that if you're creating your own authentication, it's very important to make sure that you have studied authentication intensively from A to Z. Usually, though, it's a good idea to hand off authentication and ask other people who are experts to do it. For example, you can use an open source authentication system or authorization server, but you're still responsible for implementing it correctly.
That being said, thank you for your attention, and have a good Oktane. Bye bye
Curious how web authentication actually works? In this talk you'll learn exactly what happens behind the scenes when a user logs into a website. Along the way, you'll learn about TLS, password hashing, OpenID Connect, OAuth 2, JSON Web Tokens, and more.