EssayTagger is a web-based tool to help teachers grade essays faster.
But it is not an auto-grader.

This blog will cover EssayTagger's latest feature updates as well as musings on
education, policy, innovation, and preserving teachers' sanity.

Friday, July 1, 2011

Two-legged authentication via OAuth2 to Google Storage: Easier than it seems

First of all, if you're reading this post on purpose and have any clue what the title actually means, then you're in the right place.

If you aren't a tech geek, give up now and make better use of the next five minutes of your life.

It's taken me all week to get this far and to actually get this stuff working! I'm writing this to ease the pain of anyone else trying to do something similar and, more importantly, to help myself out later if I have to go through this again!


Overview: Three-legged authentication
Here's my rough understanding of the OAuth system:

Three-legged authentication is the usual use case for OAuth: a user (1) permits a client (2) to access the user's account on a third party's system (3).

Here's an example: I use Adobe Lightroom to process my photos. Lightroom can also automatically publish photos to my SmugMug account. Lightroom uses three-legged authentication to make that happen. Let's say that SmugMug uses the exact same authentication methods as Google (since, uh, that's all I have experience with).

Lightroom (the client) already has an arrangement with SmugMug (the server). SmugMug has established a client ID and secret key (aka "consumer ID", "consumer secret key") for Lightroom to use. Only SmugMug and Lightroom know this client ID/secret combo.

Lightroom uses that client ID (but not the secret key) to set up a new authorization request for my account. Lightroom sends me to SmugMug's authorization Website. SmugMug asks me to approve the connection to Lightroom by entering my SmugMug username and password. This slightly odd user flow is important because Lightroom never gets its hands on my SmugMug login/password. That's the whole point of this crazy, complicated system.

Once I approve the connection, SmugMug sends Lightroom an authorization code.

Now here's the part that seems a bit odd and ultimately redundant: Lightroom can't actually do anything with that authorization code. We went through all that work for a secure code that doesn't do anything yet?!

It turns out that there's an extra layer (i.e. hoop to jump through). Lightroom takes that authorization code and proves that it really is Lightroom by sending that authorization code back to SmugMug along with the client ID again and, this time, the client secret key. Remember: only Lightroom knows those client values.
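
To make that exchange concrete, here's a minimal sketch of the form body such a request might carry, assuming SmugMug follows the standard OAuth2 parameter names (grant_type, code, client_id, client_secret, redirect_uri). The class name, method names, and sample values are made up for illustration, and the sketch only builds the string; it doesn't send anything.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: builds the body for the "authorization code -> tokens" exchange.
// Parameter names follow the OAuth2 convention; all values are hypothetical.
final class CodeExchangeForm {

    // Form-encodes the pairs as key1=value1&key2=value2, keeping insertion order.
    static String formEncode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // The client proves who it is by sending the code together with its ID and secret.
    static String authorizationCodeBody(String code, String clientId,
                                        String clientSecret, String redirectUri) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("grant_type", "authorization_code");
        params.put("code", code);
        params.put("client_id", clientId);
        params.put("client_secret", clientSecret);
        params.put("redirect_uri", redirectUri);
        return formEncode(params);
    }

    public static void main(String[] args) {
        // Hypothetical values, just to show the shape of the body.
        System.out.println(authorizationCodeBody(
                "4/abc", "my-client-id", "my-client-secret", "https://client.example/callback"));
    }
}
```

The body would be sent over HTTPS to the server's token endpoint; nothing here is specific to any one provider.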

As far as I can tell, those three values (authorization code, client ID, client secret key) aren't combined in any special way; they're just sent together as parameters of the request, and the HTTPS connection is what keeps anyone from pulling them out of an intercepted stream.

Once SmugMug receives that request, it will send back what we've been waiting for: an access token and a refresh token.

The access token allows Lightroom to interact with SmugMug on my behalf. Upload photos, rename photos, create new galleries, whatever. But every single operation sent by Lightroom must be accompanied by that access token (the client ID and client secret key aren't needed for these calls; the access token alone goes in the request header).

As an additional layer of security, the access token has an expiration timeout. After, say, an hour it will no longer work.
That's why Lightroom was also supplied with a refresh token.

Once the access token expires, Lightroom must request a new access token if it wants to continue to interact with SmugMug on my behalf. So Lightroom will send the refresh token--along with its client ID and client secret key--to SmugMug to request a new access token.
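
A refresh request along those lines might look like the sketch below. It assumes the standard OAuth2 refresh parameters (grant_type=refresh_token, refresh_token, client_id, client_secret) sent as a form-encoded POST, and it uses the JDK's java.net.http classes (Java 11+). The class name, the endpoint URL, and the sample values are hypothetical, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch only: builds (but does not send) a refresh-token request.
final class RefreshRequestSketch {

    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Form body carrying the refresh token plus the client's own credentials.
    static String refreshBody(String refreshToken, String clientId, String clientSecret) {
        return "grant_type=refresh_token"
                + "&refresh_token=" + enc(refreshToken)
                + "&client_id=" + enc(clientId)
                + "&client_secret=" + enc(clientSecret);
    }

    // tokenEndpoint is whatever URL the provider documents for token requests.
    static HttpRequest refreshRequest(String tokenEndpoint, String refreshToken,
                                      String clientId, String clientSecret) {
        return HttpRequest.newBuilder(URI.create(tokenEndpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(
                        refreshBody(refreshToken, clientId, clientSecret)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and credentials.
        HttpRequest request = refreshRequest(
                "https://provider.example/oauth2/token", "1/refresh", "my-client-id", "my-client-secret");
        System.out.println(request.method() + " " + request.uri());
    }
}
```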

Now that we've covered the process end-to-end, I still don't understand why that intermediate authorization code step was necessary; it seems like SmugMug could just send the access token and refresh token immediately and avoid that authorization code altogether.

Every time I think I've figured out the reasoning behind it, I find a new flaw in my thinking. I give up. I don't understand why that authorization code step is necessary, but it's part of the process and we just have to live with it.


Overview: Two-legged authentication
Two-legged authentication means that the transaction only involves you and the third-party server. You are essentially authenticating your own code to access your own account on someone else's server.

This is the case for EssayTagger.com. We are using cloud storage via Google Storage and need to authenticate our own transactions into and out of our own account on Google Storage.

The steps are the same, except we will only have to worry about handling ONE user account authorization request to Google Storage--our own (whereas Lightroom in the three-legged example has to negotiate user account authentication for each and every individual user that wants to set up a link into their own SmugMug account).

And once we go through all the steps and get that refresh token, we're good to go (forever?).


Two-legged authentication with Google Storage
I didn't have to build any code to do the first authorization step; I just manually built the https:// string and pasted it into the address bar of my Web browser. Google's authorization site responded with an authorization code which I temporarily hard-coded into my Java code then used ONCE to do the first access token request (which also returns the oh-so-important refresh token).
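
For anyone who would rather generate that https:// string than type it by hand, here's a small sketch. The query parameter names (response_type, client_id, redirect_uri, scope) are the standard OAuth2 ones; the endpoint, the out-of-band redirect value, and the Google Storage scope shown in main are my assumptions about what Google's instructions asked for at the time, so check them against the linked documentation before using them.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: assembles the authorization URL that gets pasted into the browser.
final class AuthUrlSketch {

    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Only the client ID is sent at this stage; the client secret stays private.
    static String authorizationUrl(String authEndpoint, String clientId,
                                   String redirectUri, String scope) {
        return authEndpoint
                + "?response_type=code"
                + "&client_id=" + enc(clientId)
                + "&redirect_uri=" + enc(redirectUri)
                + "&scope=" + enc(scope);
    }

    public static void main(String[] args) {
        // ASSUMED values: verify the endpoint, redirect URI, and scope in Google's docs.
        System.out.println(authorizationUrl(
                "https://accounts.google.com/o/oauth2/auth",
                "my-client-id",
                "urn:ietf:wg:oauth:2.0:oob",
                "https://www.googleapis.com/auth/devstorage.read_write"));
    }
}
```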

And, as far as I can tell, there's no need to worry about the callback part of the authentication process (I think the callback is where the authorization code would normally be sent to, whereas I just manually copied-and-pasted the authorization code).

Now that I have the refresh token my code is able to request a new access token whenever it needs it. As a result, I am now free to make fully-authenticated programmatic calls into and out of EssayTagger.com's Google Storage account!

One other part that really confused me: You do NOT need any special OAuth libraries to do any of this. I'm just using Apache's httpclient library. Each step along the way is just a simple HTTP request: a POST for the token requests, a PUT for uploads into Google Storage. All of the IDs and secrets and codes and tokens are just sent as part of the HttpRequest header or are encoded as request params.
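
As a rough illustration of how little is involved, here's a sketch that builds an authenticated upload request using only the JDK's built-in java.net.http classes (Java 11+) instead of Apache's httpclient. The class name, method name, and object URL are made up for the example; the "OAuth " prefix on the Authorization header is the one used elsewhere in this post.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch only: builds (but does not send) a PUT that carries the access token.
final class StoragePutSketch {

    // objectUrl is the full URL of the object to write; accessToken is the current token.
    static HttpRequest putObject(String objectUrl, String accessToken, byte[] content) {
        return HttpRequest.newBuilder(URI.create(objectUrl))
                .header("Authorization", "OAuth " + accessToken)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical URL and token, for illustration only.
        HttpRequest request = putObject(
                "https://storage.example/my-bucket/essay.txt",
                "sample-access-token",
                "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Authorization").orElse("(none)"));
    }
}
```

Sending it would be a single call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).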

Retrieving the access token and refresh token is easy too: it's just a matter of reading out the HttpResponse. Google's responses are wrapped in a JSON entity which is easy enough to parse manually, but I am using the simple Jackson-mapper library to offload that task.
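
To show what "parse manually" can mean, here's a dependency-free sketch that pulls fields out of a flat JSON token response with regular expressions. It is a stand-in for a real JSON library such as Jackson, not a replacement: it assumes a single-level object and does not handle escaped quotes. The field names (access_token, refresh_token, expires_in) are the ones I'd expect in the response; the sample values are invented.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: naive field extraction from a flat JSON object.
final class TokenJsonSketch {

    // Returns the value of a string field like "name":"value", or null if absent.
    static String stringField(String json, String name) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"")
                .matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Returns the value of a non-negative integer field like "name":3600, or fallback if absent.
    static long longField(String json, String name, long fallback) {
        Matcher m = Pattern
                .compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(\\d+)")
                .matcher(json);
        return m.find() ? Long.parseLong(m.group(1)) : fallback;
    }

    public static void main(String[] args) {
        // Invented sample response, for illustration only.
        String json = "{\"access_token\":\"sample-access\",\"expires_in\":3600,"
                + "\"refresh_token\":\"sample-refresh\"}";
        System.out.println(stringField(json, "access_token"));
        System.out.println(stringField(json, "refresh_token"));
        System.out.println(longField(json, "expires_in", -1));
    }
}
```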


How to do it: the specifics
Everything I needed for two-legged authentication into Google Storage ended up being right here:
http://code.google.com/apis/accounts/docs/OAuth2.html#IA

  • Set up OAuth in the Google API Console.
    • Establishes your client ID and client secret key that only you and Google should know.

  • Follow that link above to Google's instructions. See "Getting an access token".
    • This is the step I did through my Web browser without writing any code.

  • Copy the resulting authorization code.
    • It'll be displayed in the browser window after you log in to approve the connection.

  • Then have your code (or however else you might make an HttpRequest) send the "authorization_code" "grant_type" request as outlined in the next part of the instructions.
    • Google will send back your initial access_token and the real prize: the refresh_token.

  • STORE YOUR refresh_token !!

  • Write code that inserts the current access_token into the Authorization header of any request into or out of Google Storage:
    • httpPutRequest.addHeader("Authorization", "OAuth " + curAccessToken);

  • Get a new access_token when the current one expires (1 hour) by performing the Request outlined in the "Getting additional access tokens" section. Again, all you need is your refresh_token and your client ID and client secret key.

Seems like a lot of steps, but after all that setup, all your code needs to worry about is using the current access_token and doing that last step to get a new access_token over and over again.

However, if the refresh_token ever expires, uh, there'll be trouble.
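
That "use the current token, refresh when it runs out" loop can be sketched as a small cache. This is one possible design, not the code running on EssayTagger.com: the class and method names are hypothetical, the refresher is whatever code performs the refresh_token request, and the safety margin (refreshing slightly before the real expiry) is my own addition.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Sketch only: hands out the current access token, refreshing it when it is about to expire.
final class AccessTokenCache {

    // A freshly issued access token and how long it stays valid.
    static final class Issued {
        final String token;
        final Duration lifetime;

        Issued(String token, Duration lifetime) {
            this.token = token;
            this.lifetime = lifetime;
        }
    }

    private final Supplier<Issued> refresher; // performs the refresh_token request
    private final Clock clock;                // injectable so expiry can be tested
    private final Duration safetyMargin;      // refresh this long before real expiry
    private String token;
    private Instant refreshAt;

    AccessTokenCache(Supplier<Issued> refresher, Clock clock, Duration safetyMargin) {
        this.refresher = refresher;
        this.clock = clock;
        this.safetyMargin = safetyMargin;
    }

    // Returns a usable token, calling the refresher only when needed.
    synchronized String current() {
        Instant now = clock.instant();
        if (token == null || !now.isBefore(refreshAt)) {
            Issued issued = refresher.get();
            token = issued.token;
            refreshAt = now.plus(issued.lifetime).minus(safetyMargin);
        }
        return token;
    }

    // Value for the Authorization header, using the "OAuth " prefix from this post.
    String authorizationHeader() {
        return "OAuth " + current();
    }

    public static void main(String[] args) {
        // Fake refresher for illustration; a real one would call the token endpoint.
        AccessTokenCache cache = new AccessTokenCache(
                () -> new Issued("sample-access-token", Duration.ofHours(1)),
                Clock.systemUTC(),
                Duration.ofMinutes(1));
        System.out.println(cache.authorizationHeader());
    }
}
```

With something like this in place, the rest of the code only ever asks for authorizationHeader() and never has to think about expiry.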


Wrapup
I hope this was helpful! I'm not joking when I say it took me ALL FREAKIN' WEEK to figure this all out and get it working. But it really turns out to be fairly straightforward. Good luck!