
EssayTagger is a web-based tool to help teachers grade essays faster.
But it is not an auto-grader.

This blog will cover EssayTagger's latest feature updates as well as musings on
education, policy, innovation, and preserving teachers' sanity.

Tuesday, July 5, 2011

A Big Decision: Use a traditional Web host or commit to Google App Engine (GAE)?

Oh tradeoffs, why must you torture us so?

 

The dilemma:
One of the hundreds of things on my to-do list is to figure out Web hosting. For what we want to do, a Virtual Private Server (VPS) or cloud VPS seems like the best option. A standard Tomcat host environment won't work because our current designs rely on having OpenOffice installed as a service that we can make calls into. A VPS essentially gives us a fresh server, which may or may not even have an OS installed on it yet, that we can install and configure however we like.

A VPS setup gives us that freedom and a cloud-based VPS gives us scalable horsepower. Awesome.

But what happens when the server instance crashes? What happens when the database crashes? Presumably a VPS (and certainly the cloud variety) would have redundant-enough distributed/RAID data storage, but what happens if we do suffer data loss?

Any half-decent dot-com needs to have recovery strategies in place to handle any possible calamity, no matter how unlikely.

And here's our first problem: I'm not much of a hardware guy. I really do NOT want to muck through all of this.

But even if we had the world's greatest server admin, we would merely be in a good position to recover from disasters when they happen; no one--no matter how good--can make server disasters disappear altogether.

And a VPS--or any Web host or colo option--requires too much machine/environment maintenance and disaster recovery.

The solution(?): Google App Engine
The best way to offload these concerns seems to be to run your code on Google App Engine (GAE) instead of a traditional Web host/Web server.

The folks at Google are experts at machine setup and maintenance. They are experts at 24/7/365 availability. They are experts at data storage, data redundancy, and scaling computing power up and down to handle sudden bursts of user traffic.

Their data center is insane.


(Embedded video: jump to 2:55 to get to the good part.)

And there are economies of scale; all the work that Google has put into their hardware can be easily--and cheaply--shared with others. And I do mean "cheap". GAE accounts are only charged once a specific usage quota is reached, and those above-quota charges are ridiculously low. If your site is low-load and low-bandwidth, it's entirely possible to stay below the charged level (in other words: free) or just barely cross over. Other than the $0.30/day flat operating fee, EssayTagger.com might generate zero additional usage fees. We might get the benefits of the world's most robust, most stable, most scalable Web host/application server for about $9/month. Whoa.
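
For the curious, here is that back-of-the-envelope math as a tiny Java sketch. The $0.30/day flat fee is the figure quoted above; the 30-day month, the zero usage fees, and the names `GaeCostSketch` and `monthlyCents` are my own assumptions for illustration.

```java
// Back-of-the-envelope hosting cost, assuming a 30-day month and
// zero above-quota usage fees (both are assumptions for illustration).
final class GaeCostSketch {
    // Work in whole cents to avoid floating-point rounding on money.
    static long monthlyCents(long flatFeeCentsPerDay, int daysInMonth, long usageCents) {
        return flatFeeCentsPerDay * daysInMonth + usageCents;
    }

    public static void main(String[] args) {
        long cents = monthlyCents(30, 30, 0);
        System.out.printf("Estimated monthly cost: $%d.%02d%n", cents / 100, cents % 100);
    }
}
```

Any real bill would add whatever above-quota usage the site actually racks up, which is the third argument here.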

So GAE sounds like a no-brainer, home-run solution, right?


The downsides:
Well, there are tradeoffs, of course.

With all of that scalability and abstraction away from the machine layer, GAE naturally is much more limited in what it can do (and therefore limits what your code/site can do).

Install OpenOffice in GAE? Forget it. Not an option and probably never will be an option (unless they port OpenOffice to run completely within a Java JAR).

Run Apache and Tomcat with your favorite configuration settings? Uh, there is no Apache or Tomcat anymore--you're in the GAE world and only the GAE world.

Connect your code to a MySQL database? Forget it! No databases in GAE! (say wha...?!)

Make any Java call that's part of the standard JRE? Nope! There's a white list of supported Java packages and it definitely does not have full JRE coverage. Not only does that mean that parts of the JRE are off-limits to your code, it means that any support libraries that you're used to using that happen to rely on an unsupported package will not work!
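
To make the white list problem concrete, here is a rough sketch of the kind of check you end up doing (mentally or otherwise) when vetting a support library. The allowed set below is a tiny made-up sample and is NOT Google's actual list, and `WhiteListSketch` and `unsupported` are hypothetical names; check the published white list before relying on any particular class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WhiteListSketch {
    // Hypothetical sample only; NOT the real GAE white list.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
            "java.lang.String", "java.util.ArrayList", "java.util.HashMap"));

    // Returns the class names a library needs that are not in the allowed set.
    static List<String> unsupported(List<String> requiredClasses) {
        List<String> missing = new ArrayList<>();
        for (String name : requiredClasses) {
            if (!ALLOWED.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical dependency list for some support library.
        List<String> needs = Arrays.asList("java.util.HashMap", "java.net.Socket");
        System.out.println("Unsupported: " + unsupported(needs));
    }
}
```

If that list comes back non-empty for a library you depend on, that library is probably a no-go until the missing classes are supported.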

GAE isn't just a Servlet container (à la Tomcat) that's processing your Java code; it's the whole dang world (as far as your code is concerned). And it's a constrained world that the Google developers have slowly carved out piece-by-piece. That world is growing all the time with new features and new support, but the point is that it's way more constrained than you're used to and, in some ways, it feels like a violation of what Java has been all about since its inception.


Wait, back up: No database?!
This part kind of blew my mind. Instead of a database they have a JDO/JPA-ish datastore implementation. You create a Java DTO, attach some JDO/JPA annotations to the member variables, and then you stuff the Java Object into the datastore. Done.
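
Roughly, the flow looks like the sketch below. To keep it runnable without any GAE libraries, I've swapped the real datastore for a plain in-memory map, and `@Stored` is a stand-in annotation defined right here, not a JDO or JPA one. `Essay`, `FakeDatastore`, `put`, and `get` are all hypothetical names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

// Stand-in marker annotation (hypothetical). Real JDO uses annotations
// such as @PersistenceCapable and @Persistent for this role.
@Retention(RetentionPolicy.RUNTIME)
@interface Stored {}

// A plain DTO: just fields, no SQL, no table definition.
final class Essay {
    @Stored final String id;
    @Stored final String title;

    Essay(String id, String title) {
        this.id = id;
        this.title = title;
    }
}

// In-memory stand-in for the datastore; NOT the GAE API.
final class FakeDatastore {
    private final Map<String, Essay> byId = new HashMap<>();

    void put(Essay essay) { byId.put(essay.id, essay); }

    // Returns null when no object is stored under the given id.
    Essay get(String id) { return byId.get(id); }
}

final class DatastoreSketch {
    public static void main(String[] args) {
        FakeDatastore store = new FakeDatastore();
        store.put(new Essay("e1", "My Summer Vacation"));
        System.out.println(store.get("e1").title);
    }
}
```

The point is the shape of the code: no schema, no INSERT statement, just an annotated object going in and coming back out.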

Where does the datastore live? Is there a database living underneath it? Who knows? As a programmer, all you care about is stuffing Objects into the datastore and then querying to get them back out. It's actually quite cool. Well, until you read up on the limitations. And those queries? Not quite what you're used to. You can't do table joins. Seriously.
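
Without joins, one common workaround is to run one query, then look up the related objects yourself in code. Here's a minimal sketch of that idea using plain in-memory lists in place of the datastore; `Student`, `Submission`, and `titlesFor` are hypothetical names, and real datastore queries would replace the loops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Student {
    final String id;
    final String name;
    Student(String id, String name) { this.id = id; this.name = name; }
}

final class Submission {
    final String studentId;  // plays the role of a foreign key
    final String title;
    Submission(String studentId, String title) { this.studentId = studentId; this.title = title; }
}

final class ManualJoinSketch {
    // "Query" 1: find students by name. "Query" 2: filter submissions by each match's id.
    static List<String> titlesFor(String studentName, List<Student> students, List<Submission> submissions) {
        List<String> titles = new ArrayList<>();
        for (Student s : students) {
            if (!s.name.equals(studentName)) continue;
            for (Submission sub : submissions) {
                if (sub.studentId.equals(s.id)) titles.add(sub.title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        List<Student> students = Arrays.asList(new Student("s1", "Ana"), new Student("s2", "Ben"));
        List<Submission> subs = Arrays.asList(
                new Submission("s1", "Hamlet essay"),
                new Submission("s2", "Gatsby essay"),
                new Submission("s1", "Macbeth essay"));
        System.out.println(titlesFor("Ana", students, subs));
    }
}
```

What SQL would do in a single statement becomes two round trips plus glue code, which is exactly the kind of thing you have to design around.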

Oh, and here's another thought: with no traditional database, forget ad hoc queries to check on the state of your site, users, etc. You'll have to code that in as some sort of internal reporting Servlet.
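
As a sketch of what that reporting code might boil down to, here's a plain-Java helper that tallies users by plan, the sort of answer I'd otherwise get from a one-line GROUP BY. The "plan" idea, the `UserReportSketch` class, and the sample data are all invented for illustration; a real Servlet would fetch the users from the datastore and write this text to the response.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class UserReportSketch {
    // Counts users per plan; TreeMap keeps the report in alphabetical order.
    static Map<String, Integer> countByPlan(List<String> userPlans) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String plan : userPlans) {
            Integer current = counts.get(plan);
            counts.put(plan, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: one plan name per user.
        List<String> plans = Arrays.asList("free", "paid", "free");
        System.out.println(countByPlan(plans));
    }
}
```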

And the JDO/JPA integration is somewhat incomplete. As I understand them, JDO and JPA were meant to be an abstract front end to any arbitrary relational database source. But this isn't a relational database. So the JDO/JPA model doesn't quite fit and GAE can't support all of the JDO/JPA spec. In other words: some stuff will work as you expect. Some stuff will work, but not as you expected. Some stuff won't work at all.


A new world comes with new paradigms
What does a well-structured, well-designed webapp look like in the GAE world? I don't know. There are a ton of established Java webapp architecture/design paradigms and patterns--and mature frameworks to implement or enforce those patterns--that might no longer make sense in the GAE world. I'm used to the Hibernate-Spring-Struts trinity of super-bloated Java enterprise coding. But what are the common best practices for GAE? Who are the thought leaders, the expert architects to guide the rest of us code monkeys?


Did I mention you're trapped?
There's really no backing out of GAE once you commit to this route. GAE is its own world so your code only makes sense in that world. There's no portability. There's no, "we'll just deploy the site elsewhere if it doesn't work out." That's part of the reason why I say it feels like a violation of Java's spirit. "Write once, run anywhere!" was the battle cry. No longer. Not in the GAE world.


GAE is sounding less and less appealing, but...
Oh, I agree. The benefits of GAE would have to be pretty damn compelling to offset all of these negatives/challenges/frustrations.

And, for now at least, I think the benefits of GAE do outweigh those issues.

Here's what it comes down to for me: the worst-case scenarios for traditional Web hosting, VPS, or colo-ing our own server(s) are really, really bad. Downtime. Lost data. Furious customers. Impossible machine configuration and restoration complexities.

I don't have the expertise to tackle these issues with confidence, nor am I interested in the least in acquiring that expertise.

And for a company that currently consists of one full-time programmer, a part-time Flash developer, and a part-time COO, it would be absurd to bring in another person just to babysit the hardware and work out the disaster recovery mechanisms.

The fact that GAE all but eliminates the worst-case scenarios and frees me to worry about the code (and a thousand other things not having to do with servers!) gives GAE a huge edge.

The decision is simple. I cannot accept the inevitable server disasters. So I must accept the frustrations and code limitations inherent to GAE.

Barring an insurmountable tech hurdle, Google App Engine wins.


P.S. What about that OpenOffice requirement? Well, it turns out that Google Docs has a reasonably good API. It'll take some testing, but it should be able to replace the functionality that we were relying on from OpenOffice.