EssayTagger is a web-based tool to help teachers grade essays faster.
But it is not an auto-grader.

This blog will cover EssayTagger's latest feature updates as well as musings on
education, policy, innovation, and preserving teachers' sanity.

Friday, October 26, 2012

Site down: Google's servers experiencing problems

Friday, October 26th, 9:53am: 
The site is currently down. We run on Google's "App Engine" (GAE) infrastructure, which is experiencing problems that render our site -- and other prominent App Engine sites like KhanAcademy.org -- inaccessible.

Google App Engine status can normally be viewed here, but even the status page is failing to respond.

Needless to say, this is inconvenient but also rare; Google's infrastructure is among the best in the world and they rarely see interruptions of their App Engine service.

Interestingly, the google.com search service is still functioning (as is blogger.com -- as evidenced by this post being publishable!). It's not surprising that they'd have a separate set of servers for their core business.

Follow the latest updates via the #GAE hashtag on Twitter, or on my own Twitter account.

UPDATE 11:10am
EssayTagger.com has begun to respond again, but service is intermittent. Google App Engine is not yet stable.

UPDATE 11:35am
From Google's Max Ross: "At approximately 7:30am Pacific time this morning, Google began experiencing slow performance and dropped connections from one of the components of App Engine.  The symptoms that service users would experience include slow response and an inability to connect to services.  We currently show that a majority of App Engine users and services are affected.  Google engineering teams are investigating a number of options for restoring service as quickly as possible, and we will provide another update as information changes, or within 60 minutes."

UPDATE 12:51pm
From Google's Christina Ilvento: "We are continuing work to correct the ongoing issues with App Engine.  Operation has been restored for some services, while others continue to see slow response times and elevated error rates.  The malfunction appears to be limited to a single component which routes requests from users to the application instance they are using, and does not affect the application instances themselves.

We’ll post another status update as more information becomes available, and/or no later than one hour from now."

EssayTagger.com is now responding more consistently. Cautiously optimistic that we're through the worst of it.

UPDATE 1:45pm
The App Engine status board is looking better. The error spike is returning to more sane levels but the system is still in an "elevated" problem state.

EssayTagger.com performance is still a little unpredictable, with intermittent reports of documents that couldn't be uploaded to the system. We rely on Google Docs under the hood to process incoming documents, so even if our site is working, this integration point with Google might still see issues.
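For the technically curious: a common way to cope with an upstream service that fails intermittently is to retry the call with exponential backoff and random jitter. The Python below is only a minimal sketch of that general technique, not EssayTagger's actual code. The names `TransientUploadError` and `retry_with_backoff`, and the default delays, are hypothetical and chosen purely for illustration.

```python
import random
import time


class TransientUploadError(Exception):
    """Hypothetical error: a call to the upstream document service failed."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    All parameter names and defaults are illustrative assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientUploadError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Double the wait ceiling after each failure, capped at max_delay,
            # then wait a random time below it so that many clients do not
            # all retry at the same instant.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))
```

The random jitter matters during an outage like this one: if every client retried on the same fixed schedule, the retries themselves would arrive in waves and add to the load on a service that is trying to recover.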

UPDATE 2:07pm
From Google's Christina Ilvento: "At this point, we have stabilized service to App Engine applications. App Engine is now successfully serving at our normal daily traffic level, and we are closely monitoring the situation and working to prevent recurrence of this incident.

This morning around 7:30AM US/Pacific time, a large percentage of App Engine’s load balancing infrastructure began failing. As the system recovered, individual jobs became overloaded with backed-up traffic, resulting in cascading failures. Affected applications experienced increased latencies and error rates. Once we confirmed this cycle, we temporarily shut down all traffic and then slowly ramped it back up to avoid overloading the load balancing infrastructure as it recovered. This restored normal serving behavior for all applications. 

We’ll be posting a more detailed analysis of this incident once we have fully investigated and analyzed the root cause."

So in theory EssayTagger.com and all other affected websites should be back to full power.
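
For readers wondering what "slowly ramped it back up" might look like in practice, here is a toy Python sketch of the general idea: admit only a fraction of waiting requests at first, then raise that fraction step by step until all traffic is flowing again. Google has not published how it actually did this, so the function names, the linear schedule, and the starting fraction are all assumptions made for illustration.

```python
def ramp_schedule(steps, start_fraction=0.1):
    """Return the fraction of traffic to admit at each step, ending at 1.0.

    A linear ramp is assumed here purely for illustration.
    """
    if steps < 2:
        return [1.0]
    increment = (1.0 - start_fraction) / (steps - 1)
    return [round(start_fraction + i * increment, 4) for i in range(steps)]


def admitted(requests, fraction):
    """Admit roughly `fraction` of the queued requests and shed the rest."""
    keep = int(len(requests) * fraction)
    return requests[:keep]


if __name__ == "__main__":
    queue = list(range(10))  # ten hypothetical waiting requests
    for fraction in ramp_schedule(4, start_fraction=0.25):
        print(fraction, len(admitted(queue, fraction)))
```

The point of ramping rather than reopening everything at once is the one Google's statement makes: backed-up traffic hitting a freshly recovered component all at once can overload it and restart the failure cycle.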