Amazon Web Services - designing for failure when polling/retrying/queueing is not possible
I'm designing a website/web service hosted in the cloud (specifically AWS, although that's irrelevant), and I'm spending a lot of time thinking about "designing for failure". I want the system to handle node failures seamlessly, i.e. without significant user impact or engineer intervention.
In some cases, it's easy to see how to handle a sudden node failure. If the app has an API handled by 4 servers behind a load balancer, and that API is polled by AJAX or an iPhone app, the poller can detect a failed TCP/IP transmission and retry. Assuming the load balancer behaves correctly, the retry will hit a healthy instance.
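A minimal sketch of the client-side retry described above, in Python since the question has no code of its own. `call_with_retry` and `flaky_api` are hypothetical names, and the exponential backoff with "full jitter" is an assumed policy, not something the question specifies. The `sleep` parameter is injectable only so the demo and tests don't actually wait.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.1,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying transport failures with exponential backoff.

    Each retry is a fresh request, so a load balancer in front of several
    instances can route it to a healthy node.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # "Full jitter": wait a random time up to base_delay * 2^(attempt - 1)
            # so many clients retrying at once don't all hit the LB together.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_api() -> str:
        # Simulates a node dying mid-request on the first two calls.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionResetError("connection reset by peer")
        return "200 OK"

    print(call_with_retry(flaky_api, sleep=lambda _: None), "after", calls["n"], "attempts")
```

Only transport-level errors are retried here; retrying non-idempotent requests (e.g. a POST that may have reached the server before the node died) needs more care.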
If the app is more processing-oriented, a queue service such as SQS can be used to let stateless nodes pick up where failed nodes left off.
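The reason SQS lets a surviving node "pick up where a failed node left off" is the visibility timeout: a received message is hidden for a while, and if the worker never deletes it (because it crashed), it reappears for other workers. The toy in-memory queue below only mimics that behaviour; it is not the SQS API, `VisibilityQueue` is a hypothetical name, and time is passed in explicitly (`now`) to keep it deterministic. It also simplifies receipts to message ids, whereas real SQS issues a new receipt handle on each receive.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Message:
    id: int
    body: str
    visible_at: float = 0.0  # message is hidden from receivers until this time


class VisibilityQueue:
    """Toy in-memory queue mimicking SQS's visibility timeout (not the SQS API).

    A received message is hidden for `visibility_timeout` seconds. If the worker
    deletes it in time, it is gone; if the worker dies first, the message becomes
    visible again and another worker can pick it up.
    """

    def __init__(self, visibility_timeout: float) -> None:
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, _Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        msg_id = next(self._ids)
        self._messages[msg_id] = _Message(msg_id, body)

    def receive(self, now: float) -> Optional[Tuple[int, str]]:
        """Return (receipt, body) for the oldest visible message and hide it."""
        for msg in self._messages.values():  # dicts keep insertion order
            if msg.visible_at <= now:
                msg.visible_at = now + self.visibility_timeout
                return msg.id, msg.body
        return None

    def delete(self, receipt: int) -> None:
        """Acknowledge that processing finished; the message will not reappear."""
        self._messages.pop(receipt, None)


if __name__ == "__main__":
    q = VisibilityQueue(visibility_timeout=30)
    q.send("resize image 42")
    print("worker A got:", q.receive(now=0))       # A crashes without deleting
    print("worker B at t=10:", q.receive(now=10))  # still hidden, returns None
    job = q.receive(now=31)                        # timeout expired, visible again
    print("worker B at t=31:", job)
    if job is not None:
        q.delete(job[0])                           # B finishes and acknowledges
```

Because a message can be delivered more than once this way, the processing step itself should be idempotent.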
The difficulty I see is with "points of entry", where no retry/polling is possible because the application hasn't been loaded yet, so a failure means the app never starts. For example, index.html on a web page: if a node fails while transmitting that file, the user's browser will hang and not automatically retry (the user would need to refresh).
The load balancer is also a single "point of entry/failure". However, in that case it appears the problem can be solved by creating multiple load balancers and balancing across them with DNS load balancing, as described here: http://blog.rightscale.com/2012/10/23/dns-load-balancing-and-using-multiple-load-balancers-in-the-cloud/
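With DNS load balancing, one hostname resolves to several addresses (one per load balancer), and redundancy depends on the client trying another address when one fails. The sketch below shows that client behaviour explicitly; `resolve_all` and `first_reachable` are hypothetical helpers, and the 192.0.2.x addresses are documentation placeholders. Python's own `socket.create_connection` already walks the resolved addresses in turn; whether a given browser does the same on failure is something to verify rather than assume.

```python
import socket
from typing import Callable, List, Sequence, Tuple, TypeVar

Addr = Tuple[str, int]
T = TypeVar("T")


def resolve_all(host: str, port: int) -> List[Addr]:
    """Every distinct (ip, port) DNS returns for `host`, e.g. one per load balancer."""
    addrs: List[Addr] = []
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def first_reachable(addrs: Sequence[Addr], connect: Callable[[Addr], T]) -> T:
    """Try each address in order and return the first successful connection."""
    errors = []
    for addr in addrs:
        try:
            return connect(addr)
        except OSError as exc:  # refused, reset, timed out, unreachable...
            errors.append((addr, exc))
    raise ConnectionError(f"all {len(addrs)} endpoints failed: {errors}")


if __name__ == "__main__":
    # Offline demo: pretend DNS returned two load balancers and the first is down.
    lbs = [("192.0.2.10", 80), ("192.0.2.11", 80)]

    def fake_connect(addr: Addr) -> str:
        if addr[0] == "192.0.2.10":
            raise ConnectionRefusedError("LB 1 is down")
        return f"connected to {addr[0]}"

    print(first_reachable(lbs, fake_connect))
```

For real use, the same helper would be called as `first_reachable(resolve_all(host, 80), lambda a: socket.create_connection(a, timeout=3))`. Note this only helps clients that actually fall back; DNS caching and TTLs also affect how quickly a dead load balancer stops being handed out.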
Would that solution also work for the simpler index.html case? More generally, how can I create redundancy where polling/retrying/queuing is not possible?
Edit: One idea is to host index.html statically on a CDN, S3, etc. (where resource availability is more dependable), although this prevents using dynamic content. Dynamic content could be added if the page populates itself using JS, but that adds a dependency on JS and extra latency for the user.
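The edit's idea amounts to "static first, dynamic if available": the page ships with default content, and the JS overlays fresh values only if the API answers in time, so an API failure degrades the page instead of breaking it. The logic is sketched in Python here to stay in one language, though on a real page it would live in the JS; `build_page_data`, the field names, and the 2-second timeout are all illustrative assumptions.

```python
from typing import Callable, Dict, Mapping, Tuple


def build_page_data(
    static_defaults: Mapping[str, str],
    fetch_dynamic: Callable[[float], Mapping[str, str]],
    timeout: float = 2.0,
) -> Tuple[Dict[str, str], bool]:
    """Start from content shipped in the static index.html and overlay dynamic
    values if the (hypothetical) API responds in time.

    Returns (data, used_dynamic).
    """
    data = dict(static_defaults)
    try:
        data.update(fetch_dynamic(timeout))
        return data, True
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts (TimeoutError is a
        # subclass); ValueError covers malformed responses such as bad JSON.
        # Either way the static content is still shown.
        return data, False


if __name__ == "__main__":
    defaults = {"greeting": "Welcome", "news": "News is temporarily unavailable"}

    def api_down(timeout: float) -> Mapping[str, str]:
        raise TimeoutError(f"no answer within {timeout}s")

    print(build_page_data(defaults, lambda t: {"news": "3 new items"}))
    print(build_page_data(defaults, api_down))
```

The timeout bounds the extra latency the edit worries about: the user sees the static content immediately and waits at most `timeout` seconds before the page settles on it.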