Hacker Monthly #27 by Netizens Media

Hacker Monthly #27 by Netizens Media

Author:Netizens Media [Media, Netizens]
Language: eng
Format: epub, mobi, pdf
Publisher: Netizens Media
Published: 2012-08-01T07:52:01+00:00


Can you see where this is headed? You have just solved the problem of which server to use for resource A. You start where resource A is and head clockwise on the ring until you hit a server. If that server is down, you go to the next one, and so on and so forth. In practice, you’ll want to use more than 1 or 2 points for each server, but I’ll leave those details as an exercise for you, dear reader.

Now, allow me to use bullet points to explain how cool this is:

• Assuming you’ve used a lot more than 1 point per server, when one server goes down, every other server will get a share of the new load. In the case above, imagine what happens when server #2 goes down. Resource A shifts to server #1, and resource B shifts to server #3 (Note that this won’t help if all of your servers are already at 100% capacity. Call your VC and ask for more funding).

• You can tune the amount of load you send to each server based on that server’s capacity. Imagine this spatially: more points for a server means it covers more of the ring and is more likely to get more resources assigned to it.

• You could have a process try to tune this load dynamically, but be aware that you’ll be stepping close to problems that control theory was built to solve. Control theory is more complicated than consistent hashing.

• If you store your server list in a database (2 columns: IP address and number of points), you can bring servers online slowly by gradually increasing the number of points they use. This is particularly important for services that are disk bound and need time for the kernel to fill up its caches. This is one way to deal with the datacenter variant of the Thundering Herd Problem [hn.my/thunder].

• Here I go again with the control theory — you could do this automatically. But adding capacity usually happens so rarely that just having somebody sitting there watching top and running SQL updates is probably fine. Of course, EC2 changes everything, so maybe you’ll be hitting the books after all.

• If you are really clever, when everything is running smoothly you can go ahead and pay the cost of storing items on both their primary and secondary cache servers. That way, when one server goes down, you’ve probably got a backup cache ready to go.

Pretty cool, eh?

I want to hammer on point #4 for a minute. If you are building a big system, you really need to consider what happens when machines fail. If the answer is “we crush the databases,” congratulations: you will get to observe a cascading failure. I love this stuff, so hearing about cascading failures makes me smile. But it won’t have the same effect on your users.

Finally, you may not know this, but you use consistent hashing every time you put something in your cart at Amazon.com. Their massively scalable data store, Dynamo [hn.my/dynamo], uses this technique.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.