21/01/2010

Episode 2: A brief introduction to NoSQL databases

In our second episode (12 minutes long), Alex and Nat talk about the new generation of “NoSQL” databases that have created a lot of interest among web developers, especially those lucky people dealing with thousands of simultaneous users and terabytes of data.

Please feel free to leave a comment below after you’ve listened to the episode. We’re still total newbies at this podcasting thing, so your feedback and encouragement are a big help!

If you want to learn more about NoSQL than what we covered in the show, check out these links:

Nice introduction to all the basic concepts: consistency models, replication, vector clocks.
A comparison of NoSQL alternatives and a good braindump of the subject matter.
Amazon Dynamo paper. Great readable paper introducing the core concepts for massively scalable datastores.
BigTable paper. Another cornerstone paper.
How FriendFeed uses MySQL to store schema-less data

The Big Guys:

Voldemort
Cassandra
HBase — We didn’t get to this one, but it’s modelled on BigTable, and can replicate across geographically separated datacenters (Cassandra needs faster roundtrips). It is part of the Hadoop ecosystem and stores its data on HDFS.

Midsized:

MongoDB — Great for storing JSON objects.
CouchDB — Erlang-based, uses JavaScript as a query language.

Niche:

Redis — memcached with persistence and useful list/set/ordered-set datatypes.
Redis Twitter implementation — simple example of building a Twitter-like system on top of Redis.

Underlying Technology

Consistent Hashing.
Vector Clocks — See section 4.4 in the Amazon Dynamo paper.
Important relationship between Consistency, Availability and Partition Tolerance, called the CAP Theorem.
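
To make the first two ideas a bit more concrete, here is a minimal Python sketch. It is only an illustration under our own assumptions, not how Dynamo, Voldemort or Cassandra actually implement them: the `HashRing` class, the `replicas` parameter, the choice of MD5, and the `increment`/`descends`/`compare` helpers are all hypothetical names made up for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, to even out the load
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = _hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def lookup(self, key):
        # Wrap around to the start of the ring if we run off the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


def increment(clock, node):
    """Return a copy of the vector clock with this node's counter bumped."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as 'equal', 'after', 'before' or 'conflict'."""
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "after"
    if descends(b, a):
        return "before"
    return "conflict"  # concurrent writes: the application has to reconcile


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))

    v1 = increment({}, "x")
    v2 = increment(v1, "y")   # written via node y, after seeing v1
    v3 = increment(v1, "z")   # written via node z, also after seeing v1
    print(compare(v2, v1), compare(v2, v3))
```

The point of the ring is that removing a node only moves the keys that node owned; everything else stays put. The point of the clocks is that two versions that each saw an update the other missed are flagged as a conflict rather than silently overwritten.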

The image above is a picture of a Google datacenter in Oregon, where they no doubt run BigTable.

Show: Hacker Medley
Published: January 21, 2010, 1:29 UTC
Rating: Clean
