NoSQL databases are all the rage lately, with some touting them as the demise of the traditional RDBMS. I don’t share that view, but thought it worthwhile to begin exploring the new technologies. A coworker of mine had gone through a seven-week online course for MongoDB through MongoDB University (https://university.mongodb.com/) and was very happy with the outcome, so I thought I’d give it a shot as well.
The course was free and took 2–4 hours a week to complete. It consisted of video “lectures” with ungraded quizzes at the end to help reinforce the material, and each week had a few homework assignments to turn in. Some of the material was dated, but they made a strong effort to point out those areas. I was able to complete the course with a minimal amount of effort, yet felt very good about the knowledge I’d gained.
So why MongoDB? It was created by a group of developers, so one of the first attractions of the technology is that you interact with the data through JavaScript. A developer doesn’t have to worry about learning SQL; they can stick with their “native” language. Additionally, all data is stored as JSON (JavaScript Object Notation) documents, a format most programmers are already familiar with. The underlying data structure of the “database” more closely resembles OO programming constructs (a good article here). Another attraction is that interaction with the data is “schema-less”: documents in the same collection aren’t required to share a fixed structure, so there is no rigid schema to adhere to when storing or retrieving information. But one of the main advantages of MongoDB is that it scales well horizontally, without the need to purchase high-end hardware. This is ideal for dealing with “Big Data” (yes, I used the buzzword).
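To make that concrete, here’s a minimal sketch of what working with the data looks like from the mongo shell, which is just a JavaScript interpreter. The database, collection, and field names below are made up purely for illustration:

```javascript
// Insert two documents into a "posts" collection -- note they don't share
// the same structure, since the collection doesn't enforce a schema.
db.posts.insert({ title: "Intro to MongoDB", author: "Todd", tags: ["nosql", "mongodb"] });
db.posts.insert({ title: "Sharding Basics", views: 42 });

// Query with a JSON-style filter instead of SQL.
db.posts.find({ author: "Todd" });

// Add a field to an existing document that didn't have it before.
db.posts.update({ title: "Sharding Basics" }, { $set: { author: "Todd" } });
```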
How does MongoDB scale on commodity hardware? It uses a concept called “sharding”. Think “partitioning”. Data is physically separated by a “shard key”, and access is managed through a central routing process (mongos). Each server contains a subset of the data, “partitioned” across the nodes: node one might hold IDs 1 through 1,000,000, node two 1,000,001 through 2,000,000, node three 2,000,001 through 3,000,000, and so on.
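As a rough sketch, turning on sharding looks something like the following from the mongo shell, connected to a mongos router (the database, collection, and shard key names are hypothetical):

```javascript
// Enable sharding for the "blog" database.
sh.enableSharding("blog");

// Shard the "posts" collection on a "userId" field; MongoDB splits the data
// into chunks by this key and distributes the chunks across the shards.
sh.shardCollection("blog.posts", { userId: 1 });

// Check how chunks are distributed across the shards.
sh.status();
```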
In addition, MongoDB uses the concept of “replica sets” (think “replication”). A replica set is a group of servers that each maintain a copy of the data; read-only secondaries help distribute load, and the redundancy maintains high availability.
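Standing up a replica set is similarly straightforward. A minimal sketch, with made-up host names, assuming each mongod was started with --replSet rs0:

```javascript
// Run on one of the members to initiate the set.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db1.example.com:27017" },  // becomes the primary
    { _id: 1, host: "db2.example.com:27017" },  // secondary
    { _id: 2, host: "db3.example.com:27017" }   // secondary
  ]
});

// Verify the state of the set.
rs.status();
```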
A blending of the two configurations creates a highly available, high-performing environment.
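In practice, that blending means each shard is itself a replica set. A hedged sketch of wiring one in, again with hypothetical host names, connected to a mongos router:

```javascript
// Add a shard that is backed by the "rs0" replica set; mongos will route
// reads and writes for that shard's chunks to the set's members.
sh.addShard("rs0/db1.example.com:27017,db2.example.com:27017,db3.example.com:27017");
```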
Servers can be added or removed without interruption of service, which makes this an inexpensive way to scale horizontally and simplifies maintenance while preserving high availability. I can do a “rolling” patch or upgrade of the servers while still allowing access to the data, and I’m not completely down if a server goes offline.
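For example, a rolling upgrade of a replica set amounts to patching the secondaries one at a time, then asking the primary to step down so it can be patched last. A sketch of that final step, run against the current primary:

```javascript
// Hand off the primary role so this server can be taken offline for patching.
// The secondaries hold an election and one of them takes over, so the data
// stays available throughout.
rs.stepDown(120);  // this node won't seek re-election for 120 seconds
```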
This was part 1 of a series of blog posts on MongoDB, focusing on a brief overview of the technology and its architectural advantages. Stay tuned for more posts.