What is MongoDB?
MongoDB is a document database designed to be agile, performant, scalable, and highly available for today’s applications and their data.
Things are a little different in MongoDB. Rather than the columns, rows, and tables that we find in an RDBMS, MongoDB has fields, documents, and collections. A document is similar to a row: it holds field-value pairs, where a value can be any of the supported data types, an array, or even another document. This promotes the use of rich hierarchical documents that support fast data access patterns.
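To make the document model concrete, here is a minimal sketch that represents a document as a plain Python dictionary. The customer fields and the get_path helper are hypothetical and exist only for illustration; the helper imitates the dotted-path style of addressing nested fields and is not part of any MongoDB driver.

```python
# A hypothetical "customer" document, modelled as a plain Python dict.
customer = {
    "_id": 1001,
    "name": "Example Customer",
    "emails": ["home@example.com", "work@example.com"],  # array value
    "address": {                                         # embedded document
        "city": "London",
        "postcode": "N1 9GU",
    },
    "orders": [                                          # array of embedded documents
        {"sku": "BOOK-1", "qty": 2},
        {"sku": "PEN-7", "qty": 10},
    ],
}


def get_path(document, dotted_path):
    """Follow a dotted path such as "address.city" through nested dicts and lists."""
    current = document
    for part in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(part)]  # numeric parts index into arrays
        else:
            current = current[part]
    return current


city = get_path(customer, "address.city")
second_sku = get_path(customer, "orders.1.sku")
```

Everything an application needs about this customer sits in one document, so a single read can return the address and the orders together instead of joining several tables.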
High availability is provided in MongoDB by running a replica set. A replica set is a group of MongoDB servers, usually three, each containing a copy of our data. Each member has a role of primary or secondary, and any writes to the primary are replicated to the secondaries. Because the members are in constant communication with each other, they can detect a failure of the primary and elect a new primary to take its place. Drivers are aware of the replica set topology and can connect to the new primary. When the failed server is back online, it can rejoin the replica set, catch up, and continue to provide high availability. Replica sets also play a critical role in durability with write concern, read replicas, and change streams, but we won’t be touching on these topics in this post.
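The failover idea can be illustrated with a toy model. The sketch below is a deliberate simplification and not MongoDB's actual election protocol: the Member class, the last_applied counter, and the selection rule are all assumptions made for illustration. It only captures two ideas from the paragraph above: a new primary needs a majority of members to be reachable, and a secondary that has caught up is a sensible candidate.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool
    last_applied: int  # position of the last replicated write this member holds


def elect_primary(members):
    """Return the name of the new primary, or None if no majority is reachable."""
    healthy = [m for m in members if m.healthy]
    # A primary can only be elected while a strict majority of members is up.
    if len(healthy) * 2 <= len(members):
        return None
    # Toy rule: prefer the healthy member holding the most recent write.
    return max(healthy, key=lambda m: m.last_applied).name


replica_set = [
    Member("node-a", healthy=False, last_applied=120),  # the failed primary
    Member("node-b", healthy=True, last_applied=120),
    Member("node-c", healthy=True, last_applied=118),
]

new_primary = elect_primary(replica_set)
```

With three members, the set survives the loss of any one of them. If two are down, elect_primary returns None, which mirrors why three is the usual minimum size.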
When we reach the limits of what a single server can deliver in terms of resources, or the database becomes so large that a recovery would exceed our Recovery Time Objective (RTO), we can start to scale MongoDB horizontally. MongoDB scales horizontally by using sharding. Each shard of a sharded cluster is a replica set that contains a subset of the data of the sharded collection. To shard the data itself, we choose a field or a set of fields to act as the shard key. At its most basic, we can then leave it to MongoDB to route traffic to the correct shards and balance the collection between them.
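As a rough illustration of how a shard key decides where a document lives, the sketch below routes documents by key range. The customer_id shard key, the split points, and the shard names are hypothetical, and the lookup is a toy stand-in for the routing MongoDB performs internally, not its real implementation.

```python
from bisect import bisect_right

# Hypothetical chunk layout for a collection sharded on "customer_id".
# Each split point is the inclusive lower bound of the next chunk.
SPLIT_POINTS = [1000, 2000, 3000]
CHUNK_OWNERS = ["shard-a", "shard-b", "shard-c", "shard-a"]  # one owner per chunk


def route(document, shard_key="customer_id"):
    """Return the shard owning the chunk that covers the document's shard key."""
    chunk_index = bisect_right(SPLIT_POINTS, document[shard_key])
    return CHUNK_OWNERS[chunk_index]


low = route({"customer_id": 42})
boundary = route({"customer_id": 1000})
high = route({"customer_id": 9999})
```

A query that includes the shard key can be sent to a single shard in this model, while a query without it would have to ask every shard. That is why the choice of shard key matters so much.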
There are more advanced methods that can be applied to improve performance further. Aside from the additional cost of the shard replica sets, a sharded cluster requires extra components: a config server replica set and MongoDB’s query router, mongos. MongoDB also supports joins (through the $lookup aggregation stage) and multi-document transactions. Although most designs will steer away from these features, sometimes they are a necessary trade-off against performance.
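To show what a join looks like in this model, the sketch below writes a $lookup stage as a Python dictionary and then emulates its basic effect in memory. The orders and customers collections and their fields are hypothetical, and the lookup function is a simplified imitation for illustration only; it does not cover every behaviour of the real stage.

```python
# The stage as it might appear in an aggregation pipeline run on "orders".
LOOKUP_STAGE = {
    "$lookup": {
        "from": "customers",          # collection to join with
        "localField": "customer_id",  # field in the input documents
        "foreignField": "_id",        # field in the "from" collection
        "as": "customer",             # name of the output array field
    }
}


def lookup(input_docs, from_docs, local_field, foreign_field, as_field):
    """Attach to each input document the list of matching documents from from_docs."""
    joined = []
    for doc in input_docs:
        matches = [f for f in from_docs if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


orders = [
    {"_id": 1, "customer_id": 7, "total": 30},
    {"_id": 2, "customer_id": 8, "total": 12},
]
customers = [{"_id": 7, "name": "Example Customer"}]

spec = LOOKUP_STAGE["$lookup"]
result = lookup(orders, customers, spec["localField"], spec["foreignField"], spec["as"])
```

Every order is kept, and an order with no matching customer simply gets an empty array. The nested scan in this toy version also hints at why joins cost more than reading one self-contained document.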
Using MongoDB vs Using a Relational DB
For a developer who is not fully versed in the normalization concepts of an RDBMS, MongoDB makes it easy to get started: no complex relations or normalization are required, and later changes to the schema often don’t require time-consuming migrations. MongoDB is also easy to scale, so a database that starts small can grow with you as your application grows. MongoDB provides high-speed access to complex objects, which suits OLTP and general-purpose workloads.
Development can be accelerated with MongoDB, which uses application-defined schemas. With a connection string and write permission, you can write data to a collection right away; there is no need to create the collection and define types in the database beforehand, as a rigid RDBMS requires. Idiomatic drivers let developers write code in the style they are used to, which keeps the code clean and developers focused. In short, MongoDB is developer friendly, making it easy to get started and then scale your solution as required.
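One way to picture an application-defined schema is to describe the document shape in application code and convert it to a plain dictionary before handing it to a driver. The sketch below assumes hypothetical Customer and Address classes and a to_document helper; it does not talk to a database and is only one of many possible patterns.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    city: str
    postcode: str


@dataclass
class Customer:
    name: str
    address: Address
    tags: list = field(default_factory=list)

    def to_document(self):
        """Validate in the application, then convert to a plain nested dict."""
        if not self.name:
            raise ValueError("name must not be empty")
        return asdict(self)


document = Customer("Example Customer", Address("London", "N1 9GU")).to_document()
```

Adding a field later means changing the class rather than running a migration against the database, although documents already stored keep their old shape until the application rewrites them.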
Conclusions
As with any technology, it is important to educate yourself on the best practices by reading the documentation, investing in training, or working with a trusted partner to get the best out of this fantastic database.
We look forward to posting more on MongoDB’s features, how to use and configure it, and a host of other topics.