Free Wins in Storage and Read Speed: Flat Schemas vs Structured Schemas

When developers or admins who grew up with the “relational way” move to MongoDB, they often design flat documents. That habit makes sense, since relational modeling trains you to think in two dimensions, with data spread across tables.

MongoDB stores data as BSON documents. BSON is a binary serialization format similar to JSON, with a few differences. Because a BSON document can contain other documents and arrays, schemas can have multiple levels. You can read more in the BSON specification and its differences from JSON.

A MongoDB document is a set of key and value pairs. A value can be any BSON type, including nested documents, arrays, or arrays of documents.

Using nested documents or arrays lets you model a structured schema, where one field groups related details. This is an alternative to a flat schema.

Consider the same user record expressed both ways:

[Figure: schema comparison of flatUser and structuredUser]

Both versions hold identical information. In flatUser everything sits on one level. In structuredUser fields are nested to reflect related data.
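Only the name fields of the two documents are spelled out in the text (the full schemas are in the figure), so the sketch below uses just those. It assumes a hypothetical helper, nest, that rebuilds the structured shape from the flat one by treating each "_" in a key as a nesting separator. That assumption only holds when no real field name contains an underscore.

```python
def nest(flat: dict) -> dict:
    """Hypothetical helper: turn underscore-delimited keys into embedded documents."""
    structured: dict = {}
    for key, value in flat.items():
        parts = key.split("_")
        node = structured
        # Create (or reuse) one embedded document per prefix segment.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return structured


flat_user = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
structured_user = nest(flat_user)
print(structured_user)
# {'name': {'first': 'john', 'last': 'smith', 'middle': 'oliver'}}
```

The helper moves data between shapes without losing any values, which is the sense in which both versions hold identical information.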

Why pick structured instead of flat? The short answer is that structured schemas can use less disk space and they can be quicker to traverse. To see why, it helps to recall how BSON is laid out.

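For our purpose, think of a BSON document as a list of items, one per field and value. Each item holds a one-byte type, the field name as a string, a four-byte length for variable-sized values such as strings and embedded documents, and the serialized value bytes. To keep the arithmetic simple, the byte counts below ignore the null byte BSON appends to each field name and string, and the terminator byte that closes each document. In a picture, it looks like this: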
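To see the real layout, including the null terminators that the simplified accounting leaves out, here is a minimal sketch of a BSON encoder following the BSON specification. It supports only two value types (UTF-8 strings, type 0x02, and embedded documents, type 0x03); the function names are illustrative and this is not a substitute for a driver's encoder.

```python
import struct


def encode_element(key: str, value) -> bytes:
    """Encode one BSON element; only strings and embedded documents are supported."""
    # Field name: UTF-8 bytes followed by a null terminator (a BSON "cstring").
    name = key.encode("utf-8") + b"\x00"
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # 0x02 = string; the little-endian int32 length includes the trailing null.
        return b"\x02" + name + struct.pack("<i", len(data)) + data
    if isinstance(value, dict):
        # 0x03 = embedded document; the nested document carries its own int32 size.
        return b"\x03" + name + encode_document(value)
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def encode_document(doc: dict) -> bytes:
    """Encode a document: int32 total size, the elements, then a 0x00 terminator."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(encode_document({"hello": "world"}))
# b'\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
```

Encoding the three name fields from the tables below this way gives 71 bytes for the flat document and 67 bytes for the structured one. That 4-byte gap is a little smaller than the simplified model's 6 bytes, because the extra name key and the embedded document each add a null byte, but it points the same way.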

[Figure: BSON document layout]

Now let’s compare storage for the user’s name.

In flatUser, the storage table looks like this:

| Field and value | Type | Field name | Field length | Field data | Total |
|---|---|---|---|---|---|
| name_first: “john” | 1 byte | 10 bytes | 4 bytes | 4 bytes | 19 bytes |
| name_last: “smith” | 1 byte | 9 bytes | 4 bytes | 5 bytes | 19 bytes |
| name_middle: “oliver” | 1 byte | 11 bytes | 4 bytes | 6 bytes | 22 bytes |

Summing the totals, the flat approach spends 60 bytes storing the user’s name.
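The per-row arithmetic in the table is a simple cost model, sketched below as a function. The name simple_string_cost is illustrative, and like the table it deliberately ignores BSON's null terminators.

```python
def simple_string_cost(key: str, value: str) -> int:
    """Simplified size of one string field, as in the table:
    1 type byte + field-name bytes + 4-byte length + value bytes."""
    return 1 + len(key.encode("utf-8")) + 4 + len(value.encode("utf-8"))


flat_name = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
costs = {k: simple_string_cost(k, v) for k, v in flat_name.items()}
print(costs)                # {'name_first': 19, 'name_last': 19, 'name_middle': 22}
print(sum(costs.values()))  # 60
```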

For structuredUser, split the accounting into two tables. The first covers the fields inside the embedded document that holds the name. The second covers the name field itself, whose value is that embedded document.

First table, the fields inside the value of name:

| Field and value | Type | Field name | Field length | Field data | Total |
|---|---|---|---|---|---|
| first: “john” | 1 byte | 5 bytes | 4 bytes | 4 bytes | 14 bytes |
| last: “smith” | 1 byte | 4 bytes | 4 bytes | 5 bytes | 14 bytes |
| middle: “oliver” | 1 byte | 6 bytes | 4 bytes | 6 bytes | 17 bytes |

Those entries add up to 45 bytes, which is the size of the value stored under name. Now the second table, for the name field itself:

| Field and value | Type | Field name | Field length | Field data | Total |
|---|---|---|---|---|---|
| name: { … } | 1 byte | 4 bytes | 4 bytes | 45 bytes | 54 bytes |

In total, the structured approach uses 54 bytes for the user’s name, compared with 60 bytes for the flat approach.

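The difference comes from the field names. The flat design spends 30 bytes on field names (10 + 9 + 11), while the structured design spends 19 bytes (4 + 5 + 4 + 6), because the shared prefix is stored once, as the key name, instead of being repeated as “name_” in three keys. The embedded document adds 5 bytes of its own (its type byte and four-byte length), so the net saving is 60 - 54 = 6 bytes.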
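Both accountings, and the field-name totals behind the difference, can be checked with the same simplified model extended recursively to embedded documents. The function names are illustrative, and the model ignores null terminators exactly as the tables do.

```python
def simple_cost(key: str, value) -> int:
    """Simplified model: 1 type byte + field-name bytes + 4-byte length + payload.
    An embedded document's payload is the sum of its fields' costs."""
    if isinstance(value, dict):
        payload = sum(simple_cost(k, v) for k, v in value.items())
    else:
        payload = len(value.encode("utf-8"))
    return 1 + len(key.encode("utf-8")) + 4 + payload


def key_bytes(doc: dict) -> int:
    """Bytes spent on field names at every nesting level."""
    total = 0
    for k, v in doc.items():
        total += len(k.encode("utf-8"))
        if isinstance(v, dict):
            total += key_bytes(v)
    return total


flat = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
structured = {"name": {"first": "john", "last": "smith", "middle": "oliver"}}

for label, doc in (("flat", flat), ("structured", structured)):
    size = sum(simple_cost(k, v) for k, v in doc.items())
    print(label, size, key_bytes(doc))
# flat 60 30
# structured 54 19
```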

When the two complete documents are stored in MongoDB, the flat version takes 403 bytes and the structured version 307 bytes. That is a saving of 96 bytes, or about 24 percent ((403 - 307) / 403 ≈ 0.238), from a schema refactor alone, and the structured document is also easier to read.

Next, consider traversal speed for a lookup like the work address zip code.

In flatUser, reaching address_work_zip from the document start requires 12 field name comparisons.

In structuredUser, reaching address.work.zip takes 8 comparisons. The reduction happens because some values are embedded documents. When the traversal reads a field such as name, which cannot contain address.work.zip, it can skip the whole embedded document without comparing any of its nested field names; BSON makes this cheap because every embedded document starts with its total length. The same applies inside address: after reading address.home, the traversal skips street, number, zip, state, and country within that branch and moves on.
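The comparison counts can be modeled with a small sketch. The documents below are hypothetical, because the full flatUser and structuredUser schemas appear only in the figure, so they yield 11 and 7 comparisons rather than the article's 12 and 8; the point is the mechanism. The functions model a key-by-key search and are not MongoDB's actual query matcher.

```python
def flat_comparisons(doc: dict, target: str) -> int:
    """Key comparisons a linear scan makes before it reaches `target`."""
    count = 0
    for key in doc:
        count += 1
        if key == target:
            return count
    return count  # target missing: every key was compared


def nested_comparisons(doc: dict, path: list) -> int:
    """Key comparisons made while following `path` (e.g. ["address", "work", "zip"]).
    A field whose key does not match the current path component is skipped as a
    whole, so fields inside a non-matching embedded document are never compared."""
    count = 0
    for key, value in doc.items():
        count += 1
        if key == path[0]:
            if len(path) > 1 and isinstance(value, dict):
                return count + nested_comparisons(value, path[1:])
            return count
    return count


# Hypothetical documents: only the shape matters, the values are placeholders.
flat_doc = {
    "name_first": "john", "name_last": "smith", "name_middle": "oliver",
    "address_home_street": "-", "address_home_number": "-", "address_home_zip": "-",
    "address_home_state": "-", "address_home_country": "-",
    "address_work_street": "-", "address_work_number": "-", "address_work_zip": "-",
}
structured_doc = {
    "name": {"first": "john", "last": "smith", "middle": "oliver"},
    "address": {
        "home": {"street": "-", "number": "-", "zip": "-", "state": "-", "country": "-"},
        "work": {"street": "-", "number": "-", "zip": "-"},
    },
}

print(flat_comparisons(flat_doc, "address_work_zip"))                   # 11
print(nested_comparisons(structured_doc, ["address", "work", "zip"]))  # 7
```

The gap widens as more fields sit in branches that the path never enters, which is consistent with the larger documents benefiting more in the results that follow.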

To measure the effect, we ran a focused test with this setup:

  • The MongoDB instance used the in-memory storage engine, to isolate document traversal from disk I/O.
  • Flat schemas used documents with 10, 25, 50, and 100 fields.
  • Structured schemas used 2×5, 5×5, 10×5, and 20×5 layouts, where 2×5 means two embedded-document fields with five fields each.
  • Each collection contained 10,000 documents generated with the faker npm package.
  • Queries searched for a field and value that did not exist, which forced a full collection scan over every document.
  • Each query was run 100 times for every document size and schema.
  • No concurrent workload ran during the tests.
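The document shapes in this setup can be sketched as follows. The field names (field_i, group_g), the random strings standing in for faker, and the missing-field filter are all hypothetical; the original generator lives in the article's repository.

```python
import random
import string


def random_value(length: int = 8) -> str:
    # Stand-in for faker (assumption): a random lowercase string.
    return "".join(random.choices(string.ascii_lowercase, k=length))


def make_flat(n_fields: int) -> dict:
    """Flat document with n_fields top-level string fields."""
    return {f"field_{i}": random_value() for i in range(n_fields)}


def make_structured(n_groups: int, fields_per_group: int = 5) -> dict:
    """Structured document: n_groups embedded documents of fields_per_group fields each."""
    return {
        f"group_{g}": {f"field_{i}": random_value() for i in range(fields_per_group)}
        for g in range(n_groups)
    }


# (flat field count, structured group count) pairs from the setup; each pair
# holds the same number of leaf values, e.g. 25 flat fields vs 5x5.
SHAPES = [(10, 2), (25, 5), (50, 10), (100, 20)]

# Filter on a field that no document has, so every document must be examined.
MISSING_FILTER = {"does_not_exist": "no-such-value"}

flat_batch = [make_flat(25) for _ in range(10_000)]
structured_batch = [make_structured(5) for _ in range(10_000)]
```

With PyMongo, for example, each batch could be loaded with Collection.insert_many and queried with Collection.find(MISSING_FILTER); that driver code is left out because it needs a running server.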

Results:

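| Document shape (flat / structured) | Flat | Structured | Difference | Improvement |
|---|---|---|---|---|
| 10 / 2×5 | 487 ms | 376 ms | 111 ms | 29.5% |
| 25 / 5×5 | 624 ms | 434 ms | 190 ms | 43.8% |
| 50 / 10×5 | 915 ms | 617 ms | 298 ms | 48.3% |
| 100 / 20×5 | 1384 ms | 891 ms | 493 ms | 55.4% |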
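The Improvement column is consistent with dividing the difference by the structured time (for example, 111 / 376 ≈ 29.5% for the first row). Measured against the flat time instead, the same row would be about 22.8%, so it helps to keep the baseline in mind when comparing these figures with other benchmarks.

```latex
\text{Improvement} = \frac{t_{\text{flat}} - t_{\text{structured}}}{t_{\text{structured}}},
\qquad \text{e.g. } \frac{487 - 376}{376} = \frac{111}{376} \approx 29.5\%
```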

As expected, structured documents were faster to traverse in this scenario, and the advantage grew with document size, from 29.5% with 10 fields to 55.4% with 100 fields. Keep in mind that gains vary with how you nest and organize fields.

This walkthrough showed how to get more from your MongoDB deployment by reshaping the schema while keeping the same information. You can also apply common MongoDB schema patterns to decide what belongs in each document. The article Building with Patterns covers widely used approaches and is a strong next step.

All test code is available in the GitHub repository.
