MongoDB MMAPv1 Engine Architecture

My Mental Model of How MongoDB Works

All right, let’s review everything. This is my mental model of how MongoDB works.

Initial Version of MongoDB (MMAPv1)

In the initial version of MongoDB, a user works with data through CRUD operations: Create, Read, Update, and Delete. The first one is Create.

Say the user performs a create operation: they have some data they want to insert into the database. In MongoDB’s initial version, the storage engine was MMAPv1, which memory-maps its data files (mmap). The JSON document was converted into Binary JSON (BSON) and written into those memory-mapped data files, which the operating system moves between disk and RAM in fixed-size pages.
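To make the JSON-to-BSON step concrete, here is a minimal Python sketch of a BSON encoder. It only handles flat documents with string, 32-bit integer, and double values (real BSON has many more types, such as ObjectId, arrays, and nested documents), and encode_bson is a hypothetical helper, not a driver API. It shows the basic layout: a little-endian int32 total length, a sequence of typed elements, and a trailing zero byte.

```python
import struct

def encode_bson(doc: dict) -> bytes:
    """Encode a flat dict of str / int / float values as BSON (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"  # element names are NUL-terminated
        if isinstance(value, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 string: int32 length (including the NUL), then the bytes
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 int32: value must fit in 32 bits, otherwise struct.error
            body += b"\x10" + name + struct.pack("<i", value)
        elif isinstance(value, float):
            # 0x01 double: 8-byte IEEE 754, little-endian
            body += b"\x01" + name + struct.pack("<d", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    # Document = int32 total length (counting itself) + elements + 0x00
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

print(encode_bson({"hello": "world"}).hex())
```

Encoding {"hello": "world"} this way gives the same 22 bytes as the example in the BSON specification.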

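Each document lived in its own record: a small record header (the record’s length and offsets to its extent and neighboring records) stored in front of the BSON data. Since version 2.6, MMAPv1 rounded record sizes up to powers of two by default, so most records had some padding that let a document grow a little in place.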
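Here is a toy model of a fixed-size page holding records, each with a small header in front of its BSON bytes. The 4 KB page, the 8-byte header (allocated size plus BSON length), and the 32-byte minimum allocation are assumptions for illustration; MMAPv1’s real record header and allocation rules differed in detail. The power-of-two rounding shows where the padding comes from.

```python
import struct

PAGE_SIZE = 4096                 # assumption: one fixed-size 4 KB page
HEADER = struct.Struct("<ii")    # hypothetical header: allocated size, BSON length
MIN_RECORD = 32                  # hypothetical smallest allocation

class PageFullError(Exception):
    pass

def allocation_size(n: int) -> int:
    """Round a record size up to a power of two, leaving padding for growth."""
    size = MIN_RECORD
    while size < n:
        size *= 2
    return size

class Page:
    def __init__(self) -> None:
        self.buf = bytearray(PAGE_SIZE)
        self.used = 0                # bytes handed out so far

    def insert(self, bson: bytes) -> int:
        """Write header + BSON at the next free offset and return that offset."""
        size = allocation_size(HEADER.size + len(bson))
        if self.used + size > PAGE_SIZE:
            raise PageFullError(f"need {size} bytes, {PAGE_SIZE - self.used} left")
        offset = self.used
        HEADER.pack_into(self.buf, offset, size, len(bson))
        start = offset + HEADER.size
        self.buf[start:start + len(bson)] = bson
        self.used += size
        return offset

    def read(self, offset: int) -> bytes:
        """Use the header to find where this record's BSON ends."""
        _, length = HEADER.unpack_from(self.buf, offset)
        start = offset + HEADER.size
        return bytes(self.buf[start:start + length])
```

A 10-byte document needs 18 bytes with its header, so it gets a 32-byte slot; the 14 spare bytes are the padding that would let it grow in place.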

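Whenever a new document is inserted without an _id field, one is added automatically: an ObjectId, usually generated by the driver before the document is sent (the server adds one if the driver didn’t).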

_id and Data Mapping

Internally, the _id index maps each ObjectId to the location where the document is stored. In MMAPv1 that location (a DiskLoc) is a data file number plus a byte offset within that file, and it is what the engine uses to find the data.

So whenever the user reads a document by _id, the engine looks up the _id in that index, gets the location, and reads the record from there.
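Here is the create/read path as a sketch: the _id index maps each _id to a DiskLoc (file number plus offset), and a read follows that location. ToyCollection is hypothetical, and the dict only stands in for the on-disk B-tree that MMAPv1 used for indexes. Because every BSON document starts with its own int32 length, the reader knows how many bytes to take from the offset.

```python
import struct
from typing import NamedTuple

class DiskLoc(NamedTuple):
    file_no: int   # which data file the record lives in
    offset: int    # byte offset of the record inside that file

class ToyCollection:
    """Hypothetical model: data files are bytearrays, the _id index is a dict."""

    def __init__(self) -> None:
        self.files = [bytearray()]   # file 0; a real engine adds files as it grows
        self.id_index = {}           # _id -> DiskLoc (MMAPv1 used a B-tree here)

    def insert(self, _id, bson: bytes) -> DiskLoc:
        data_file = self.files[-1]
        loc = DiskLoc(len(self.files) - 1, len(data_file))
        data_file.extend(bson)       # append the record to the current file
        self.id_index[_id] = loc     # remember where it went
        return loc

    def find_by_id(self, _id) -> bytes:
        loc = self.id_index[_id]                                     # 1. index lookup
        data_file = self.files[loc.file_no]                          # 2. pick the file
        (length,) = struct.unpack_from("<i", data_file, loc.offset)  # 3. BSON length prefix
        return bytes(data_file[loc.offset:loc.offset + length])      # 4. read the record
```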

Every collection gets an index on _id automatically. An ObjectId is 12 bytes, and its first 4 bytes are a timestamp (seconds since the Unix epoch), so ObjectIds sort roughly by creation time, and finding a document by _id is a fast index lookup.
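A sketch of the ObjectId layout used by current drivers: a 4-byte big-endian timestamp, 5 random bytes chosen once per process, and a 3-byte counter that starts at a random value. The helper names are hypothetical; real drivers provide their own ObjectId type. Since the timestamp comes first, comparing the raw bytes orders ids by creation second, which is why _id sorts roughly by insertion time.

```python
import os
import struct
import time
from typing import Optional

_PROCESS_RANDOM = os.urandom(5)                  # 5 random bytes, fixed per process
_counter = int.from_bytes(os.urandom(3), "big")  # 3-byte counter, random start

def new_object_id(ts: Optional[int] = None) -> bytes:
    """Build a 12-byte ObjectId: timestamp | process random | counter."""
    global _counter
    if ts is None:
        ts = int(time.time())
    _counter = (_counter + 1) % 0x1000000        # wrap within 3 bytes
    return struct.pack(">I", ts) + _PROCESS_RANDOM + _counter.to_bytes(3, "big")

def timestamp_of(oid: bytes) -> int:
    """Read the creation time (seconds since the Unix epoch) from the first 4 bytes."""
    return struct.unpack(">I", oid[:4])[0]

early = new_object_id(1_700_000_000)
later = new_object_id(1_700_000_100)
print(early.hex(), later.hex(), early < later)
```

Ids created within the same second are ordered only by the remaining bytes, so the ordering is approximate.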

And that is how Create and Read work.

Delete Operation

Update and delete also locate the document through _id. When I delete a document, its record is removed from the data file, its index entries are removed, and the space is added to a list of free (deleted) records. That leaves an empty hole in the file: fragmentation.

MongoDB does not shift the neighboring records to close the gap; the space is simply left empty. Reclaiming it takes an explicit operation such as the compact command.

When another record is inserted later, MMAPv1 can place it in that free space if it fits. That’s how Delete works.
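The delete-and-reuse behavior can be sketched with a free list and first-fit reuse. ToyDataFile is hypothetical and much simpler than MMAPv1’s actual free-space bookkeeping; it only shows that a delete leaves a hole in place and that a later insert can fill it if it fits.

```python
class ToyDataFile:
    """Hypothetical data file: deleted records go on a free list and get reused."""

    def __init__(self) -> None:
        self.records = {}    # offset -> (allocated size, payload)
        self.free_list = []  # (offset, allocated size) of deleted records
        self.end = 0         # next unused byte at the end of the file

    def insert(self, payload: bytes) -> int:
        need = len(payload)
        # First fit: reuse the first deleted record that is big enough.
        for i, (offset, size) in enumerate(self.free_list):
            if size >= need:
                del self.free_list[i]          # safe: we return right away
                self.records[offset] = (size, payload)
                return offset
        # No hole fits: append at the end of the file.
        offset = self.end
        self.records[offset] = (need, payload)
        self.end += need
        return offset

    def delete(self, offset: int) -> None:
        size, _ = self.records.pop(offset)
        self.free_list.append((offset, size))  # hole stays; nothing is shifted
```

Reusing a 100-byte hole for an 80-byte record wastes the other 20 bytes, which is another form of fragmentation.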

Update Operation

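Update may need to move data. If the modified document still fits in its record (including the padding), it is updated in place. If it grows beyond the record’s allocated size, MMAPv1 moves it to a new location and must update every index entry that points to it; the old record becomes free space.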

In all of these operations, the _id (or another indexed field) in the query filter is what locates the document.
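The update rule (in place if it fits, otherwise move and repoint the index) can be sketched like this. ToyStore is hypothetical: one dict plays the role of all indexes, sizes are rounded up to powers of two to mimic padding, and the space freed by a move is not tracked.

```python
class ToyStore:
    """Hypothetical store showing in-place updates versus document moves."""

    def __init__(self) -> None:
        self.records = {}  # offset -> [allocated size, payload]
        self.index = {}    # _id -> offset (stands in for every index)
        self.end = 0
        self.moves = 0

    @staticmethod
    def padded(n: int) -> int:
        size = 32          # hypothetical minimum, doubled until it fits
        while size < n:
            size *= 2
        return size

    def insert(self, _id, payload: bytes) -> None:
        size = self.padded(len(payload))
        self.records[self.end] = [size, payload]
        self.index[_id] = self.end
        self.end += size

    def update(self, _id, payload: bytes) -> int:
        offset = self.index[_id]
        size, _ = self.records[offset]
        if len(payload) <= size:
            self.records[offset][1] = payload  # fits in padding: update in place
            return offset
        # Too big: free the old record, write elsewhere, repoint the index.
        del self.records[offset]
        self.moves += 1
        self.insert(_id, payload)
        return self.index[_id]
```

In the real engine a move had to update every index on the collection, which is why documents that kept growing made updates expensive.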

So that’s how CRUD operations worked in the initial version, when the storage engine was MMAPv1.

Engine Change and Concurrency

Later, MongoDB added the WiredTiger storage engine: available from version 3.0 and the default since version 3.2.

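The initial engine had a concurrency problem because its locks were coarse. Early versions used a single global lock, version 2.2 moved to database-level locks, and MMAPv1 in 3.0 used collection-level locks. With a collection-level lock, two writes to different documents in the same collection cannot run at the same time.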

WiredTiger, from version 3.0, provides document-level concurrency control instead, so writes to different documents in the same collection can proceed in parallel.
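The effect of lock granularity can be sketched with a hypothetical LockManager whose lock unit is either a whole collection or a single document. With collection-level locking, a second writer on a different document in the same collection is turned away; with document-level locking, it gets through. This is only an illustration: WiredTiger actually uses optimistic concurrency control with multiple document versions and retries on write conflicts, rather than one mutex per document.

```python
import threading
from collections import defaultdict

class LockManager:
    """Hypothetical write-lock manager; granularity is 'collection' or 'document'."""

    def __init__(self, granularity: str) -> None:
        if granularity not in ("collection", "document"):
            raise ValueError(f"unknown granularity: {granularity}")
        self.granularity = granularity
        self._locks = defaultdict(threading.Lock)  # one lock per lock unit
        self._guard = threading.Lock()             # protects the dict itself

    def _key(self, collection: str, doc_id):
        if self.granularity == "collection":
            return (collection,)                   # every document shares one lock
        return (collection, doc_id)                # each document has its own lock

    def _lock_for(self, collection: str, doc_id) -> threading.Lock:
        with self._guard:
            return self._locks[self._key(collection, doc_id)]

    def try_write_lock(self, collection: str, doc_id) -> bool:
        """Non-blocking: True if the writer got the lock, False if it must wait."""
        return self._lock_for(collection, doc_id).acquire(blocking=False)

    def unlock(self, collection: str, doc_id) -> None:
        self._lock_for(collection, doc_id).release()
```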

Durability Improvements

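Durability was also weak at first. Before journaling was added (version 1.8), MMAPv1 only flushed the memory-mapped files to disk periodically (every 60 seconds by default), so a crash could lose recent writes or leave the files inconsistent.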

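With WiredTiger (3.0 onward), durability comes from periodic checkpoints (a consistent snapshot of the data written to disk) combined with a journal, a write-ahead log (WAL) that records each change before it is applied so it can be replayed after a crash. MMAPv1’s journal was also a write-ahead log, so the engine change redesigned durability rather than introducing WAL for the first time.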
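The core rule of write-ahead logging is ordering: append the change to the log and force it to disk before changing the data, so a crash can be repaired by replaying the log. ToyWAL is a hypothetical sketch of just that rule, using a JSON-lines file and os.fsync. WiredTiger’s real journal is a binary format and works together with checkpoints, so recovery only replays changes made since the last checkpoint.

```python
import json
import os
import tempfile

class ToyWAL:
    """Hypothetical key-value store that follows the write-ahead rule."""

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.data = {}                  # the in-memory "data files"

    def put(self, key: str, value) -> None:
        entry = json.dumps({"op": "put", "key": key, "value": value})
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(entry + "\n")
            log.flush()                 # push Python's buffer to the OS
            os.fsync(log.fileno())      # force the OS to write it to disk
        self.data[key] = value          # only now apply the change

    def recover(self) -> None:
        """Rebuild state after a restart by replaying the log in order."""
        self.data = {}
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                entry = json.loads(line)
                if entry["op"] == "put":
                    self.data[entry["key"]] = entry["value"]

# Usage: write, "crash" by dropping the object, then recover from the log.
log_path = os.path.join(tempfile.mkdtemp(), "journal.log")
store = ToyWAL(log_path)
store.put("a", 1)
store.put("b", {"x": 2})
store.put("a", 3)
restarted = ToyWAL(log_path)            # in-memory state starts empty
restarted.recover()
print(restarted.data)
```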

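So these were the two main changes, finer-grained concurrency and better durability, and indexing improved as well (for example, WiredTiger applies prefix compression to indexes).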

So that’s how it worked before the engine change and how it worked after.

That’s it, I guess.

Additional Features (Later Versions)

Later versions added more features, such as joins through the $lookup aggregation stage (version 3.2) and a richer aggregation pipeline. But these aren’t the focus right now.

SQL vs NoSQL (Basic Difference)

Basically, the difference between SQL and NoSQL is:

In SQL, the schema is rigid: you handle the data in row-column (tabular) format.
In NoSQL, the schema is flexible.

In NoSQL (MongoDB here), you just insert JSON-like documents, and documents in the same collection can have different fields. The driver converts each document into binary format (BSON), and that is what gets stored in the database.
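The schema difference can be shown with two hypothetical insert functions: a table with fixed columns that rejects unknown columns (and stores missing ones as None, like NULL), and a collection that accepts any document shape. The column names are made up for the example.

```python
COLUMNS = ("id", "name", "email")   # hypothetical fixed table schema

def insert_row(table: list, row: dict) -> None:
    """SQL-style: only declared columns are allowed; missing ones become None (NULL)."""
    unknown = set(row) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    table.append(tuple(row.get(c) for c in COLUMNS))

def insert_document(collection: list, doc: dict) -> None:
    """Document-style: any shape is accepted; each document carries its own field names."""
    collection.append(dict(doc))
```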

Compression in Later Versions

In the later versions, from version 3.0 with WiredTiger, data is also compressed on disk (Snappy block compression by default, with zlib as an option).

So a single page can hold more documents. As an illustration, a page that held three documents uncompressed might hold six or seven after compression; the actual ratio depends on the data.
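A rough way to see the effect: count how many JSON documents fit in a 4 KB page with and without compression. This sketch uses zlib from Python’s standard library (WiredTiger supports zlib too, though Snappy is its default). The page size and sample documents are assumptions, and the documents are deliberately repetitive, so the ratio here will be much better than on typical data.

```python
import json
import zlib

PAGE_SIZE = 4096   # assumption: a 4 KB page

def docs_per_page(docs: list, compress: bool) -> int:
    """Greedily count how many leading documents fit in one page."""
    count = 0
    for i in range(1, len(docs) + 1):
        blob = "".join(json.dumps(d) for d in docs[:i]).encode("utf-8")
        size = len(zlib.compress(blob)) if compress else len(blob)
        if size > PAGE_SIZE:
            break
        count = i
    return count

# Hypothetical, deliberately repetitive sample documents.
docs = [{"user": f"user{i}", "status": "active", "country": "IN",
         "bio": "likes databases " * 20} for i in range(200)]
raw = docs_per_page(docs, compress=False)
packed = docs_per_page(docs, compress=True)
print(f"uncompressed: {raw} docs per page, compressed: {packed} docs per page")
```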

That was a large difference as well.

Transactions and ACID

Another difference is that SQL databases support multi-statement transactions.

In later versions, MongoDB added multi-document transactions as well: version 4.0 for replica sets and 4.2 for sharded clusters.

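But in the initial version there was no transaction management: no begin, commit, or rollback across multiple documents. Writes to a single document were atomic, but there were no ACID guarantees spanning several documents.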

There was also no join in the initial version of MongoDB, whereas SQL has always supported joins.
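A minimal sketch of the atomicity part of a transaction: writes are buffered and applied together on commit, or discarded on abort, so a failed transfer leaves both documents unchanged. ToyTransaction and transfer are hypothetical; there is no isolation, durability, or concurrency handling here.

```python
class ToyTransaction:
    """All-or-nothing sketch: buffer writes, apply them together on commit."""

    def __init__(self, db: dict) -> None:
        self.db = db
        self.pending = {}

    def read(self, _id):
        # See our own uncommitted writes first, then committed data.
        if _id in self.pending:
            return self.pending[_id]
        return self.db[_id]

    def write(self, _id, doc) -> None:
        self.pending[_id] = doc          # nothing touches db yet

    def commit(self) -> None:
        self.db.update(self.pending)     # apply all buffered writes at once
        self.pending = {}

    def abort(self) -> None:
        self.pending = {}                # discard; db never saw these writes

def transfer(db: dict, src, dst, amount: int) -> None:
    txn = ToyTransaction(db)
    a = dict(txn.read(src))              # copies, so db stays untouched
    b = dict(txn.read(dst))
    a["balance"] -= amount
    b["balance"] += amount
    txn.write(src, a)
    txn.write(dst, b)
    if a["balance"] < 0:
        txn.abort()
        raise ValueError("insufficient funds; nothing was written")
    txn.commit()
```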

That’s it.

Scaling and Use Case Preference

Mainly, the preference is that:

When you want to scale the database horizontally (adding more machines), you go for NoSQL.
When you want to scale the database vertically (a bigger machine), you go for SQL.
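Horizontal scaling in MongoDB means sharding: documents are split across machines by a shard key. The sketch below routes keys to a hypothetical three-shard cluster by hashing them. MongoDB itself assigns ranges of shard-key values (or of hashed values) to chunks rather than using a plain modulo; modulo is used here only because it is short, and it has the drawback that adding a shard remaps most keys.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-a", "shard-b", "shard-c"]   # hypothetical three-shard cluster

def shard_for(shard_key: str) -> str:
    """Route a document to a shard by hashing its shard-key value."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

# Spread 3,000 hypothetical user ids over the cluster.
counts = Counter(shard_for(f"user{i}") for i in range(3000))
print(counts)
```

Because each shard receives roughly a third of the keys, both reads and writes are spread across machines.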

Why?

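My first answer was that NoSQL databases are read-heavy and SQL databases are better for write-heavy workloads, but that doesn’t hold as a general rule: many NoSQL systems handle heavy write loads well, and MongoDB can spread writes across shards.

The more accurate reason is the data model. A document keeps related data together, so a collection can be split across machines by a shard key without cross-machine joins. Relational schemas traditionally depend on joins and multi-row transactions, which are simplest and fastest on a single, larger machine.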

That’s the main difference between them.

And this is what I understood today.
