Should I use Mongo or SQL?


Five years ago I co-founded iScout.com – a SaaS which now collects millions of safety reports and training completions. For weeks I struggled to decide between MongoDB and some flavor of SQL for our backend storage. I had spent a decade in MSSQL and a few years in Mongo across several side projects.

I scoured blogs looking for advice on which to choose. Some articles did a deep technical dive while others felt heavily biased. I never found a practical discussion that offered real pros and cons, so I swore to myself I’d blog a balanced perspective someday. So here we are. This is my answer to, “Should I use Mongo or SQL?”

Disclaimers

First – I’m not a database expert. Working for a startup, I wear many hats – front end dev, back end dev, network engineer, bathroom mopper. I represent the knowledge level and experience of a full stack developer who has spent nearly a decade working in SQL/NoSQL as one of many tasks.

Second – my experience is with MongoDB and PostgreSQL, which may not translate to other document/SQL stores. Also, I’ve worked in C# for much of my career, but iScout was built in NodeJS. MongoDB probably lends itself more to JavaScript than to languages like C#.

Third – I recognize it might make more sense to call this a comparison of Mongo and Postgres, or document stores and relational databases. But I’m trying to use phrasing that people are searching for.

Spoiler alert – iScout was built on MongoDB, but we are painstakingly moving it to PostgreSQL.

Does that make Postgres better than Mongo? Not exactly.

MongoDB is Easy and Flexible

It is sooo easy to get going with Mongo.

Need to save users in Mongo? Your code could be this simple:

const mongojs = require('mongojs');
const db = mongojs('mydb', ['users']);
// mongojs is callback-based, so pass a callback rather than awaiting
db.users.save({ username: 'josh', age: 37 }, err => { /* saved */ });

It’s difficult to overstate how easy document stores like Mongo are. When you’re building a startup, speed and flexibility are critical. Need to store some complex data real quick? There’s no schema. Just save and move on.

In fact, a lot of people gripe that Mongo is too unstructured, but I say that’s mostly up to you. Mongoose is an extremely popular NodeJS package that lets you create schemas, validation, and indexes with ease. Relational data (like storing a reference to another document) is where things get tricky. It’s difficult to enforce foreign key constraints, and joins are almost non-existent.
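As a rough sketch of how far Mongoose takes you (this assumes the mongoose package is installed; the schema fields here are made up for illustration):

```javascript
const mongoose = require('mongoose');

// Hypothetical schema: types, validation rules, and an index in a few lines.
const userSchema = new mongoose.Schema({
  username: { type: String, required: true, index: true },
  age:      { type: Number, min: 0 },
});
const User = mongoose.model('User', userSchema);

// Validation runs even before anything touches the database:
// this returns an error for the missing username and negative age.
const err = new User({ age: -1 }).validateSync();
```

You get much of the structure people miss from SQL, opted into one field at a time.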

Mongo is for… scale?

You may have also read that Mongo’s native sharding lets you store a ton of data. No doubt, you can dump a ton of data into Mongo.

What you don’t read is that running moderately complex queries can be prohibitively slow.

If you need to throw TBs of logs or other unstructured data into a database and run relatively rare or simple queries against that data, Mongo is your champion – even at scale. If your query needs to touch multiple large collections, you might be in trouble.

Mongo Stinks at Relational Data

Say you are storing users and orders. How do you store that in Mongo?

Some say it all goes in a user document. That works, but the document can get bulky, difficult to search, and annoying to update. Others say orders should go in their own collection with a reference to the user_id. Great – but Mongo doesn’t allow join queries.

Say you need all orders for users in Texas. There are three approaches:

  1. Two Queries – Store user_id in each order document. Fetch all the user_ids for people in Texas, then search orders matching any of those user_ids.
  2. Aggregate – Mongo’s version of a join. Basically a query that runs several steps directly on the Mongo server.
  3. Partial Documents – Store a partial copy of the user in each order, while keeping the full user in its own collection.

In my experience, option 1 is almost always best. Aggregates are powerful but slow. Our codebase currently has 17 aggregates in 414k lines of code – so not worthless, but not a silver bullet either. Some people swear by option 3, but personally, the idea of storing duplicated data makes me ill. Plus it’s nearly worthless if your queries are more dynamic.
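The two-query approach boils down to logic like this – sketched here with in-memory arrays standing in for the real Mongo calls (the document shapes are hypothetical):

```javascript
// Hypothetical documents standing in for the users and orders collections.
const users = [
  { _id: 'u1', name: 'Ann', state: 'TX' },
  { _id: 'u2', name: 'Bob', state: 'OK' },
  { _id: 'u3', name: 'Cal', state: 'TX' },
];
const orders = [
  { _id: 'o1', user_id: 'u1', total: 20 },
  { _id: 'o2', user_id: 'u2', total: 35 },
  { _id: 'o3', user_id: 'u3', total: 50 },
];

// Query 1: fetch the ids of users in Texas.
// (In Mongo: db.users.find({ state: 'TX' }, { _id: 1 }))
const texasIds = users.filter(u => u.state === 'TX').map(u => u._id);

// Query 2: fetch orders matching any of those ids.
// (In Mongo: db.orders.find({ user_id: { $in: texasIds } }))
const texasOrders = orders.filter(o => texasIds.includes(o.user_id));

console.log(texasOrders.map(o => o._id)); // [ 'o1', 'o3' ]
```

It works, but note that you’re doing the join in application code – two round trips, and the $in list can get large.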

In SQL, you’d store user and order separately, then use a simple join to write this query.
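Assuming hypothetical users(id, state) and orders(id, user_id) tables, the whole question is one statement:

```sql
SELECT o.*
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE u.state = 'TX';
```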

The part I struggle with is, “what data isn’t fundamentally relational?”

Postgres is the Swiss Army Knife of Databases

This is how someone described Postgres to me. They were right.

As a developer, I’ve learned to fear monolithic systems that “do it all”. Normally, things that can do everything do nothing well.

That’s not true for Postgres. Postgres does all the normal SQL database things, plus it has countless bonus features like first-class support for JSONB data. In other words, you can store, query, and index complex JSON objects in Postgres!
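For a sense of what that looks like (the table and field names here are made up):

```sql
-- A JSONB column holds the raw document, schema-free.
CREATE TABLE reports (
  id   serial PRIMARY KEY,
  data jsonb NOT NULL
);

-- Query inside the JSON with the containment operator...
SELECT * FROM reports WHERE data @> '{"status": "open"}';

-- ...and add a GIN index so containment queries stay fast.
CREATE INDEX reports_data_idx ON reports USING gin (data);
```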

Originally we stored form responses in Mongo, but two years ago the query performance became prohibitively slow – even with indexes. We spent months moving to Postgres using JSONB columns. That’s how we store millions of reports – and query performance is great. Our indexes are pretty huge though.

Recently we also rebuilt our queueing system, moving off RabbitMQ. Since we were already using Postgres, we leveraged its NOTIFY feature and easily built a very simple queueing system. Postgres is amazing for relational, complex, dynamic data.
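The core of a NOTIFY-based queue is just a couple of commands (the channel and table names here are hypothetical):

```sql
-- A worker connection subscribes to a channel...
LISTEN new_job;

-- ...and a producer enqueues work, then wakes the workers.
INSERT INTO jobs (payload) VALUES ('{"task": "send_email"}');
NOTIFY new_job;
```

The workers still SELECT the actual job rows from the table; NOTIFY just saves them from polling.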

The Drawbacks of Postgres

SQL is fundamentally harder and slower to work with. Despite the abundance of schema builders and auto-migrations, the reality remains: adding a field in Mongo is one line of code, while adding a field in Postgres means updating the schema and migrating a production database.
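For contrast, the Postgres side of “adding a field” starts with a migration like this (hypothetical column):

```sql
-- The column must exist in the schema before any row can use it.
-- Existing rows read back NULL (or the DEFAULT) automatically.
ALTER TABLE users ADD COLUMN nickname text;
```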

However… one thing I prefer about SQL is that it will fill in default values. In Mongo, you have to run a manual update on old documents to add any default value. For example, if you add a deleted field, you can’t query {deleted: false} unless you manually backfill that data. Instead you have to query {deleted: {$ne: true}}.
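The backfill itself is a one-time update – a mongo-shell-style sketch, using the deleted field from the example above:

```javascript
// Give every pre-existing document an explicit default,
// so {deleted: false} queries start matching them.
db.users.updateMany(
  { deleted: { $exists: false } },
  { $set: { deleted: false } }
);
```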

Postgres row updates are slow – even if you drop indexes and constraints. There are workarounds with temp tables, VACUUM, etc., but it’s an annoying hurdle.

The NodeJS packages for Postgres are limited. Sequelize is a popular ORM, but it’s not my jam. pg is another popular package, but it basically just lets you execute raw SQL. Comparatively, Mongo has countless npm packages available.

Postgres’ max table size is 32 TiB. At our current rate, we’ll hit that in 4,000 years so I think I’m good.

Real Life Mongo Problems

I mentioned we moved our reporting data to Postgres because of performance issues. To be honest, I always wondered if there was something else we could have done (aggregates, indexes, etc.) to make Mongo more performant. I may have gotten that answer last year.

Last Spring we hired a certified Mongo expert to spend a week with our team to review security, performance, etc. We learned a lot – especially about an aggregate visualization tool in Compass.

Our most pressing question concerned a training report that took almost 40 seconds for our largest customer. After two days of indexing/aggregates/restructuring, it hadn’t improved. Ultimately the recommendation was to create a cached view that would be updated every 10 minutes or so. This was a non-starter given that people often complete training and expect to see their status updated instantly.

Later we took the same data and put it in Postgres. That report now runs in 4 seconds.

Again – this is a problem with storing a large amount of data relationally in Mongo with high query demands. Mongo is not a relational database.

When would I use Mongo?

This may feel like a Mongo bash, but there are scenarios where I’d recommend it.

It comes down to using the right tool for the job.

Small projects. Query performance won’t be a problem if your collections stay relatively small. Every year our friends play a prop bets game while we watch the Super Bowl. I lost the source code and had to rebuild it in just a few days. Dealing with SQL schemas would have been a huge waste – Mongo saved the day.

Large data with single collection query needs. I also have a fantasy football app that has several hundred thousand teams each year. It’s a lot of data, but there are only 3 collections: users, emails, and leagues. I basically never have to do any complex queries. Mongo works extremely well, and messing with SQL schemas would be pointless – although Postgres’ JSONB columns would be a decent alternative.

If you are considering Mongo for a large data set, spin up a quick server, throw in some fake data, and try the queries you’ll need. If I’ve learned anything about query performance in any database, it’s that you never know until you try.
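Generating that fake data is cheap – something like this (the document shape and field names are invented for illustration):

```javascript
// Sketch: build a pile of fake report documents to load-test queries with.
function fakeReport(i) {
  return {
    user_id: `u${i % 1000}`,                    // ~1000 distinct users
    status: i % 2 ? 'open' : 'closed',          // alternate statuses
    created: new Date(Date.now() - i * 60000),  // spread over time
  };
}

const docs = Array.from({ length: 100000 }, (_, i) => fakeReport(i));

// Then load them and time the real queries you'll need, e.g.:
//   await db.reports.insertMany(docs);
//   await db.reports.find({ user_id: 'u42', status: 'open' });
console.log(docs.length); // 100000
```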

If you need a Mongo host, I highly recommend MongoDB Atlas. They are expensive but easily the most reliable service I’ve used.

If you need a Postgres host, I haven’t found one I love. We’ve used Compose and Heroku in the past. Heroku is ok, but very quirky, and the web management tool is extremely thin – almost no monitoring. You have unusually limited access to your server, so you can’t do things like set up a third-party replica. Digital Ocean has a brand new Postgres offering. AWS’s PostgreSQL offering (or Aurora) will probably be our next try.
