Now, I understand that these were, almost exclusively, set up by front-end devs (nothing wrong with that), but the issue lies in assuming that MongoDB is a relational database. We all feel warm and cuddly about a relational DB, its master/slave (primary/secondary) approach, and the fact that you write to a master (primary) and read from a slave (secondary). So Mongo gets set up in exactly the same way, and all is wonderful… up until we have squeezed that last ounce of performance out of it, and things still don't work well.
“Isn’t MongoDB one of the fastest, most efficient DBs on the market? So why is my site dying?”
Let's look at why PSA is not a great idea
First off... if you are using MongoDB on a site that you know for certain will never, in the lifetime of you, your kids and your grandkids, require more than one copy of the data on a single machine... please ignore this part of the post, and carry on setting up as PSA... or PSS.
OK... so let's look at a typical scenario. It has to be said that most issues surround the usage of memory/cache by WiredTiger (WT), MongoDB's default storage engine. WT is the main reason why MongoDB is fast... blindingly fast... if used properly.
By default, WT will use the larger of 50% of (total memory minus 1 GB), or 256 MB. What it does with this is (see the sketch after the list):
1. Load all the indexes that it can into memory.
2. Assign memory to connections. You should never need more than 5,000 connections; if you do, something is wrong with your application.
3. Load as much recently used data into memory as it can, with whatever is left over.
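As a rough check, here is a small mongosh sketch of mine (not official tooling) that compares the configured cache against what is actually in use. The statistic names are as they appear in serverStatus() on recent MongoDB versions, so verify them against your own output:

```javascript
// mongosh sketch: configured vs. actual WiredTiger cache usage.
const cache = db.serverStatus().wiredTiger.cache;
const maxBytes = cache["maximum bytes configured"];
const usedBytes = cache["bytes currently in the cache"];
print(`WT cache: ${(usedBytes / 1024 ** 3).toFixed(1)} GB used of ` +
      `${(maxBytes / 1024 ** 3).toFixed(1)} GB configured`);
// Default sizing rule: max(0.5 * (RAM - 1 GB), 256 MB)
// e.g. a 128 GB box => 0.5 * (128 - 1) = 63.5 GB for the WT cache.
```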
The remaining memory, less whatever the OS is taking up, will be used by MongoDB for dealing with query results (aggregation, sorting, etc.) and for communicating with your application (this is important).
So let's assume you have gone large on this, and have a 128 GB memory box. WT will take 63.5 GB. This means that all your indexes will need to fit into 63.5 GB as a maximum, preferably less. Any space they don't use becomes data space for the most recently used data. MongoDB itself will use the remaining 64.5 GB.
You now check your usage and see that indexes are taking no more than 23 GB. That's great... plenty of room for expansion, plenty of room for data... all is going swimmingly. Phew!
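If you want to check this yourself, a minimal mongosh sketch along these lines sums the index sizes across every database (assuming you have permission to run listDatabases):

```javascript
// mongosh sketch: total index size across all databases, to compare
// against the configured WT cache size.
let totalIndexBytes = 0;
db.adminCommand({ listDatabases: 1 }).databases.forEach((d) => {
  const stats = db.getSiblingDB(d.name).stats(); // dbStats, bytes by default
  totalIndexBytes += stats.indexSize;
});
print(`Total index size: ${(totalIndexBytes / 1024 ** 3).toFixed(1)} GB`);
```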
Then one day, it all starts slowing down. The primary keeps swapping out and needs restarting, queries that were fine are now hitting the slow logs, and all manner of weird stuff is occurring, for no obvious reason. You know the data has grown, but you have made no physical changes to the setup, no COLLSCANs are appearing, and the usual IXSCANs are still being used. So what has changed?
The chances are that your indexes have grown, and the amount of data has grown as well (obviously), and essentially nothing actually ‘fits’ anymore. This leads to WT having to swap out old data, and then also swap out the least-used indexes in order to re-read the ones that are needed. This can be confirmed by looking at the stats or, if you are using a decent monitoring system (Atlas, Datadog, etc.), in its dashboards.
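Assuming access to serverStatus(), a sketch like this surfaces the churn: sample it twice a few minutes apart, and steadily climbing reads-into-cache alongside heavy eviction suggest the working set no longer fits. Statistic names vary slightly between WiredTiger versions, so check your own output:

```javascript
// mongosh sketch: rough signs of WT cache churn.
const c = db.serverStatus().wiredTiger.cache;
print("pages read into cache:   ", c["pages read into cache"]);
print("unmodified pages evicted:", c["unmodified pages evicted"]);
print("modified pages evicted:  ", c["modified pages evicted"]);
```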
So what happens now?
In no particular order, I have seen the following, and none of these provide a permanent fix for the situation...
Add to the cache
More memory can be taken from the OS and added to the cache used by the mongod. This gives the mongod (data) unit more room for indexes and data, but reduces the amount available to the rest of the process for sorts and pipelines.
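For illustration, the cache can be resized at runtime through the wiredTigerEngineRuntimeConfig parameter (the 80 GB figure below is a made-up example), or permanently via storage.wiredTiger.engineConfig.cacheSizeGB in the mongod config file:

```javascript
// mongosh sketch: grow the WT cache at runtime (80G is a placeholder).
db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "cache_size=80G",
});
```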
It is a valid ’fix’, but will only be temporary until the data increases some more... when we are back to square one.
Increase the machine size
If you are in the cloud, this is relatively simple; if not, it involves provisioning new hardware, either on premise or through your hosting service, in which case it is not a ‘quick fix’. It also only buys you time, at extra cost.
Neither actually solves the issue of a growing business. Sure, you now have extra memory, and probably extra disk space, but you have not solved the issue for the long term.
Split the data into different databases
This comes from the mistaken idea that, because some client data is accessed more than the rest, splitting it out will reduce index and data usage, allowing your more important data to stay in memory.
What this actually does is slow the loading of indexes and data, because nothing is loaded into the cache until that database is used. And typically, all your databases will be used at some point by the system (if they aren't... why are they there?).
Next, what this does is increase your file handle usage at the OS level. Each index is opened and read... and the file handle is kept open so that refreshed data can be read back in quicker when changes are made. So for every index there is an open file. At the OS level this can get bad quickly.
Let's say you have, to keep it simple, one database with 10 collections and 15 clients, and each collection has 8 indexes. For this one database you have 80 index files open. You then split that into one database per client. Now you have 15 copies of the above, so 80 × 15 = 1,200 index files open.
This is a small example. However, if you look at one MongoDB Jira ticket I found, you will see that some people struggle because they have, for example, 280 databases, 47,891 collections and 270,078 indexes.
Generally there is a maximum of around 65,000 file handles allowed... so you can see the issue. Had that user kept a single database (one copy of each collection), they would have had roughly 171 collections and 965 indexes. A whole different story.
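To estimate your own exposure, a small mongosh sketch can count collections and indexes across every database; under WiredTiger, each collection and each index is backed by its own file on disk:

```javascript
// mongosh sketch: estimate how many collection/index files WT keeps open.
let files = 0;
db.adminCommand({ listDatabases: 1 }).databases.forEach((d) => {
  const database = db.getSiblingDB(d.name);
  database.getCollectionNames().forEach((coll) => {
    // one data file per collection, plus one file per index (incl. _id)
    files += 1 + database.getCollection(coll).getIndexes().length;
  });
});
print(`~${files} collection/index files`);
```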
Split the collections into multiple collections
This has a similar effect to splitting into multiple databases. I have seen it on more than one occasion with stats-type data: due to issues with the database, the collections were split into ‘daily’ or ‘weekly’ collections.
All this does is increase the complexity, multiply the number of indexes used by 7, 14.. whatever, and leave you with a collection management issue, with no real gain.
Further, this leads to having monthly/yearly aggregation collections, as it's no longer possible to use a single collection to get this data. So storage and index usage increase further, which is exactly what we are trying to reduce.
Multiple Clusters
Finally, because all of the above have failed to produce a stable system, the decision is made to run multiple clusters, with sets of data completely separated from each other.
This does solve the memory issue, because the data really is split completely: each cluster treats the data it holds as though it were the entire dataset.
However, it creates an issue elsewhere. You now need to execute queries against the correct cluster for each dataset, you lose a central source for all your company's data, and you need to query more than one cluster to get a single company-wide result.
Think of simple questions like:
How many clients do we have?
Which client had the highest activity last week?
Add to this that the pattern leads to multiple clusters all doing the same thing, but with added maintenance and complexity for your sysadmins to deal with... multiple backups... all manner of costs for no particular gain.
OK, so how do we solve this from the get-go?
What we should do is set MongoDB up as a single sharded cluster. This involves the following units, and please bear with me.
Mongod: data/shard nodes.
Arbiter: at least one arbiter.
Mongos: query nodes... these are called by the application, and call the data nodes and config servers directly... see below.
Config server(s): ideally a cluster of 3 for larger systems; if small you could get away with 1 or 2... but what happens if they die?
[Diagram: MongoDB single sharded cluster]
Arbiters
These do what they always do: keep the primaries and secondaries in order by voting in elections, helping MongoDB work out which node in each shard should be the primary.
Config Servers
These hold the metadata: which chunks of data live on which shard servers... your current data nodes.
Shard/Data nodes
These hold your actual data. Coming from a PSA/PSS setup, you will start with one set of these, which can itself be replicated.
Mongos nodes
These route the queries. They take information from the config servers, send the ‘data gathering’ part of the query to the relevant shard nodes, and handle any aggregation, sorting and extra work locally on the mongos.
Importantly, a mongos can reside on the same machine as your application/web servers, if they are big enough. No data lives on these; they only need memory.
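To illustrate, here is a minimal Node.js driver sketch; the mongos hostnames are placeholders for your own. The application only ever talks to the mongos tier, and routing to the correct shard happens behind the scenes:

```javascript
// Node.js driver sketch: connect to the mongos tier, never to shards.
const { MongoClient } = require("mongodb");

const client = new MongoClient("mongodb://mongos1:27017,mongos2:27017");

async function main() {
  await client.connect();
  // The mongos routes this count to whichever shard(s) hold the data.
  const n = await client.db("app").collection("clients").countDocuments();
  console.log(`clients: ${n}`);
  await client.close();
}

main().catch(console.error);
```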
Extra work required.
Each collection will need a ‘shard’ key, which should be as unique as possible, and NOT the primary key. Further, the most heavily used indexes should have the shard key as the first field... it just makes life easier for data retrieval.
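A minimal mongosh sketch of that extra work, using a hypothetical app.events collection and made-up field names:

```javascript
// mongosh sketch: shard a collection on a compound key.
sh.enableSharding("app");
sh.shardCollection("app.events", { clientId: 1, createdAt: 1 });
// Major-use indexes lead with the shard key, so queries carrying the
// key can be routed to a single shard rather than scatter-gathered.
db.getSiblingDB("app").events.createIndex({ clientId: 1, createdAt: -1 });
```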
What does this achieve?
Separation of Concerns
From the outset, we have reduced the workload on the data node(s): instead of performing both data retrieval and aggregation, they just perform the data retrieval, with only very basic aggregation and/or sorting happening on the relevant node.
The mongos nodes perform any aggregation/sorting locally, once they have received all the data from the data node(s).
This means that the data nodes are now limited to just data retrieval, allowing more memory for this task.
Further, it means that you have a choice. If the application slows down due to more users, or more aggregations, then you can simply add more load-balanced mongos units.
If your data is now the cause of any issues, you simply add more shard nodes, splitting the data down again. This is ‘horizontal sharding’ at its best.
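One way to see this split in action: explain() on a sharded cluster shows whether a query was routed to a single shard or broadcast to all of them. Reusing the hypothetical collection from the sharding sketch above:

```javascript
// mongosh sketch: check query routing on a sharded collection.
const plan = db.getSiblingDB("app").events
  .find({ clientId: 42 })
  .explain("queryPlanner");
// On a sharded cluster the winning plan lists the shards consulted;
// a query that includes the shard key should touch only one of them.
printjson(plan.queryPlanner.winningPlan);
```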
[Diagram: Separation of Concerns and Expand your Cluster]
Expand your Cluster where it is needed
As your data grows, and the system starts to become overloaded, you simply add more shard/data nodes.
For each new shard node, MongoDB will automatically rebalance your data across the nodes. So, if you are in single-shard (PSA) mode, adding a second shard will halve the data on the current node (effectively doubling the available memory), more than likely giving you the space you need without increasing the individual node size.
Remember, if you run your shard servers as replica sets, you will need to add a server per member: if each shard has a primary and a secondary, you will need to add two or more new servers so the new shard also has a primary and a secondary.
If your user base grows, or more time is taken with aggregations, you add more load balanced Mongos units, as mentioned above.
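Adding a shard is, at its core, one command; the replica set name and hostnames below are placeholders:

```javascript
// mongosh sketch: add a second shard and check chunk distribution.
sh.addShard("rs2/shard2a:27018,shard2b:27018");
sh.status(); // chunk counts per shard even out as the balancer runs
```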
This setup actually solves all the issues mentioned in the first part of this blog. If you had split your databases by client, you should really merge them back into one, with the client as a field in each collection.
You wanted Speed?
Well, now you can have it. The primaries can all now be relatively small units, and can be set up with the ‘in-memory’ storage engine.
This depends on the total size of your data and indexes, which should now be a lot smaller.
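A quick mongosh sketch to sanity-check the fit; note that the inMemory storage engine is an Enterprise-only feature, so treat this as a prerequisite check rather than a how-to:

```javascript
// mongosh sketch: would data + indexes on this node fit in RAM?
let bytes = 0;
db.adminCommand({ listDatabases: 1 }).databases.forEach((d) => {
  const s = db.getSiblingDB(d.name).stats();
  bytes += s.dataSize + s.indexSize;
});
print(`data + indexes: ${(bytes / 1024 ** 3).toFixed(1)} GB`);
print("current storage engine:", db.serverStatus().storageEngine.name);
```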
Having done this with a 15-20 node cluster in the past, it made a huge difference to performance. Ensuring that all the data and indexes... per node... fit into the node's memory on the primaries increased write throughput. We were also able to point ‘important quick queries’ back at the primaries, making them even faster.