Implement server side pagination in mongodb with total count
As a developer you may have come across a situation where you are required to show some enormous amount of data to be displayed in the UI, in tabular format/grid without a drop in performance. If you are a developer who spend a decent amount of time coding, you might have guessed it already. Server Side Pagination! yeah you are right. It is the most optimum way to show large data.
Why is that?
Consider the following scenario. Say that your application sells products online and you have a product base with more than one million items, which are added to the db using a master form. It is obvious now the end user should have provision to delete and edit the already added products. To achieve this you should have to give them an interface, presumably a grid, where the user can browse the particular product he/she intent to edit/delete. Now think about a scenario where this interface is getting populated with all the 1 million products altogether. Messy. isn’t it? No one could ever agree with what you have given because it is messy, slow and inconvenient. So in these situations intellectuals suggest you to paginate the grids with proper page size and page numbers.
Yeah you are right again. You just need client side pagination to solve the said problem. Then why should we go for server side pagination?
The reason is performance. If you are using client side pagination, even if it is implemented using any third party library(like jquery datatable or kendo grid) or your custom logic, somewhere somehow the data has to be kept in the client side. There are 5 problems in this approach.
- High bandwidth is required since the data(1 million products we are talking about) has to travel through the internet all the way from the database to the client side.
- In case of relational databases, the data has to be mapped to a DTO before returning which costs a huge computation as well as memory consumption in a single go.
- High probability of the database server to timeout the request since it is needed to fetch a huge data, may be with filtering as well.
- If anything goes wrong in between, all the computation so far spent is wasted. Think about a situation where a single product which is at the 99999th position has some missing value and it could not be mapped in the DTO due to a mandatory validation, throws an exception. All the work so far done by the database engine, till the 99998th product goes in vain. All the effort put by the mapping method so far is also wasted. The end user gets nothing even after all the struggle by the system.
- Say in an hypothetical situation, all went fine, still the huge data has to be stored either in the DOM or the browser memory. I bet that much of data processing in the client side cause the browser to perform incorrigibly wrong
MongoDB implementation of server side pagination
There are a lot of articles and tutorials which cover this particular topic available in the internet. But most or all of them misses a very basic thing required for the processing of server side pagination, i.e:- they miss to return the Total Count of data that we fetch. Yeah lot of them doesn’t cover that. So what is the problem in that?
Okay let me explain. The things we need to send as parameter to the server from the front-end are page-size(how many data to be shown in a single page) and page-number(starting from zero). Without the server sending the total count of data along with the first page’s result, we never can determine how many pages need to be shown in the UI. What does it mean? It means you never can let the user jump from page-1 to page-3 or page-3 to page-n. In this case we can only give the user to go to the next page or the previous one. Hence, if the user need to go to the 30th page, he/she has to go through 1 to 29 first. I don’t think it would be a good idea to give demo of this to any client especially to those who irks for everything.
We will use the same products example which I mentioned earlier. I will keep this simple as possible so that we all will be in the same page going forward. In step by step we will try to implement a mongodb aggregation pipeline to return the products in server side manner with an optional filter of product-category.
This is the basic structure of all the products which is in the Products collection. Assume that this collection has more than 1 million such products in different categories.
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
"Name" : "Product-1",
"Price" : 25.3,
"Category" : "Food"
}
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
"Name" : "Product-1",
"Price" : 263.0,
"Category" : "Non-Food"
}
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
"Name" : "Product-1",
"Price" : 159.0,
"Category" : "Stationary"
}
Lets start from the basics. First we will try to get all the products in a single go along with the optional filter parameter, which is product category. For this we only need to use the $match stage in the pipeline
//in real world these are parameters from the code
var category = null; //the optional filter for categorydb.getCollection('Products').aggregate
([
{
$match:
{
$or:
[
{ null:category },
{ "Category":category }
]
}
}
])
In the above code, the $match stage matches all the products under a specified category if any category is mentioned as the filter(if category has value) and returns all the products under all the categories if no category is mentioned as the filter(if category is null)
In the $match stage, value at the left hand side of a match condition is always considered as property path of document. So if you change { null:category } to { category : null} our expectation wont meet
So we know how to get all the data at a single go even if it has any filter applied or not. Now we can do server side pagination in the above code. For this the aggregate pipeline gives us two stages:- $skip and $limit
//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
{
$match:
{
$or:
[
{ null:category },
{ "Category":category }
]
}
},
{
$skip: pageSize * pageNumber
},
{
$limit:pageSize
}
])
In addition to the previous code, we are passing the output from the $match stage to the next stage in the pipeline, i.e the $skip stage. The skip stage will remove the first n products from all the products it received as input, where n is the value given in the $skip stage. The next stage in the pipeline is $limit, which does exactly what it says, excepts the first m products others will be removed, where m is the value given in the $limit and the result will be passed to the next stage in the pipeline if any. In our case there are no more stages in the pipeline and hence returned out.
Example cases
pageNumber=0 & pageSize=20 will return the first 20 products
pageNumber=1 & pageSize=20 will return the next 20 products after skipping the first 20(pageNumber*pageSize)
pageNumber=25 & pageSize=20 will return the next 20 products after skipping the first 500(pageNumber*pageSize)
But still we could not able to return the total count of products along with the result which is needed for the grid libraries in order to render page numbers in the UI. From the below figure, it is plausible to give the end user an idea of how many pages are available in total for him\her to surf through.
So we will now try to get the total count of products along with the products itself. In earlier days, it was a practice by the developers to get the count and the data using two separate calls to the db because of some limitations mongodb had. This approach will end up in filtering and scanning through the huge data twice. Almost same computation will have to be used by the database engine to get both the results. However by the arrival of version 3.4 we can use the so sophisticated $facet stage in the pipeline.
Processes multiple aggregation pipelines within a single stage on the same set of input documents. Each sub-pipeline has its own field in the output document where its results are stored as an array of documents.
Basically what it says is that the input of a $facet stage can be passed through multiple separate pipelines and the results from each sub pipelines will be saved as a property of the $facet’s output document. In our case we can use the $facet as the first stage with two sub pipelines inside. One for getting the required data and the other to get the total count. Lets refactor the code we wrote so far.
//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
{
$facet:
{
"Products":
[
{
$match:
{
$or:
[
{ null:category },
{ "Category":category }
]
}
},
{
$skip: pageSize * pageNumber
},
{
$limit:pageSize
}
],
"Count":
[
{
$group:
{
_id:null,
"Total":{ $sum:1 }
}
}
]
}
}
])
If we execute the above code, the result will have a single document with two properties:- Products which contains the resultant data and Count which has the total count of actual data. In this case the count will be the total number of products which are in the db, which will be same for every page we return. This count can be used by the third party libraries or by your custom code to determine the number of pages to be rendered in the UI.
Intermediate Result
{
"Products" : [
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
"Name" : "Product-1",
"Price" : 25.3,
"Category" : "Food"
},
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
"Name" : "Product-1",
"Price" : 263.0,
"Category" : "Non-Food"
},
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
"Name" : "Product-1",
"Price" : 159.0,
"Category" : "Stationary"
}
],
"Count" : [
{
"_id" : null,
"Total" : 3.0
}
]
}
A little cleanup is required in the above result which is to flatten out the products and move the total property inside the products. We can use $unwind, $addFields and $replaceRoot
//in real world these are parameters from the code
var category = null; //the optional filter for category
var pageSize = 20;
var pageNumber = 0;
db.getCollection('Products').aggregate
([
{
$facet:
{
"Products":
[
{
$match:
{
$or:
[
{ null:category },
{ "Category":category }
]
}
},
{
$skip: pageSize * pageNumber
},
{
$limit:pageSize
}
],
"Count":
[
{
$group:
{
_id:null,
"Total":{ $sum:1 }
}
}
]
}
},
{
$unwind:"$Products"
},
{
$addFields:
{
"Products.Total": { $arrayElemAt:["$Count.Total",0]}
}
},
{
$replaceRoot:
{
newRoot:"$Products"
}
}
])
Final Result
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf5"),
"Name" : "Product-1",
"Price" : 25.3,
"Category" : "Food",
"Total" : 3.0
}
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf6"),
"Name" : "Product-1",
"Price" : 263.0,
"Category" : "Non-Food",
"Total" : 3.0
}
{
"_id" : ObjectId("5fb152f2bcc0ee5eb068ccf7"),
"Name" : "Product-1",
"Price" : 159.0,
"Category" : "Stationary",
"Total" : 3.0
}
Further performance enhancements
There are some pressure points you could see while implementing the above mongodb query in a real world application having humongous data. For instance, the $skip stage in the pipeline becomes so costly especially when end users try to navigate to farther pages. Either you could simply ignore it knowing how often the user might need to go to the 300th page or you could perfectly avoid the $skip stage by taking leverage on the natural ordering property of the _id field, which will be unique for each document by default. A nice implementation is found in this article by Arpit Bhayani. But please do note that, if you contemplate to use this approach, end users will lose their ability to jump from one page to some other pages which are not adjacent to the current one i.e.; the user has to go from 1 to 29 pages first, if he\she needs to go to the 30th page. So take this approach only if you don't plan that flexibility to be given.
In addition, mongodb provides a very efficient indexing mechanism that you can index the data in such a way that it can be used as an advantage for $match and $sort stages. However it is all depended on how you implement the pagination since the application is all yours.
Happy coding!