
200 million MongoDB document inserts per hour with Kubernetes


The word 'Kubernetes' is creating a buzz throughout the world of technology right now.


It seems that everyone is interested in discussing the theory and concept, but we had not heard of anyone else putting any theory into practice.


That is, until now.


Here at Comfone, we asked ourselves the following question:

Is Kubernetes a viable solution for running high-performance MongoDB use cases?


The answer, in short, is yes.


We have proved that 200 million MongoDB document inserts and reads per hour on Kubernetes is possible.


In this blog, we will describe how we got this project off the ground and discuss how Kubernetes performs in such high-volume, high-performance environments.


First of all, let's explain some technical jargon...

 What is Kubernetes?

Kubernetes is an open-source container-orchestration system for automating computer application deployment, scaling, and management. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.

 What is MongoDB?

MongoDB is a document-oriented database program. Classified as a NoSQL database, MongoDB uses JSON-like documents instead of the tables found in traditional relational databases.


The name Mongo derives from the word 'humongous'. This is very fitting for our use case, as we need to be able to handle billions of documents per day!


MongoDB scales via its sharding concept: the data is divided into manageable chunks, which are distributed evenly across all shards based on a predefined shard key.

 What did we achieve?

Switching to a Kubernetes environment has allowed us to save time, free up memory and gain more power to future-proof our business.

 How did we do this?

Let's start by setting the scene...

 The Comfone use case

Comfone provides roaming services. In essence, we ensure that a mobile phone remains connected to a mobile network whether the subscriber is at home or abroad.


In order for a subscriber to use a visited network (the network of a mobile operator in the visited country) while abroad, both the home and visited networks need to exchange various types of information. To do this, multiple interconnections need to exist between the home and visited network.


Comfone enables all interconnections between the two mobile network operators to ensure the necessary information can be exchanged.


There are over 900 mobile network operators worldwide, and they all need to be connected to one another in order to offer roaming services and maximum coverage to their subscribers.

 Gathering roaming data

The Signal Statistics Generator (SSG) gathers subscriber roaming data from locations around the world via SIGTRAN, the protocol suite used to transport signalling traffic in mobile networks. It then aggregates this data into a MongoDB database in a central location and performs analytics.


The gathered records, called TDRs (Transaction Data Records), come from various roaming-related protocols.

 What happens with that data?

The data is accessible for the customer through Comfone's business portal, Pulse.


The gathered records are conveniently displayed in Comfone's in-house apps, such as Hawkeye, enabling customers to filter per country, send welcome SMS and much more!


If you are interested in finding out more about how we use TDRs, check out this blog post.

 The compute and MongoDB environment

An enormous volume of roaming-related data (packet-level TCP dumps) is generated and aggregated at high velocity, resulting in more than 200,000,000 MongoDB document inserts per hour.


In order to handle the volume and speed of these transactions, we need highly redundant, sharded MongoDB databases. The previous environment that we set up at Comfone included 16 MongoDB shards running across 52 very large virtual machines, each with 1 TB of storage.


Each shard consists of 3 members (one primary and two secondaries), where the secondaries hold replicas of the data and are ready to take over the role of the primary if needed.


A cluster also needs at least one router (the application's entry point to the database) and a configuration replica set, which records which shard each chunk (the actual data records) resides on.
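To make this concrete, a mongos router can be run as a small Kubernetes Deployment along these lines. This is only an illustrative sketch: the names, the config replica-set address, the headless service and the image tag are placeholders, not our actual configuration.

```yaml
# Hypothetical example -- all names and addresses are illustrative placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongos-router
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mongos-router
  template:
    metadata:
      labels:
        app: mongos-router
    spec:
      containers:
        - name: mongos
          image: mongo:4.4
          # The router must know where the config replica set lives
          command:
            - mongos
            - "--configdb"
            - "cfgrs/cfg-0.cfg-svc:27017,cfg-1.cfg-svc:27017,cfg-2.cfg-svc:27017"
            - "--bind_ip_all"
          ports:
            - containerPort: 27017
```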

 Switching to a Kubernetes compute environment

We switched to a Kubernetes compute environment using Platform9 Managed Kubernetes. We set up a 12-node Kubernetes cluster running 3 masters, with Calico as the CNI (Container Network Interface), a Kubernetes networking solution. This enabled us to keep the whole configuration of the cluster conveniently in YAML files, which are very readable.
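As an illustration of what such a YAML file can look like, one shard's three members (a primary plus two secondaries) could be described by a single StatefulSet roughly like the one below. All names, sizes and the image tag are hypothetical, and the headless service referenced by `serviceName` is assumed to exist separately:

```yaml
# Illustrative sketch of one shard's replica set -- not our production manifest.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: shard-01
spec:
  serviceName: shard-01-svc   # headless service, defined elsewhere
  replicas: 3                 # primary + two secondaries
  selector:
    matchLabels:
      app: shard-01
  template:
    metadata:
      labels:
        app: shard-01
    spec:
      containers:
        - name: mongod
          image: mongo:4.4
          command: ["mongod", "--shardsvr", "--replSet", "shard-01", "--bind_ip_all"]
          volumeMounts:
            - name: data
              mountPath: /data/db
  # One persistent volume per member, provisioned from the storage system
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Ti
```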

 What are the advantages?

From an operational perspective, this saves us time and provides us with more memory and power.


Setting up a new cluster of this size (16 shards) in a traditional VM setup would mean staging at least 52 virtual machines for a minimal configuration with one router (48 shard nodes, 3 config DB members & one router).

 Saving time

This would take several days with our old setup, as every VM had to be staged and configured via Ansible playbooks. With the YAML file configuration, setting up a new cluster is as easy as copying the existing YAML files into a new folder in the Git project, changing some variables and deploying!


Now we can set up MongoDB clusters in less than an hour, as the only task besides changing the YAML files is creating the volumes in our storage system.

 Managing memory

Tasks like adding more memory also become easier, as our setup offers multiple ways to manage resource consumption. We can now limit memory and CPU usage at the container level.
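Container-level limits live directly in the pod spec of the YAML file. The fragment below shows the idea; the figures are purely illustrative, not our production values:

```yaml
# Hypothetical requests/limits for one mongod container -- values are illustrative.
resources:
  requests:
    memory: "16Gi"   # guaranteed to the container at scheduling time
    cpu: "4"
  limits:
    memory: "32Gi"   # hard ceiling; the container is OOM-killed above this
    cpu: "8"
```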


More memory can be added to the cluster either by adding an additional worker or by adding memory to the existing workers. Compared to reconfiguring and restarting 48 VMs, this is far faster.

 Distributing containers

Kubernetes also distributes containers automatically. However, Kubernetes balances solely on resource consumption and doesn't know how MongoDB works, so for our MongoDB cluster we used Kubernetes taints (together with tolerations) to "bind" a container to a location or worker. This was our way of closing the gap.
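Sketched out, this works by tainting a dedicated worker and then giving the MongoDB pod a matching toleration plus a node selector. The node and shard names below are hypothetical:

```yaml
# First, taint the node so that nothing without a matching toleration lands on it:
#   kubectl taint nodes worker-03 dedicated=mongodb-shard-01:NoSchedule
# Then, in the pod spec, tolerate the taint and pin the pod to that node:
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-03
  tolerations:
    - key: dedicated
      operator: Equal
      value: mongodb-shard-01
      effect: NoSchedule
```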

Results of the high performance MongoDB test with Kubernetes 

We were able to successfully demonstrate that Kubernetes is a viable solution for running high-performance MongoDB use cases at Comfone.


The "performance/insert test" was a success, sustaining 200 million document inserts into the database per hour for weeks without any manual intervention from the operations teams.


We take pride in adapting new technologies and staying up to date with the market.


The launch of 5G and the rapidly growing IoT market (including devices that have a SIM card) make a scalable platform like Kubernetes a must-have.


Using Kubernetes enables Comfone to scale fast and on demand in an ever-changing market landscape, bringing high-quality services to our customers.
