Technological innovation is taking shape in different ways and at different paces around the world. Keeping up with this pace and its trends gives businesses an opportunity to innovate for sustainable growth.
Next-generation cloud applications must handle large data volumes arriving at high velocity, and Cassandra has emerged as a database that works extremely well with large volumes of records spread across many servers.
Why did Yagna choose the Cassandra database?
Yagna offers an installed base monetization and sales acceleration platform, and Cassandra is our core database component. This blog post outlines our database journey and why we selected Cassandra as our preferred database.
Since its inception, Yagna had been an RDBMS house, specifically MySQL. MySQL was open source, free, and relatively easy to install and maintain, so it was a natural choice, and it worked reasonably well during Yagna's early years.
The summer of 2017 brought rapid growth in our user base, and a sudden spurt of traffic put our systems under extreme stress. We noticed that while our app servers scaled reasonably well, our database could not keep up with the growing demands of our product. We tried tuning, indexing, sharding, and various other approaches on our RDBMS, but none of them bore much fruit.
Given these database scaling challenges, we decided to move to a NoSQL database, Cassandra, which we felt was the right fit for our particular set of requirements. Some of the challenges and decision factors are listed below:
- Scalability: We wanted our database to scale horizontally without developers writing code to handle scaling. The MySQL options did not fit: partitioning scales only within a single node, sharding requires code changes, and read replicas do not help a write-heavy workload like ours. One of the reasons we chose Cassandra was its ability to scale out by adding nodes. If you are running out of capacity, you add a node to the Cassandra cluster, the cluster redistributes data to it, and capacity grows roughly linearly. With Cassandra, scaling becomes an operational or release activity rather than a development cycle.
- Faster query response: With a growing number of transactions, we wanted faster query response times. As our tables grew day by day, our existing database took longer and longer to respond. One of the main problems we identified was our extensive use of foreign keys (FKs). While FKs are a purist's ideal, in practice they did not help query performance on our RDBMS. We also noticed that joins across complex tables slowed things down considerably. In our initial lab tests, Cassandra was orders of magnitude faster than our RDBMS, owing to its architecture and the way it stores and retrieves data. Cassandra does not support joins and offers only limited querying on non-key columns, but we still went with it because performance was non-negotiable for us. Another thing we gave up with Cassandra was full transaction management. To handle this, we identified the tables where we really cared about transactions (for example, the order management tables) and kept those tables (around 5% of all tables) in the existing database, although we removed their foreign keys.
- Distributed database and reliability: Yagna is a cloud platform deployed across geographies on AWS. We were running a separate database instance for each geography, with no run-time fault tolerance or high availability for those instances. The instances could not talk to each other, so we ended up with a siloed system per geography. With Cassandra, we could deploy a single cluster spanning geographies that was both fault-tolerant and highly available. This allowed us to deploy common services across geographies and made the system a lot more reliable, with centralized monitoring. Automatic workload and data balancing ensured that the load was evenly distributed across the nodes in the cluster.
- Learning Curve for Developers: The Yagna team had no experience with a NoSQL database like Cassandra. Schema design was fundamentally different from what we knew from RDBMS design, where normalization was the key focus. For the uninitiated, normalization is the technique of splitting a large table into smaller tables and defining relationships between them using foreign keys (FKs), to increase clarity and reduce data redundancy. Cassandra preached exactly the opposite: denormalize the data, i.e. have fewer relationships among tables and repeat data where required. With an RDBMS, we defined our schema first and then did the application design. With Cassandra, we had to do the application design first and then design the final schema (keyspaces and tables) around the queries the application needed. While an RDBMS gives great flexibility to query on any field and join multiple tables, we found Cassandra to be quite restrictive in that respect. We later solved that problem by adding Solr search on top of Cassandra. The learning curve was somewhat steeper for us from a development perspective, but the many advantages made the pain of migration worth enduring.
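To make the query-first, denormalized modeling from the last point concrete, here is a minimal sketch in plain Python, with no Cassandra driver involved. The `customers` and `assets` sample data and all function names are hypothetical illustrations, not Yagna's actual schema. A dictionary keyed by `customer_id` or `model` stands in for a Cassandra table partitioned on that column.

```python
from collections import defaultdict

# Hypothetical sample data, normalized the RDBMS way:
# assets reference customers through a foreign key (customer_id).
customers = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
assets = [
    {"asset_id": 10, "customer_id": 1, "model": "X100"},
    {"asset_id": 11, "customer_id": 1, "model": "X200"},
    {"asset_id": 12, "customer_id": 2, "model": "X100"},
]


def assets_for_customer_joined(customer_id):
    """Normalized read: scan assets and join to customers at query time."""
    name = customers[customer_id]["name"]
    return [
        {"customer_name": name, "asset_id": a["asset_id"], "model": a["model"]}
        for a in assets
        if a["customer_id"] == customer_id
    ]


def build_assets_by_customer():
    """Denormalized table for the query 'all assets of a customer'.

    The key plays the role of a partition key; the customer name is
    deliberately repeated in every row so no join is needed on read.
    """
    table = defaultdict(list)
    for a in assets:
        table[a["customer_id"]].append({
            "customer_name": customers[a["customer_id"]]["name"],
            "asset_id": a["asset_id"],
            "model": a["model"],
        })
    return dict(table)


def build_assets_by_model():
    """A second table holding the same data, keyed for a different query."""
    table = defaultdict(list)
    for a in assets:
        table[a["model"]].append({
            "customer_name": customers[a["customer_id"]]["name"],
            "asset_id": a["asset_id"],
        })
    return dict(table)


assets_by_customer = build_assets_by_customer()
assets_by_model = build_assets_by_model()

# Each query is now a single key lookup instead of a scan plus join.
print(assets_by_customer[1])
print(assets_by_model["X100"])
```

The trade-off in this sketch is the one described above: every new query pattern needs its own table, and a repeated value such as the customer name has to be updated in every copy when it changes.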
Overall, moving to Cassandra has been a very exciting journey. We made our share of mistakes and learned along the way, and today our product is in a happy place, scaling the peaks quietly!