A Database Migration Story: 99% Cost Reduction and 100% Hindsight

Shaun Chaudhary
The BetterCloud Tech Blog
5 min read · Apr 29, 2019

--

Background

Recently, we closed the book on a lengthy migration of our business intelligence database from Microsoft Azure SQL Server → Google BigQuery. This post attempts to be a retro of the process: We will discuss what happened, where we encountered issues, and what we learned overall.

I joined BetterCloud three years ago as the second person on a then two-person data team. Our mission statement was the following: Create a centralized database where we could improve efficiencies and disseminate insights across every vertical in the business. The wrinkle was in our company layout: Product, Engineering, and Platform Services lived in Atlanta, Georgia, while the rest of the company (Sales, Marketing, Support, Finance, and IT) resided in New York City. We lacked direct support from our technology team in Atlanta, so we needed to pick a solution that allowed us to be nimble, but could be supported by our one-person IT team in NYC. For these reasons, we initially chose Microsoft Azure SQL Server.

Azure Inception

For the next two years, we grew our Azure database to almost 1 terabyte of data. We would remote desktop into our Windows Virtual Machine (VM) and execute ad-hoc queries. We went from purchasing a desktop computer so I could run Python scripts overnight (yikes) to deploying Airflow on a production environment VM that communicated with Azure. Overall, our scale and sophistication grew tremendously in the span of two years — to the point where we needed to reassess how our current infrastructure was set up.

Migration Attempt #1: Azure SQL Server → MySQL

The data team was growing quickly, and a single IT person could not manage the vast infrastructure we were building at BetterCloud. Additionally, costs began spiraling out of control, with an average bill of over $20K per month to host our database in Azure. We needed an alternative to our current setup, so we decided to engage directly with our technology team in Atlanta. Their first requirement was that we be on a Google Cloud Platform (GCP) product for data co-location and increased security.

We decided to migrate to MySQL in GCP because we had an experienced DBA who could do the fine-tuning for views, schemas, and configurations. We spun up MySQL and began migrating the 1 TB SQL database via Python (yikes).
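
For flavor, those copy scripts looked roughly like the sketch below: a chunked read from the Azure SQL source and a bulk write into the new MySQL instance. The connection strings, table name, and chunk size here are illustrative placeholders, not our actual configuration.

    # Minimal sketch of a chunked table copy from Azure SQL Server to MySQL.
    # Connection strings, table names, and chunk size are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    source = create_engine(
        "mssql+pyodbc://user:password@azure-server.database.windows.net/bi_db"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    destination = create_engine("mysql+pymysql://user:password@10.0.0.5/bi_db")

    # Stream the source table in chunks so a ~1 TB database never has to fit in memory.
    for chunk in pd.read_sql("SELECT * FROM dbo.events", source, chunksize=50_000):
        chunk.to_sql("events", destination, if_exists="append", index=False)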

When we were almost finished, we started to run our Python scripts keying off the new database but noticed that runtime had increased by 4–5x despite extensive fine-tuning from our DBA. What was going on?

What we didn’t realize was that Azure had been using automatic tuning to create indexed views based on our query patterns. The database was so large in Azure because of all the duplicated, reindexed tables/views that this ML-driven tuning was creating under the hood (the actual size of the data was 250 GB, but automatic tuning inflated it to 1 TB). Despite our best efforts and months of planning and execution, the MySQL solution was dead on arrival: we could not achieve even minimum parity with MySQL compared to our existing setup. Back to the drawing board.
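
In hindsight, a couple of quick queries against Azure SQL’s system views would have surfaced this before we started. The sketch below (connection string is a placeholder) shows one way to check which automatic tuning options are actually enabled and how storage splits between base table data and indexes.

    # Sketch: inspect Azure SQL automatic tuning state and per-index-type space usage.
    # The connection string is a placeholder.
    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=azure-server.database.windows.net;DATABASE=bi_db;UID=user;PWD=password"
    )
    cursor = conn.cursor()

    # Which automatic tuning options (e.g. CREATE_INDEX) are actually on?
    cursor.execute(
        "SELECT name, actual_state_desc FROM sys.database_automatic_tuning_options"
    )
    for name, state in cursor.fetchall():
        print(f"{name}: {state}")

    # How much space is base table data (HEAP/CLUSTERED) vs. nonclustered indexes?
    cursor.execute("""
        SELECT i.type_desc, SUM(ps.used_page_count) * 8 / 1024 AS used_mb
        FROM sys.dm_db_partition_stats ps
        JOIN sys.indexes i
          ON ps.object_id = i.object_id AND ps.index_id = i.index_id
        GROUP BY i.type_desc
    """)
    for type_desc, used_mb in cursor.fetchall():
        print(f"{type_desc}: {used_mb} MB")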

Migration Attempt #2: Azure SQL Server → Google BigQuery

We decided we needed to look into OLAP (online analytical processing) solutions. If we wanted to remain on Google Cloud Platform, Google BigQuery was the obvious choice. Our team had some initial aversion to using BigQuery: the classic UI was clunky, and I had some less-than-stellar experiences with its functionality. Our DBA, however, was excited about using another Google tool, and our technology team in Atlanta was concurrently deepening our integration with Google products (Google Cloud SQL, Google Cloud Storage, etc.). BigQuery would fit nicely into this pipeline/workflow, so we decided to give it a shot.

The big difference in our migration the second time around was that we did a very detailed PoC (proof of concept). We identified a handful of our largest tables and most expensive queries and attempted to replicate part of our workflow with BigQuery. Not only did we achieve parity, but we exceeded our baseline tests and noticed huge improvements in runtime! The next steps were fully migrating the database to BigQuery and redirecting all Python scripts to use it as the source of truth. Here is a detailed guide of what needed to happen during each migration:

  1. Recreate schema in destination database
  2. Ensure data type encoding translated correctly from source → destination database
  3. Write/execute code in Python that would pull data from source database and write to destination database
  4. Create valid connection string and verify successful connection from Python to new destination database
  5. Have Python scripts write to both source and destination databases until we could confirm parity had been achieved (a sketch of this step follows the list)
  6. Ensure that permissioning in destination database allowed users to access the same data without roadblocks
  7. Confirm that the new destination database could be appropriately managed/optimized by Platform Services (i.e. using table partitioning where necessary in BigQuery)
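
A minimal sketch of what steps 5 and 7 looked like in practice is below. It assumes pandas, SQLAlchemy, and the google-cloud-bigquery client; the connection string, project, dataset, table, and column names (and the build_daily_metrics helper) are hypothetical stand-ins rather than our actual code.

    # Sketch of steps 5 and 7: write a script's output to both the old Azure SQL
    # database and a partitioned BigQuery table until parity is confirmed.
    # All names and connection details below are placeholders.
    import pandas as pd
    from google.cloud import bigquery
    from sqlalchemy import create_engine


    def build_daily_metrics() -> pd.DataFrame:
        # Stand-in for the real transformation an Airflow task performs.
        return pd.DataFrame(
            {"metric_date": pd.to_datetime(["2019-01-01"]), "value": [42]}
        )


    azure = create_engine(
        "mssql+pyodbc://user:password@azure-server.database.windows.net/bi_db"
        "?driver=ODBC+Driver+17+for+SQL+Server"
    )
    bq = bigquery.Client(project="my-gcp-project")

    df = build_daily_metrics()

    # Step 5: keep feeding the old source of truth until parity is confirmed.
    df.to_sql("daily_metrics", azure, if_exists="append", index=False)

    # Step 7: load the same rows into a BigQuery table partitioned on metric_date,
    # so Platform Services can manage cost and prune queries by date.
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        time_partitioning=bigquery.TimePartitioning(field="metric_date"),
    )
    bq.load_table_from_dataframe(
        df, "analytics.daily_metrics", job_config=job_config
    ).result()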

We had to do this twice! This migration took nine months and had many moments of frustration, joy, and exhaustion. We have been on the new setup for three months, and it is safe to say it was all worth it.

We went from a $20K → $200 monthly bill after the migration was complete. That is a 99% reduction in cost simply by migrating from Azure SQL Server → BigQuery.

Key Takeaways

  • Be Agile. You can roadmap all you want, but until you partially execute your plan and are operating in the new environment, you will never know the negative externalities
  • Free can have big costs. Open source solutions are cheaper, but managed solutions abstract away much of the complexity and ongoing maintenance (i.e. a managed OLAP solution is preferred if you are operating in dynamic environments)
  • Measure twice, cut once. Understand your existing queries and views, and check if your new solution can accommodate those requirements
  • Do your homework. Investigate the full feature set of a proposed solution before pulling the trigger (e.g. accounting for products like BigQuery BI Engine)

Stay tuned for Part Two of this blog post where my colleague Michael Matmon dives into how we migrated our Tableau Server from Windows VM → Linux GCP.
