Where Should an Analytics Database Sit and What Data is Allowed?

Shaun Chaudhary
The BetterCloud Tech Blog
2 min readNov 2, 2018

--

This post serves as Part 2 to a recently published post Bridging the Gap Between Business Analytics & DevOps.

When we first started our analytics team three years ago, we decided to use Azure and Microsoft SQL Server to house our business analytics data. Even though our production database sits in Google Cloud Platform, we figured in the short term we could leave the security and database maintenance to our IT administrator to stay quick and lean.

In the last two years, our database became more tightly connected to our production environment as we started leveraging product data within the suite of analytics we provide for our company. This caused headaches for both DevOps and our Analytics teams, and they had to build creative solutions to securely connect our Azure environment to the BetterCloud production environment so we could access and store that data. To better secure and connect our data, however, we eventually decided to migrate our 1 TB database from Azure SQL Server back into MySQL GCP.

Because of the company’s daily reliance on dashboards that the analytics team has built, downtime during this migration needed to be minimal to nonexistent. This involved a coordinated effort across Analytics/DevOps/Security to ensure that teams were moving together throughout the migration. Security rebuilt the zoning and architecture of the key components, ensuring that customer and business data was treated with care.

The tricky part was developing a security zoning schema that allowed all the necessary databases to communicate with each other. We use Tableau for our dashboarding and have our own Tableau server on a VM that we migrated to GCP, so we needed to devise a way to expose that endpoint to our company while creating a closed connection to the data populating the Tableau dashboards.

Airflow is the key component connecting all the disparate systems in the above diagram. We chose this open source tool to manage our scripts and create Directed Acyclic Graphs to ensure specious data did not populate in our database.

If these tools look like the stack of a classic data engineering organization, that’s because it is! The mistake that we made was prioritizing speed and simplicity instead of security and scalability. We were a team of two people attempting to prove out the value of business intelligence and analytics at a small tech company. The lesson here is that DevOps should be building solutions that treat the analytics team as an engineering team. Isolated solutions will work in the short term, but in the long term, scalable solutions will help prevent tricky scenarios where entire databases need to be migrated without any downtime because of company reliance.

--

--