A Database Migration Story: Extracting Value from Tableau

Michael Matmon
The BetterCloud Tech Blog
5 min read · May 20, 2019

--

Tableau (NYSE: DATA) is a powerful business intelligence tool used for data visualization. What makes it so powerful is its ability to connect to multiple databases, its flexibility in creating interactive dashboards, and its web application offering, which can be accessed from anywhere. At BetterCloud, we consider Tableau to be best-in-class for data visualization and have deployed it across the company.

Our data team ingests information from internal systems like Salesforce and JIRA as well as from the BetterCloud product itself. On its own, all of this data can be hard to make sense of, so we use Tableau for all front-end reporting; it serves as the analytical layer for our data. The reporting functionality offers features such as email subscriptions, alerting, user-specific permissioning, and filtering across a variety of dimensions.

Currently, there are about 50 workbooks with over 300 different views, accessible by more than 70 people. Because people rely on Tableau for critical business information, performance is of utmost importance.

When we think of Tableau performance, two priorities come to mind: (1) data is not stale or out of date, and (2) visualizations load quickly. As part of our migration from Azure to Google Cloud Platform (GCP), improving Tableau’s overall performance was a top priority.

Correcting mistakes from the past

Prior to migrating, our Azure Tableau server was a mess. When the team first started, we prioritized speed over scale to meet the needs of the business and surface insights. We created a 1:1 relationship between datasources and workbooks, duplicating datasources and creating unnecessary load on the server. This setup also made it difficult to troubleshoot because we could not easily identify individual workbook/datasource relationships. We knew there was a better solution, but we didn’t have the time or bandwidth to deploy best practices before the problems compounded.
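
One thing that would have helped at that stage is a quick audit of which workbooks use which datasources. For illustration only, here is a minimal sketch of such an audit using the Tableau Server Client (TSC) Python library; the server URL, site name, and token values are placeholders, not our actual configuration.

```python
# Minimal sketch: list every workbook -> datasource connection on the server
# so duplicated datasources stand out. Server URL, site, and token values
# below are placeholders.
import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("audit-token", "TOKEN_SECRET", site_id="bettercloud")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    for workbook in TSC.Pager(server.workbooks):
        # Load the workbook's connections so we can see which datasources it uses.
        server.workbooks.populate_connections(workbook)
        for conn in workbook.connections:
            print(f"{workbook.name} -> {conn.datasource_name} ({conn.connection_type})")
```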

The migration presented a unique opportunity to correct our mistakes from the past and enforce best practices. Today there are no duplicated datasources, which reduces load on the server, makes it easier to troubleshoot, and removes any data disparity between workbooks. Knowing that we’re following best practices is important, but let’s dive into what this really means.

There are three main practices we learned during the migration: (1) how to migrate datasources, (2) how to publish datasources independent of a workbook, and (3) how to update datasources which are used across multiple workbooks.

Best practice #1: Migrate & publish datasources (Live vs. Extracted connections)

First, we needed to repoint all of our workbooks from Azure to BigQuery by creating local copies of every datasource. Local copies “untie” a datasource from its original database, allowing us to reconnect those datasources to BigQuery. In theory, this was the simplest step in the process, but due to the duplicated datasources mentioned earlier, it took longer than necessary.
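
We did the repointing itself in Tableau Desktop, but scanning the local copies programmatically makes it easier to find datasources that still point at the old server. Here is a rough sketch using Tableau’s Document API; the folder path and Azure hostname are hypothetical.

```python
# Rough sketch: scan local workbook copies and flag connections that still
# point at the old Azure server. File paths and hostname are placeholders.
from pathlib import Path
from tableaudocumentapi import Workbook

OLD_AZURE_HOST = "old-azure-sql.example.net"  # placeholder hostname

for twb_path in Path("local_copies").glob("*.twb"):
    wb = Workbook(str(twb_path))
    for ds in wb.datasources:
        for conn in ds.connections:
            if conn.server == OLD_AZURE_HOST:
                print(f"{twb_path.name}: '{ds.name}' still points at {conn.server}/{conn.dbname}")
```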

Now that we were connected to BigQuery, the next step was publishing datasources to the new Tableau server. This is a critical component of the Tableau infrastructure, and getting it right could lead to a more efficient and performant server. This was also the area with the most room for improvement during the migration. We didn’t know how to do this the “right way” when we first started, which created unnecessary load on the old server.

In order to get it right, we needed to decide how we wanted to connect datasources to Tableau and the server. Tableau offers two options: a Live connection or an Extracted connection. A Live connection means data in Tableau will update when new data is available in the database. Live connections are beneficial because they capture real-time data; however, they can slow performance on the server.

Extracts, on the other hand, are snapshots of the datasource which update on a scheduled cadence. While the real-time benefits of Live connections are great, we almost exclusively use Extracts so we can optimize performance. Our extracts are scheduled to refresh either every two or four hours throughout the day, so there is no major data lag compared to a live, real-time connection.
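
For illustration, this is what publishing an extracted datasource and putting it on a two-hour refresh cadence could look like when scripted with the Tableau Server Client library. The project ID, file name, schedule name, and credentials are placeholders; in practice, the same steps can be done through the Tableau Server UI.

```python
# Hedged sketch: publish an extracted datasource and attach it to a
# two-hour extract refresh schedule. All names and IDs are placeholders.
from datetime import time
import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("publish-token", "TOKEN_SECRET", site_id="bettercloud")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # Publish the .tdsx (which already contains the extract) to a project.
    ds_item = TSC.DatasourceItem(project_id="PROJECT_ID", name="salesforce_opportunities")
    ds_item = server.datasources.publish(ds_item, "salesforce_opportunities.tdsx",
                                         mode=TSC.Server.PublishMode.Overwrite)

    # Create an extract refresh schedule that runs every two hours.
    interval = TSC.HourlyInterval(start_time=time(6, 0), end_time=time(22, 0), interval_value=2)
    schedule = TSC.ScheduleItem("Every two hours", 50,
                                TSC.ScheduleItem.Type.Extract,
                                TSC.ScheduleItem.ExecutionOrder.Parallel,
                                interval)
    schedule = server.schedules.create(schedule)

    # Attach the published datasource to that schedule.
    server.schedules.add_to_schedule(schedule.id, datasource=ds_item)
```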

Best practice #2: Publish datasources independent of a workbook

After following these steps, we had everything in place to build dashboards and then publish to the server. When publishing from Tableau Desktop to Tableau Server, both the workbook and the datasources need to be published. There are two publishing options: one that embeds the datasource into the workbook, and another where the two are published independently of one another.

Prior to the migration, if we wanted to use the same datasource in a new workbook, we would reconnect to the database, follow the steps listed above, and publish it again. This approach was flawed for two reasons: it created unnecessary load on the server, and it left data out of sync. The first flaw was obvious: more datasources living on the server meant more work was required to keep them refreshed and up to date. The second was subtler: because we were embedding datasources, one copy might refresh before another, meaning two workbooks built on the same underlying datasource could show different outputs. This put data integrity into question.

We wanted to fix these issues as part of the migration, so the first thing we figured out was how to publish datasources independent of workbooks. Today when a datasource refreshes, the updated information is pushed to every workbook that it is connected to, and there is no longer a syncing problem. This also reduced load on the server, which in turn allowed us to increase how often our datasources would refresh, getting us even closer to a real-time connection.
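
The split happens at publish time: the datasource goes to the server once, and each workbook, built against that published datasource rather than an embedded copy, is published separately. Below is a hedged sketch of that pattern with the Tableau Server Client library; the project ID and file names are placeholders.

```python
# Sketch of the "publish independently" pattern: one shared datasource on
# the server, many workbooks that point at it. Names and IDs are placeholders.
import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("publish-token", "TOKEN_SECRET", site_id="bettercloud")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # One shared datasource on the server...
    ds_item = TSC.DatasourceItem(project_id="PROJECT_ID", name="jira_issues")
    server.datasources.publish(ds_item, "jira_issues.tdsx", mode=TSC.Server.PublishMode.Overwrite)

    # ...and multiple workbooks that all connect to it instead of embedding a copy.
    for twbx in ["support_kpis.twbx", "engineering_throughput.twbx"]:
        wb_item = TSC.WorkbookItem(project_id="PROJECT_ID")
        server.workbooks.publish(wb_item, twbx, mode=TSC.Server.PublishMode.Overwrite)
```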

Best practice #3: Update datasources across multiple workbooks

The final benefit of this new approach was related to updating datasources. Previously, we would need to go into individual workbooks to make updates because datasources were embedded. If we wanted to update the logic of one field, we would need to repeat that change in every other workbook connected to the same data. Furthermore, we would create copies of the logic fields (calculated fields) built in Tableau because we were unable to update them after they were published. This made it extremely difficult to know which field was the most recently updated one to use, which could jeopardize reporting accuracy.

Now that datasources are independent of workbooks, there is a much more sensible way to correct these problems. We learned how to make an update to a datasource that propagates across all workbooks connected to the same source, helping us avoid duplicated effort. Moreover, we can make updates to the logic fields added in Tableau, and they will also update across every connected workbook.
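
Concretely, rolling out a change now means updating the datasource in one place and republishing it over the existing published copy; every connected workbook picks it up on the next refresh. A sketch of what that could look like when scripted, again with placeholder names and IDs:

```python
# Sketch: republish a revised datasource over the existing published one and
# trigger an immediate extract refresh. Names and IDs are placeholders.
import tableauserverclient as TSC

auth = TSC.PersonalAccessTokenAuth("publish-token", "TOKEN_SECRET", site_id="bettercloud")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(auth):
    # Overwrite the published datasource with the updated version
    # (e.g., revised field logic). Every workbook connected to it
    # will see the change; nothing needs touching workbook by workbook.
    ds_item = TSC.DatasourceItem(project_id="PROJECT_ID", name="sales_goals")
    ds_item = server.datasources.publish(ds_item, "sales_goals.tdsx",
                                         mode=TSC.Server.PublishMode.Overwrite)

    # Optionally refresh the extract now rather than waiting for the next scheduled run.
    server.datasources.refresh(ds_item)
```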

This is especially beneficial when, for example, we input things like sales goals into Tableau. Because these goals did not exist in the underlying datasource, we needed to write logic to create them. And since goals occasionally change or are extended to future periods, it was critical that we had a way to make one update that would propagate to all workbooks, which we can now do.

Overall, the benefits from these improved processes are immense. They’ve allowed us to clean up our infrastructure, improve server performance, and ensure data accuracy and integrity.

For an in-depth walkthrough of these best practices, you can review this deck.
