Tableau releases new versions (major releases: e.g. 2021.3) of its products four times per year. For each version, they then create maintenance releases (minor releases: e.g. 2021.3.1) for 24 months – the maintenance window. Key new features are well documented, but with each upgrade also come slight adjustments and small improvements.

While working with a client we stumbled on one of those small changes:

Storage used per site previously only showed storage used for ‘Live’ content. It now encompasses all content, including revisions.

How did we discover this? Keep reading and we’ll explain:

  • How to connect to the Tableau Server repository
  • How to see exactly how much space content uses per site, including a Custom SQL query for re-use
  • Where exactly is content stored on Tableau Server’s hard disk?

The challenge

We recently helped a client upgrade their Tableau Server from version 2018.3 to 2021.2. Following Tableau’s recommended preparation steps, we had recorded information from the different admin pages. When reviewing the Sites Overview (Server Admin page: Manage All Sites) after the upgrade, something strange showed up:

Before the upgrade the Business site used 2.0 GB of storage:

The same page shows 7.3 GB usage after the upgrade.

Without any changes to content, the space usage for the Business Site had gone up from 2.0 to 7.3 GB. How is that possible? The Site Admin pages can give us more detailed information about space usage. Let’s look at the view Stats for Space Usage (available to Site Admins):

So as a Server Administrator we see 7.3 GB in use, but a Site Admin sees 1.96 GB in use. So what is going on? How much space are we using for content?

Enter the repository

Tableau Server stores its own data in the Tableau Server Repository database. This PostgreSQL database can be accessed like any other database. The repository holds information about content, users, usage and much more.

So how to use the repository to answer our question? Full instructions can be found here. In summary:

  1. Enable repository access. The TSM user can enable database access by generating a password. Use the following command in the CLI: tsm data-access repository-access enable –repository-username readonly –repository-password <PASSWORD>
  2. Connect to the repository. The default port is 8060:

Use the following SQL query:

SELECT 
  sites.id AS id, 
  sites.name AS site, 
  CustomSQLQuery.location as location, 
  CustomSQLQuery.sum AS size, 
FROM 
  (
    SELECT 
      SUM(size), 
      r.site_id, 
      'flow versions' as location 
    FROM flow_versions r 
    WHERE r.content_key IS NOT NULL 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(size), 
      r.site_id, 
      'workbook versions' as location 
    FROM workbook_versions r 
    WHERE r.content_key IS NOT NULL 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(size), 
      r.site_id, 
      'datasource versions' as location 
    FROM datasource_versions r 
    WHERE r.content_key IS NOT NULL 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(size), 
      r.site_id, 
      'flows' as location 
    FROM flows r 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(size), 
      r.site_id, 
      'workbooks' as location 
    FROM workbooks r 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(document_file_size + packaged_files_size), 
      r.site_id, 
      'flow drafts' as location 
    FROM flow_drafts r 
    GROUP BY r.site_id 
    UNION 
    SELECT 
      SUM(size), 
      d.site_id, 
      'datasources' as location 
    FROM datasources d 
    WHERE d.parent_workbook_id IS NULL 
    GROUP BY d.site_id
  ) CustomSQLQuery 
  INNER JOIN public.sites sites ON 
    CustomSQLQuery.site_id = sites.id

Running this query in a SQL client resulted in the following:

Looking at the Business Site:

  • workbooks + datasource = 1.9 GB. This is what the Site Admin page measures.
  • Site total = 7.3 GB. This is what the Server admin page measures.

We ran this query using a SQL client called dBeaver, but the same query can be used right inside Tableau.

Conclusion

  • Stats for Space Usage shows only current content.
  • In the Sites Overview, previous versions of Tableau Server counted only current content
  • In the Sites Overview, newer versions of Tableau Server count current content as well as revisions

Finally, where does content get stored on the hard disk?

Tableau Server content takes up space in two folders in ProgramDataTableauTableau Serverdatatabsvc

  • dataengine – actual content
  • pgsql – repository

Files are stored in a secure format, so they are not accessible from outside of the Tableau Server application.

In August this year Tableau announced an early end-of-maintenance for versions 2019.3 to 2020.3. These versions will not receive minor releases after November 11th. This announcement triggered some of our clients to accelerate their upgrade schedule, upgrading all their Tableau products to a newer version. And that’s a good thing! This way they make sure to be up-to-date on all security updates and users get to use the latest features including Relationships, Ask Data for Viewers and multiple map layers, just to name a few.

The above is just a quick example of something that came up during a Tableau Server upgrade project. After years of experience using our different partner tools, we still learn little things almost every day. Do let us know if this was interesting and if you’d like to see more content like this. And of course, feel free to get in touch regarding Tableau Server installation, maintenance, upgrades or training.