Best Practices: Data Sources

Gtmhub best practices for connecting and using Data Sources

Written by Neli Ivanova
Updated over 3 years ago

Overview

This article contains the Gtmhub best practices for connecting data sources to Gtmhub, including virtual data sources.

Connecting Data Sources

  • Only connect the data sources that you actually need

  • When initially testing a connection, test on a known smaller table (Jira Projects in Jira, for example)

  • For larger systems, consider using custom instead of standard entities if you only need a few specific columns of information

  • When connecting to databases, we recommend using a view instead of connecting directly to tables, especially if you need data from several different tables

  • When connecting a data source, it is highly recommended to use a service account. This prevents the data source from needing to be re-authenticated if someone leaves the organization or loses access to the system

    • Restricting the service account’s access in the source system is also another way to limit the data coming into Gtmhub to only what is needed
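
The view recommendation above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the source database; the table, column, and view names (customers, orders, okr_order_summary) are hypothetical and only serve the example. The view joins two tables and exposes just the columns that would be synced, leaving out a sensitive column.

```python
import sqlite3

# In-memory database standing in for the source system (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, internal_notes TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, created_at TEXT);

INSERT INTO customers VALUES (1, 'Acme', 'sensitive'), (2, 'Globex', 'sensitive');
INSERT INTO orders VALUES
  (10, 1, 250.0, '2021-03-01'),
  (11, 1, 100.0, '2021-03-02'),
  (12, 2, 75.5, '2021-03-02');

-- Hypothetical view: joins both tables and exposes only the needed columns.
CREATE VIEW okr_order_summary AS
SELECT o.id AS order_id, c.name AS customer_name, o.amount, o.created_at
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id;
""")

# The data source would then point at the view instead of the two tables.
cursor = conn.execute("SELECT * FROM okr_order_summary ORDER BY order_id")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

Because the view is defined in the source database, the column list and joins can be changed there without reconnecting several separate tables.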

Data Source Naming

  • After selecting a table to connect, you will be asked for a name for the data source, and the table name will be suggested. We recommend prepending the name of the data connector (e.g., Jira, Hubspot, or SalesForce) to the suggested name

    • The SQL table name will be auto-generated from this display name by replacing spaces with underscores and converting all characters to lower case

    • This will make it easier to discern in SQL which data source the table is coming from

    • For longer source names, an abbreviation is OK (“SF” for SalesForce, for example) as long as you use it consistently for all data sources within that data connector

❌ Avoid This: ApplicationRoles

✔️ Do This: Jira Application Roles

  • If you have connected multiple instances under the same connector (e.g., multiple Jira instances), adjust your naming to indicate the instance

    • For example, “Jira Production Application Roles” and “Jira Dev Application Roles” when connecting to production and dev Jira instances

  • When creating a filtered data source, the name should indicate what filtering is happening

    • For example, “Jira Issues CD and RA Projects” if filtering to the CD and RA projects
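
The naming rules above can be sketched in a few lines of Python. Both helper functions are hypothetical and are not part of Gtmhub; sql_table_name only approximates the stated rule (spaces to underscores, lower case), and the actual generator may treat punctuation or other characters differently.

```python
def data_source_name(connector: str, table: str, instance: str = "", filter_note: str = "") -> str:
    """Build a display name: connector, optional instance, table, optional filter description."""
    parts = [connector, instance, table, filter_note]
    return " ".join(part for part in parts if part)


def sql_table_name(display_name: str) -> str:
    """Approximate the documented rule: spaces become underscores, everything lower case."""
    return display_name.replace(" ", "_").lower()


# Names taken from the examples in this article.
basic = data_source_name("Jira", "Application Roles")
production = data_source_name("Jira", "Application Roles", instance="Production")
filtered = data_source_name("Jira", "Issues", filter_note="CD and RA Projects")

for name in (basic, production, filtered):
    print(f"{name} -> {sql_table_name(name)}")
```

With the connector name first, every table from the same connector shares a prefix in SQL (jira_...), which is what makes the source easy to identify in queries.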

Sync Period

  • Syncing data can be a very intensive process for the source systems with potential performance impacts for end users. Limit syncs to daily for larger systems.

    • Hourly syncs of large data sources may cause performance issues, especially if the source system is underpowered

    • Remember that a sync can be triggered manually from the Data Sources page if a one-off, more frequent sync is needed

  • When connecting large data sources for testing and not actually using the data yet, set the sync to manual until ready

Schema Changes

  • If a data source is failing to sync due to data type mismatches, a schema changes run is most likely required

    • This is usually caused by a configuration change in the source system

Virtual Data Sources

  • Virtual data sources should follow all SQL Best Practices except for the 5000 result limit, which does not apply to virtual data sources

  • Virtual data sources should be named using the rules listed above

    • If the virtual data source uses data from only one system, use that system’s name in the virtual data source name and key

    • If the virtual data source merges data from multiple systems, it is not necessary to include all source names, but it should be clear that it is merged data

  • It is OK to use virtual data sources to create smaller, filtered tables of data from large data sources. This may improve the performance of insight boards that would otherwise reference the larger data sources.

  • If a table exists in the data connector, do not use a virtual data source to replicate that table

    • For example, if you need a list of Jira Issue statuses, use the “statuses” table from the Jira connector instead of creating a virtual data source with a SELECT DISTINCT on Jira Issues

    • The exception is when you want to filter to only specific values, or only values with specific properties. In that case, a virtual data source is appropriate

  • Do not use virtual data sources to create a column-limited set of data (e.g., only 5 columns out of 200 from a particular data source). Use a custom entity instead
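
As a rough sketch of the kind of query a filtered virtual data source might contain, the example below runs a row filter against a small stand-in table using Python's built-in sqlite3 module. The table jira_issues and its columns are hypothetical, and the SQL dialect available in Gtmhub may differ from SQLite. Note that the query keeps all columns and filters rows only, in line with the guidance above to use a custom entity rather than a virtual data source for limiting columns.

```python
import sqlite3

# Stand-in for a large synced data source (names and rows are illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jira_issues (issue_key TEXT, project_key TEXT, status TEXT);
INSERT INTO jira_issues VALUES
  ('CD-1', 'CD', 'Done'),
  ('RA-7', 'RA', 'In Progress'),
  ('OPS-3', 'OPS', 'To Do');
""")

# Row filter only: all columns are kept, rows are limited to the CD and RA projects.
FILTERED_QUERY = """
SELECT *
FROM jira_issues
WHERE project_key IN ('CD', 'RA')
"""

filtered_rows = conn.execute(FILTERED_QUERY + " ORDER BY issue_key").fetchall()
print(filtered_rows)
```

Following the naming rules, a virtual data source built from this query could be called “Jira Issues CD and RA Projects”.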

Custom Entities

  • Use custom entities to avoid syncing columns from a data source that contain sensitive or unneeded information

  • Custom entities can also be used to avoid frequent schema change runs if standard entities are changing too often

  • Custom entities should be named using the rules listed above, but the name should also indicate that it is a custom entity (e.g., “Custom Jira Issues” or “Jira Issues Custom”)

