This article describes best practices for connecting data sources to Gtmhub, including virtual data sources and custom entities.
Connecting Data Sources
Only connect the data sources that you actually need
When initially testing a connection, test with a table that is known to be small (Jira Projects in Jira, for example)
For larger systems, consider using custom instead of standard entities if you only need a few specific columns of information
When connecting to databases, we recommend using a view instead of connecting directly to tables, especially if you need pieces of data from multiple tables
When connecting a data source, it is highly recommended to use a service account. This will prevent the data source from needing to be re-authenticated if someone leaves an organization or loses access to the system
This also provides another way to limit data coming into Gtmhub to only the data needed by restricting the service account’s access to the system
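As a rough illustration of the view recommendation above, the sketch below uses Python's built-in sqlite3 module as a stand-in for a source database. The table names (customers, orders), their columns, and the view name customer_order_summary are all hypothetical; the point is only that the view exposes the few joined columns that are needed, so a single object can be connected instead of several full tables.

```python
import sqlite3

# In-memory SQLite database standing in for the source system (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables; each holds part of the data we need.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, internal_notes TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, status TEXT);

    INSERT INTO customers VALUES (1, 'Acme', 'private'), (2, 'Globex', 'private');
    INSERT INTO orders VALUES
        (10, 1, 250.0, 'closed'),
        (11, 1, 100.0, 'open'),
        (12, 2, 75.0, 'closed');

    -- The view joins the tables and exposes only the needed columns,
    -- leaving out internal_notes and the raw ids.
    CREATE VIEW customer_order_summary AS
    SELECT c.name AS customer_name, o.amount, o.status
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id;
""")

# Read the view the same way a connector would read a single table.
cursor = conn.execute(
    "SELECT customer_name, amount, status FROM customer_order_summary ORDER BY amount"
)
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
for row in view_rows:
    print(row)
```

The syntax for creating a view differs between database systems, so treat the SQL here as an outline rather than something to paste into a production database.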
Data Source Naming
After selecting a table to connect, you will be asked for a name for the data source, and the table name will be suggested. We recommend prepending the name of the data connector (e.g. Jira, HubSpot, Salesforce) to the suggested name
The SQL table name will be auto-generated from this display name by replacing spaces with underscores and converting all characters to lowercase
This will make it easier to discern in SQL which data source the table is coming from
For longer source names, an abbreviation is OK (“SF” for Salesforce, for example); just be consistent across all data sources within that data connector
❌ Avoid This: ApplicationRoles
✔️ Do This: Jira Application Roles
If you have connected multiple instances under the same connector (e.g. multiple Jira instances), adjust your naming to indicate which instance each data source comes from
For example, “Jira Production Application Roles” and “Jira Dev Application Roles” when connecting to production and dev Jira instances
When creating a filtered data source, the name should indicate what filtering is happening
For example, “Jira Issues CD and RA Projects” if filtering to the CD and RA projects
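The naming rules above can be sketched in a few lines of Python. Both helper functions are hypothetical and only mirror the rule as described in this article (spaces become underscores, all characters become lowercase); how Gtmhub treats other characters, such as punctuation, is not covered here.

```python
def suggested_display_name(connector: str, table: str, instance: str = "") -> str:
    """Build a display name as '<connector> [<instance>] <table>'.

    Hypothetical helper: the optional instance distinguishes, for example,
    production and dev instances under the same connector.
    """
    parts = [connector, instance, table]
    return " ".join(part.strip() for part in parts if part.strip())


def sql_table_name(display_name: str) -> str:
    """Mirror the described rule: spaces to underscores, then lowercase."""
    return display_name.replace(" ", "_").lower()


for name in (
    suggested_display_name("Jira", "Application Roles"),
    suggested_display_name("Jira", "Application Roles", instance="Production"),
    suggested_display_name("SF", "Accounts"),
):
    print(f"{name!r} -> {sql_table_name(name)!r}")
```

With the connector name included, the resulting SQL table names (jira_application_roles, jira_production_application_roles, sf_accounts) show at a glance which system each table comes from.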
Sync Period
Syncing data can be a very intensive process for the source systems with potential performance impacts for end users. Limit syncs to daily for larger systems.
Hourly syncs of large data sources may cause performance issues, especially if the source system is underpowered
Remember that syncs can be triggered manually from the Data Sources page if a one-off, more frequent sync is needed
When connecting large data sources for testing and not actually using the data yet, set the sync to manual until ready
Schema Changes
If a data source is failing to sync due to data type mismatches, a schema changes run is most likely required
This is usually caused by a configuration change in the source system
Virtual Data Sources
Virtual data sources should follow all SQL Best Practices except for the 5000 result limit, which does not apply to virtual data sources
Virtual data sources should be named using the rules listed above
If the virtual data source uses data from only one system, use that system’s name in the virtual data source name and key
If the virtual data source merges data from multiple systems, it is not necessary to include all source names, but it should be clear that it is merged data
It is OK to use virtual data sources to create smaller, filtered tables of data from large data sources. This may help insight boards that reference larger data sources perform better.
If a table exists in the data connector, do not use a virtual data source to replicate that table
For example, if you need a list of Jira Issue statuses, use the “statuses” table from the Jira connector instead of creating a virtual data source with a SELECT DISTINCT on Jira Issues
The exception to this is if you want to filter to only specific values, or only values with specific properties; in that case a virtual data source is appropriate
Do not use virtual data sources to create a column-limited set of data (e.g. only 5 columns out of 200 from a particular data source). Use a custom entity instead
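To make the row-filtering case concrete, here is a minimal sketch of the kind of query a filtered virtual data source might contain. It runs against an in-memory SQLite database through Python's sqlite3 module purely so it can be tried locally; the jira_issues table, its columns (issue_key, project_key, status), and the sample rows are assumptions, and the SQL dialect Gtmhub accepts may differ.

```python
import sqlite3

# In-memory SQLite database standing in for synced Jira data (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jira_issues (issue_key TEXT, project_key TEXT, status TEXT);
    INSERT INTO jira_issues VALUES
        ('CD-1', 'CD', 'Done'),
        ('CD-2', 'CD', 'In Progress'),
        ('RA-1', 'RA', 'Done'),
        ('OPS-1', 'OPS', 'To Do');
""")

# Row filter: keep only issues from the CD and RA projects.
# This reduces rows, which is an appropriate use of a virtual data source;
# reducing columns is better handled with a custom entity.
FILTERED_QUERY = """
    SELECT issue_key, project_key, status
    FROM jira_issues
    WHERE project_key IN ('CD', 'RA')
    ORDER BY issue_key
"""

filtered_rows = conn.execute(FILTERED_QUERY).fetchall()
for row in filtered_rows:
    print(row)
```

Following the naming rules, a virtual data source built from a query like this could be called “Jira Issues CD and RA Projects” so that the filter is visible in the name.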
Custom Entities
Use custom entities to avoid syncing columns from a data source that contain sensitive or unneeded information
Custom entities can also be used to avoid frequent required schema changes if standard entities are changing too often
Custom entities should be named using the rules listed above, but the name should also indicate that it is a custom entity (e.g. “Custom Jira Issues” or “Jira Issues Custom”)