About Google Drive
Google Drive is Google’s cloud storage service that lets you store, share, and collaborate on files and folders from any mobile device, tablet, or computer. Google Drive is included with a Google Account or a G Suite account.
About the Google Drive connector
With the Google Drive connector, you can create a data source for a Google Drive file or folder. The Google Drive connector supports the following file extensions:
.csv
.tsv
.tab
.txt
.xlsx
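As an illustration only, the following Python sketch maps a file name to the File Type you would select in the table data source properties (Text or Excel), based on the extensions listed above. The function name file_type_for is hypothetical and is not part of Incorta. It also assumes that extension matching is case-sensitive, which this page does not state.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extensions listed for the Google Drive connector, mapped to the File Type
# you would select in the table data source properties.
SUPPORTED_EXTENSIONS = {
    ".csv": "Text",
    ".tsv": "Text",
    ".tab": "Text",
    ".txt": "Text",
    ".xlsx": "Excel",
}


def file_type_for(file_name: str) -> Optional[str]:
    """Return "Text" or "Excel" for a supported file, else None (hypothetical helper)."""
    # Assumption: the extension is compared exactly as written (case-sensitive).
    return SUPPORTED_EXTENSIONS.get(PurePosixPath(file_name).suffix)


print(file_type_for("orders.csv"))    # Text
print(file_type_for("budget.xlsx"))   # Excel
print(file_type_for("budget.xls"))    # None: legacy .xls is not in the list
```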
You can access all folders and files that you own and any folders or files that someone shares with you.
If you want to create a data source for Google Sheets, you must use the dedicated Google Sheets connector. The Google Sheets connector also supports selecting a specific folder or file in Google Drive. For this reason, you must enable and configure the Google Drive connector before you can use the Google Sheets connector.
In this release, you are not able to access folders or files in a Shared Drive.
The Google Drive connector supports the following Incorta-specific functionality:
Feature | Supported |
---|---|
Chunking | ✔ |
Data Agent | |
Encryption at Ingest | |
Incremental Load | ✔ |
Multi-Source | ✔ |
OAuth | ✔ |
Performance Optimized | ✔ |
Remote | |
Single-Source | ✔ |
Spark Extraction | |
Webhook Callbacks | ✔ |
Google Drive supports storing multiple files with the same name and multiple versions of the same file. The Google Drive connector supports accessing a file in My Drive that has a unique file name, and only the current version of that file.
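Because the connector expects a unique file name, it can help to check a folder for duplicate names before you point a table at it. The Python sketch below works on a plain list of file names; how you obtain that listing is left open, and the helper duplicate_file_names is hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_file_names(file_names: Iterable[str]) -> List[str]:
    """Return the names that occur more than once in a folder listing (hypothetical helper)."""
    counts = Counter(file_names)
    return sorted(name for name, count in counts.items() if count > 1)


listing = ["orders.csv", "customers.csv", "orders.csv", "returns.xlsx"]
print(duplicate_file_names(listing))  # ['orders.csv']
```

Renaming the duplicates in Google Drive is one way to make each file addressable by a unique name.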
The Google Drive connector requires the following:
- Security configurations
- Default and Tenant Configurations
Some configurations may differ if you deploy the Google Drive connector in an Incorta Cloud instance. For example, an Incorta Cluster in Incorta Cloud natively supports HTTPS.
Security configurations for the Google Drive connector
Security and system administrators typically address the security requirements for the Google Drive connector. The connector uses Google APIs and therefore requires the following:
- HTTPS for the Incorta Cluster
- G Suite account
- Google API Project with both the Google Drive API and the Google Sheets API enabled
HTTPS for the Incorta Cluster
To use the Google Drive connector or the Google Sheets connector, you must configure your Incorta Cluster to use HTTPS (Hypertext Transfer Protocol Secure). Typically, an operating system administrator with root access configures HTTPS for the Incorta Cluster.
The Google APIs do not accept self-signed security certificates. You must use a valid certificate for a known public domain.
To learn more about how to configure HTTPS with TLS/SSL for your Incorta Cluster using Let’s Encrypt, Certbot and OpenSSL, please review Security → HTTPS for Apache Tomcat with OpenSSL.
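Before you configure the connector, you may want to confirm that the certificate your Incorta Cluster presents passes standard verification. This Python sketch uses only the standard library ssl and socket modules. The host name incorta.example.com is a placeholder, and passing this check is a reasonable indicator, not a guarantee, that Google will accept the certificate.

```python
import socket
import ssl


def make_verifying_context() -> ssl.SSLContext:
    """TLS context that requires a certificate chain trusted by the system store
    and a matching host name, so a self-signed certificate is rejected."""
    return ssl.create_default_context()


def fetch_peer_certificate(hostname: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to hostname:port and return the verified server certificate.

    Raises ssl.SSLCertVerificationError if the certificate is self-signed,
    expired, or issued for a different host name.
    """
    context = make_verifying_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


# Usage (placeholder host name; requires network access):
# cert = fetch_peer_certificate("incorta.example.com")
# print(cert["subject"], cert["notAfter"])
```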
Client Credentials
A Security Administrator or System Administrator who manages both your organization’s G Suite accounts and your Incorta Cluster creates the required Google API project. Using a G Suite account, the administrator must sign in to the Google Developers Console, create a project, configure an OAuth consent screen, and then create the client credentials.
To learn more about how to create client credentials for a Google API project, please review Security → Client credentials for a Google Drive API project.
Default and Tenant Configurations for the Google Drive Connector
A Cluster Management Console (CMC) administrator for your Incorta Cluster must configure each tenant to use the client credentials.
After configuration, you must restart the Analytics Service, Loader Service, and any add-ons such as the Notebook Service.
Specify the client credentials for the Default Tenant Configuration
Here are the steps to specify the required properties for the Default Tenant Configuration:
- Sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the canvas tabs, select Cluster Configurations.
- In the panel tabs, select Default Tenant Configurations.
- In the left pane, select Integration.
- In the right pane, specify the following:
  - For Google Drive Client ID, enter your Client ID.
  - For Google Drive Client Secret, enter your Client Secret.
- Select Save.
Specify the client credentials for a Tenant Configuration
Here are the steps to specify the required properties for a specific tenant:
- Sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the canvas tabs, select Tenant.
- For the given tenant, select Configure.
- In the left pane, select Integration.
- In the right pane, specify the following:
  - For Google Drive Client ID, enter your Client ID.
  - For Google Drive Client Secret, enter your Client Secret.
- Select Save.
Restart the Incorta Services
Here are the steps to restart the various services in an Incorta Cluster from the Cluster Management Console (CMC).
- As the CMC Administrator, sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the Details canvas tabs, in the footer bar, select Restart.
Steps to Connect Google Drive and Incorta
To connect your Google Drive and Incorta, here are the high-level steps, tools, and procedures:
- Create an external data source
- Create a schema with the Schema Wizard
- Or create a schema with the Schema Designer
- Load the schema
- Explore the schema
Create an external data source
Here are the steps to create an external data source with the Google Drive connector:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Source.
- In the Choose a Data Source dialog, in File System, select Google Drive.
- In the New Data Source dialog, specify the applicable connector properties.
- To test, select Test Connection.
- Select Ok to save your changes.
If you select the lowest-level folder in the tree, the Select Directory from dialog shows No Data. You can still access the files in this folder when you create a schema; however, you cannot select its parent folder.
Google Drive connector properties
Here are the properties for the Google Drive connector:
Property | Control | Description |
---|---|---|
Data Source Name | text box | Enter the name of the data source |
Authorize | button | Select this button to authenticate your Google account and grant Incorta read access to your Google Drive. Choose an account to use to access your Google Drive and select the Allow button. The New Data Source dialog will reappear, and the Authorize button will change to Authorized with the name of the Google account to the right. |
Browse | button | Select a folder from the directories shown that contains the folder or file you would like to connect to. If you do not choose a folder, you will have access to all folders and files found in My Drive and Shared with me. It is not possible to select a parent folder for a table data source. |
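Selecting Authorize starts a Google OAuth 2.0 consent flow that relies on the client ID configured in the CMC. The Python sketch below shows what such an authorization request generally looks like, using Google's documented authorization endpoint. It is not Incorta's implementation: the client ID, the redirect URI, and the choice of the drive.readonly scope are all assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
# Assumption: read-only Drive access; the scopes Incorta actually requests may differ.
DRIVE_READONLY_SCOPE = "https://www.googleapis.com/auth/drive.readonly"


def build_authorization_url(client_id: str, redirect_uri: str, state: str,
                            scopes: tuple = (DRIVE_READONLY_SCOPE,)) -> str:
    """Build a Google OAuth 2.0 authorization-code request URL."""
    params = {
        "client_id": client_id,          # the Google Drive Client ID
        "redirect_uri": redirect_uri,    # must match a URI registered for the client
        "response_type": "code",         # authorization-code flow
        "scope": " ".join(scopes),
        "access_type": "offline",        # ask Google for a refresh token
        "state": state,                  # opaque value that guards against CSRF
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"


url = build_authorization_url(
    client_id="1234-example.apps.googleusercontent.com",        # hypothetical
    redirect_uri="https://incorta.example.com/oauth/callback",  # hypothetical
    state="random-state-value",
)
query = parse_qs(urlparse(url).query)
print(query["scope"])  # ['https://www.googleapis.com/auth/drive.readonly']
```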
Create a schema with the Schema Wizard
Here are the steps to create a Google Drive schema with the Schema Wizard:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Schema Wizard
- In (1) Choose a Source, specify the following:
  - For Enter a name, enter the schema name.
  - For Select a Datasource, select the Google Drive external data source.
  - Optionally, enter a description.
- In the Schema Wizard footer, select Next.
- In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your file.
- In the Schema Wizard footer, select Next.
- In (3) Finalize, in the Schema Wizard footer, select Create Schema.
Create a schema with the Schema Designer
Here are the steps to create a Google Drive schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Create Schema.
- In Name, specify the schema name, and select Save.
- In Start adding tables to your schema, select File System.
- In the Data Source dialog, specify the table data source properties.
- Select Add.
- In the Table Editor, in the Table Summary section, enter the table name.
- To save your changes, select Done in the Action bar.
Google Drive table data source properties
You can specify a single file or folder in the Data Source dialog. Both the Schema Designer and the Table Editor represent a file or folder data source as a single-source table. To select a folder in My Drive, you must enable Union Files.
This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.
Common properties for a file and folder
Here are the common properties for selecting either a file or a folder:
Property | Control | Description |
---|---|---|
Type | drop down list | Default is File System |
Data Source | drop down list | Select the Google Drive external data source |
File Type | drop down list | Select Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx) |
Has Header? | toggle | Select if first row contains column header values |
Callback | toggle | Enables the Callback URL field |
Callback URL | text box | This property appears when the Callback toggle is enabled. Specify the URL. |
Common file properties
Here are the common properties specific to selecting a file of type Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):
Property | Control | Description |
---|---|---|
Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and Update file. |
File | button | Selecting this button opens the Add File from dialog, which shows the files from your Google Drive data source. Select a single file, and then select Add. |
Update File | button | Available when Incremental is enabled. Selecting this button opens the Add File from dialog, which shows the files from your Google Drive data source. Select a single file, and then select Add. |
With Incremental enabled, if no key column is defined, new rows are appended and existing rows are not updated.
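The difference between appending rows and updating them by key can be shown with a small in-memory Python model. This is a conceptual sketch, not how the Loader Service is implemented; rows are plain dictionaries, and the id and qty column names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, str]


def apply_incremental(existing: List[Row], updates: List[Row],
                      key: Optional[Sequence[str]] = None) -> List[Row]:
    """Illustrate append-only versus key-based incremental loading."""
    if not key:
        # No key column: every row from the update file is appended.
        return existing + updates
    merged: Dict[tuple, Row] = {}
    for row in existing + updates:
        # A later row with the same key value replaces the earlier one;
        # dicts keep first-insertion order, so updated rows stay in place.
        merged[tuple(row[column] for column in key)] = row
    return list(merged.values())


existing = [{"id": "1", "qty": "5"}, {"id": "2", "qty": "7"}]
updates = [{"id": "2", "qty": "9"}, {"id": "3", "qty": "1"}]

print(len(apply_incremental(existing, updates)))         # 4: id 2 appears twice
print(apply_incremental(existing, updates, key=["id"]))  # id 2 now has qty 9
```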
Properties for an Excel Workbook file
Here are the specific properties for an Excel Workbook (.xlsx) file:
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select a worksheet in the Excel Workbook |
Properties for a Text file
Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:
Property | Control | Description |
---|---|---|
Date Format | drop down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Timestamp Format | drop down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Character Set | drop down list | Select a supported character set. |
Separator | drop down list | Available when the selected File Type is Text. Specify the separator between column values in a row. Comma and Tab are standard delimiters. Other requires that you specify a custom value, such as a colon (:). |
Other | text box | Available when the Separator is Other. Enter one or more characters to specify the column separator or delimiter between values in a row. |
Enable Chunking | toggle | Enable for large file sizes |
Chunk Size (MB) | text box | Enter a value in megabytes (MB) to specify the chunk size |
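To make the Character Set, Separator, and Automatic date format options concrete, here is a Python sketch that decodes a text file, splits it on a custom separator, and guesses a date format by sampling the first few values. It is an analogy, not Incorta's algorithm: the candidate formats and the sample size of 5 are assumptions, Python strptime codes differ from the Java patterns Incorta uses, and csv.reader only accepts a one-character separator.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical candidates, written as Python strptime codes. Incorta itself uses
# Java pattern letters (for example yyyy-MM-dd), which are a different syntax.
CANDIDATE_DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"]


def read_text_table(raw: bytes, separator: str = ",",
                    encoding: str = "utf-8") -> Tuple[List[str], List[List[str]]]:
    """Decode with the chosen character set and split on a one-character separator."""
    text = raw.decode(encoding)
    rows = list(csv.reader(io.StringIO(text), delimiter=separator))
    return rows[0], rows[1:]


def infer_date_format(values: Iterable[str], sample_size: int = 5) -> Optional[str]:
    """Return the first candidate format that parses every sampled non-empty value."""
    sample = [value for value in values if value][:sample_size]
    if not sample:
        return None
    for fmt in CANDIDATE_DATE_FORMATS:
        try:
            for value in sample:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # this candidate failed on at least one sampled value
        return fmt
    return None


raw = "order_id:order_date\n1:2024-01-15\n2:2024-02-03\n".encode("utf-8")
header, rows = read_text_table(raw, separator=":")
print(header)                                     # ['order_id', 'order_date']
print(infer_date_format(row[1] for row in rows))  # %Y-%m-%d
```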
Common folder properties
Folder properties are available when you enable Union Files. It is not possible to select a parent folder.
Here are the properties specifically related to selecting a folder:
Property | Control | Description |
---|---|---|
Incremental | toggle | Enable to support incremental loading |
Union Files | toggle | Enable to select all files within a given folder. When enabled, you will only be able to select a folder from your Google Drive data source. |
Directory | button | Select a folder from your Google Drive data source. It is not possible to select a parent folder. |
Include | text box | Enter a keyword with the * wildcard symbol to include files with matching names in the folder |
Exclude | text box | Enter a keyword with the * wildcard symbol to exclude files with matching names in the folder |
Include Sub-Directories Files | toggle | Enable to include files from sub-folders |
Add Filename as a column | toggle | Enable to add the filename of the file as a column. You will then need to specify a column name. |
Filename column | text box | Enter a column name for the filename such as source_file_name |
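The folder options above can be approximated in a short Python sketch: filter file names with Include and Exclude wildcard patterns, optionally skip sub-folders, and stack files that share a header while adding a filename column. The matching semantics (patterns applied to the file name only, case-sensitive) are assumptions, as are the helper names and the sample data; the sketch also assumes each file has a header row.

```python
import csv
import io
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


def select_files(paths: List[str], include: str = "*", exclude: Optional[str] = None,
                 include_subdirectories: bool = False) -> List[str]:
    """Filter folder-relative paths with Include/Exclude patterns (assumed semantics)."""
    selected = []
    for path in paths:
        if not include_subdirectories and "/" in path:
            continue  # the file sits in a sub-folder
        name = path.rsplit("/", 1)[-1]  # match patterns against the file name only
        if not fnmatchcase(name, include):
            continue
        if exclude and fnmatchcase(name, exclude):
            continue
        selected.append(path)
    return selected


def union_files(files: Dict[str, str], filename_column: Optional[str] = None
                ) -> Tuple[List[str], List[List[str]]]:
    """Stack CSV files that share a header, optionally tagging rows with their source file."""
    header: List[str] = []
    rows: List[List[str]] = []
    for path, text in files.items():
        file_rows = list(csv.reader(io.StringIO(text)))  # assumes a header row exists
        if not header:
            header = file_rows[0] + ([filename_column] if filename_column else [])
        for row in file_rows[1:]:
            rows.append(row + ([path] if filename_column else []))
    return header, rows


paths = ["sales_2023.csv", "sales_2024.csv", "sales_draft.csv",
         "archive/sales_2022.csv", "notes.txt"]
print(select_files(paths, include="sales_*", exclude="*draft*"))
# ['sales_2023.csv', 'sales_2024.csv']

files = {"sales_2023.csv": "id,amount\n1,10\n", "sales_2024.csv": "id,amount\n2,20\n"}
header, rows = union_files(files, filename_column="source_file_name")
print(header)  # ['id', 'amount', 'source_file_name']
print(rows)    # [['1', '10', 'sales_2023.csv'], ['2', '20', 'sales_2024.csv']]
```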
With Incremental enabled, if no key column is defined, new rows are appended and existing rows are not updated.
Folder properties for Excel Workbook files
This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.
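To choose a Worksheet name that works for the whole folder, you can check which tab names every workbook shares. This Python sketch takes the tab names as plain lists; in practice you could read them with a library such as openpyxl, whose Workbook.sheetnames attribute lists them. The file and sheet names are hypothetical.

```python
from typing import Dict, List


def common_worksheets(workbooks: Dict[str, List[str]]) -> List[str]:
    """Worksheet names present in every workbook of the folder."""
    if not workbooks:
        return []
    name_sets = [set(names) for names in workbooks.values()]
    return sorted(set.intersection(*name_sets))


def workbooks_missing(workbooks: Dict[str, List[str]], worksheet: str) -> List[str]:
    """Workbooks that have no worksheet with the selected name."""
    return sorted(name for name, sheets in workbooks.items() if worksheet not in sheets)


folder = {
    "q1.xlsx": ["Summary", "Orders"],
    "q2.xlsx": ["Orders", "Notes"],
}
print(common_worksheets(folder))             # ['Orders']
print(workbooks_missing(folder, "Summary"))  # ['q2.xlsx']
```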
Here are the properties specifically related to selecting a folder with a file type of Excel Workbook (.xlsx) files:
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select the worksheet tab name that is common to all workbooks in the folder |
View the schema diagram with the Schema Diagram Viewer
Here are the steps to view the schema diagram using the Schema Diagram Viewer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Google Drive schema.
- In the Schema Designer, in the Action bar, select Diagram.
Load the schema
Here are the steps to perform a Full Load of the Google Drive schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Google Drive schema.
- In the Schema Designer, in the Action bar, select Load → Load Now → Full.
- To review the load status, in Last Load Status, select the date.
Explore the schema
With the full load of the Google Drive schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.
To open the Analyzer from the schema, follow these steps:
- In the Navigation bar, select Schema.
- In the Schema Manager, in the List view, select the Google Drive schema.
- In the Schema Designer, in the Action bar, select Explore Data.