About Box
Box is a cloud computing company that provides file sharing, collaboration, and other tools for working with files that you upload to its servers. You control how your content is shared with other users: you can invite others to view and edit your shared files, upload documents and photos to a shared folder, share those documents outside Box, and grant other users the right to view shared files.
About the Box connector
With the Box connector, you can create a data source for a Box file or folder. The Box connector supports the following file extensions:
.csv
.tsv
.tab
.txt
.xlsx
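As an illustration only, the extension list above can be expressed as a small filter that you might run over a folder listing before creating a data source. The function names are hypothetical, and treating extensions as case-insensitive is an assumption that this document does not confirm.

```python
from pathlib import PurePosixPath

# Extensions listed in this document as supported by the Box connector.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".tab", ".txt", ".xlsx"}


def is_supported(file_name: str) -> bool:
    """Return True if the file name ends with a listed extension.

    Assumption: the comparison ignores case (for example, ".TXT").
    """
    return PurePosixPath(file_name).suffix.lower() in SUPPORTED_EXTENSIONS


def supported_files(file_names):
    """Keep only the names whose extension is in the supported list."""
    return [name for name in file_names if is_supported(name)]
```

For example, `supported_files(["a.xlsx", "b.docx"])` keeps only `a.xlsx`.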
You can access all folders and files that you own, in addition to any files and folders that someone shares with you.
The Box connector supports the following Incorta-specific functionality:
Feature | Supported |
---|---|
Chunking | ✔ |
Data Agent | |
Encryption at Ingest | |
Incremental Load | ✔ |
Multi-Source | ✔ |
OAuth | ✔ |
Performance Optimized | ✔ |
Remote | |
Single-Source | ✔ |
Spark Extraction | |
Webhook Callbacks | ✔ |
Box supports storing multiple versions of the same file. The Box connector can access only the current version of a file.
The Box connector prerequisites
An on-premises deployment of the Box connector requires the following:
- Security configurations
- Default and Tenant Configurations
Some configurations may differ if you deploy the Box connector in an Incorta Cloud instance. For example, an Incorta Cluster in Cloud natively supports Hypertext Transfer Protocol Secure (HTTPS), and the Default and Tenant configurations are already defined. When you authorize Incorta to connect to Box, you only need to provide your Box account credentials if you are not already signed in.
Security configurations for the Box connector
Box requires authentication using the OAuth 2.0 protocol to authorize external applications to access user files. Security and system administrators typically address the security requirements for the Box connector. The Box connector uses the Box API, and thus it requires the following:
- HTTPS for the Incorta Cluster
- A Box developer account
- A Box application that connects to your Box account, configured with the OAuth 2.0 Redirect URI that Incorta uses at authorization time
HTTPS for the Incorta Cluster
To use the Box connector, you must configure your Incorta Cluster to use HTTPS. Typically, a System Administrator with root access to the operating system configures an Incorta Cluster for HTTPS.
To learn more about how to configure HTTPS with TLS/SSL for your Incorta Cluster using Let’s Encrypt, Certbot and OpenSSL, please review Security → HTTPS for Apache Tomcat with OpenSSL.
Client Credentials
A Security Administrator or System Administrator who manages your organization’s Box accounts as well as your Incorta Cluster creates the required Application on the Box platform.
- Use the Box Account to sign in to the Box Developer Console
- Create an application
- Register the OAuth 2.0 Redirect URI that Incorta uses at authorization time
- Get the client credentials (Client ID and Client Secret)
To learn more about how to create a Box application using the Developer Console, refer to Guides → Applications.
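To show how the Client ID and Redirect URI fit together, here is a minimal sketch of the first leg of an OAuth 2.0 authorization code flow. Incorta builds this request for you, so the code is purely illustrative. The authorize endpoint is the one Box documents for OAuth 2.0 as far as we know, and the function name, Client ID, and Redirect URI shown are hypothetical placeholders, not Incorta values.

```python
import secrets
from urllib.parse import urlencode

# Box OAuth 2.0 authorize endpoint (verify against the current Box documentation).
BOX_AUTHORIZE_URL = "https://account.box.com/api/oauth2/authorize"


def build_authorization_url(client_id, redirect_uri, state=None):
    """Return (url, state) for the authorization request.

    The redirect_uri must match the Redirect URI registered for the
    Box application, otherwise the authorization request is rejected.
    """
    # A random state value lets the caller detect forged redirects.
    state = state or secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "state": state,
        }
    )
    return f"{BOX_AUTHORIZE_URL}?{query}", state


if __name__ == "__main__":
    # Hypothetical placeholder values for illustration only.
    url, _ = build_authorization_url(
        "my-client-id", "https://incorta.example.com/callback"
    )
    print(url)
```

After the user approves access, Box redirects the browser to the registered Redirect URI with an authorization code, which the application then exchanges for tokens using the Client ID and Client Secret.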
Default and Tenant Configurations for the Box connector
A Cluster Management Console (CMC) administrator for your Incorta Cluster must configure the Default Tenant Configuration and, if required, each individual tenant to use the client credentials (Client ID and Client Secret).
After configuration, you must restart the Analytics Service, Loader Service, and any add-ons such as the Notebook Service.
Specify the client credentials for the Default Tenant Configuration
Here are the steps to specify the required properties for the Default Tenant Configuration:
- Sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the canvas tabs, select Cluster Configurations.
- In the panel tabs, select Default Tenant Configurations.
- In the left pane, select Integration.
- In the right pane, specify:
- Your Client ID in Box Client ID.
- Your Client Secret in Box Client Secret.
- Select Save.
Specify the client credentials for a Tenant Configuration
Here are the steps to specify the required properties for a specific tenant:
- Sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the canvas tabs, select Tenants.
- For the given tenant, select Configure.
- In the left pane, select Integration.
- In the right pane, specify:
- Your Client ID in Box Client ID.
- Your Client Secret in Box Client Secret.
- Select Save.
Restart the Incorta Services
Here are the steps to restart the various services in an Incorta Cluster from the Cluster Management Console (CMC).
- As the CMC Administrator, sign in to the CMC.
- In the Navigation bar, select Clusters.
- In the cluster list, select a Cluster name.
- In the Details canvas tab, in the footer bar, select Restart.
Steps to Connect Box and Incorta
To connect Box and Incorta, here are the high-level steps, tools, and procedures:
- Create an external data source
- Create a schema with the Schema Wizard
- Alternatively, create a schema with the Schema Designer
- Load the schema
- Explore the schema
Create an external data source
Here are the steps to create an external data source with the Box connector:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Source.
- In the Choose a Data Source dialog, in File System, select Box.
- In the New Data Source dialog, specify the applicable connector properties.
- To test, select Test Connection.
- Select Ok to save your changes.
If you select the lowest folder in the tree, you will see No Data in the Select Directory from dialog. You will have access to the files in this folder upon schema creation. However, you will not be able to select the parent folder.
Box connector properties
Here are the properties for the Box connector:
Create a schema with the Schema Wizard
Here are the steps to create a Box schema with the Schema Wizard:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Schema Wizard.
- In (1) Choose a Source, specify the following:
- For Enter a name, enter the schema name.
- For Select a Datasource, select the Box external data source.
- Optionally, create a description.
- In the Schema Wizard footer, select Next.
- In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your file.
- In the Schema Wizard footer, select Next.
- In (3) Finalize, in the Schema Wizard footer, select Create Schema.
Create a schema with the Schema Designer
Here are the steps to create a Box schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Create Schema.
- In Name, specify the schema name, and select Save.
- In Start adding tables to your schema, select File System.
- In the Data Source dialog, specify the table data source properties.
- Select Add.
- In the Table Editor, in the Table Summary section, enter the table name.
- To save your changes, select Done in the Action bar.
Box table data source properties
You can specify a single file or folder in the Data Source dialog. Both the Schema Designer and Table Editor represent a single file or folder data source as a single-source table. To select a folder in your Box storage, you must enable Union Files.
This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.
Common properties for a file and folder
Here are some of the common properties for selecting either a file or a folder:
Property | Control | Description |
---|---|---|
Type | drop down list | Default is File System |
Data Source | drop down list | Select the Box external data source |
File Type | drop down list | Select Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx) |
Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and Update File. |
Has Header | toggle | Select if the first row contains column header values |
Callback | toggle | Enable a post-extraction callback. After extraction of the data source data set(s), the callback URL is invoked with parameters that contain details about the load job. |
Callback URL | text box | Enable Callback to configure this property. Specify the URL. |
Consider the following points when you enable incremental load:
- If the table does not have a defined Key column, the Loader Service appends new rows to the table. This scenario may result in duplicate rows and no updates to existing rows.
- When you enable Incremental load for a single file, you must specify both a File and an Update File. In this case, when you perform a full load for the first time, the Loader Service ignores the Update File. If you perform a full load after an incremental load, the Update File data is removed from the table.
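The behavior described in these points can be modeled with a short sketch. This is a simplified illustration and not the Loader Service implementation: rows are plain dictionaries, the function names are hypothetical, and the assumption that a defined Key column causes matching rows to be updated in place is inferred from the first point rather than stated in this document.

```python
def full_load(file_rows):
    """A full load reads only the File; the Update File is ignored."""
    return list(file_rows)


def incremental_load(table_rows, update_rows, key=None):
    """Apply the Update File rows to the rows already in the table.

    Without a key column, rows are appended, so duplicates are possible.
    With a key column (assumed behavior), matching rows are replaced and
    new rows are added.
    """
    if key is None:
        return list(table_rows) + list(update_rows)
    merged = {row[key]: row for row in table_rows}
    for row in update_rows:
        merged[row[key]] = row
    return list(merged.values())


if __name__ == "__main__":
    file_rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
    update_rows = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]

    table = full_load(file_rows)
    print(incremental_load(table, update_rows, key="id"))  # row 2 is updated
    print(incremental_load(table, update_rows))  # row 2 appears twice
    print(full_load(file_rows))  # a later full load drops the update data
```

Running a full load again returns only the rows from the File, which mirrors how Update File data is removed from the table after a full load.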
Common file properties
Here are some of the common properties specifically related to selecting a file of either type: Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx).
Property | Control | Description |
---|---|---|
File | button | Select to open the Add File from dialog. The dialog shows the files from your Box data source. Select a single file and select Add. |
Update File | button | When you enable Incremental, Update File is available. Select this button to open the Add File from dialog. The dialog shows the files from your Box data source. Select a single file and select Add. |
Properties for an Excel Workbook file
Here are the specific properties for an Excel Workbook (.xlsx) file:
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select a given worksheet for the Excel Workbook |
Update Worksheet | drop down list | When you enable Incremental, Update Worksheet is available. Select a worksheet from the Update File you select |
Properties for a Text file
Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:
Property | Control | Description |
---|---|---|
Date Format | drop down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Timestamp Format | drop down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Character Set | drop down list | Select a supported character set. |
Separator | drop down list | Specify a separator for columns in the row values. Comma and Tab are standard delimiters. Other requires that you specify a value such as a colon (:). |
Other | text box | Available when the Separator is Other. Enter one or more characters to specify the column separator or delimiter between values in a row. |
Enable Chunking | toggle | Enable for large file sizes |
Chunk Size (MB) | text box | Enter a value in megabytes (MB) to specify the chunk size |
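To give a feel for what the Chunk Size (MB) value means, the sketch below splits a file of a given size into consecutive byte ranges. It is only an approximation of the idea: this document does not describe how the Loader Service chunks a file internally, the function name is hypothetical, and the sketch assumes 1 MB equals 1024 × 1024 bytes. A real text-file chunker would also need to align chunk boundaries with row boundaries.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def chunk_ranges(file_size_bytes, chunk_size_mb):
    """Return a list of (start, end) byte offsets, with end exclusive."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk size must be a positive number of megabytes")
    chunk_bytes = int(chunk_size_mb * BYTES_PER_MB)
    return [
        (start, min(start + chunk_bytes, file_size_bytes))
        for start in range(0, file_size_bytes, chunk_bytes)
    ]


if __name__ == "__main__":
    # A 25 MB file with a 10 MB chunk size yields three chunks.
    for start, end in chunk_ranges(25 * BYTES_PER_MB, 10):
        print(start, end)
```

A smaller chunk size produces more, smaller chunks; the last chunk is usually shorter than the others.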
Common folder properties
Folder properties are available when you enable Union Files. It is not possible to select a parent folder.
Here are the properties specifically related to selecting a folder:
Column names are case-sensitive. When creating a table from multiple sources or from files in a directory, make sure that the same column has the same name in the different files; otherwise Incorta will extract and load them as different columns.
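Because a difference in letter case is enough to split one column into two, it can help to compare file headers before you enable Union Files. The following sketch is one possible pre-check that you could run on local copies of the files. The function names and sample file names are hypothetical, and the check is not part of Incorta.

```python
import csv
import io


def read_header(text, delimiter=","):
    """Return the first row of delimited text as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def mismatched_columns(headers_by_file):
    """Map each file name to the columns that other files have but it lacks.

    The comparison is case-sensitive, so "Id" and "id" count as different
    columns. Files that contain every column are omitted from the result.
    """
    all_columns = set().union(*headers_by_file.values())
    report = {}
    for name, header in headers_by_file.items():
        missing = sorted(all_columns - set(header))
        if missing:
            report[name] = missing
    return report


if __name__ == "__main__":
    headers = {
        "jan.csv": read_header("Id,Amount\n1,10\n"),
        "feb.csv": read_header("id,Amount\n2,20\n"),
    }
    print(mismatched_columns(headers))
```

An empty result means every file uses exactly the same column names.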
Folder properties for Excel Workbook files
This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.
Here are the properties specifically related to selecting a folder with a file type as Excel Workbook (.xlsx):
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select the common worksheet |
View the schema diagram with the Schema Diagram Viewer
Here are the steps to view the schema diagram using the Schema Diagram Viewer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Box schema.
- In the Schema Designer, in the Action bar, select Diagram.
Load the schema
Here are the steps to perform a Full Load of the Box schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the Box schema.
- In the Schema Designer, in the Action bar, select Load → Load Now → Full.
- To review the load status, in Last Load Status, select the date.
Explore the schema
With the full load of the Box schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.
To open the Analyzer from the schema, follow these steps:
- In the Navigation bar, select Schema.
- In the Schema Manager, in the List view, select the Box schema.
- In the Schema Designer, in the Action bar, select Explore Data.
For more information about how to use the Analyzer to create insights, see Analyzer and Visualizations.