Connectors → Box

About Box

Box is a cloud computing company that provides file sharing, collaboration, and other tools for working with files that you upload to its servers. You control how you share your content with other users: you can invite others to view and edit your shared files, upload documents and photos to a shared folder, share those documents outside Box, and grant other users rights to view shared files.

About the Box connector

With the Box connector, you can create a data source for a Box file or folder. The Box connector supports the following file extensions:

  • .csv
  • .tsv
  • .tab
  • .txt
  • .xlsx

You can access all folders and files that you own, in addition to any files and folders that someone shares with you.

The Box connector supports the following Incorta-specific functionality:

Feature | Supported
Chunking
Data Agent
Encryption at Ingest
Incremental Load
Multi-Source
OAuth
Performance Optimized
Remote
Single-Source
Spark Extraction
Webhook Callbacks

Important

Box supports storing multiple versions of the same file. The Box connector can access only the current version of a file.

Box connector prerequisites

To deploy the Box connector in an on-premises Incorta environment, you need the following:

  • Security configurations
  • Default and Tenant Configurations

Note

Some configurations differ if you deploy the Box connector in an Incorta Cloud instance. For example, an Incorta Cluster in the cloud natively supports Hypertext Transfer Protocol Secure (HTTPS), and the Default and Tenant configurations are already defined. When you authorize Incorta to connect to Box, you only need to provide your Box account credentials if you are not already signed in.

Security configurations for the Box connector

Box requires external applications to use the OAuth 2.0 protocol for authorization before they can access user files. Security and system administrators typically address the security requirements for the Box connector. Because the Box connector uses the Box API, it requires the following:

  • HTTPS for the Incorta Cluster
  • A Box developer account
  • A Box application that connects to your Box account, configured with the OAuth 2.0 redirect URI that Incorta uses at authorization time

HTTPS for the Incorta Cluster

To use the Box connector, you must configure your Incorta Cluster to use HTTPS. Typically, a System Administrator with root access to the operating system configures the Incorta Cluster for HTTPS.

To learn more about configuring HTTPS with TLS/SSL for your Incorta Cluster using Let’s Encrypt, Certbot, and OpenSSL, see Security → HTTPS for Apache Tomcat with OpenSSL.

Client Credentials

A Security Administrator or System Administrator who manages both your organization’s Box accounts and your Incorta Cluster creates the required application on the Box platform:

  • Use your Box account to sign in to the Box Developer Console
  • Create an application
  • Register the OAuth 2.0 Redirect URI that Incorta uses at the authorization time
  • Get the client credentials (Client ID and Client Secret)
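
Incorta performs the OAuth 2.0 authorization for you when you select Authorize, but it can help to see what the request looks like. The following Python sketch builds a Box authorization URL from the Client ID and a registered redirect URI, and rejects redirect URIs that do not use HTTPS, mirroring this document's requirement that the Incorta Cluster use HTTPS. The endpoint https://account.box.com/api/oauth2/authorize and the response_type, client_id, redirect_uri, and state parameters follow Box's public OAuth 2.0 documentation; the function name and the example client ID and redirect URI are hypothetical placeholders, not the values Incorta registers.

```python
import secrets
from typing import Optional, Tuple
from urllib.parse import urlencode, urlparse

# Box's OAuth 2.0 authorization endpoint (authorization-code flow).
BOX_AUTHORIZE_URL = "https://account.box.com/api/oauth2/authorize"


def build_box_authorize_url(
    client_id: str, redirect_uri: str, state: Optional[str] = None
) -> Tuple[str, str]:
    """Return (authorize_url, state) for a hypothetical authorization request.

    redirect_uri must exactly match a redirect URI registered for the Box app.
    """
    if urlparse(redirect_uri).scheme != "https":
        # This document requires HTTPS for the Incorta Cluster.
        raise ValueError("redirect_uri must use https")
    if state is None:
        # Random anti-CSRF value; verify it when Box redirects back.
        state = secrets.token_urlsafe(16)
    query = urlencode(
        {
            "response_type": "code",
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "state": state,
        }
    )
    return f"{BOX_AUTHORIZE_URL}?{query}", state
```

For example, `build_box_authorize_url("my-client-id", "https://incorta.example.com/oauth/callback")` returns a URL to open in a browser plus the generated state value. After the user grants access, Box redirects to the redirect URI with a short-lived authorization code that the application exchanges for tokens using the Client ID and Client Secret.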

To learn more about how to create a Box application using the Developer Console, refer to Guides → Applications.

Default and Tenant Configurations for the Box connector

A Cluster Management Console (CMC) administrator for your Incorta Cluster must specify the client credentials (Client ID and Client Secret) in the Default Tenant Configuration and, if required, in the configuration of each tenant.

Important

After you configure the client credentials, you must restart the Analytics Service, the Loader Service, and any add-ons such as the Notebook Service.

Specify the client credentials for the Default Tenant Configuration

Here are the steps to specify the required properties for the Default Tenant Configuration:

  • Sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Cluster Configurations.
  • In the panel tabs, select Default Tenant Configurations.
  • In the left pane, select Integration.
  • In the right pane, specify:
    • Your Client ID in Box Client ID.
    • Your Client Secret in Box Client Secret.
  • Select Save.

Specify the client credentials for a Tenant Configuration

Here are the steps to specify the required properties for a specific tenant:

  • Sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the canvas tabs, select Tenants.
  • For the given tenant, select Configure.
  • In the left pane, select Integration.
  • In the right pane, specify:
    • Your Client ID in Box Client ID.
    • Your Client Secret in Box Client Secret.
  • Select Save.

Restart the Incorta Services

Here are the steps to restart the various services in an Incorta Cluster from the Cluster Management Console (CMC).

  • As the CMC Administrator, sign in to the CMC.
  • In the Navigation bar, select Clusters.
  • In the cluster list, select a Cluster name.
  • In the Details canvas tab, in the footer bar, select Restart.

Steps to Connect Box and Incorta

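To connect Box and Incorta, complete the following high-level steps:

  • Create an external data source
  • Create a schema with the Schema Wizard or the Schema Designer
  • View the schema diagram with the Schema Diagram Viewer
  • Load the schema
  • Explore the schema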

Create an external data source

Here are the steps to create an external data source with the Box connector:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Data.
  • In the Action bar, select + New → Add Data Source.
  • In the Choose a Data Source dialog, in File System, select Box.
  • In the New Data Source dialog, specify the applicable connector properties.
  • To test, select Test Connection.
  • Select OK to save your changes.

Note

If you select the lowest-level folder in the tree, the Select Directory from dialog shows No Data. You can still access the files in this folder when you create a schema; however, you cannot select its parent folder.

Box connector properties

Here are the properties for the Box connector:

Property | Control | Description
Data Source Name | text box | Enter the name of the data source.
Authorize | button | Select this button to authenticate your Box account and grant Incorta read access to your Box storage. Sign in to your Box account if you are not already signed in, and then select the Grant Access to Box button. The New Data Source dialog reappears, and the Authorize button changes to Authorized, with the name of the Box account to its right.
Browse | button | Select a folder from the directories shown that contains the folders or files you want to connect to. If you do not choose a folder, you have access to all folders and files in your Box storage, including folders and files that others share with you. You cannot select the root folder for a table data source.

Create a schema with the Schema Wizard

Here are the steps to create a Box schema with the Schema Wizard:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Schema Wizard.
  • In (1) Choose a Source, specify the following:
    • For Enter a name, enter the schema name.
    • For Select a Datasource, select the Box external data source.
    • Optionally, create a description.
  • In the Schema Wizard footer, select Next.
  • In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your file.
  • In the Schema Wizard footer, select Next.
  • In (3) Finalize, in the Schema Wizard footer, select Create Schema.

Create a schema with the Schema Designer

Here are the steps to create a Box schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the Action bar, select + New → Create Schema.
  • In Name, specify the schema name, and select Save.
  • In Start adding tables to your schema, select File System.
  • In the Data Source dialog, specify the table data source properties.
  • Select Add.
  • In the Table Editor, in the Table Summary section, enter the table name.
  • To save your changes, select Done in the Action bar.

Box table data source properties

You can specify either a single file or a folder in the Data Source dialog. Both the Schema Designer and the Table Editor represent a single-file or folder data source as a single-source table. To select a folder in your Box storage, you must enable Union Files.

Note

This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.

Common properties for a file and folder

Here are the common properties for selecting either a file or a folder:

Property | Control | Description
Type | drop-down list | The default is File System.
Data Source | drop-down list | Select the Box external data source.
File Type | drop-down list | Select Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx).
Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and an Update File.
Has Header | toggle | Enable if the first row contains column header values.
Callback | toggle | Enable a post-extraction callback on the data source dataset(s), that is, invoke a callback URL with parameters that contain details about the load job.
Callback URL | text box | Available when you enable Callback. Specify the callback URL.
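
This document states only that the callback URL is invoked with parameters describing the load job; it does not specify the HTTP method or the parameter names. As a hypothetical starting point for the endpoint you enter in Callback URL, the following Python sketch uses only the standard library to accept GET or POST requests and record their query parameters. The handler, the /callback path, and the job and status parameter names used in the usage note are invented for illustration.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List
from urllib.parse import parse_qs, urlparse

# Every callback received so far, as a dict of query parameters.
received: List[Dict[str, str]] = []


class CallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical load-job callback receiver; parameter names are not documented."""

    def _record_and_reply(self) -> None:
        params = parse_qs(urlparse(self.path).query)
        # Keep only the first value of each parameter for simplicity.
        received.append({name: values[0] for name, values in params.items()})
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        self._record_and_reply()

    def do_POST(self) -> None:
        # The body format is not documented, so read and discard it.
        length = int(self.headers.get("Content-Length") or 0)
        if length:
            self.rfile.read(length)
        self._record_and_reply()

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def start_receiver(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Serve in a background thread; port 0 lets the OS pick a free port."""
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A request such as `GET /callback?job=42&status=SUCCEEDED` is recorded as `{"job": "42", "status": "SUCCEEDED"}`. A production receiver would also need HTTPS, authentication, and persistent storage instead of an in-memory list.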

Important

Consider the following when you enable incremental load:

  • If the table does not have a defined key column, the Loader Service appends new rows to the table. This can result in duplicate rows and no updates to existing rows.
  • When you enable incremental load for a single file, you must specify both a File and an Update File. In this case, the first full load ignores the Update File, and a full load that follows an incremental load removes the Update File data from the table.

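
To make the key-column behavior concrete, the following Python sketch models the two outcomes described above on in-memory rows: with a key column, update rows are merged by key (an existing key is replaced, a new key is added); without one, update rows are simply appended, so a re-sent row becomes a duplicate. The function name and row format are hypothetical, and this is a simplified model of the documented behavior, not the Loader Service implementation.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def incremental_load(existing: List[Row], update: List[Row], key: Optional[str] = None) -> List[Row]:
    """Simplified model of applying an incremental load to a table."""
    if key is None:
        # No key column: append every update row, duplicates included.
        return existing + update
    merged: Dict[object, Row] = {row[key]: row for row in existing}
    for row in update:
        # Existing key: replace the row in place; new key: add it at the end.
        merged[row[key]] = row
    return list(merged.values())
```

With `existing = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]` and `update = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]`, keying on `id` yields three rows with `id` 2 updated, while omitting the key yields four rows with `id` 2 appearing twice. A full load, by contrast, rebuilds the table from the File alone, which is why Update File data disappears after a full load.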
Common file properties

Here are the common properties for selecting a file of either type, Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):

Property | Control | Description
File | button | Select to open the Add File from dialog, which shows the files from your Box data source. Select a single file, and then select Add.
Update File | button | Available when you enable Incremental. Select to open the Add File from dialog, which shows the files from your Box data source. Select a single file, and then select Add.

Properties for an Excel Workbook file

Here are the specific properties for an Excel Workbook (.xlsx) file:

Property | Control | Description
Worksheet | drop-down list | Select a worksheet in the Excel Workbook.
Update Worksheet | drop-down list | Available when you enable Incremental. Select a worksheet from the selected Update File.

Properties for a Text file

Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:

Property | Control | Description
Date Format | drop-down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows.
Timestamp Format | drop-down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows.
Character Set | drop-down list | Select a supported character set.
Separator | drop-down list | Select the separator between column values in a row. Comma and Tab are standard delimiters. Other requires that you specify a custom value, such as a colon (:).
Other | text box | Available when Separator is Other. Enter one or more characters to use as the column separator (delimiter) between values in a row.
Enable Chunking | toggle | Enable for large files.
Chunk Size (MB) | text box | Enter the chunk size in megabytes (MB).
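
The Automatic option for Date Format and Timestamp Format is described only as sampling the first few rows. The following Python sketch shows one plausible way such detection can work: try each candidate format against the first few non-empty values and keep the first format that parses all of them. The candidate list, the sample size of 5, and the mapping from Java patterns (such as yyyy-MM-dd) to Python strptime directives are assumptions for illustration; Incorta's actual detection logic and supported formats may differ.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Hypothetical candidates: (Java-style pattern, equivalent Python strptime format).
CANDIDATE_DATE_FORMATS: List[Tuple[str, str]] = [
    ("yyyy-MM-dd", "%Y-%m-%d"),
    ("MM/dd/yyyy", "%m/%d/%Y"),
    ("dd.MM.yyyy", "%d.%m.%Y"),
    ("yyyy-MM-dd HH:mm:ss", "%Y-%m-%d %H:%M:%S"),
]


def detect_date_format(values: Iterable[str], sample_size: int = 5) -> Optional[str]:
    """Return the first Java-style pattern that parses every sampled value, or None."""
    sample: List[str] = []
    for value in values:
        value = value.strip()
        if value:  # skip blank cells
            sample.append(value)
        if len(sample) == sample_size:
            break
    if not sample:
        return None
    for java_pattern, py_format in CANDIDATE_DATE_FORMATS:
        try:
            for value in sample:
                datetime.strptime(value, py_format)
        except ValueError:
            continue  # this candidate failed on at least one sampled value
        return java_pattern
    return None
```

Because only a few rows are sampled, ambiguous values such as 03/04/2024 match whichever candidate comes first; selecting an explicit format avoids that ambiguity.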

Common folder properties

Folder properties are available when you enable Union Files. It is not possible to select a parent folder.

Here are the properties specifically related to selecting a folder:

Property | Control | Description
Union Files | toggle | Enable to select all files within a given folder. When enabled, you can select only a folder from your Box data source.
Directory | button | Select a folder from your Box data source. You cannot select a parent folder. Make sure that the files you want to union have the same column names; otherwise, the Loader Service loads them as different columns.
Include | text box | Enter a keyword with the * wildcard to include matching files in the folder.
Exclude | text box | Enter a keyword with the * wildcard to exclude matching files from the folder.
Include Sub-Directories Files | toggle | Enable to include files from subfolders.
Add Filename as a column | toggle | Enable to add the name of each file as a column. You then need to specify a column name.
Filename column | text box | Enter a column name for the filename, such as source_file_name.
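
Together, Include, Exclude, and Include Sub-Directories Files decide which files in the selected folder are unioned. The following Python sketch applies that selection to a list of paths relative to the selected folder. It assumes shell-style, case-sensitive wildcard matching on the file name (Python's fnmatch), that Exclude takes precedence over Include, and that an empty Include matches every file; these matching rules and the function name are assumptions, so check how your Incorta version interprets the patterns.

```python
from fnmatch import fnmatchcase
from posixpath import basename
from typing import List, Optional


def select_files(
    paths: List[str],
    include: Optional[str] = None,
    exclude: Optional[str] = None,
    include_subdirectories: bool = False,
) -> List[str]:
    """Filter folder-relative paths (using '/' separators) the way the properties might."""
    selected = []
    for path in paths:
        if "/" in path and not include_subdirectories:
            continue  # file is in a subfolder
        name = basename(path)
        if include and not fnmatchcase(name, include):
            continue  # does not match the Include pattern
        if exclude and fnmatchcase(name, exclude):
            continue  # Exclude wins over Include (assumption)
        selected.append(path)
    return selected
```

For a folder containing `sales_2023.csv`, `sales_draft.csv`, and `archive/sales_2022.csv`, Include `sales_*.csv` with Exclude `*draft*` selects only `sales_2023.csv`, and also `archive/sales_2022.csv` once subfolders are included.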

Important

Column names are case-sensitive. When you create a table from multiple sources or from files in a directory, make sure that the same column has exactly the same name, including letter case, in all files; otherwise, Incorta extracts and loads them as different columns.
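
The following Python sketch illustrates why column names must match exactly: when rows from several files are unioned by column name, Amount and amount become two separate columns, each empty (None) for the rows that come from the other file. It also shows the effect of Add Filename as a column, using source_file_name as in the folder properties table. The function and its in-memory row format are hypothetical; this models the documented behavior rather than Incorta's loader.

```python
from typing import Dict, List, Optional

Row = Dict[str, object]


def union_files(files: Dict[str, List[Row]], filename_column: Optional[str] = None) -> List[Row]:
    """Union rows from several files by exact, case-sensitive column name."""
    columns: List[str] = []
    for rows in files.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)  # 'Amount' and 'amount' are different columns
    if filename_column and filename_column not in columns:
        columns.append(filename_column)
    result: List[Row] = []
    for file_name, rows in files.items():
        for row in rows:
            # Columns missing from this file's rows are filled with None.
            out = {col: row.get(col) for col in columns}
            if filename_column:
                out[filename_column] = file_name
            result.append(out)
    return result
```

Unioning `{"jan.csv": [{"id": 1, "Amount": 10}], "feb.csv": [{"id": 2, "amount": 20}]}` produces the columns `id`, `Amount`, `amount`, and (optionally) `source_file_name`, which is the split that consistent column naming avoids.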

Folder properties for Excel Workbook files

This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name, and you must select this common Worksheet name in the drop-down list.

Here are the properties specifically related to selecting a folder with a file type as Excel Workbook (.xlsx):

Property | Control | Description
Worksheet | drop-down list | Select the common worksheet.
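
Because every workbook in the folder must contain the worksheet you select, it can help to check the folder before you configure the table. The following Python sketch computes the worksheet names shared by all workbooks and lists the workbooks that lack a given worksheet, starting from each workbook's sheet names. Reading those names from real .xlsx files (for example, from the sheetnames attribute of an openpyxl workbook) is left out; the function names are hypothetical, and exact, case-sensitive name comparison is an assumption.

```python
from typing import Dict, List


def common_worksheets(workbooks: Dict[str, List[str]]) -> List[str]:
    """Return worksheet names present in every workbook, in the first workbook's order."""
    if not workbooks:
        return []
    sheet_lists = list(workbooks.values())
    shared = set(sheet_lists[0])
    for sheets in sheet_lists[1:]:
        shared &= set(sheets)  # keep only names found in this workbook too
    return [name for name in sheet_lists[0] if name in shared]


def workbooks_missing(workbooks: Dict[str, List[str]], sheet: str) -> List[str]:
    """Return the workbooks that do not contain the given worksheet."""
    return sorted(name for name, sheets in workbooks.items() if sheet not in sheets)
```

If `common_worksheets` returns an empty list, no single Worksheet selection can cover every file in the folder, and `workbooks_missing` shows which files to fix or exclude.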

View the schema diagram with the Schema Diagram Viewer

Here are the steps to view the schema diagram using the Schema Diagram Viewer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the Box schema.
  • In the Schema Designer, in the Action bar, select Diagram.

Load the schema

Here are the steps to perform a Full Load of the Box schema using the Schema Designer:

  • Sign in to the Incorta Direct Data Platform™.
  • In the Navigation bar, select Schema.
  • In the list of schemas, select the Box schema.
  • In the Schema Designer, in the Action bar, select Load → Load Now → Full.
  • To review the load status, in Last Load Status, select the date.

Explore the schema

With the full load of the Box schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.

To open the Analyzer from the schema, follow these steps:

  • In the Navigation bar, select Schema.
  • In the Schema Manager, in the List view, select the Box schema.
  • In the Schema Designer, in the Action bar, select Explore Data.

For more information about how to use the Analyzer to create insights, see Analyzer and Visualizations.