About the Data Files connector
The Data Files connector allows you to connect to a local data file or a local data folder that has one or more local data files.
A local data file is a file that has been uploaded to a specific tenant in an Incorta Cluster. A local data folder is a folder that has been uploaded to a specific tenant in an Incorta Cluster. Using the Data Manager, you can upload one or more data files and folders to Shared Storage.
The Local Files connector is similar in name to the built-in LocalFiles data source. Unlike the Data Files connector, the Local Files connector lets you create an external data source. With the Local Files connector, you specify a host directory with files or subdirectories. For example, you can specify a shared mount as the directory path. In addition, the Local Files connector supports the Remote tables configuration and file types such as Parquet (.parquet) and Optimized Row Columnar (.orc). For this reason, the Local Files connector is in the category of Data Lake connectors.
The Data Files connector supports the following file extensions:
File Extension | File Type | Example | Notes |
---|---|---|---|
.csv | comma separated values | sales.csv | Can contain a header row |
.tsv | tab separated values | sales.tsv | Can contain a header row |
.tab | tab separated values | sales.tab | Can contain a header row |
.txt | custom delimiter for separated values | sales.txt | Can contain a header row |
.xlsx | Microsoft Excel 2000 and above | sales.xlsx | Must be .xlsx. Supports Worksheet selection. |
.kml | tag-based structure with nested attributes based on the XML standard | sales.kml | All tags are case-sensitive. For more information on using the KML file to render an insight, refer to the example of an Advanced Map insight using a KML file. |
.kmz | zipped version of a KML file | sales.kmz | |
.zip | zipped file | sales.zip | |
.gzip | GNU zipped file | sales.gzip | |
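As an illustration of the delimited text types in the table above, the following Python sketch maps each extension to a default delimiter and parses a small sample that has a header row. It is not Incorta code: the function name read_delimited, the sample data, and the single-character delimiter restriction (a limit of Python's csv module) are assumptions made for this example.

```python
import csv
import io
from pathlib import PurePath

# Default delimiters by extension, taken from the table above.
# ".txt" has no default because it uses a custom delimiter.
DEFAULT_DELIMITERS = {".csv": ",", ".tsv": "\t", ".tab": "\t", ".txt": None}


def read_delimited(filename, text, delimiter=None):
    """Parse delimited text whose first row is a header row (hypothetical helper)."""
    ext = PurePath(filename).suffix.lower()
    if ext not in DEFAULT_DELIMITERS:
        raise ValueError(f"{ext!r} is not a delimited text extension")
    delimiter = delimiter or DEFAULT_DELIMITERS[ext]
    if delimiter is None:
        raise ValueError(".txt files need an explicit delimiter")
    # Python's csv module only accepts a single-character delimiter.
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# A .txt file that uses ":" as its custom delimiter.
rows = read_delimited("sales.txt", "region:amount\nEMEA:100\nAPAC:250\n", delimiter=":")
print(rows)
```

Each parsed row is keyed by the header values, which mirrors the effect of treating the first row as column names.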
You can access all folders and files that you own and any folders or files that someone shares with you.
The Data Files connector supports the following functionality:
Feature | Supported |
---|---|
Chunking | ✔ |
Data Agent | |
Encryption at Ingest | |
Incremental Load | ✔ |
Multi-Source | ✔ |
OAuth | |
Performance Optimized | ✔ |
Remote | |
Single-Source | ✔ |
Spark Extraction | ✔ |
Webhook Callbacks | ✔ |
Steps to use Data Files connector
Here are the high-level steps, tools, and procedures to use the Data Files connector with the LocalFiles data source:
- Upload one or more data files and folders, including subfolders and files
- Create a physical schema with the Schema Wizard
- or, Create a physical schema with the Schema Designer
- Load the physical schema
- Explore the physical schema
Upload one or more data files and folders, including subfolders and files
A folder can contain zero or more files with zero or more subfolders. Incorta preserves the hierarchy of folders. Incorta uploads only files with the supported file extensions listed above. After upload, Incorta unzips compressed folders and files.
Here are the steps to upload one or more local data folders or local data files, including subfolders and files:
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Source.
- In the Choose a Data Source dialog, in Data Files, select Upload Data Folder.
- In the Upload Data Folder dialog, in Upload Options, optionally select Overwrite existing file.
- Drag and drop one or more files or parent folders to the Upload Data Folder dialog.
The Upload Data Folder option and dialog enable you to upload both data files and folders. If you upload duplicate files or folders, a warning message appears and prompts you to either cancel or overwrite the existing files or folders.
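To preview which files in a local folder tree carry a supported extension before you upload, a short Python sketch such as the following may help. It is not part of Incorta: the function name uploadable_files, the sample folder tree, and the case-insensitive extension match are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Extensions listed in the supported file extensions table.
SUPPORTED_EXTENSIONS = {
    ".csv", ".tsv", ".tab", ".txt", ".xlsx", ".kml", ".kmz", ".zip", ".gzip",
}


def uploadable_files(root):
    """Return sorted relative paths of files under root with a supported extension."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )


# Build a small throwaway folder tree to demonstrate.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "2023" / "q1").mkdir(parents=True)
    (base / "sales.csv").write_text("id,amount\n1,10\n")
    (base / "2023" / "q1" / "sales.tsv").write_text("id\tamount\n1\t10\n")
    (base / "notes.docx").write_text("not a supported type")
    found = uploadable_files(base)

print(found)
```

The relative paths keep the subfolder structure, which corresponds to the hierarchy that Incorta preserves on upload.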
Create a physical schema with the Schema Wizard
Here are the steps to create a Data Files physical schema with the Schema Wizard:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Schema Wizard.
- In (1) Choose a Source, specify the following:
  - For Enter a name, enter the physical schema name.
  - For Select a Datasource, select LocalFiles.
  - Optionally create a description.
- In the Schema Wizard footer, select Next.
- In (2) Manage Tables, in the Data panel, navigate the directory tree as necessary to select your folder or file or, for an .xlsx file, a worksheet.
- In the Schema Wizard footer, select Next.
- In (3) Finalize, in the Schema Wizard footer, select Create Schema.
Create a physical schema with the Schema Designer
Here are the steps to create a Data Files physical schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Create Schema.
- In Name, specify the physical schema name, and select Save.
- In the Tables tab, select +.
- In the Table Data Source dialog, specify the Type as File System and Data Source as LocalFiles.
- Specify the table data source properties.
- Select Add.
- In the Table Editor, in the Table section, enter the table name.
- To save your changes, select Done in the Action bar.
Data Files table data source properties
You can specify a single file or folder in the Data Source dialog. Both the Schema Designer and Table Editor represent a single file and folder data source as a single-source table. In order to select a folder, you must enable Union Files.
This release has limited support for Union Files for Excel (.xlsx) Workbook files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties.
Common properties for a local data file and a local data folder
Here are some of the common properties that apply whether you select a file or a folder:
Property | Control | Description |
---|---|---|
Type | drop down list | Select type as File System |
Data Source | drop down list | Select data source as LocalFiles |
File Type | drop down list | Select Text (.csv, .tsv, .tab, .txt), Excel (.xlsx), or Keyhole Markup Language (.kml) |
Has Header? | toggle | Select if first row contains column header values |
Callback | toggle | Enables the Callback URL field |
Callback URL | text box | This property appears when the Callback toggle is enabled. Specify the URL. |
Common local data file properties
Here are some of the common properties specifically related to selecting a file of either type Text (.csv, .tsv, .tab, .txt) or Excel (.xlsx):
Property | Control | Description |
---|---|---|
Incremental | toggle | Enable to support incremental loading. For a single file, you must specify both a File and an Update File. |
File | button | Select a file to open the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file and select Add. |
Update File | button | With Incremental enabled, Update File is available. Select a file to open the Add File from Local dialog. The dialog shows the files from your local data files and local data folders in shared storage. Select a single file and select Add. |
With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
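The effect of a missing Key column can be pictured with a small Python sketch. The no-key branch follows the statement above, where every row from the update file is appended. The keyed branch, which replaces a matching row, is an assumption about typical key-based behavior and is included only for contrast. The function name apply_incremental and the sample rows are hypothetical.

```python
def apply_incremental(existing, updates, key=None):
    """Merge update rows into existing rows (illustrative only).

    Without a key, every update row is appended.
    With a key (assumed behavior), a row with a matching key value is replaced.
    """
    if key is None:
        return existing + updates
    merged = {row[key]: row for row in existing}
    for row in updates:
        merged[row[key]] = row
    return list(merged.values())


base = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
update = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]

no_key = apply_incremental(base, update)
with_key = apply_incremental(base, update, key="id")
print(len(no_key))    # 4 rows: id 2 now appears twice
print(len(with_key))  # 3 rows: id 2 holds the updated amount
```

In this sketch, loading without a key leaves two rows for id 2, which is why defining a Key column matters when the update file repeats existing records.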
Properties for an Excel Workbook file
Here are the specific properties for an Excel Workbook (.xlsx) file:
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select a given worksheet for the Excel Workbook |
Properties for a Text file
Here are the properties specific to a Text (.csv, .tsv, .tab, .txt) file:
Property | Control | Description |
---|---|---|
Date Format | drop down list | Select a specific format for date columns. Date formats follow Java date format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Timestamp Format | drop down list | Select a specific format for timestamp columns. Timestamp formats follow Java date and time format conventions. With Automatic, Incorta determines the format by sampling the first few rows. |
Character Set | drop down list | Select a supported character set. |
Separator | drop down list | Available when the selected File Type is Text. Specify a separator for columns in the row values. Comma and Tab are standard delimiters. Other requires that you specify a value such as a colon (:). |
Other | text box | Available when the Separator is Other. Enter one or more characters to specify the column separator or delimiter between values in a row. |
Enable Chunking | toggle | Enable for large file sizes |
Chunk Size (MB) | text box | Enter a value in megabytes (MB) to specify the chunk size |
Enable Spark Based Extraction | toggle | Configure Apache Spark to parallelize the ingest of the file |
Max Number of Parallel File Extractors | text box | Enter a value for the number of Extractors. This is typically no more than the number of available cores. |
Memory Per Extractor | text box | Enter a value for memory in Gigabytes. This is typically the amount of dedicated memory divided by the number of available cores. |
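The sizing guidance for the two extractor properties amounts to simple arithmetic, sketched below in Python. The function name extractor_settings and the example figures (32 GB of dedicated memory, 8 cores) are assumptions for illustration, not recommended values.

```python
def extractor_settings(dedicated_memory_gb, available_cores):
    """Suggest a starting point for the two extractor properties.

    Max Number of Parallel File Extractors: up to the number of available cores.
    Memory Per Extractor: dedicated memory divided by the number of available cores.
    """
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    max_extractors = available_cores
    memory_per_extractor_gb = dedicated_memory_gb / available_cores
    return max_extractors, memory_per_extractor_gb


extractors, memory_gb = extractor_settings(dedicated_memory_gb=32, available_cores=8)
print(extractors, memory_gb)  # 8 4.0
```

With these example figures, the sketch suggests up to 8 extractors with 4 GB each.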
Common folder properties
Folder properties are available when you enable Union Files. It is not possible to select a parent folder.
Here are the properties specifically related to selecting a folder:
Property | Control | Description |
---|---|---|
Incremental | toggle | Enable to support incremental loading |
Union Files | toggle | Enable to select all files within a given folder. When enabled, you will only be able to select a folder from LocalFiles. |
Directory | button | Select a folder from your LocalFiles. It is not possible to select a parent folder. |
Include | text box | Enter a keyword with a wildcard * symbol to include specific named files within the folder |
Exclude | text box | Enter a keyword with a wildcard * symbol to exclude specific named files within the folder |
Include Sub-Directories Files | toggle | Enable to include files from sub-folders |
Add Filename as a column | toggle | Enable to add the filename of the file as a column. You will then need to specify a column name. |
Filename column | text box | Enter a column name for the filename such as source_file_name |
With Incremental enabled, if there is not a Key column defined, new rows will be appended and no existing rows will be updated.
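To reason about how Include and Exclude keywords with a * wildcard might narrow the files in a folder, consider the following Python sketch. It assumes glob-style, case-sensitive matching as implemented by Python's fnmatch module. The exact matching rules that Incorta applies are not described here, so treat the function select_files and the sample filenames as hypothetical.

```python
from fnmatch import fnmatchcase


def select_files(filenames, include="*", exclude=None):
    """Keep names that match include and do not match exclude (glob-style)."""
    selected = []
    for name in filenames:
        if not fnmatchcase(name, include):
            continue
        if exclude and fnmatchcase(name, exclude):
            continue
        selected.append(name)
    return selected


files = [
    "sales_2022.csv",
    "sales_2023.csv",
    "sales_2023_draft.csv",
    "returns_2023.csv",
]
picked = select_files(files, include="sales_*", exclude="*draft*")
print(picked)
```

Here the Include pattern keeps the three sales files, and the Exclude pattern then drops the draft.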
Folder properties for Excel Workbook files
This release has limited support for Union Files for Excel Workbook (.xlsx) files. The Loader Service only loads Worksheets with the same name as defined in the table data source properties. For this reason, each Excel Workbook file in the selected folder must have a common Worksheet tab name. You must select this common Worksheet name in the drop down list.
Here are the properties specifically related to selecting a folder with a file type as Excel Workbook (.xlsx) files:
Property | Control | Description |
---|---|---|
Worksheet | drop down list | Select a tab for a worksheet |
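Because every workbook in the folder needs a Worksheet with the same name, it can be useful to check for a common name before uploading. The Python sketch below reads worksheet names directly from the .xlsx archive with the standard library. It assumes the workbook part is stored at xl/workbook.xml, which is the usual location in files saved by Excel. The helper names and the fake_workbook demo builder are hypothetical, and the demo archives contain only the workbook part, so they are not valid Excel files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def worksheet_names(xlsx_file):
    """Read worksheet names from the workbook part of an .xlsx archive."""
    with zipfile.ZipFile(xlsx_file) as archive:
        # Assumption: the workbook part lives at the conventional path.
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    return [sheet.attrib["name"] for sheet in root.iter(MAIN_NS + "sheet")]


def common_worksheets(xlsx_files):
    """Return the worksheet names that appear in every workbook."""
    name_sets = [set(worksheet_names(f)) for f in xlsx_files]
    return set.intersection(*name_sets) if name_sets else set()


def fake_workbook(names):
    """Build an in-memory archive holding only xl/workbook.xml (demo only)."""
    sheets = "".join(
        f'<sheet name="{name}" sheetId="{index}"/>'
        for index, name in enumerate(names, start=1)
    )
    xml = (
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        f"<sheets>{sheets}</sheets></workbook>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", xml)
    buffer.seek(0)
    return buffer


common = common_worksheets(
    [fake_workbook(["Sales", "Notes"]), fake_workbook(["Sales", "Summary"])]
)
print(common)
```

With real files, pass a list of paths to common_worksheets. An empty result means the folder has no Worksheet name shared by all workbooks.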
View the schema diagram with the Schema Diagram Viewer
Here are the steps to view the schema diagram using the Schema Diagram Viewer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the list of physical schemas, select the Data Files physical schema.
- In the Schema Designer, in the Action bar, select Diagram.
Load the physical schema
Here are the steps to perform a Full Load of the Data Files physical schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform.
- In the Navigation bar, select Schema.
- In the list of physical schemas, select the Data Files physical schema.
- In the Schema Designer, in the Action bar, select Load → Load Now → Full.
- To review the load status, in Last Load Status, select the date.
Explore the physical schema
With the full load of the Data Files physical schema completed, you can use the Analyzer to explore the physical schema, create your first insight, and save the insight to a new dashboard.
To open the Analyzer from the physical schema, follow these steps:
- In the Navigation bar, select Schema.
- In the Schema Manager, in the List view, select the Data Files physical schema.
- In the Schema Designer, in the Action bar, select Explore Data.