Data Applications → Configure Incorta Data Delivery for Azure Synapse

About Incorta Data Delivery for Azure Synapse

Following is the high-level Incorta Data Delivery process for Microsoft Azure Synapse Analytics:

  • Connect Incorta to your data source with a data application, and load pre-existing physical schema fact and dimension tables.
  • Copy the physical schema tables to the Parquet that resides on Azure Data Lake Storage (ADLS) Gen 2.
  • Load Synapse data marts from the parquet.
  • Visualize the data with Power BI.

Get Started

Following are the steps to get started:

Recommendation

When naming tenants or data destinations it is best practice to use a generic name. For example, "Incorta" rather than "IncortaDev". Doing so can be helpful when and if you need to import/export tenants or perform data destination integrations.

All tables with keys will use the Hash distribution option. By default, it uses the first key. All tables are indexed using Clustered columnstore indexes by default. Optional configurations for table level settings are available in the Optional table level settings section.

Configure Your Azure Synapse Analytics SQL Pool

Perform the following configuration steps in your Azure Synapse Analytics SQL Pool before you load data into Synapse from Incorta:

Note

It is assumed that you already have a Synapse workspace and dedicated SQL Pool from the Azure blade.

Create the Incorta login for your SQL pool for the master database

Here are the steps to create an Incorta login for your SQL pool for the master database. You must replace <\PASSWORD> with a secure password.

  • In Synapse Analytics, for your workspace, select NewSQL script.
  • In the SQL editor, in the SQL Script tab, in Connect to, first select your dedicated SQL Pool and then, in Use database, select master.
  • In the SQL editor, run the following CREATE LOGIN command, replacing <\PASSWORD> with a secure password for the new login.
CREATE LOGIN incorta WITH PASSWORD = '<PASSWORD>';
Important

You will need to provide this username and password when you create an Azure Synapse Data Destination.

Create the Incorta user with grants for the SQL pool database

Here are the steps to create an Incorta user with grants for the selected SQL pool database:

  • In Synapse Analytics, in Develop, select +SQL script.
  • In the SQL editor, in the SQL Script tab, in Connect to, first select your dedicated SQL Pool and then, in Use database, select your SQL pool database.
  • In the SQLeditor canvas, run the following SQL commands:
CREATE USER incorta FOR LOGIN incorta;
EXEC sp_addrolemember 'db_ddladmin', 'incorta';
EXEC sp_addrolemember 'db_datawriter', 'incorta';
EXEC sp_addrolemember 'db_datareader', 'incorta';
GRANT ALTER ANY EXTERNAL FILE FORMAT to incorta;
GRANT ADMINISTER DATABASE BULK OPERATIONS to incorta;

Create an Azure Synapse Data Destination

When you create an Azure Synapse Data Destination, you are able to seamlessly ingest physical schema tables from Incorta to your Azure Synapse instance.

Here are the steps:

  • ​​In the Navigation bar, select Data.
  • In the Action bar, select + NewAdd Data Destination.
  • In the Choose a Data Destination dialog, in Schema Destination select Synapse Analytics.
  • In the New Data Destination dialog, specify the following Azure Synapse schema destination properties:
PropertyControlDescription
Name your data destinationtext boxEnter your data destination name.
SQL Pool Endpointtext boxEnter the dedicated SQL endpoint.
Database nametext boxEnter the SQL pool name.
SQL Usernametext boxEnter the username created in the Create the Incorta login for your SQL pool for the master database section above.
SQL Passwordtext boxEnter the password created in the Create the Incorta login for your SQL pool for the master database section above.
Porttext boxEnter the SQL port number. The default port value is 1433.
Connection Pooltext boxEnter the maximum number of concurrent tables being sent. The default value is 30.
Authentication Typedropdown listChoose an authentication type from the following options:
  ●  Managed Identity (default)
  ●  Service Principal
  ●  Storage Account Key
For more information, refer to the Azure Synapse documentation.
Connection Propertiestext boxOptionally enter connector properties for a custom connection to Synapse Analytics in the format: propertyName=propertyValue, where each connector property is on a new line.
For example:
  ●  loginTimeout=30
  ●  ApplicationName = "Incorta"
  • To test the connection, select Test Connection.
  • To save, select Ok.

For more information on how to manage Data Destinations, refer to Tools → Data Manager.

Set a Data Destination for a Physical Schema

Set a data destination for physical schemas that will push tables to Azure Synapse. Here are the steps:

  • ​​In the Navigation bar, select Schema.
  • In the Action bar, select Settings (gear icon) → Set Data Destination.
  • In the Set Schema Destination for <SCHEMA_NAME> dialog:
    1. Select a Schema Destination or select + New Schema Data Destination to create a new one.
    2. Optionally, enter Target Schema Name, which is the schema name you want to create in the data destination. The default value is schema name in Incorta.
      Note

      If the schema name already exists in the destination, the newly sent schema will overwrite the existing one.

    3. Optionally, enter a value in the Target Table Prefix to add as a prefix to the table names.
    4. Enter Default String Length to set the maximum column string length, knowing that the maximum value is 4000.
      Warning

      If you entered a default string length less than what your schema have, the destination will reject the sent schema.

    5. (Optional) Starting 6.0.2, you can select specific table to send to the data destination.
    6. Select Done.

A Synapse Analytics icon will appear to the left of the schema name to indicate a data destination has been set for the schema.

Loading data

When you load your schema using full or incremental load, Incorta sends the schema to the data destination automatically. Hence, adding a new stage to the load job details viewer: Send to Destination. For more details, refer to the Load Job Details Viewer.

You can schedule load jobs that trigger data ingest into Synapse to run in a specific order. For more information, refer to Tools → Scheduler.

If you tried loading multiple schemas in one job and one of them has a data destination set, the stage Send to Destination will always appear.

Optional configurations

Throughout the configuration process there are several optional configurations you can add or modify from the default settings.

Optional table level settings

You can change some settings at the table level, which include the table hash distribution key or indexing type of columns.

If you want to change the key, you can add the following to the synapse_mappings.csv file:

<SCHEMA_NAME>,<TABLE_NAME>,<HASH_COLUMN>,false

If you want to change the indexing type of a table to Heap, you can add the following to the synapse_mappings.csv file:

<SCHEMA_NAME>,<TABLE_NAME>,,true

Limitations

Data Delivery limitations

  • Encrypted columns are not supported and will be ingested into Synapse as a string, as it exists in Parquet.

Synapse limitations

  • Hashing, the methodology used for deduplication and key definition, supports a single key only. Composite keys are not supported.
  • A client error will be thrown if a date or timestamp column with a value less than 1.1.1970 is ingested.