Data Applications → Configure Incorta Data Delivery for Google BigQuery

About Incorta Data Delivery for Google BigQuery

The following is a high-level overview of the Incorta Data Delivery process for Google BigQuery:

  • Connect Incorta to your data source with a data application, and load the pre-existing physical schema fact and dimension tables
  • Load the Google BigQuery Cortex CDC Layer from the Parquet files

Get Started

Following are the steps to get started:

Recommendation

When naming tenants or data destinations, it is best practice to use a generic name, for example, "Incorta" rather than "IncortaDev". Doing so is helpful if and when you need to import or export tenants or perform data destination integrations.

Create a BigQuery Data Destination

When you create a BigQuery Data Destination, you can seamlessly ingest physical schema tables from Incorta into your BigQuery instance.

Here are the steps:

  • In the Navigation bar, select Data
  • In the Action bar, select + New → Add Data Destination
  • In the Choose a Data Destination dialog, in Schema Destination, select Google BigQuery
  • In the New Data Destination dialog, specify the following BigQuery schema destination properties:
| Property | Control | Description |
| --- | --- | --- |
| Name your data destination | text box | Enter your data destination name. |
| Project Identifier | text box | Enter the BigQuery project ID. |
| Connection Pool | text box | Enter the maximum number of tables that can be sent concurrently. The default value is 30. |
| Service Account | text box | The Incorta service account that you must grant the following roles in BigQuery: BigQuery Data Editor and BigQuery Job User (an example grant follows these steps). For more information, refer to the Google BigQuery documentation. |
  • To test the connection, select Test Connection
  • To save, select Ok
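
As a reference for the Service Account roles above, dataset-level access can also be granted with BigQuery SQL. This is a minimal sketch, not part of Incorta's setup flow: the project, dataset, and service account names are placeholder assumptions, and BigQuery Job User is a project-level role that is typically granted through IAM rather than SQL.

-- Minimal sketch: grant dataset-level write access to the Incorta service account.
-- `my-project`, `incorta_delivery`, and the service account address are placeholders.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my-project.incorta_delivery`
TO "serviceAccount:incorta-sa@my-project.iam.gserviceaccount.com";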

For more information on how to manage Data Destinations, refer to Tools → Data Manager.

Set a Data Destination for a Physical Schema

Set a data destination for physical schemas that will push tables to BigQuery. Here are the steps:

  • In the Navigation bar, select Schema

  • In the Action bar, select Settings (gear icon) → Set Data Destination

  • In the Set Schema Destination for <SCHEMA_NAME> dialog:

    • Select a Schema Destination or select + New Schema Data Destination to create a new one

    • Optionally, enter a Target Schema Name, which is the name of the schema to create in the data destination. The default value is the schema name in Incorta

      Note

      If the schema name already exists in the destination, the newly sent schema will overwrite the existing one.

    • Optionally, enter a Target Table Prefix to prepend to the table names

    • Optionally, starting with release 6.0.2, you can select specific tables to send to the data destination

    • Select Done

A BigQuery icon will appear to the left of the schema name to indicate the data destination type set for the schema.

Loading data

When you load your schema using a full or incremental load, Incorta automatically sends the schema to the data destination, which adds a new stage to the load job details viewer: Send to Destination. For more details, refer to the Load Job Details Viewer.

You can schedule load jobs that trigger data ingest into BigQuery to run in a specific order. For more information, refer to Tools → Scheduler.

If you load multiple schemas in one job and at least one of them has a data destination set, the Send to Destination stage will always appear.

In the Incorta UI, navigate to Scheduler → Load Plans and schedule the load plan for your application.

By default, load plans are set to incremental, allowing you to load just the latest changes in your source data. To schedule the timing of your incremental loads, navigate to Scheduler → Load Plans, select your data application's Load Plan, and click the Change link to open the Schedule UI, where you can set the interval for your incremental loads.

The data is loaded in two steps:

  • First, load the full data history
  • Second, configure ongoing incremental loads

Load the full data history

The first time a load plan runs, it will execute a full load of each included schema, even if configured to run incrementally. This happens because there is no data in Incorta to append to, so it will pick up everything that is available per the definition of each dataset. This will take care of loading historical data. It is also possible to run Full Loads directly. Note that full loads will remove all data from the tables that are being populated before reloading them fully.

To monitor a load plan, navigate to Schema → Load Plan. You can check the row counts from the various stages of the load: Extracted, Rejected, Loaded, and Sent to Destination.
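
To reconcile the Sent to Destination count with what actually landed, you can also query the target table in BigQuery directly. A minimal sketch, assuming a hypothetical project `my-project`, dataset `incorta_delivery`, and table `AP_INVOICES_ALL`:

-- Minimal sketch: verify the delivered row count in BigQuery.
-- Project, dataset, and table names are placeholder assumptions.
SELECT COUNT(*) AS delivered_rows
FROM `my-project.incorta_delivery.AP_INVOICES_ALL`;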

Incremental loads

After the initial load, Incorta will continue to run subsequent loads in Incremental mode. Incremental loads handle upserts, i.e., updated and inserted records, which are pulled into Incorta and then immediately pushed to BigQuery whenever an incremental load executes.
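
Conceptually, the push applies upsert semantics to the destination table. The MERGE below is an illustrative sketch of that behavior only, not Incorta's actual delivery code; the staging table, key, and column names are placeholder assumptions:

-- Illustrative only: upsert semantics comparable to an incremental push.
-- Table, key, and column names are placeholder assumptions.
MERGE `my-project.incorta_delivery.AP_INVOICES_ALL` T
USING `my-project.incorta_delivery.AP_INVOICES_ALL_STAGE` S
ON T.INVOICE_ID = S.INVOICE_ID
WHEN MATCHED THEN
  UPDATE SET T.INVOICE_AMOUNT = S.INVOICE_AMOUNT
WHEN NOT MATCHED THEN
  INSERT (INVOICE_ID, INVOICE_AMOUNT)
  VALUES (S.INVOICE_ID, S.INVOICE_AMOUNT);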

Load failures

If a load from a source table fails in Incorta, incremental data will not be written to BigQuery for Cortex to pick up. Incorta can send an alert to a designated email address to notify an administrator of the failure.

Configure the Incorta tenant to be able to send an email if you wish to receive alerts. Then follow these instructions to set up notifications for the Incorta schemas you wish to track.

Make sure to select Notify Upon Failure and populate the Recipients field with an appropriate email address to monitor load jobs.

In case of a load failure, Incorta will not attempt to reload the data until the next scheduled load. You can review the logs in Incorta to determine the cause of the issue and take any necessary steps to address it. If further assistance is required, you can work with Incorta Support to resolve the issue. Once the issue is resolved, data will be caught up the next time a scheduled load occurs. Alternatively, you can manually run a load at the schema or individual table level to catch up on the data.

Handling source deletes

When loading data incrementally, Incorta inserts new records and updates existing records that have already been loaded, but by default it does not delete records. Source systems like Oracle EBS, however, may allow users to delete data. This means that if records are deleted in the source, those records may still exist in Incorta. Fortunately, there are options for how to manage source deleted records in Incorta and subsequently BigQuery.

Note

Starting with version 2024.7.0, Incorta has a feature that allows you to purge records from Incorta that have been deleted in the source system. The purge feature is implemented by creating an exclusion set to compare against the table in Incorta that has deletes to track and then running a purge job. The purge job will physically delete records from parquet and memory and then push the corresponding delete statements to BigQuery.

Delete handling setup

For each table that needs delete handling, the first step is to create a corresponding skinny table that contains only the primary key column(s) for the table and that is always loaded in full. It is recommended that these tables be placed into a separate schema whose only purpose is to hold these types of tables.
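
For illustration, the PK table's source query can be a bare key extraction scheduled as a full load. A minimal sketch, assuming a hypothetical Oracle EBS source table; the names mirror the example in the next step:

-- Minimal sketch: source query for a PK-only table, always loaded in full.
-- The source schema and table names are placeholder assumptions.
SELECT INVOICE_ID
FROM AP.AP_INVOICES_ALL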

The second step is to create a Materialized View (MV) that compares the PK table against the base table to build the exclusion set, which represents the records that have been deleted in the source table. The MV can be placed in the same schema as the PK table. Here is an example of the MV code you would write to create the table that holds your exclusion set:

-- Exclusion set: keys present in the Incorta base table but missing from the
-- freshly full-loaded PK table, i.e. records deleted in the source.
SELECT I.INVOICE_ID
FROM EBS_AP.AP_INVOICES_ALL I
LEFT ANTI JOIN PK_SCHEMA.AP_INVOICES_ALL_PK P
ON I.INVOICE_ID = P.INVOICE_ID

Configure delete handling

Next, configure your table to use the exclusion set table that you have created by opening the Advanced Settings tab.

Toggle on the Synchronizing delete operations option and select the exclusion set's schema and table.

Define the column mappings between the target table and the exclusion set and save.

Purge deleted records

The final step to delete records from Incorta is to run a purge job. This can be run table by table as needed or can be scheduled at the schema level from a separate Load Plan. The purge job will remove records from Incorta and will send delete statements to BigQuery to sync up the corresponding table in the CDC layer.
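
For context, the net effect on the BigQuery table is a keyed delete. The statement below is an illustrative sketch of that outcome, not Incorta's literal output; the names and key values are placeholder assumptions:

-- Illustrative only: the net effect of a purge on the CDC-layer table.
-- Names and key values are placeholder assumptions.
DELETE FROM `my-project.incorta_delivery.AP_INVOICES_ALL`
WHERE INVOICE_ID IN (101, 102, 103);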

Pre-Seeded delete handling configuration

Incorta delivers pre-seeded schemas for delete handling, including sample PK tables and exclusion set MVs, that can be enabled and used as models for setting up delete handling for additional tables if needed.

To enable the delivered configuration for the tables used, follow the instructions in the Configure delete handling section above. To add delete handling for new tables, follow the instructions above starting with the Delete handling setup section and using the provided samples as models.

Limitations

  • Encrypted columns are not supported and will be ingested into BigQuery as strings, as they exist in Parquet