Configure Incorta Data Delivery for Google BigQuery
About Incorta Data Delivery for Google BigQuery
Following is the high-level Incorta Data Delivery process for Google BigQuery:
- Connect Incorta to your data source with a data application, and load pre-existing physical schema fact and dimension tables.
- Load the Google BigQuery Cortex CDC layer from the Parquet files.
- Visualize the data.
Get Started
Following are the steps to get started:
- Review prerequisites.
- Create a BigQuery Data Destination.
- Set a Data Destination for a Physical Schema.
- Load data into Incorta and BigQuery.
When naming tenants or data destinations, it is best practice to use a generic name, for example, "Incorta" rather than "IncortaDev". Doing so is helpful if you later need to import or export tenants or perform data destination integrations.
Create a BigQuery Data Destination
When you create a BigQuery Data Destination, you can seamlessly ingest physical schema tables from Incorta into your BigQuery instance.
Here are the steps:
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Destination.
- In the Choose a Data Destination dialog, under Schema Destination, select Google BigQuery.
- In the New Data Destination dialog, specify the following BigQuery schema destination properties:
Property | Control | Description |
---|---|---|
Name your data destination | text box | Enter your data destination name. |
Project Identifier | text box | Enter the BigQuery project ID. |
Connection Pool | text box | Enter the maximum number of tables sent concurrently. The default value is 30. |
Service Account | text box | The Incorta service account, which you must grant the following roles in BigQuery: BigQuery Data Editor and BigQuery Job User (see the grant example after these steps). For more information, refer to the Google BigQuery documentation. |
- To test the connection, select Test Connection.
- To save, select Ok.
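As a hedged sketch of granting these roles (the project, dataset, and service account names below are placeholders), BigQuery Data Editor can be granted at the dataset level with BigQuery's SQL DCL, while BigQuery Job User is a project-level role and must be granted through IAM (Console or gcloud) instead:

```sql
-- Placeholder project, dataset, and service account names.
-- Grants the Incorta service account write access to the target dataset.
GRANT `roles/bigquery.dataEditor`
ON SCHEMA `my_project.incorta_dataset`
TO "serviceAccount:incorta-sa@my_project.iam.gserviceaccount.com";

-- Note: BigQuery Job User (roles/bigquery.jobUser) is a project-level role
-- and must be granted through IAM (Console or gcloud), not through SQL.
```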
For more information on how to manage Data Destinations, refer to Tools → Data Manager.
Set a Data Destination for a Physical Schema
Set a data destination for physical schemas that will push tables to BigQuery. Here are the steps:
- In the Navigation bar, select Schema.
- In the Action bar, select Settings (gear icon) → Set Data Destination.
- In the Set Schema Destination for <SCHEMA_NAME> dialog:
  - Select a Schema Destination or select + New Schema Data Destination to create a new one.
  - Optionally, enter a Target Schema Name, the schema name to create in the data destination. The default value is the schema name in Incorta.
  - Note: If the schema name already exists in the destination, the newly sent schema will overwrite the existing one.
  - Optionally, enter a value in Target Table Prefix to add a prefix to the table names.
- Select Done.
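For example, with a hypothetical prefix INC_, a table named INVOICES in Incorta would be created as INC_INVOICES in BigQuery.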
A BigQuery icon will appear to the left of the schema name to indicate the data destination type set for the schema.
Loading data
When you load your schema using a full or incremental load, Incorta automatically sends the schema to the data destination, which adds a new stage to the load job details viewer: Send to Destination. For more details, refer to the Load Job Details Viewer.
You can schedule load jobs that trigger data ingest into BigQuery to run in a specific order. For more information, refer to Tools → Scheduler.
If you load multiple schemas in one job and at least one of them has a data destination set, the Send to Destination stage will always appear.
In the Incorta UI, navigate to Scheduler → Load Plans and schedule the load plan for your application.
By default, load plans are set to incremental, loading only the latest changes in your source data. To schedule the timing of your incremental loads, navigate to Scheduler → Load Plans, select your data application's Load Plan, and click the Change link to open the Schedule UI, where you can set the interval for your incremental loads.
The data is loaded in two steps:
- First, load the full data history
- Second, configure ongoing incremental loads
Load the full data history
The first time a load plan runs, it executes a full load of each included schema, even if configured to run incrementally. This happens because there is no data in Incorta to append to, so the load picks up everything available per each dataset's definition, which takes care of loading historical data. You can also run full loads directly. Note that a full load removes all data from the target tables before fully reloading them.
To monitor a load plan, navigate to Schema → Load Plan. You can check the row counts from the various stages of the load: Extracted, Rejected, Loaded, and Sent to Destination.
Incremental loads
After the initial load, Incorta will continue to run the following loads in incremental mode. Incremental loads handle upserts, that is, updated and inserted records, which are pulled into Incorta and then immediately pushed to BigQuery whenever an incremental load executes.
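For illustration only, an upsert has the same effect as a BigQuery MERGE statement like the following; Incorta generates the destination writes internally, and the table and column names here are hypothetical:

```sql
-- Hypothetical names; illustrates upsert (update-or-insert) semantics only.
MERGE `my_project.incorta_dataset.AP_INVOICES_ALL` T
USING `my_project.incorta_dataset.AP_INVOICES_ALL_DELTA` S
ON T.INVOICE_ID = S.INVOICE_ID
WHEN MATCHED THEN
  UPDATE SET INVOICE_AMOUNT = S.INVOICE_AMOUNT,
             LAST_UPDATE_DATE = S.LAST_UPDATE_DATE
WHEN NOT MATCHED THEN
  INSERT (INVOICE_ID, INVOICE_AMOUNT, LAST_UPDATE_DATE)
  VALUES (S.INVOICE_ID, S.INVOICE_AMOUNT, S.LAST_UPDATE_DATE);
```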
Load failures
If a load from a source table fails in Incorta, incremental data will not be written to BigQuery for Cortex to pick up. Incorta can send an alert to a designated email address to notify an administrator of the failure.
Configure the Incorta tenant to be able to send an email if you wish to receive alerts. Then follow these instructions to set up notifications for the Incorta schemas you wish to track.
Make sure to select Notify Upon Failure and populate the Recipients field with an appropriate email address to monitor load jobs.
In case of a load failure, Incorta will not attempt to reload the data until the next scheduled load. You can review the logs in Incorta to determine the cause of the issue and take any necessary steps to address it. If further assistance is required, you can work with Incorta Support to resolve the issue. Once the issue is resolved, data will catch up the next time a scheduled load occurs. Alternatively, you can manually run a load at the schema or individual table level to catch up on the data.
Handling source deletes
When loading data incrementally, Incorta inserts new records and updates existing records that have already been loaded, but by default it does not delete records. Source systems like Oracle EBS, however, may allow users to delete data. This means that if records are deleted in the source, those records may still exist in Incorta. Fortunately, there are options for how to manage source deleted records in Incorta and subsequently BigQuery.
Starting with version 2024.7.0, Incorta has a feature that allows you to purge records from Incorta that have been deleted in the source system. The purge feature is implemented by creating an exclusion set to compare against the table in Incorta that has deletes to track and then running a purge job. The purge job will physically delete records from parquet and memory and then push the corresponding delete statements to BigQuery.
Delete handling setup
For each table that needs delete handling, the first step is to create a corresponding skinny table that contains only the primary key column(s) for the table and that is always loaded in full. It is recommended that these tables be placed into a separate schema whose only purpose is to hold these types of tables.
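For example, a skinny PK table for an EBS invoice table could be defined by a full-load query like the following against the source; the schema and column names are assumptions that match the MV example below:

```sql
-- Full-load query for the skinny PK table: only the primary key column(s).
SELECT INVOICE_ID
FROM AP.AP_INVOICES_ALL
```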
The second step is to create a Materialized View (MV) that will compare the PK table against the base table to create the exclusion set which represents the records that have been deleted in the source table. The MV can be placed in the schema with the PK table. Here is an example of the MV code you would write to create the table that holds your exclusion set.
    SELECT I.INVOICE_ID
    FROM EBS_AP.AP_INVOICES_ALL I
    ANTI JOIN PK_SCHEMA.AP_INVOICES_ALL_PK P
    ON I.INVOICE_ID = P.INVOICE_ID
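The anti join returns the INVOICE_ID values that still exist in the Incorta base table but have no match in the freshly loaded PK table, which are exactly the records that were deleted in the source.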
Configure delete handling
Next, configure your table to use the exclusion set table that you have created by opening the Advanced Settings tab.
Toggle on the Synchronizing delete operations option and select the exclusion set identifier schema and table.
Define the column mappings between the target table and the exclusion set and save.
Purge deleted records
The final step to delete records from Incorta is to run a purge job. This can be run table by table as needed or can be scheduled at the schema level from a separate Load Plan. The purge job will remove records from Incorta and will send delete statements to BigQuery to sync up the corresponding table in the CDC layer.
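Conceptually, the deletes pushed to BigQuery behave like the following statement; the purge job generates the actual statements, and all names here are hypothetical:

```sql
-- Hypothetical names; the purge job issues equivalent statements automatically.
-- Removes from the CDC-layer table every key that appears in the exclusion set.
DELETE FROM `my_project.incorta_dataset.AP_INVOICES_ALL`
WHERE INVOICE_ID IN (
  SELECT INVOICE_ID
  FROM `my_project.incorta_dataset.AP_INVOICES_ALL_EXCLUSION_SET`
);
```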
Pre-seeded delete handling configuration
Incorta delivers pre-seeded schemas for delete handling with sample PK tables and Exclusion Set MVs that can be enabled and used as models for setting up delete handling with additional tables if needed.
To enable the delivered configuration for the tables used, follow the instructions in the configure delete handling section above. To add delete handling for new tables, follow the instructions above starting with the delete handling setup section and using the provided samples as models.
Limitations
- Encrypted columns are not supported; they are ingested into BigQuery as strings, as they exist in Parquet.