References → Advanced SQLi and Null Handling
Overview
When first introduced, the Null Handling feature was coupled with the Advanced SQL Interface (Advanced SQLi). Enabling Advanced SQLi automatically enabled Null Handling. However, starting with 2024.7.3, Null Handling has been decoupled from Advanced SQLi, allowing both features to be enabled or disabled independently.
While you can now control Null Handling separately, disabling it when using Advanced SQLi may lead to data inconsistencies. To ensure consistent query results across tools, including third-party BI tools and Incorta tools such as insights, Incorta Copilot (Nexus), and Business Notebooks, it is strongly recommended to enable Null Handling when leveraging Advanced SQLi.
The Advanced SQLi selects the best execution path for queries by utilizing either the Engine or Spark for performance optimization. While Spark naturally respects null values, Incorta Engine requires the Null Handling option to be enabled to correctly treat nulls as true nulls, rather than zeros or empty values.
Inconsistency example
The following example illustrates potential inconsistencies when Advanced SQLi is used while the Null Handling option is disabled in the CMC.
Raw data example
Consider the following raw data in the ProductSalesSchema.ProductData
table:
ProductID | ProductName | Category | Price |
---|---|---|---|
100 | Webcam | Accessories | null |
101 | Monitor | Accessories | 100 |
102 | Mouse | Accessories | 50 |
103 | Tablet | Electronics | null |
104 | Laptop | Electronics | null |
Spark SQL view query
The following is the query of a Spark SQL view that calculates the average and percentile of the Price
grouped by Category
:
SELECT Category, avg(Price) as avgPrice, percentile(Price , 0.3) as percentilePriceFROM ProductSalesSchema.ProductDatagroup by CategoryORDER BY Category;
Execution in the Query Editor
Since this query includes the percentile
function, Advanced SQLi directs it to Spark. When executed in the Query Editor, Spark handles null values correctly, producing the following result:
Category | avgPrice | percentilePrice |
---|---|---|
Accessories | 75.0 | 65.0 |
Electronics | null | null |
Exploring the entire view in the Analyzer
Exploring the entire view in the Analyzer and creating a listing table containing all columns also sends the query to Spark, resulting in the same output:
Category | avgPrice | percentilePrice |
---|---|---|
Accessories | 75.00 | 65.00 |
Electronics | null | null |
Inconsistent behavior without Null Handling
If you remove the percentilePrice
column and only explore avgPrice
, the query is routed to the Incorta Engine. When the Null Handling option is disabled, the Engine treats null values in the Price
column as zeros, resulting in the following incorrect output:
Category | avgPrice |
---|---|
Accessories | 50.00 |
Electronics | 0.00 |
Consistent behavior with Null Handling enabled
Enabling the Null Handling option ensures the Engine treats null values as true nulls, resulting in consistent output with Spark:
Category | avgPrice |
---|---|
Accessories | 75.00 |
Electronics | null |