Concepts → Incorta Spark SQL View

In Incorta 2024.1 (Cloud) / 2024.1.3 (On-Prem), Incorta released the Advanced SQL Interface. The Advanced SQL Interface is a generational improvement over the standard SQL interface, which had incomplete adherence to SQL standards. With the Advanced SQL Interface, Incorta achieves full Spark SQL compliance, which in turn improves compatibility with external tools and improves performance by leveraging the latest advancements in the Spark framework.

In the 2024.7 release, Incorta continued to build on the success of the Advanced SQL interface by extending accessibility beyond external tools and into the experience of building views within Incorta.

Note

You may hear the terms SQLX Views and Spark SQL Views. These terms are synonymous; the platform labels them Spark SQL Views.

Pre-requisites

  • Release number: 2024.1.x+ (Cloud) / 2024.1.3 (On-Prem)
  • Configure the Spark Metastore
  • Advanced SQL Interface enabled
  • Restart services after enabling

When creating a business view in Spark SQL, its intended use is to aggregate and analyze data. As a result, the query should produce a dataset with fewer records than its input. It is recommended that any select * view containing no aggregation be built instead through the drag-and-drop view-building experience.

By default, when the Advanced SQL Interface is launched, Spark SQL views inherit a default setting that limits view results to one million (1,000,000) rows. This limit is in place to prevent non-aggregated queries that can degrade system performance. Cloud customers can contact Support to change the setting. On-Premises customers can add the property sqlxviews.max.rows to node.properties or service.properties. If the property is changed, ensure the Analytics service is restarted.
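For On-Premises deployments, the property is a simple key=value entry; the value below is illustrative only, not a sizing recommendation:

```properties
# service.properties (or node.properties)
# Maximum number of rows a Spark SQL view may return (example value)
sqlxviews.max.rows=1000000
```

Restart the Analytics service after editing the file so the new limit takes effect.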

Spark SQL View as a Premium Feature

Starting with 2024.7.2, Spark SQL views are available in Premium clusters only and are no longer available by default with the Advanced SQL Interface. After upgrading to 2024.7.2, you must enable Incorta Premium and turn on the Spark SQL View toggle under Server Configurations > Incorta Labs to create, explore, or edit Spark SQL views.

Benefits of the Spark SQL View

Perform Complex Data Transformations

Easily execute advanced data transformations, such as moving averages, rankings, and cohort analysis, using Spark SQL Views. Only the necessary columns from business views are used, optimizing performance and simplifying data handling.
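As an illustrative sketch of such a windowed transformation (the Online_Store.OrderDetails object follows the examples later on this page; the OrderMonth column and the three-month window are assumptions):

```sql
WITH monthly AS (
  SELECT StateProvinceName, OrderMonth, SUM(LineTotal) AS monthly_sales
  FROM Online_Store.OrderDetails
  GROUP BY StateProvinceName, OrderMonth
)
SELECT
  StateProvinceName state,
  OrderMonth,
  monthly_sales,
  -- Three-month moving average of sales per state
  AVG(monthly_sales) OVER (
    PARTITION BY StateProvinceName
    ORDER BY OrderMonth
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3m,
  -- Rank states by sales within each month
  RANK() OVER (PARTITION BY OrderMonth ORDER BY monthly_sales DESC) AS sales_rank
FROM monthly
```

Note that every computed expression carries an alias, as required by the Spark SQL View editor.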

Apply Filters Using Variables

Dynamically filter data with session, presentation, or global variables in Spark SQL Views. Presentation variables adjust in real time based on the dashboard’s current context for more accurate insights.

Filter Based on Source Data

Enable dashboards to filter data using the source columns from Spark SQL Views, including verified business views. This ensures a smooth experience when working with a mix of insights from Spark SQL Views and traditional data sources.

Visualize Your Results

View transformed data directly within Incorta dashboards for further analysis and reporting.

Creating an Incorta Spark SQL View

Once the services have restarted, a user with edit rights on a business schema can add a new view of type Spark SQL.

Go to a Business Schema > Edit > New > Select Create via Spark SQL.

Note: The SQL View has been renamed to PostgreSQL view inside the Business schema and in the CMC to clearly delineate the view types.

Supported SQL

The Supported SQL syntax is Spark SQL (version 3.4.1).

https://spark.apache.org/docs/latest/sql-ref.html

When writing in the Spark SQL View editor, you must provide an alias for any column involved in operations such as SUM(), AVG(), etc. Failing to use an alias causes query execution to fail with an error like the following:

Error code INX_17010101: Cannot execute the query without explicitly assigning an alias for expression {list}

Example of code that will fail

WITH
q1 AS (SELECT * FROM Online_Store.OrderDetails)
SELECT StateProvinceName state, SUM(LineTotal)
FROM q1
WHERE StateProvinceName = "California"
GROUP BY 1

Example of code that will succeed

Note that in the code below, SUM(LineTotal) has the alias sales.

WITH
q1 AS (SELECT * FROM Online_Store.OrderDetails)
SELECT StateProvinceName state, SUM(LineTotal) sales
FROM q1
WHERE StateProvinceName = "California"
GROUP BY 1

Referenceable objects

In the FROM clause of a Spark SQL View, you can reference the following physical schema objects:

  • Physical_Schema.Table
  • Physical_Schema.Alias
  • Physical_Schema.Analyzer_table
  • Physical_Schema.Incorta_SQL_Table
  • Physical_Schema.Materialized_View
  • Business_Schema.View that meets the following conditions:
    • It is a verified business view
    • It was created through the drag-and-drop experience

It is not possible to reference the following objects:

  • Business_Schema.Analyzer_View
  • Business_Schema.Postgres_SQL_View
  • Business_Schema.Spark_SQL_View

An additional benefit of the Spark SQL view is that data in non-optimized tables (tables not loaded into memory) is referenceable and can be queried and populated into a dashboard. However, as mentioned previously, it is highly recommended to run an aggregated query to limit the records returned in the result set.

Referenceable Incorta Variables

Inside the Spark SQL, you can reference Incorta variables using the following functions:

  • getPresVar(VAR_NAME, DEFAULT_VALUE): Uses the presentation variable VAR_NAME if defined; otherwise, uses DEFAULT_VALUE.
  • getVar(VAR_NAME): Uses the Session/Global variable VAR_NAME if defined; else, uses the System variable VAR_NAME if defined. If multiple values are returned, only the first is selected.
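For example, a Spark SQL view can fall back to a default value when no presentation variable is set (the pv_state variable name is an assumption; Online_Store.OrderDetails follows the earlier examples on this page):

```sql
SELECT StateProvinceName state, SUM(LineTotal) sales
FROM Online_Store.OrderDetails
-- Falls back to 'California' when pv_state is not defined
WHERE StateProvinceName = getPresVar('pv_state', 'California')
GROUP BY 1
```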

Casting Variable Example

For both getVar and getPresVar, the default data type of the variable is a string. Use CAST, as in the example below, to return the desired data type. In this example, a session variable that represents years must be cast to an integer to match SALES.CALENDAR_YEAR.

SELECT CALENDAR_YEAR, CALENDAR_MONTH_NUMBER
, COST_OF_GOODS, AMOUNT_SOLD
FROM
SALES.SALES
WHERE CALENDAR_YEAR = CAST(getVar('sv_years') AS INT)

Insights over Spark SQL Views

When you create insights referencing Incorta Spark SQL Views, the insights are not rendered using the Incorta engine like other insights. Instead, the insights are submitted to the Advanced SQL Interface Service. In turn, the service will write the query results to a temp table that can be leveraged by the Incorta engine. As a result, Incorta can run queries against the temp table as if it were any other result set.

Known Limitations

  • Users require view permission to the physical tables or views referenced in SQLX views. Without view permissions to the underlying entities, users will see an error when viewing insights leveraging Spark SQL views or querying Spark SQL views from 3rd party BI tools.
    Enhanced access in 2024.7.3

    This limitation has been addressed starting 2024.7.3. To query Spark SQL views from BI tools or access insights based on these views, you only need permissions to the business schema containing the Spark SQL view or the dashboard containing the insight, respectively. You no longer require permissions to the underlying entities referenced in the Spark SQL view or the insight.

    This enhancement requires manually syncing the Spark Metastore after upgrading to 2024.7.3; otherwise, these views will not be visible to users, and insights will not render correctly. Go to the CMC > Clusters > <cluster_name> > Tenants > <tenant_name> > More Options, and then select Sync Spark Metastore.

  • Spark SQL views are not shown in the data lineage viewer.
  • Comments at the beginning or end of the code result in query failure (resolved in 2024.7.2).
  • Accessing the SQL code view from a tabular insight based on a Spark SQL view results in an error.
  • Decimal outputs from Spark SQL Views display as null in insights.
  • When creating an insight on a Spark SQL View, columns of the base tables from which the view was derived cannot be used in Distinct or Aggregate filters.