Concepts → Incorta Spark SQL View
In Incorta 2024.1 (Cloud) / 2024.1.3 (On-Prem), Incorta released the Advanced SQL Interface. The Advanced SQL Interface is a generational improvement over the standard SQL interface, which had incomplete adherence to SQL standards. With the Advanced SQL Interface, Incorta achieves full Spark SQL compliance, which in turn improves compatibility with external tools and performance by leveraging the latest advancements in the Spark framework.
In the 2024.7 release, Incorta continued to build on the success of the Advanced SQL interface by extending accessibility beyond external tools and into the experience of building views within Incorta.
You may hear the terms SQLX Views and Spark SQL views. These terms are synonymous and labeled as Spark SQL Views in the platform.
Prerequisites

| Requirement | Cloud | On-Prem |
|---|---|---|
| Release Number | 2024.1.x+ | 2024.1.3 |
| Configure Spark Metastore | ✔ | |
| Advanced SQL Interface Enabled | ✔ | ✔ |
| Restart Service After Enabling | ✔ | ✔ |
When creating a business view in Spark SQL, its intended use is to aggregate and analyze data. As a result, the query written should produce a dataset with fewer records than its input. It is recommended that any `select *` view that contains no aggregation be built instead through the advantages of the drag-and-drop view-building experience.
By default, when the Advanced SQL Interface is launched, Spark SQL views inherit a default setting that limits view results to one million (1,000,000) rows. This limit is in place to prevent non-aggregated queries that can degrade system performance. Cloud customers can contact Support to change the setting. On-Premises customers can add the property sqlxviews.max.rows to node.properties or service.properties. If the property is changed, please ensure the Analytics service is restarted.
Starting with 2024.7.2, Spark SQL views are available in Premium clusters only and are no longer available by default with the Advanced SQL Interface. After upgrading to 2024.7.2, you must enable Incorta Premium and turn on the Spark SQL View toggle under Server Configurations > Incorta Labs to create, explore, or edit Spark SQL views.
Benefits of the Spark SQL View
Perform Complex Data Transformations
Easily execute advanced data transformations, such as moving averages, rankings, and cohort analysis, using Spark SQL Views. Only the necessary columns from business views are used, optimizing performance and simplifying data handling.
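As an illustration, a moving average can be expressed with a window function. This is a minimal sketch; the table and column names (Online_Store.Orders, ProductID, OrderMonth, LineTotal) are assumptions for illustration, not objects from the product documentation:

```sql
-- Hypothetical sketch: 3-month moving average of sales per product.
-- Online_Store.Orders and its columns are assumed names.
SELECT
  ProductID,
  OrderMonth,
  AVG(LineTotal) OVER (
    PARTITION BY ProductID
    ORDER BY OrderMonth
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_sales
FROM Online_Store.Orders
```

Note that the windowed expression carries an alias (moving_avg_sales), which the Spark SQL View editor requires for expression columns.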
Apply Filters Using Variables
Dynamically filter data with session, presentation, or global variables in Spark SQL Views. The presentation variables adjust in real time based on the dashboard’s current context for more accurate insights.
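For instance, a presentation-variable filter can be written with the getPresVar() function described under Referenceable Incorta Variables. This is a hedged sketch; the variable name pv_region and the table Online_Store.Orders are assumptions for illustration:

```sql
-- Hypothetical sketch: filter by a presentation variable with a fallback default.
-- pv_region and Online_Store.Orders are assumed names.
SELECT Region, SUM(LineTotal) AS total_sales
FROM Online_Store.Orders
WHERE Region = getPresVar('pv_region', 'California')
GROUP BY Region
```

When the dashboard sets pv_region, the filter follows the dashboard's current context; otherwise the default value applies.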
Filter Based on Source Data
Enable dashboards to filter data using the source columns from Spark SQL Views, including verified business views. This ensures a smooth experience when working with a mix of insights from Spark SQL Views and traditional data sources.
Visualize Your Results
View transformed data directly within Incorta dashboards for further analysis and reporting.
Creating an Incorta Spark SQL View
Once the services have restarted, a user with permission to edit a business schema will have the ability to add a new view of type Spark SQL.
Go to a Business Schema → Edit → New → Select Create via Spark SQL.
Note: The SQL View has been renamed to PostgreSQL view inside the Business schema and in the CMC to clearly delineate the view types.
Supported SQL
The Supported SQL syntax is Spark SQL (version 3.4.1).
https://spark.apache.org/docs/latest/sql-ref.html
When writing in the Spark SQL View editor, it is mandatory to provide an alias for any column involved in operations such as `SUM()`, `AVG()`, etc. Failing to use an alias will cause query execution to fail with an error like the following:
Error code INX_17010101: Cannot execute the query without explicitly assigning an alias for expression {list}
Example of code that will fail

```sql
WITH q1 AS (SELECT * FROM Online_Store.OrderDetails)
SELECT StateProvinceName state, SUM(LineTotal)
FROM q1
WHERE StateProvinceName = "California"
GROUP BY 1
```
Example of code that will succeed

Note that in the below code `SUM(LineTotal)` has an alias of `sales`.

```sql
WITH q1 AS (SELECT * FROM Online_Store.OrderDetails)
SELECT StateProvinceName state, SUM(LineTotal) sales
FROM q1
WHERE StateProvinceName = "California"
GROUP BY 1
```
Referenceable objects
In the FROM clause of a Spark SQL View, you can reference the following physical schema objects:
- Physical_Schema.Table
- Physical_Schema.Alias
- Physical_Schema.Analyzer_table
- Physical_Schema.Incorta_SQL_Table
- Physical_Schema.Materialized_View
- Business_Schema.View that meets the following conditions:
  - Is a verified business view
  - Was created through the drag-and-drop view builder
It is not possible to reference the following objects:
- Business_Schema.Analyzer_View
- Business_Schema.Postgres_SQL_View
- Business_Schema.Spark_SQL_View
An additional benefit of the Spark SQL view is that data in non-optimized tables (tables not loaded into memory) is referenceable and can be queried and populated into a dashboard. However, as mentioned previously, it is highly recommended to run an aggregated query to limit the records returned in the result set.
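For example, an aggregated query over a physical schema table keeps the result set small even when the table is not loaded into memory. This is a sketch under assumed names; Sales_Schema.Transactions and its columns are illustrative, not real objects:

```sql
-- Hypothetical sketch: aggregate over a non-optimized physical table.
-- Sales_Schema.Transactions, StoreID, and Amount are assumed names.
SELECT StoreID, COUNT(*) AS order_count, SUM(Amount) AS total_amount
FROM Sales_Schema.Transactions
GROUP BY StoreID
```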
Referenceable Incorta Variables
Inside a Spark SQL view, you can reference Incorta variables using the following functions:

| Function | Description |
|---|---|
| getPresVar(VAR_NAME, DEFAULT_VALUE) | Uses the Presentation Variable VAR_NAME if defined; otherwise, uses DEFAULT_VALUE |
| getVar(VAR_NAME) | Uses the Session / Global Variable VAR_NAME if defined; otherwise, uses the System Variable VAR_NAME if defined. If multiple values are returned, only the first is selected |
Casting Variable Example

For both `getVar` and `getPresVar`, the default data type of the variable is a string. Use CAST, as in the example below, to return the desired data type. In this example, a session variable that represents years needs to be CAST to an integer to match SALES.CALENDAR_YEAR.

```sql
SELECT CALENDAR_YEAR, CALENDAR_MONTH_NUMBER, COST_OF_GOODS, AMOUNT_SOLD
FROM SALES.SALES
WHERE CALENDAR_YEAR = CAST(getVar('sv_years') AS INT)
```
Insights over Spark SQL Views
When you create insights referencing Incorta Spark SQL Views, the insights are not rendered using the Incorta engine like other insights. Instead, the insights are submitted to the Advanced SQL Interface Service. In turn, the service will write the query results to a temp table that can be leveraged by the Incorta engine. As a result, Incorta can run queries against the temp table as if it were any other result set.
Known Limitations
- Users require view permission to the physical tables or views referenced in Spark SQL views. Without view permission to the underlying entities, users will see an error when viewing insights that leverage Spark SQL views or when querying Spark SQL views from third-party BI tools.
Enhanced access in 2024.7.3: This limitation has been addressed starting with 2024.7.3. To query Spark SQL views from BI tools or access insights based on these views, you only need permissions to the business schema containing the Spark SQL view or the dashboard containing the insight, respectively. You no longer require permissions to the underlying entities referenced in the Spark SQL view or the insight.
This enhancement requires manually syncing the Spark Metastore after upgrading to 2024.7.3; otherwise, these views will not be visible to users, and insights will not render correctly. Go to the CMC > Clusters > <cluster_name> > Tenants > <tenant_name> > More Options (⋮), and then select Sync Spark Metastore.
- Spark SQL views are not shown in the data lineage viewer
- Comments at the beginning or end of the code will result in query failure (Resolved in 2024.7.2)
- Accessing the SQL code view from a tabular insight based on a Spark SQL view will result in an error
- Decimal outputs from Spark SQL Views will display as `null` in insights
- When creating an insight on a Spark SQL View, base table columns from which the view was derived cannot be used in Distinct or Aggregate filters