Performance Tuning Guide
Incorta is all about speed: speed of design, speed of build, speed to insight.
The right Incorta configuration and design principles will help you optimize performance.
This guide provides the information you need to tune Incorta for the best front-end and back-end performance, and it describes the tools that monitor Incorta and help you identify areas for improvement.
The guide has four sections, each of which covers an area that impacts the performance of Incorta. The Analytics Tuning section focuses on the levers that enhance speed for the end user experience. The Loader Tuning section focuses on bringing data into Incorta as efficiently as possible. Incorta's integration with Spark enables a number of important features that affect both preparing and rendering data; the Spark Tuning section focuses on optimizing your Spark setup. Finally, the Tools section covers a number of tools that are useful for identifying areas on which to focus your tuning effort.
Hardware Sizing
The first step in tuning your Incorta instance is to size your hardware appropriately. This exercise ensures that your Incorta instance is neither bogged down by too little hardware or memory nor costing more than your usage pattern justifies. Incorta provides a Hardware Sizing Guide for your reference, and you can always work with Professional Services to help you size appropriately.
Linux Settings
Linux is the only supported platform for Incorta. Linux has many knobs and dials, and a few of its settings have a significant impact on Incorta performance.
Swappiness
Swappiness is the kernel parameter that defines how much (and how often) the Linux kernel copies RAM contents to swap, which lives on disk and is therefore slower than RAM. The parameter defaults to 60 and accepts values from 0 to 100. The higher the value, the more aggressively the kernel swaps. For Incorta, we want to minimize swapping and use RAM as much as possible, so the recommendation is to set swappiness to 10.
Summary of Recommendations
Recommendation | Notes |
---|---|
Set to a low value (10) to avoid swapping as much as possible | To check the current value of swappiness, run: cat /proc/sys/vm/swappiness. To change the value (as root or with sudo), run: sudo sysctl vm.swappiness=10. To make the change persistent across reboots, append the line vm.swappiness=10 to the /etc/sysctl.conf file. |
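Taken together, the steps above amount to a three-line change; a minimal sketch (run with sudo or as root):

```bash
cat /proc/sys/vm/swappiness                             # check the current value
sudo sysctl vm.swappiness=10                            # apply to the running kernel
echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.conf  # persist across reboots
```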
Ulimit
The ulimit command sets or reports user process resource limits. It provides control over the resources available to the shell and to processes started by it. There are two types of resource limits: hard and soft. A hard limit defines the maximum a user can reach; a soft limit can be adjusted by the user, up to the hard limit.
You can check the current ulimit values by running this command: $ ulimit -a
From an Incorta perspective, we are interested in two parameters:
- nproc - max number of processes
- nofile - max number of open files
To alter these settings, use a sudo user or root. Edit the file /etc/security/limits.conf and add the desired limits for the user that owns the Incorta installation.
```
incorta soft nproc 10000
incorta hard nproc 20000
incorta soft nofile 3000
incorta hard nofile 5000
```
For the max number of processes (nproc), you will need to edit another file as well: /etc/security/limits.d/20-nproc.conf. Comment out the soft value for nproc as it overrides the values in /etc/security/limits.conf file.
```
#* soft nproc 4096
root soft nproc unlimited
```
Once you have completed your edits, restart the services.
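To confirm that the new limits took effect after the restart, check them as the Incorta OS user; a minimal sketch (assuming the OS user is named incorta):

```bash
# -u reports max user processes (nproc); -n reports open files (nofile)
su - incorta -c 'ulimit -u; ulimit -n'
```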
Summary of Recommendations
Setting | Recommendation | Command | Notes |
---|---|---|---|
nproc (max user processes) | Set between 10000 and 20000 | Verify with ulimit -a (check the max user processes line) | This setting controls the number of processes that an individual Linux user can run. The Incorta user needs the flexibility to run many processes at once, so setting max user processes high prevents Incorta from being bottlenecked by a limit on processes. |
nofile (open files) | Set between 5000 and 10000 | Verify with ulimit -a (check the open files line) | This setting controls the number of files that an individual Linux user can open at once. The Incorta user needs the flexibility to open many files at once, so setting open files high prevents Incorta from being bottlenecked by a limit on the number of files it is allowed to open. |
Single Node
A node in Incorta parlance refers to any server on which Incorta is installed, whether physical or virtual. The simplest installation of Incorta uses a single node to host all of the components that Incorta requires and is appropriate when the amount of data and processing fits onto a single server. Often, even when multiple nodes are required for Production, it is possible to use a single node Incorta instance with reduced data for development purposes.
Multi-Node
Starting with version 4.3, Incorta added important scaling features. It is now possible to add additional Incorta nodes if you need to increase capacity and reduce contention for resources that might be slowing performance either on the front end or on the back end. Once a node is added, it can be provisioned with Incorta services which are managed in the Cluster Management Console (CMC).
Summary of Recommendations
Recommendation | Notes |
---|---|
Add Loader Service | Add a Loader Service if the time required to load data is slowing down because you are loading more of it or you need a dedicated loader service for certain schemas. |
Add Analytics Service | Add an Analytics Service if the number or size of queries, either from the Incorta UI or via SQLi has increased to the point of slowing response time significantly. |
Add Shared Disk | Shared disk makes the data accessible across services and the Spark cluster. Add shared disk if you need more room for your parquet files. Note that the shared disk should have good throughput or it may become a bottleneck. |
Use or add to Spark cluster | Incorta provides a Spark utility as part of the install but if you have very complex Spark use cases or work with very large data sets, you should use a standalone Spark cluster instead which you can size appropriately to your need. See the Apache Spark website for more information on Spark installation. |
Front-End Tuning
Front-end tuning means tuning for Incorta end user experience. End users want to see their requests render as quickly as possible and Incorta is capable of great rendering performance. There are, however, a number of factors that can affect how long reports and dashboards take to render.
Formulas
Incorta provides the flexibility to define formula columns when building insights, business schemas, and physical schemas. Formula columns defined directly on insights or in business schemas are calculated at the time an insight is requested, so the time required to calculate formula results is added to the time the insight takes to render for the end user. Formulas defined at the physical schema level are calculated when data is loaded, so they add no extra time when an insight renders. Because of this behavior, rendering will be faster if formulas can be pushed down to the physical schema level.
The work required to calculate a formula has to happen somewhere: moving formula definitions from the business schema or report level into the physical schema saves rendering time but adds loading time. There is a tradeoff. If loading time is what needs to be minimized, it may help to move formulas up the stack even though insights will then take longer to render.
Be aware that some formulas should not be moved to the physical schema level because they will not calculate properly unless evaluated on the fly. Specifically, formulas that use group functions like sum or avg should not be moved to the physical schema: an aggregate ratio such as sum(amount) / sum(quantity), for example, is only correct when computed against the grouped query result at render time.
Materialized View Formulas
It is possible to have Spark do the work of calculating formulas by including them in the definition of a Materialized View (MV) itself, instead of adding them to the MV as Incorta formula columns.
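A minimal sketch of this approach using Spark SQL in the MV definition (the schema, table, and column names are hypothetical):

```sql
-- The extended amount is computed by Spark when the MV loads,
-- instead of being defined as an Incorta formula column on the MV
SELECT
  order_id,
  amount,
  quantity,
  amount * quantity AS extended_amount
FROM sales.orders
```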
Finally, if you are tuning insights and looking for formulas to move to the physical schema, focus on formulas in dimension columns first. These formulas are the most expensive and moving these calculations into the physical schema, if possible, will have the biggest impact on rendering times.
Summary of Recommendations
Recommendation | Notes |
---|---|
Move formulas into Physical schema | Move calculation to the loader |
Remove formulas from Dimension fields first | Formulas in Dimensions have the biggest impact on rendering time |
Force Reload Columns
Force Reload Columns is a tenant level setting and comes into play when an Analytics Service is restarted. This is how it behaves:
Setting | Load Type | Behavior |
---|---|---|
Disabled | Full Load | All columns and joins are evicted from memory. A column loads back into memory on demand the first time a user opens a dashboard that queries it. |
Disabled | Incremental Load | Normal data columns: no data is evicted; new incremental data loads into memory on demand when first requested. Joins and formula columns: evicted; reloaded on demand when first requested. |
Enabled | Full Load | All columns and joins are evicted from memory and then reloaded in full into memory as part of the load process. |
Enabled | Incremental Load | Normal data columns: no data is evicted; new incremental data loads into memory as part of the load process. Joins and formula columns: evicted; reloaded as part of the load process. |
Enabling this setting causes incremental loads to take longer. We recommend against enabling it, as load times can extend long enough to keep users from seeing their data. The only use case where it might make sense is when tenant data is loaded only once per day, during off hours, so that the extended load time does not impact end users.
Summary of Recommendations
Recommendation | Notes |
---|---|
Disable | The cost to load time is high. The one scenario where enabling could have value is a schema whose objects are all loaded infrequently (e.g. once per day), during off hours, where processing completes well before users log onto Incorta even with this feature enabled |
Warmup Mode
Warmup mode is a tenant level setting and only comes into play after an analytics server restart. It has no effect after a data load.
- None: This is the default. If this option is selected, no data is preloaded into memory, which means that the first time data is accessed by clicking on a dashboard, the user waits for any columns not yet in memory to load.
- Business View Columns: If this option is selected, then columns defined in your business schema will be loaded into memory at server restart.
- Last Used Columns: If this option is selected, then the columns that were loaded into memory as of the time that the server was brought down will be loaded into memory at server restart.
- Most Used Columns: This option is only available starting with version 4.6. If this option is selected, then the columns used in the most frequently requested dashboards will be loaded into memory at server restart.
- All: This option is the equivalent of the Eager Mode feature, which is deprecated as of version 4.5. If this option is selected, all columns and joins will be loaded into memory at server restart.
Restart time will vary based on the Warmup Mode selected. None should be fastest and All should be slowest. Business View Columns and Last Used Columns should be somewhere in between with each tenant's unique definitions determining which is faster.
Summary of Recommendations
Recommendation | Notes |
---|---|
Last Used Columns | This setting brings the columns stored in memory back to their pre-server restart state |
# of Insights on Dashboards
Starting with version 4.6, Incorta ships with a version of Tomcat that supports HTTP/2. With HTTP/2, the default maximum number of simultaneous persistent connections per server/proxy is virtually unlimited, which means that the number of insights that can process simultaneously on a dashboard is, in theory, unlimited. For design reasons alone, however, it does not make sense to put too many insights onto a single dashboard. It is almost always better to provide a summary overview of a confined subject area on a single dashboard, with drill-through to the details someone needs to answer their questions, than to try to fit every flavor of information about a topic onto the same screen.
With version 4.5.x and below, Incorta comes with earlier versions of Tomcat that only support HTTP/1. With HTTP/1, the number of insights that can process simultaneously is determined by the browser used with Incorta. This table provides the number of persistent connections per Incorta supported browser:
# of Persistent Connections Supported (< v.4.6)
Browser | Persistent Connections |
---|---|
Chrome | 6 |
Safari | 6 |
Firefox | 6 |
Given that the number of insights that can process in parallel is limited with HTTP/1, we recommend that you limit the number of insights per dashboard for Incorta versions lower than 4.6. To maximize rendering performance, limit insights on dashboards to no more than six; if your insights are very fast, a few more than six may be acceptable.
Summary of Recommendations
Recommendation | Notes |
---|---|
# of insights | Limit to six or fewer per dashboard for Incorta versions less than 4.6. |
Usability | Always keep in mind the usability of the dashboard as you design it. Sometimes less (fewer insights) is more, even if rendering time is fast |
Analyzer User Experience
On the Advanced tab of the tenant configuration user interface is the Turn off/on Global Auto Refresh for Insights setting. This setting controls, at the tenant level, whether an insight automatically refreshes when entering the Analyzer screen and after each change made to the insight. The setting is enabled by default. Having the report render automatically as changes are made can be quite convenient. If, however, you are working with an insight that takes a long time to render, auto refresh can be a frustrating experience: you will find yourself doing more waiting than working. In this case, set auto-refresh to disabled at the individual insight level.

The Turn off/on Global Auto Refresh for Insights setting controls the default value for that insight-level setting. If you work with many insights in a tenant that take a long time to render, you probably need to look into tuning them, but you may also want to disable the Turn off/on Global Auto Refresh for Insights setting. Users will then need to manually refresh insights in the Analyzer by default, which lets them control when they wait for reports to render. Users can always enable auto-refresh on individual insights as they see fit.
Summary of Recommendations
Recommendation | Notes |
---|---|
Global Auto Refresh for Insights | Disable if users often work with long running insights in the Analyzer |
Analytics and Loader Service Settings
Some performance levers for Incorta are available for both the Analytics Service and the Loader Service. This section covers settings that are applicable to both.
CPU Utilization (%)
This setting in the CMC allows you to set the percentage of CPU on the node allocated to Incorta. More CPU can mean better processing performance but you need to take into consideration which other processes are contending for those resources.
The generally recommended settings for CPU Utilization (%) are:
Install | Service Type | Allocation |
---|---|---|
Single Node | Loader | 25 |
Single Node | Analytics | 25 |
Multi Node | Loader | 50 |
Multi Node | Analytics | 50 |
A single node installation of Incorta will include a number of Incorta services that use CPU in addition to the Loader and Analytics services:
- Spark
- Cluster Management Console (CMC)
- Meta Database
- Zookeeper
- Export Server
In multi node Incorta configurations, if the Incorta service is the only service on the box, you may be able to raise the percentage allocated to the Incorta service as high as 85%. If any other processes are running, whether on a single node Incorta installation or on any given server in a multi node installation, you need to take all of them into consideration when allocating CPU Utilization (%).
On Heap versus Off Heap
Incorta, starting with v4, supports both on heap and off heap memory management for both the analytics service and the loader service. On heap memory refers to memory managed inside the java process itself, or the java heap space. Off heap memory refers to memory managed outside the java process. As java processes, the analytics service and loader service both have settings in the CMC that control the on heap usage versus the off heap usage.
- On-Heap memory management: Objects are allocated on the JVM heap and bound by GC.
- Off-Heap memory management: Objects are allocated in memory outside the JVM by serialization, managed by the application, and are not bound by GC.
It is a good practice to keep 70-75% of the memory given to Incorta off heap. This keeps the java heap for the loader and analytics services (the on heap memory) small, which makes java maintenance activities like garbage collection less intrusive. The On Heap Memory (GB) setting on the analytics service or loader service screen in the CMC controls the on heap size and should be 20-25% of the memory allocated to that service. The Off Heap Memory (GB) setting should be set to 70-75% of the memory allocated to that service.
Keep in mind that these settings only control the amount of memory allotted to the Incorta services. For example, you may have a 1TB RAM server and only choose to allocate 150GB to the loader service and 100GB to the analytics service.
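Applying the 25%/75% split to that example:

- Loader service (150GB): On Heap Memory (GB) ≈ 37, Off Heap Memory (GB) ≈ 112
- Analytics service (100GB): On Heap Memory (GB) ≈ 25, Off Heap Memory (GB) ≈ 75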
Loader Tuning
The loader service handles extracting data from data sources, transforming and storing the data to disk in parquet format and then loading the data and the relevant direct data mapping information into memory.
Schema Pool Size
On the Advanced tab of the tenant configuration user interface is the Schema Pool Size configuration property. The value of this property sets the number of schemas that can load at a time per tenant. A schema is considered to be in a loading state as long as one or more of its tables is being extracted, transformed, or loaded; this property is therefore bounded by the Table Pool Size. The default value is 2, and it can be set to any value between 1 and 10. Setting this value higher allows more schemas to process at once, which gives you more flexibility in scheduling and may let you load schemas sooner or more frequently. Conversely, the more schemas processed in parallel, the longer each schema is likely to take to finish. The optimum setting depends on the amount of memory available and the amount of data in the schemas being processed by the loader.
Note that starting with version 4.5, Incorta takes join dependencies between schemas into account, in addition to schema pool size, when determining whether a schema load will run at its scheduled time. If schema A has tables with joins to tables in schema B, and schema B is currently loading, schema A will not start loading until schema B completes, even if schema A is scheduled to start sooner. In fact, if the dependency has not cleared before schema A's next scheduled load time, that scheduled load is skipped entirely and schema A loads only once, based on the latest scheduled load, when it becomes available to load.
Starting with version 6.0, the Schema Pool Size option (Cluster Management Console (CMC) > Cluster Configurations > Tenant Configurations > Tuning) is deprecated and replaced by the Maximum Concurrent Load Jobs option, which controls the maximum number of load jobs the Loader Service can handle at the same time.
Chunking
In the Cluster Management Console (CMC), on the /clusters/"tenant"/Cluster Configurations/Tuning Configurations tab, there is a property called Table Maximum Parallel Chunks. This property specifies the number of concurrent chunks from a data source to process at the same time. It allows you to parallelize the work so that data loading can complete more quickly for large data sets, whether from a database or a flat file.
Limit Data
Incorta can handle massive amounts of data and still perform well, but if data is not needed, it is a good idea to limit what is loaded into Incorta, which speeds load times. There are various ways to do this:
- Flat File - On the initial load, exclude data you do not need from the file and include only the data that is needed.
- SQL - When defining the table, include a condition to limit the amount of data pulled into Incorta. Alternatively, you can define a Load Filter to limit the amount of data.
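For example, a table's SQL query might restrict history at extraction time (the table and column names are hypothetical, and the date function depends on your source database):

```sql
-- Only extract the last 24 months of orders instead of the full history
SELECT *
FROM orders
WHERE order_date >= ADD_MONTHS(SYSDATE, -24)
```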
Performance Optimized Setting
On a table by table basis, you can elect whether a table should be Performance Optimized by enabling or disabling the setting.
Summary of Recommendations
Setting | Notes |
---|---|
Enabled | Data from the table is loaded into memory. This optimizes the data for reporting. |
Disabled | If the table is not used for reporting, for example it is an intermediary table that is used in MV logic only, then there is no reason to load it into memory. |
Spark Tuning
Spark is an optional component of your Incorta installation that can be used to provide powerful advanced features. Incorta uses Spark for Materialized View (MV) creation (via PySpark and/or Spark SQL) and to query data in Parquet files with the SQLi interface. Both of these features extend what you can do with Incorta and as such, it is important to configure Spark for optimal performance with Incorta when using them.
Spark Server configuration
There are two different options available when setting up Spark to work with Incorta. Which to choose depends on how much Spark will be used which in turn depends on the size of the dataset and the complexity of the use cases being addressed.
Bundled Spark
Incorta bundles Spark with the platform. Bundled Spark runs only on the node where Incorta is installed. It is suitable for straightforward usage of Spark with smaller datasets.
Standalone Spark
It is possible to configure a standalone Spark cluster to be used with Incorta. The cluster can be sized up as needed and can be set up wherever makes the most sense. More than one instance of Incorta can even share the Spark cluster though of course you need to watch out for contention. Note that the Spark working directory should not be set up on NFS/EFS but rather should be on a local directory.
Data Locality
Data locality can have a major impact on the performance of Spark jobs. If data and the code that operates on it are colocated then computation tends to be fast. If code and data are separated, one must move to the other which adds an extra step before computation can begin. Typically, it is faster to ship serialized code from place to place than a chunk of data because code size is much smaller than data. Spark builds its scheduling around this general principle of data locality.
Data locality is a measure of how close data is to the code processing it. There are several levels of locality based on the data's current location. In order from closest (fastest) to farthest (slowest):
- PROCESS_LOCAL: data is on the same JVM.
- NODE_LOCAL: data is on the same node but not in the same JVM.
- NO_PREF: data has no locality preference and is accessed equally quickly from anywhere.
- RACK_LOCAL: data is on a different server on the same rack so needs to be sent over the network switch for the server rack.
- ANY: data is elsewhere on the network and not in the same rack.
When possible, Spark schedules all tasks at the best locality level. When there is no unprocessed data on any idle executor, Spark falls back to lower locality levels: it either waits until a busy CPU frees up so it can start a task on the same server as the data, or it immediately starts a task somewhere farther away and moves the data there.
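How long Spark waits before falling back is governed by the spark.locality.wait setting (Spark's documented default is 3s). A hedged sketch for spark-defaults.conf, if long-running tasks would benefit from staying closer to their data:

```
# Wait longer at each locality level before falling back to the next
spark.locality.wait  6s
```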
Summary of Recommendations
Recommendation | Notes |
---|---|
Bundled Spark | Recommended for small datasets |
Standalone Spark | Recommended for large datasets and complex use cases |
Locality | Keep data and the code that processes it close together; if tasks frequently fall back to lower locality levels, consider tuning spark.locality.wait (see the sketch above) |
Spark configuration
There are three different places that Spark configuration properties can be set in Incorta.
Spark Configuration Files
You can set or adjust the default values for Materialized View Spark configuration properties in the <incorta_home>/spark/conf/spark-defaults.conf configuration file. The default values for this file are:
Settings | Description | Default |
---|---|---|
spark.cores.max | The maximum number of CPU cores requested by the application across the cluster | 8 |
spark.executor.cores | The number of cores to use on each executor | 4 |
spark.sql.shuffle.partitions | The number of partitions to use when shuffling data for joins or aggregations | 4 |
spark.driver.memory | Amount of memory used for the driver process | 4g |
spark.port.maxRetries | Maximum number of retries when binding to a port before giving up | 100 |
There is a second file in the same directory called spark-env.sh. This is where you can specify memory to allocate to Spark workers.
Settings | Description | Default |
---|---|---|
SPARK_WORKER_MEMORY | Used to set how much total memory workers have to give executors (e.g. 1000m, 2g) | 4g |
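Putting those defaults together, the two files look roughly like this out of the box:

```
# <incorta_home>/spark/conf/spark-defaults.conf
spark.cores.max               8
spark.executor.cores          4
spark.sql.shuffle.partitions  4
spark.driver.memory           4g
spark.port.maxRetries         100
```

```
# <incorta_home>/spark/conf/spark-env.sh
export SPARK_WORKER_MEMORY=4g
```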
Materialized View Definition
When defining a Materialized View in Incorta, you can set Spark configuration properties directly in the MV definition itself. This is the recommended place to set them because it lets you fine tune specifically for the MV being defined.
MV Spark Properties | Recommendation |
---|---|
spark.cores.max | If not enough cores are assigned to an MV, it may not finish. If this is the case, then the cores max should be increased. The cores max should be a multiple of executor cores (e.g. 1x, 2x, etc...). Note that if you do not set spark.cores.max, the default will be determined by the value of spark.deploy.defaultCores. |
spark.executor.cores | Note that cores max should be a multiple of executor cores (e.g. 1x, 2x, etc...). |
spark.sql.shuffle.partitions | Start with shuffle partitions equal to cores max and then increase by multiples (e.g. 1x, 2x, etc...). The larger the data set, the more shuffle partitions you will need. Note that this value indicates a starting place for Spark. Spark will dynamically allocate more shuffle partitions based on actual data size. |
spark.driver.memory | Amount of memory used for the driver process |
spark.executor.memory | Amount of memory allocated to the Spark executor |
spark.port.maxRetries | Maximum number of retries when binding to a port before giving up |
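A sketch of per-MV properties that follows the guidance above (the values are illustrative, not prescriptive; size them to your data):

```
spark.executor.cores = 4
# cores max set to a 2x multiple of executor cores
spark.cores.max = 8
# start shuffle partitions equal to cores max; increase by multiples for larger data
spark.sql.shuffle.partitions = 8
spark.driver.memory = 4g
spark.executor.memory = 8g
```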
CMC UI
In the Cluster Management Console (CMC), on the /clusters/"tenant"/Cluster Configurations/Server Configurations/Spark Integration tab, you can configure the default SQLi Spark settings. Note that these only apply when SQLi is used.
Settings | Description | Default |
---|---|---|
Enable SQL App | The SQL App runs within Spark to handle all incoming SQLi queries. Enable this option to start the SQL App, and keep it up and running, to execute incoming SQL queries. Disabling the setting will allow all available Spark memory to be available for Materialized View execution. | Disabled |
SQL App Driver Memory | Allocate memory (in GB) to be used by the SQL interface Spark to construct (not calculate) the final results. | 1 GB |
Spark App cores | Set the number of dedicated CPU cores for the SQLi Spark App. Ensure that there are enough cores in your setup that are reserved for OS, applications, and other services. | 4 |
Spark App memory | Provide the maximum memory that will be used by SQLi Spark queries, leaving extra memory for MVs if needed. The memory required for both applications combined cannot exceed the Worker Memory. | 2 GB |
SQL App executors | Maximum number of executors that can be spawned on a single worker. | 2 |
SQL App shuffle partitions | SQL interface Spark shuffle partitions amount. A single shuffle represents a block of data being processed to perform Joins and/or aggregations. The size of a shuffle partition increases as the size of processed data increases. | 8 |
SQL App extra options | Extra Spark options that can be passed to the SQL interface Spark bridge application. These options may be used to override default configurations. Sample value: spark.sql.shuffle.partitions=8;spark.executor.memory=4g;spark.driver.memory=4g. |
Materialized Views versus SQLi
Incorta uses Apache Spark to create Materialized Views and to process SQL queries from third party applications over the PostgreSQL protocol using the Incorta SQL Interface (SQLi).
If you are not using SQL to access Incorta in a way that requires Spark, then set the Enable SQL App configuration setting to disabled in the CMC (/home/clusters/<cluster_name>/Cluster Configurations/Server Configurations/Spark Integration/Enable SQL App) so that Spark memory can be used exclusively by MVs.
Summary of Recommendations
Recommendation | Notes |
---|---|
Turn off Enable SQL App configuration setting | When this setting is off, all Spark memory is available for MV execution |
Troubleshooting Spark Issues
If you run into Spark issues, refer to the Incorta Spark Troubleshooting guide.
Tools
Inspector Tool
The Inspector Tool lists broken references, unused objects, and other issues that Incorta can discover about itself so that they can be fixed. Fixing these inefficiencies can provide a boost in performance. The first place to go in the Inspector is the 4-Validation UseCases dashboard in the Inspector folder, which lists the issues the Inspector has found and provides enough information to guide a user with schema privileges in fixing them. Issue types are prioritized from 1 and up, with priority 1 issues being the most important to fix first.
Another important function the Inspector Tool provides is lineage. It can show you everywhere that a particular column or formula is referenced and what makes up its parentage. This is useful once you have identified something, especially a formula, as inefficient: its lineage may help you find the root cause of the inefficiency, and identifying everywhere it is used will help you apply your fix in all the needed places.
The following table provides guidelines for how to use the Inspector Tool:
Dashboard | Function | Performance Notes |
---|---|---|
Summary Dashboard | This dashboard provides high level information about the tenant and where it may have some issues. | It can be used to provide some direction with regards to where you might look for tuning opportunities. |
Schema Details Dashboard | This dashboard provides detailed information about the tenant and where there are opportunities to clean up or enhance it. | It can be used to provide some direction with regards to where you might look for tuning opportunities. |
Lineage Summary Dashboard | This dashboard provides a high level overview of columns (business schema or formula) that have predecessors. | Use the dashboard to get a quick view of data elements or calculations that could possibly be moved closer to the source. |
Lineage Details Dashboard | This dashboard provides information about the predecessors of data elements within Incorta. | It is helpful when tuning to be able to trace the lineage of the columns/formulas that you are working with. |
Lineage - where used Dashboard | This dashboard lists all of the places that a column is used or referenced. | When tuning, identifying where a particular column or formula is used helps you track down all of the places to implement your change. |
Validation UseCases Dashboard | This dashboard identifies errors in the tenant such as broken references, duplicates and invalid formulas. | Use the Error Codes or Errors insights to filter down the dashboard. Focus on fixing high priority issues (1 & 2) first. |
Unused Columns Dashboard | This dashboard provides a listing of unused physical schema columns. | Unused physical columns take up RAM that could be used otherwise, and they take up space in the tenant directory by inflating parquet files. Deleting unused columns frees up space in both areas. |
Unused Schemas Dashboard | This dashboard provides a listing of unused schemas. | Unused schemas take up RAM that could be used otherwise, and they take up space in the tenant directory. Deleting unused schemas frees up space in both areas. |
Unused Tables Dashboard | This dashboard provides a listing of unused tables. | Unused tables, if performance optimized, take up RAM that could be used otherwise. Even if unoptimized, they take up space in the tenant parquet directory. Deleting unused tables from memory frees up RAM and allows for deletion of the corresponding parquet files from disk (currently a manual process). |
Unused Business Columns Dashboard | This dashboard provides a listing of unused business schemas columns. | There is not likely to be a large performance impact to removing an unused business view column but it is good hygiene. |
Unused Business Views Dashboard | This dashboard provides a listing of unused business views. | There is not likely to be a large performance impact to removing an unused business schema but it is good hygiene. |
Note that starting with version 4.6, the Inspector is incorporated into the product and is available for use out of the box. Prior to 4.6, the Inspector Tool can be installed separately and scheduled to run on a regular interval so that its attendant dashboards are populated with up-to-date information.
Loader Parser
The Loader Parser provides a breakdown of how the loader spends its time. It provides load and extraction timings for all of the different types of objects that Incorta loads into memory when the loader process runs.
Dashboard | Function | Performance Notes |
---|---|---|
loader_parser_schemas | This dashboard provides schema load time details broken out by load type. | Use this dashboard to identify which schemas take a long time to load. |
loader_parser_tables | This dashboard provides table load time details. | Use this dashboard to identify which tables take a long time to load. |
loader_parser_joins | This dashboard provides join load time details broken out by operation and load type. | Use this dashboard to identify which joins take a long time to load. |
loader_parser_compaction | This dashboard provides compaction timing details broken out by load type. | Use this dashboard to identify which schemas and tables take a long time to compact. |
loader_parser_formulas | This dashboard provides formula load time details. | Use this dashboard to identify which formulas take a long time to load. |
Schedule Dashboards
It is possible to schedule dashboards to run at specified times. This allows you to run a slow rendering dashboard after a data reload so that its reports can be added to cache which should result in faster rendering for users who access the dashboard afterwards. There are a couple of caveats to keep in mind when considering this technique.
- This technique is well suited to dashboards whose data is not refreshed continuously. If the data is refreshed nightly and loading time is very predictable, then scheduling the dashboard to run before users access it in the morning works very well. For a dashboard whose data refreshes almost as often as the time it takes to load, or where load times are inconsistent, scheduling the dashboard execution becomes trickier.
- If a dashboard/report has a data filter based on $currentuser, then the result set differs for each user who accesses it, which means you would need to schedule the dashboard to run for every such user. This may be impractical if there are many users.
- In the CMC on the Tuning tab under Tenant configuration is a configuration property called Maximum cached insights which specifies the maximum number of insights (not dashboards) to store in the cache. Incorta cannot cache an insight larger than 10MB. If an insight is larger than 10MB, Incorta does not cache the insight, which can affect performance. This property defaults to 2,000 insights. Raise this value if you have more than 2,000 insights defined in your tenant. Incorta uses both the Maximum cached insights property and the Maximum cached memory (%) property, also on the Tuning tab, to determine the cache size. Incorta uses the lower of the two limits. For example, if you set off-heap memory to 100GB, and the Maximum cached memory (%) to 1%, then the cache size is 1GB. If you set Maximum cached insights to 2,000 and you reach 1GB with fewer reports, Incorta stops caching reports at 1GB.
Rotate Log Files
One way to keep your Incorta server disks free from unnecessary clutter is to rotate your logs on a regular basis keeping only a set number of days worth of logs on disk for troubleshooting purposes and archiving logs that are older. This prevents logs from filling up your hard disk which can in turn hamper Incorta from working with and generating the files (e.g. parquet) that it needs to operate properly.
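A minimal sketch using the standard Linux logrotate utility (the log path and retention below are illustrative; point them at your actual Incorta log directories):

```
# /etc/logrotate.d/incorta -- hypothetical path; adjust to your installation
/home/incorta/IncortaAnalytics/*/logs/*.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
}
```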
Summary of Recommendations
Recommendation | Notes |
---|---|
Rotate all of your Incorta log files by compressing them and archiving them to a repository off of your Incorta servers. | Preserve 5-10 days worth of logs on Incorta servers and archive anything older. Make sure to have a process by which you can retrieve the logs that have been archived so that they may be used for troubleshooting processes. |