About the Versioning Migration Tool

Starting with release 5.1, there is a new directory structure for shared storage. A load job for a physical schema creates file versions in this new directory structure. There are file versions for Direct Data Mapping files and Apache Parquet files.

After upgrading to release 5.1, you have two options for how to implement this new directory structure for shared storage:

  • run the Versioning Migration Tool immediately after upgrade
  • or, perform a full load for every physical schema in a given tenant

The Versioning Migration Tool is a command-line tool that you run offline after upgrading the cluster metadata database, but before restarting the Analytics and Loader services. You can use the tool to migrate one or more tenants, physical schemas, or physical schema objects: physical schema tables, Incorta SQL tables, Incorta Analyzer tables, and materialized views. You can run the tool more than once without affecting entities that are already migrated.

Important

The tool is either a shell script file, versioningMigrationTool.sh, or a batch file, versioningMigrationTool.bat, depending on whether the installation environment is Linux or Windows, respectively.

Versioning Migration Tool access rights and context

You use the Versioning Migration Tool after upgrading from an older release of Incorta Direct Data Platform to release 5.1. To run the tool and migrate to the new directory structure, use a terminal application to access the Incorta host machine as a system administrator with root access to that machine. Run the tool after upgrading the cluster metadata database and before starting the Incorta services.

Note

You can upgrade the cluster metadata database and start the services using the Cluster Management Console (CMC).

Important

If you have one or more tenants hosted on a virtual file system, such as Azure Data Lake Storage (ADLS), Hadoop Distributed File System (HDFS), or Google Cloud Storage (GCS), you must prepare the environment so that the Versioning Migration Tool can migrate the contents of these tenants.

For more information, see Considerations for tenants hosted on virtual file systems.

Versioning Migration Tool Running Modes

You can run the Versioning Migration Tool in either interactive mode or unattended mode.

Interactive mode

In this mode, provide the tool with all the arguments required to migrate the cluster content, such as the Incorta Metadata database connection string, the specific tenants, the specific physical schemas, and, if required, the backup path. You can run the tool in interactive mode either to migrate the cluster content or to create a .properties file that you can use later to run the tool in unattended mode.

To use the Versioning Migration Tool to migrate to the new directory structure while running it in interactive mode, follow these steps:

  • In the terminal of the host for the Incorta Node that runs the Loader Service or Analytics Services, navigate to the installation path as the incorta user:

Linux OS example:

sudo su incorta
cd /home/incorta/IncortaAnalytics/IncortaNode
  • For Linux operating systems, use the following command to run the tool:
./versioningMigrationTool.sh
  • For Windows operating systems, use the following command to run the tool:
versioningMigrationTool.bat
  • When prompted, enter the required information. See Versioning Migration Tool Parameters for more details.
  • When prompted, press Enter to start the migration process.
  • Wait for the Versioning Migration Tool to complete the backup and migration processes, as applicable.
Important

To use the tool to create the parameters .properties file, after providing all the required information and when prompted, enter S, and then press Enter to save the parameters you provided to a .properties file and exit the tool. The path of the resulting file is /home/IncortaAnalytics/IncortaNode/ and the file name follows this naming convention: migration.<date>-<timestamp>.properties, for example, migration.20210526-152237.properties.

Unattended mode

In this mode, when you run the tool, you need to provide the .properties file that contains the parameters or arguments required to migrate the cluster directory structure. You can create this file using either a text editor or the tool itself while running it in interactive mode. When the tool runs in unattended mode, it reads the required information from the .properties file.

The following is an example of the .properties file:

dbURL=jdbc:mysql://127.0.0.1:3306/incorta_metadata
dbUser=user
dbPassword=1234
tenants=demo
schemas=HR
tablePattern=
backupDate=true
backupPath=
maxThreads=10

For more information about the parameters to include in the .properties file, review Versioning Migration Tool Parameters.
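If you prefer to create the parameters file by hand rather than from interactive mode, the following is a minimal sketch of one way to do it. The function name write_migration_properties and the target file name are hypothetical; the keys and sample values are the ones from the example above. Because the file holds the database password in plain text, the sketch creates it with owner-only permissions.

```shell
#!/usr/bin/env bash
# Sketch only: write_migration_properties is a hypothetical helper, not part
# of the Versioning Migration Tool. Keys and values mirror the example above.
write_migration_properties() {
  local target="$1"
  (
    # Owner-only permissions for a newly created file: it contains dbPassword.
    umask 077
    cat > "$target" <<'EOF'
dbURL=jdbc:mysql://127.0.0.1:3306/incorta_metadata
dbUser=user
dbPassword=1234
tenants=demo
schemas=HR
tablePattern=
backupDate=true
backupPath=
maxThreads=10
EOF
  )
}
```

Usage, with an assumed file name: write_migration_properties /home/incorta/IncortaAnalytics/IncortaNode/migration.properties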

To use the Versioning Migration Tool to migrate to the new directory structure while running it in unattended mode, follow these steps:

  • In the terminal for the Incorta node host, whether the Loader Node or the Analytics Node, navigate to the installation path as the incorta user:
sudo su incorta
cd /home/incorta/IncortaAnalytics/IncortaNode
  • Run the Versioning Migration Tool and provide the path and name of the parameters .properties file.

  • For Linux operating systems, use the following command to run the tool: ./versioningMigrationTool.sh /<File_Path>/<File_Name>.

    Example:

./versioningMigrationTool.sh /home/IncortaAnalytics/IncortaNode/migration.20210526-152237.properties
  • For Windows operating systems, use the following command to run the tool: versioningMigrationTool.bat /<File_Path>/<File_Name>.

Example:

versioningMigrationTool.bat /home/IncortaAnalytics/IncortaNode/migration.20210526-152237.properties
  • Wait for the Versioning Migration Tool to complete the backup and migration processes, as applicable.
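
The unattended steps above can be wrapped in a small script, for example when the migration is part of a scripted upgrade. This is a sketch under assumptions: run_unattended is a hypothetical helper, its exit codes 2 and 3 are arbitrary, and it relies only on the documented invocation (the tool followed by the path of the .properties file). Pass an absolute path for the .properties file, because the helper changes to the tool's directory before running it.

```shell
#!/usr/bin/env bash
# Sketch only: run_unattended is a hypothetical wrapper, not shipped with Incorta.
# Usage: run_unattended /path/to/versioningMigrationTool.sh /abs/path/to/file.properties
run_unattended() {
  local tool="$1" props="$2"
  if [ ! -x "$tool" ]; then
    echo "Tool not found or not executable: $tool" >&2
    return 2
  fi
  if [ ! -r "$props" ]; then
    echo "Properties file not readable: $props" >&2
    return 3
  fi
  # Run from the tool's own directory, as in the documented steps.
  ( cd "$(dirname "$tool")" && "./$(basename "$tool")" "$props" )
}
```

Checking the file before starting avoids launching the tool with a mistyped path.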

Versioning Migration Tool Parameters

The following table shows the parameters that you include in the .properties file or enter when prompted in interactive mode:

| Parameter | Description | Example |
|---|---|---|
| dbURL | Enter the database connection string in the format that the database management system requires | jdbc:mysql://127.0.0.1:3306/incorta_metadata |
| dbUser | Enter the database user | |
| dbPassword | Enter the database user password | |
| tenants | Enter the tenants whose files you want to migrate, separated by a space or a comma. Leave blank to migrate all tenants. | demo,foundations,casestudy |
| schemas | Enter the schemas whose files you want to migrate, separated by a space or a comma. Leave blank to migrate all schemas. | SALES,HR |
| tablePattern | Enter the fully qualified name (schemaName.objectName) of the physical schema object whose files you want to migrate. Leave blank to migrate all objects. You can use a regular expression to specify multiple physical schema objects. | SALES.Products, or .*\.emp.* to include all objects whose names start with "emp" in the selected tenants and physical schemas |
| backupDate | Enter true or false to specify whether the tool creates a backup of the original tenant data (snapshot and parquet directories) before altering the directory structure | true or false |
| backupPath | Enter the backup directory path. This can be a directory on the host machine or a shared drive on a cloud service that Incorta has access to. Leave blank to create the backup file for each tenant under its directory. | |
| maxThreads | Enter the maximum number of worker threads that the tool can use during the migration process. Leave blank so that the tool automatically calculates the value. | |

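
Before you rely on a tablePattern regular expression, it can help to preview which fully qualified names it selects. The sketch below uses grep as a stand-in. It assumes that the tool matches the pattern against the whole schemaName.objectName string, and grep's extended regular expressions may differ in detail from the regex dialect the tool uses. The function name matching_objects and the object names in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: preview a tablePattern against candidate object names.
# Assumption: the tool applies the pattern to the whole fully qualified name,
# which grep -x (match the entire line) approximates.
# Usage: matching_objects PATTERN NAME...
matching_objects() {
  local pattern="$1"
  shift
  printf '%s\n' "$@" | grep -E -x -- "$pattern"
}
```

For example, matching_objects '.*\.emp.*' HR.employees HR.departments SALES.Products prints only HR.employees, because \.emp requires "emp" to follow the dot that separates the schema name from the object name.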
Important

During the migration process, the tool must have access to the Incorta metadata database. The tool queries the database to determine the tenants, related physical schemas, and related entity objects. It also inserts rows into two new tables required for file versioning: FILES_VERSIONS and VERSION_LOCK. The Analytics and Loader Services read from these tables to determine which files to load into memory.

Note

If you provide a value for the maxThreads property that exceeds the available number of worker threads on the host machine, the tool will use all the available worker threads.

Additional Considerations

Disk space and backup considerations

Before using the Versioning Migration Tool, you must ensure that there is adequate disk space in shared storage.

If you create a backup prior to the migration, you need to account for the backup size as part of your disk space calculations. The default backup directory is the tenant directory. You can specify a different directory.

Important

The backup file will contain the "parquet" and "snapshots" directories. The tool backs up the files for all physical schemas and tables in these directories, not only the ones selected for migration.

To determine the size of the existing files in shared storage, for example the Parquet files, run the following commands in a Linux bash shell:

cd ~/IncortaAnalytics/Tenants/
du -sh */parquet | sort -hr
Note

You can create a backup of each tenant’s data yourself instead of instructing the tool to create this backup. If needed, you can restore the old directory structure and run the Versioning Migration Tool again to restart the migration process.

Warning

Depending on the size of the shared storage files, the backup process may take a significant amount of time.
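
As a rough pre-flight check for the disk space guidance above, the following sketch compares the size of the directories to back up with the free space at the backup location. The helper names kb_used, kb_free, and has_room_for_backup are hypothetical. The check is only an estimate: it does not account for the space that the migration itself needs or for the on-disk format of the backup file.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for estimating backup headroom.

# Total size, in KB, of the given directories.
kb_used() {
  du -sk -- "$@" 2>/dev/null | awk '{ total += $1 } END { print total + 0 }'
}

# Free space, in KB, on the file system that holds the given directory.
kb_free() {
  df -Pk -- "$1" | awk 'NR == 2 { print $4 }'
}

# Usage: has_room_for_backup BACKUP_TARGET_DIR DIR_TO_BACK_UP...
# Succeeds when the free space at the target exceeds the size of the sources.
has_room_for_backup() {
  local backup_target="$1"
  shift
  local needed free
  needed="$(kb_used "$@")"
  free="$(kb_free "$backup_target")"
  [ "$free" -gt "$needed" ]
}
```

Pass the backup directory first, followed by the tenant's parquet and snapshot directories.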

Considerations for tenants hosted on virtual file systems

If you have one or more tenants hosted on a virtual file system (VFS), such as ADLS, HDFS, or GCS, you must prepare the environment so that the Versioning Migration Tool can migrate the contents of these tenants.

To prepare for the migration of tenants hosted on a virtual file system, follow these steps that are applicable to all supported virtual file systems:

  • Copy the core-site.xml file to the following locations:
    • <installation_path>/cmc/lib/
    • <installation_path>/cmc/tmt/
    • <installation_path>/IncortaNode/runtime/lib/
    • <installation_path>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib/
    • <installation_path>/IncortaNode/hadoop/etc/hadoop/
Note

You can use the cp command to perform this task.

Example: cp /incorta/core-site.xml /home/incorta/IncortaAnalytics/cmc/lib/

  • In the case of ADLS only, set the environment variables in ~/.bash_profile or ~/.bashrc as follows:
    • export AZURE_CLIENT_ID=<your_Azure_Client_ID>
    • export AZURE_CLIENT_SECRET_KEY=<your_Azure_Client_Secret_Key>
    • export AZURE_TENANT_ID=<your_Azure_Tenant_ID>
  • For all supported virtual file systems, inject the core-site.xml file in incorta.engine.tools.jar:
    • Navigate to the following directory: <installation_path>/IncortaNode/runtime/webapps/incorta/WEB-INF/lib. The default installation path is /home/incorta/IncortaAnalytics.
    • Run the following command: jar uf incorta.engine.tools.jar core-site.xml
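
The copy step and the ADLS environment check above can be scripted. The following is a minimal sketch: copy_core_site and adls_env_ready are hypothetical helper names, the five target directories are the ones listed above relative to the installation path, and the jar uf step is left as a manual command.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the VFS preparation steps.

# Usage: copy_core_site /path/to/core-site.xml INSTALLATION_PATH
# Copies the file to each documented location; stops at the first failure.
copy_core_site() {
  local src="$1" install_path="$2" dir
  for dir in \
    cmc/lib \
    cmc/tmt \
    IncortaNode/runtime/lib \
    IncortaNode/runtime/webapps/incorta/WEB-INF/lib \
    IncortaNode/hadoop/etc/hadoop
  do
    cp -- "$src" "$install_path/$dir/" || return 1
  done
}

# ADLS only: succeed when all three exported variables are non-empty.
adls_env_ready() {
  local name
  for name in AZURE_CLIENT_ID AZURE_CLIENT_SECRET_KEY AZURE_TENANT_ID; do
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing environment variable: $name" >&2
      return 1
    fi
  done
}
```

Example with the default installation path: copy_core_site /incorta/core-site.xml /home/incorta/IncortaAnalytics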

After a successful migration

When the Versioning Migration Tool completes the migration process, the migration result summary appears showing migrated, failed, and skipped entities. You can run the tool more than once to migrate failed entities.

Old structure content

After completing the migration successfully and starting the services, it is recommended that you delete unneeded directories and files including the following:

  • The tenant backup files
  • The snapshot directory
  • The parquet directory
  • The loadTime.log file

The Versioning Migration Tool log files

By default, the Versioning Migration Tool creates log files in the <installation_path>/IncortaNode and <installation_path>/IncortaNode/migration directories. You can use the log files to determine the results of the migration process.

The tool logging configurations are available in the migration-logging.properties file that exists in the IncortaNode directory. You can change the default configurations by editing this .properties file.

Log files created under the IncortaNode directory start with the prefix versioningMigrationTool, followed by a timestamp.

Log files created under the migration directory start with the prefix specified in the migration-logging.properties file, also followed by a timestamp. The default prefix is incorta-migration.
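
To locate the most recent log file after a run, you can sort by modification time. The sketch below is a convenience only: latest_log is a hypothetical helper, and it relies solely on the documented prefixes, not on any particular file extension or log format.

```shell
#!/usr/bin/env bash
# Sketch only: print the most recently modified file in DIR whose name starts with PREFIX.
# Usage: latest_log DIR PREFIX
latest_log() {
  local dir="$1" prefix="$2"
  ls -t -- "$dir"/"$prefix"* 2>/dev/null | head -n 1
}
```

For example, with the default installation path: latest_log /home/incorta/IncortaAnalytics/IncortaNode versioningMigrationTool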

Considerations for migrating between different environments

When migrating shared storage files from one Incorta cluster to another, for example, from User Acceptance Testing (UAT) to Production, you must first copy the parquet (source) folder and then perform a load from staging. Both environments must run an Incorta release that supports file versioning and the copied files should not have records in the FILES_VERSIONS or VERSION_LOCK metadata database tables.

Warning

Copying only the ddm and source folders from shared storage between the different environments does not produce the same result as copying the source folder and then loading data from staging.

The default maxThreads value calculation

The Versioning Migration Tool calculates the default maxThreads value based on the following equation:

(CPU cores utilization * machine available processors) / 100

The CPU cores utilization initial value is the engine.cpu_cores_utilization value that exists in the <installation_path>/IncortaNode/services/<service_directory>/incorta/engine.properties file. However, the tool can use another derived value as follows:

  • If the tool does not find the 'engine.cpu_cores_utilization' value, the default is 50.
  • If the value is less than 10, the tool will consider it 10.
  • If the value is greater than 100, the tool will consider it 100.
Note

If the result of the default maxThreads calculation is less than 1, the default value will be 1.
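
The calculation and its bounds can be worked through in a few lines of shell. This is a sketch of the rules described above, not the tool's actual code: default_max_threads is a hypothetical function, and the use of integer division (rounding down) is an assumption, because the source does not state how fractional results are rounded.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce the documented default maxThreads rules.
# Usage: default_max_threads CPU_CORES_UTILIZATION AVAILABLE_PROCESSORS
# Pass an empty first argument when engine.cpu_cores_utilization is not set.
default_max_threads() {
  local util="${1:-50}" procs="$2" result
  # Clamp the utilization value to the range 10..100.
  if [ "$util" -lt 10 ]; then util=10; fi
  if [ "$util" -gt 100 ]; then util=100; fi
  # Assumption: integer division, so fractions are dropped.
  result=$(( util * procs / 100 ))
  # The default is never less than 1.
  if [ "$result" -lt 1 ]; then result=1; fi
  echo "$result"
}
```

For example, with a utilization value of 50 on a machine with 8 available processors, the sketch yields (50 * 8) / 100 = 4. On Linux, nproc reports the number of available processors.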