AWS High Availability Incorta Cluster Guide

Guide: Install, Configure, and Deploy a High Availability Incorta Cluster in AWS

This guide describes how to install, configure, and deploy a high availability Incorta Cluster in the Amazon Web Services (AWS) cloud. The Cluster will support an active-active cluster topology. The Cluster will be public facing.

The Cluster topology in this guide includes 5 EC2 hosts for Incorta, 2 EC2 hosts for Apache ZooKeeper, an Elastic File System (EFS) mount for each Incorta Node host, and Load Balancers that provide access to the Analytics Services through DNS addresses.

The 5 EC2 hosts for Incorta run the following applications and services:

  • 1 host for:
    • Incorta Cluster Management Console (CMC)
    • Apache Spark (2.4.3, Incorta Node build)
    • MySQL 5.6
    • Apache ZooKeeper 3.5.6
  • 4 hosts for Incorta Nodes:
    • 1 host for an Incorta Node that runs a Loader Service, LoaderService_1
    • 1 host for an Incorta Node that runs a Loader Service, LoaderService_2
    • 1 host for an Incorta Node that runs an Analytics Service, AnalyticsService_1
    • 1 host for an Incorta Node that runs an Analytics Service, AnalyticsService_2

The 2 EC2 hosts for Apache ZooKeeper run the following applications and services:

  • 1 host for an Apache ZooKeeper 3.5.6 server
  • 1 host for an Apache ZooKeeper 3.5.6 server

The Incorta specific portion of the procedure begins at Install and Start the Cluster Management Console.

AWS Prerequisites

For the selected AWS Region, the prerequisite configurations include the following:

  • A Virtual Private Cloud (VPC) with:
    • A Subnet associated with the VPC and a specific Availability Zone
    • A Route Table with an Internet Gateway and a Network ACL for the Subnet
  • A Security Group with defined inbound and outbound rules for TCP and HTTP traffic
  • An IAM Role with an attached policy for AmazonElasticFileSystemFullAccess
  • An Elastic File System (EFS) file system for the VPC with the related Subnet and Security Group that results in a File System ID and DNS Name
  • 7 EC2 hosts running in the VPC with the same Subnet, Placement Group, IAM Role, Security Group, and the same Key Pair .pem file
  • 5 of the 7 EC2 hosts with the minimum configuration (m5a.xlarge)
    • Amazon Linux 2 AMI x86
    • 4 vCPUs
    • 16 GiB memory
    • Up to 5 Gbps network bandwidth
    • 30 GiB storage
  • 2 of the 7 EC2 hosts running with the minimum configuration (t2.large)
    • Amazon Linux 2 AMI x86
    • 2 vCPUs
    • 8 GiB memory
    • Low to moderate network performance
    • 20 GiB storage
  • A Classic Load Balancer in the same VPC with the same Availability Zone Subnet that:
    • Is internet facing
    • Supports HTTP on port 8080 with Load Balancer Stickiness on port 8080
    • Specifies the 2 EC2 instances that individually host an Incorta Node which runs the Analytics Service
    • Specifies a Health Check over TCP on port 8080
  • A Network Load Balancer in the same VPC with the same Availability Zone Subnet that:
    • Is internet facing
    • Supports TCP on ports 5436 and 5442
    • Specifies a Target Group for the 2 EC2 instances that individually host an Incorta Node which runs the Analytics Service and has a configured Health check for TCP

AWS Host Access

This guide assumes that you can readily access all 7 EC2 hosts using a bash shell terminal that supports both Secure Shell (SSH) and Secure Copy (SCP) using the shared Key Pair .pem file.

Incorta Installation Zip

This guide assumes that you have already downloaded the incorta-package-<version>.zip file and can securely copy this file using SCP to your 5 EC2 Hosts that will run either an Incorta Node or the Incorta Cluster Management Console.
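
For example, you might copy the package to the /tmp directory of a host with SCP. This is a sketch only; adjust the file name, destination directory, and host placeholder to your environment:

scp -i ~/.ssh/<ssh-auth-file>.pem incorta-package-<version>.zip ec2-user@<HOST_1_IPv4_Public_IP>:/tmp/

Repeat for the remaining Incorta hosts.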

EC2 Host Configuration Information

The relevant information for each EC2 host used as an example is tabulated here. This information will be used to complete the creation of the cluster.

EC2 Hosts, Nodes, Users, Applications, and Services
Each EC2 host, its node name, and the Linux user, applications, and services that run on it:

Host_1 : CMC_MySQL_Spark_ZooKeeper_1
  ●  incorta user
  ●  OpenJDK 11
  ●  EFS Mount and Directory
  ●  Apache ZooKeeper
  ●  Incorta Cluster Management Console
  ●  MySQL 5.6
  ●  Apache Spark

Host_2 : IncortaNodeLoader_1
  ●  incorta user
  ●  OpenJDK 11
  ●  EFS Mount and Directory
  ●  Incorta Node
  ●  Incorta Loader Service 1

Host_3 : IncortaNodeLoader_2
  ●  incorta user
  ●  OpenJDK 11
  ●  EFS Mount and Directory
  ●  Incorta Node
  ●  Incorta Loader Service 2

Host_4 : IncortaNodeAnalytics_1
  ●  incorta user
  ●  OpenJDK 11
  ●  EFS Mount and Directory
  ●  Incorta Node
  ●  Incorta Analytics Service 1

Host_5 : IncortaNodeAnalytics_2
  ●  incorta user
  ●  OpenJDK 11
  ●  EFS Mount and Directory
  ●  Incorta Node
  ●  Incorta Analytics Service 2

Host_6 : ZooKeeper_2
  ●  incorta user
  ●  OpenJDK 11
  ●  Apache ZooKeeper

Host_7 : ZooKeeper_3
  ●  incorta user
  ●  OpenJDK 11
  ●  Apache ZooKeeper

EC2 Host IP and DNS Addresses

Each EC2 Host in this public facing cluster has a:

  • Private DNS
  • Private IP
  • Public DNS (IPv4)
  • IPv4 Public IP

In this document, the following IP address placeholders are used for the seven hosts. You will need to tabulate the actual IP addresses for your specific environment.

EC2 Host | Private DNS | Private IP | Public DNS (IPv4) | IPv4 Public IP
Host_1 | <HOST_1_Private_DNS> | <HOST_1_Private_IP> | <HOST_1_Public_DNS_IPv4> | <HOST_1_IPv4_Public_IP>
Host_2 | <HOST_2_Private_DNS> | <HOST_2_Private_IP> | <HOST_2_Public_DNS_IPv4> | <HOST_2_IPv4_Public_IP>
Host_3 | <HOST_3_Private_DNS> | <HOST_3_Private_IP> | <HOST_3_Public_DNS_IPv4> | <HOST_3_IPv4_Public_IP>
Host_4 | <HOST_4_Private_DNS> | <HOST_4_Private_IP> | <HOST_4_Public_DNS_IPv4> | <HOST_4_IPv4_Public_IP>
Host_5 | <HOST_5_Private_DNS> | <HOST_5_Private_IP> | <HOST_5_Public_DNS_IPv4> | <HOST_5_IPv4_Public_IP>
Host_6 | <HOST_6_Private_DNS> | <HOST_6_Private_IP> | <HOST_6_Public_DNS_IPv4> | <HOST_6_IPv4_Public_IP>
Host_7 | <HOST_7_Private_DNS> | <HOST_7_Private_IP> | <HOST_7_Public_DNS_IPv4> | <HOST_7_IPv4_Public_IP>

Shared Storage (Network File Sharing)

In this Incorta Cluster, the 5 EC2 Hosts that run Incorta need to be able to access Shared Storage using Amazon Elastic File System (EFS). EFS sharing is set up by mounting a designated disk partition on all of the hosts requiring access to the shared data. EFS designates a shared device as an EFS ID and EFS directory pair. In this document, the following placeholders are used:

EFS identifier : <efs-ID>
EFS directory : <efs-shared-dir>

Linux Users

This guide references 3 Linux Users for bash shell commands: ec2-user, root, incorta

ec2-user

The ec2-user is a standard, unprivileged user available to the EC2 Amazon Linux 2 AMI host.

root

The root user is a privileged user available to the EC2 Amazon Linux 2 AMI host. In this document, the root user installs and runs some applications and services. This is not a requirement.

incorta

The incorta user is a standard, unprivileged user that you create. The incorta user will install certain applications and services and will own certain directories.

Summary of Procedures

All hosts require the procedure for updating the EC2 packages, creating the incorta Linux user, and installing and configuring Java OpenJDK 11.

For Host_1, Host_6, and Host_7, you will install and configure an Apache ZooKeeper ensemble.

For Host_1, Host_2, Host_3, Host_4 and Host_5, you will create the IncortaAnalytics directory and create an EFS mount for Shared Storage.

For Host_1, you will install MySQL 5.6 and create the Incorta Metadata database. In addition, you will install the Incorta Cluster Management Console (CMC), and install and configure the Incorta supplied version of Apache Spark.

For Host_2, Host_3, Host_4 and Host_5, you will install an Incorta HA Node.

Using the Cluster Management Console, you will create a cluster, federate nodes, and install either a Loader or Analytics service on each federated node. You will then start the Incorta Cluster. Next, you will create and configure an example tenant with sample data. You will then access the tenant and perform a full data load for a given schema. After successfully loading the data for the schema, you will view a Dashboard based on that data. Successfully viewing a Dashboard indicates your Incorta Cluster is operational.

You will also verify SQLi connectivity to Incorta by configuring a client tool to access the tenant through one Analytics Service and then the other.

With your Incorta Cluster verified to be running, you will access Incorta through a public DNS for a Network Load Balancer over TCP. You will follow steps to stop an Incorta service, confirm access to the cluster, and then restart the Incorta service.

Secure Shell

For Secure Shell access to EC2 hosts, AWS requires that you create a Key Pair and download a .pem file. PEM (Privacy Enhanced Mail) is a base64 container format for encoding keys and certificates.

This guide assumes that you have a .pem file installed under the ~/.ssh/ directory for Mac OS or Linux.
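
SSH refuses to use a private key file that is readable by other users. If you see an UNPROTECTED PRIVATE KEY FILE warning, tighten the permissions on the file:

chmod 400 ~/.ssh/<ssh-auth-file>.pem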

If you are using a Windows SSH client such as PuTTY, you will need to convert the .pem file into a .ppk (PuTTY Private Key) file using PuTTYgen.

Create Shell Variables for Mac OS or Linux

In PuTTY or another Windows SSH client, you can create and save an SSH connection for each of the seven EC2 Hosts.

To expedite secure shell access to the EC2 Hosts from Mac OS or Linux, create the following variables to store the IPv4_Public_IP values for each host.

To begin, open Terminal and define the following variables using the IPv4_Public_IP values for the EC2 Hosts. The Public IPs in the following are for illustration.

HOST_1=34.155.100.1
HOST_2=34.155.100.2
HOST_3=34.155.100.3
HOST_4=34.155.100.4
HOST_5=34.155.100.5
HOST_6=34.155.100.6
HOST_7=34.155.100.7

Next confirm that you can log in to Host_1 as the ec2-user using the shell variable.

ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_1}

When prompted to accept the host key, type yes and press Return.

Then, after successfully connecting to the EC2 Host, exit the session:

exit

Repeat for Host_2, Host_3, Host_4, Host_5, Host_6, and Host_7:

ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_2}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_3}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_4}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_5}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_6}
exit
ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_7}
exit
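
If you prefer, you can check all seven hosts from your local shell with a single loop instead of connecting one at a time. This is a convenience sketch: it prompts you to accept each new host key and simply prints each host name.

for h in ${HOST_1} ${HOST_2} ${HOST_3} ${HOST_4} ${HOST_5} ${HOST_6} ${HOST_7}; do
  ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${h} hostname
done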

Create the Incorta Group and User

The purpose of this section is to create a user and group for running and managing the Incorta software. In this document, the user and group are both called incorta. This is just an example; choose user and group names that match your own needs.

You should also have the secure shell authentication file installed under the ~/.ssh/ directory. In this case, the file is identified as <ssh-auth-file>.pem. Use this as a placeholder for your own file.

Start with Host_1. Log in as the ec2-user:

ssh -i ~/.ssh/<ssh-auth-file>.pem ec2-user@${HOST_1}

Create the incorta user and group for each of the seven hosts. You will do this from the bash command line after logging in with ssh as the default user, ec2-user.

sudo groupadd -g 1220 incorta
sudo useradd -u 1220 -g incorta incorta
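
You can optionally confirm the user and group were created:

id incorta
uid=1220(incorta) gid=1220(incorta) groups=1220(incorta)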

Give the incorta user permission to use sudo. Back up the original file first.

sudo cp -r /etc/sudoers /etc/sudoers.bk

Use visudo to set values in the /etc/sudoers file:

sudo visudo

After the line reading root ALL=(ALL) ALL, add the line incorta ALL=(ALL) NOPASSWD: ALL. The result should look like the following:

root ALL=(ALL) ALL
incorta ALL=(ALL) NOPASSWD: ALL

Set up the hosts so you can log in directly as the incorta user. Copy the .ssh directory from the ec2-user home directory to the incorta home directory.

sudo cp -rp ~/.ssh /home/incorta
sudo chown -R incorta:incorta /home/incorta/.ssh

Log out of the host:

exit
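
Optionally, confirm that you can now log in to the host directly as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
exit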

Repeat this procedure for setting up the incorta user on Host_2, Host_3, Host_4, Host_5, Host_6 and Host_7.

Install Java

Update and Install Existing Packages

This section assumes you have created a group and user as described in Create the Incorta Group and User. In this document, you continue to use incorta as the user.

You need to make sure the necessary components are up to date for each host. You will also need to add additional utilities to support Incorta.

Log in as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Update the existing packages:

sudo yum -y update

Install utilities required for supporting various applications needed for Incorta:

sudo yum -y install telnet
sudo yum -y install expect

Install the Java OpenJDK 11

Many of the software components required to run Incorta are Java applications. You will need to add the Java OpenJDK and associated components for each of the seven hosts to run the required software.

Install the Java OpenJDK 11 for Amazon Linux 2:

sudo amazon-linux-extras install java-openjdk11

Install the Open JDK for Java 11:

sudo yum -y install java-11-openjdk-devel

Update the Java alternatives:

sudo update-alternatives --config javac

Accept the default by pressing Enter.

Confirm the version of Java matches what you just installed:

java -version
openjdk version "11.0.5" 2019-10-15 LTS
OpenJDK Runtime Environment 18.9 (build 11.0.5+10-LTS)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.5+10-LTS, mixed mode, sharing)

Install additional packages to support the Java installation:

sudo yum -y install gcc
sudo yum -y install byacc
sudo yum -y install flex bison

Ensure the JAVA_HOME environment variable is set whenever you log in to the host by creating a shell script that runs at login. Create a file called custom.sh in the /etc/profile.d directory using the editor of your choice. You will need root privileges to do this. For example:

sudo vim /etc/profile.d/custom.sh

Then add the following to the file:

#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.amzn2.x86_64
export PATH=$PATH:$JAVA_HOME/bin

Note: if you split a long export statement across multiple lines, the backslash character (\) denotes line continuation.

Verify that the JAVA_HOME and PATH environment variables are defined for the incorta user.

source ~/.bash_profile
echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.amzn2.x86_64

You will need to add more environment variable settings to custom.sh in the procedures that follow.

Log out of the host.

exit

Repeat these steps for installing Java (Install Java) on the remaining hosts (Host_2, Host_3, Host_4, Host_5, Host_6 and Host_7).

Create the Incorta Installation Directory

As the incorta Linux user, create the Incorta default installation directory on Host_1, Host_2, Host_3, Host_4, and Host_5. All Incorta components will be installed in the directory /home/incorta/IncortaAnalytics/. You must create this directory before starting the Incorta installer. The Incorta installer will fail if a valid directory is not specified at installation time.

Log in to Host_1 as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Create the IncortaAnalytics directory.

mkdir IncortaAnalytics

Log out of Host_1.

exit

Repeat these steps for Host_2, Host_3, Host_4, and Host_5.

Setting Up Shared Storage

A common network file storage mount is one way to provide Shared Storage in an Incorta Cluster topology. In AWS, network file storage is Amazon Elastic File System (EFS). With EFS, EC2 hosts can share files.

In order to share files between Incorta Nodes and Apache Spark in the Incorta Cluster, you must first install the Amazon EFS utility on Host_1, Host_2, Host_3, Host_4, and Host_5. Host_6 and Host_7 do not require this package as they do not need to access Shared Storage.

To begin, log in to Host_1 as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Install the Amazon EFS utility:

sudo yum -y install amazon-efs-utils

Log out of Host_1.

exit

Repeat these steps for Host_2, Host_3, Host_4, and Host_5.

Create the EFS Mount for Host_1

Important: The procedure for Host_1 differs from Host_2, Host_3, Host_4, and Host_5.

Both the Cluster Management Console (CMC) and Apache Spark on Host_1 require access to Shared Storage.

First create the EFS mount and then create the Tenants directory in the EFS mount.

AWS hosts facilitate file sharing by providing a file system directory name and ID pair. You will need an ID and shared directory to complete this procedure. For example:

EFS identifier : <efs-ID>
EFS directory : <efs-shared-dir>

Log in to Host_1 as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

To make it easy to be consistent in this procedure, set the shell variables for the EFS identifier and EFS directory values. For example:

EFS_SHARED_DIR=<efs-shared-dir>
EFS_ID=<efs-ID>

Create the shared directory:

cd /mnt/
sudo mkdir ${EFS_SHARED_DIR}

Verify the directory has been created:

ls -l
drwxrwxrwx 4 incorta incorta 6144 Feb 11 18:54 <efs-shared-dir>

Mount the directory:

sudo mount -t efs ${EFS_ID}:/${EFS_SHARED_DIR} /mnt/${EFS_SHARED_DIR}
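
Note that this mount does not persist across a reboot. If you want the share remounted automatically, you can optionally add an entry to /etc/fstab. This is not part of the original prerequisites and assumes the amazon-efs-utils mount helper installed earlier; verify the syntax against your environment:

<efs-ID>:/<efs-shared-dir> /mnt/<efs-shared-dir> efs defaults,_netdev 0 0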

Create the Tenants directory:

sudo mkdir ${EFS_SHARED_DIR}/Tenants

Modify the mount point's access rights for the incorta group and user:

sudo chown -R incorta:incorta ${EFS_SHARED_DIR}
sudo chmod -R go+rw ${EFS_SHARED_DIR}

Do the same for the Tenants directory:

sudo chown -R incorta:incorta ${EFS_SHARED_DIR}/Tenants
sudo chmod -R go+rw ${EFS_SHARED_DIR}/Tenants

Get the full path of the Tenants directory for later use:

cd ${EFS_SHARED_DIR}/Tenants
pwd
/mnt/<efs-shared-dir>/Tenants

Next, create a file in the Tenants directory:

echo "efs test" > test.txt
ls -l
cat test.txt
efs test

Log out of Host_1.

exit

This completes the setup process for Host_1. Next, set up Host_2, Host_3, Host_4 and Host_5. They will mount the shared directory to access the contents of Tenants.

Create the EFS Mount for Host_2, Host_3, Host_4 and Host_5

Log in to Host_2 as the incorta user. If repeating this step, remember to change the value of ${HOST_2} to ${HOST_3}, ${HOST_4}, and ${HOST_5}:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_2}

Set the shell variables:

EFS_SHARED_DIR=<efs-shared-dir>
EFS_ID=<efs-ID>

Create the local directory:

cd /mnt/
sudo mkdir ${EFS_SHARED_DIR}

Mount the shared directory:

sudo mount -t efs ${EFS_ID}:/${EFS_SHARED_DIR} /mnt/${EFS_SHARED_DIR}

Verify Host_2 can access the test.txt file created by Host_1:

ls -l /mnt/${EFS_SHARED_DIR}/Tenants/
cat /mnt/${EFS_SHARED_DIR}/Tenants/test.txt
efs test

Log out of Host_2:

exit

Now repeat these steps for Host_3, Host_4 and Host_5.

Installing and Configuring Apache ZooKeeper

Incorta uses Apache ZooKeeper for distributed communications. In this procedure, you will install and configure Apache ZooKeeper on Host_1, Host_6 and Host_7. The three ZooKeeper server instances will create a quorum for processing distributed messages.

Installing ZooKeeper

Log in as the incorta user then retrieve the Apache ZooKeeper installation file. Start with Host_1:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Fetch Apache ZooKeeper version 3.5.6 and place it into the host's /tmp directory.

cd /tmp
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz

Extract the contents of the archive, then move it into place as the locally installed ZooKeeper package:

tar -xzf apache-zookeeper-3.5.6-bin.tar.gz
sudo mv apache-zookeeper-3.5.6-bin /usr/local/zookeeper

Create a ZooKeeper data directory:

sudo mkdir /var/lib/zookeeper

Prepare to make a custom ZooKeeper configuration file by duplicating the sample configuration file.

sudo cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg

Configuring ZooKeeper

Open the custom ZooKeeper configuration file with a text editor. For example:

sudo vi /usr/local/zookeeper/conf/zoo.cfg

Look for the line beginning with dataDir and change it to read:

dataDir=/var/lib/zookeeper

Move to the bottom of the file and add two lines as follows:

admin.enableServer=false
zookeeper.admin.enableServer=false

Then add the server entries for all ZooKeeper hosts, using their private IP addresses and the peer and election ports:

server.1=<HOST_1_Private_IP>:2888:3888
server.2=<HOST_6_Private_IP>:2888:3888
server.3=<HOST_7_Private_IP>:2888:3888
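
After these edits, zoo.cfg should look roughly like the following. This is a sketch that assumes the remaining values keep the zoo_sample.cfg defaults:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
admin.enableServer=false
zookeeper.admin.enableServer=false
server.1=<HOST_1_Private_IP>:2888:3888
server.2=<HOST_6_Private_IP>:2888:3888
server.3=<HOST_7_Private_IP>:2888:3888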

Save your work, quit the editor, then log out of Host_1:

exit

Repeat Installing ZooKeeper and Configuring ZooKeeper for Host_6 and Host_7.

Setting Up the ZooKeeper IDs

You will complete this section for Host_1, Host_6 and Host_7. Start with Host_1.

Log in:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Create an ID file for Host_1. The ZooKeeper server ID for Host_1 is 1.

echo 1 | sudo tee -a /var/lib/zookeeper/myid

Log out of Host_1:

exit

Create an ID file for Host_6. The ZooKeeper server ID for Host_6 is 2.

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
echo 2 | sudo tee -a /var/lib/zookeeper/myid
exit

Create an ID file for Host_7. The ZooKeeper server ID for Host_7 is 3.

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
echo 3 | sudo tee -a /var/lib/zookeeper/myid
exit

Starting ZooKeeper

Start ZooKeeper for Host_1 by first logging in:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Run ZooKeeper's control script to start the service:

sudo /usr/local/zookeeper/bin/zkServer.sh start

Log out:

exit

Start ZooKeeper on Host_6:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
sudo /usr/local/zookeeper/bin/zkServer.sh start
exit

Start ZooKeeper on Host_7:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
sudo /usr/local/zookeeper/bin/zkServer.sh start
exit

Verifying Quorum

Next, query the ZooKeeper status on each ZooKeeper host to verify quorum. With three servers in the ensemble, one server is elected Leader and the other two are Followers.

On Host_1, check the status:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
exit

On Host_6, check the status:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_6}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
exit

On Host_7, check the status:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_7}
sudo /usr/local/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
exit

One host reports Mode: leader. The other two hosts report Mode: follower. This indicates a ZooKeeper quorum for the ZooKeeper Ensemble. In the example above, Host_1 and Host_6 are followers and Host_7 is the leader.
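
Optionally, you can also connect to the ensemble with the ZooKeeper command-line client to confirm that clients can reach it. Type quit to exit the client:

/usr/local/zookeeper/bin/zkCli.sh -server <HOST_1_Private_IP>:2181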

Install MySQL Server and Create the Incorta Metadata Database

You will now set up a metadata database managed with MySQL Server. In this section of the guide, the procedure describes how to install MySQL on Host_1 and then create a database. The MySQL database stores metadata about Incorta objects such as schemas, business schemas, and dashboards. For production use, Incorta supports MySQL Server 5.6 as well as Oracle 11g and 12c.

Install and Start MySQL server

There are references to two different types of root users: the Linux root user and the MySQL root user. The Linux root user is used to install MySQL server. The MySQL root user is a default administrative user for MySQL.

Start by logging into Host_1:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Switch to the Linux root user:

sudo su

Download the MySQL RPM.

rpm -ivh http://repo.mysql.com/mysql-community-release-el5-6.noarch.rpm

Install MySQL server:

yum install -y mysql-server

Start the MySQL server daemon:

service mysqld start
Starting mysqld (via systemctl):[OK]

If the above command fails, try starting MySQL server with the following command:

/etc/init.d/mysqld start

Create a password for the MySQL root user. The password 'incorta_root' is used for illustrative purposes only.

/usr/bin/mysqladmin -u root password 'incorta_root'

The command generates a warning that can be safely ignored.

Create the Incorta Metadata Database

You will now create the needed database.

Log in to the MySQL Client CLI:

mysql -h0 -uroot -pincorta_root

where:

  • -h = host: 0 references the localhost
  • -u = user: root in this case
  • -p = password: the password specified at installation time

Create the database. In this document, we are calling it incorta_metadata.

create database incorta_metadata;
Query OK, 1 row affected (0.00 sec)

Create the MySQL incorta users

After creating the incorta_metadata database, create a MySQL user for each Incorta host in the cluster: Host_1, Host_2, Host_3, Host_4, and Host_5. For illustrative purposes, the MySQL user is incorta and the password is Incorta#1.

Create the MySQL users for all of the hosts on the subnet. In this example, the hosts are on the 192.168.128.0/24 subnet. For Host_1, create two users: one for the localhost reference and one for the Private IP.

create user 'incorta'@'localhost' identified by 'Incorta#1';
create user 'incorta'@'<Host_1_Private_IP>' identified by 'Incorta#1';

Create a MySQL user for Host_2.

create user 'incorta'@'<Host_2_Private_IP>' identified by 'Incorta#1';

Create a MySQL user for Host_3.

create user 'incorta'@'<Host_3_Private_IP>' identified by 'Incorta#1';

Create a MySQL user for Host_4.

create user 'incorta'@'<Host_4_Private_IP>' identified by 'Incorta#1';

Create a MySQL user for Host_5.

create user 'incorta'@'<Host_5_Private_IP>' identified by 'Incorta#1';

Verify the users are created:

select User, Host from mysql.user where user = 'incorta';
+---------+---------------+
| User | Host |
+---------+---------------+
| incorta | localhost |
| incorta | 192.168.128.1 |
| incorta | 192.168.128.2 |
| incorta | 192.168.128.3 |
| incorta | 192.168.128.4 |
| incorta | 192.168.128.5 |
+---------+---------------+
6 rows in set (0.00 sec)

Grant Database Access Privileges

After creating the MySQL users, grant them ALL privileges so that they can access the incorta_metadata database. The statements below grant ALL on all databases (*.*); restrict the scope if your security policy requires it.

For the Host_1 incorta users, grant ALL privileges:

grant all on *.* to 'incorta'@'localhost' identified by 'Incorta#1';
grant all on *.* to 'incorta'@'<Host_1_Private_IP>' identified by 'Incorta#1';

For the Host_2 incorta user, grant ALL privileges:

grant all on *.* to 'incorta'@'<Host_2_Private_IP>' identified by 'Incorta#1';

For the Host_3 incorta user, grant ALL privileges:

grant all on *.* to 'incorta'@'<Host_3_Private_IP>' identified by 'Incorta#1';

For the Host_4 incorta user, grant ALL privileges:

grant all on *.* to 'incorta'@'<Host_4_Private_IP>' identified by 'Incorta#1';

For the Host_5 incorta user, grant ALL privileges:

grant all on *.* to 'incorta'@'<Host_5_Private_IP>' identified by 'Incorta#1';

Verify the privileges have been granted for all the MySQL incorta users.

show grants for 'incorta'@'localhost';
+--------------------------------------------------------------------------------------------+
| Grants for incorta@localhost |
+--------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'localhost' IDENTIFIED BY PASSWORD |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33' |
+--------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
show grants for 'incorta'@'192.168.128.1';
+--------------------------------------------------------------------------------------------+
| Grants for incorta@192.168.128.1 |
+--------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'192.168.128.1' IDENTIFIED BY PASSWORD |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33' |
+--------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
show grants for 'incorta'@'192.168.128.2';
+--------------------------------------------------------------------------------------------+
| Grants for incorta@192.168.128.2 |
+--------------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'incorta'@'192.168.128.2' IDENTIFIED BY PASSWORD |
| '*3304B4423C0D30FD76006E85829E9C5A695C1B33' |
+--------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Verify privileges for Host_3, Host_4 and Host_5:

show grants for 'incorta'@'192.168.128.3';
show grants for 'incorta'@'192.168.128.4';
show grants for 'incorta'@'192.168.128.5';

Exit the MySQL Client:

exit;

Verify the MySQL incorta user has access to the incorta_metadata database by connecting to it:

mysql -h0 -uincorta -pIncorta#1 incorta_metadata

Exit the MySQL Client:

exit;

The incorta_metadata database has been created and verified.
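
Optionally, you can configure the MySQL service to start automatically if Host_1 reboots. This is not part of the original procedure; it assumes the chkconfig utility available on Amazon Linux 2 and the SysV-style mysqld service registered by this package. Run it while still acting as the Linux root user:

chkconfig mysqld on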

Next, exit the root user and then log out of Host_1:

exit
exit

Install and Start the Cluster Management Console (CMC)

In this step you will install and start the CMC on Host_1. This requires you to unzip the incorta-package-<version>.zip file and run the Incorta installer. The installer guides you through making selections appropriate for installing a CMC that manages Incorta nodes on other hosts.

Run the Incorta Installer

Log in to Host_1 as the incorta user.

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Start by unzipping the Incorta package:

cd /tmp
mkdir incorta
unzip incorta-package-<version>.zip -d incorta

Run the Incorta installer:

cd /tmp/incorta
java -jar incorta-installer.jar -i console

The Incorta installer will present a series of prompts. The responses for installing the CMC on this host are shown to the right of each prompt below.

The first set of prompts relates to the installation. All the prompts are important. However, note the one labeled "Incorta HA components". Here you select option (1) to install the CMC software. For the Incorta hosts (in later steps) you will select option (2) for "Incorta HA components".

Welcome prompt : Press ENTER
License Agreement : Y : (Accept)
Installation Type : 1 : (New Installation)
Installation Set : 2 : (Custom Installation)
Incorta HA components : 1 : (Central Management Console (CMC))
Installation Folder : Press Enter to accept the default directory

By default the installation folder matches the home directory of the current user plus IncortaAnalytics. For example: /home/incorta/IncortaAnalytics. This is the directory you created in a previous step.

The second set of prompts is concerned with how the CMC is made available for use as well as how other hosts communicate with it. This guide will use the default ports as shown.

CMC Configuration—Step 1

Server Port (6005) : Accept (Enter)
HTTP Connector Port (Default: 6060) : Accept (Enter)
HTTP Connector Redirect Port (Default: 6443) : Accept (Enter)
AJP Connector Port (Default: 6009) : Accept (Enter)
AJP Connector Redirect Port (Default: 6443) : Accept (Enter)

The third and final set of prompts addresses the memory heap size for the CMC and the administrator username and password. For the CMC, there is only one administrator user and no other users. This guide uses the default values. The password, Incorta#1, is for illustration purposes.

CMC Configuration—Step 2

Memory Heap Size : Accept default (Enter)
Administrator's Username : Accept default (Enter) (admin)
Administrator's Password : Incorta#1

With all of the installation parameters entered, press Enter to begin the installation process at the Ready To Install CMC prompt.

Select Start CMC to start the CMC once installation is complete. At the installation status prompt, confirm the successful start of the CMC. You should see the following:

==============================================================
Installation Status
-------------------
Success! Incorta Analytics has been installed under the
following path:
/home/incorta/IncortaAnalytics/cmc
To access your CMC installation please go to this link.
http://<HOST_1_Private_IP>:6060/cmc/

Sign in to the CMC

In a browser, navigate to the IPv4 Public IP address of the CMC:

http://<HOST_1_IPv4_Public_IP>:6060/cmc/

At the login prompt, use your login information. In this document, the admin user and password Incorta#1 are used as specified in the previous section.

Select Sign In. Confirm you see the Welcome message. This confirms you have installed the CMC.

Keep the Cluster Management Console open as it will be used later to create the Incorta Cluster.

Log out of Host_1:

exit

Install, Configure, and Start Apache Spark

You need to install Apache Spark on Host_1. Incorta requires a specific version of Spark to run. This version is included in the Incorta package.

Installing Apache Spark

You will run the Incorta installer once again on Host_1, this time selecting "Incorta HA components".

Start by logging in to Host_1 as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_1}

Change directories to /tmp/incorta and start the Incorta installer:

cd /tmp/incorta
java -jar incorta-installer.jar -i console

IMPORTANT: As this procedure is for installing Apache Spark, do NOT respond as if you were installing the CMC. Instead of CMC, select Incorta HA components.

The sole purpose of installing the Incorta HA components on Host_1 is to install the correct version of Apache Spark that the Incorta High Availability Cluster requires. This guide uses the default port and default installation directory.

Below are the responses for each prompt the installer will present.

Welcome prompt : ENTER
License Agreement : Y (Accept)
Installation Type : 1 (New Installation)
Installation Set : 2 (Custom Installation)
Incorta HA components : 2 (Incorta HA Node)
Installation Folder : Press enter to accept the default
Incorta Node Agent Port : Press enter to accept the default
(Default: 4500)
Public IP : <HOST_1_IPv4_Public_IP> (the public IP address of Host_1)
Start Node Agent : 0 (Disable automatic start)

Press Enter at the Ready to Install prompt. At the success prompt, press Enter.

Configure Spark

You will find Spark in /home/incorta/IncortaAnalytics/IncortaNode/spark. Configuration requires adding the Spark binaries directory to the PATH environment variable. Edit /etc/profile.d/custom.sh to add the Spark home directory environment variable and append the Spark binary directory to the path.

Open custom.sh:

sudo vim /etc/profile.d/custom.sh

Add SPARK_HOME and add to the PATH:

#!/bin/bash
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.5.10-0.amzn2.x86_64
export SPARK_HOME=/home/incorta/IncortaAnalytics/IncortaNode/spark
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin

Save custom.sh and quit the editor.

Update your environment variables:

source ~/.bash_profile

Set up to work on the Spark configuration files:

cd $SPARK_HOME/conf

Set Spark DNS Address, IP Address and Port Numbers

With the editor of your choice, open spark-env.sh. Find the parameters below and set them as follows:

SPARK_PUBLIC_DNS=<HOST_1_Public_DNS_IPv4>
SPARK_MASTER_IP=<HOST_1_Private_IP>
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=9091
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=9092
SPARK_WORKER_MEMORY=6g

This guide uses the default values as shown. Adjust the SPARK_WORKER_MEMORY value as appropriate for your installation. Save this file and exit the editor.

Define Spark Resource Limits

Open spark-defaults.conf with the editor of your choice and set the following:

spark.master spark://<HOST_1_Private_IP>:7077
spark.eventLog.enabled true
spark.eventLog.dir /home/incorta/IncortaAnalytics/IncortaNode/spark/eventlogs
spark.local.dir /home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.executor.extraJavaOptions -Djava.io.tmpdir=/home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.driver.extraJavaOptions -Djava.io.tmpdir=/home/incorta/IncortaAnalytics/IncortaNode/spark/tmp
spark.cores.max 2
spark.executor.cores 2
spark.sql.shuffle.partitions 2
spark.driver.memory 6g
spark.port.maxRetries 100

The values for spark.cores.max, spark.executor.cores, spark.sql.shuffle.partitions, spark.driver.memory and spark.port.maxRetries are hardware specific. Set them as appropriate for the cluster you are building. For more information, see Performance Tuning, section Analytics and Loader Service Settings.

The paths entered for spark.eventLog.dir, spark.local.dir, spark.executor.extraJavaOptions, and spark.driver.extraJavaOptions are under /home/incorta/IncortaAnalytics, the default installation directory.

Start Spark:

cd ~/IncortaAnalytics/IncortaNode
./startSpark.sh
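
Optionally, confirm that the Spark Master and Worker JVMs are running. The jps tool ships with the OpenJDK devel package installed earlier; you should see Master and Worker entries in its output:

jps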

Log out of Host_1:

exit

In a browser, navigate to the Spark master web interface to view the Spark master node:

http://<HOST_1_IPv4_Public_IP>:9091

You will see <HOST_1_Private_IP>:7077 at the top of the page and information about the applications Spark is handling. At this point, it will only show the worker that was just started.

View the Spark worker node at port 9092:

http://<HOST_1_IPv4_Public_IP>:9092

You will again see an Apache Spark page, showing <HOST_1_Private_IP>:7078 at the top of the page and potentially tables showing running and finished executors. This shows that Spark is now installed, configured and ready to use.

Install the Incorta HA Components

The Incorta HA components include the Node Agent. The Node Agent is used to start and stop an HA node. The Node Agent is all you are concerned with here. The other binaries are used only after the Node Agent has been started.

You will now install the Incorta HA components on the hosts intended to run the Loader and Analytics services. Perform the following for Host_2, Host_3, Host_4 and Host_5.

Log in to Host_2 as the incorta user:

ssh -i ~/.ssh/<ssh-auth-file>.pem incorta@${HOST_2}

Copy the Incorta package to the host if you have not already done so, then unzip it (this guide assumes the package is in /tmp, as on Host_1):

cd /tmp
mkdir incorta
unzip incorta-package-<version>.zip -d incorta

Run the Incorta installer:

cd /tmp/incorta
java -jar incorta-installer.jar -i console

The responses to the prompts in this wizard are the same as when you installed HA components to gain access to Spark, except you will start the Node Agent this time.

Welcome prompt : Enter
License Agreement : Y
Installation Type : 1 (New Installation)
Installation Set : 2 (Custom Installation)
Incorta HA Components : 2 (Incorta HA Node)
Installation Folder : /home/incorta/IncortaAnalytics
Node Agent Configuration (port) : 4500
Public IP Address : <HOST_2_IPv4_Public_IP>
Start Node Agent : 1 (Start the Node Agent)

Now press Enter to begin the installation. The following should appear indicating the installation has been successful:

=============================================================
Installation Status
-------------------
Success! Incorta Analytics has been installed under the
following path:
/home/incorta/IncortaAnalytics/IncortaNode
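
Optionally, confirm the Node Agent is listening on its port before logging out. This quick check assumes the default port of 4500; ss is included in the base Amazon Linux 2 image:

ss -tln | grep 4500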

Log out of Host_2:

exit

You have completed the installation of the Incorta HA components for Host_2. The Incorta Node Agent is running on this host as well. To complete this procedure, you will need to install the Node Agent on Host_3, Host_4 and Host_5 using these same instructions.

Create the Incorta Cluster

You will create an Incorta cluster in this procedure. This will be a cluster ready for the federation process in which nodes are made known to the cluster and services are added to the nodes. Please note hosts are now referred to as nodes in the documentation.

The wizard used in this procedure requires five steps to complete:

  1. Basic: provide the name of the cluster
  2. Database: set up a reference to the metadata database created in Create the Incorta Metadata Database
  3. System Admin: set up credentials for accessing the cluster
  4. ZooKeeper: specify the nodes responsible for ZooKeeper redundancy
  5. Spark Integration: set up how the CMC and Spark communicate

Start by signing into the CMC using the credentials you set up during CMC installation. If you have inadvertently closed the CMC window, point your browser to

http://<HOST_1_IPv4_Public_IP>:6060/cmc/

Select Clusters in the Navigation Bar. The action bar should read home > clusters.

Bring up the new clusters wizard by selecting the add button (+ icon, upper right).

Basic

For the name of the cluster, use exampleCluster. Select the Check button to make sure the name is not already in use. Select Next to move to the Database step.

Database

You previously created a MySQL metadata database for Incorta (incorta_metadata). For the Database Type, select MySQL.

For JDBC URL, use the private IP address of the host managing the metadata database. This is <HOST_1_Private_IP>. For the port number, use 3306. For the database name, use the actual name of the database which is incorta_metadata if you have been using the values provided in this guide. Your entry for JDBC URL should look like this:

jdbc:mysql://<HOST_1_Private_IP>:3306/incorta_metadata?useUnicode=yes&characterEncoding=UTF-8

For username and password, enter the following as previously described:

Username : incorta
Password : Incorta#1

Select Next to move to the System Admin step.

System Admin

In this document, we are using admin for the Username and Incorta#1 for the Password:

Username : admin
Password : Incorta#1
Email : admin@incorta.com
Path : /mnt/<efs-shared-dir>/Tenants

where /mnt/<efs-shared-dir>/Tenants is the Tenants directory you created on Shared Storage (confirm it with ls /mnt). For example: /mnt/efs_03a5369/Tenants

Select Check disk space.

Select Next to proceed to the ZooKeeper step.

ZooKeeper

ZooKeeper URL: <HOST_1_Private_IP>:2181,<HOST_6_Private_IP>:2181,<HOST_7_Private_IP>:2181

Use the private IP addresses for all hosts that will be making up the ZooKeeper ensemble. Select Next to advance to the Spark Integration step.

Spark Integration

In a browser, navigate to http://<HOST_1_IPv4_Public_IP>:9091

Look for the line near the top of the page beginning with "URL". Copy this URL including the port number 7077 and paste it in the text box for Master URL. For example,

Master URL: spark://ip-192-168-128-1.ec2.internal:7077

This guide uses the default values for the remaining entries:

App Memory (GB) : 1
App Cores : 1
App Executors : 1
DS Port : 5442

Select Next to get to the Review step and review your settings. Use the Back button to view and make changes to settings in previous steps. When you are satisfied with your settings, select the Create button. You will receive notification the cluster was successfully created. Select Finish.

Federating Nodes in the Incorta Cluster

In this guide, Host_2 and Host_3 will run the loader service and Host_4 and Host_5 will run the analytics service. Federating Nodes means adding the hosts running the Node Agent to the cluster. Once the hosts are federated, they are referred to as Incorta Nodes.

In the Navigation Bar, select Nodes and verify the Action Bar shows home > nodes.

You will use the Federation wizard to add Nodes to the Cluster. In short, this wizard goes through three steps, two of them requiring information from you:

  • Discover: identify a host to federate by its private IP address
  • Federate: provide a unique name for the node
  • Finish: complete the federation process

Start the Node Federation wizard by selecting the Add button (+).

Discover

Host : Enter the private IP address for the host, for example: <HOST_2_Private_IP>

Port : Accept the default of 4500

Select Next to proceed to federating individual nodes.

Federate

In this step you name the nodes. See the table below. Select Federate once the unique name is confirmed. Add another Node by selecting Add another node in the canvas and add the second Node. Continue this process for the Host_4 and Host_5 nodes.

The Node naming convention is shown in the table below:

EC2 Host | Node Name
Host_2 | IncortaNodeLoader_1
Host_3 | IncortaNodeLoader_2
Host_4 | IncortaNodeAnalytics_1
Host_5 | IncortaNodeAnalytics_2

You now have four federated nodes: two will be designated as Loader nodes and two will be designated as Analytics nodes. What these nodes do is determined by the services you assign to them. In the next procedure, you will assign loader and analytics services to the four nodes.

Add Services to the Nodes in the Cluster

Select Nodes in the Navigation Bar and confirm home > nodes appears in the action bar.

Add and Configure the Loader Services

In the federated nodes canvas, a list of the four federated nodes is visible. Select the label for the first Loader Node (IncortaNodeLoader_1).

The canvas that appears is specific to the node and will be labeled according to the node name. Select Services in the bar toward the bottom of the canvas then select the Add button (+) to bring up the Create a new service wizard. This wizard has two steps:

  1. Basic: give the service a name, declare the type of service, and set its resource utilization (memory, CPU)
  2. Additional Settings: supply port numbers for the services required for a Loader service

In this guide, the resource utilization settings are the defaults. You may need to make adjustments in either step to accommodate the needs of your cluster.

Basic Settings

In this step, you provide a name for the service, designate the type of service being created and its resource utilization limits. As this Node was designated as a Loader, the guide uses the following values for the parameters:

Service Name : LoaderService_1
Type : Loader
Memory Size (GB) : 12 (adjust as appropriate)
CPU Utilization (%) : 90 (adjust as appropriate)

Select Next.

Additional Settings

This guide uses the following port numbers for Loader services:

Tomcat Server Port : 7005
HTTP Port : 7070
HTTP Redirect Port : 7443

Select Create. The loader service will be created and associated with IncortaNodeLoader_1. Select Finish.

In the Navigation bar, select Nodes again, then select the second Loader node. Follow the procedure used for node IncortaNodeLoader_1 with the same settings, except name the service LoaderService_2.

Service Name : LoaderService_2
Type : Loader
Memory Size (GB) : 12
CPU Utilization (%) : 90

Select Next.

The guide uses the following port numbers:

Tomcat Server Port : 7005
HTTP Port : 7070
HTTP Redirect Port : 7443

Select Create. The loader service will be created and associated with IncortaNodeLoader_2. Select Finish.

Add and configure the Analytics Services

The Analytics Services are set up in a similar manner. Select Nodes, then select the first Analytics node, IncortaNodeAnalytics_1. As with the Loader nodes, select the Add button. For Basic Settings, this guide uses the defaults.

Basic Settings
Service Name : AnalyticsService_1
Type : Analytics
Memory Size (GB) : 13
CPU Utilization (%) : 75

Select Next.

Additional Settings
Tomcat Server Port : 8005
HTTP Port : 8080
HTTP Redirect Port : 8443
AJP Port : 8009
AJP Redirect Port : 8443

Select Create then select Finish. Repeat for the second Analytics node, setting the Service Name to AnalyticsService_2.

Configuring and Starting the Cluster

Configuring the Cluster means associating the created services with the Cluster. Doing this gives the Cluster its functionality. When the services are associated (or joined) with the Cluster, they can be started and stopped individually or as part of a Cluster-wide operation. In Add Services to the Nodes in the Cluster, you associated a service with each Node. You must now join those services to the cluster.

In the CMC Navigation Bar, select Clusters. Then in the Cluster list, select exampleCluster. Verify the path in the action bar reads: home > clusters > exampleCluster. Select Services. You will now add the Loader and Analytics services to the cluster. You will see each service in this canvas.

Join the Loader Services to the Cluster

  1. In the Services tab, in the Action Menu, select + to bring up the Add a service to the cluster dialog.
  2. In the Node pull down, select the first Loader node, IncortaNodeLoader_1. Recall that you added LoaderService_1 to this node.
  3. In the Service pull down, select LOADER.
  4. Select Add.
  5. Repeat steps 1 through 4 for the second Loader node, IncortaNodeLoader_2.

Join the Analytics Services to the Cluster

For the Analytics Services, use the Add a service to the cluster dialog again, only this time, select analytics nodes and services.

  1. In the Services tab, in the Action Menu, select + to bring up the Add a service to the cluster dialog.
  2. In the Node pull down, select IncortaNodeAnalytics_1. Recall that you added AnalyticsService_1 to this node.
  3. In the Service pull down, select ANALYTICS.
  4. Select Add.
  5. Repeat steps 1 through 4 for IncortaNodeAnalytics_2.

Start the Cluster

  1. Select the Details tab in the action bar.
  2. Select the Start button in the lower right half of the Cluster canvas.
  3. Select the Sync button in the upper right corner of the canvas to monitor the process of starting the cluster. When the Analytics and Loader services read "started", the cluster is up and running.

Create a Tenant

You will use the Create a Tenant wizard to create a tenant for use with this example cluster. Select the Tenants tab in the action bar. Then, select the Add button.

Tenant
Name : example_tenant
Username : admin
Password : Incorta#1
Email : admin@incorta.com
Path : Path to the shared disk space (for example, /mnt/efs_090e465/Tenants)

Select Check disk space to confirm there is sufficient room for your datasets.

Turn on the Include Sample Data switch, then select Next.

Email

In this step, you are identifying who to contact about the tenant. You can select the Create button and finish creating the tenant at this point as the values here are optional. Illustrative examples are shown in the table below.

Sender's Username Auth : Disabled (default)
System Email Address : Tenant owner's email address (for use as a user)
System Email Password : Tenant owner's password
SMTP Host : smtp.gmail.com (default)
SMTP Port : 465 (default)
Share Notifications : Disabled (default)

Verifying the Tenant

Load a Schema

Log in to Incorta at <HOST_4_IPv4_Public_IP>:8080/incorta/#/login. Use the administrator user and password, for example, admin/Incorta#1. In the Navigation bar, select Schema. From the list of schema, select SALES. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. Note the Last Load Status information. This shows the most recent load event for this schema and confirms the SALES schema is accessible to AnalyticsService_1.

Now check AnalyticsService_2 by logging into <HOST_5_IPv4_Public_IP>:8080/incorta/#/login. Select Schema from the Navigation bar and then select the SALES schema. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. Look at the Last Load Status information and see that it indicates the schema was just loaded. This confirms the SALES schema is accessible to AnalyticsService_2.

Checking SQLi Access

Incorta supports the SQL interface (SQLi) by exposing itself as a PostgreSQL database. Any client that runs SQL queries against PostgreSQL via JDBC can query Incorta.
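
As an alternative quick check, if your local machine has the PostgreSQL psql client installed and the SQLi endpoint accepts it (an assumption; the guide itself verifies access with a JDBC client), you can connect directly and enter the Incorta#1 password when prompted:

psql -h <HOST_4_IPv4_Public_IP> -p 5436 -U admin -d example_tenant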

To check SQLi access, you will use DbVisualizer (free version) to connect with the Incorta tenant. You will need the IP addresses of both of the analytics hosts (<HOST_4_IPv4_Public_IP> and <HOST_5_IPv4_Public_IP>) and the name of the tenant (example_tenant) associated with the cluster.

Begin by downloading DbVisualizer and installing it on your local host. Start DbVisualizer. If the Connection Wizard appears, cancel out of it. You are going to enter the connection parameters in a form which shows all parameters at once.

Check SQLi Access Through IncortaNodeAnalytics_1

Create a connection using the DbVisualizer menus. From the DbVisualizer menu, select Database. From the menu, select Create Database Connection and then select the No Wizard button. This results in a Database Connection tab appearing with example parameter values. Enter the following parameters for the text boxes in the Connection tab.

Connection

Name : SQLi Check (what the connection is for)
Notes : -- (optional)

Database

Settings Format : Server Info (not changeable)
Database Type : PostgreSQL (type of database to read)
Driver (JDBC) : PostgreSQL (driver to use to connect to database)
Database Server : <HOST_4_IPv4_Public_IP> (IP address of IncortaNodeAnalytics_1)
Database Port : 5436 (SQLi port)
Database : example_tenant (name of the database; this is the tenant)

Authentication

Database Userid : admin (user ID to use when accessing the tenant)
Database Password : Incorta#1

Options

Auto Commit : <check>
Save Database Password : Save Between Sessions
Permission Mode : Development

When you have completed your entries, check to be sure you can access the server through your specified port. Select the Ping Server button. If that works, connect to the server and access the database by selecting the Connect button. If you cannot successfully ping the server, check the IP address, the port number and the database name are correct. If you cannot connect to the server after successfully pinging it, check the Database name and Authentication parameters and try again.

Checking the Operability of IncortaNodeAnalytics_1

You can run a query on the database to confirm your connection is completely operational. From the DbVisualizer menu, select SQL Commander, then select New SQL Commander. Set Database Connection to SQLi Check. For the remaining text boxes (database, schema, max rows, and max chars), enter:

example_tenant : SALES : 1000 : -1

Enter a query in the editor. For example:

select * from SALES.PRODUCTS

You should see a list of products in the output window below. This confirms your ability to connect to Incorta using SQLi with this Analytics service. Disconnect from the database. From the DbVisualizer menu, select Database, then, from the menu, select Disconnect.

Check SQLi Access Through IncortaNodeAnalytics_2

Return to the Connection tab and change the IP address for the Database Server parameter to the IP address of IncortaNodeAnalytics_2:

Database Server : <HOST_5_IPv4_Public_IP>

Select the Ping Server button to be sure the Node is accessible through the port. Then select the Connect button.

Checking the Operability of IncortaNodeAnalytics_2

Run a query as you did for IncortaNodeAnalytics_1. You can use the same SQL Commander tab; select it and select the run button. You will see the same results as you did for IncortaNodeAnalytics_1. Disconnect from the database. From the DbVisualizer menu, select Database, then, from the menu, select Disconnect.

Summary of Accomplishments So Far

At this point you have established basic functionality of the Cluster:

  • CMC: you created, composed and started a Cluster
  • SQLi: you verified a connection to Incorta by running a query

Now that you know the cluster is operational, you can set up a load balancer to support High Availability (HA).

Add Support for High Availability

You can configure a load balancer to be a single point of access for the Analytics Nodes in your cluster. You can also configure the load balancer to monitor the health of the Analytics Nodes so as to not route traffic to Nodes that are unresponsive.

Load Balancers (LB)

AWS offers the Classic Load Balancer and Network Load Balancer for EC2 hosts. Generally the configuration concerns are:

  • Creating a single point of access
  • Registration: tell the load balancer which Nodes it should route traffic to
  • Health Checks: tell the load balancer what to do when a Node is unresponsive

For more information and setup instructions, see the AWS documentation for the Classic Load Balancer and the Network Load Balancer. To complete the tutorials, you will need your AWS account credentials and the public IP addresses of your Analytics Nodes.

The end of this process yields a URL. This is the access point through the LB to your cluster. Here are two examples:

http://classiclb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/
http://networklb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/
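
Before running the failover tests, you can optionally confirm from your local machine that the load balancer URL responds. Substitute the URL generated for your own load balancer:

curl -I http://classiclb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/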

Confirming Cluster High Availability

High Availability (HA) in Incorta means functionality is available as long as at least one Loader service and one Analytics service are running. You should be able to log in to Incorta, do work and expect to continue to do work if one of each service type is available.

In this section, you will test whether Incorta operates with High Availability given service outages:

  • one analytics service out
  • one loader service out

You will test these conditions using the Incorta web GUI as well as through DbVisualizer.

Completing these tests using both Incorta access methods will confirm your instance of Incorta is High Availability.

NOTE: High Availability Incorta Clusters do not support distributed schemas.

Test the Classic Load Balancer through Incorta Web GUI

To complete the next two tests, you will need your running and configured Cluster as well as the running and configured Classic Load Balancer (CLB).

Analytics Service

Here you will test whether the Analytics Services are High Availability. You will do this by viewing a Dashboard with both analytics services started and then with only one analytics service. If Dashboards can be viewed under both of these conditions, you can conclude Analytics Services are High Availability.

It is not possible to determine which of the two Analytics Services your session will engage when logging in. For this reason, you may need to stop both services in turn, looking for an effect on your session. For example, if you log in and your session starts with AnalyticsService_1, stopping AnalyticsService_2 will have no impact on your session. However, if you stop AnalyticsService_1, your session will end and you will need to log back in. Because of failover, when you restart your session, you will be able to resume from where you left off.

Start by logging in to the CMC in the usual way. Select Clusters in the Navigation bar. Select Services in the Action bar and verify that both Loader and both Analytics Services are started. Next, start an Incorta session by logging in through the CLB. Select Content to view the dashboards. Select Dashboards Demo, then select Sales Executive Dashboard. A number of insights appear on the canvas, which confirms an Analytics Service is running. Next, you will stop and restart the Analytics Services through the CMC to test High Availability.

  1. Go to the CMC, select Clusters in the Navigation bar, select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_1. In the canvas, select Stop.
  2. Go to the Analytics session and refresh the Dashboard. One of two things will happen. If the refresh happens immediately, your session is running through AnalyticsService_2 and your session was unaffected. On the other hand, if the refresh stalls and you are eventually presented with the login screen, your session was running through AnalyticsService_1. If your session was running through AnalyticsService_1, you can now test High Availability. Proceed to step 3. Otherwise, proceed to step 4.
  3. Log in to Incorta through the CLB. You should find your session returns you right back to where you left off in the previous session. Refresh the Dashboard to confirm the session has been restored. Your session can only be running through AnalyticsService_2 now. Incorta has failed over so that you can continue working. Continue to step 7.
  4. Go to the CMC, select Clusters in the Navigation bar, select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_1. In the canvas, select Restart. In the Navigation bar, select Clusters then select exampleCluster in the canvas. Select Services in the Action bar, then select AnalyticsService_2. In the canvas, select Stop.
  5. Return to your Incorta session and refresh the Dashboard. The refresh should stall and you will eventually be logged out. This shows your session was running through AnalyticsService_2.
  6. Log in to Incorta through the CLB. You should find your session returns you right back to where you left off in the previous session. Refresh the Dashboard to confirm the session has been restored. Your session can only be running through AnalyticsService_1 now. Incorta has failed over so that you can continue your session. Continue to step 7.
  7. Go to the CMC, select Clusters in the Navigation bar, select Details in the canvas, then select Restart. Select the sync button periodically and verify that the services read Started. Once they do, confirm all services have restarted by selecting Services in the canvas.

This result shows that Analytics sessions are High Availability: although you need to log in again, the state of your session is retained and restored immediately.
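
If you want an additional, terminal-based view of the failover while you repeat these steps, the sketch below simply polls the Incorta login page through the CLB; the URL is a placeholder for your own CLB address. A 200 or 3xx response means the CLB is still routing to at least one healthy Analytics Node.

# Poll the Incorta login page through the Classic Load Balancer every 5 seconds.
# The URL is a placeholder; substitute your own CLB DNS name.
CLB_URL="http://classiclb<aws-user-id>.us-east-1.elb.amazonaws.com:8080/incorta/"
while true; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "$CLB_URL")
  echo "$(date '+%H:%M:%S')  HTTP $code"
  sleep 5
done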

Loader Service

Here you will test whether the Loader Services are High Availability. You will do this by initiating schema load operations. You will find that loading is always available as long as at least one Loader Service is started. This demonstrates the active-active nature of the Loader Services.

Start by logging in to the CMC in the usual way. Select Clusters in the Navigation bar. Select Services in the Action bar and verify that both Loader and both Analytics Services are started. Next, start an Incorta session by logging in through the CLB. In the Navigation bar, select Schema. From the list of schemas, select SALES. In the Action bar, select Load. From the Load menu, select Load Now, then select Full. At the Data Loading popup, select Load. Check the Last Load Status indicator; it should show a time close to the current time. This confirms that a load will occur with both Loader Services started. Next, you will stop and restart the Loader Services to test High Availability.

  1. Stop LoaderService_1. Go to the CMC, select Clusters in the Navigation bar, select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_1. In the canvas, select Stop.
  2. Check Loader functionality. Go to the Analytics session. In the Navigation bar, select Schema. From the list of schema, select SALES. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. After a few moments, the Last Load Status indicator should show the load succeeded at a new time relative to when you started this work.
  3. Restart LoaderService_1. Go to the CMC, select Clusters in the Navigation bar, select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_1. In the canvas, select Restart.
  4. Stop LoaderService_2. Go to the CMC, select Clusters in the Navigation bar, select exampleCluster in the canvas. Select Services in the Action bar, then select LoaderService_2. In the canvas, select Stop.
  5. Check Loader functionality. Go to the Analytics session. In the Navigation bar, select Schema. From the list of schema, select SALES. In the action bar select Load. From the Load menu select Load Now then select Full. At the Data Loading popup, select Load. After a few moments, the Last Load Status indicator should show the load succeeded at a new time relative to the last load request.

This shows that the Incorta Loader Service loads schemas transparently: the only impact on the Analytics session is that the schema shows as loading and, eventually, as loaded. This result also confirms the High Availability of the Loader Services.

Test the Network Load Balancer Through the SQLi Interface

To complete this test, you will need your running and configured Cluster as well as the running and configured Network Load Balancer (NLB).

SQLi accesses Incorta through the network load balancer (NLB). The objective is to verify SQL scripts can be run through an SQLi connection with either of the analytics services stopped. This is similar to the analytics service test performed through the CLB. To complete this test, you will need to download and install DbVisualizer (free version).

  1. Log in to the CMC and verify all configured services are started.

  2. Start DbVisualizer if it is not already started and select the connection titled SQLi Check from the Database tab on the left of the window.

  3. In the canvas, change the value for Database Server to the NLB URL.

  4. Confirm that the port number is 5436. Change it to 5436 if necessary.

  5. Select Connect.

  6. If no SQL Commander window is visible, from the DbVisualizer menu, select SQL Commander then, from the menu, select New SQL Commander. Otherwise, select the existing SQL Commander tab.

  7. In the SQL Commander editor, run this SQL script:

    select * from SALES.PRODUCTS

  8. From the CMC, stop AnalyticsService_1.

  9. Run the SQL script again. You will see a list of products from the SALES database.

  10. Start AnalyticsService_1 and stop AnalyticsService_2.

  11. Run the SQL script. You will again see a list of products from the SALES database. You can conclude that it does not matter which Analytics Service you are using, and therefore connecting through SQLi on port 5436 is High Availability.

  12. Repeat steps 4 through 11, connecting through port 5442. Your results will be identical to those using port 5436.

As you have stopped both Analytics services one at a time, you can see that the SQLi interface supports High Availability.
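
To repeat this check from a terminal, the sketch below runs the same query through the NLB on both SQLi ports. As before, it assumes the SQLi endpoint accepts PostgreSQL-protocol clients such as psql, and all quoted values are placeholders for your own connection details.

# Run the product query through the Network Load Balancer on both SQLi ports.
NLB_HOST="networklb<aws-user-id>.us-east-1.elb.amazonaws.com"     # placeholder
SQLI_DB="<same Database value as in the SQLi Check connection>"   # placeholder
SQLI_USER="<your Incorta user>"                                   # placeholder
for port in 5436 5442; do
  echo "--- port $port ---"
  psql -h "$NLB_HOST" -p "$port" -U "$SQLI_USER" -d "$SQLI_DB" \
       -c "select * from SALES.PRODUCTS"
done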

Summary

Using both the Classic Load Balancer and Network Load Balancer, you have successfully confirmed High Availability for the Incorta Cluster.