Amazon Web Services (AWS) DynamoDB
About Amazon Web Services (AWS) DynamoDB
AWS DynamoDB is a fully managed, proprietary NoSQL database service that supports key-value and document data structures and is offered as part of the Amazon Web Services (AWS) portfolio. It is a multi-region, multi-active, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications.
About The AWS DynamoDB Connector
The AWS DynamoDB connector uses the incorta.connector.dynamodb.jar driver. The AWS DynamoDB connector supports the following Incorta-specific functionality:
Feature | Supported |
---|---|
Chunking | |
Data Agent | |
Encryption at Ingest | |
Incremental Load | ✔ |
Multi-Source | ✔ |
OAuth | |
Performance Optimized | ✔ |
Remote | |
Single-Source | ✔ |
Spark Extraction | |
Webhook Callbacks | ✔ |
The AWS DynamoDB connector authentication methods
The AWS DynamoDB connector supports two methods for authentication:
- Access Key: In this method, Incorta uses an Access Key ID and a Secret Access Key to access resources in your AWS account. Access keys are long-term credentials for an AWS Identity and Access Management (IAM) user or the AWS account root user.
- Temporary Security Credentials: In this method, Incorta uses a temporary session token, in addition to an Access Key ID and a Secret Access Key, to access resources in your AWS account. Temporary security credentials are short-term: after they expire, they are no longer valid, and you must get a new set of credentials.
For more information, see Understanding and getting your AWS credentials.
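The two authentication methods differ only in whether a session token accompanies the key pair. The following Python sketch shows which credentials each method needs. The function name build_client_credentials is hypothetical and is not part of Incorta; the returned keys match the credential parameter names that the AWS SDK for Python (boto3) accepts when you create a client, and all key values are placeholders.

```python
from typing import Optional


def build_client_credentials(
    method: str,
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str] = None,
) -> dict:
    """Return credential keyword arguments for an AWS SDK client (hypothetical helper).

    `method` uses the two labels from the Authentication Method drop-down list.
    """
    if not access_key_id or not secret_access_key:
        raise ValueError("Both an Access Key ID and a Secret Access Key are required")
    creds = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if method == "Access Key":
        # Long-term credentials: the key pair alone is enough.
        return creds
    if method == "Temporary Session Token":
        # Temporary credentials are only valid together with their session token.
        if not session_token:
            raise ValueError("Temporary credentials also need a session token")
        creds["aws_session_token"] = session_token
        return creds
    raise ValueError(f"Unknown authentication method: {method!r}")


long_term = build_client_credentials("Access Key", "AKIAEXAMPLE", "secretEXAMPLE")
temporary = build_client_credentials(
    "Temporary Session Token", "ASIAEXAMPLE", "secretEXAMPLE", "tokenEXAMPLE"
)
```

If boto3 is installed, you could pass either dictionary to boto3.client("dynamodb", region_name="us-east-1", **temporary); the region is only an example.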
Steps to connect AWS DynamoDB and Incorta
To connect AWS DynamoDB and Incorta, here are the high-level steps, tools, and procedures:
- Create an external data source
- Create a schema with the Schema Wizard
- Alternatively, create a schema with the Schema Designer
- Load the schema
- Explore the schema
Create an external data source
Here are the steps to create an external data source with the AWS DynamoDB connector:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Data.
- In the Action bar, select + New → Add Data Source.
- In the Choose a Data Source dialog, in Application, select DynamoDB.
- In the New Data Source dialog, specify the applicable connector properties.
- To test, select Test Connection.
- Select Ok to save your changes.
AWS DynamoDB connector properties
Here are the properties for the AWS DynamoDB connector:
Property | Control | Description |
---|---|---|
Data Source Name | text box | Enter the name of the data source |
Authentication Method | drop down list | Select the authentication method for Incorta to get access to resources in your AWS account. Select between Access Key and Temporary Session Token. |
Access Key ID | text box | Enter the Access Key ID for your AWS account or the temporary Access Key ID depending upon the authentication method you select |
Secret Access Key | text box | Enter the Secret Access Key for your AWS account or the temporary Secret Access Key depending upon the authentication method you select |
Temporary Session Token | text box | Select Temporary Session Token for the Authentication Method to configure this property. Enter the temporary session token associated with the temporary Access Key ID and Secret Access Key you entered. |
Region | drop down list | Select the region defined for your AWS DynamoDB service |
Extra Options | text box | Enter supported extra options in the form of key=value. |
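The Extra Options property takes key=value pairs. This page does not say how multiple options are separated, so the following parser is a hypothetical sketch that assumes one option per line; the option names in the usage are placeholders, not documented connector options.

```python
def parse_extra_options(text: str) -> dict:
    """Parse Extra Options written as key=value, one option per line (assumed format)."""
    options = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the first '=' only, so a value may itself contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Line {line_no}: expected key=value, got {raw!r}")
        options[key.strip()] = value.strip()
    return options


# Placeholder option names, for illustration only.
opts = parse_extra_options("option.one=10\n\noption.two=a=b")
```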
Create a schema with the Schema Wizard
Here are the steps to create an AWS DynamoDB schema with the Schema Wizard:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Schema Wizard.
- In (1) Choose a Source, specify the following:
  - For Enter a name, enter the schema name.
  - For Select a Datasource, select the DynamoDB external data source.
  - Optionally, create a description.
- In the Schema Wizard footer, select Next.
- In (2) Manage Tables, in the Data Panel, first select the name of the Data Source, and then check the Select All checkbox.
- In the Schema Wizard footer, select Next.
- In (3) Finalize, in the Schema Wizard footer, select Create Schema.
As DynamoDB is not designed as a relational database and does not support join operations, Incorta does not automatically create joins between the schema tables. You need to define them manually.
Create a schema with the Schema Designer
Here are the steps to create an AWS DynamoDB schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the Action bar, select + New → Create Schema.
- In Name, specify the schema name, and select Save.
- In Start adding tables to your schema, select DynamoDB.
- In the Data Source dialog, specify the DynamoDB table data source properties.
- Select Add.
- In the Table Editor, in the Table Summary section, enter the table name.
- To save your changes, select Done in the Action bar.
DynamoDB table data source properties
For a schema table in Incorta, you can define the following DynamoDB-specific data source properties:
Property | Control | Description | Comment / Example |
---|---|---|---|
Type | drop down list | Default is DynamoDB | |
Data Source | drop down list | Select the DynamoDB external data source | |
Select Table | drop down list | Select the table from the selected data source | |
Filter Expression | text box | When you select this text box, it invokes the Query Editor. Enter the filter expression to refine the table query results. Rows that do not match the filter conditions are not returned. Use the : (colon) character in the expression to dereference an attribute value. | Price >= :num and ProductStatus IN (:avail, :back, :disc) |
Expression Attribute Values | text box | When you select this text box, it invokes the Query Editor. Enter an expression to specify one or more values that can be substituted in an expression. | {":num":{"N":"200"}, ":city":{"S":"New York"}, ":active":{"BOOL":"true"}} |
Expression Attribute Names | text box | When you select this text box, it invokes the Query Editor. Enter an expression to specify one or more substitution tokens for attribute names in an expression. | {"#P":"Percentile"} where #P is the attribute substitution and Percentile is the attribute |
Projection Expression | text box | When you select this text box, it invokes the Query Editor. Enter a string that identifies one or more attributes (columns) to retrieve from the specified table or index. Enter the names of the attributes separated by commas. | ProductCategory, Description, Price |
Timestamp and Date Columns | text box | When you select this text box, it invokes the Query Editor. Enter a JSON-formatted string to describe the date and timestamp columns that the table may have. | {"OrderDate": {"type": "timestamp", "format": "epoch-milliseconds", "timezone": "-05:00"}} |
Incremental | toggle | Enable the incremental load configuration for the schema table | |
Incremental Column | drop down list | Enable Incremental to configure this property. Select the column to use for incremental loading. | You can select from only the columns that you have defined in the Timestamp and Date Columns. |
Number of Workers | text box | The number of threads to load the table’s data in parallel | The default is 5. |
Maximum Number of Items per Worker | text box | The maximum number of items to load from this table | This is similar to the SQL LIMIT clause. Leave this property blank to retrieve all items. |
Sample Size | text box | The number of items (rows) to sample from the table while discovering the table schema | The default is 5000. Enter -1 to include all items. |
Support Multisource Tables | toggle | Enable this option if you plan to use this DynamoDB data set with other data sets for the same table | Disabling this option makes the table load faster. |
Callback | toggle | Enable a post-extraction callback on the data source data set(s). Incorta invokes the callback URL with parameters that contain details about the load job. | |
Callback URL | text box | Enable Callback to configure this property. Specify the callback URL. | |
For more information about the table properties, see Additional Considerations.
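The Sample Size property controls how many items are sampled to discover the table schema. Because items in the same DynamoDB table do not have to share the same attributes, a limited sample can miss an attribute that appears only in items outside the sample. The following hypothetical sketch, which is not the connector's implementation, collects attribute names and DynamoDB type descriptors from the first sample_size items and treats -1 as all items.

```python
from itertools import islice
from typing import Dict, Iterable, Set


def discover_schema(items: Iterable[dict], sample_size: int = 5000) -> Dict[str, Set[str]]:
    """Collect attribute names and their type descriptors from a sample of items.

    Items use DynamoDB's typed JSON form, for example {"Price": {"N": "200"}}.
    A sample_size of -1 reads every item, mirroring the Sample Size property.
    """
    sample = items if sample_size == -1 else islice(items, sample_size)
    columns: Dict[str, Set[str]] = {}
    for item in sample:
        for name, typed_value in item.items():
            (type_code,) = typed_value.keys()  # one type descriptor per value
            columns.setdefault(name, set()).add(type_code)
    return columns


items = [
    {"Id": {"S": "1"}, "Price": {"N": "200"}},
    {"Id": {"S": "2"}, "Note": {"S": "gift"}},  # "Note" only appears in the second item
]
```

With sample_size=1, the Note attribute is not discovered; with -1, it is.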
View the schema diagram with the Schema Diagram Viewer
Here are the steps to view the schema diagram using the Schema Diagram Viewer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the DynamoDB schema.
- In the Schema Designer, in the Action bar, select Diagram.
Only joins that you manually create appear on the diagram as there are no joins automatically created between the schema tables.
Load the schema
Here are the steps to perform a Full Load of the DynamoDB schema using the Schema Designer:
- Sign in to the Incorta Direct Data Platform™.
- In the Navigation bar, select Schema.
- In the list of schemas, select the DynamoDB schema.
- In the Schema Designer, in the Action bar, select Load → Full Load.
- To review the load status, in Last Load Status, select the date.
Explore the schema
With the full load of the DynamoDB schema complete, you can use the Analyzer to explore the schema, create your first insight, and save the insight to a new dashboard.
To open the Analyzer from the schema, follow these steps:
- In the Navigation bar, select Schema.
- In the Schema Manager, in the List view, select the DynamoDB schema.
- In the Schema Designer, in the Action bar, select Explore Data.
For more information about how to use the Analyzer to create insights, see Analyzer and Visualizations.
Additional Considerations
Filter Expressions
- A filter expression is applied after a query finishes, but before the results are returned. Therefore, a query consumes the same amount of read capacity, regardless of whether a filter expression is present.
- A Query operation can retrieve a maximum of 1 MB of data. This limit applies before the filter expression is evaluated.
- In the table properties, in the filter expression, you can reference primary key or sort key attributes.
- The syntax of a filter expression consists of the attribute (column or field), the operator or function, and the pointer to or placeholder of the attribute values that you define in the Expression Attribute Values.
The following is an example of a filter expression:
Price >= :num and ProductStatus IN (:avail, :back, :disc)
- Price and ProductStatus are the columns or attributes.
- >=, and, and IN are the operators and functions in the expression.
- :num, :avail, :back, and :disc are the pointers to the expression attribute values.
For more information, refer to Working with Queries in DynamoDB.
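To make the ordering in the first two points concrete, the following sketch simulates a single request in plain Python: it reads items until a size limit is reached and only then applies the filter. It is an illustration, not DynamoDB's algorithm; item size is approximated by the length of the item's JSON encoding, and items are plain dictionaries rather than DynamoDB typed values.

```python
import json
from typing import Callable, List, Tuple

PAGE_LIMIT_BYTES = 1024 * 1024  # the 1 MB read limit per request described above


def read_page(
    items: List[dict],
    predicate: Callable[[dict], bool],
    limit_bytes: int = PAGE_LIMIT_BYTES,
) -> Tuple[List[dict], int]:
    """Read items up to limit_bytes, THEN apply the filter; return (matches, bytes read)."""
    read, bytes_read = [], 0
    for item in items:
        size = len(json.dumps(item))  # rough stand-in for DynamoDB item size
        if bytes_read + size > limit_bytes:
            break  # the size limit is reached before any filtering happens
        read.append(item)
        bytes_read += size
    # The filter only trims what was already read, so bytes_read (and the read
    # capacity it represents) is the same with or without a filter.
    return [item for item in read if predicate(item)], bytes_read


items = [{"Price": p, "ProductStatus": "Available"} for p in (100, 250, 300)]
matched, consumed = read_page(items, lambda item: item["Price"] >= 200)
everything, consumed_all = read_page(items, lambda item: True)
```

If the limit only allows the first item to be read and that item does not match, the request returns no items even though later items would have matched.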
Expression Attribute Values
If you need to compare an attribute with a value, define an expression attribute value as a placeholder. Expression attribute values in AWS DynamoDB are substitutes for the actual values that you want to compare. An expression attribute value must begin with a colon (:) and be followed by one or more alphanumeric characters.
Examples of expression attribute values:
{":num":{"N":"200"}, ":avail":{"S":"Available"}, ":active":{"BOOL":"true"}}
- :num, :avail, and :active are pointers to the attribute values.
- N, S, and BOOL are the data types of the attribute values, which are number, string, and boolean, respectively. For more information, see Supported Data Types.
- 200, Available, and true are the dereferenced attribute values.
You can then use these expression attribute values in an expression, for example, ProductStatus IN (:avail, :back, :disc).
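The following sketch parses an Expression Attribute Values string like the one above into plain Python values and lists the placeholders that an expression uses without defining. The helper names are hypothetical, only the N, S, and BOOL types are handled, and BOOL accepts both a JSON boolean and the string form used in the example above. The check is useful because DynamoDB rejects an expression that uses an undefined placeholder; for instance, ProductStatus IN (:avail, :back, :disc) also requires :back and :disc to be defined.

```python
import json
import re
from decimal import Decimal

# ':' followed by one or more alphanumeric characters, as described above.
VALUE_PLACEHOLDER = re.compile(r":[A-Za-z0-9]+")


def parse_attribute_values(text: str) -> dict:
    """Parse Expression Attribute Values JSON into plain Python values."""
    result = {}
    for placeholder, typed in json.loads(text).items():
        if not VALUE_PLACEHOLDER.fullmatch(placeholder):
            raise ValueError(f"Invalid placeholder {placeholder!r}")
        [(type_code, raw)] = typed.items()  # exactly one type descriptor
        if type_code == "N":
            result[placeholder] = Decimal(raw)  # numbers are written as strings
        elif type_code == "S":
            result[placeholder] = raw
        elif type_code == "BOOL":
            result[placeholder] = raw if isinstance(raw, bool) else raw == "true"
        else:
            raise ValueError(f"Type {type_code!r} is not handled in this sketch")
    return result


def undefined_placeholders(expression: str, values: dict) -> set:
    """Return value placeholders used in an expression but missing from values."""
    return set(VALUE_PLACEHOLDER.findall(expression)) - set(values)


values = parse_attribute_values(
    '{":num":{"N":"200"}, ":avail":{"S":"Available"}, ":active":{"BOOL":"true"}}'
)
missing = undefined_placeholders(
    "Price >= :num and ProductStatus IN (:avail, :back, :disc)", values
)
```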
Expression Attribute Names
An expression attribute name is a placeholder that you use in an AWS DynamoDB expression as an alternative to an actual attribute name. An expression attribute name must begin with a pound sign (#) and be followed by one or more alphanumeric characters.
The following are some use cases for expression attribute names:
- To access an attribute whose name conflicts with a DynamoDB reserved word.
- To create a placeholder for repeating occurrences of an attribute name in an expression.
- To prevent special characters in an attribute name from being misinterpreted in an expression.
{"#P":"Percentile"} and {"#N":"Name"} are examples of expression attribute names:
- #P and #N are the attribute substitutions.
- Percentile and Name are the actual attributes.
For more information, see Expression Attribute Names in DynamoDB.
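The following sketch shows the substitution that expression attribute names stand for by resolving #placeholders in an expression string. DynamoDB performs this substitution itself, so the helper resolve_attribute_names is hypothetical and only illustrates the mapping. In the usage, #N stands in for Name, which is a DynamoDB reserved word.

```python
import json
import re

# '#' followed by one or more alphanumeric characters, as described above.
NAME_PLACEHOLDER = re.compile(r"#[A-Za-z0-9]+")


def resolve_attribute_names(expression: str, names_json: str) -> str:
    """Replace #placeholders in an expression with the attribute names they stand for."""
    names = json.loads(names_json)
    for placeholder in names:
        if not NAME_PLACEHOLDER.fullmatch(placeholder):
            raise ValueError(f"Invalid placeholder {placeholder!r}")

    def substitute(match) -> str:
        token = match.group(0)
        if token not in names:
            raise KeyError(f"{token} is used but not defined")
        return names[token]

    return NAME_PLACEHOLDER.sub(substitute, expression)


resolved = resolve_attribute_names(
    "#P > :p and #N = :n", '{"#P":"Percentile", "#N":"Name"}'
)
```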
Projection Expressions
A Projection Expression is a string that identifies one or more attributes (columns or fields) to retrieve from the specified table or index. These attributes can include scalars, sets, or elements of a JSON document. The attributes in the expression must be separated by commas. If you do not specify any attributes, all attributes will be returned. If any of the requested attributes are not found, they will not appear in the result. For more information, see Projection Expressions.
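As a plain-Python illustration of the projection behavior described above, the following hypothetical helper keeps only the listed top-level attributes, omits attributes that an item does not have, and returns all attributes when the expression is empty. It does not handle nested document paths or list elements.

```python
from typing import Dict, List


def project(items: List[Dict], projection_expression: str) -> List[Dict]:
    """Keep only the comma-separated top-level attributes named in the expression."""
    wanted = [name.strip() for name in projection_expression.split(",") if name.strip()]
    if not wanted:
        # No attributes specified: every attribute is returned.
        return [dict(item) for item in items]
    # Requested attributes that an item lacks are simply left out.
    return [{name: item[name] for name in wanted if name in item} for item in items]


items = [
    {"ProductCategory": "Book", "Description": "Guide", "Price": 10, "Stock": 3},
    {"ProductCategory": "Pen", "Price": 2},
]
projected = project(items, "ProductCategory, Description, Price")
```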
Timestamp and Date Columns
Since DynamoDB doesn’t natively support date or timestamp data types, date and timestamp data is represented using either string or number attributes. For more information, see Naming Rules and Data Types.
To work with date and timestamp data in Incorta, you need to define each date or timestamp column or attribute in the Timestamp and Date Columns table property.
For any date or timestamp column, specify the following information:
- Type: date or timestamp
- Format: one of the following:
  - epoch-seconds
  - epoch-milliseconds
  - Any Java-compliant timestamp pattern, for example, yyyy-MM-dd'T'HH:mm:ss.SSS. For more information, see the Patterns for Formatting and Parsing section on the Class DateTimeFormatter page.
- Timezone: Optional. Set a valid Java timezone ID. If you don’t set it, it defaults to the timezone of the Incorta server. For more information about timezones, see the ZoneOffset.of method on the Class ZoneOffset page.
The following is an example of the definition of multiple date and timestamp columns or attributes in the table properties.
{"UpdateTime": {"type": "timestamp","format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'","timezone": "-07:00"},"OrderTime": {"type": "timestamp","format": "epoch-milliseconds","timezone": "-07:00"},"Birthdate": {"type": "date","format": "yyyy-MM-dd","timezone": "-07:00"}}
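The following sketch converts raw attribute values according to a Timestamp and Date Columns definition like the one above. It is an assumption-laden illustration, not Incorta's conversion logic: only the Java pattern letters used in these examples (yyyy, MM, dd, HH, mm, ss, SSS, and quoted literals) are translated to Python strptime codes, only fixed offsets such as -07:00 are supported, epoch values are treated as absolute instants shown in the given offset, and pattern-based strings are read as local time in that offset. When no timezone is set, the offset of the machine running the code stands in for the Incorta server timezone.

```python
import json
import re
from datetime import date, datetime, timedelta, timezone
from typing import Optional, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Only the DateTimeFormatter letters used in the examples above are mapped.
_JAVA_TO_STRPTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d",
                     "HH": "%H", "mm": "%M", "ss": "%S", "SSS": "%f"}
_TOKEN = re.compile(r"'[^']*'|yyyy|MM|dd|HH|mm|ss|SSS")


def java_pattern_to_strptime(pattern: str) -> str:
    def repl(match) -> str:
        token = match.group(0)
        if token.startswith("'"):
            return token[1:-1].replace("%", "%%")  # quoted literal such as 'T'
        return _JAVA_TO_STRPTIME[token]
    return _TOKEN.sub(repl, pattern)


def parse_offset(tz: Optional[str]):
    """Turn an offset such as "-07:00" (or "Z") into a fixed-offset timezone."""
    if tz is None:
        return datetime.now().astimezone().tzinfo  # stand-in for the server timezone
    if tz == "Z":
        return timezone.utc
    m = re.fullmatch(r"([+-])(\d{2}):(\d{2})", tz)
    if not m:
        raise ValueError(f"Unsupported offset {tz!r} in this sketch")
    sign = -1 if m.group(1) == "-" else 1
    return timezone(sign * timedelta(hours=int(m.group(2)), minutes=int(m.group(3))))


def convert(value: Union[str, int], spec: dict) -> Union[date, datetime]:
    tz = parse_offset(spec.get("timezone"))
    fmt = spec["format"]
    if fmt == "epoch-seconds":
        dt = (EPOCH + timedelta(seconds=int(value))).astimezone(tz)
    elif fmt == "epoch-milliseconds":
        dt = (EPOCH + timedelta(milliseconds=int(value))).astimezone(tz)
    else:
        dt = datetime.strptime(value, java_pattern_to_strptime(fmt)).replace(tzinfo=tz)
    return dt.date() if spec["type"] == "date" else dt


specs = json.loads("""{"UpdateTime": {"type": "timestamp","format": "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'","timezone": "-07:00"},"OrderTime": {"type": "timestamp","format": "epoch-milliseconds","timezone": "-07:00"},"Birthdate": {"type": "date","format": "yyyy-MM-dd","timezone": "-07:00"}}""")
update_time = convert("2024-03-01T10:15:30.250Z", specs["UpdateTime"])
order_time = convert("1700000000123", specs["OrderTime"])
birthdate = convert("1990-05-17", specs["Birthdate"])
```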
Supported Incremental Loads
You can enable Incremental Load for a DynamoDB table data source. Incremental load depends on the maximum value in the Incremental Column that you select in the table properties. Make sure you define a key column in the table before using incremental load; otherwise, when you run an incremental load, both new and updated rows are added to the existing data, resulting in duplicate rows.
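The following hypothetical sketch illustrates why the key column matters. It assumes that an incremental load fetches only rows whose Incremental Column value is greater than the current maximum in the loaded data. With a key column, each fetched row replaces the loaded row with the same key; without one, fetched rows are simply appended, so an updated row appears twice. The column names are placeholders.

```python
from typing import Dict, List, Optional


def incremental_load(
    existing: List[Dict],
    source: List[Dict],
    incremental_column: str,
    key_column: Optional[str] = None,
) -> List[Dict]:
    """Sketch of a max-value incremental load (assumed semantics, not Incorta's code)."""
    watermark = max((row[incremental_column] for row in existing), default=None)
    new_rows = [row for row in source
                if watermark is None or row[incremental_column] > watermark]
    if key_column is None:
        # No key: fetched rows are appended, so updated rows become duplicates.
        return existing + new_rows
    merged = {row[key_column]: row for row in existing}
    for row in new_rows:
        merged[row[key_column]] = row  # insert new keys, overwrite updated ones
    return list(merged.values())


existing = [
    {"OrderId": 1, "Status": "New", "UpdateTime": 100},
    {"OrderId": 2, "Status": "New", "UpdateTime": 110},
]
source = [
    {"OrderId": 1, "Status": "Shipped", "UpdateTime": 120},  # updated row
    {"OrderId": 2, "Status": "New", "UpdateTime": 110},      # unchanged row
    {"OrderId": 3, "Status": "New", "UpdateTime": 130},      # new row
]
without_key = incremental_load(existing, source, "UpdateTime")
with_key = incremental_load(existing, source, "UpdateTime", key_column="OrderId")
```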