DynamoDB Table Design Principles - NoSQL Modeling
Abstract
DynamoDB is one of the most popular NoSQL DB storages, that has amazing performance and scaling characteristics and is widely used in many designs. There is mindshift that requires us to switch from traditional RDBMS storages to NoSQL, especially in terms of Data cardinality, denormalization, non-Join queries when planning and using NoSQL storages.
Introduction
The main caveat when using DynamoDB is to properly plan you Table structure and indexes in a way that will reflect to your data access patterns before the Table is created. And this is due to some limits.
First of all structures like LSIs are only possible to be created on the Table creation time, secondly DynamoDB does not support Regular DB schema Migration, that we used to have in RDBMS world (like flyway, liquibase, etc.), so ALTERing the table is not the operation that we can afford.
This post describes the variety of technics for Data Access to DynamoDB and also explains Data Access Patterns planning on the Table design stage.
Target DynamoDB Table Structure
NoSQL DynamoDB Table Structure is generated using MultiCloud Diagrams Framework
Methodology
All access patterns are driven by the proper defined Primary Key. Let’s go through the process of Table planning with overview of each structure. DynamoDB provides following PKs:
- Simple PK (partition key)
- Compound PK (partition key & sort key)
Table Structure
Entity Relationship Diagram
erDiagram
MOVIES {
string Actor
string Movie
string Genre
string Year
}
Operations
- API level access
- Query level access
- Scans
API level Get Item operation
To perform getItem API operation, we must include both existing PK and Sort Key (it is not possible to query by PK if SK exists).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
# Define the key(s) for the item you want to retrieve
key = {
'Actor': {'S': 'Tom Cruise'},
'Movie': {'S': 'Top Gun'}
}
response = dynamodb.get_item(
TableName=table_name,
Key=key,
# Add any optional parameters if needed, e.g., ProjectionExpression, ConsistentRead, etc.
)
item = response.get('Item')
We have a requirement to support following data access patterns to access Table data based on Actor and Movie pairs:
ID | Access Pattern |
---|---|
A1 | Get All movies of Actor |
A2 | Get movie details with Actor and movie name |
A3 | Get movies of Actor that starts with ‘A’ |
All this Access Patterns have the same PK and same SK and can be handled from initial Table key design out of the box.
Query
Queries are the most powerful request mechanism in DynamoDB. They allow to access the data only by PK or using compound PK and SK
SK Condition
When querying through (PK and SK) besides standard SK comparison with some particular value, following operations can be used:
- Equal to
- Less than or equal to
- Less than
- Greater than or equal to
- Greater than
- Between
- Begins with
Using Queries we can handle all these Access Patterns:
ID | Access Pattern | Access BY | Key Condition | Filter Condition |
---|---|---|---|---|
A1 | Get All movies of Actor | BY actor name | “Actor = ‘Tom Cruise’” | |
A2 | Get movie details with Actor and movie name | BY actor name and BY movie name | “Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)” | |
A3 | Get movies of Actor that starts with ‘A’ | BY actor name and BY movie name | “Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)” |
Projection Expression
Projections allow to limit attributes in the response, here is the example with boto3
library to get only Year
and Genre
attributes from table.
1
2
3
4
5
6
7
8
9
10
11
12
13
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
key_condition_expression = Key('Movie').eq('Top Gun') & Key('Actor').begins_with('Tom')
projection_expression = 'Year, Genre'
response = dynamodb.query(
TableName=table_name,
KeyConditionExpression=key_condition_expression,
ProjectionExpression=projection_expression
)
items = response['Items']
LSI
We have more access patterns in requirements - to get Movies based on actor name and genre type, to get actor movies within specific range.
ID | Access Pattern |
---|---|
A4 | Get movies Actor that have starting from ‘2020’ year |
A5 | Get movies Actor that have Genre ‘Comedy’ |
Both these access patterns share the same PK as in our Table, but have distinct SK (Year and Genre). Of course, it is possible to query or scan the DataSet and apply Filtering on top, but it will be not efficient. Because DynamoDB first retrieves all data based on PK, SK and on retrieved DataSet applies the Filters Predicate.
Much more effective is to define LSI - it is an index that has same PK as a main Table but distict attribute for SK:
ID | Access Pattern | Access BY | Key Condition | Filter Condition |
---|---|---|---|---|
A4 | Get movies Actor that have starting from ‘2020’ year | LSI: BY actor name and BY year | “Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)” | |
A5 | Get movies Actor that have Genre ‘Comedy’ | LSI: BY actor name and BY Genre | “Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)” |
Scan
This is the most expensive operation to access the Table data. We should always avoid usage of the full table scan, due to:
- huge negative impact to extra consumed Table Read Capacity
- too long Time of the scan operation
- no PK or Index used in operation
But in some cases Scan operation is used: small Table size, no possibility to introduce and consume one more GSI, etc.
1
2
3
4
5
6
7
8
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
response = dynamodb.scan(
TableName=table_name,
# Add any optional parameters if needed, e.g., FilterExpression, ProjectionExpression, etc.
)
items = response['Items']
GSIs
Help to introduce additional Data Access Patterns that are not covered with Primary and Sort Keys, we can introduce and use GSIs.
GSI Inverted Index
We will use technic called Inverted Index - PK(Actor) and SK(Movie) of Movies Table will be flipped in the GSI to PK(Movie) and SK(Actor). GSI automatically updates data when Items are added/updated/deleted in Main Table, there is no need to add extra inserts into Index Table, synchronize them with Main Table - all these are automatically provisioned.
So, after Flipping Movies <-> Actors Tables we have required access pattern that is using non PK attribute of Main Table:
ID | Access Pattern | Access BY | Key Condition | Filter Condition |
---|---|---|---|---|
A6 | Get all actors of Movie | GSI: BY movie name | “Movie = ‘Top Gun’” |
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
# Define the GSI name
index_name = 'GSI1'
# Define the key condition expression and expression attribute values
key_condition_expression = Key('Movie').eq('Top Gun')
projection_expression = 'Year, Genre'
# Perform the query on the GSI
response = dynamodb.query(
TableName=table_name,
IndexName=index_name,
KeyConditionExpression=key_condition_expression,
ProjectionExpression=projection_expression,
)
items = response['Items']
Entity Chart
Following Entity Chart are describing Table and Index structures:
Entity | PK | SK |
---|---|---|
Movies | Actor | Movie |
Movies#LSI | Actor | Year |
Movies#LSI | Actor | Genre |
Movies#GSI | Movie | Actor |
As a result we fully provided Table structure for required access patterns with 1 LSI, 1 GSI:
ID | Access Pattern | Access BY | Key Condition | Filter Condition |
---|---|---|---|---|
A1 | Get All movies of Actor | BY actor name | “Actor = ‘Tom Cruise’” | |
A2 | Get movie details with Actor and movie name | BY actor name and BY movie name | “Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)” | |
A3 | Get movies of Actor that starts with ‘A’ | BY actor name and BY movie name | “Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)” | |
A4 | Get movies Actor that have starting from ‘2020’ year | LSI: BY actor name and BY year | “Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)” | |
A5 | Get movies Actor that have Genre ‘Comedy’ | LSI: BY actor name and BY Genre | “Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)” | |
A6 | Get all actors of Movie | GSI: BY movie name | “Movie = ‘Top Gun’” |