DynamoDB Table Design Principles - NoSQL Modeling

Posted Oct 2, 2023 Updated Feb 25, 2024

By Roman Tsypuk

7 min read

Abstract

DynamoDB is one of the most popular NoSQL DB storages, that has amazing performance and scaling characteristics and is widely used in many designs. There is mindshift that requires us to switch from traditional RDBMS storages to NoSQL, especially in terms of Data cardinality, denormalization, non-Join queries when planning and using NoSQL storages.

Introduction

The main caveat when using DynamoDB is to properly plan you Table structure and indexes in a way that will reflect to your data access patterns before the Table is created. And this is due to some limits.

First of all structures like LSIs are only possible to be created on the Table creation time, secondly DynamoDB does not support Regular DB schema Migration, that we used to have in RDBMS world (like flyway, liquibase, etc.), so ALTERing the table is not the operation that we can afford.

This post describes the variety of technics for Data Access to DynamoDB and also explains Data Access Patterns planning on the Table design stage.

Target DynamoDB Table Structure

NoSQL DynamoDB Table Structure is generated using MultiCloud Diagrams Framework

Methodology

All access patterns are driven by the proper defined Primary Key. Let’s go through the process of Table planning with overview of each structure. DynamoDB provides following PKs:

Simple PK (partition key)
Compound PK (partition key & sort key)

Table Structure

Entity Relationship Diagram

erDiagram
  MOVIES {
    string Actor
    string Movie
    string Genre
    string Year
  }

Operations

API level access
Query level access
Scans

API level Get Item operation

To perform getItem API operation, we must include both existing PK and Sort Key (it is not possible to query by PK if SK exists).

  
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'
# Define the key(s) for the item you want to retrieve
key = {
  'Actor': {'S': 'Tom Cruise'},
  'Movie': {'S': 'Top Gun'}
}

response = dynamodb.get_item(
  TableName=table_name,
  Key=key,
  # Add any optional parameters if needed, e.g., ProjectionExpression, ConsistentRead, etc.
)
item = response.get('Item')

We have a requirement to support following data access patterns to access Table data based on Actor and Movie pairs:

ID	Access Pattern
A1	Get All movies of Actor
A2	Get movie details with Actor and movie name
A3	Get movies of Actor that starts with ‘A’

All this Access Patterns have the same PK and same SK and can be handled from initial Table key design out of the box.

Query

Queries are the most powerful request mechanism in DynamoDB. They allow to access the data only by PK or using compound PK and SK

SK Condition

When querying through (PK and SK) besides standard SK comparison with some particular value, following operations can be used:

Equal to
Less than or equal to
Less than
Greater than or equal to
Greater than
Between
Begins with

Using Queries we can handle all these Access Patterns:

ID	Access Pattern	Access BY	Key Condition
A1	Get All movies of Actor	BY actor name	“Actor = ‘Tom Cruise’”
A2	Get movie details with Actor and movie name	BY actor name and BY movie name	“Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)”
A3	Get movies of Actor that starts with ‘A’	BY actor name and BY movie name	“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)”

Projection Expression

Projections allow to limit attributes in the response, here is the example with boto3 library to get only Year and Genre attributes from table.

  
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

key_condition_expression = Key('Movie').eq('Top Gun') & Key('Actor').begins_with('Tom')
projection_expression = 'Year, Genre'

response = dynamodb.query(
  TableName=table_name,
  KeyConditionExpression=key_condition_expression,
  ProjectionExpression=projection_expression
)

items = response['Items']

LSI

We have more access patterns in requirements - to get Movies based on actor name and genre type, to get actor movies within specific range.

ID	Access Pattern
A4	Get movies Actor that have starting from ‘2020’ year
A5	Get movies Actor that have Genre ‘Comedy’

Both these access patterns share the same PK as in our Table, but have distinct SK (Year and Genre). Of course, it is possible to query or scan the DataSet and apply Filtering on top, but it will be not efficient. Because DynamoDB first retrieves all data based on PK, SK and on retrieved DataSet applies the Filters Predicate.

Much more effective is to define LSI - it is an index that has same PK as a main Table but distict attribute for SK:

ID	Access Pattern	Access BY	Key Condition	Filter Condition
A4	Get movies Actor that have starting from ‘2020’ year	LSI: BY actor name and BY year	“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)”
A5	Get movies Actor that have Genre ‘Comedy’	LSI: BY actor name and BY Genre	“Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)”

Scan

This is the most expensive operation to access the Table data. We should always avoid usage of the full table scan, due to:

huge negative impact to extra consumed Table Read Capacity
too long Time of the scan operation
no PK or Index used in operation

But in some cases Scan operation is used: small Table size, no possibility to introduce and consume one more GSI, etc.

  
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

response = dynamodb.scan(
  TableName=table_name,
  # Add any optional parameters if needed, e.g., FilterExpression, ProjectionExpression, etc.
)
items = response['Items']

GSIs

Help to introduce additional Data Access Patterns that are not covered with Primary and Sort Keys, we can introduce and use GSIs.

GSI Inverted Index

We will use technic called Inverted Index - PK(Actor) and SK(Movie) of Movies Table will be flipped in the GSI to PK(Movie) and SK(Actor). GSI automatically updates data when Items are added/updated/deleted in Main Table, there is no need to add extra inserts into Index Table, synchronize them with Main Table - all these are automatically provisioned.

So, after Flipping Movies <-> Actors Tables we have required access pattern that is using non PK attribute of Main Table:

ID	Access Pattern	Access BY	Key Condition	Filter Condition
A6	Get all actors of Movie	GSI: BY movie name	“Movie = ‘Top Gun’”

  
dynamodb = boto3.client('dynamodb')
table_name = 'Movies'

# Define the GSI name
index_name = 'GSI1'

# Define the key condition expression and expression attribute values
key_condition_expression = Key('Movie').eq('Top Gun')
projection_expression = 'Year, Genre'

# Perform the query on the GSI
response = dynamodb.query(
  TableName=table_name,
  IndexName=index_name,
  KeyConditionExpression=key_condition_expression,
  ProjectionExpression=projection_expression,
)

items = response['Items']

Entity Chart

Following Entity Chart are describing Table and Index structures:

Entity	PK	SK
Movies	Actor	Movie
Movies#LSI	Actor	Year
Movies#LSI	Actor	Genre
Movies#GSI	Movie	Actor

As a result we fully provided Table structure for required access patterns with 1 LSI, 1 GSI:

ID	Access Pattern	Access BY	Key Condition
A1	Get All movies of Actor	BY actor name	“Actor = ‘Tom Cruise’”
A2	Get movie details with Actor and movie name	BY actor name and BY movie name	“Actor = ‘Tom Cruise’ AND EQUALS_TO(Movie, ‘Top Gun’)”
A3	Get movies of Actor that starts with ‘A’	BY actor name and BY movie name	“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Movie, ‘T’)”
A4	Get movies Actor that have starting from ‘2020’ year	LSI: BY actor name and BY year	“Actor = ‘Tom Cruise’ AND GREATER_THAN_OR_EQUAL_TO(Year, 2022)”
A5	Get movies Actor that have Genre ‘Comedy’	LSI: BY actor name and BY Genre	“Actor = ‘Tom Cruise’ AND EQUAL_TO(Genre, ‘Action’)”
A6	Get all actors of Movie	GSI: BY movie name	“Movie = ‘Top Gun’”

References (Links)

aws, dynamodb, nosql

This post is licensed under CC BY 4.0 by the author.