
Understanding Access Patterns in DynamoDB. Part 1: Why Naive Reads and Entity Updates Can Cause Multiple Problems

DynamoDB, AWS’s NoSQL database service, is designed for scalability and performance. However, getting the best out of DynamoDB requires a deep understanding of access patterns.

Introduction

A common pitfall for new users is relying on naive read and entity update strategies, which can lead to significant issues in performance, consistency, and cost.

In this post, we’ll explore these access patterns, explain why they are problematic, and discuss best practices for designing efficient and reliable DynamoDB operations.

Naive Reads: The Problem of Inefficient Data Access

A naive read pattern in DynamoDB involves directly querying or scanning a table to retrieve data without considering the efficiency of the operation.

While this approach might seem straightforward, it often leads to several issues: a Scan reads (and bills) every item in the table regardless of how many items you actually need, filter expressions are applied only after the data has been read, and eventually consistent reads can return stale data that then feeds further decisions.
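
As an illustration, here is a minimal sketch of the naive pattern (the Users table, its attributes and the boto3 setup are assumptions for the example, not part of the experiment below): the whole table is scanned and then filtered client-side, so read capacity is spent on every item even though only a handful are needed.

import boto3

# A naive read: Scan walks the entire table and consumes read capacity
# for every item it touches, even though we only care about one country.
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')  # hypothetical table for the sketch

items = []
response = table.scan()
items.extend(response.get('Items', []))

# Scan results are paginated, so the naive loop keeps reading to the end.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response.get('Items', []))

# Filtering client-side after reading everything -- the capacity is already spent.
active_users = [item for item in items if item.get('Country') == 'UA']

A Query against a well-chosen partition key (or a GetItem for a single item) reads only the data it needs; the cost of the naive version grows with table size rather than with result size.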

Entity Updates: The Pitfalls of Naive Write Operations

Naive entity updates involve directly modifying attributes of an item without considering concurrency, consistency, or the potential for data loss. This approach can lead to several serious problems:

Race Conditions:

When multiple processes or users attempt to update the same entity simultaneously, race conditions can occur.

Without proper handling, such as using conditional writes or transactions, one update might overwrite another, leading to inconsistent data.

Lost Updates:

DynamoDB’s default write behavior is “last writer wins”: if two writes to the same item overlap, the one applied last overwrites the other.

This can lead to lost data if multiple processes are updating the same item concurrently without coordination.

Data Inconsistency:

Naive updates that don’t account for versioning or state can result in data inconsistencies. For example, if you update an item without checking its current version or state, you might inadvertently overwrite important changes made by another process.
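
For contrast with the naive flow, here is a minimal sketch of a version-checked (optimistic) write using a ConditionExpression. The table layout, attribute names and helper function are illustrative assumptions, not the experiment code used below:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReadModelTable')  # illustrative name

def update_with_version_check(primary_key_value, expected_version, new_payload):
    # Write the item only if nobody has bumped the version since we read it.
    try:
        table.put_item(
            Item={
                'PrimaryKey': primary_key_value,
                'version': expected_version + 1,
                'payload': new_payload,
            },
            # Rejected if the stored version differs from the one we read.
            ConditionExpression='#ver = :expected',
            ExpressionAttributeNames={'#ver': 'version'},
            ExpressionAttributeValues={':expected': expected_version},
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            # Another writer got there first; re-read and retry instead of overwriting.
            return False
        raise

With this pattern a racing writer does not silently lose data: it learns that it raced and can re-read and retry, which is exactly what the naive approach measured below never does.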

Storage Model for Experiment

We will run multiple experiments against the same storage model to measure the data-loss factor and consistency.

In this post we will simply see how dramatically the problem scales with the number of concurrent producers.

Concurrent Experiments

The table item is actually a compound object: it contains several attribute pairs, each holding an individual domain object and its version.

[Figure: dynamo.png]

Thus, to make the experiment even more complex, we will assume there are multiple producers updating the same item in the table (by its PK), but each producer updates only its own sub-entity and that sub-entity's version (every update increments the version).

Running different combinations of producers and concurrency levels, we will track the final version numbers and the state of the entities, knowing that each producer performs exactly 1000 operations. An example of the resulting item shape is sketched below.
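
For illustration, a single compound item might look roughly like this (a hypothetical snapshot; the attribute names mirror the experiment code further down, the values are made up):

# One compound item: each producer owns one sub-entity attribute
# plus a dedicated top-level version counter for it.
item = {
    'PrimaryKey': 'read-model-1',
    'Entity1': {'Name': 'Entity1', 'version': 427},
    'Entity1_version': 427,
    'Entity2': {'Name': 'Entity2', 'version': 431},
    'Entity2_version': 431,
}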

Workers in Experiments

Each worker has a pool of concurrent operations. In the experiment we will scale both the number of workers and the pool size of each worker.

[Figure: workers.png]

Producers Diagram

[Figure: compound_entity.png]

Naive-Approach Experiment: Lost Updates as a Function of the Concurrency Factor

import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3


def update_item(primary_key_value, entity_id):
  dynamodb = boto3.resource('dynamodb')
  table = dynamodb.Table('ReadModelTable')

  # Naive read-modify-write: read the whole item first...
  response = table.get_item(
    Key={
      'PrimaryKey': primary_key_value
    }
  )
  # Start from an empty item if it does not exist yet
  item = response.get('Item') or {'PrimaryKey': primary_key_value}

  if f'{entity_id}_version' in item:
    version = item[f'{entity_id}_version'] + 1  # Increment the version
  else:
    version = 1  # Initialize if it doesn't exist

  item[f'{entity_id}_version'] = version
  item[entity_id] = {
    'Name': entity_id,
    'version': version
  }

  # ...then write the whole item back, silently overwriting any
  # changes made by other producers in the meantime
  table.put_item(Item=item)


def main():
  # Arguments passed to the script: <primary_key> <request_count> <pool_size> <entity_id>
  args = sys.argv[1:]
  primary_key = args[0]
  req_count = int(args[1])
  workers = int(args[2])
  entity_id = args[3]

  primary_keys = [primary_key]  # List of primary keys to update
  primary_keys = primary_keys * req_count  # one entry per requested update

  with ThreadPoolExecutor(max_workers=workers) as executor:
    # Start the load operations and mark each future with its primary key
    futures = {executor.submit(update_item, key, entity_id): key for key in primary_keys}

    for future in as_completed(futures):
      primary_key = futures[future]
      try:
        future.result()
      except Exception as e:
        print(f"Exception occurred for {primary_key}: {e}")


if __name__ == "__main__":
  main()
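
A note on how this maps to the tables that follow (the script name and key value here are placeholders): each worker is a separate instance of this script launched for its own sub-entity, e.g. python naive_update.py read-model-1 1000 16 Entity1. The Workers column therefore counts parallel script instances, and Pool-size is the max_workers value of each instance's thread pool.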

Experiment Results:

Everything is perfect when we have a single producer that writes sequentially (the classic “works on my machine, but not in production” setup).

1 Worker: updates a single Entity in the compound object

We performed 1000 update requests, each incrementing the counter, and the final value matches the expected result:

| Requests | Workers | Pool-size | Entities under update | Entity1 ver |
|----------|---------|-----------|-----------------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        |

But once even single producer is scaled we are noting lost writes:

| Requests | Workers | Pool-size | Entities under update | Entity1 ver |
|----------|---------|-----------|-----------------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        |
| 1000     | 1       | 2         | 1                     | 600         |
| 1000     | 1       | 4         | 1                     | 430         |
| 1000     | 1       | 8         | 1                     | 321         |
| 1000     | 1       | 16        | 1                     | 218         |
| 1000     | 1       | 32        | 1                     | 132         |
| 1000     | 1       | 64        | 1                     | 92          |
| 1000     | 1       | 128       | 1                     | 71          |

Note that the number of lost writes grows with the number of concurrent requests: with a pool of 128 concurrent requests only 71 of the 1000 increments survive, roughly a 14-fold drop in consistency. The cause is the read-modify-write window: every in-flight request reads the item, modifies it in memory and writes the whole item back, so overlapping writes are based on stale reads and overwrite each other.

Data consistency starts at 100% with a single-element pool, but drops as soon as more concurrency is added.

[Figure: data_consistency_1worker.png]

The problem then scales dramatically with every additional producer:

2 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|
| 1000     | 2       | 1         | 2                     | 829         | 828         |
| 1000     | 2       | 2         | 2                     | 461         | 459         |
| 1000     | 2       | 4         | 2                     | 261         | 341         |
| 1000     | 2       | 8         | 2                     | 202         | 199         |
| 1000     | 2       | 16        | 2                     | 131         | 124         |
| 1000     | 2       | 32        | 2                     | 67          | 89          |
| 1000     | 2       | 64        | 2                     | 41          | 60          |
| 1000     | 2       | 128       | 2                     | 38          | 42          |

[Figure: data_consustency_2w.png]

3 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|
| 1000     | 3       | 1         | 3                     | 623         | 646         | 643         |
| 1000     | 3       | 2         | 3                     | 316         | 324         | 462         |
| 1000     | 3       | 4         | 3                     | 240         | 244         | 278         |
| 1000     | 3       | 8         | 3                     | 147         | 153         | 184         |
| 1000     | 3       | 16        | 3                     | 93          | 85          | 94          |
| 1000     | 3       | 32        | 3                     | 61          | 64          | 58          |
| 1000     | 3       | 64        | 3                     | 27          | 52          | 36          |
| 1000     | 3       | 128       | 3                     | 38          | 23          | 24          |

[Figure: data_concurency_3w.png]

4 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|
| 1000     | 4       | 1         | 4                     | 423         | 453         | 372         | 468         |
| 1000     | 4       | 2         | 4                     | 316         | 279         | 306         | 309         |
| 1000     | 4       | 4         | 4                     | 179         | 180         | 210         | 216         |
| 1000     | 4       | 8         | 4                     | 111         | 115         | 141         | 112         |
| 1000     | 4       | 16        | 4                     | 63          | 76          | 83          | 64          |
| 1000     | 4       | 32        | 4                     | 58          | 41          | 36          | 60          |
| 1000     | 4       | 64        | 4                     | 37          | 27          | 28          | 34          |
| 1000     | 4       | 128       | 4                     | 34          | 18          | 13          | 25          |

5 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver | Entity5 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|-------------|
| 1000     | 5       | 1         | 5                     | 454         | 461         | 483         | 499         | 517         |
| 1000     | 5       | 2         | 5                     | 298         | 244         | 278         | 270         | 244         |
| 1000     | 5       | 4         | 5                     | 171         | 156         | 202         | 153         | 156         |
| 1000     | 5       | 8         | 5                     | 95          | 106         | 108         | 91          | 98          |
| 1000     | 5       | 16        | 5                     | 57          | 66          | 61          | 86          | 71          |
| 1000     | 5       | 32        | 5                     | 42          | 47          | 44          | 50          | 47          |
| 1000     | 5       | 64        | 5                     | 39          | 23          | 31          | 26          | 30          |
| 1000     | 5       | 128       | 5                     | 30          | 9           | 13          | 20          | 10          |

Timing metrics

These metrics are recorded for reference; they will be used to compare write throughput against future experiments with thread-safe techniques:

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 1       | 1         | 4.18    | 239           |
| 1000     | 1       | 2         | 8.26    | 121           |
| 1000     | 1       | 4         | 15.63   | 64            |
| 1000     | 1       | 8         | 26.32   | 38            |
| 1000     | 1       | 16        | 45.45   | 22            |
| 1000     | 1       | 32        | 41.67   | 24            |
| 1000     | 1       | 64        | 38.46   | 26            |
| 1000     | 1       | 128       | 28.57   | 35            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 2       | 1         | 6.87    | 291           |
| 1000     | 2       | 2         | 16.53   | 121           |
| 1000     | 2       | 4         | 30.77   | 65            |
| 1000     | 2       | 8         | 54.05   | 37            |
| 1000     | 2       | 16        | 68.97   | 29            |
| 1000     | 2       | 32        | 68.97   | 29            |
| 1000     | 2       | 64        | 60.61   | 33            |
| 1000     | 2       | 128       | 32.79   | 61            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 3       | 1         | 12.61   | 238           |
| 1000     | 3       | 2         | 24.39   | 123           |
| 1000     | 3       | 4         | 45.45   | 66            |
| 1000     | 3       | 8         | 63.83   | 47            |
| 1000     | 3       | 16        | 83.33   | 36            |
| 1000     | 3       | 32        | 57.69   | 52            |
| 1000     | 3       | 64        | 54.55   | 55            |
| 1000     | 3       | 128       | 42.25   | 71            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 4       | 1         | 17.24   | 232           |
| 1000     | 4       | 2         | 33.90   | 118           |
| 1000     | 4       | 4         | 64.52   | 62            |
| 1000     | 4       | 8         | 100.00  | 40            |
| 1000     | 4       | 16        | 114.29  | 35            |
| 1000     | 4       | 32        | 90.91   | 44            |
| 1000     | 4       | 64        | 81.63   | 49            |
| 1000     | 4       | 128       | 53.33   | 75            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 5       | 1         | 17.39   | 230           |
| 1000     | 5       | 2         | 42.74   | 117           |
| 1000     | 5       | 4         | 80.65   | 62            |
| 1000     | 5       | 8         | 119.05  | 42            |
| 1000     | 5       | 16        | 121.95  | 41            |
| 1000     | 5       | 32        | 106.38  | 47            |
| 1000     | 5       | 64        | 78.13   | 64            |
| 1000     | 5       | 128       | 49.50   | 101           |

Conclusions

This report shows several problems that occur in production systems that rely on naive writes with the read-then-write (pre-loading) approach.

In future posts we will review and compare different optimisation techniques for improving data quality, and see in which situations to use (or not use) them to achieve consistent results:

  • strong consistent reads
  • attribute-based updates with expressions (a small preview is sketched after this list)
  • conditional updates
  • locking subsystem
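
As a small preview of the attribute-based approach, here is a minimal sketch (assuming the same table and attribute layout as the experiment above) that increments the per-entity version counter server-side with an UpdateExpression instead of reading, modifying and re-writing the whole item on the client. Only the top-level counter is touched; keeping the nested copy of the version in sync is deliberately left out of the sketch.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReadModelTable')

def bump_entity_version(primary_key_value, entity_id):
    # The addition is evaluated by DynamoDB itself, so concurrent callers
    # cannot overwrite each other's increments the way full-item
    # read-then-put_item calls do.
    response = table.update_item(
        Key={'PrimaryKey': primary_key_value},
        UpdateExpression='SET #ver = if_not_exists(#ver, :zero) + :one',
        ExpressionAttributeNames={'#ver': f'{entity_id}_version'},
        ExpressionAttributeValues={':zero': 0, ':one': 1},
        ReturnValues='UPDATED_NEW',
    )
    return response['Attributes'][f'{entity_id}_version']

How this and the other techniques behave under the same load will be measured in the next parts.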

Appendix A: Full experiment metrics

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver | Entity5 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        | 0           | 0           | 0           | 0           |
| 1000     | 1       | 2         | 1                     | 600         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 4         | 1                     | 430         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 8         | 1                     | 321         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 16        | 1                     | 218         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 32        | 1                     | 132         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 64        | 1                     | 92          | 0           | 0           | 0           | 0           |
| 1000     | 1       | 128       | 1                     | 71          | 0           | 0           | 0           | 0           |
| 1000     | 2       | 1         | 2                     | 829         | 828         | 0           | 0           | 0           |
| 1000     | 2       | 2         | 2                     | 461         | 459         | 0           | 0           | 0           |
| 1000     | 2       | 4         | 2                     | 261         | 341         | 0           | 0           | 0           |
| 1000     | 2       | 8         | 2                     | 202         | 199         | 0           | 0           | 0           |
| 1000     | 2       | 16        | 2                     | 131         | 124         | 0           | 0           | 0           |
| 1000     | 2       | 32        | 2                     | 67          | 89          | 0           | 0           | 0           |
| 1000     | 2       | 64        | 2                     | 41          | 60          | 0           | 0           | 0           |
| 1000     | 2       | 128       | 2                     | 38          | 42          | 0           | 0           | 0           |
| 1000     | 3       | 1         | 3                     | 623         | 646         | 643         | 0           | 0           |
| 1000     | 3       | 2         | 3                     | 316         | 324         | 462         | 0           | 0           |
| 1000     | 3       | 4         | 3                     | 240         | 244         | 278         | 0           | 0           |
| 1000     | 3       | 8         | 3                     | 147         | 153         | 184         | 0           | 0           |
| 1000     | 3       | 16        | 3                     | 93          | 85          | 94          | 0           | 0           |
| 1000     | 3       | 32        | 3                     | 61          | 64          | 58          | 0           | 0           |
| 1000     | 3       | 64        | 3                     | 27          | 52          | 36          | 0           | 0           |
| 1000     | 3       | 128       | 3                     | 38          | 23          | 24          | 0           | 0           |
| 1000     | 4       | 1         | 4                     | 423         | 453         | 372         | 468         | 0           |
| 1000     | 4       | 2         | 4                     | 316         | 279         | 306         | 309         | 0           |
| 1000     | 4       | 4         | 4                     | 179         | 180         | 210         | 216         | 0           |
| 1000     | 4       | 8         | 4                     | 111         | 115         | 141         | 112         | 0           |
| 1000     | 4       | 16        | 4                     | 63          | 76          | 83          | 64          | 0           |
| 1000     | 4       | 32        | 4                     | 58          | 41          | 36          | 60          | 0           |
| 1000     | 4       | 64        | 4                     | 37          | 27          | 28          | 34          | 0           |
| 1000     | 4       | 128       | 4                     | 34          | 18          | 13          | 25          | 0           |
| 1000     | 5       | 1         | 5                     | 454         | 461         | 483         | 499         | 517         |
| 1000     | 5       | 2         | 5                     | 298         | 244         | 278         | 270         | 244         |
| 1000     | 5       | 4         | 5                     | 171         | 156         | 202         | 153         | 156         |
| 1000     | 5       | 8         | 5                     | 95          | 106         | 108         | 91          | 98          |
| 1000     | 5       | 16        | 5                     | 57          | 66          | 61          | 86          | 71          |
| 1000     | 5       | 32        | 5                     | 42          | 47          | 44          | 50          | 47          |
| 1000     | 5       | 64        | 5                     | 39          | 23          | 31          | 26          | 30          |
| 1000     | 5       | 128       | 5                     | 30          | 9           | 13          | 20          | 10          |

This post is licensed under CC BY 4.0 by the author.