
Understanding Access Patterns in DynamoDB. Part 1: Why Naive Reads and Entity Updates Can Cause Multiple Problems

DynamoDB, AWS’s NoSQL database service, is designed for scalability and performance. However, getting the best out of DynamoDB requires a deep understanding of access patterns.

Introduction

A common pitfall for new users is relying on naive read and entity update strategies, which can lead to significant issues in performance, consistency, and cost.

In this post, we’ll explore these access patterns, explain why they are problematic, and discuss best practices for designing efficient and reliable DynamoDB operations.

Naive Reads: The Problem of Inefficient Data Access

A naive read pattern in DynamoDB involves directly querying or scanning a table to retrieve data without considering the efficiency of the operation.

While this approach might seem straightforward, it often leads to several issues: a Scan reads (and bills) every item in the table regardless of how many items you actually need, filter expressions are applied only after the data has been read, and eventually consistent reads can return stale data that then feeds further decisions.
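
As an illustration, here is a minimal sketch of the naive pattern (the Users table, its attributes and the boto3 setup are assumptions for the example, not part of the experiment below): the whole table is scanned and then filtered client-side, so read capacity is spent on every item even though only a handful are needed.

import boto3

# A naive read: Scan walks the entire table and consumes read capacity
# for every item it touches, even though we only care about one country.
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')  # hypothetical table for the sketch

items = []
response = table.scan()
items.extend(response.get('Items', []))

# Scan results are paginated, so the naive loop keeps reading to the end.
while 'LastEvaluatedKey' in response:
    response = table.scan(ExclusiveStartKey=response['LastEvaluatedKey'])
    items.extend(response.get('Items', []))

# Filtering client-side after reading everything -- the capacity is already spent.
active_users = [item for item in items if item.get('Country') == 'UA']

A Query against a well-chosen partition key (or a GetItem for a single item) reads only the data it needs; the cost of the naive version grows with table size rather than with result size.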

Entity Updates: The Pitfalls of Naive Write Operations

Naive entity updates involve directly modifying attributes of an item without considering concurrency, consistency, or the potential for data loss. This approach can lead to several serious problems:

Race Conditions:

When multiple processes or users attempt to update the same entity simultaneously, race conditions can occur.

Without proper handling, such as using conditional writes or transactions, one update might overwrite another, leading to inconsistent data.

Lost Updates:

DynamoDB’s default write behavior is “last writer wins”: if two writes to the same item overlap, the one applied last overwrites the other.

This can lead to lost data if multiple processes are updating the same item concurrently without coordination.

Data Inconsistency:

Naive updates that don’t account for versioning or state can result in data inconsistencies. For example, if you update an item without checking its current version or state, you might inadvertently overwrite important changes made by another process.
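
For contrast with the naive flow, here is a minimal sketch of a version-checked (optimistic) write using a ConditionExpression. The table layout, attribute names and helper function are illustrative assumptions, not the experiment code used below:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReadModelTable')  # illustrative name

def update_with_version_check(primary_key_value, expected_version, new_payload):
    # Write the item only if nobody has bumped the version since we read it.
    try:
        table.put_item(
            Item={
                'PrimaryKey': primary_key_value,
                'version': expected_version + 1,
                'payload': new_payload,
            },
            # Rejected if the stored version differs from the one we read.
            ConditionExpression='#ver = :expected',
            ExpressionAttributeNames={'#ver': 'version'},
            ExpressionAttributeValues={':expected': expected_version},
        )
        return True
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            # Another writer got there first; re-read and retry instead of overwriting.
            return False
        raise

With this pattern a racing writer does not silently lose data: it learns that it raced and can re-read and retry, which is exactly what the naive approach measured below never does.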

Storage Model for Experiment

We will run multiple experiments against the same storage model to measure the data-loss factor and consistency.

In this post we will simply see how dramatically the problem scales with the number of concurrent producers.

Concurrent Experiments

The table item is actually a compound object: it contains several attribute pairs, each holding an individual domain object and its version.

[Figure: dynamo.png]

Thus, to make the experiment even more complex, we will assume there are multiple producers updating the same item in the table (by its PK), but each producer updates only its own sub-entity and that sub-entity's version (every update increments the version).

Running different combinations of producers and concurrency levels, we will track the final version numbers and the state of the entities, knowing that each producer performs exactly 1000 operations. An example of the resulting item shape is sketched below.
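
For illustration, a single compound item might look roughly like this (a hypothetical snapshot; the attribute names mirror the experiment code further down, the values are made up):

# One compound item: each producer owns one sub-entity attribute
# plus a dedicated top-level version counter for it.
item = {
    'PrimaryKey': 'read-model-1',
    'Entity1': {'Name': 'Entity1', 'version': 427},
    'Entity1_version': 427,
    'Entity2': {'Name': 'Entity2', 'version': 431},
    'Entity2_version': 431,
}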

Workers in Experiments

Each worker has a pool of concurrent operations. In the experiment we will scale both the number of workers and the pool size of each worker.

[Figure: workers.png]

Producers Diagram

[Figure: compound_entity.png]

Naive-Approach Experiment: Lost Updates as a Function of the Concurrency Factor

import sys
from concurrent.futures import ThreadPoolExecutor, as_completed

import boto3


def update_item(primary_key_value, entity_id):
  dynamodb = boto3.resource('dynamodb')
  table = dynamodb.Table('ReadModelTable')

  # Naive read-modify-write: read the whole item first...
  response = table.get_item(
    Key={
      'PrimaryKey': primary_key_value
    }
  )
  # Start from an empty item if it does not exist yet
  item = response.get('Item') or {'PrimaryKey': primary_key_value}

  if f'{entity_id}_version' in item:
    version = item[f'{entity_id}_version'] + 1  # Increment the version
  else:
    version = 1  # Initialize if it doesn't exist

  item[f'{entity_id}_version'] = version
  item[entity_id] = {
    'Name': entity_id,
    'version': version
  }

  # ...then write the whole item back, silently overwriting any
  # changes made by other producers in the meantime
  table.put_item(Item=item)


def main():
  # Arguments passed to the script: <primary_key> <request_count> <pool_size> <entity_id>
  args = sys.argv[1:]
  primary_key = args[0]
  req_count = int(args[1])
  workers = int(args[2])
  entity_id = args[3]

  primary_keys = [primary_key]  # List of primary keys to update
  primary_keys = primary_keys * req_count  # one entry per requested update

  with ThreadPoolExecutor(max_workers=workers) as executor:
    # Start the load operations and mark each future with its primary key
    futures = {executor.submit(update_item, key, entity_id): key for key in primary_keys}

    for future in as_completed(futures):
      primary_key = futures[future]
      try:
        future.result()
      except Exception as e:
        print(f"Exception occurred for {primary_key}: {e}")


if __name__ == "__main__":
  main()
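
A note on how this maps to the tables that follow (the script name and key value here are placeholders): each worker is a separate instance of this script launched for its own sub-entity, e.g. python naive_update.py read-model-1 1000 16 Entity1. The Workers column therefore counts parallel script instances, and Pool-size is the max_workers value of each instance's thread pool.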

Experiment Results:

Everything is perfect when we have a single producer that writes sequentially (the classic “works on my machine, but not in production” setup).

1 Worker: updates a single Entity in the compound object

We performed 1000 update requests, each incrementing the counter, and the final value matches the expected result:

| Requests | Workers | Pool-size | Entities under update | Entity1 ver |
|----------|---------|-----------|-----------------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        |

But once even single producer is scaled we are noting lost writes:

| Requests | Workers | Pool-size | Entities under update | Entity1 ver |
|----------|---------|-----------|-----------------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        |
| 1000     | 1       | 2         | 1                     | 600         |
| 1000     | 1       | 4         | 1                     | 430         |
| 1000     | 1       | 8         | 1                     | 321         |
| 1000     | 1       | 16        | 1                     | 218         |
| 1000     | 1       | 32        | 1                     | 132         |
| 1000     | 1       | 64        | 1                     | 92          |
| 1000     | 1       | 128       | 1                     | 71          |

Note that the number of lost writes grows with the number of concurrent requests: with a pool of 128 concurrent requests only 71 of the 1000 increments survive, roughly a 14-fold drop in consistency. The cause is the read-modify-write window: every in-flight request reads the item, modifies it in memory and writes the whole item back, so overlapping writes are based on stale reads and overwrite each other.

Data consistency starts at 100% with a single-element pool, but drops as soon as more concurrency is added.

[Figure: data_consistency_1worker.png]

The problem then scales dramatically with every additional producer:

2 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|
| 1000     | 2       | 1         | 2                     | 829         | 828         |
| 1000     | 2       | 2         | 2                     | 461         | 459         |
| 1000     | 2       | 4         | 2                     | 261         | 341         |
| 1000     | 2       | 8         | 2                     | 202         | 199         |
| 1000     | 2       | 16        | 2                     | 131         | 124         |
| 1000     | 2       | 32        | 2                     | 67          | 89          |
| 1000     | 2       | 64        | 2                     | 41          | 60          |
| 1000     | 2       | 128       | 2                     | 38          | 42          |

[Figure: data_consustency_2w.png]

3 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|
| 1000     | 3       | 1         | 3                     | 623         | 646         | 643         |
| 1000     | 3       | 2         | 3                     | 316         | 324         | 462         |
| 1000     | 3       | 4         | 3                     | 240         | 244         | 278         |
| 1000     | 3       | 8         | 3                     | 147         | 153         | 184         |
| 1000     | 3       | 16        | 3                     | 93          | 85          | 94          |
| 1000     | 3       | 32        | 3                     | 61          | 64          | 58          |
| 1000     | 3       | 64        | 3                     | 27          | 52          | 36          |
| 1000     | 3       | 128       | 3                     | 38          | 23          | 24          |

[Figure: data_concurency_3w.png]

4 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|
| 1000     | 4       | 1         | 4                     | 423         | 453         | 372         | 468         |
| 1000     | 4       | 2         | 4                     | 316         | 279         | 306         | 309         |
| 1000     | 4       | 4         | 4                     | 179         | 180         | 210         | 216         |
| 1000     | 4       | 8         | 4                     | 111         | 115         | 141         | 112         |
| 1000     | 4       | 16        | 4                     | 63          | 76          | 83          | 64          |
| 1000     | 4       | 32        | 4                     | 58          | 41          | 36          | 60          |
| 1000     | 4       | 64        | 4                     | 37          | 27          | 28          | 34          |
| 1000     | 4       | 128       | 4                     | 34          | 18          | 13          | 25          |

5 Workers: each updates its own Entity in the compound object

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver | Entity5 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|-------------|
| 1000     | 5       | 1         | 5                     | 454         | 461         | 483         | 499         | 517         |
| 1000     | 5       | 2         | 5                     | 298         | 244         | 278         | 270         | 244         |
| 1000     | 5       | 4         | 5                     | 171         | 156         | 202         | 153         | 156         |
| 1000     | 5       | 8         | 5                     | 95          | 106         | 108         | 91          | 98          |
| 1000     | 5       | 16        | 5                     | 57          | 66          | 61          | 86          | 71          |
| 1000     | 5       | 32        | 5                     | 42          | 47          | 44          | 50          | 47          |
| 1000     | 5       | 64        | 5                     | 39          | 23          | 31          | 26          | 30          |
| 1000     | 5       | 128       | 5                     | 30          | 9           | 13          | 20          | 10          |

Timing metrics

These metrics are recorded for reference; they will be used to compare write throughput against future experiments with thread-safe techniques:

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 1       | 1         | 4.18    | 239           |
| 1000     | 1       | 2         | 8.26    | 121           |
| 1000     | 1       | 4         | 15.63   | 64            |
| 1000     | 1       | 8         | 26.32   | 38            |
| 1000     | 1       | 16        | 45.45   | 22            |
| 1000     | 1       | 32        | 41.67   | 24            |
| 1000     | 1       | 64        | 38.46   | 26            |
| 1000     | 1       | 128       | 28.57   | 35            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 2       | 1         | 6.87    | 291           |
| 1000     | 2       | 2         | 16.53   | 121           |
| 1000     | 2       | 4         | 30.77   | 65            |
| 1000     | 2       | 8         | 54.05   | 37            |
| 1000     | 2       | 16        | 68.97   | 29            |
| 1000     | 2       | 32        | 68.97   | 29            |
| 1000     | 2       | 64        | 60.61   | 33            |
| 1000     | 2       | 128       | 32.79   | 61            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 3       | 1         | 12.61   | 238           |
| 1000     | 3       | 2         | 24.39   | 123           |
| 1000     | 3       | 4         | 45.45   | 66            |
| 1000     | 3       | 8         | 63.83   | 47            |
| 1000     | 3       | 16        | 83.33   | 36            |
| 1000     | 3       | 32        | 57.69   | 52            |
| 1000     | 3       | 64        | 54.55   | 55            |
| 1000     | 3       | 128       | 42.25   | 71            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 4       | 1         | 17.24   | 232           |
| 1000     | 4       | 2         | 33.90   | 118           |
| 1000     | 4       | 4         | 64.52   | 62            |
| 1000     | 4       | 8         | 100.00  | 40            |
| 1000     | 4       | 16        | 114.29  | 35            |
| 1000     | 4       | 32        | 90.91   | 44            |
| 1000     | 4       | 64        | 81.63   | 49            |
| 1000     | 4       | 128       | 53.33   | 75            |

| Requests | Workers | Pool-size | req/sec | duration, sec |
|----------|---------|-----------|---------|---------------|
| 1000     | 5       | 1         | 17.39   | 230           |
| 1000     | 5       | 2         | 42.74   | 117           |
| 1000     | 5       | 4         | 80.65   | 62            |
| 1000     | 5       | 8         | 119.05  | 42            |
| 1000     | 5       | 16        | 121.95  | 41            |
| 1000     | 5       | 32        | 106.38  | 47            |
| 1000     | 5       | 64        | 78.13   | 64            |
| 1000     | 5       | 128       | 49.50   | 101           |

Conclusions

This report shows several problems that occur in production systems that rely on naive writes with the read-then-write (pre-loading) approach.

In future posts we will review and compare different optimisation techniques for improving data quality, and see in which situations to use (or not use) them to achieve consistent results:

  • strong consistent reads
  • attribute-based updates with expressions (a small preview is sketched after this list)
  • conditional updates
  • locking subsystem
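
As a small preview of the attribute-based approach, here is a minimal sketch (assuming the same table and attribute layout as the experiment above) that increments the per-entity version counter server-side with an UpdateExpression instead of reading, modifying and re-writing the whole item on the client. Only the top-level counter is touched; keeping the nested copy of the version in sync is deliberately left out of the sketch.

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('ReadModelTable')

def bump_entity_version(primary_key_value, entity_id):
    # The addition is evaluated by DynamoDB itself, so concurrent callers
    # cannot overwrite each other's increments the way full-item
    # read-then-put_item calls do.
    response = table.update_item(
        Key={'PrimaryKey': primary_key_value},
        UpdateExpression='SET #ver = if_not_exists(#ver, :zero) + :one',
        ExpressionAttributeNames={'#ver': f'{entity_id}_version'},
        ExpressionAttributeValues={':zero': 0, ':one': 1},
        ReturnValues='UPDATED_NEW',
    )
    return response['Attributes'][f'{entity_id}_version']

How this and the other techniques behave under the same load will be measured in the next parts.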

Appendix A: Full experiment metrics

| Requests | Workers | Pool-size | Entities under update | Entity1 ver | Entity2 ver | Entity3 ver | Entity4 ver | Entity5 ver |
|----------|---------|-----------|-----------------------|-------------|-------------|-------------|-------------|-------------|
| 1000     | 1       | 1         | 1                     | 1000        | 0           | 0           | 0           | 0           |
| 1000     | 1       | 2         | 1                     | 600         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 4         | 1                     | 430         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 8         | 1                     | 321         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 16        | 1                     | 218         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 32        | 1                     | 132         | 0           | 0           | 0           | 0           |
| 1000     | 1       | 64        | 1                     | 92          | 0           | 0           | 0           | 0           |
| 1000     | 1       | 128       | 1                     | 71          | 0           | 0           | 0           | 0           |
| 1000     | 2       | 1         | 2                     | 829         | 828         | 0           | 0           | 0           |
| 1000     | 2       | 2         | 2                     | 461         | 459         | 0           | 0           | 0           |
| 1000     | 2       | 4         | 2                     | 261         | 341         | 0           | 0           | 0           |
| 1000     | 2       | 8         | 2                     | 202         | 199         | 0           | 0           | 0           |
| 1000     | 2       | 16        | 2                     | 131         | 124         | 0           | 0           | 0           |
| 1000     | 2       | 32        | 2                     | 67          | 89          | 0           | 0           | 0           |
| 1000     | 2       | 64        | 2                     | 41          | 60          | 0           | 0           | 0           |
| 1000     | 2       | 128       | 2                     | 38          | 42          | 0           | 0           | 0           |
| 1000     | 3       | 1         | 3                     | 623         | 646         | 643         | 0           | 0           |
| 1000     | 3       | 2         | 3                     | 316         | 324         | 462         | 0           | 0           |
| 1000     | 3       | 4         | 3                     | 240         | 244         | 278         | 0           | 0           |
| 1000     | 3       | 8         | 3                     | 147         | 153         | 184         | 0           | 0           |
| 1000     | 3       | 16        | 3                     | 93          | 85          | 94          | 0           | 0           |
| 1000     | 3       | 32        | 3                     | 61          | 64          | 58          | 0           | 0           |
| 1000     | 3       | 64        | 3                     | 27          | 52          | 36          | 0           | 0           |
| 1000     | 3       | 128       | 3                     | 38          | 23          | 24          | 0           | 0           |
| 1000     | 4       | 1         | 4                     | 423         | 453         | 372         | 468         | 0           |
| 1000     | 4       | 2         | 4                     | 316         | 279         | 306         | 309         | 0           |
| 1000     | 4       | 4         | 4                     | 179         | 180         | 210         | 216         | 0           |
| 1000     | 4       | 8         | 4                     | 111         | 115         | 141         | 112         | 0           |
| 1000     | 4       | 16        | 4                     | 63          | 76          | 83          | 64          | 0           |
| 1000     | 4       | 32        | 4                     | 58          | 41          | 36          | 60          | 0           |
| 1000     | 4       | 64        | 4                     | 37          | 27          | 28          | 34          | 0           |
| 1000     | 4       | 128       | 4                     | 34          | 18          | 13          | 25          | 0           |
| 1000     | 5       | 1         | 5                     | 454         | 461         | 483         | 499         | 517         |
| 1000     | 5       | 2         | 5                     | 298         | 244         | 278         | 270         | 244         |
| 1000     | 5       | 4         | 5                     | 171         | 156         | 202         | 153         | 156         |
| 1000     | 5       | 8         | 5                     | 95          | 106         | 108         | 91          | 98          |
| 1000     | 5       | 16        | 5                     | 57          | 66          | 61          | 86          | 71          |
| 1000     | 5       | 32        | 5                     | 42          | 47          | 44          | 50          | 47          |
| 1000     | 5       | 64        | 5                     | 39          | 23          | 31          | 26          | 30          |
| 1000     | 5       | 128       | 5                     | 30          | 9           | 13          | 20          | 10          |

This post is licensed under CC BY 4.0 by the author.