
AWS CSM Mode: Advanced monitoring of AWS client

After exploring the AWS boto3 core sources on GitHub, I found an interesting commit that enables a monitoring mode called CSM. After exploring how it works, I decided to write a post, because this feature is extremely useful, and also to start writing an aws-client-monitor toolbox on top of it.

Introduction to AWS CSM (Client-Side Monitoring) Mode

AWS Client-Side Monitoring (CSM) is a powerful feature designed to track and analyze the performance of your AWS SDK calls. When enabled, it provides detailed metrics on API requests, response times, and error rates, helping developers gain a deeper understanding of their application’s behavior when interacting with AWS services. This is crucial for debugging, optimizing performance, and ensuring that applications are running efficiently in production environments.

CSM mode works by capturing information about SDK API calls and sending that data to a local monitoring agent. It helps you:

  • Track API request latencies.
  • Identify high failure rates in SDK requests.
  • Gain visibility into the most frequently called AWS services.

In this blog post, we’ll explore AWS CSM mode in more detail, look at common use cases, and provide Golang code snippets to demonstrate how to implement it.

Why Use AWS CSM Mode?

As applications become increasingly reliant on cloud services, monitoring and optimizing the performance of these interactions becomes critical. AWS SDKs are widely used to interface with AWS services such as S3, DynamoDB, Lambda, and many others. However, managing and tracking these interactions can be challenging, especially when it comes to identifying latency issues or bottlenecks in the communication between your application and AWS.

CSM provides a granular view of how SDK requests are performing. It allows you to gather metrics like:

  • Latency: How long each request takes.
  • Errors: Which AWS services are returning errors and why.
  • Request Frequency: Which services are being called the most.

This data is invaluable for performance tuning, debugging, and capacity planning.

Common Use Cases for AWS CSM Mode

Performance Optimization:

By tracking the latency of AWS service calls, you can identify the API requests that are taking the longest to execute. This can help you optimize the application’s performance, whether through caching, retries, or parallelizing requests.

Error Tracking:

If your application experiences frequent errors while interacting with AWS services, CSM can help identify the root cause. For example, if a specific AWS service is returning a large number of 5xx errors, CSM will capture this information, enabling developers to troubleshoot quickly.

Capacity Planning:

Monitoring the number of requests made to AWS services can help forecast capacity needs and adjust resources accordingly. For example, if you’re making a large number of requests to DynamoDB, it may be time to scale your read/write capacity.

Debugging Production Issues:

When something goes wrong in production, CSM can provide critical insights into which AWS services or API calls are causing issues, allowing for quick resolution.

Setting Up AWS CSM in Golang

The AWS SDK for Go provides native support for client-side monitoring. To enable CSM, you need to configure the SDK to send data to the local CSM agent, which processes and forwards it to monitoring tools like Amazon CloudWatch.

Step 1: Install the AWS SDK for Go

First, you need to install the AWS SDK for Go, if you haven’t already:

go get -u github.com/aws/aws-sdk-go

Step 2: Enable CSM in the AWS SDK

To enable CSM in the AWS SDK for Go, you need to configure the environment variables that control CSM behavior, or you can do this programmatically within your application.

Here’s an example of how you can enable CSM using environment variables:

export AWS_CSM_ENABLED=true
export AWS_CSM_HOST=127.0.0.1
export AWS_CSM_PORT=31000
export AWS_CSM_CLIENT_ID=my-client-id
  • AWS_CSM_ENABLED: Enables or disables CSM.
  • AWS_CSM_HOST: The hostname where the CSM agent is running (usually localhost).
  • AWS_CSM_PORT: The port where the CSM agent is listening.
  • AWS_CSM_CLIENT_ID: A client identifier used to differentiate between clients.
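
As a rough illustration of how these variables fit together, here is a minimal sketch that reads them and derives the address a local listener would bind to. The `csmConfig` type and `loadCSMConfig` function are hypothetical names made up for this post, and the fallback to 127.0.0.1 and 31000 when a variable is unset is an assumption that mirrors the values above.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strings"
)

// csmConfig holds the settings carried by the AWS_CSM_* variables.
// The type and its field names are hypothetical, for illustration only.
type csmConfig struct {
	Enabled  bool
	Host     string
	Port     string
	ClientID string
}

// loadCSMConfig reads the AWS_CSM_* variables through getenv.
// Assumption: host and port fall back to 127.0.0.1 and 31000 when unset.
func loadCSMConfig(getenv func(string) string) csmConfig {
	cfg := csmConfig{
		Enabled:  strings.EqualFold(getenv("AWS_CSM_ENABLED"), "true"),
		Host:     getenv("AWS_CSM_HOST"),
		Port:     getenv("AWS_CSM_PORT"),
		ClientID: getenv("AWS_CSM_CLIENT_ID"),
	}
	if cfg.Host == "" {
		cfg.Host = "127.0.0.1"
	}
	if cfg.Port == "" {
		cfg.Port = "31000"
	}
	return cfg
}

// Addr returns the host:port pair a listener would bind to.
func (c csmConfig) Addr() string {
	return net.JoinHostPort(c.Host, c.Port)
}

func main() {
	cfg := loadCSMConfig(os.Getenv)
	fmt.Printf("enabled=%v addr=%s client=%q\n", cfg.Enabled, cfg.Addr(), cfg.ClientID)
}
```

Passing `getenv` as a parameter keeps the function easy to exercise with a fake environment instead of the real process environment.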

Step 3: Run a Local UDP Server to Receive CSM Events

With CSM enabled, the client sends its monitoring events as UDP datagrams to the configured host and port. Here’s a simple Golang example of a UDP server that listens on port 31000 and prints every received event to the console.

package main

import (
  "fmt"
  "net"
  "time"
)

func listenUDP(port int, ch chan<- []byte) {
  addr := net.UDPAddr{
    Port: port,
    IP:   net.ParseIP("0.0.0.0"),
  }

  conn, err := net.ListenUDP("udp", &addr)
  if err != nil {
    fmt.Println("Error listening on UDP:", err)
    return
  }
  defer func() {
    if err := conn.Close(); err != nil {
      fmt.Println("Error closing UDP connection:", err)
    }
  }()

  buffer := make([]byte, 2048)
  for {
    n, _, err := conn.ReadFromUDP(buffer)
    if err != nil {
      fmt.Println("Error reading from UDP:", err)
      continue
    }
    // Copy the datagram before sending it: buffer is reused by the next read
    msg := make([]byte, n)
    copy(msg, buffer[:n])
    ch <- msg
  }
}

func writeToConsole(ch <-chan []byte) {
  for msg := range ch {
    fmt.Println("Received from channel:", string(msg))
  }
}

func main() {
  byteChannel := make(chan []byte)

  // Goroutine to listen on UDP and write to the channel
  go listenUDP(31000, byteChannel)

  // Goroutines to read from the channel
  go writeToConsole(byteChannel)

  // Prevent the main function from exiting
  for {
    time.Sleep(1 * time.Second)
  }
}

Step 4: Invoke an AWS CLI Command

aws s3 ls

Step 5: aws-client-monitor Displays the Calls

aws-cli lists the buckets as usual, but at the same time it sends two UDP datagrams to our server.

For each AWS API call there are two entities, ApiCallAttempt and ApiCall. They have the following structure:

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCallAttempt",
  "Service": "S3",
  "Api": "ListBuckets",
  "Timestamp": 1728194484982,
  "AttemptLatency": 266,
  "Fqdn": "s3.eu-west-1.amazonaws.com",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "AccessKey": "ASIAWFOD4FPxxx",
  "Region": "eu-west-1",
  "SessionToken": "IQoJb3JpZxxx=",
  "HttpStatusCode": 200,
  "XAmzRequestId": "8K3P9AWACxxx",
  "XAmzId2": "vZDGgBpIwz6Jfxxx="
}
{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCall",
  "Service": "S3",
  "Api": "ListBuckets",
  "Timestamp": 1728194484981,
  "AttemptCount": 1,
  "Region": "eu-west-1",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "FinalHttpStatusCode": 200,
  "Latency": 267,
  "MaxRetriesExceeded": 0
}
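
The two structures above can be decoded in Go with `encoding/json`. The sketch below uses a single struct for both event types and switches on the `Type` field. The `csmEvent` struct and the `parseEvent` and `describe` functions are hypothetical helpers, not SDK types, and the struct covers only a subset of the fields shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// csmEvent covers a subset of the fields of both event types.
// Hypothetical helper type, not part of any SDK.
type csmEvent struct {
	Version   int    `json:"Version"`
	ClientID  string `json:"ClientId"`
	Type      string `json:"Type"`
	Service   string `json:"Service"`
	API       string `json:"Api"`
	Timestamp int64  `json:"Timestamp"`
	Region    string `json:"Region"`

	// Present on ApiCallAttempt events.
	AttemptLatency int    `json:"AttemptLatency"`
	Fqdn           string `json:"Fqdn"`
	HTTPStatusCode int    `json:"HttpStatusCode"`

	// Present on ApiCall events.
	AttemptCount        int `json:"AttemptCount"`
	FinalHTTPStatusCode int `json:"FinalHttpStatusCode"`
	Latency             int `json:"Latency"`
	MaxRetriesExceeded  int `json:"MaxRetriesExceeded"`
}

// parseEvent decodes one datagram payload into a csmEvent.
func parseEvent(data []byte) (csmEvent, error) {
	var ev csmEvent
	err := json.Unmarshal(data, &ev)
	return ev, err
}

// describe renders a one-line summary depending on the event type.
func describe(ev csmEvent) string {
	switch ev.Type {
	case "ApiCall":
		return fmt.Sprintf("%s:%s finished with HTTP %d after %d attempt(s), latency %d",
			ev.Service, ev.API, ev.FinalHTTPStatusCode, ev.AttemptCount, ev.Latency)
	case "ApiCallAttempt":
		return fmt.Sprintf("%s:%s attempt to %s returned HTTP %d, latency %d",
			ev.Service, ev.API, ev.Fqdn, ev.HTTPStatusCode, ev.AttemptLatency)
	default:
		return "unknown event type: " + ev.Type
	}
}

func main() {
	raw := []byte(`{"Version":1,"Type":"ApiCall","Service":"S3","Api":"ListBuckets","AttemptCount":1,"FinalHttpStatusCode":200,"Latency":267}`)
	ev, err := parseEvent(raw)
	if err != nil {
		fmt.Println("Error decoding event:", err)
		return
	}
	fmt.Println(describe(ev))
}
```

Fields that are absent from a given event type simply keep their zero value, which is why one struct is enough for this sketch.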

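Tracking AWS API Errors

We can also track AWS API errors. Let’s try to create a bucket that already exists. In this run the call is rejected with a location-constraint error, which is captured just the same: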

aws s3api create-bucket --bucket existing-bucket

An error occurred (IllegalLocationConstraintException) when calling the CreateBucket operation: The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.

The same error message also appears in aws-client-monitor:

{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCallAttempt",
  "Service": "S3",
  "Api": "CreateBucket",
  "Timestamp": 1728195105316,
  "AttemptLatency": 215,
  "Fqdn": "existing-bucket.s3.eu-west-1.amazonaws.com",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "AccessKey": "ASIAWFODxxx",
  "Region": "eu-west-1",
  "SessionToken": "IQoJxxx=",
  "HttpStatusCode": 400,
  "XAmzRequestId": "FPWTJWZC7114XQJE",
  "XAmzId2": "+G9yOxxx",
  "AwsException": "IllegalLocationConstraintException",
  "AwsExceptionMessage": "The unspecified location constraint is incompatible for the region specific endpoint this request was sent to."
}
{
  "Version": 1,
  "ClientId": "my-client-id",
  "Type": "ApiCall",
  "Service": "S3",
  "Api": "CreateBucket",
  "Timestamp": 1728195105312,
  "AttemptCount": 1,
  "Region": "eu-west-1",
  "UserAgent": "aws-cli/1.27.92 md/Botocore#1.31.2 ua/2.0 os/macos#21.6.0 md/arch#x86_64 lang/python#3.10.14 md/pyimpl#CPython cfg/retry-mode#legacy botocore/1.31.2",
  "FinalHttpStatusCode": 400,
  "FinalAwsException": "IllegalLocationConstraintException",
  "FinalAwsExceptionMessage": "The unspecified location constraint is incompatible for the region specific endpoint this request was sent to.",
  "Latency": 219,
  "MaxRetriesExceeded": 0
}
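
The exception fields make it easy to flag failed calls on the receiving side. The following sketch treats an event as failed when its status code is 400 or higher, or when an exception name is present. That rule, the `errorEvent` struct, and the `summarizeFailure` function are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// errorEvent holds the fields needed to spot a failed call.
// Hypothetical helper type covering both event types.
type errorEvent struct {
	Type                string `json:"Type"`
	Service             string `json:"Service"`
	API                 string `json:"Api"`
	HTTPStatusCode      int    `json:"HttpStatusCode"`
	AwsException        string `json:"AwsException"`
	FinalHTTPStatusCode int    `json:"FinalHttpStatusCode"`
	FinalAwsException   string `json:"FinalAwsException"`
}

// summarizeFailure returns a one-line summary and true when the event
// describes a failed call, or an empty string and false otherwise.
func summarizeFailure(data []byte) (string, bool, error) {
	var ev errorEvent
	if err := json.Unmarshal(data, &ev); err != nil {
		return "", false, err
	}

	// ApiCall events carry the Final* fields, attempts the plain ones.
	status, exception := ev.HTTPStatusCode, ev.AwsException
	if ev.Type == "ApiCall" {
		status, exception = ev.FinalHTTPStatusCode, ev.FinalAwsException
	}

	// Assumed failure rule: HTTP status >= 400 or an exception name is set.
	if status < 400 && exception == "" {
		return "", false, nil
	}
	return fmt.Sprintf("%s:%s failed: HTTP %d %s", ev.Service, ev.API, status, exception), true, nil
}

func main() {
	raw := []byte(`{"Type":"ApiCall","Service":"S3","Api":"CreateBucket","FinalHttpStatusCode":400,"FinalAwsException":"IllegalLocationConstraintException"}`)
	summary, failed, err := summarizeFailure(raw)
	if err != nil {
		fmt.Println("Error decoding event:", err)
		return
	}
	if failed {
		fmt.Println(summary)
	}
}
```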

Metrics Extraction

The following metrics can be extracted on the UDP server side:

  • API call duration (latency)
  • History of app operations with AWS API (service name : operation)
  • Success and failure rates
  • Error messages
  • Region
  • User agent
  • IP
  • Client ID
  • Access key
  • Request payload size

We can use these insights to fine-tune our application’s performance and catch potential bottlenecks or errors.

Conclusion

AWS Client-Side Monitoring (CSM) is a valuable tool for gaining deep insights into how your application interacts with AWS services. It helps you monitor performance, identify errors, and optimize API usage. By integrating CSM with the AWS SDK for Go, developers can track API calls and gather real-time metrics, ultimately improving application performance and reliability.

If you’re working in a production environment where optimizing performance and troubleshooting issues is crucial, CSM is a feature worth enabling. With just a few configuration steps, you can gain access to a wealth of data that will help you fine-tune your AWS-based applications.

This post is licensed under CC BY 4.0 by the author.