Optimizing AWS DynamoDB cost
AWS DynamoDB is a popular database for serverless applications. The Scale to Zero AWS kit relies heavily on DynamoDB to store its data. In this blog post, we will discuss how to save costs on DynamoDB.
Before we start, let's see how the read and write requests are charged.
Read and write requests
DynamoDB bills read and write requests in fixed size increments.
Write operations are measured in 1 KB increments. Read operations are measured in 4 KB increments.
For example, if you write an item of 400 bytes, you pay for 1 KB. If the item size is above 1 KB (let's say 1.1 KB), you pay for 2 KB, and so on.
In the On-Demand capacity mode, write requests are about 5x more expensive than read requests.
| Request type | On-Demand price |
|---|---|
| Write Request Units (WRU) | $1.525 per million write request units |
| Read Request Units (RRU) | $0.305 per million read request units |
So reducing the total item size is an effective way to reduce the cost.
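To make the rounding concrete, here is a minimal sketch of the arithmetic. It assumes 1 KB means 1024 bytes, uses the On-Demand prices from the table above, and treats every read as strongly consistent (eventually consistent reads consume half as much). The function names are illustrative only, not part of any AWS SDK.

```typescript
// Billing increments described above: writes round up to 1 KB, reads to 4 KB.
const WRITE_INCREMENT_BYTES = 1024;
const READ_INCREMENT_BYTES = 4 * 1024;

// On-Demand prices from the table above, in USD per million request units.
const WRU_PRICE_PER_MILLION = 1.525;
const RRU_PRICE_PER_MILLION = 0.305;

/** Write request units consumed by one standard write of an item of this size. */
function writeRequestUnits(itemSizeBytes: number): number {
  return Math.max(1, Math.ceil(itemSizeBytes / WRITE_INCREMENT_BYTES));
}

/** Read request units consumed by one strongly consistent read of an item of this size. */
function readRequestUnits(itemSizeBytes: number): number {
  return Math.max(1, Math.ceil(itemSizeBytes / READ_INCREMENT_BYTES));
}

/** Cost in USD of `requests` identical requests that each consume `unitsPerRequest`. */
function costUsd(requests: number, unitsPerRequest: number, pricePerMillion: number): number {
  return (requests * unitsPerRequest * pricePerMillion) / 1_000_000;
}

const smallItemBytes = 400;
const largerItemBytes = Math.ceil(1.1 * 1024); // just above 1 KB

console.log(writeRequestUnits(smallItemBytes)); // 1
console.log(writeRequestUnits(largerItemBytes)); // 2
console.log(readRequestUnits(largerItemBytes)); // 1

// Estimated cost of one million writes and one million reads of the larger item.
console.log(costUsd(1_000_000, writeRequestUnits(largerItemBytes), WRU_PRICE_PER_MILLION).toFixed(3));
console.log(costUsd(1_000_000, readRequestUnits(largerItemBytes), RRU_PRICE_PER_MILLION).toFixed(3));
```

Trimming the larger item by a little over 100 bytes would bring it back under 1 KB and halve its write cost, which is why the size tips below are worth the effort.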
Use compression for large data
If you store large data, it's better to compress it before storing it. As mentioned above, you pay for the item size, so compressing the data can save you significant money.
You can use Brotli (or Gzip) compression and store the large data as a Binary attribute. Let's say you need to store user metadata in the database. The metadata can be quite large, and you pay for its full size on every write and read. If you compress it first, the item will be much smaller and you'll pay less.
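As a rough illustration, the sketch below compresses a JSON payload with Brotli using Node.js's built-in `node:zlib` module. The `packMetadata` and `unpackMetadata` helpers and the sample metadata are hypothetical. The resulting bytes are what you would store in a Binary attribute.

```typescript
import { brotliCompressSync, brotliDecompressSync } from "node:zlib";

/** Serialize metadata to JSON and compress it; the result can be stored as a Binary attribute. */
function packMetadata(metadata: unknown): Uint8Array {
  return brotliCompressSync(JSON.stringify(metadata));
}

/** Reverse of packMetadata: decompress the stored bytes and parse the JSON. */
function unpackMetadata(blob: Uint8Array): unknown {
  return JSON.parse(brotliDecompressSync(blob).toString("utf8"));
}

// Hypothetical user metadata with the repetitive structure typical of JSON documents.
const metadata = {
  preferences: Array.from({ length: 200 }, (_, i) => ({
    key: `setting-${i}`,
    enabled: i % 2 === 0,
    description: "User-configurable setting stored with the profile",
  })),
};

const rawBytes = new TextEncoder().encode(JSON.stringify(metadata)).length;
const packed = packMetadata(metadata);

console.log(`raw: ${rawBytes} bytes, compressed: ${packed.length} bytes`);
```

Compression pays off for large, repetitive payloads like this one. For very small values the overhead can outweigh the gain, so it is worth measuring with your own data before adopting it.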
If you use GSI (Global Secondary Indexes), only project attributes that you need.
If you use a GSI, avoid projecting (copying) all attributes from the base table to the GSI, because projected attributes are stored and billed again in the index. Only project the attributes that you need.
In our AWS Serverless kit, we use a GSI to query the user by email. We do not need to project all attributes from the base table to the GSI, as we only need to get the user ID and provider name (Cognito, Google, etc.).
Since we define the infrastructure as code, let's review this snippet from the kit.
/**
 * Global Secondary Indexes
 */
mainTable.addGlobalSecondaryIndex({
  indexName: 'GSI1',
  partitionKey: {
    name: 'GSI1PK',
    type: aws_dynamodb.AttributeType.STRING,
  },
  sortKey: {
    name: 'GSI1SK',
    type: aws_dynamodb.AttributeType.STRING,
  },
  projectionType: aws_dynamodb.ProjectionType.INCLUDE,
  nonKeyAttributes: ['PN'], // add non-key attributes. Only add what you need! PN = Provider Name
});
In our case, we use the projection type INCLUDE and list the attributes that we need to project. If we need more attributes later, we add them to the list:
projectionType: aws_dynamodb.ProjectionType.INCLUDE,
nonKeyAttributes: ['PN', 'PL'], // PN = Provider Name, PL = Plan
There are 3 projection types:
- ALL: all of the table attributes are projected into the index.
- KEYS_ONLY: only the index and primary keys are projected into the index.
- INCLUDE: only the specified table attributes are projected into the index.
Avoid using ALL since it will project all attributes.
Be careful when you use projection type KEYS_ONLY. It projects only the key attributes (partition and sort keys), so you won't be able to get the other attributes from the GSI. If you later want to project more attributes, you will need to recreate the GSI, because the projection of an existing index can't be modified.
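To show what the INCLUDE projection gives you at read time, here is a sketch of the request parameters for looking up a user by email on GSI1. The field names follow the low-level DynamoDB Query API. The table name, the `EMAIL#` key prefix, and the idea that the user ID lives in the base table's `PK` are assumptions made for illustration, not details taken from the kit.

```typescript
// Subset of the low-level DynamoDB Query request, limited to the fields used here.
interface UserByEmailQuery {
  TableName: string;
  IndexName: string;
  KeyConditionExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, { S: string }>;
  ProjectionExpression: string;
}

/** Build a Query request that reads only the attributes projected into GSI1. */
function buildUserByEmailQuery(tableName: string, email: string): UserByEmailQuery {
  return {
    TableName: tableName,
    IndexName: "GSI1",
    KeyConditionExpression: "#gsiPk = :gsiPk",
    ExpressionAttributeNames: {
      "#gsiPk": "GSI1PK",
      "#userKey": "PK", // base table key, always projected into the index
      "#provider": "PN", // projected through nonKeyAttributes
    },
    ExpressionAttributeValues: {
      ":gsiPk": { S: `EMAIL#${email.toLowerCase()}` }, // hypothetical key format
    },
    ProjectionExpression: "#userKey, #provider",
  };
}

const query = buildUserByEmailQuery("MainTable", "Jane@example.com");
console.log(JSON.stringify(query, null, 2));
```

An object of this shape can be passed to `QueryCommand` from `@aws-sdk/client-dynamodb`. Because the index holds only the keys and `PN`, each matching item stays small, and so does the read cost.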
Generally, it's better to give GSIs generic names. We recommend incremental numbers: GSI1, GSI2, GSI3, etc. That way, you can change your access patterns without recreating the GSIs.
Avoid having long attribute names
Consider shortening attribute names. When DynamoDB calculates the item size, it includes the attribute names as well as the values. So, longer attribute names will cost more.
We use PK and SK for the partition key and sort key respectively.
It's also recommended to use epoch time format instead of ISO date because it's shorter.
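The sketch below gives a rough size comparison between a long attribute name holding an ISO date string and a short name holding an epoch timestamp. It approximates DynamoDB's sizing rules: strings count as their UTF-8 byte length, and numbers as roughly one byte per two significant digits plus one byte. The helper names and the short attribute name `ca` are hypothetical, and the number estimate is only meant for plain integers such as epoch seconds.

```typescript
const utf8 = new TextEncoder();

/** Count significant digits after trimming leading and trailing zeros (plain integers and decimals only). */
function significantDigits(n: number): number {
  return Math.abs(n).toString().replace(".", "").replace(/^0+/, "").replace(/0+$/, "").length;
}

/** Approximate bytes that one attribute (name plus value) contributes to the item size. */
function approxAttributeBytes(name: string, value: string | number): number {
  const nameBytes = utf8.encode(name).length;
  const valueBytes =
    typeof value === "string"
      ? utf8.encode(value).length
      : Math.ceil(significantDigits(value) / 2) + 1;
  return nameBytes + valueBytes;
}

const createdAt = new Date("2024-01-01T00:00:00.000Z");

// Long name with an ISO 8601 string value.
const isoBytes = approxAttributeBytes("createdAt", createdAt.toISOString());

// Short name with epoch seconds stored as a number.
const epochBytes = approxAttributeBytes("ca", Math.floor(createdAt.getTime() / 1000));

console.log(`ISO attribute: ~${isoBytes} bytes, epoch attribute: ~${epochBytes} bytes`);
```

A few dozen bytes per attribute looks small, but it is paid on every item and can be what pushes an item over the next 1 KB write increment.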
Use provisioned capacity if the traffic is predictable
DynamoDB has two capacity modes: On-Demand and Provisioned. By default, the kit uses On-Demand. If you have predictable traffic, you can save money by using provisioned capacity. For example, if you run an e-commerce website, you know that the traffic will be higher during some periods (Black Friday, Christmas, etc.) and lower during other periods. You can use provisioned capacity and set the capacity for peak and low traffic.
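Auto scaling is supported for GSIs as well as the base table, and you can manage each GSI with its own scaling policy. Tuning the target utilization is important: set it too high and traffic spikes may be throttled before scaling catches up, set it too low and you pay for capacity you don't use.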
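As a sketch of what this could look like in CDK, the snippet below defines a provisioned table with auto scaling on both the table and GSI1. The stack name, construct ID, capacity numbers, and the 70% target are placeholder assumptions rather than the kit's actual settings, so tune them to your own traffic.

```typescript
import { App, Stack, aws_dynamodb } from "aws-cdk-lib";

const app = new App();
const stack = new Stack(app, "ExampleStack"); // hypothetical stack name

const mainTable = new aws_dynamodb.Table(stack, "MainTable", {
  partitionKey: { name: "PK", type: aws_dynamodb.AttributeType.STRING },
  sortKey: { name: "SK", type: aws_dynamodb.AttributeType.STRING },
  billingMode: aws_dynamodb.BillingMode.PROVISIONED, // instead of On-Demand
  readCapacity: 5,
  writeCapacity: 5,
});

mainTable.addGlobalSecondaryIndex({
  indexName: "GSI1",
  partitionKey: { name: "GSI1PK", type: aws_dynamodb.AttributeType.STRING },
  sortKey: { name: "GSI1SK", type: aws_dynamodb.AttributeType.STRING },
  projectionType: aws_dynamodb.ProjectionType.INCLUDE,
  nonKeyAttributes: ["PN"],
  readCapacity: 5,
  writeCapacity: 5,
});

// Scale the base table between a floor and a ceiling, tracking 70% utilization.
mainTable
  .autoScaleReadCapacity({ minCapacity: 5, maxCapacity: 100 })
  .scaleOnUtilization({ targetUtilizationPercent: 70 });
mainTable
  .autoScaleWriteCapacity({ minCapacity: 5, maxCapacity: 100 })
  .scaleOnUtilization({ targetUtilizationPercent: 70 });

// The GSI gets its own, independent scaling policy.
mainTable
  .autoScaleGlobalSecondaryIndexReadCapacity("GSI1", { minCapacity: 5, maxCapacity: 50 })
  .scaleOnUtilization({ targetUtilizationPercent: 70 });
mainTable
  .autoScaleGlobalSecondaryIndexWriteCapacity("GSI1", { minCapacity: 5, maxCapacity: 50 })
  .scaleOnUtilization({ targetUtilizationPercent: 70 });
```

For known peaks such as Black Friday, the object returned by the `autoScale*` methods also offers `scaleOnSchedule`, which can raise the minimum capacity ahead of the event instead of waiting for utilization to climb.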
Read more about DynamoDB capacity modes.
Conclusion
There are other, more advanced techniques to save costs, such as conditional writes and checking the item hash before writing. We will cover them in another blog post. The techniques mentioned above are easy to implement and can save you significant money before you reach for the advanced ones.