jdwalker | blog

Musings in various programming languages


AWS DynamoDB Index Overloading

Categories: aws

For the indexes in Amazon’s DynamoDB service, we can fit multiple indexes in the same attribute, as long as we split them by namespace. We can do this via use of the BEGIN_WITH operator on the sort key. We can further extends this to namespace ranges with the BETWEEN operator.

Restrictions

An index is comprised of a hash_key / partition key, which can only do equals, and a range_key which can do operators =, >, <, <=, >=, betweens or begins_with. A key can be a string, a binary base 64 or a number.

Secondary Indexes are sparse meaning if the DynamoDB item doesn’t have the attributes associated with the index, then it won’t appear on the index.

We are restricted to a maximum of 5 Local Secondary Indexes (LSIs), that must be created when the table is. These share the same partition key, but have different attributes for the range key. They can only be created with the DynamoDB table, not while it exists. They allow retrieval of attributes on the item that are not are on the index.

Global Secondary Indexes on the other hand can be created at any time, but are copies of the table, increasing read/write/storage costs. AWS allows an initial limit of 20 global secondary indexes per table, which can increased on request. Projected attributes are restricted to those configured when creating the index.

BEGINS_WITH operator

Using begins with on a range_key is very powerful as it can be used to query hierarchical data, with the most specific data at the end of the key.

For example - addresses

London:Westminster:Downing St:10

London:Westminster:Buckingham Palace

We can do a range key of BEGINS_WITH London: to get all items with London addresses

London:Westminster:Downing St:10

London::Westminster:Buckingham Palace

Or in Westminister - BEGINS_WITH London:Westminister:

London:Westminster:Downing St:10

London:Westminster:Buckingham Palace

Or just in Downing Street BEGINS_WITH London:Westminister:Downing St:

London:Westminster:Buckingham Palace

Index Namespacing

We can use this BEGINS_WITH property to store mutually exclusive data in the same index.

For example, say we wanted to store both dog and cat attributes in our database. Dogs can follow many commands, whereas cats ignore commands. On the other hand cats climb trees while dogs don’t. We can thus store these together in the same attribute, and call it something like index1.

We could prefix the attribute with the appropriate names:

dog:command:roll over

dog:command:sit

dog:command:beg

cat:treeclimbed:spruce

cat:treeclimbed:elm

cat:treeclimbed:oak

Thus we can query on the same index with BEGINS_WITH dog:command: to get:

dog:command:roll over

dog:command:sit

dog:command::beg

Or BEGINS_WITH cat:treeclimbed: to get

cat:treeclimbed:spruce

cat:treeclimbed:elm

cat:treeclimbed:oak

However this all very useful for hierarchies, but what if we wanted numbers and ranges in our index? What if we wanted to also track parrots in the same index, counting how many different words, and query all those greater than 5?

BETWEEN operator

The BETWEEN operator allows selection of a higher and lower values in a range and returns exclusive values in that range. It works on numbers, and most importantly, strings. If our data has a maximum and minimum value we can store it in index1.

If we take our parrots values, prefix them with parrot:words: and pad the zeros up to say a 999999, we can do:

parrot:words:000003

parrot:words:000101

parrot:words:000201

Now if we want query the index and get all parrots whose words are between 2 and 3, we can do BETWEEN parrot:words:000002 AND parrot:words:000005 to get

parrot:words:000003

parrot:words:000101

parrot:words:000201

dog:command:roll over

If we wanted to do greater than 5 we could do BETWEEN parrot:words:000005 AND parrot:words:999999

parrot:words:000003

parrot:words:000101

parrot:words:000201

dog:command:roll over

If we wanted to do less than 5 we could do BETWEEN parrot:words:000000 AND parrot:words:000004

parrot:words:000003

parrot:words:000101

parrot:words:000201

dog:command:roll over

This approach can also be extended to dates in UTC in the ISO8601 format.

End

There is no limit to how many indexes you can fit into the same table, as long as the below rules are followed:

Potential downsides

Having to iterate between each namespace as a separate query – e.g. getting all animals birthdates would need a query for each type of animal, assuming the index is namespaced. If multiple things are stuffed in the same attribute e.g. both number of legs and number of whiskers: e.g. animal:legs:4:whiskers:8 To get all animals with whiskers 8 would need to run all number of legs as well.


Tags: aws