Monday, November 30, 2020

AWS SDK 2 for Java and storing a Json in DynamoDB

AWS DynamoDB is described as both a NoSQL key-value database and a document database. In my work I mostly use its key-value behavior and rarely its document database features, however the document database side is growing on me. This post highlights some ways of using the document database features of DynamoDB and introduces a small utility library, built on top of AWS SDK 2.X for Java, that simplifies using them.

The treatment of the document database features in this post is very high level; I plan a follow-up post that goes into more detail.


DynamoDB as a document database

So what does it mean for AWS DynamoDB to be treated as a document database? Consider a json representation of an entity, say one representing a Hotel:

{
    "id": "1",
    "name": "test",
    "address": "test address",
    "state": "OR",
    "properties": {
        "amenities":{
            "rooms": 100,
            "gym": 2,
            "swimmingPool": true                    
        }                    
    },
    "zip": "zip"
}
This json has some top level attributes like "id", a name and an address, but it also has a free-form "properties" field holding some additional, "nested" attributes of the hotel.

A document database can store this document representing the hotel in its entirety, OR it can treat individual fields, say the "properties" field of the hotel, as a document.
A naive way to do this is to simply serialize the content into a json string and store that, for example transforming the properties field into a string representation of its json. This works, but there are a few issues with it.
  1. None of the attributes inside a field like properties can be queried. Say I wanted to know whether the hotel has a swimming pool: there is no way to get just this information out of the stored content.
  2. The attributes cannot be filtered on. Say I wanted hotels with at least 2 gyms: the stored string gives nothing to filter on.

A document database allows the entire document to be saved and individual attributes, both top level and nested ones, to be queried and filtered on.
So, for example, in the "hotel" document the top level attributes are "id", "name", "address", "state" and "zip", and the nested attributes are "properties.amenities.rooms", "properties.amenities.gym", "properties.amenities.swimmingPool" and so on.
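To make the idea of a nested attribute path concrete, here is a small in-memory Java sketch, assuming the document has already been parsed into nested Maps. The class DocumentPaths and its methods are hypothetical and only illustrate what a path like "properties.amenities.gym" addresses. DynamoDB itself evaluates such document paths on the service side, for example in a filter expression along the lines of properties.amenities.gym >= :minGyms (":minGyms" being an illustrative placeholder name).

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: resolves dotted document paths against nested Maps
// and filters documents in memory. An illustration only, not DynamoDB's API.
final class DocumentPaths {
    private DocumentPaths() {}

    // Returns the value at a dotted path such as "properties.amenities.gym",
    // or null if any segment along the way is missing or not a map.
    static Object resolve(Map<String, ?> document, String path) {
        Object current = document;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    // Keeps the documents whose numeric attribute at `path` is at least `min`.
    static <D extends Map<String, ?>> List<D> atLeast(List<D> documents, String path, double min) {
        return documents.stream()
                .filter(d -> {
                    Object value = resolve(d, path);
                    return value instanceof Number && ((Number) value).doubleValue() >= min;
                })
                .collect(Collectors.toList());
    }
}
```

With the sample hotel loaded into maps, resolve(hotel, "properties.amenities.swimmingPool") gives true, and atLeast(hotels, "properties.amenities.gym", 2) keeps only the hotels with at least 2 gyms; a hotel without a "properties" field simply resolves to null and is dropped.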

AWS SDK 2 for DynamoDB and Document database support

If you are writing a Java based application to interact with an AWS DynamoDB database, then you have likely used the new AWS SDK 2 library to make the API calls. However, one issue with the library is that it does not natively support a json based document model. Let me go into a little more detail here.

From the perspective of AWS SDK 2 for DynamoDB, every attribute that is saved is an instance of something called an AttributeValue.
A row of data, say for a hotel, is a simple map of "attribute" names to AttributeValues, and sample code looks something like this (with the properties stored the naive way, as a json string):
val putItemRequest = PutItemRequest.builder()
    .tableName(TABLE_NAME)
    .item(
        mapOf(
            ID to AttributeValue.builder().s(hotel.id).build(),
            NAME to AttributeValue.builder().s(hotel.name).build(),
            ZIP to AttributeValue.builder().s(hotel.zip).build(),
            STATE to AttributeValue.builder().s(hotel.state).build(),
            ADDRESS to AttributeValue.builder().s(hotel.address).build(),
            PROPERTIES to AttributeValue.builder().s(objectMapper.writeValueAsString(hotel.properties)).build(),
            VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
        )
    )
    .build()
dynamoClient.putItem(putItemRequest)
Here a map of each attribute to an AttributeValue is being created with an appropriate "type" of content, "s" indicates a string, "n" a number in the above sample. 

There are other AttributeValue types like "m" representing a map and "l" representing a list. 

The neat thing is that the "m" and "l" types can hold nested AttributeValues, which map naturally to a structured json document; however, there is no simple way to convert a json to this kind of AttributeValue and back.

So, for example, if I were to store the "properties" of a hotel in a way that preserves its nested attributes, an approach could be this:
val putItemRequest = PutItemRequest.builder()
    .tableName(TABLE_NAME)
    .item(
        mapOf(
            ID to AttributeValue.builder().s(hotel.id).build(),
            NAME to AttributeValue.builder().s(hotel.name).build(),
            ZIP to AttributeValue.builder().s(hotel.zip).build(),
            STATE to AttributeValue.builder().s(hotel.state).build(),
            ADDRESS to AttributeValue.builder().s(hotel.address).build(),
            PROPERTIES to AttributeValue.builder()
                .m(
                    mapOf(
                        "amenities" to AttributeValue.builder()
                            .m(
                                mapOf(
                                    "rooms" to AttributeValue.builder().n("100").build(),
                                    "gym" to AttributeValue.builder().n("2").build(),
                                    "swimmingPool" to AttributeValue.builder().bool(true).build()
                                )
                            )
                            .build()
                    )
                )
                .build(),
            VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
        )
    )
    .build()
See how the nested attributes are being expanded out recursively. 
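That recursion is mechanical enough to automate. As a rough illustration only (not the utility library's code, which works with Jackson's JsonNode and the SDK's AttributeValue), here is a minimal Java sketch that walks a parsed json value held as plain Maps, Lists, Strings, Numbers, Booleans and nulls, and produces DynamoDB's typed representation, where each value is a one-entry map from a type descriptor ("S", "N", "BOOL", "NULL", "M", "L") to its content. Plain maps stand in for AttributeValue so the sketch runs without the SDK; the class name TypedValues is hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the SDK's AttributeValue.
final class TypedValues {
    private TypedValues() {}

    // json value -> {"S": ...} / {"N": ...} / {"BOOL": ...} / {"NULL": true} / {"M": ...} / {"L": ...}
    static Map<String, Object> toTyped(Object json) {
        Map<String, Object> typed = new LinkedHashMap<>();
        if (json == null) {
            typed.put("NULL", Boolean.TRUE);
        } else if (json instanceof String) {
            typed.put("S", json);
        } else if (json instanceof Number) {
            typed.put("N", json.toString()); // DynamoDB transmits numbers as strings
        } else if (json instanceof Boolean) {
            typed.put("BOOL", json);
        } else if (json instanceof Map) {
            Map<String, Object> members = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) json).entrySet()) {
                members.put(String.valueOf(e.getKey()), toTyped(e.getValue())); // recurse into nested objects
            }
            typed.put("M", members);
        } else if (json instanceof List) {
            List<Object> items = new ArrayList<>();
            for (Object item : (List<?>) json) {
                items.add(toTyped(item)); // recurse into array elements
            }
            typed.put("L", items);
        } else {
            throw new IllegalArgumentException("Unsupported json value: " + json);
        }
        return typed;
    }

    // Inverse of toTyped; numbers come back as BigDecimal to avoid losing precision.
    @SuppressWarnings("unchecked")
    static Object fromTyped(Map<String, Object> typed) {
        if (typed.size() != 1) {
            throw new IllegalArgumentException("Expected exactly one type descriptor: " + typed);
        }
        Map.Entry<String, Object> entry = typed.entrySet().iterator().next();
        Object value = entry.getValue();
        switch (entry.getKey()) {
            case "NULL": return null;
            case "S": return value;
            case "N": return new BigDecimal((String) value);
            case "BOOL": return value;
            case "M": {
                Map<String, Object> object = new LinkedHashMap<>();
                for (Map.Entry<String, Object> m : ((Map<String, Object>) value).entrySet()) {
                    object.put(m.getKey(), fromTyped((Map<String, Object>) m.getValue()));
                }
                return object;
            }
            case "L": {
                List<Object> array = new ArrayList<>();
                for (Object item : (List<Object>) value) {
                    array.add(fromTyped((Map<String, Object>) item));
                }
                return array;
            }
            default: throw new IllegalArgumentException("Unknown type descriptor: " + entry.getKey());
        }
    }
}
```

Applied to the sample amenities held in an insertion-ordered map, toTyped gives {M={rooms={N=100}, gym={N=2}, swimmingPool={BOOL=true}}}, the same shape as the handcrafted snippet, and fromTyped turns it back into nested maps.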
 

Introducing the Json to AttributeValue utility library

This is exactly where the utility library that I have developed comes in. 

Given a json structure as a Jackson JsonNode it converts the Json into an appropriately nested AttributeValue type and when retrieving back from DynamoDB, can convert the resulting nested AttributeValue type back to a json. 

The resulting structure looks exactly like the handcrafted sample shown before. So, using the utility, saving the "properties" looks like this:
val putItemRequest = PutItemRequest.builder()
    .tableName(TABLE_NAME)
    .item(
        mapOf(
            ID to AttributeValue.builder().s(hotel.id).build(),
            NAME to AttributeValue.builder().s(hotel.name).build(),
            ZIP to AttributeValue.builder().s(hotel.zip).build(),
            STATE to AttributeValue.builder().s(hotel.state).build(),
            ADDRESS to AttributeValue.builder().s(hotel.address).build(),
            PROPERTIES to JsonAttributeValueUtil.toAttributeValue(hotel.properties),
            VERSION to AttributeValue.builder().n(hotel.version.toString()).build()
        )
    )
    .build()
dynamoClient.putItem(putItemRequest)
and when querying back from DynamoDB, the resulting nested AttributeValue is converted back to a json this way (Kotlin code, in case you are baffled by the "?.let"):
properties = map[PROPERTIES]?.let { attributeValue ->
    JsonAttributeValueUtil.fromAttributeValue(
        attributeValue
    )
} ?: JsonNodeFactory.instance.objectNode()
The neat thing is that even the top level attributes can be generated, given a json representing the entire Hotel type. So say a json representing a Hotel is provided:
val hotel = """
    {
        "id": "1",
        "name": "test",
        "address": "test address",
        "state": "OR",
        "properties": {
            "amenities":{
                "rooms": 100,
                "gym": 2,
                "swimmingPool": true                    
            }                    
        },
        "zip": "zip"
    }
""".trimIndent()
val attributeValue = JsonAttributeValueUtil.toAttributeValue(hotel, objectMapper)
dynamoDbClient.putItem(
    PutItemRequest.builder()
        .tableName(DynamoHotelRepo.TABLE_NAME)
        .item(attributeValue.m())
        .build()
)


Using the Library

The utility library is available at https://github.com/bijukunjummen/aws-sdk2-dynamo-json-helper, along with details of how to get the binaries in place and use it in code.

 

Conclusion

AWS SDK 2 is an excellent and highly performant client, providing non-blocking support for client calls. I like how it provides both a synchronous and an asynchronous API and remains highly opinionated in consistently providing a low level client API for calling the different AWS services. This utility library provides a nice bridge, letting AWS SDK 2 remain low level while still making it easy to persist a json based document and read it back. All the samples in this post are available in my github repository here - https://github.com/bijukunjummen/dynamodb-document-sample

Sunday, November 15, 2020

Permutation - Heap's Algorithm

This is a little bit of experimentation that I did recently to figure out reasonable code to get all possible permutations of a set of characters.


So, given a set of characters "ABC", my objective is to come up with code that can spit out "ABC", "ACB", "BAC", "BCA", "CBA", "CAB".


The approach I took is to go with the definition of permutation itself: with "ABCD" as the set of characters, there are 4 slots that need to be filled.

The first slot can be filled in 4 ways, by any of A, B, C or D.

The second slot by any of the remaining 3 characters, so with "A" in the first slot, the second slot can be B, C or D.

The third slot by the remaining 2 characters, so with "A", "B" in the first two slots, the third slot can be C or D.

And finally, the fourth slot by the remaining 1 character, so with "A", "B", "C" in the first 3 slots, the fourth slot must be D.

In total, there would be 4 for the first slot * 3 for the 2nd slot * 2 for the 3rd slot * 1 for the 4th slot = 24 permutations altogether (4!, or n! for n distinct characters).

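I can do this in place: for each slot, swap each of the remaining characters into it in turn, recursively fill the slots after it, and then swap the character back.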
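The original code for this step is not reproduced here, so what follows is a minimal Java sketch of the slot-filling approach (the class and method names are hypothetical). It skips the no-op swap when a character is already in its slot; counted that way, 8 characters take 80638 swaps, which matches the figure quoted at the end of this post.

```java
import java.util.ArrayList;
import java.util.List;

// Slot-filling permutations: slot k takes each remaining character in turn,
// the later slots are filled recursively, and the swap is then undone so the
// array is back in its original order before the next candidate is tried.
final class SlotPermutations {
    private final List<String> permutations = new ArrayList<>();
    private long swaps = 0; // real swaps only; swapping a slot with itself is skipped

    private SlotPermutations() {}

    static SlotPermutations of(String chars) {
        SlotPermutations p = new SlotPermutations();
        p.fill(chars.toCharArray(), 0);
        return p;
    }

    private void fill(char[] chars, int slot) {
        if (slot == chars.length) {
            permutations.add(new String(chars)); // every slot is filled
            return;
        }
        for (int i = slot; i < chars.length; i++) {
            swap(chars, slot, i);  // move candidate i into the current slot
            fill(chars, slot + 1); // fill the remaining slots
            swap(chars, slot, i);  // restore the previous order
        }
    }

    private void swap(char[] chars, int i, int j) {
        if (i == j) {
            return;
        }
        char tmp = chars[i];
        chars[i] = chars[j];
        chars[j] = tmp;
        swaps++;
    }

    List<String> permutations() { return permutations; }

    long swaps() { return swaps; }
}
```

SlotPermutations.of("ABC").permutations() returns [ABC, ACB, BAC, BCA, CBA, CAB], the same order as the list at the start of this post.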

(Figure: a trace of the flow and the swaps.)
The only trick here is that the code does all the bookkeeping in place: it swaps the right character into the current slot and swaps it back once the remaining slots are filled, restoring the original order.

This works well for a reasonably sized set of characters, reasonable because for just 10 characters there would be 3,628,800 (10!) permutations.

An algorithm that works even better, though how it actually functions is a complete mystery to me (it is well explained elsewhere if anybody is interested), is Heap's Algorithm. It very efficiently does one swap per permutation (n! - 1 swaps in total for n characters), which is still high but better than the approach that I have described before.
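Here is a minimal Java sketch of Heap's algorithm in its standard recursive form, written to match the earlier sketch's shape; it is an illustration, not necessarily the implementation originally linked, and the names are hypothetical. Every swap is immediately followed by a new permutation, which is where the n! - 1 swap count comes from.

```java
import java.util.ArrayList;
import java.util.List;

// Heap's algorithm, recursive form: generate(k) permutes the first k
// characters while the rest stay fixed. Each swap is followed directly by a
// new permutation, so n distinct characters take n! - 1 swaps in total.
final class HeapPermutations {
    private final List<String> permutations = new ArrayList<>();
    private long swaps = 0;

    private HeapPermutations() {}

    static HeapPermutations of(String chars) {
        HeapPermutations h = new HeapPermutations();
        char[] a = chars.toCharArray();
        if (a.length == 0) {
            h.permutations.add(""); // the empty string has one (empty) permutation
        } else {
            h.generate(a.length, a);
        }
        return h;
    }

    private void generate(int k, char[] a) {
        if (k == 1) {
            permutations.add(new String(a));
            return;
        }
        generate(k - 1, a); // all permutations that keep a[k - 1] where it is
        for (int i = 0; i < k - 1; i++) {
            // Even k: swap the i-th character into the last position;
            // odd k: always swap the first character into it.
            if (k % 2 == 0) {
                swap(a, i, k - 1);
            } else {
                swap(a, 0, k - 1);
            }
            generate(k - 1, a);
        }
    }

    private void swap(char[] a, int i, int j) {
        char tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
        swaps++;
    }

    List<String> permutations() { return permutations; }

    long swaps() { return swaps; }
}
```

For "ABC" this yields ABC, BAC, CAB, ACB, BCA, CBA: the same six permutations as before, in a different order, using 5 swaps instead of 10.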


In a sample permutation of 8 characters, which generates 40320 permutations, the home-cooked version swaps 80638 times while Heap's algorithm swaps 40319 times, thus proving its efficacy.