Tuesday, November 12, 2019

Hash a Json

I recently wrote a simple library to predictably hash a json.

The utility is built on top of the excellent Jackson Json parsing library


Problem

I needed a hash generated out of a fairly large json based content to later determine if the content has changed at all. Treating json as a string is not an option as formatting, shuffling of keys can skew the results.

Solution

The utility is simple - it traverses the Jackson JsonNode representation of the json:

1. For every object node, it sorts the keys and then traverses the elements, calculates aggregated hash from all the children
2. For every array node, it traverses to the elements and aggregates the hash
3. For every terminal node, it takes the key and value and generates the SHA-256 hash from it


This way the hash is generated for the entire tree.

Consider a Jackson Json Node, created the following way in code:

ObjectNode jsonNode = JsonNodeFactory
        .instance
        .objectNode()
        .put("key1", "value1");

jsonNode.set("key2", JsonNodeFactory.instance.objectNode()
        .put("child-key2", "child-value2")
        .put("child-key1", "child-value1")
        .put("child-key3", 123.23f));

jsonNode.set("key3", JsonNodeFactory.instance.arrayNode()
        .add("arr-value1")
        .add("arr-value2"));

String calculatedHash = sha256Hex(
        sha256Hex("key1") + sha256Hex("value1")
                + sha256Hex("key2") + sha256Hex(
                sha256Hex("child-key1") + sha256Hex("child-value1")
                        + sha256Hex("child-key2") + sha256Hex("child-value2")
                        + sha256Hex("child-key3") + sha256Hex("123.23"))
                + sha256Hex("key3") + sha256Hex(
                sha256Hex("arr-value1")
                        + sha256Hex("arr-value2"))
);

Here the json has 3 keys, "key1", "key2", "key3". "key1" has a primitive text field, "key2" is an object node, "key3" is an array of strings. The calculatedHash shows how the aggregated hash is calculated for the entire tree, the utility follows the same process to aggregate a hash.

If you are interested in giving this a whirl, the library is available in bintray - https://bintray.com/bijukunjummen/repo/json-hash and hosted on github here - https://github.com/bijukunjummen/json-hash