all and sundry: Cloud Bigtable - Write and Retrieval

Thursday, December 30, 2021

Cloud Bigtable - Write and Retrieval

This is a quick write-up based on a few days of experimentation with Cloud Bigtable, with the following objectives:

1. Using an emulator for local development

2. A high-level schema design with retrieval patterns in mind

3. Finding records

Emulator

The Cloud Bigtable emulator provides a way to test Bigtable functionality locally. Setting it up is easy and is described in this document. Assuming that gcloud, the CLI for working with Google Cloud resources, is available on the machine, the following command should get the emulator in place:

gcloud components install bigtable

Once installed, the emulator can be started up using the following command:

gcloud beta emulators bigtable start --host-port=localhost:8086

This brings up the emulator on port 8086.

Working with the Emulator

Now that a local instance of Bigtable is up, working with it requires another utility called "cbt", which can be installed, again using gcloud, the following way:

gcloud components install cbt

Creating a table called "hotels" to hold an entity modeled after a "Hotel", along with a column family called "hotel_details" to hold its details, looks like this:

# export the variable so that the cbt child process can see it
export BIGTABLE_EMULATOR_HOST=localhost:8086
cbt -project "project-id" createtable hotels
cbt -project "project-id" createfamily hotels hotel_details

Now that the emulator and the cbt utility are available, let's start with a modeling exercise. Take this exercise with a pinch of salt: my knowledge of Bigtable is still evolving and the approach here will likely need heavy polishing.

Schema Design for an Entity

My objective is to provide basic write and read functionality for a "Hotel" entity, described by the following Go struct:

type Hotel struct {
	Id      string
	Name    string
	Address string
	Zip     string
	State   string
}

To store such an entity in Bigtable, attention should be paid to how the data will ultimately be read. In my case, there are going to be two read patterns:

1. Retrieving a single hotel by its id field

2. Retrieving a list of hotels by zip code

Now, Bigtable supports only one index, called the "Row key". A single record can be retrieved using its exact row key, and a set of records can be retrieved using a row key prefix.
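Prefix retrieval works because Bigtable keeps rows sorted lexicographically by row key, so all keys sharing a prefix sit next to each other. The following is only an in-memory analogy of that idea, not Bigtable client code; the function name scanPrefix and the sample keys are made up for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scanPrefix returns every key in sortedKeys that starts with prefix.
// Because the keys are sorted, all matches are contiguous: the scan jumps
// to the first candidate and stops at the first key that does not match.
func scanPrefix(sortedKeys []string, prefix string) []string {
	var matches []string
	start := sort.SearchStrings(sortedKeys, prefix)
	for _, key := range sortedKeys[start:] {
		if !strings.HasPrefix(key, prefix) {
			break
		}
		matches = append(matches, key)
	}
	return matches
}

func main() {
	// Hypothetical row keys, sorted the way a row key index would hold them.
	keys := []string{"user#2", "order#7", "user#1", "order#3"}
	sort.Strings(keys)
	fmt.Println(scanPrefix(keys, "user#")) // [user#1 user#2]
}
```

The point of the sketch is that a prefix lookup is a range scan over neighbouring keys, which is why the shape of the row key decides which queries are cheap.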

In my case it will be difficult to support retrieval by id AND retrieval by zip code using one row key, so my schema design is to have multiple records with different row keys for a single Hotel entity. Say, for a hotel with an id of "id1" and a zip code of "OR-1":

To support retrieval by id my row key looks something like this:

H/Id#id1, with each field of the hotel stored under its own column name.

To support retrieval by zip code my row key looks like this:

H/Zip#OR-1/Id#id1. This time the data points to the row key of the actual record, which is H/Id#id1, so the entire data for the hotel does not have to be duplicated. Given this row key, if all hotels with a zip code of OR-1 have to be retrieved, I can do it using a row key prefix of "H/Zip#OR-1" and then hydrate the information using the key stored in the data.
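The two key shapes can be captured in a pair of small helpers. This is a minimal sketch that follows the "H/Id#" and "H/Zip#.../Id#" patterns seen in the raw output further down; the function names hotelKey and zipIndexKey are my own and are not taken from the sample repository.

```go
package main

import "fmt"

// hotelKey builds the row key under which the full hotel record is stored.
func hotelKey(id string) string {
	return "H/Id#" + id
}

// zipIndexKey builds the row key of the lookup row for a zip code.
// That row only stores a pointer (the hotelKey) to the full record.
func zipIndexKey(zip, id string) string {
	return fmt.Sprintf("H/Zip#%s/Id#%s", zip, id)
}

func main() {
	fmt.Println(hotelKey("id1"))            // H/Id#id1
	fmt.Println(zipIndexKey("OR-1", "id1")) // H/Zip#OR-1/Id#id1
}
```

Keeping key construction in one place makes it harder for the write path and the read path to drift apart on casing or delimiters.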

With this design, storing the information of a real hotel in Bigtable and querying it back looks like this in raw form:

----------------------------------------
H/Id#d7d63398-3442-413b-8859-3e73016fc5cc
  hotel_details:address                    @ 2021/12/29-20:53:30.816000
    "525 SW Morrison St, Portland"
  hotel_details:id                         @ 2021/12/29-20:53:30.816000
    "d7d63398-3442-413b-8859-3e73016fc5cc"
  hotel_details:name                       @ 2021/12/29-20:53:30.816000
    "The Nines"
  hotel_details:state                      @ 2021/12/29-20:53:30.816000
    "OR"
  hotel_details:zip                        @ 2021/12/29-20:53:30.816000
    "97204"
----------------------------------------
H/Zip#97204/Id#d7d63398-3442-413b-8859-3e73016fc5cc
  hotel_details:key                        @ 2021/12/29-20:53:30.816000
    "H/Id#d7d63398-3442-413b-8859-3e73016fc5cc"

This works quite well, though I am not entirely sure if it is optimal. I will revisit the approach once I have gained a little more experience with Bigtable.
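To make the write side concrete, here is a rough sketch of the two rows produced for one hotel, mirroring the raw output above. It deliberately avoids the Bigtable client so that it runs on its own: the row type and the rowsFor function are hypothetical stand-ins, and with the Go client each row would presumably become one mutation (built with bigtable.NewMutation and Set) applied to the table.

```go
package main

import "fmt"

// Hotel mirrors the struct shown earlier in the post.
type Hotel struct {
	Id      string
	Name    string
	Address string
	Zip     string
	State   string
}

// row is a hypothetical stand-in for one Bigtable row: a row key plus
// column name -> value pairs inside the hotel_details column family.
type row struct {
	Key     string
	Columns map[string]string
}

// rowsFor returns the two rows written for a single hotel: the main record
// and the zip index row, which only stores the main record's row key.
func rowsFor(h Hotel) []row {
	mainKey := "H/Id#" + h.Id
	return []row{
		{Key: mainKey, Columns: map[string]string{
			"id": h.Id, "name": h.Name, "address": h.Address, "zip": h.Zip, "state": h.State,
		}},
		{Key: "H/Zip#" + h.Zip + "/Id#" + h.Id, Columns: map[string]string{"key": mainKey}},
	}
}

func main() {
	h := Hotel{Id: "id1", Name: "The Nines", Address: "525 SW Morrison St, Portland", Zip: "97204", State: "OR"}
	for _, r := range rowsFor(h) {
		fmt.Println(r.Key, r.Columns)
	}
}
```

Since one hotel is spread over two rows, both writes have to succeed for the zip lookup to stay consistent with the main record.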

Retrieving by Zip Code

Assuming that a number of hotels are present in the database with this schema design, a retrieval by zip code looks like this in Go:

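// Needs the "context", "fmt" and "cloud.google.com/go/bigtable" imports.
// types.Hotel, keyFromRow and hotelFromRow are helpers not shown here.

// findHotels returns all hotels in a zip code using two reads: a prefix scan
// over the zip index rows, followed by a batched read of the main rows.
func findHotels(table *bigtable.Table, ctx context.Context, zip string) ([]types.Hotel, error) {
	searchPrefix := fmt.Sprintf("H/Zip#%s", zip)
	var keys []string
	var hotels []types.Hotel
	// Step 1: collect the main record's row key from each matching index row.
	err := table.ReadRows(ctx, bigtable.PrefixRange(searchPrefix),
		func(row bigtable.Row) bool {
			keys = append(keys, keyFromRow(row))
			return true // keep scanning
		})

	if err != nil {
		return nil, fmt.Errorf("error in searching by zip code: %w", err)
	}

	// Step 2: fetch the full hotel rows for all collected keys in one call.
	err = table.ReadRows(ctx, bigtable.RowList(keys), func(row bigtable.Row) bool {
		hotels = append(hotels, hotelFromRow(row))
		return true
	})
	if err != nil {
		return nil, fmt.Errorf("error in retrieving by keys: %w", err)
	}
	return hotels, nil
}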

The code starts by building the search prefix, which has the pattern "H/Zip#zipcode", and scans the matching index rows to collect the row key of each hotel's main record. It then makes a single batched call to the table with the collected keys to get the hotel details.
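One detail worth checking in the prefix is where it ends. A prefix of "H/Zip#OR-1" also matches index rows for a zip code such as "OR-10". The sketch below shows one possible refinement, which is to include the "/Id#" delimiter in the prefix; strictZipPrefix is a hypothetical name and is not part of the sample repository.

```go
package main

import (
	"fmt"
	"strings"
)

// zipPrefix is the prefix built in findHotels above.
func zipPrefix(zip string) string {
	return fmt.Sprintf("H/Zip#%s", zip)
}

// strictZipPrefix also includes the "/Id#" delimiter that follows the zip
// code in the row key, so only keys for exactly this zip code match.
func strictZipPrefix(zip string) string {
	return fmt.Sprintf("H/Zip#%s/Id#", zip)
}

func main() {
	// Index row of a hypothetical hotel in zip code OR-10.
	key := "H/Zip#OR-10/Id#id2"

	// true: "OR-10" also starts with "OR-1"
	fmt.Println(strings.HasPrefix(key, zipPrefix("OR-1")))
	// false: the delimiter pins the prefix to the whole zip code
	fmt.Println(strings.HasPrefix(key, strictZipPrefix("OR-1")))
}
```

With fixed-length zip codes such as "97204" the shorter prefix behaves the same, so this only matters if the codes can vary in length.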

Conclusion

It may be easier to follow this along with real code, which is available in my GitHub repository here - https://github.com/bijukunjummen/golang-bigtable-sample. It has a sample that writes to Bigtable and then retrieves from it.
