Saturday, September 25, 2021

Cloud Build and Gradle/Maven Caching

One of the pain points in all the development projects that I have worked on has been setting up/getting an infrastructure for automation. This has typically meant getting access to an instance of Jenkins. I have great respect for Jenkins as a tool, but each deployment of Jenkins tends to become a Snowflake over time with the different set of underlying plugins, version of software, variation of pipeline script etc.

This is exactly the niche that a tool like Cloud Build solves for, deployment is managed by Google Cloud platform, and the build steps are entirely user driven based on the image used for each step of the pipeline.

In the first post I went over the basics of creating a Cloud Build configuration and in the second post went over a fairly comprehensive pipeline for a Java based project.

This post will conclude the series by showing an approach to caching in the pipeline - this is far from original, I am borrowing generously from a few sample configurations that I have found. So let me start by describing the issue being solved for.


Problem

Java has two popular build tools - Gradle and Maven. Each of these tools download a bunch of dependencies and cache these dependencies at startup -
  1. The tool itself is not a binary, but a wrapper which knows to download the right version of the tools binary.
  2. The projects dependencies specified in tool specific DSL's are then downloaded from repositories.
The issue is that across multiple builds the dependencies tend to get downloaded when run

Caching across Runs of a Build

The solution is to cache the downloaded artifacts across the different runs of a build. There is unfortunately no built in way (yet) in Cloud Build to do this, however a mechanism can be built along these lines:
  1. Cache the downloaded dependencies into Cloud Storage at the end of the build 
  2. And then use it to rehydrate the dependencies at the beginning of the build, if available

A similar approach should work for any tool that downloads dependencies. The trick though is figuring out where each tool places the dependencies and knowing what to save to Cloud storage and back. 

Here is an approach for Gradle and Maven.

Each step of the cloud build loads the exact same volume:
    
	volumes:
      - name: caching.home
        path: /cachinghome

Then  explodes the cached content from cloud storage into this volume.

    dir: /cachinghome
    entrypoint: bash
    args:
      - -c
      - |
        (
          gsutil cp gs://${_GCS_CACHE_BUCKET}/gradle-cache.tar.gz /tmp/gradle-cache.tar.gz &&
          tar -xzf /tmp/gradle-cache.tar.gz
        ) || echo 'Cache not found'
    volumes:
      - name: caching.home
        path: /cachinghome

Now, Gradle and Maven store the dependencies into a ".gradle" and ".m2" folder in a users home directory respectively. The trick then is to link the $USER_HOME/.gradle and $USER_HOME/.m2 folder to the exploded directory:


  - name: openjdk:11
    id: test
    entrypoint: "/bin/bash"
    args:
      - '-c'
      - |-
        export CACHING_HOME="/cachinghome"
        USER_HOME="/root"
        GRADLE_HOME="$${USER_HOME}/.gradle"
        GRADLE_CACHE="$${CACHING_HOME}/gradle"

        mkdir -p $${GRADLE_CACHE}

        [[ -d "$${GRADLE_CACHE}" && ! -d "$${GRADLE_HOME}" ]] && ln -s "$${GRADLE_CACHE}" "$${GRADLE_HOME}"
        ./gradlew check
    volumes:
      - name: caching.home
        path: /cachinghome

The gradle tasks should now use the cached content if available or create the cached content if it is being run for the first time. 


It may be simpler to see a sample build configuration which is here - https://github.com/bijukunjummen/hello-cloud-build/blob/main/cloudbuild.yaml


No comments:

Post a Comment