Bring Your "Own" Gitlab CI Runner


Gitlab boasts this nifty feature of letting you bring your own CI Runner. But what if you don’t have a “really personal” Runner to bring? Fear not, we’re about to roll up our sleeves and craft one ourselves. []~( ̄▽ ̄)~*

Our DIY Runner's logo. Doesn't it suddenly make the project feel more legitimate? (No, not really.)

In this dive, we’re going to:

  • Outline the core duties of a Gitlab Runner;
  • Dissect the interactions between a Runner and Gitlab during operation;
  • Design and implement our very own Runner;
  • Bootstrap: get our Runner to run its own CI jobs.

Of course, if you’re the type to jump straight to the code, feel free to check out the Github repo. If you dig it, a star would be much appreciated.

Core Duties

Here are the essentials a Gitlab Runner must handle:

  1. Fetch jobs from Gitlab;
  2. Upon fetching, prepare a pristine, isolated, and reproducible environment;
  3. Execute the job within this environment, uploading logs as it goes;
  4. Report back the outcome (success/failure) after job completion or in case of an unexpected exit.

Our DIY Runner is expected to fulfill these tasks as well.

Peeling the Layers

Let’s sequentially unravel these core tasks and peek at how the Runner interacts with Gitlab.

For brevity, API request and response content has been condensed.

Registration

If you’ve ever set up a self-hosted Gitlab Runner, you might be familiar with this page:

Registering a Gitlab Runner

Users snag a registration token from this interface, then employ the gitlab-runner register command to enlist their Runner instance with Gitlab.

This registration step essentially hits the POST /api/v4/runners endpoint, with a body like:

{
    "description": "A user-provided description",
    "info": {
        "architecture": "amd64",
        "features": {
          "trace_checksum": true,
          "trace_reset": true,
          "trace_size": true
        },
        "name": "gitlab-runner",
        "platform": "linux",
        "revision": "f98d0f26",
        "version": "15.2.0~beta.60.gf98d0f26"
    },
    "locked": true,
    "maintenance_note": "A user-provided maintenance note",
    "paused": false,
    "run_untagged": true,
    "token": "my-registration-token"
}

If the registration token is invalid, Gitlab responds with 403 Forbidden. Upon successful registration, you get:

{
    "id": 2, # Global serial of your runner on the GitLab instance
    "token": "bKzi84WitiHSN4N4TYU6", # auth token of the runner
    "token_expires_at": null # Not really used as far as I know
}

The Runner cares mostly about the token, which represents its identity and is used for authentication in subsequent API calls. This token, along with other settings, gets stored in ~/.gitlab-runner/config.toml.
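
To make this concrete, here is a minimal sketch of the registration call in Go. The endpoint, status codes, and field names are the ones shown above; the package layout, the registerResponse type, and the terse error handling are purely illustrative:

// registration.go: a minimal sketch of POST /api/v4/runners.
package tart

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

// registerResponse mirrors the registration response shown above.
type registerResponse struct {
    ID    int    `json:"id"`
    Token string `json:"token"`
}

// Register exchanges a registration token for a runner token.
func Register(gitlabURL, registrationToken string) (string, error) {
    body, _ := json.Marshal(map[string]any{
        "description":  "tart, a DIY runner",
        "run_untagged": true,
        "token":        registrationToken,
    })

    resp, err := http.Post(gitlabURL+"/api/v4/runners", "application/json", bytes.NewReader(body))
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    if resp.StatusCode == http.StatusForbidden { // invalid registration token
        return "", fmt.Errorf("registration rejected: %s", resp.Status)
    }

    var r registerResponse
    if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
        return "", err
    }
    // r.Token identifies the runner in all later API calls;
    // it is what ends up in the TOML config file.
    return r.Token, nil
}

The later Go snippets in this post assume they live in this same hypothetical tart package, with the same imports plus whatever extras are noted.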

Fetching Jobs

Runners are configured with a maximum number of concurrent jobs. When running below this limit, they poll POST /api/v4/jobs/request to fetch work, with a body somewhat similar to the registration call:

{
    "info": {
        "architecture": "amd64",
        "executor": "docker",
        "features": { # If a runner does not have features that a job needs, then scheduler will not dispatch that job to the runner.
            "artifacts": true,
            "artifacts_exclude": true,
            "cache": "true",
            "cancelable": true,
            "image": true
        },
        "name": "gitlab-runner",
        "platform": "linux",
        "revision": "f98d0f26",
        "shell": "bash",
        "version": "15.2.0~beta.60.gf98d0f26"
    },
    "last_update": "d8a43f53bb125ec6599d778b9969a601", // Cursor for long polling
    "token": "bKzi84WitiHSN4N4TYU6" // Token received during registration
}

If there’s no job available, Gitlab responds with a 204 No Content status, including a cursor in the header for the next request. The cursor is a random string, utilized by Gitlab’s frontend proxy (Workhorse) to decide whether to make the Runner wait (for long polling) or to pass the request directly to the backend. The cursor in Redis is updated by the backend, which notifies Workhorse through Redis Pub/Sub. Job selection is implemented as a complex SQL query on the backend.
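
In Go, the polling step could look roughly like this. I'm assuming the cursor comes back in the X-GitLab-Last-Update response header (the name used in Gitlab Runner's source, as far as I remember), and JobResponse stands for a placeholder struct mirroring the job payload shown next; treat both as assumptions layered on the flow described above:

// RequestJob polls POST /api/v4/jobs/request once.
// lastUpdate is the long-polling cursor; it gets refreshed from the
// response header (header name assumed, see the note above).
func RequestJob(gitlabURL, runnerToken string, lastUpdate *string) (*JobResponse, error) {
    body, _ := json.Marshal(map[string]any{
        "token":       runnerToken,
        "last_update": *lastUpdate,
        "info": map[string]any{
            "features": map[string]bool{"cancelable": true},
        },
    })

    resp, err := http.Post(gitlabURL+"/api/v4/jobs/request", "application/json", bytes.NewReader(body))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()

    // Remember the cursor for the next request, if the server sent one.
    if cursor := resp.Header.Get("X-GitLab-Last-Update"); cursor != "" {
        *lastUpdate = cursor
    }

    switch resp.StatusCode {
    case http.StatusNoContent: // 204: no job this round
        return nil, nil
    case http.StatusCreated: // 201: got a job
        var job JobResponse
        if err := json.NewDecoder(resp.Body).Decode(&job); err != nil {
            return nil, err
        }
        return &job, nil
    default:
        return nil, fmt.Errorf("unexpected status: %s", resp.Status)
    }
}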

Upon receiving a new job, Gitlab returns 201 Created with a body like:

{
    "allow_git_fetch": true,
    "artifacts": null, # artifacts to upload
    "cache": [], # caches to use
    "credentials": [
        {
            "password": "jTruJD4xwEtAZo1hwtAp", # token for git clone and log/execution result uploading
            "type": "registry",
            "url": "gitlab.example.com",
            "username": "gitlab-ci-token" # constant
        }
    ],
    "dependencies": [],
    "features": {
        "failure_reasons": [ # failure reason enums that GitLab server supports
            "unknown_failure",
            "script_failure"
        ]
    },
    "git_info": {
        "before_sha": "6b55b6ffd17b57a2ec0cf8e7d7c66ff709343528",
        "depth": 20, # git clone --depth
        "ref": "master",
        "ref_type": "branch",
        "refspecs": [
            "+refs/pipelines/52:refs/pipelines/52",
            "+refs/heads/master:refs/remotes/origin/master"
        ],
        "repo_url": "http://gitlab-ci-token:[email protected]/flightjs/Flight.git",
        "sha": "cb4717728e8f885558a4e0bb28c58288b8bf4746" # commit hash
    },
    "id": 823, # job id which we will see a lot afterwards
    "image": null,
    "job_info": {
        "id": 823,
        "name": "build-job",
        "project_id": 6,
        "project_name": "Flight",
        "stage": "build"
    },
    "services": [],
    "steps": [ # scripts to execute, defined in .gitlab-ci.yml
        {
            "allow_failure": false,
            "name": "script",
            "script": [
                "echo \"sleeping 1\"",
                "sleep 5",
                "echo \"sleeping 2\"",
                "sleep 5"
            ],
            "timeout": 3600,
            "when": "on_success"
        }
    ],
    "token": "jTruJD4xwEtAZo1hwtAp", # job token, runner uses it to call GitLab API
    "variables": [ # all env vars, defined by GitLab or the user
        {
            "key": "CI_JOB_ID",
            "masked": false,
            "public": true,
            "value": "823"
        },
        {
            "key": "CI_JOB_URL",
            "masked": false,
            "public": true,
            "value": "http://gitlab.example.com/flightjs/Flight/-/jobs/823"
        },
        {
            "key": "CI_JOB_TOKEN",
            "masked": true,
            "public": false,
            "value": "jTruJD4xwEtAZo1hwtAp"
        }
    ]
}

Environment Prep and Repo Cloning

To ensure CI execution is stable and reproducible, Runner execution environments need a certain level of isolation. This is where Executors come into play, offering a variety of environments:

  • Shell Executor: Easy for debugging and straightforward, but offers low isolation.
  • Docker or Kubernetes Executor: Provides isolation except for the OS kernel. Highly reproducible jobs due to rich image ecosystem.
  • VirtualBox or Docker Machine Executor: OS-level isolation but can be resource-intensive.

Executors expose a small set of operations for the Runner to call (sketched as a Go interface after this list):

  • Prepare the environment;
  • Execute provided scripts and return outputs and results;
  • Clean up the environment.
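
In Go terms, that contract can be captured by a small interface. This is a sketch of what Tart needs, not Gitlab Runner's actual types; it assumes context and io from the standard library and the JobResponse placeholder from earlier:

// Executor abstracts one isolated execution environment.
type Executor interface {
    // Prepare sets up a clean environment for a single job.
    Prepare(ctx context.Context, job *JobResponse) error
    // Run executes a script inside the environment, streams combined
    // output to w, and returns the script's exit code.
    Run(ctx context.Context, script string, w io.Writer) (exitCode int, err error)
    // Cleanup tears the environment down, whatever the job outcome.
    Cleanup(ctx context.Context) error
}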

Cloning a repository is essentially executing a git clone within the environment:

git clone -b [branch/tag] --single-branch --depth [clone depth] https://gitlab-ci-token:[job token]@gitlab.example.com/user/repo.git [destination folder]

Executing Jobs and Uploading Logs

Runner organizes all the work to run into scripts, which are passed to the Executor for execution. Environment variables are declared at the start of the script, using export in the case of bash, for example.

An interesting note: the green-highlighted command lines seen in CI logs are generated by Runner using echo commands with terminal color codes:

echo -e $'\x1b[32;1m$ date\x1b[0;m' # To print the green $ date
date # actually calling the command

Behold, the magic of echo and shell color codes

Runner captures standard output and error from the Executor, storing them in temporary files. Before job completion, Runner periodically uploads these logs to Gitlab using PATCH /api/v4/jobs/{job_id}/trace. The headers of the HTTP request look like:

Host: gitlab.example.com
User-Agent: gitlab-runner 15.2.0~beta.60.gf98d0f26 (main; go1.18.3; linux/amd64)
Content-Length: 314 # chunk length
Content-Range: 0-313 # chunk position
Content-Type: text/plain
Job-Token: jTruJD4xwEtAZo1hwtAp
Accept-Encoding: gzip

The log content lies in the HTTP body, e.g.:

\x1b[0KRunning with gitlab-runner 15.2.0~beta.60.gf98d0f26 (f98d0f26)\x1b[0;m
\x1b[0K  on rockgiant-1 bKzi84Wi\x1b[0;m
section_start:1663398416:prepare_executor
\x1b[0K\x1b[0K\x1b[36;1mPreparing the "docker" executor\x1b[0;m\x1b[0;m
\x1b[0KUsing Docker executor with image ubuntu:bionic ...\x1b[0;m
\x1b[0KPulling docker image ubuntu:bionic ...\x1b[0;m

Gitlab, after ingesting the logs, responds with 202 Accepted, along with headers:

Job-Status: running # current job state recorded by the server
Range: 0-1899 # current range received
X-Gitlab-Trace-Update-Interval: 60 # proposed log uploading interval

Here is an interesting optimization: when a user is watching the CI log on the GitLab web page, X-Gitlab-Trace-Update-Interval is tuned down to 3, meaning the runner should upload a log chunk every 3 seconds so that the user sees a near-live stream of the job log.
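
Putting the request and response headers above together, a single log-upload step could look roughly like this (same hypothetical tart package; strconv and time are also needed, and chunking/retries are left out):

// PatchTrace uploads one chunk of the job log via PATCH /api/v4/jobs/{id}/trace
// and returns the upload interval suggested by the server, if any.
func PatchTrace(gitlabURL, jobToken string, jobID int, chunk []byte, offset int) (time.Duration, error) {
    url := fmt.Sprintf("%s/api/v4/jobs/%d/trace", gitlabURL, jobID)
    req, err := http.NewRequest(http.MethodPatch, url, bytes.NewReader(chunk))
    if err != nil {
        return 0, err
    }
    req.Header.Set("Content-Type", "text/plain")
    req.Header.Set("Job-Token", jobToken)
    req.Header.Set("Content-Range", fmt.Sprintf("%d-%d", offset, offset+len(chunk)-1))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return 0, err
    }
    defer resp.Body.Close()

    // Honor the server's hint: 3 seconds while someone watches the live log,
    // longer otherwise.
    var interval time.Duration
    if s := resp.Header.Get("X-Gitlab-Trace-Update-Interval"); s != "" {
        if secs, convErr := strconv.Atoi(s); convErr == nil {
            interval = time.Duration(secs) * time.Second
        }
    }
    return interval, nil
}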

Reporting Execution Results

Upon script completion or failure, Runner:

  • Uploads any remaining logs;
  • Calls PUT /api/v4/jobs/{job_id} to update the job status with success or failure.

A successful job update might look like:

{
  "checksum": "crc32:4a182676", # CRC32 checksum of job logs
  "info": { ... }, # omitted
  "output": {
    "bytesize": 1899,
    "checksum": "crc32:4a182676"
  },
  "state": "success",
  "token": "jTruJD4xwEtAZo1hwtAp"
}

For a failed job:

{
  "checksum": "crc32:f67200bc",
  "exit_code": 42, # exit code of user script
  "failure_reason": "script_failure",
  "info": { ... },
  "output": {
    "bytesize": 1723,
    "checksum": "crc32:f67200bc"
  },
  "state": "failed",
  "token": "Lx1oBNfw2e9xhZvNKsdX"
}

Gitlab responds with 200 OK once it successfully receives the status update. If the server isn’t ready to accept the update (e.g., logs are still being processed asynchronously), it returns 202 Accepted, indicating that Runner should retry later; the suggested retry interval lives in the X-Gitlab-Trace-Update-Interval response header and grows with something like exponential backoff.
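
A matching Go sketch for the final status update, including the retry-on-202 behavior (again in the hypothetical tart package; the backoff here is a fixed sleep just to keep the example short):

// UpdateJob reports the final job state via PUT /api/v4/jobs/{id},
// retrying while GitLab answers 202 Accepted.
func UpdateJob(gitlabURL, jobToken string, jobID int, state string, exitCode int) error {
    payload := map[string]any{
        "token": jobToken,
        "state": state, // "success" or "failed"
    }
    if state == "failed" {
        payload["exit_code"] = exitCode
        payload["failure_reason"] = "script_failure"
    }
    body, _ := json.Marshal(payload)

    for {
        req, err := http.NewRequest(http.MethodPut,
            fmt.Sprintf("%s/api/v4/jobs/%d", gitlabURL, jobID), bytes.NewReader(body))
        if err != nil {
            return err
        }
        req.Header.Set("Content-Type", "application/json")

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return err
        }
        resp.Body.Close()

        switch resp.StatusCode {
        case http.StatusOK: // update accepted, we are done
            return nil
        case http.StatusAccepted: // logs still processing, try again later
            time.Sleep(3 * time.Second) // real code should back off exponentially
        default:
            return fmt.Errorf("unexpected status: %s", resp.Status)
        }
    }
}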

Diagram for the Process

sequenceDiagram
    autonumber
    Runner->>Gitlab: Request job (long poll)
    Gitlab-->>Runner: Job details & credentials
    Runner->>Runner: Prepare environment & clone repo
    
    loop Execute job and upload logs
        Runner->>Runner: Execute job
        Runner->>Gitlab: Incrementally upload logs
    end
    
    Runner->>Gitlab: Report job completion
    Runner->>Runner: Clean up and end task

Rolling Up Our Sleeves

With a comprehensive walkthrough of Gitlab Runner’s core duties, it’s time to dive into coding our very own Runner!

Naming Our Creation

I’m quite fond of egg tarts, so let’s nickname our DIY Runner “Tart”. And because every serious project needs a logo:

The logo for Tart, our DIY Runner, marrying the image of an egg tart with coding symbolism.

Armed with the blessings of the patron saint of computing, Ada Lovelace, and a logo that screams “this is definitely a serious project,” we’re all set.

Planning Features

Like Gitlab Runner, Tart is a CLI program with these main features:

  • Register (register): Consumes a Gitlab registration token and spits out a TOML config to stdout. Redirecting output to a file circumvents the eternal dilemma of whether to overwrite existing configurations.
  • Run a single job (single): Waits for, executes a job, reports the result, then exits. Tailored for debugging.
  • Run continuously (run): Similar to single but loops indefinitely, executing jobs as they come.

With spf13/cobra, we can quickly scaffold the CLI:

$ tart 
An educational purpose, unofficial Gitlab Runner.

Usage:
  tart [command]

Available Commands:
  completion  Generate the autocompletion script for the specified shell
  help        Help about any command
  register    Register self to Gitlab and print TOML config into stdout
  run         Listen and run CI jobs
  single      Listen, wait and run a single CI job, then exit
  version     Print version and exit

Flags:
      --config string   Path to the config file (default "tart.toml")
  -h, --help            help for tart

Use "tart [command] --help" for more information about a command.
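
The scaffolding behind that help text is just a root command plus a few subcommands. A trimmed sketch with stubbed bodies (the actual wiring in Tart may differ):

package main

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

var configPath string

var rootCmd = &cobra.Command{
    Use:   "tart",
    Short: "An educational purpose, unofficial Gitlab Runner.",
}

var runCmd = &cobra.Command{
    Use:   "run",
    Short: "Listen and run CI jobs",
    RunE: func(cmd *cobra.Command, args []string) error {
        // load configPath, then loop: request job -> execute -> report
        return nil
    },
}

func main() {
    rootCmd.PersistentFlags().StringVar(&configPath, "config", "tart.toml", "Path to the config file")
    rootCmd.AddCommand(runCmd) // register, single, and version are wired up the same way
    if err := rootCmd.Execute(); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}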

Building an Isolated Execution Environment

Crafting an isolated execution environment is arguably the most critical task for a Runner. Ideal traits include:

  • Isolation: Jobs should not affect or observe one another.
  • Reproducibility: Identical commits should yield identical CI results.
  • Host Safety: Jobs must not be able to compromise the host machine.
  • Cache-friendliness: Trade disk space for build time where it helps.

Among Gitlab Runner’s Executors, each meets these criteria to varying degrees. For Tart, let’s opt for Firecracker to build our execution environment.

Firecracker is a lightweight VM manager developed by AWS, capable of launching secure, multi-tenant container and function-based services. It starts a VM in under a second with minimal overhead, using a stripped-down device model and KVM.

Launching a CI-ready MicroVM requires:

  • Linux Kernel Image: The foundation of any VM.
  • TAP Device: A virtual layer-2 network device for connecting the VM to the outside world.
  • Root File System (rootFS): Similar to a Docker image, containing the OS and its filesystem.

For a detailed implementation, see Tart’s approach to rootFS. On top of that rootFS, running a job entails:

  • Preparing the Environment: Clone a rootFS template and boot up the VM.
  • Executing Scripts: Here, we’ll delve into specifics shortly.
  • Cleaning Up: Shut down the VM and delete the rootFS copy.

Each VM operates on a copy of the rootFS, ensuring isolation and reproducibility.
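
For the VM lifecycle itself, Tart can lean on firecracker-go-sdk. The sketch below boots a MicroVM from a kernel image, a TAP device, and a per-job copy of the rootFS; all paths, sizes, and kernel args are placeholders, and I'm recalling the SDK's API from memory, so treat the exact field names as assumptions and double-check against the SDK docs:

package main

import (
    "context"
    "log"

    firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
    "github.com/firecracker-microvm/firecracker-go-sdk/client/models"
)

func main() {
    ctx := context.Background()

    cfg := firecracker.Config{
        SocketPath:      "/tmp/tart-vm.sock",
        KernelImagePath: "vmlinux",                        // the Linux kernel image
        KernelArgs:      "console=ttyS0 reboot=k panic=1", // minimal boot args
        Drives: []models.Drive{{
            DriveID:      firecracker.String("rootfs"),
            PathOnHost:   firecracker.String("rootfs-job-823.ext4"), // per-job copy of the rootFS
            IsRootDevice: firecracker.Bool(true),
            IsReadOnly:   firecracker.Bool(false),
        }},
        NetworkInterfaces: []firecracker.NetworkInterface{{
            StaticConfiguration: &firecracker.StaticNetworkConfiguration{
                HostDevName: "tap0", // the TAP device mentioned above
            },
        }},
        MachineCfg: models.MachineConfiguration{
            VcpuCount:  firecracker.Int64(2),
            MemSizeMib: firecracker.Int64(1024),
        },
    }

    m, err := firecracker.NewMachine(ctx, cfg)
    if err != nil {
        log.Fatal(err)
    }
    if err := m.Start(ctx); err != nil {
        log.Fatal(err)
    }
    defer m.StopVMM() // tear the VM down once the job is finished

    // ... connect over SSH and run the CI job here ...
}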

Firecracker’s minimalism means we need an agent inside the VM to execute scripts and relay their outputs. SSH fits the bill perfectly (a Go sketch follows this list):

  • Pre-install sshd and public keys in the rootFS.
  • Upon VM startup, Tart connects via SSH to execute commands.
  • Commands run through SSH, with outputs and exit codes relayed back to Tart.
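
Here is a minimal version of that idea using golang.org/x/crypto/ssh. The VM address, user, and key path are placeholders, and host key checking is skipped purely because the VM image is ours and short-lived:

package main

import (
    "log"
    "os"

    "golang.org/x/crypto/ssh"
)

// runOverSSH executes one command inside the MicroVM and returns its exit code.
func runOverSSH(addr, keyPath, command string) (int, error) {
    keyBytes, err := os.ReadFile(keyPath)
    if err != nil {
        return 0, err
    }
    signer, err := ssh.ParsePrivateKey(keyBytes)
    if err != nil {
        return 0, err
    }

    client, err := ssh.Dial("tcp", addr, &ssh.ClientConfig{
        User:            "root",
        Auth:            []ssh.AuthMethod{ssh.PublicKeys(signer)},
        HostKeyCallback: ssh.InsecureIgnoreHostKey(), // acceptable here: we baked the VM image ourselves
    })
    if err != nil {
        return 0, err
    }
    defer client.Close()

    session, err := client.NewSession()
    if err != nil {
        return 0, err
    }
    defer session.Close()

    // Stream the job output; a real runner would tee this into the log uploader.
    session.Stdout = os.Stdout
    session.Stderr = os.Stderr

    if err := session.Run(command); err != nil {
        if exitErr, ok := err.(*ssh.ExitError); ok {
            return exitErr.ExitStatus(), nil // the script failed; report its exit code
        }
        return 0, err
    }
    return 0, nil
}

func main() {
    code, err := runOverSSH("192.168.100.2:22", "id_ed25519", `echo "hello from the VM"`)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("exit code: %d", code)
}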

Script Execution

Creating and executing scripts involves concatenating Gitlab’s script arrays and environment variables into a single executable bash script. The script starts with set -euo pipefail for error handling and then handles:

  • Cloning the repository and cd into it.
  • Setting environment variables with export.
  • Echoing each command before it runs (via set -x, or the explicit echo trick shown earlier).
  • Executing user-defined scripts.

The script is sent over SSH for execution; Tart captures the output and exit code and incrementally uploads the logs to Gitlab.
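
A rough sketch of how Tart could assemble that script from the job payload (same hypothetical package; the field names on JobResponse mirror the job JSON shown earlier, strings is also imported, and the quoting is deliberately naive):

// BuildScript turns a job's variables and steps into a single bash script,
// following the layout described above.
func BuildScript(job *JobResponse) string {
    var b strings.Builder
    b.WriteString("#!/usr/bin/env bash\nset -euo pipefail\n")

    // 1. Clone the repository and cd into it.
    fmt.Fprintf(&b, "git clone --depth %d -b %s %q repo && cd repo\n",
        job.GitInfo.Depth, job.GitInfo.Ref, job.GitInfo.RepoURL)

    // 2. Export environment variables from the job payload.
    for _, v := range job.Variables {
        fmt.Fprintf(&b, "export %s=%q\n", v.Key, v.Value)
    }

    // 3. Echo each user command (in green, as described earlier), then run it.
    for _, step := range job.Steps {
        for _, line := range step.Script {
            fmt.Fprintf(&b, "echo -e %q\n", "\x1b[32;1m$ "+line+"\x1b[0;m")
            b.WriteString(line + "\n")
        }
    }
    return b.String()
}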

Bootstrapping: Tart Running Its Own CI

To test our Runner, we can have it run its own CI jobs. Here’s a .gitlab-ci.yml for Tart to handle its CI:

# we have Go and build-essential pre-installed
our-exciting-job:
  script:
    - echo "run test"
    - go test ./...
    - echo "build tart"
    - make
    - echo "run tart"
    - cd bin
    - ./tart
    - ./tart version

After registering Tart as a Runner for the repository and disabling shared runners to ensure the job runs on Tart, trigger a CI run. It looks like it’s working!

Tart running its own CI job

A Glimpse into History

Early-days implementations of Gitlab CI / about.gitlab.com on web.archive.org, 2015-06-19

In the early days (around 2014-2015), Gitlab Runner had numerous active third-party implementations. Among these, Kamil Trzciński’s Go-based GitLab CI Multi-purpose Runner caught Gitlab’s eye. This implementation replaced Gitlab’s own Ruby-based version to become what we know today as Gitlab Runner. At that time, Trzciński was working at Polidea, making the GitLab CI Multi-purpose Runner a notable community contribution. It’s a fantastic example of how open-source collaboration can lead to widely adopted solutions.
