Maintainer	VCUE
Last Reviewed	14 Aug 2024
Reviewed by	Jonathan Coles (CSCS)
Status

Introduction

Containerized CI/CD allows you to build containers and run them at scale on CSCS systems. The basic idea is that you provide a Dockerfile with build instructions and run the newly created container. Most of the boilerplate work is taken care of by the CI implementation, so that you can concentrate on providing build instructions and testing. The CI side provides the information you need to configure your repository.

We support any git provider that supports webhooks. This includes GitHub, GitLab and Bitbucket. A typical pipeline consists of at least one build job and one test job. The build job makes sure that a new container with your most recent code changes is built. The test job uses the new container as part of an MPI job; e.g., it can run your tests on multiple nodes with GPU support.

Building your software inside a container requires a Dockerfile and a name for the container in the registry where the container will be stored. Testing your software then requires the commands that must be executed to run the tests. No explicit container spawning is required (nor is it possible). Your test jobs need to specify the number of nodes and tasks required for the test, together with the test commands.

Here is an example of a full helloworld project.

It is also helpful to consult the GitLab CI yaml reference documentation and the predefined pipeline variables reference.

Tutorial Hello World

In this example we are using the containerized hello world repository. This is a sample Hello World CMake project. The application only echoes Hello from $HOSTNAME, but this is enough to demonstrate how to run a program on multiple nodes. The pipeline instructions are inside the file ci/cscs.yml. Let's walk through the pipeline bit by bit.

include:
  - remote: 'https://gitlab.com/cscs-ci/recipes/-/raw/master/templates/v2/.ci-ext.yml'

This block includes a yaml file which contains definitions with default values to build and run containers. Have a look inside this file to see available building blocks.

stages:
  - build
  - test

Here we define two different stages, named build and test. The names can be chosen freely.

variables:
  PERSIST_IMAGE_NAME: $CSCS_REGISTRY_PATH/helloworld:$CI_COMMIT_SHORT_SHA

This block defines variables that will apply to all jobs. See CI variables.

build_job:
  stage: build
  extends: .container-builder-cscs-zen2
  variables:
    DOCKERFILE: ci/docker/Dockerfile.build

This adds a job named build_job to the stage build. This runner expects a Dockerfile as input, which is specified in the variable DOCKERFILE. The resulting container name is specified with the variable PERSIST_IMAGE_NAME. It has already been defined above, so it does not need to be repeated in the job's variables block. Further documentation of this runner is available at gitlab-runner-k8s-container-builder.

test_job:
  stage: test
  extends: .container-runner-eiger-mc
  image: $PERSIST_IMAGE_NAME
  script:
    - /opt/helloworld/bin/hello
  variables:
    SLURM_JOB_NUM_NODES: 2
    SLURM_NTASKS: 2

This block defines a test job. The job will be executed by the container runner selected with extends (here .container-runner-eiger-mc). This runner will pull the image on the target vCluster and run the commands as specified in the script tag. In this example we are requesting 2 nodes with 1 task on each node, i.e. 2 tasks in total. All Slurm environment variables are supported. The commands will run inside the container specified by the image tag. The runner is hosted here and supports several other variables.

Containerized CI at CSCS

Enable CI for your project

While the procedure to enable CSCS CI for your repository consists of only a few steps, outlined below, many of them require features in GitHub, GitLab or Bitbucket. The links in the text contain additional steps which may be needed. Some of those documents are non-trivial, especially if you do not have considerable background in the repository features. Plan sufficient time for the setup and contact a GitHub/GitLab/Bitbucket professional, if needed.

Register your project with CSCS: The first step to use containerized CI/CD is to register your Git repository with CSCS. Please open a Service Desk ticket for this step. Once your project has been registered you will be provided with a webhook-secret.
Set up CI: Head to the CI overview page, login with your CSCS credentials, and go to the newly registered project.
Add FirecREST tokens: Open the Admin config, and follow the guide (click on the small black triangle next to Firecrest client ID). Enter all fields for FirecREST, i.e., client-id, client-secret and default Slurm account for job submission.
(Optional) Private project: If your Git repository is a private repository make sure to check the Private repository box and follow the instructions to add an SSH key to your Git repository.
Add notification token: On the setup page you will also find the field Notification token. The token is live tested, and you will see a green checkmark when the token is valid and can be used by the CI. It is mandatory to add a token so that your Git repository will be notified about the status of the build jobs. You cannot save anything as long as the notification token is invalid. (Click on the small triangle to get further instructions)
Add webhook: On the setup page you will find the Setup webhook details button. If you click on it you will see all the entries which have to be added to a new webhook in your Git repository. Follow the link given there to your repository, and add the webhook with the given entries.
Default trusted users and default CI-enabled branches: Provide the default list of trusted users and CI-enabled branches. The global configuration will apply to all pipelines that do not overwrite it explicitly.
Pipeline default: Your first pipeline has the name default. Click on Pipeline default to see the pipeline setup details. The name can be chosen freely (a short descriptive name is best), but it cannot contain whitespace. Update the entry point, trusted users and CI-enabled branches.
Submit your changes
(Optional) Add other pipelines: Add other pipelines with a different entry point if you need more pipelines.
Add entry point YAML files to Git repository: Commit the YAML entry point files to your repository. You should get notifications about the build status in your repository if everything is correct. See the Hello World Tutorial for a simple YAML file.

Clarifications and pitfalls to the above-mentioned steps

This paragraph applies to GitHub. Other git providers have equivalent settings, but the labels are different.

The procedure above is deceptively simple: under the hood, extremely complicated middleware is being configured. Some of the steps assume extensive knowledge of GitLab/GitHub functionality, and the pointers to the documentation may require extensive reading and preparation. If the steps are not completed rigorously, errors will occur in the middleware which are extremely difficult to debug. Here are some points to help you avoid the key pitfalls:

Add webhook: Here you need to click on Webhook setup details (top of page) which then contains a pointer to the repository settings and below it the information which needs to be added when you click on that link. Follow the link and click on Add webhook.
- Here it is crucial to add the correct webhook secret provided by the CSCS-CI administrator.
- Which events would you like to trigger this webhook? "Just the push event" will literally only trigger the notification if you perform git push. It may be easier initially to enable 'Send me everything', as suggested in the instructions.
- If things do not work, go to "Webhooks" in the repository settings and edit the webhook (button on the right). In this edit mode you can click on "Recent Deliveries" to see the notifications. If you click on the individual deliveries, you can see the Request and the Response, which is sometimes helpful for debugging.
Add notification token: This field will start out with three asterisks (i.e., a hidden token) with a checkmark/red-cross, indicating whether the token is valid/invalid and can write commit statuses.
- Clicking on the little triangle will lead you to well-written but voluminous GitHub documentation. We discourage the use of fine-grained tokens. Fine-grained tokens are unsupported, and come with many pitfalls. They can work, but must be enabled at the organization level by an admin, and must be created in the correct organization.
- You must choose the correct resource owner, i.e., the organization that the project belongs to. If the organization is not listed, then it has disabled fine-grained tokens at the organization level. Only an admin can enable them, and only globally for the whole organization. As for the repository access, you can restrict the token to only the repository that you want to notify, or allow all repositories. Even if you choose "All repositories", the token is still restricted to the organization and does not grant access to any repository outside of the resource owner.
- Once the token is generated, you will see it exactly once, so copy it immediately into your setup. You will not be able to go back later on to pick it up, and if you do miss the opportunity, the best procedure is to delete existing tokens and generate new ones.

Understanding when CI is triggered

Push events

Every pipeline can define its own list of CI-enabled branches.
If a pipeline does not define a list of CI-enabled branches, the global list will be used.
If you push changes to a branch, every pipeline that has this branch in its list of CI-enabled branches will be triggered.
If the global list and all pipelines have an empty list of CI-enabled branches, then CI will never be triggered on push events.
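
The push rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not the middleware's implementation. The Pipeline class and the function names are hypothetical, an empty per-pipeline list is assumed to mean "not defined", and branch names are assumed to be compared by exact match.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """Hypothetical model of one pipeline entry on the CI setup page."""
    name: str
    # Assumption of this sketch: an empty list means "not defined".
    ci_enabled_branches: List[str] = field(default_factory=list)


def effective_branches(pipeline, global_branches):
    """Return the pipeline's own branch list, or the global list as fallback."""
    return pipeline.ci_enabled_branches or global_branches


def pipelines_triggered_by_push(branch, pipelines, global_branches):
    """Names of all pipelines whose effective branch list contains `branch`."""
    return [
        p.name
        for p in pipelines
        if branch in effective_branches(p, global_branches)
    ]


pipelines = [
    Pipeline("default"),                   # falls back to the global list
    Pipeline("gpu", ["main", "gpu-dev"]),  # defines its own list
]
print(pipelines_triggered_by_push("main", pipelines, ["main"]))        # ['default', 'gpu']
print(pipelines_triggered_by_push("gpu-dev", pipelines, ["main"]))     # ['gpu']
print(pipelines_triggered_by_push("main", [Pipeline("default")], []))  # []
```

The last call shows the final rule: with an empty global list and no per-pipeline list, a push never triggers CI.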

Pull requests (Merge requests)

For simplicity we use PR to mean Pull Request, although some providers call it a Merge request. It is the same thing.
Every pipeline can define its own list of trusted users.
If a pipeline does not define a list of trusted users, the global list will be used.
If a PR is opened/edited and targets a CI-enabled branch, and the source branch is not from a fork, then all pipelines that have the target branch in their list of CI-enabled branches will be started.
If a PR is opened/edited and targets a CI-enabled branch, but the source branch is from a fork, then a pipeline will be automatically started if and only if the fork is from a user in the pipeline's trusted user list and the target branch is in the pipeline's CI-enabled branches.
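
As a rough sketch, the PR rules above can be written as a single decision function. The names used here (Pipeline, pipelines_triggered_by_pr, fork_owner) are hypothetical and chosen for illustration. An empty per-pipeline list is assumed to fall back to the global list, and the actual middleware may differ in its details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """Hypothetical model of one pipeline entry on the CI setup page."""
    name: str
    ci_enabled_branches: List[str] = field(default_factory=list)
    trusted_users: List[str] = field(default_factory=list)


def pipelines_triggered_by_pr(target_branch, fork_owner, pipelines,
                              global_branches, global_trusted_users):
    """Return the pipelines started automatically for an opened/edited PR.

    `fork_owner` is None when the source branch is not from a fork.
    """
    triggered = []
    for p in pipelines:
        # Assumption: empty per-pipeline lists fall back to the global lists.
        branches = p.ci_enabled_branches or global_branches
        trusted = p.trusted_users or global_trusted_users
        if target_branch not in branches:
            continue  # the target branch is not CI-enabled for this pipeline
        if fork_owner is None or fork_owner in trusted:
            triggered.append(p.name)
    return triggered


pipelines = [
    Pipeline("default"),                  # uses the global lists
    Pipeline("gpu", ["main"], ["alice"])  # own branches and trusted users
]
global_branches = ["main"]
global_trusted = ["alice", "bob"]

# PR from a branch in the same repository: both pipelines start.
print(pipelines_triggered_by_pr("main", None, pipelines, global_branches, global_trusted))
# PR from bob's fork: only the pipeline that trusts bob starts.
print(pipelines_triggered_by_pr("main", "bob", pipelines, global_branches, global_trusted))
```

A PR from a fork of an untrusted user starts nothing automatically; a trusted user can still start a pipeline with a cscs-ci run comment.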

`cscs-ci run` comment

You have an open PR
You want to trigger a specific pipeline
Write a comment inside the PR with the text cscs-ci run PIPELINE_NAME_1,PIPELINE_NAME_2
Special case: If you have only one pipeline, you can omit the pipeline names and write only the comment cscs-ci run.
The pipeline will only be triggered if the commenting user is in the pipeline's trusted users list.
Only the first line of the comment will be evaluated, i.e. you can add context from line 2 onwards.
The target branch is ignored, i.e. you can test a pipeline even if the target branch is not in the pipeline's CI-enabled branches.
An advanced form of the cscs-ci run command can inject variables into the pipeline (exposed as environment variables).
- cscs-ci run PIPELINE_NAME;MY_VARIABLE=some_value;ANOTHER_VAR=other_value triggers the pipeline PIPELINE_NAME and makes the environment variables MY_VARIABLE and ANOTHER_VAR available in your jobs.
- Disallowed characters for PIPELINE_NAME, variable name and variable value are ,;=, because they serve as separators of the different components.
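
To make the comment syntax concrete, here is a small, hypothetical parser for it. It is a sketch of the format described above, not the code the middleware runs. The function name parse_cscs_ci_comment is invented for illustration, and combining several pipeline names with variables in one comment is an assumption that the text does not confirm.

```python
def parse_cscs_ci_comment(comment):
    """Parse a PR comment of the form described above.

    Returns (pipeline_names, variables), or None if the first line is not
    a `cscs-ci run` command. An empty name list means "the only pipeline".
    """
    lines = comment.splitlines()
    first_line = lines[0].strip() if lines else ""  # only line 1 is evaluated
    prefix = "cscs-ci run"
    if not first_line.startswith(prefix):
        return None
    rest = first_line[len(prefix):]
    if rest and not rest[0].isspace():
        return None  # e.g. "cscs-ci runner" is not the command
    # ';' separates the pipeline names from each variable assignment.
    names_part, *var_parts = rest.strip().split(";")
    # ',' separates pipeline names.
    pipeline_names = [n.strip() for n in names_part.split(",") if n.strip()]
    variables = {}
    for item in var_parts:
        if not item.strip():
            continue
        # '=' separates a variable name from its value.
        name, sep, value = item.partition("=")
        if not sep or "=" in value:
            raise ValueError(f"malformed variable assignment: {item!r}")
        variables[name.strip()] = value
    return pipeline_names, variables


print(parse_cscs_ci_comment("cscs-ci run"))
# ([], {})
print(parse_cscs_ci_comment("cscs-ci run PIPELINE_NAME_1,PIPELINE_NAME_2\nextra context"))
# (['PIPELINE_NAME_1', 'PIPELINE_NAME_2'], {})
print(parse_cscs_ci_comment("cscs-ci run PIPELINE_NAME;MY_VARIABLE=some_value;ANOTHER_VAR=other_value"))
# (['PIPELINE_NAME'], {'MY_VARIABLE': 'some_value', 'ANOTHER_VAR': 'other_value'})
```

The sketch also shows why , ; and = cannot appear in names or values: each one is consumed as a separator during parsing.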

API call triggering

It is possible to trigger a pipeline via an API call.
Create a file named data.yaml with the following content:

ref: main
pipeline: pipeline_name
variables:
  MY_VARIABLE: some_value
  ANOTHER_VAR: other_value

Send a POST request to the middleware, replacing repository_id and webhook_secret with your credentials:

curl -X POST -u 'repository_id:webhook_secret' --data-binary @data.yaml https://cicd-ext-mw.cscs.ch/ci/pipeline/trigger
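
If you prefer to trigger the pipeline from a script, the curl call can be approximated with the Python standard library. This is a minimal sketch: the function name build_trigger_request is hypothetical, and it assumes that the middleware accepts HTTP Basic authentication exactly as curl -u sends it. The request is only built here, so nothing is sent until you pass it to urllib.request.urlopen.

```python
import base64
import urllib.request

# Endpoint taken from the curl command above.
TRIGGER_URL = "https://cicd-ext-mw.cscs.ch/ci/pipeline/trigger"


def build_trigger_request(repository_id, webhook_secret, payload):
    """Build (but do not send) the POST request that triggers a pipeline.

    `payload` is the raw content of data.yaml as bytes.
    """
    # Same credentials format as `curl -u 'repository_id:webhook_secret'`.
    credentials = f"{repository_id}:{webhook_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        TRIGGER_URL,
        data=payload,  # sent unmodified, like curl's --data-binary
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


payload = b"ref: main\npipeline: pipeline_name\n"
request = build_trigger_request("repository_id", "webhook_secret", payload)
print(request.get_method(), request.full_url)

# To actually send it (requires valid credentials and network access):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

In a real script, read the payload from data.yaml in binary mode and take the credentials from a secret store rather than hard-coding them.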

Understanding the underlying workflow

Typical users do not need to know the workflow behind the scenes, so you can stop reading here. However, it may put the steps above into perspective and give you useful background when something in the procedure does not go as expected.

Workflow (exemplified on icon-exclaim)

(Prerequisite) icon-exclaim will have a webhook set up
You make some change in the icon-exclaim repository
GitHub sends a webhook event to cicd-ext-mw.cscs.ch (CI middleware)
Technical detail (not important for users to know): the CI middleware fetches your repository from GitHub and pushes a mirror to GitLab
GitLab sees a change in the repository and starts a pipeline (i.e. it uses the CI yaml as entry point)
If the repository uses git submodules, GIT_SUBMODULE_STRATEGY: recursive has to be specified (see GitLab documentation)
The specified runner, which has as input a Dockerfile (specified in the variable DOCKERFILE), will take this Dockerfile and execute docker build -f $DOCKERFILE . where the build context is the whole (recursively) cloned repository

CI variables

Many variables exist during a pipeline run; they are documented in GitLab's predefined variables reference. In addition to the CI variables available through GitLab, there are a few CSCS-specific pipeline variables:

Variable	Value	Additional information
`CSCS_REGISTRY`	jfrog.svc.cscs.ch	CSCS internal registry, preferred registry to store your container images
`CSCS_REGISTRY_PATH`	jfrog.svc.cscs.ch/docker-ci-ext/<repository-id>	The prefix path in the CSCS internal container image registry to which your pipeline has write access. Within this prefix, you can choose any directory structure. Images that are pushed to a path matching `/public/` can be pulled by anybody within the CSCS network
`CSCS_CI_MW_URL`	https://cicd-ext-mw.cscs.ch/ci	The URL of the middleware, the orchestrator software.
`CSCS_CI_DEFAULT_SLURM_ACCOUNT`	d123	The project to which resource usage is accounted. It is set up on the CI setup page in the Admin section. It can be overwritten via `SLURM_ACCOUNT` for individual jobs.
`CSCS_CI_ORIG_CLONE_URL`	https://github.com/my-org/my-project (public project) git@github.com:my-org/my-project (private project)	Clone URL for git. This is needed for some implementation details of the gitlab-runner custom executor. This is the clone URL of the registered project, i.e. this is not the clone URL of the mirror project.
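
As an illustration of how the registry prefix is meant to be used, the following sketch composes an image reference from CSCS_REGISTRY_PATH, in the same way as the PERSIST_IMAGE_NAME variable in the tutorial. The helper image_reference and the repository id in the example are hypothetical. Interpreting "a path matching /public/" as a public directory directly below the prefix is an assumption of this sketch.

```python
import os


def image_reference(name, tag, public=False, env=None):
    """Compose an image reference below the pipeline's registry prefix.

    `env` defaults to the process environment, where the CI exposes
    CSCS_REGISTRY_PATH during a pipeline run.
    """
    env = os.environ if env is None else env
    registry_path = env["CSCS_REGISTRY_PATH"]
    # Assumption: a "public" directory below the prefix matches `/public/`.
    prefix = f"{registry_path}/public" if public else registry_path
    return f"{prefix}/{name}:{tag}"


# Hypothetical values, for illustration only.
example_env = {"CSCS_REGISTRY_PATH": "jfrog.svc.cscs.ch/docker-ci-ext/1234567890"}
print(image_reference("helloworld", "abc1234", env=example_env))
print(image_reference("helloworld", "abc1234", public=True, env=example_env))
```

Inside a CI job you would normally let GitLab expand the variables in the YAML file itself; a helper like this is only useful in scripts that run as part of a job.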

Example projects

Here are a couple of projects which use this CI setup. Please have a look there for more advanced usage:

dcomex-framework: entry point is ci/prototype.yml
utopia: two pipelines, with entry points ci/cscs/mc/gitlab-daint.yml and ci/cscs/gpu/gitlab-daint.yml
mars: two pipelines, with entry points ci/gitlab/cscs/gpu/gitlab-daint.yml and ci/gitlab/cscs/mc/gitlab-daint.yml
sparse_accumulation: entry point is ci/pipeline.yml
gt4py: entry point is ci/cscs-ci.yml
SIRIUS: entry point is ci/cscs-daint.yml
sphericart: entry point is ci/pipeline.yml