Jobs and CronJobs

Environment Variables

We are going to use some environment variables in this tutorial. Please make sure you have set them correctly.

# Check whether the environment variable is set; if it is not, set it
export NAMESPACE=<namespace>
echo $NAMESPACE
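
If you would rather have the check done for you, the following is a minimal sketch. The fallback name jobs-tutorial is a hypothetical placeholder, not a namespace this tutorial creates; replace it with your own namespace.

```shell
# Keep an existing NAMESPACE; only fall back if it is unset or empty.
if [ -z "${NAMESPACE:-}" ]; then
  echo "NAMESPACE is not set; falling back to a placeholder" >&2
  export NAMESPACE="jobs-tutorial"   # hypothetical namespace name (assumption)
fi
echo "Using namespace: $NAMESPACE"
```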

Jobs differ from regular Deployments: a Job executes a time-constrained operation and reports the result as soon as it has finished; think of a batch job. To achieve this, a Job creates a Pod and runs a defined command. A Job isn’t limited to a single Pod; it can also create multiple Pods. When a Job is deleted, the Pods it started (and stopped) are deleted as well.

A Job ensures that a Pod runs to completion: if a Pod fails, for example because of a Node error, the Job starts a new one. A Job can also be used to start multiple Pods in parallel.
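
As a rough sketch of the parallel case, the snippet below writes a manifest for a Job that needs six successful Pods and runs up to three of them at a time. The Job name parallel-demo, the busybox:1.36 image, the file name, and all numeric values are illustrative assumptions and not part of this tutorial's setup.

```shell
# Write a hypothetical Job manifest that runs several Pods in parallel.
cat > parallel-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-demo        # hypothetical name
spec:
  completions: 6             # the Job is done after 6 Pods have succeeded
  parallelism: 3             # at most 3 Pods run at the same time
  backoffLimit: 4            # give up after 4 failed retries
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36  # assumed image, any small image works
        command: ['sh', '-c', 'echo "processing one work item"; sleep 5']
      restartPolicy: Never
EOF
echo "wrote parallel-job.yaml"
```

The snippet only creates the file. It can be applied like any other manifest, for example with kubectl apply -f parallel-job.yaml --namespace $NAMESPACE.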

More detailed information is available in the Kubernetes documentation.

Task 1: Create a Job for a database dump

We want to create a Job that creates a PostgreSQL database dump and stores it in a file. The Job should run once and then terminate.

Create a file called job.yaml with the following content:

apiVersion: batch/v1
kind: Job
metadata:
  name: pg-dump
spec:
  template:
    spec:
      containers:
      - name: pg-dump
        image: postgres:14.5
        command: 
        - 'bash'
        - '-eo'
        - 'pipefail'
        - '-c'
        - >
          trap "echo Backup failed; exit 0" ERR;
          FILENAME=backup-$(date +%Y-%m-%d_%H-%M-%S).sql.gz;
          echo "creating .pgpass file";
          echo "$POSTGRES_HOST:$POSTGRES_PORT:$POSTGRES_DB:$POSTGRES_USER:$POSTGRES_PASSWORD" > ~/.pgpass;
          chmod 0600 ~/.pgpass;
          echo "creating dump";
          pg_dump -h $POSTGRES_HOST -p $POSTGRES_PORT -U $POSTGRES_USER $POSTGRES_DB | gzip > $FILENAME;
          echo "";
          echo "Backup created: $FILENAME"; du -h $FILENAME;
        env:
        - name: POSTGRES_USER
          value: "<postgresql user>"
        - name: POSTGRES_HOST
          value: "<postgresql host>"
        - name: POSTGRES_PORT
          value: "5432"
        - name: POSTGRES_DB
          value: "<postgresql database>"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: job-secret
              key: POSTGRES_PASSWORD
        resources:
          limits:
            cpu: 40m
            memory: 64Mi
          requests:
            cpu: 10m
            memory: 32Mi
      restartPolicy: Never

The Job uses the postgres:14.5 image to create the database dump. The database password is read from a Secret called job-secret. Because neither completions nor parallelism is set, the Job runs a single Pod to completion and then terminates.
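
One detail of the command in job.yaml is worth a closer look: the shell runs with -e and -o pipefail, and the ERR trap ends with exit 0. The sketch below reproduces only that behaviour locally, with false standing in for a failing pg_dump (an assumption for illustration; no database is involved).

```shell
# Run a deliberately failing pipeline with the same shell options and trap
# as the Job. "false" stands in for a failing pg_dump.
output=$(bash -eo pipefail -c '
  trap "echo Backup failed; exit 0" ERR
  false | gzip > /dev/null
  echo "Backup created"
')
status=$?

echo "$output"               # prints: Backup failed
echo "exit code: $status"    # prints: exit code: 0
```

Because the trap exits with 0, the container ends successfully even when the dump fails, so the Job is reported as completed and is not retried. Check the Pod logs for "Backup failed" rather than relying on the Job status alone.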

Create the secret file job-secret.yaml with the following content:

apiVersion: v1
kind: Secret
metadata:
  name: job-secret
stringData:
  POSTGRES_PASSWORD: "<postgresql password>"

Execute the following commands to create the Secret and the Job:

kubectl apply -f job-secret.yaml --namespace $NAMESPACE
kubectl apply -f job.yaml --namespace $NAMESPACE

Task 2: Check the Job status

Check the status of the Job:

kubectl get jobs --namespace $NAMESPACE

The output should look like this:

NAME      COMPLETIONS   DURATION   AGE
pg-dump   1/1           19s        4m47s

The Job completed after 19 seconds. It created a Pod and executed the command defined in the command section of the Job. The output of the command is available in the logs of the Pod that the Job created.

Check the logs of the Job's Pod:

kubectl logs -f jobs/pg-dump --namespace $NAMESPACE

The output should look like this:

creating .pgpass file
creating dump

Backup created: backup-2022-11-21_09-38-50.sql.gz
4.0K    backup-2022-11-21_09-38-50.sql.gz

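The Job created a database dump and stored it in a file called backup-2022-11-21_09-38-50.sql.gz. The file is written to the container's own filesystem; since the manifest mounts no volume, it does not outlive the Pod.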

To show all Pods belonging to a Job in a human-readable format, the following command can be used:

kubectl get pods --selector=job-name=pg-dump --output=go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' --namespace $NAMESPACE

Task 3: Create a CronJob

A CronJob is simply a resource that creates a Job on a defined schedule; the Job in turn starts a Pod to run a command, as we saw in the previous section. Typical use cases are cleanup Jobs, which tidy up old data for a running Pod, or a Job that regularly creates and saves a database dump, as we just did in this tutorial.

The CronJob’s definition will remind you of the Deployment’s structure, or really that of any other controller resource. The most important part is the schedule, given in cron format (five fields: minute, hour, day of month, month, day of week). Besides a few optional settings, the rest is the jobTemplate, which holds the definition of the Job that the CronJob creates.
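
To make the cron format more concrete, here is a small sketch that splits a schedule string into its five fields. The schedule "30 2 * * *" (every day at 02:30) is a hypothetical example and not the one asked for in the task.

```shell
schedule="30 2 * * *"   # hypothetical schedule: every day at 02:30

# read splits the line on whitespace; the * characters are not expanded
# because here-document content is not subject to filename globbing.
read -r minute hour day_of_month month day_of_week <<EOF
$schedule
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
  "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
# prints: minute=30 hour=2 day-of-month=* month=* day-of-week=*
```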

Try to create a CronJob that runs every hour at minute 0 and creates a database dump. The CronJob should be named pg-dump-cronjob. The Jobs it creates are named automatically (the CronJob name plus a generated suffix) and always live in the same namespace as the CronJob.

Solution

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pg-dump-cronjob
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: pg-dump
            image: postgres:14.5
            command: 
            - 'bash'
            - '-eo'
            - 'pipefail'
            - '-c'
            - >
              trap "echo Backup failed; exit 0" ERR;
              FILENAME=backup-$(date +%Y-%m-%d_%H-%M-%S).sql.gz;
              echo "creating .pgpass file";
              echo "$POSTGRES_HOST:$POSTGRES_PORT:$POSTGRES_DB:$POSTGRES_USER:$POSTGRES_PASSWORD" > ~/.pgpass;
              chmod 0600 ~/.pgpass;
              echo "creating dump";
              pg_dump -h $POSTGRES_HOST -p $POSTGRES_PORT -U $POSTGRES_USER $POSTGRES_DB | gzip > $FILENAME;
              echo "";
              echo "Backup created: $FILENAME"; du -h $FILENAME;
            env:
            - name: POSTGRES_USER
              value: "<postgresql user>"
            - name: POSTGRES_HOST
              value: "<postgresql host>"
            - name: POSTGRES_PORT
              value: "5432"
            - name: POSTGRES_DB
              value: "<postgresql database>"
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: job-secret
                  key: POSTGRES_PASSWORD
            resources:
              limits:
                cpu: 40m
                memory: 64Mi
              requests:
                cpu: 10m
                memory: 32Mi
          restartPolicy: Never
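
Beyond the schedule, a CronJob accepts a few optional settings. As a sketch, the snippet below writes a patch file with three of them; the file name and the chosen values are assumptions for illustration, not recommendations from this tutorial.

```shell
# Write a hypothetical patch with optional CronJob settings.
cat > cronjob-options-patch.yaml <<'EOF'
spec:
  concurrencyPolicy: Forbid        # skip a run if the previous Job is still running
  successfulJobsHistoryLimit: 3    # keep the last 3 successful Jobs
  failedJobsHistoryLimit: 1        # keep the last failed Job
EOF
echo "wrote cronjob-options-patch.yaml"
```

The same keys can instead be added directly under spec in the CronJob manifest. Assuming a kubectl version that supports the --patch-file flag, the file could also be applied with kubectl patch cronjob pg-dump-cronjob --patch-file cronjob-options-patch.yaml --namespace $NAMESPACE.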

Further information can be found in the Kubernetes CronJob documentation.