Using Matchbox to Install Container Linux

netbooting a collection of Up boards and installing CoreOS Container Linux on them

Table of Contents

Installation

Matchbox

matchbox is an HTTP and gRPC service that renders signed Ignition configs, cloud-configs, network boot configs, and metadata to machines to create CoreOS Container Linux clusters.

Aborted Copr Attempt:

My pre-existing server is running Fedora, so to get matchbox installed, I tried to do it the easy way by enabling the Copr repo:

dnf copr enable @CoreOS/matchbox
dnf install matchbox

Naturally, this gives an error:

Failed to synchronize cache for repo 'group_CoreOS-matchbox', disabling.

It turns out the one in the Copr repo is a year old anyway, so it’s no big loss.

Via Docker:

Open up port 8080 and 8081:

firewall-cmd --permanent --zone=k8s --add-port=8080/tcp
firewall-cmd --permanent --zone=k8s --add-port=8081/tcp

Using my LetsEncrypt wildcart keys, I set up /etc/matchbox:

cp /etc/letsencrypt/archive/ressman.org/privkey.pem /etc/matchbox/server.key
cp /etc/letsencrypt/archive/ressman.org/fullchain1.pem /etc/matchbox/ca.crt
cp /etc/letsencrypt/archive/ressman.org/cert1.pem /etc/matchbox/server.crt

and up we go with the container:

# docker run \
    --net=host \
    --rm \
    -v /var/lib/matchbox:/var/lib/matchbox:Z \
    -v /etc/matchbox:/etc/matchbox:Z,ro \
    quay.io/coreos/matchbox:latest \
    -address=0.0.0.0:8080 \
    -rpc-address=0.0.0.0:8081 \
    -log-level=debug

time="2018-05-08T20:22:14Z" level=info msg="Starting matchbox gRPC server on 0.0.0.0:8081" 
time="2018-05-08T20:22:14Z" level=info msg="Using TLS server certificate: /etc/matchbox/server.crt" 
time="2018-05-08T20:22:14Z" level=info msg="Using TLS server key: /etc/matchbox/server.key" 
time="2018-05-08T20:22:14Z" level=info msg="Using CA certificate: /etc/matchbox/ca.crt to authenticate client certificates"

Looks good!

Run Container Automatically

I just wanted to get it up and running, so instead of doing anything fancy with podman or runc or anything, I just made a /etc/systemd/system/docker.matchbox systemd unit file:

[Unit]
Description=matchbox (Docker)
After=docker.service
Requires=docker.service
 
[Service]
TimeoutStartSec=0
Restart=always

ExecStartPre=-/usr/bin/docker kill matchbox
ExecStartPre=-/usr/bin/docker rm matchbox
ExecStartPre=-/usr/bin/docker pull "quay.io/coreos/matchbox:v0.7.0"
ExecStart=/usr/bin/docker run --name=matchbox --net=host --rm -v /var/lib/matchbox:/var/lib/matchbox:Z -v /etc/matchbox:/etc/matchbox:Z,ro quay.io/coreos/matchbox:latest -address=0.0.0.0:8080 -rpc-address=0.0.0.0:8081 -log-level=debug
ExecStop=/usr/bin/docker stop matchbox
 
[Install]
WantedBy=multi-user.target

Configuration

Okay, so matchbox is installed and running (I assume) correctly. Let’s give it something to do:

mkdir -p /var/lib/matchbox/{ignition,generic,groups,profiles,assets}

Assets

Download some CoreOS images:

[root@oliver tmp]# curl -sOL https://raw.githubusercontent.com/coreos/matchbox/master/scripts/get-coreos
[root@oliver tmp]# bash get-coreos stable 1688.5.3 /var/lib/matchbox/assets
Creating directory /var/lib/matchbox/assets/coreos/1688.5.3
Downloading CoreOS stable 1688.5.3 images and sigs to /var/lib/matchbox/assets/coreos/1688.5.3
CoreOS Image Signing Key
####################################################################################################### 100.0%
gpg: key 93D2DCB4: "CoreOS Buildbot (Offical Builds) <buildbot@coreos.com>" not changed
gpg: Total number processed: 1
gpg:              unchanged: 1
version.txt
####################################################################################################### 100.0%
coreos_production_pxe.vmlinuz...
####################################################################################################### 100.0%
coreos_production_pxe.vmlinuz.sig
####################################################################################################### 100.0%
coreos_production_pxe_image.cpio.gz
####################################################################################################### 100.0%
coreos_production_pxe_image.cpio.gz.sig
####################################################################################################### 100.0%
coreos_production_image.bin.bz2
####################################################################################################### 100.0%
coreos_production_image.bin.bz2.sig
####################################################################################################### 100.0%

Groups

Groups define selectors which match zero or more machines. Machine(s) matching a group will boot and provision according to the group’s Profile

[root@oliver groups]# jq . /var/lib/matchbox/groups/k8s-master01.json
{
  "name": "k8s-master01",
  "profile": "etcd",
  "selector": {
    "mac": "00:07:32:4e:0c:67"
  },
  "metadata": {
    "domain_name": "k8s-master01.ressman.org",
    "fleet_metadata": "role=etcd,name=k8s-master01",
    "etcd_name": "k8s-master01",
    "etcd_initial_cluster": "node1=http://k8s-master01.ressman.org:2380"
  }
}

Profiles

Profiles reference an Ignition config by name and define network boot settings

[root@oliver groups]# jq . /var/lib/matchbox/profiles/etcd.json
{
  "id": "etcd",
  "name": "Container Linux with etcd3",
  "ignition_id": "etcd3.yaml",
  "boot": {
    "kernel": "/assets/coreos/1688.5.3/coreos_production_pxe.vmlinuz",
    "initrd": [
      "/assets/coreos/1688.5.3/coreos_production_pxe_image.cpio.gz"
    ],
    "args": [
      "coreos.config.url=http://oliver.ressman.org:8080/ignition?uuid=${uuid}&mac=${mac:hexhyp}",
      "coreos.first_boot=yes",
      "coreos.autologin"
    ]
  }
}

Ignition/Container Linux Config Templates

Using the example template:

[root@oliver matchbox]# cat /var/lib/matchbox/ignition/etcd3.yaml 
---
systemd:
  units:
    - name: etcd-member.service
      enable: true
      dropins:
        - name: 40-etcd-cluster.conf
          contents: |
            [Service]
            Environment="ETCD_IMAGE_TAG=v3.2.0"
            Environment="ETCD_NAME={{.etcd_name}}"
            Environment="ETCD_ADVERTISE_CLIENT_URLS=http://{{.domain_name}}:2379"
            Environment="ETCD_INITIAL_ADVERTISE_PEER_URLS=http://{{.domain_name}}:2380"
            Environment="ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379"
            Environment="ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380"
            Environment="ETCD_INITIAL_CLUSTER={{.etcd_initial_cluster}}"
            Environment="ETCD_STRICT_RECONFIG_CHECK=true"
    - name: locksmithd.service
      dropins:
        - name: 40-etcd-lock.conf
          contents: |
            [Service]
            Environment="REBOOT_STRATEGY=etcd-lock"
{{ if index . "ssh_authorized_keys" }}
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        {{ range $element := .ssh_authorized_keys }}
        - {{$element}}
        {{end}}
{{end}}

Testing

A test request seems to indicate it’s working okay:

[root@oliver matchbox]# curl -s 'http://oliver.ressman.org:8080/ignition?mac=00:07:32:4e:0c:67' | jq .
{
  "ignition": {
    "config": {},
    "timeouts": {},
    "version": "2.1.0"
  },
  "networkd": {},
  "passwd": {},
  "storage": {},
  "systemd": {
    "units": [
      {
        "dropins": [
          {
            "contents": "[Service]\nEnvironment=\"ETCD_IMAGE_TAG=v3.2.0\"\nEnvironment=\"ETCD_NAME=k8s-master01\"\nEnvironment=\"ETCD_ADVERTISE_CLIENT_URLS=http://k8s-master01.ressman.org:2379\"\nEnvironment=\"ETCD_INITIAL_ADVERTISE_PEER_URLS=http://k8s-master01.ressman.org:2380\"\nEnvironment=\"ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379\"\nEnvironment=\"ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380\"\nEnvironment=\"ETCD_INITIAL_CLUSTER=node1=http://k8s-master01.ressman.org:2380\"\nEnvironment=\"ETCD_STRICT_RECONFIG_CHECK=true\"\n",
            "name": "40-etcd-cluster.conf"
          }
        ],
        "enable": true,
        "name": "etcd-member.service"
      },
      {
        "dropins": [
          {
            "contents": "[Service]\nEnvironment=\"REBOOT_STRATEGY=etcd-lock\"\n",
            "name": "40-etcd-lock.conf"
          }
        ],
        "name": "locksmithd.service"
      }
    ]
  }
}

This is new to me, so I don’t know if this is right, but at least it parses, so that must be a good sign, right?

Inevitable Failure

Actually, everything works pretty well all-in-all. The boxes boot, PXE, download iPXE, download the correct ignition configs and get the kernel and initrd.

They’re currently hanging at boot with this error:

dev-disk-by\x2dlabel-OEM.device: Job dev-disk-by\x2dlabel-OEM.device/start timed out.

But that’s in Linux, so I’m reasonably please about the whole thing. I’ll troubleshoot this error for a while but probably re-redeploy the cluster on Fedora Atomic just so I can get working. Fortunately, when I fix this, it will be trivial to go back and forth between Atomic and Container Linux.