GitHub for server automation — SSH certificates


Recently, I came across a Hacker News post on SSH certificates. While SSH certificates probably don’t make sense for someone with only a handful of servers, I decided to set them up anyway, just for fun.

SSH Certificate in a Nutshell

It’s 2022. Everybody has started to use at least public key authentication, which works as follows. The server and the client each hold a private key. The server keeps a list of authorized client public keys in ~/.ssh/authorized_keys. The client keeps a list of known server public keys in ~/.ssh/known_hosts. During the SSH handshake, the two parties exchange their public keys and authenticate each other.

All is good, except that given N clients and M servers, we have an N × M problem of public key delivery. Using SSH certificates, we can instead ask each client to trust one fixed certificate authority (CA) and each server to trust another CA, reducing the complexity of delivery to N + M.

Certificate authentication is rather simple. A CA is nothing more than an OpenSSH private key. A client trusts a server-side CA (host CA) by adding the CA's public key to its known_hosts file. Similarly, a server trusts client-side CAs (user CAs) by adding their public keys to its sshd configuration. We then use the host CA to sign a server’s public key, along with an expiration date, its domain name, etc. Likewise, we use the user CA to sign a client’s public key, along with an expiration date, its allowed usernames, and its allowed SSH capabilities. At the beginning of an SSH connection, each side presents its certificate to the other, which checks the signature against the public keys of the CAs it trusts.
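The signing steps described above can be tried locally with ssh-keygen alone. The following is a minimal sketch, not the exact commands used in this experiment: every file name, host name (server1.example.com), login name (alice), and validity period is a made-up placeholder, and everything happens in a throwaway directory.

```shell
#!/bin/sh
set -eu
# Work in a throwaway directory; every name below is a hypothetical placeholder.
work=$(mktemp -d)
cd "$work"

# A CA is just an OpenSSH key pair: one for hosts, one for users.
ssh-keygen -q -t ed25519 -N '' -C 'host CA' -f host_ca
ssh-keygen -q -t ed25519 -N '' -C 'user CA' -f user_ca

# Stand-ins for a server's host key and a client's user key.
ssh-keygen -q -t ed25519 -N '' -f ssh_host_ed25519_key
ssh-keygen -q -t ed25519 -N '' -f id_ed25519

# Host certificate: -h marks it as a host certificate, -n lists the
# host names it is valid for, -V sets the validity period (4 weeks here).
ssh-keygen -q -s host_ca -I 'server1 host cert' -h \
    -n server1.example.com -V +4w ssh_host_ed25519_key.pub

# User certificate: -n lists the allowed login names, -O restricts
# capabilities (here: no port forwarding).
ssh-keygen -q -s user_ca -I 'alice laptop' \
    -n alice -V +1d -O no-port-forwarding id_ed25519.pub

# The line a client would add to known_hosts to trust the host CA.
printf '@cert-authority *.example.com %s\n' "$(cat host_ca.pub)" > known_hosts_line

# Inspect the resulting host certificate.
ssh-keygen -L -f ssh_host_ed25519_key-cert.pub
```

On the server side, the matching sshd_config directives are HostCertificate (pointing at the signed host certificate) and TrustedUserCAKeys (pointing at the user CA's public key).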

Notice:

For a concrete set-up guide, there is a fantastic tutorial by Teleport, which guided my little experiment.

Automation

I chose to automate only the renewal of server certificates. Client certificates are renewed manually because I only have one regularly used Mac, so there is little to be gained by automating that with launchd.

There are a few platforms that I can use to sign and deliver the certificates:

So I ended up with serverless solutions. My first thought was, of course, AWS Lambda. However, AWS Lambda has a few issues:

So I should probably use GitHub and continuously deploy to AWS Lambda via GitHub Actions. But wait, why not simply use GitHub Actions?

It turns out GitHub Actions is the perfect candidate. Its Ubuntu image comes with batteries included: it even has Swift preinstalled! And the first 2000 minutes of every month are free.

After hours of experimenting, I arrived at the following CI workflow. Both my host and user CAs are uploaded as encrypted secrets. The workflow first creates a temporary OpenSSH keypair, signs it with the user CA, and sets up file permissions correctly, which gives the runner SSH access to all my servers. After that, it reads a version-controlled file containing the list of servers to renew certificates for. Finally, renewing a certificate is no more than downloading the public key from the server, signing it locally, and uploading the certificate back.
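The download, sign, and upload loop can be sketched roughly as follows. This is an illustration under assumptions, not the actual workflow script: the variables HOST_CA and SERVERS_FILE, the root login, the remote key paths, the two-week validity, and the `systemctl reload sshd` step (the service may be called ssh on some distributions) are all hypothetical choices.

```shell
#!/bin/sh
set -eu

# Hypothetical settings: path to the host CA private key, and a file
# listing one server host name per line.
HOST_CA=${HOST_CA:-./host_ca}
SERVERS_FILE=${SERVERS_FILE:-./servers.txt}
# Assumed locations of the host key and certificate on each server.
REMOTE_KEY=/etc/ssh/ssh_host_ed25519_key.pub
REMOTE_CERT=/etc/ssh/ssh_host_ed25519_key-cert.pub

renew_host_cert() {
    host=$1
    tmp=$(mktemp -d)
    # 1. Download the server's public host key.
    scp -q "root@$host:$REMOTE_KEY" "$tmp/host_key.pub"
    # 2. Sign it locally with the host CA (writes host_key-cert.pub).
    ssh-keygen -q -s "$HOST_CA" -I "$host" -h -n "$host" -V +2w "$tmp/host_key.pub"
    # 3. Upload the certificate and ask sshd to reload it (assumed service name).
    scp -q "$tmp/host_key-cert.pub" "root@$host:$REMOTE_CERT"
    ssh "root@$host" 'systemctl reload sshd'
    rm -rf "$tmp"
}

# Read the server list on file descriptor 3 so that ssh cannot
# swallow the rest of the list from standard input.
if [ -f "$SERVERS_FILE" ]; then
    while IFS= read -r host <&3; do
        [ -n "$host" ] || continue
        renew_host_cert "$host"
    done 3< "$SERVERS_FILE"
fi
```

Because the daily schedule renews far more often than the certificates expire, a single failed run does not lock anyone out; the validity period only needs to be comfortably longer than the renewal interval.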

The workflow is scheduled to run every day. I also added the workflow_dispatch trigger so I can start it manually. While debugging, I also set it to run on push.

One big mistake I made was assuming that I could directly reference all secrets of a repo as environment variables. I cannot, which seems obvious in hindsight. I need to declare a new environment variable on the step that uses the secret, and reference the secret there. This use case is explicitly demoed at the very end of the encrypted secrets documentation, which I should have read more carefully.
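A small guard at the top of a step's script makes this kind of mistake fail loudly instead of silently producing an empty key file. The helper below is a hypothetical sketch (require_env is not part of GitHub Actions or any standard tool); it only checks that the named variables are non-empty.

```shell
#!/bin/sh
# require_env is a hypothetical helper: it fails with a clear message
# when a variable the step expects was never mapped from a secret.
require_env() {
    for name in "$@"; do
        # Indirectly read the variable called $name (names are trusted here).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is empty; map the secret in this step's env" >&2
            return 1
        fi
    done
}

# Demo with a dummy value standing in for the real secret.
SSH_KEY='dummy value for the demo'
require_env SSH_KEY && echo "SSH_KEY is set"
```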

A more time-consuming mistake was related to echo and the shell. When writing an environment variable containing the OpenSSH private key to a file using

echo $SSH_KEY > ~/.ssh/id_rsa

, one gets a file where every newline character has become a plain space. OpenSSH reports that key as invalid. This cost me nearly an hour to debug, because GitHub kindly masks every occurrence of any secret in the logs, so I could not print the content of the dumped key. Little tip: you can log your secret in base64 and GitHub cannot mask that.

The issue is due to the way the shell expands variables. An unquoted $SSH_KEY is split on whitespace (spaces, tabs, and newlines) into a list of arguments, and echo joins its arguments with a single space in its output. To overcome this, simply do

echo "$SSH_KEY" > ~/.ssh/id_rsa

instead.
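The difference is easy to reproduce without a real key. The sketch below uses a fake four-line value in place of the secret and a temporary directory instead of ~/.ssh; the printf variant is an extra suggestion, not something the workflow above depends on.

```shell
#!/bin/sh
# A fake multi-line "key" standing in for the real secret.
SSH_KEY='-----BEGIN FAKE KEY-----
line one
line two
-----END FAKE KEY-----'

out=$(mktemp -d)

# Unquoted: the shell splits the value on whitespace and echo
# rejoins the pieces with single spaces, so the newlines are lost.
echo $SSH_KEY > "$out/unquoted"

# Quoted: the value is passed as one argument, newlines intact.
echo "$SSH_KEY" > "$out/quoted"

# printf sidesteps echo's portability quirks (such as backslash handling).
printf '%s\n' "$SSH_KEY" > "$out/printf"

wc -l < "$out/unquoted"   # 1 line
wc -l < "$out/quoted"     # 4 lines
```

Comparing line counts like this is also a way to sanity-check a dumped key in CI without printing any of its content.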

One last thing. One of my servers is hosted on Aliyun. Aliyun sends a warning whenever there is an SSH login from a new location. I have no idea where the GitHub-hosted runners are located. Therefore, if I log into that server from the runners directly, I might be spammed with false warnings and let the actually useful ones slip by. To solve this, I use ProxyJump to make SSH route all access through one of my other servers, which has a fixed IP. That server essentially becomes a bastion host.
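A ProxyJump set-up along these lines could look like the following sketch, which writes a small SSH client configuration to a temporary file. The host names (bastion.example.com, *.example.com) and the deploy user are made-up placeholders, not the real servers.

```shell
#!/bin/sh
set -eu
# Hypothetical host names and user; adjust to the real bastion and targets.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# The bastion itself is reached directly.
Host bastion.example.com
    User deploy

# Every other host goes through the bastion. The negated pattern keeps
# the bastion from trying to jump through itself.
Host *.example.com !bastion.example.com
    User deploy
    ProxyJump bastion.example.com
EOF

cat "$cfg"
# Usage (not executed here): ssh -F "$cfg" target.example.com
# One-off equivalent: ssh -J deploy@bastion.example.com deploy@target.example.com
```

From the target server's point of view, every login then originates from the bastion's fixed IP, regardless of where the runner happens to be.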