Adding a new health subsystem in SPIFFE/SPIRE

Sachin Kumar Singh
Nov 29, 2021

TL;DR: In this post, I’ll introduce the basic concepts around SPIFFE and SPIRE and also give a brief description of the work done during the LFX mentorship on the new SPIRE health subsystem.

Introduction

A common problem in cloud-based distributed systems is how to authenticate access between components of a system. The Secure Production Identity Framework For Everyone (SPIFFE) enables a service to securely obtain an identity from a trusted SPIFFE provider, giving the service a way to authenticate itself in a platform-agnostic way.

The SPIFFE Runtime Environment (SPIRE) is a production-ready open-source implementation of the SPIFFE specifications. The SPIRE project (as well as SPIFFE) is hosted by the Cloud Native Computing Foundation (CNCF).

Background and Concepts

The SPIFFE standards

The Secure Production Identity Framework For Everyone (or SPIFFE) is a set of open-source standards for software identity.

  • SPIFFE ID

A SPIFFE ID is a string that functions as the unique name for a service. It is modeled as a URI and is made up of several parts. For example, spiffe://example.com/myservice is a valid SPIFFE ID, where spiffe:// is the URI scheme, example.com is the trust domain name, and myservice is the name of the workload.
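To make the parts of a SPIFFE ID concrete, here is a minimal Go sketch that splits one into its trust domain and path using only the standard library. The function name parseSPIFFEID is my own for illustration, and the checks are deliberately simplified; a real project would more likely rely on an existing SPIFFE library for full validation.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// parseSPIFFEID splits a SPIFFE ID into its trust domain and path.
// Hypothetical helper for illustration: it only checks the scheme and
// that a trust domain is present, not the full SPIFFE ID rules.
func parseSPIFFEID(id string) (trustDomain, path string, err error) {
	u, err := url.Parse(id)
	if err != nil {
		return "", "", err
	}
	if u.Scheme != "spiffe" {
		return "", "", fmt.Errorf("scheme must be spiffe, got %q", u.Scheme)
	}
	if u.Host == "" {
		return "", "", errors.New("trust domain is empty")
	}
	return u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	td, path, err := parseSPIFFEID("spiffe://example.com/myservice")
	if err != nil {
		panic(err)
	}
	fmt.Println("trust domain:", td)
	fmt.Println("workload path:", path)
}
```

For the example ID above, the trust domain comes out as example.com and the path as myservice.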

  • Trust Domain

Trust domains are used to manage administrative and security boundaries within and between organizations, and every SPIFFE ID has the name of its trust domain embedded in it. A trust domain could represent an individual, organization, environment or department running their own independent SPIFFE infrastructure.

  • SPIFFE Verifiable Identity Document (SVID)

The SPIFFE Verifiable Identity Document (SVID) is a cryptographically-verifiable identity document that is used to prove a service’s identity to a peer. An SVID is considered valid if it has been signed by an authority within the SPIFFE ID’s trust domain. Two types of identity documents are defined for use as an SVID by the SPIFFE specifications: X.509 and JWT.
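In an X.509 SVID, the SPIFFE ID is carried in the certificate's URI Subject Alternative Name. The sketch below, written under a few assumptions, builds a throwaway self-signed certificate with such a URI and reads the ID back. The helper names newDemoCert and spiffeIDFromCert are hypothetical, and the self-signed certificate is only a stand-in: a real SVID is signed by an authority in the trust domain, and that signature check is not shown here.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net/url"
	"time"
)

// newDemoCert creates a throwaway self-signed certificate whose URI SAN
// holds the given SPIFFE ID. Illustration only, not a real SVID.
func newDemoCert(id string) (*x509.Certificate, error) {
	u, err := url.Parse(id)
	if err != nil {
		return nil, err
	}
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "demo"},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(time.Hour),
		URIs:         []*url.URL{u},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// spiffeIDFromCert returns the SPIFFE ID stored in the certificate's
// single URI SAN, or an error if the certificate does not look like an SVID.
func spiffeIDFromCert(cert *x509.Certificate) (string, error) {
	if len(cert.URIs) != 1 || cert.URIs[0].Scheme != "spiffe" {
		return "", fmt.Errorf("expected exactly one spiffe:// URI SAN, found %d URI SAN(s)", len(cert.URIs))
	}
	return cert.URIs[0].String(), nil
}

func main() {
	cert, err := newDemoCert("spiffe://example.com/myservice")
	if err != nil {
		panic(err)
	}
	id, err := spiffeIDFromCert(cert)
	if err != nil {
		panic(err)
	}
	fmt.Println("SPIFFE ID in certificate:", id)
}
```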

  • SPIFFE Workload API

The SPIFFE Workload API is a local, non-networked API that workloads use to get their current identity documents, trust bundles, and related information. More details are given here.

Workload API (source [1])

  • Trust Bundle

A trust bundle is a collection of one or more certificate authority (CA) root certificates that the workload should consider trustworthy. Each SVID type has a specific way that it is represented in this bundle (e.g. for X509-SVID, CA certificates representing the public key(s) are included).

SPIRE, the SPIFFE Runtime Environment

SPIRE is a production-ready implementation of the SPIFFE APIs that performs node and workload attestation in order to securely issue SVIDs to workloads and verify the SVIDs of other workloads, based on a predefined set of conditions.

A SPIRE deployment is composed of a SPIRE Server and one or more SPIRE Agents. A server acts as a signing authority for identities issued to a set of workloads via agents.

SPIRE server and agent (source [2])

Detailed descriptions of the server and agent, along with their components, are given here and here, respectively.

The basic concepts of SPIRE are as follows:

  • Workload Registration

Workload registration tells SPIRE how to identify the workload and which SPIFFE ID to give it. A registration entry maps an identity — in the form of a SPIFFE ID — to a set of properties known as selectors that the workload must possess in order to be issued a particular identity. During workload attestation, the agent uses these selector values to verify the workload’s identity.

  • Attestation

Attestation is the process through which information about workloads and their environment is discovered and asserted.

There are two flavors of attestation in SPIRE: node attestation and workload attestation.

  1. Node Attestation

Node attestation occurs when an agent starts for the first time. In node attestation, the agent contacts the SPIRE Server and enters into an exchange in which the server aims to positively identify the node the agent is running on and all its related selectors.

To accomplish this, a platform-specific plugin is exercised in both the agent and the server.

Below is an example of node attestation for a node running in AWS.

Node Attestation (source [1])

  1. The agent gathers proof of the node’s identity by calling an AWS API.
  2. The agent sends this proof of identity to the server.
  3. The server validates proof of identity obtained in step 2 by calling out to the AWS API and then creates a SPIFFE ID for the agent.
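The three steps above can be sketched in Go as a toy exchange. Everything here is hypothetical: the platformAPI interface and fakeCloud type stand in for the cloud provider API, the proof is a plain struct rather than a signed document, and the agent ID layout is made up for the example. It is not how the SPIRE node attestor plugins are implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeProof is a hypothetical stand-in for the identity document
// the agent obtains from the platform.
type nodeProof struct {
	InstanceID string
	Signature  string
}

// platformAPI is a hypothetical stand-in for the cloud provider API
// that the agent (step 1) and the server (step 3) both call.
type platformAPI interface {
	FetchProof() nodeProof
	Verify(p nodeProof) bool
}

// fakeCloud is an in-memory platform used only for this sketch.
type fakeCloud struct {
	instanceID string
	secret     string
}

func (c fakeCloud) FetchProof() nodeProof {
	return nodeProof{InstanceID: c.instanceID, Signature: c.secret}
}

func (c fakeCloud) Verify(p nodeProof) bool {
	return p.InstanceID == c.instanceID && p.Signature == c.secret
}

// attestNode plays the server's role: validate the proof with the
// platform, then create a SPIFFE ID for the agent (made-up ID layout).
func attestNode(api platformAPI, trustDomain string, p nodeProof) (string, error) {
	if !api.Verify(p) {
		return "", errors.New("node proof rejected by platform")
	}
	return fmt.Sprintf("spiffe://%s/agent/%s", trustDomain, p.InstanceID), nil
}

func main() {
	cloud := fakeCloud{instanceID: "i-0123", secret: "signed-by-platform"}

	proof := cloud.FetchProof()                        // step 1: agent gathers proof
	id, err := attestNode(cloud, "example.com", proof) // steps 2-3: server validates it
	if err != nil {
		panic(err)
	}
	fmt.Println("agent SPIFFE ID:", id)
}
```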

  2. Workload Attestation

Workload attestation is the process of determining a workload’s identity so that an identity document can be issued and delivered to it. Attestation occurs any time a workload establishes a connection to the SPIFFE Workload API (that is, on every RPC call the workload makes to the API), and from there the process is driven by a set of plugins on the SPIRE Agent.

The agent queries locally available authorities (such as the node’s OS kernel, or a local kubelet running on the same node) to determine the properties of the process calling the Workload API. These properties include the user ID (uid), group ID (gid), and filesystem path on a UNIX system, and the namespace and service account the process runs under in Kubernetes.

Workload Attestation (source [1])

1. A workload calls the Workload API to request an SVID.

2. The agent interrogates the node’s kernel to get the attributes of the calling process.

3. The agent gets the discovered selectors.

4. The agent determines the workload’s identity by comparing discovered selectors to registration entries and returns the correct SVID to the workload.
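Step 4 boils down to a set comparison: an entry applies when every selector it lists was discovered for the calling process. Here is a minimal Go sketch of that idea. The entry type, the matchEntries function, and the selector strings are illustrative assumptions, not the agent's actual data structures.

```go
package main

import "fmt"

// entry is a simplified, hypothetical registration entry:
// a SPIFFE ID plus the selectors a workload must present.
type entry struct {
	SPIFFEID  string
	Selectors []string
}

// matchEntries returns the SPIFFE IDs of all entries whose selectors
// are a subset of the selectors discovered for the calling process.
func matchEntries(discovered []string, entries []entry) []string {
	have := make(map[string]bool, len(discovered))
	for _, s := range discovered {
		have[s] = true
	}
	var ids []string
	for _, e := range entries {
		ok := len(e.Selectors) > 0 // an entry without selectors never matches here
		for _, s := range e.Selectors {
			if !have[s] {
				ok = false
				break
			}
		}
		if ok {
			ids = append(ids, e.SPIFFEID)
		}
	}
	return ids
}

func main() {
	// Selectors the agent discovered for the caller (example values).
	discovered := []string{"unix:uid:1000", "unix:gid:1000"}

	entries := []entry{
		{SPIFFEID: "spiffe://example.com/myservice", Selectors: []string{"unix:uid:1000"}},
		{SPIFFEID: "spiffe://example.com/other", Selectors: []string{"unix:uid:1000", "k8s:ns:prod"}},
	}
	fmt.Println(matchEntries(discovered, entries))
}
```

Only the first entry matches in this example, because the second one also requires a Kubernetes namespace selector that was not discovered.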

  • Registration entries

For SPIRE to issue workload identities, it must first be taught about the workloads expected or allowed in its environment: which workloads are supposed to run where, and what their SPIFFE IDs and general shape should be. SPIRE learns this information via registration entries, which are objects created and managed through the SPIRE APIs.

Three core attributes of registration entries are Parent ID (where a particular workload should be running), SPIFFE ID (when we see this workload, what SPIFFE ID should we issue it), and Binding info (binds SPIFFE IDs to the nodes and workloads that they are meant to represent).

A registration entry can describe either a group of nodes or a workload.

  1. Node Entries

Registration entries that describe a node (or a group of nodes) use selectors generated by node attestation to assign a SPIFFE ID, which can be referenced later when registering workloads. Node entries have their Parent ID set to the SPIFFE ID of the SPIRE Server, as it is the server that is performing attestation and asserting that the node in question does indeed match the selectors defined by the entry.

  2. Workload Entries

Registration entries that describe a workload use selectors generated by workload attestation to assign a SPIFFE ID to workloads when a certain set of conditions is met. When both the Parent ID and the selector conditions are met, the workload can receive a SPIFFE ID.
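To show how node entries and workload entries chain together through the Parent ID, here is a small Go sketch. The registrationEntry type, the resolve function, and the selector strings (such as node:role:web) are assumptions made for this example; the real entries are managed through the SPIRE APIs and carry more fields.

```go
package main

import "fmt"

// registrationEntry keeps only the three core attributes discussed above.
type registrationEntry struct {
	ParentID  string
	SPIFFEID  string
	Selectors []string
}

// containsAll reports whether every selector in want is present in have.
func containsAll(have, want []string) bool {
	set := make(map[string]bool, len(have))
	for _, s := range have {
		set[s] = true
	}
	for _, s := range want {
		if !set[s] {
			return false
		}
	}
	return len(want) > 0
}

// resolve first finds node entries (parented to the server) that match
// the node's selectors, then returns the SPIFFE IDs of workload entries
// parented to those nodes whose selectors match the workload.
func resolve(entries []registrationEntry, serverID string, nodeSel, workloadSel []string) []string {
	parents := map[string]bool{}
	for _, e := range entries {
		if e.ParentID == serverID && containsAll(nodeSel, e.Selectors) {
			parents[e.SPIFFEID] = true
		}
	}
	var ids []string
	for _, e := range entries {
		if parents[e.ParentID] && containsAll(workloadSel, e.Selectors) {
			ids = append(ids, e.SPIFFEID)
		}
	}
	return ids
}

// demoEntries returns one node entry and one workload entry (made-up values).
func demoEntries() []registrationEntry {
	return []registrationEntry{
		{ParentID: "spiffe://example.com/spire/server", SPIFFEID: "spiffe://example.com/web-nodes", Selectors: []string{"node:role:web"}},
		{ParentID: "spiffe://example.com/web-nodes", SPIFFEID: "spiffe://example.com/myservice", Selectors: []string{"unix:uid:1000"}},
	}
}

func main() {
	ids := resolve(demoEntries(), "spiffe://example.com/spire/server",
		[]string{"node:role:web"}, []string{"unix:uid:1000", "unix:gid:1000"})
	fmt.Println(ids)
}
```

A workload with uid 1000 only receives spiffe://example.com/myservice when it runs on a node that matched the web-nodes entry; on any other node the result is empty.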

Work Done

During the mentorship, I worked on implementing a new SPIRE health subsystem. Before this work, SPIRE used go-health to make health calls to its subsystems and collect the responses. For /live checks, a simple HTTP 200 response validated the liveness of the subsystem, and for /ready checks, the server’s ability to fetch a bundle served as the readiness check.

The new implementation uses separate liveness and readiness checks for each subsystem and aggregates them to determine the overall readiness and liveness of the server and agent. For example, for CA health checks, both readiness and liveness are determined by whether or not the X509 CA was successfully signed.

For the server, we used CA, manager, and catalog datastore as the most prominent subsystems to determine health checks. This is just the initial design and more subsystems might be included to better determine the health checks.
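The aggregation idea can be sketched in a few lines of Go. This is a simplified illustration under my own assumptions, not the code that was merged: the state, checkable, and checkFunc names are chosen for this example and may not match the SPIRE codebase, and the rule used here is simply that the process is live (or ready) only when every registered subsystem is.

```go
package main

import "fmt"

// state holds the two health signals reported by a subsystem.
type state struct {
	Live  bool
	Ready bool
}

// checkable is a hypothetical interface each subsystem would implement.
type checkable interface {
	CheckHealth() state
}

// checkFunc adapts a plain function to the checkable interface.
type checkFunc func() state

func (f checkFunc) CheckHealth() state { return f() }

// aggregate runs every subsystem check and combines the results:
// the overall state is live/ready only if all subsystems are.
func aggregate(checks map[string]checkable) (state, map[string]state) {
	overall := state{Live: true, Ready: true}
	details := make(map[string]state, len(checks))
	for name, c := range checks {
		s := c.CheckHealth()
		details[name] = s
		overall.Live = overall.Live && s.Live
		overall.Ready = overall.Ready && s.Ready
	}
	return overall, details
}

func main() {
	checks := map[string]checkable{
		// The subsystem names and results below are made up for the demo.
		"ca":        checkFunc(func() state { return state{Live: true, Ready: true} }),
		"datastore": checkFunc(func() state { return state{Live: true, Ready: false} }),
	}
	overall, details := aggregate(checks)
	fmt.Printf("overall: live=%t ready=%t\n", overall.Live, overall.Ready)
	fmt.Printf("datastore: %+v\n", details["datastore"])
}
```

Keeping the per-subsystem details alongside the overall state makes it easier to see which subsystem caused a failed check.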

To enable the server health checks, add the following lines in conf/server/server.conf:

health_checks {
    listener_enabled = true
    bind_address = "localhost"
    bind_port = "8080"
    live_path = "/live"
    ready_path = "/ready"
}

Now we can query localhost on port 8080 (using cURL) to get the results of the liveness and readiness checks:

$ curl http://localhost:8080/live # for liveness check
$ curl http://localhost:8080/ready # for readiness check
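The same two requests can be made from Go, for example in a small monitoring script. The sketch below is self-contained, so instead of a running SPIRE server it starts a local test server that exposes /live and /ready. The handler behavior is an assumption for the demo (HTTP 200 when the check passes, 503 otherwise), and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// statusHandler answers 200 when ok() is true and 503 otherwise.
// The 503 status for a failing check is an assumption of this sketch.
func statusHandler(ok func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		if ok() {
			w.WriteHeader(http.StatusOK)
			return
		}
		w.WriteHeader(http.StatusServiceUnavailable)
	}
}

// newHealthMux exposes the two paths from the configuration above.
func newHealthMux(live, ready func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/live", statusHandler(live))
	mux.Handle("/ready", statusHandler(ready))
	return mux
}

// probe performs the equivalent of the curl calls and returns the status code.
func probe(baseURL, path string) (int, error) {
	resp, err := http.Get(baseURL + path)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

// probeDemo starts a throwaway local server and probes both endpoints.
func probeDemo(live, ready bool) (liveCode, readyCode int, err error) {
	srv := httptest.NewServer(newHealthMux(
		func() bool { return live },
		func() bool { return ready },
	))
	defer srv.Close()

	if liveCode, err = probe(srv.URL, "/live"); err != nil {
		return 0, 0, err
	}
	if readyCode, err = probe(srv.URL, "/ready"); err != nil {
		return 0, 0, err
	}
	return liveCode, readyCode, nil
}

func main() {
	liveCode, readyCode, err := probeDemo(true, false)
	if err != nil {
		panic(err)
	}
	fmt.Println("/live:", liveCode)
	fmt.Println("/ready:", readyCode)
}
```

Against a real server configured as above, calling probe("http://localhost:8080", "/live") would take the place of the first curl command.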

Conclusion

LFX mentorship was a wonderful experience for me. I learned a lot about microservice security and authentication, used tools like Delve and SPIRE, and picked up Golang best practices. I will try my best to keep contributing to SPIFFE/SPIRE. Thanks to Evan Gilman and Andrew Harding for all their help and guidance and for making this mentorship a fun learning experience. If you’re interested in contributing to SPIFFE or SPIRE, please join the community!

References

  1. Solving the Bottom Turtle
  2. SPIFFE/SPIRE documentation
