Overview

MergeTB is a large distributed system with a number of moving parts. The primary components in a Merge deployment are the Merge portal and a collection of testbed facilities. The portal provides user accounts, projects, shared storage, experiment control, access to running experiments, a web interface, and other user-facing services. Testbed facilities provide the resources that underpin experiments. A good analogy comes from operating systems: the portal is like user space and the testbed facility is like kernel space.

A single Merge portal can preside over many testbed facilities and support experimentation that spans those facilities. The portal runs on Kubernetes, and testbed facilities run on the Cogs.

The diagram above shows how things are organized. In the sections below we describe the various components in terms of API functionalities accessible to users.

Portal

The portal is hosted largely in a Kubernetes cluster. The one exception is the portal ingress systems, which combine an application router and a firewall and provide access to the Merge API and to experiment development containers (XDCs).

Below the ingress layer sit three deployments: the Merge API; jumpc, a bastion deployment that provides SSH access to XDCs; and an HTTP proxy deployment that provides web access to the Jupyter interfaces that run on the XDCs.

The Merge API is how clients interact with Merge. Currently there are two main clients: the Merge web interface and a command line client. The API is defined by an OpenAPI 2.0 spec and allows clients to do things like create, realize, and materialize experiments, manage projects, and attach XDCs to experiment networks.

The policy layer sits between the API and the core services. It polices calls made to the API by clients to see if the caller is authorized to take the requested action. All requests from the API go through the policy layer before hitting the core services.
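The gatekeeping role described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the role names, action names, the `POLICY` table and the `handle` function are assumptions made for illustration and are not Merge's actual policy vocabulary or implementation.

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple

# Hypothetical policy table: the (role, action) pairs that are allowed.
POLICY: Set[Tuple[str, str]] = {
    ("member", "experiment.realize"),
    ("maintainer", "experiment.realize"),
    ("maintainer", "project.add_member"),
}


@dataclass
class Request:
    caller: str  # who is making the API call
    role: str    # caller's role in the target project (assumed model)
    action: str  # the requested action


class PolicyError(Exception):
    """Raised when a caller is not authorized for the requested action."""


def authorize(req: Request) -> None:
    # Reject the request unless the policy table explicitly allows it.
    if (req.role, req.action) not in POLICY:
        raise PolicyError(f"{req.caller} ({req.role}) may not {req.action}")


def handle(req: Request, core_service: Callable[[Request], str]) -> str:
    # Every API request passes the policy check before a core service runs.
    authorize(req)
    return core_service(req)


def realize(req: Request) -> str:
    # Stand-in for a core service.
    return f"realized for {req.caller}"
```

In this sketch the core service is only ever reached through `handle`, which mirrors the statement that all requests pass through the policy layer first.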

The core services implement the Merge API. This includes creating projects and managing membership, creating and destroying experiments, realizing and materializing experiments, compiling, analyzing, and reticulating experiments, and managing the source code experiments are based on.

Experiment development containers (XDCs) are on-demand experiment development and automation environments for users. They are launched through the Merge API and are accessible through jumpc and the HTTP proxy. Within Kubernetes they are managed by a Kubernetes operator called the xdc-operator.

XDCs attach to materialized experiment networks through the WireGuard coordinator, wgcoord. When a user requests that their XDC be attached to an experiment, wgcoord sets up a WireGuard-based VPN between the XDC and the testbed facilities the experiment is materialized on.

Facilities

Testbed facilities are built on the Merge testbed technology stack. The top-level component of a MergeTB facility is the commander. The commander presents the Merge materialization API, a protocol between a Merge portal and testbed facilities that allows experiments to be materialized. However, the commander itself does not implement the API. Instead, it provides hooks that components called drivers use to register for Merge materialization commands, so it is primarily a delegation point.
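The delegation pattern described above can be sketched as a small registry. This is an illustrative model only: the `Commander` class, its `register` and `dispatch` methods, and the command name `"setup"` are hypothetical and do not reflect the real commander's interface.

```python
from typing import Callable, Dict, List

# A handler receives a materialization payload and returns a status string.
Handler = Callable[[dict], str]


class Commander:
    """Hypothetical delegation point: it implements no commands itself."""

    def __init__(self) -> None:
        self._drivers: Dict[str, List[Handler]] = {}

    def register(self, command: str, handler: Handler) -> None:
        # A driver registers to receive a given materialization command.
        self._drivers.setdefault(command, []).append(handler)

    def dispatch(self, command: str, payload: dict) -> List[str]:
        # Forward the command to every registered driver, in registration order.
        handlers = self._drivers.get(command, [])
        if not handlers:
            raise LookupError(f"no driver registered for {command!r}")
        return [handler(payload) for handler in handlers]


def cogs_driver(payload: dict) -> str:
    # Stand-in for a driver that would turn the request into tasks.
    return f"queued {payload['experiment']}"
```

A caller would register `cogs_driver` for a command and then dispatch to it; a command with no registered driver fails loudly rather than being silently dropped.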

Drivers are a part of the Cogs testbed automation system. When they receive delegated commands from a commander, their job is to take the relatively high-level information provided in the materialization request and turn it into actionable tasks that result in the provisioning of the requested resources. The drivers also calculate dependencies between tasks. The materialization interface between the portal and testbed facilities is a batch interface, and many materialization fragments can be received at once. The driver takes all the fragments received in a batch, creates a directed acyclic graph (DAG) based on the interdependencies between tasks, and saves the DAG to the Cogs runtime configuration database.
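As a rough illustration of turning a batch of fragments into a task DAG, the sketch below uses Python's standard `graphlib` module. The fragment layout (an `id` plus a list of `needs`) and the task names are assumptions made for the example; the actual fragment format and the way the Cogs derives dependencies are not described here.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def build_task_graph(fragments: List[dict]) -> Dict[str, Set[str]]:
    """Map each task id to the set of task ids it depends on."""
    graph = {f["id"]: set(f.get("needs", ())) for f in fragments}

    # Every dependency must refer to a task in the same batch.
    known = set(graph)
    for task, deps in graph.items():
        missing = deps - known
        if missing:
            raise ValueError(f"{task} depends on unknown tasks: {sorted(missing)}")

    # prepare() raises graphlib.CycleError if the graph is not acyclic.
    TopologicalSorter(graph).prepare()
    return graph


# Hypothetical batch of materialization fragments.
batch = [
    {"id": "vlan", "needs": []},
    {"id": "dhcp", "needs": ["vlan"]},
    {"id": "image", "needs": ["dhcp"]},
    {"id": "power", "needs": []},
]
task_graph = build_task_graph(batch)
```

The returned dictionary is the kind of structure that could then be persisted; here it simply stays in memory.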

The Rex service watches the runtime configuration database and executes any pending tasks it finds. Rex uses the DAG structure of the task graph to maximize parallelism when executing tasks, forking asynchronous jobs whenever possible. To actually accomplish tasks, Rex has to interact with a number of testbed subsystems such as DHCP/DNS and node configuration. This is where the testbed technology stack comes in.
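The idea of using the DAG to run independent tasks concurrently can be sketched with a thread pool and the standard `graphlib` module. This is a generic illustration of dependency-aware scheduling, not Rex's implementation; `run_graph`, the task names, and the no-op task body are all hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_graph(graph: Dict[str, Set[str]],
              run_task: Callable[[str], None],
              max_workers: int = 4) -> List[str]:
    """Run every task once all of its dependencies have finished.

    Returns task ids in the order they were observed to complete.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    finished: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {}  # future -> task id
        while sorter.is_active():
            # Submit everything whose dependencies are satisfied.
            for task in sorter.get_ready():
                pending[pool.submit(run_task, task)] = task
            # Block until at least one running task completes.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                task = pending.pop(future)
                future.result()    # re-raise any task failure
                finished.append(task)
                sorter.done(task)  # unblocks dependent tasks
    return finished


# Hypothetical task graph: task id -> ids it depends on.
demo_graph = {"vlan": set(), "dhcp": {"vlan"}, "image": {"dhcp"}, "power": set()}
completed = run_graph(demo_graph, lambda task: None)
```

In this example `vlan` and `power` have no dependencies and may run at the same time, while `dhcp` and `image` wait for their predecessors.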

The MergeTB testbed technology stack is a set of API-driven modular components that cover the core capabilities any network testbed needs. Every component in the stack is completely self-contained, so individual components can be deployed within testbed facilities as needed. What allows Rex to automate these technologies collectively is that they all implement gRPC interfaces. For example, when Rex sets up DHCP/DNS for an experiment, it does so through a well-defined RPC interface implemented by the Nex service. The available testbed technologies are listed below in the Resource Space section.

Source Code

All of the MergeTB source code is on GitLab.

In addition to being distributed across a number of runtime components, MergeTB is also distributed across a number of code repositories. MergeTB is broadly partitioned into two problem spaces:

  • Support for experimentation
  • Support for managing and operating resources

The distinction between these two problem spaces is quite similar to user space and the kernel in operating systems.

Experimentation Space

Experimentation space revolves around supporting experimenters directly via

The portal is a centralized entity that presides over a network of testbed facilities.

Resource Space

Resource space revolves around resource management and materialization of experiments through automated provisioning. MergeTB testbed facilities are built on the MergeTB technology stack. This technology stack is a set of modular components, each providing a specific testbed facility functionality. The components are orchestrated by an experiment materialization engine called the Cogs.

The primary elements of the MergeTB testbed technology stack are listed below. The common theme is that all of these components are API-driven, expose gRPC interfaces, and are completely independent systems.

  • Foundry: A node configuration service and client for configuring nodes according to experiment specifications.
  • Images: A set of base operating system image builds.
  • Gobble: An EVPN service endpoint for connecting testbed services to experiment nodes over isolated networks.
  • Beluga: A modular power controller built on a plugin architecture for controlling the power states of nodes across multiple types of power distribution units.
  • Rally: A network mass storage system built on Ceph, capable of provisioning file systems and block devices for experimentation.
  • Sled: A node imaging system for stamping operating system images onto nodes.
  • Nex: An automation friendly DHCP/DNS server.
  • Canopy: A virtual network synthesis framework for building VXLAN and VLAN based virtual networks across a switching mesh.
  • Cogs: A materialization automation system that has hooks for all the aforementioned.