Lab Topology Builder Documentation¶
The Lab Topology Builder (LTB) is an open source project that allows users to deploy networking labs on Kubernetes. It lets you build a topology of virtual machines and containers that are connected to each other according to the network topology you have defined.
To get started, please refer to the User Guide.
Features¶
- Management of networking labs on Kubernetes
- Status querying of lab deployments
- Remote access to lab nodes' console via web browser
- Remote access to lab nodes' OOB management (e.g. SSH)
- Management of custom node types
Roadmap¶
User Guide¶
Installation Pre-requisites¶
Tool | Version | Installation | Description |
---|---|---|---|
Kubernetes | ^1.26.0 | Installation | Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. |
Kubevirt | 0.59.0 | Installation | Kubevirt is a Kubernetes add-on to run virtual machines on Kubernetes. |
Multus-CNI | 3.9.0 | Installation | Multus-CNI is a plugin for K8s to attach multiple network interfaces to pods. |
Operator Lifecycle Manager | ^0.24.0 | Installation | Operator Lifecycle Manager (OLM) helps users install, update, and manage the lifecycle of all Operators and their associated services running across their Kubernetes clusters. |
Alternative OLM installation:
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.25.0/install.sh | bash -s v0.25.0
Change the version to the desired one.
Installation of the LTB K8s Operator¶
- Install the LTB Operator by creating a catalog source and subscription:
kubectl apply -f https://raw.githubusercontent.com/Lab-Topology-Builder/LTB-K8s-Backend/main/install/catalogsource.yaml -f https://raw.githubusercontent.com/Lab-Topology-Builder/LTB-K8s-Backend/main/install/subscription.yaml
- Wait for the LTB Operator to be installed (this might take a few seconds):
kubectl get csv -n operators -w
Usage¶
To create a lab you'll need to create at least one node type and one lab template. Node types define the basic properties of a node. For VMs this includes everything that can be defined in a Kubevirt VirtualMachineSpec and for pods everything that can be defined in a Kubernetes PodSpec.
To provide better reusability of node types, you can use Go templating syntax to include information from the lab template (such as configuration or node name) in the node type. The following example node types show how this can be done; you can use them as a starting point for your own node types.
Example Node Type¶
This is an example of a VM node type. It creates a VM with 2 vCPUs and 4 GB of RAM, using the Ubuntu 22.04 container disk image from quay.io/containerdisks/ubuntu and the cloudInitNoCloud volume source to provide a cloud-init configuration to the VM.
Everything that is defined in the node field of the lab template is available to the node type via the . (dot) variable. For example, {{ .Name }} will be replaced with the name of the node from the lab template.
Currently, you cannot provide the cloud-init configuration as a YAML string via the .Config field of the lab template. Instead, you have to encode it as a base64 string and use the userDataBase64 field of the volume source, because of indentation issues while rendering the configuration.
apiVersion: ltb-backend.ltb/v1alpha1
kind: NodeType
metadata:
  name: nodetypeubuntuvm
spec:
  kind: vm
  nodeSpec: |
    running: true
    template:
      spec:
        domain:
          resources:
            requests:
              memory: 4096M
          cpu:
            cores: 2
          devices:
            disks:
              - name: containerdisk
                disk:
                  bus: virtio
              - name: cloudinitdisk
                disk:
                  bus: virtio
        terminationGracePeriodSeconds: 0
        volumes:
          - name: containerdisk
            containerDisk:
              image: quay.io/containerdisks/ubuntu:22.04
          - name: cloudinitdisk
            cloudInitNoCloud:
              userDataBase64: {{ .Config }}
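The base64 string for the config field can be produced on the command line. A minimal sketch, assuming your cloud-init configuration is saved locally as cloud-init.yaml (the file name is only an example; base64 -w0 assumes GNU coreutils on Linux):
# Encode the cloud-init file as a single base64 line for the lab template's config field
base64 -w0 cloud-init.yaml
# Decode an existing config value to inspect it
echo "<base64-string>" | base64 -d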
This is an example of a generic pod node type. It creates a pod with a single container. The container name, container image, command and ports to expose are taken from the lab template.
apiVersion: ltb-backend.ltb/v1alpha1
kind: NodeType
metadata:
  name: genericpod
spec:
  kind: pod
  nodeSpec: |
    containers:
      - name: {{ .Name }}
        image: {{ .NodeTypeRef.Image }}:{{ .NodeTypeRef.Version }}
        command: {{ .Config }}
        ports:
        {{- range $index, $port := .Ports }}
          - name: {{ $port.Name }}
            containerPort: {{ $port.Port }}
            protocol: {{ $port.Protocol }}
        {{- end }}
After you have defined some node types, you can create a lab template. A lab template defines the nodes that should be created for a lab, how they should be configured and how they should be connected.
Example Lab Template¶
This is an example of a lab template that can be used as a starting point for your own labs.
It uses the previously defined node types to create a VM and two pods. They are referenced via the nodeTypeRef field.
The provided ports will be exposed to the host network and can be accessed via the node's IP address and the port number assigned by Kubernetes. You can retrieve the IP address of a node by running kubectl get node -o wide and the port number by running kubectl get svc.
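For example, the lookup could be done like this (the exact output columns depend on your cluster):
# Show node IP addresses (INTERNAL-IP / EXTERNAL-IP columns)
kubectl get node -o wide
# Show the services created for the lab and the ports assigned by Kubernetes
kubectl get svc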
Currently, there is no support for point-to-point connections between nodes. Instead, they are all connected to the same network. In the future, we plan to add support for point-to-point connections, which can be defined as neighbors in the lab template. The syntax for this is not yet final, but it will probably look something like this:
neighbors:
- "sample-node-1:1,sample-node-2:1"
- "sample-node-2:2-sample-node-3:1"
This would connect the first port of sample-node-1 to the first port of sample-node-2, and the second port of sample-node-2 to the first port of sample-node-3.
apiVersion: ltb-backend.ltb/v1alpha1
kind: LabTemplate
metadata:
  name: labtemplate-sample
spec:
  nodes:
    - name: "sample-node-1"
      nodeTypeRef:
        type: "nodetypeubuntuvm"
      config: "I2Nsb3VkLWNvbmZpZwpwYXNzd29yZDogdWJ1bnR1CmNocGFzc3dkOiB7IGV4cGlyZTogRmFsc2UgfQpzc2hfcHdhdXRoOiBUcnVlCnBhY2thZ2VzOgogLSBxZW11LWd1ZXN0LWFnZW50CiAtIGNtYXRyaXgKcnVuY21kOgogLSBbIHN5c3RlbWN0bCwgc3RhcnQsIHFlbXUtZ3Vlc3QtYWdlbnQgXQo="
      ports:
        - name: "ssh"
          port: 22
          protocol: "TCP"
    - name: "sample-node-2"
      nodeTypeRef:
        type: "genericpod"
        image: "ghcr.io/insrapperswil/network-ninja"
        version: "latest"
      ports:
        - name: "ssh"
          port: 22
          protocol: "TCP"
      config: '["/bin/bash", "-c", "apt update && apt install -y openssh-server && service ssh start && sleep 365d"]'
    - name: "sample-node-3"
      nodeTypeRef:
        type: "genericpod"
        image: "ubuntu"
        version: "22.04"
      ports:
        - name: "ssh"
          port: 22
          protocol: "TCP"
      config: '["/bin/bash", "-c", "apt update && apt install -y openssh-server && service ssh start && sleep 365d"]'
With the lab template defined, you can create a lab instance.
Example Lab Instance¶
This is an example of a lab instance that can be used as a starting point for your own labs.
The lab instance references the previously defined lab template via the labTemplateReference field.
You also need to provide a DNS address via the dnsAddress field. This address is used to create routes for the web terminal to the lab nodes.
For example, if you use the address example.com, the console of a node called sample-node-1 will be available at https://labinstance-sample-sample-node-1.example.com/ via a web terminal.
Currently, a lab instance cannot be edited after it has been created. If you want to change the lab, you have to delete the lab instance and create a new one.
apiVersion: ltb-backend.ltb/v1alpha1
kind: LabInstance
metadata:
  name: labinstance-sample
spec:
  labTemplateReference: "labtemplate-sample"
  dnsAddress: "example.com"
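As a usage sketch, assuming the manifests above are saved as nodetype.yaml, labtemplate.yaml and labinstance.yaml (the file names are only examples), the lab can be created and inspected like this:
kubectl apply -f nodetype.yaml -f labtemplate.yaml -f labinstance.yaml
# The LTB Operator updates the status of the lab instance
kubectl get labinstances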
Uninstall¶
- Delete the subscription:
kubectl delete subscriptions.operators.coreos.com -n operators ltb-subscription
- Delete the CSV:
kubectl delete csv -n operators ltb-operator.<version>
- Delete the CRDs:
kubectl delete crd labinstances.ltb-backend.ltb labtemplates.ltb-backend.ltb nodetypes.ltb-backend.ltb
- Delete the operator:
kubectl delete operator ltb-operator.operators
- Delete the CatalogSource:
kubectl delete catalogsource.operators.coreos.com -n operators ltb-catalog
Concepts¶
Lab Topology Builder¶
The Lab Topology Builder is a network emulator that allows you to build a topology of virtual machines and containers, which are connected to each other according to the network topology you have defined.
Network Topology¶
The arrangement or pattern in which all nodes on a network are connected together is referred to as the network’s topology.
Here is an example of a network topology:
Lab¶
In our context, a lab refers to a networking lab consisting of interconnected nodes following a specific network topology.
LTB Operator¶
The LTB Operator is a K8s Operator for the LTB application, which is responsible for creating, configuring, and managing the emulated network topologies of the LTB application inside a Kubernetes cluster. It also automatically updates the status of the labs based on the current state of the associated containers and virtual machines, ensuring accurate and real-time lab information.
Lab Template¶
A LabTemplate is a Kubernetes custom resource (CR) that defines a template for a lab. It contains information about which nodes are part of the lab, their configuration, and how they are connected to each other.
Lab Instance¶
A LabInstance is a custom resource (CR) that describes a specific lab intended for deployment within a Kubernetes cluster.
It has a reference to the LabTemplate you want to use and also has a status field that is updated by the LTB Operator. This status field shows how many pods and VMs are running in the lab and the status of the LabInstance itself. In addition, it has a DNS address field that is used to access the nodes via the web-based terminal.
Node Type¶
In a network, a node represents any device that is part of the lab. A NodeType is a CR that defines a type of node that can be part of a lab. You reference the NodeType you want to have in your lab in the LabTemplate.
Within LTB, a node can be either a KubeVirt virtual machine or a regular Kubernetes pod.
Links¶
If you would like to familiarize yourself with the Kubernetes concepts mentioned above, please refer to the following links:
API Reference¶
Packages¶
ltb-backend.ltb/v1alpha1¶
Resource Types¶
LabInstance¶
A lab instance is created as a specific instance of a deployed lab, using the configuration from the corresponding lab template.
Field | Description |
---|---|
apiVersion string | ltb-backend.ltb/v1alpha1 |
kind string | LabInstance |
metadata ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec LabInstanceSpec | |
LabInstanceNodes¶
Configuration for a lab node.
Appears in: - LabTemplateSpec
Field | Description |
---|---|
name string | The name of the lab node. |
nodeTypeRef NodeTypeRef | The type of the lab node. |
interfaces NodeInterface array | Array of interface configurations for the lab node. (currently not supported) |
config string | The configuration for the lab node. |
ports Port array | Array of ports which should be publicly exposed for the lab node. |
LabInstanceSpec¶
LabInstanceSpec define which LabTemplate should be used for the lab instance and the DNS address.
Appears in: - LabInstance
Field | Description |
---|---|
labTemplateReference string | Reference to the name of a LabTemplate to use for the lab instance. |
dnsAddress string | The DNS address which will be used to expose the lab instance. It should point to the Kubernetes node where the lab instance is running. |
LabTemplate¶
Defines the lab topology, its nodes and their configuration.
Field | Description |
---|---|
apiVersion string | ltb-backend.ltb/v1alpha1 |
kind string | LabTemplate |
metadata ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec LabTemplateSpec | |
LabTemplateSpec¶
LabTemplateSpec defines the Lab nodes and their connections.
Appears in: - LabTemplate
Field | Description |
---|---|
nodes LabInstanceNodes array | Array of lab nodes and their configuration. |
neighbors string array | Array of connections between lab nodes. (currently not supported) |
NodeInterface¶
Interface configuration for the lab node (currently not supported)
Appears in: - LabInstanceNodes
Field | Description |
---|---|
ipv4 string | IPv4 address of the interface. |
ipv6 string | IPv6 address of the interface. |
NodeType¶
NodeType defines a type of node that can be used in a lab template
Field | Description |
---|---|
apiVersion string | ltb-backend.ltb/v1alpha1 |
kind string | NodeType |
metadata ObjectMeta | Refer to the Kubernetes API documentation for fields of metadata. |
spec NodeTypeSpec | |
NodeTypeRef¶
NodeTypeRef references a NodeType with the possibility to provide additional information to the NodeType.
Appears in: - LabInstanceNodes
Field | Description |
---|---|
type string | Reference to the name of a NodeType. |
image string | Image to use for the NodeType. It is available as a variable in the NodeType; its effect depends on how the NodeType uses it. |
version string | Version of the NodeType. It is available as a variable in the NodeType; its effect depends on how the NodeType uses it. |
NodeTypeSpec¶
NodeTypeSpec defines the Kind and NodeSpec for a NodeType
Appears in: - NodeType
Field | Description |
---|---|
kind string | Specifies whether the node is a pod or a VM. |
nodeSpec string | The PodSpec or VirtualMachineSpec configuration for the node. Go templating syntax can be used to include LabTemplate variables (see the User Guide). See PodSpec and VirtualMachineSpec. |
Port¶
Port of a lab node which should be publicly exposed.
Appears in: - LabInstanceNodes
Field | Description |
---|---|
name string | Arbitrary name for the port. |
protocol Protocol | Either TCP or UDP. |
port integer | The port number to expose. |
Contributions¶
Contributor Guide¶
Contributions are welcome and appreciated.
How it works¶
This project aims to follow the Kubernetes Operator pattern.
It uses controllers, which provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster.
Development Environment¶
You’ll need a Kubernetes cluster to run against. You can find instructions on how to setup your dev cluster in the Dev Cluster Setup section.
Note: Your controller will automatically use the current context in your kubeconfig file (i.e. whatever cluster kubectl cluster-info shows).
We recommend using VSCode with the Remote - Containers extension. This will allow you to use our devcontainer, which has all the tools needed to develop and test the LTB Operator already installed.
Prerequisites for recommended IDE setup¶
- Docker
- VSCode
- Remote - Containers extension
Getting Started¶
- Clone the repository
- Open the repository in VSCode
- Click the popup or use the command palette to reopen the repository in a container (Dev Containers: Reopen in Container)
Now you are ready to start developing!
Running the LTB Operator locally¶
You can run the LTB Operator locally on your machine. This is useful for quick testing and debugging. Be aware that the LTB Operator will use the kubeconfig file on your machine, so make sure the current context points to the cluster you want to run against. Also note that, when run locally, it does not use the RBAC rules it would normally be deployed with.
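For example, to check which context is currently active and, if necessary, switch to the desired cluster:
kubectl config current-context
kubectl config use-context <your-dev-cluster-context>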
- Install the CRDs into the cluster:
make install
- Run your controller (this will run in the foreground, so switch to a new terminal if you want to leave it running):
make run
NOTE: You can also run this in one step by running: make install run
Uninstall CRDs¶
To delete the CRDs from the cluster:
make uninstall
Running the LTB Operator on the cluster¶
You can also run the LTB Operator on the cluster. This is useful for testing it in a more realistic environment. However, you will first need to log in to a container registry that the cluster can access, so that you can push the LTB Operator image to that registry. This also allows you to test the LTB Operator's RBAC rules.
Make sure to replace <some-registry> with the location of your container registry and <tag> with the tag you want to use.
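If the registry requires authentication, log in first. A sketch using the GitHub Container Registry as an example (any registry reachable by the cluster works):
docker login ghcr.io -u <username>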
- Install Instances of Custom Resources:
kubectl apply -f config/samples/
- Build and push your image to the location specified by IMG:
make docker-build docker-push IMG=<some-registry>/ltb-operator:<tag>
- Deploy the controller to the cluster with the image specified by IMG:
make deploy IMG=<some-registry>/ltb-operator:<tag>
Undeploy controller¶
Undeploy the controller from the cluster:
make undeploy
Modifying the API definitions¶
If you are editing the API definitions, generate the manifests such as CRs or CRDs using:
make manifests
NOTE: Run make --help for more information on all available make targets.
You can find more information on how to develop the operator in the Operator-SDK Documentation and the Kubebuilder Documentation.
Coding Conventions¶
We are following the Effective Go guidelines for coding conventions. The following is a summary of the most important conventions.
Naming¶
The following naming conventions are used in the project:
Naming conventions in Go¶
- camelCase for variables and functions, which are not exported
- PascalCase for types and functions that need to be exported
Examples¶
- labInstanceStatus: variable name for a status of a lab instance
- UpdateLabInstanceStatus: name for an exported function, starts with a capital letter
Formatting¶
We use gofmt, the standard Go formatting tool, to format our code.
staticcheck is used as a linter in addition to the formatting guidelines from Effective Go, because it is the default linter of the Go extension for VS Code.
Development cluster setup¶
Steps to setup a Kubernetes development cluster for testing and development of the LTB Operator.
Prerequisites Remote Cluster¶
- Server with Linux OS (Recommended Ubuntu 22.04)
Prepare Node¶
sudo apt update
sudo apt upgrade -y
sudo swapoff -a
sudo sed -ri '/\sswap\s/s/^#?/#/' /etc/fstab
RKE2 Server Configuration¶
sudo mkdir -p /etc/rancher/rke2
sudo vim /etc/rancher/rke2/config.yaml
# /etc/rancher/rke2/config.yaml
write-kubeconfig-mode: "0644"
kube-apiserver-arg: "allow-privileged=true"
cni: multus,cilium
disable-kube-proxy: true
Cilium Configuration for Multus¶
sudo mkdir -p /var/lib/rancher/rke2/server/manifests
sudo vim /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
# /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
# k8sServiceHost/Port IP of Control Plane node default Port 6443
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cni:
      chainingMode: "none"
      exclusive: false
    kubeProxyReplacement: strict
    k8sServiceHost: "<NodeIP>"
    k8sServicePort: 6443
    operator:
      replicas: 1
Install and start Server and check logs¶
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION=v1.26.0+rke2r2 sudo -E sh -
sudo systemctl enable rke2-server.service
sudo systemctl start rke2-server.service
sudo journalctl -u rke2-server -f
Add Kubernetes tools to path and set kubeconfig¶
Adds kubectl, crictl and ctr to path.
echo 'export PATH="$PATH:/var/lib/rancher/rke2/bin"' >> ~/.bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
echo 'alias k=kubectl' >> ~/.bashrc
echo 'complete -o default -F __start_kubectl k' >>~/.bashrc
source ~/.bashrc
mkdir ~/.kube
ln -s /etc/rancher/rke2/rke2.yaml ~/.kube/config
Get Token for Agent¶
sudo cat /var/lib/rancher/rke2/server/node-token
RKE2 Agent Configuration (Optional)¶
sudo mkdir -p /etc/rancher/rke2
sudo vim /etc/rancher/rke2/config.yaml
# /etc/rancher/rke2/config.yaml
---
server: https://<server>:9345
token: <token from server node>
Install and start Agent and check logs¶
curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" INSTALL_RKE2_VERSION=v1.26.0+rke2r2 sudo -E sh -
sudo systemctl enable rke2-agent.service
sudo systemctl start rke2-agent.service
sudo journalctl -u rke2-agent -f
Install Cluster Network Addons Operator¶
The Cluster Network Addons Operator can be used to deploy additional networking components. Multus and Cilium are already installed via RKE2. The Open vSwitch CNI Plugin can be installed via this operator.
First install the operator itself:
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.85.0/namespace.yaml
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.85.0/network-addons-config.crd.yaml
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.85.0/operator.yaml
Then create a configuration for the operator by applying the example CR:
kubectl apply -f https://github.com/kubevirt/cluster-network-addons-operator/releases/download/v0.85.0/network-addons-config-example.cr.yaml
Wait until the operator has finished the installation:
kubectl wait networkaddonsconfig cluster --for condition=Available
Kubevirt¶
Kubevirt is a Kubernetes add-on to run virtual machines.
Validate Hardware Virtualization Support¶
sudo apt install libvirt-clients
sudo virt-host-validate qemu
Install Kubevirt¶
To use the latest release:
export RELEASE=$(curl https://storage.googleapis.com/kubevirt-prow/release/kubevirt/kubevirt/stable.txt)
Or pin a specific version:
export RELEASE=v0.58.1
# Deploy the KubeVirt operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-operator.yaml
# Create the KubeVirt CR (instance deployment request) which triggers the actual installation
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${RELEASE}/kubevirt-cr.yaml
# wait until all KubeVirt components are up
kubectl -n kubevirt wait kv kubevirt --for condition=Available
Install Containerized Data Importer¶
export CDI_VERSION=v1.55.2
kubectl create ns cdi
kubectl -n cdi apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/$CDI_VERSION/cdi-operator.yaml
kubectl -n cdi apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/$CDI_VERSION/cdi-cr.yaml
Install virtctl via Krew¶
First install Krew and then install virtctl via Krew
(
set -x; cd "$(mktemp -d)" &&
OS="$(uname | tr '[:upper:]' '[:lower:]')" &&
ARCH="$(uname -m | sed -e 's/x86_64/amd64/' -e 's/\(arm\)\(64\)\?.*/\1\2/' -e 's/aarch64$/arm64/')" &&
KREW="krew-${OS}_${ARCH}" &&
curl -fsSLO "https://github.com/kubernetes-sigs/krew/releases/latest/download/${KREW}.tar.gz" &&
tar zxvf "${KREW}.tar.gz" &&
./"${KREW}" install krew
)
echo 'export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
kubectl krew install virt
kubectl virt help
You're now ready to use the cluster for development or testing purposes.
MetalLB¶
Optionally, you can install MetalLB, but currently it is not required for using the LTB Operator. MetalLB is a load-balancer implementation for bare metal Kubernetes clusters.
Install Operator Lifecycle Manager (OLM)¶
Install Operator Lifecycle Manager (OLM), a tool to help manage the operators running on your cluster.
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.24.0/install.sh | bash -s v0.24.0
Install the operator by running the following command:
kubectl create -f https://operatorhub.io/install/metallb-operator.yaml
This operator will be installed in the "operators" namespace and will be usable from all namespaces in the cluster.
After installation, watch the operator come up using the following command:
kubectl get csv -n operators
Now create a MetalLB IPAddressPool CR to configure the IP address range that MetalLB will use:
sudo vim metallb-ipaddresspool.yaml
# metallb-ipaddresspool.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: operators
spec:
  addresses:
    - X.X.X.X/XX
Create an L2Advertisement to tell MetalLB to respond to ARP requests for all IP address pools (if no IP address pool is named, all pools are advertised):
sudo vim l2advertisement.yaml
# l2advertisement.yaml
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: operators
spec:
  ipAddressPools:
    - default
Apply the configuration:
kubectl apply -f metallb-ipaddresspool.yaml
kubectl apply -f l2advertisement.yaml
Storage¶
To store your virtual machine images and disks, you may want to use a storage backend. Currently no storage backend has been tested with the LTB Operator, but you can try to use Trident. Trident is a dynamic storage provisioner for Kubernetes, and it supports many storage backends, including NetApp, AWS, Azure, Google Cloud, and many more.
Below are some instructions that may help you install Trident on your cluster.
You can always find more information about Trident in the official documentation.
Check connectivity to NetApp Storage:
kubectl run -i --tty ping --image=busybox --restart=Never --rm -- \
ping <NetApp Management IP>
Download and extract the Trident installer:
export TRIDENT_VERSION=23.01.0
wget https://github.com/NetApp/trident/releases/download/v$TRIDENT_VERSION/trident-installer-$TRIDENT_VERSION.tar.gz
tar -xf trident-installer-$TRIDENT_VERSION.tar.gz
cd trident-installer
mkdir setup
vim ./setup/backend.json
Configure the installer:
# ./setup/backend.json
{
"version": 1,
"storageDriverName": "ontap-nas",
"managementLIF": "<NetApp Management IP>",
"dataLIF": "<NetApp Data IP>",
"svm": "svm_k8s",
"username": "admin",
"password": "<NetApp Password>",
"storagePrefix": "trident_",
"nfsMountOptions": "-o nfsvers=4.1 -o mountport=2049 -o nolock",
"debug": true
}
Install Trident:
./tridentctl install -n trident -f ./setup/backend.json
Check the installation:
kubectl get pods -n trident
Local development cluster¶
K3d, Minikube or Kind can be used to run a local Kubernetes cluster, if you don't have access to a remote cluster/server.
Make sure to install the following tools:
KubeVirt may not work properly on local development clusters, because it requires nested virtualization support, which is not available on all local development clusters. Make sure to enable nested virtualization on your local machine, if you want to run KubeVirt on a local development cluster.
Tools and Frameworks¶
The tools and frameworks used in the project are listed below.
Go-based Operator-SDK framework¶
To create the LTB Operator, we used the Go-based Operator-SDK framework. It provides a set of tools to simplify the process of building, testing and packaging our operator.
Kubevirt¶
Kubevirt is a tool that provides a virtual machine management layer on top of Kubernetes. It allows us to deploy virtual machines on Kubernetes.
Kubernetes¶
We use Kubernetes as the container orchestration platform for the LTB application.
Multus CNI¶
To create multiple network interfaces for the pods, Multus CNI is used.
Test Concept¶
This document outlines the approaches, methodologies, and types of tests that ensure that the LTB Operator components are functioning as expected.
Test categories¶
The tests primarily focus on Functionality and Logic. Security and Performance tests should be added in the future as the project matures.
Tools¶
The following tools are used to test the LTB Operator, you can find more information why they were chosen in the Testing Framework Decision:
- Testing: The default Go testing library that provides support for automated testing of Go packages. It is intended to be used together with the "go test" command, which automates execution of any function of the form func TestXxx(*testing.T), where Xxx does not start with a lowercase letter.
- Ginkgo: a testing framework for Go that helps you write expressive, readable, and maintainable tests. It is best used with the Gomega matcher library.
- Gomega: a Go matcher library that provides a set of matchers to perform assertions in tests. It is best used with the Ginkgo testing framework.
Strategies: Test Approach¶
We focused on unit tests for the LTB Operator, as they were easy to implement and provide a good coverage of the code. At a later stage of the project, we could add integration tests to ensure that the LTB Operator works as expected with other components like the LTB API.
We aspire to achieve at least 90% test coverage, to increase the maintainability and stability of the LTB Operator.
Unit Tests¶
Unit tests rely on the Fake Client package from the controller-runtime library to create a fake client that can be used to mock interactions with the Kubernetes API. This allows us to test functions that interact with the Kubernetes API without mocking the complete API or using a real Kubernetes cluster.
Integration Tests¶
The integration tests could be implemented by using the EnvTest package from the controller-runtime library.
Continuous Integration¶
We use GitHub Actions as our CI/CD tool. Currently, we have two workflows:
Deploy docs¶
This workflow is used to deploy the mkdocs documentation to GitHub Pages.
It is triggered on every push to the main branch that affects the documentation. Specifically, it is triggered when a file in the docs directory or the mkdocs.yaml configuration file has changed.
Check the deploy-docs-ci.yaml action for more details.
Operator CI¶
This workflow is used to test, build and push the LTB Operator image, CatalogSource and Bundle to the GitHub Container Registry.
It is triggered for every push to a pull request and for every push to the main branch.
You can find this action here: operator-ci.yaml
Test¶
The test step runs the unit tests of the LTB Operator and fails the pipeline if any test fails. Additionally, a coverage report is generated and uploaded to Codecov. Pull requests are checked for code coverage, and the merge is blocked if coverage drops below 80%. This is ensured by the Codecov GitHub integration and defined in the codecov.yml file.
Build¶
We use the GitHub Actions provided by Docker to build and push the LTB Operator image to the GitHub Container Registry. Additionally, we use cosign to sign the images, so that users can verify the authenticity of the image.
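As a sketch of how an image signature could be verified (assuming a published public key and the image location used for deployment; adjust this to the project's actual signing setup):
cosign verify --key cosign.pub <some-registry>/ltb-operator:<tag>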
Additional deployment artifacts¶
To be able to deploy the LTB Operator with the Operator Lifecycle Manager, a Bundle and a CatalogSource must be created. These artifacts are created with the Operator-SDK; to simplify the pipeline, these tasks have been exported to the Makefile.
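The exact target names are defined in the project's Makefile, but with the standard Operator-SDK scaffolding the bundle and catalog artifacts are typically built and pushed roughly like this (a sketch, not the authoritative pipeline commands):
make bundle bundle-build bundle-push IMG=<some-registry>/ltb-operator:<tag>
make catalog-build catalog-push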
Architecture¶
Kubernetes Lab Topology Builder Architecture¶
The main components of the Kubernetes based LTB are:
The following diagram shows how the components interact with each other:
Frontend¶
The frontend can be implemented in any language and framework; it just needs to be able to communicate with the LTB API over HTTP. The frontend is responsible for the following tasks:
- Providing a web UI for the user to interact with the labs.
- Providing a web UI for the admin to manage:
- Lab templates
- Lab deployments
- Reservations
There is a possibility to reuse parts of the existing frontend from the KVM/Docker-based LTB.
API¶
The API is responsible for the following tasks:
- Create, update and delete LTB resources (node types, lab templates, lab instances)
- Expose status of lab instances
- Expose information on how to access the deployed lab nodes
- Authentication via an external authentication provider
No parts from the existing KVM/Docker-based LTB can be reused for the API.
Authentication and Authorization¶
The authentication can be implemented by using an external authentication provider like Keycloak. Keycloak can be configured to act as an authentication broker with external identity providers like LDAP, OpenID Connect, SAML, etc. This has the benefit that the LTB does not need to implement any authentication logic and can focus on the lab deployment. Additionally, it enables the LTB to be integrated into an existing authentication infrastructure, so that users do not need to create a new account. On the other hand, it has the drawback that the LTB needs an external authentication provider to work and that users' access rights need to be managed in Keycloak.
Authorization can also be implemented using Keycloak and its Authorization Services.
Operator¶
The operator is responsible for the following tasks:
- Deploy and destroy the containers and vms
- Check validity of LTB resources (node types, lab templates, lab instances)
- Enable the user to access the deployed containers and vms via different protocols
- Provide remote access to the lab node console via a web terminal
- Manage reservations (create, delete, etc.)
- Provide remote Wireshark capture capabilities
The operator is implemented according to the Kubernetes operator pattern. It has multiple controllers, each responsible for managing a particular custom resource, such as the LabTemplate.
Network connectivity between lab nodes¶
The network connectivity between lab nodes can be implemented with Multus, which is a "meta-plugin" that enables attaching multiple CNI plugins to a Kubernetes pod/VM. Multus uses NetworkAttachmentDefinitions (NADs) to describe which CNI plugin should be used and how it should be configured.
Currently, we use a Linux bridge as a secondary CNI plugin, with the drawback that the links between the lab nodes are not pure layer 2 links, but layer 3 links. Additionally, the connection between lab nodes only works on the same Kubernetes host, because the Linux bridge does not implement any kind of cross-host networking.
Remote access to lab nodes¶
Remote access to the lab nodes has two variants:
- Console access via a web terminal
- Access to configurable ports with any OOB management protocol
Console access via a web terminal¶
Console access via a web terminal is implemented with kube-ttyd, a tool based on ttyd that additionally uses kubectl exec and virsh console to connect to the lab nodes.
kube-ttyd was provided by Yannick Zwicker from the INS specifically for this project.
Access to the web terminal is routed through an NGINX ingress controller and a Kubernetes service of type ClusterIP.
The authentication feature of the NGINX ingress controller can be used to restrict access to the web terminal to authenticated users. It might be possible to use the same authentication provider as the LTB API, but this needs to be tested.
Access to configurable ports with any OOB management protocol¶
Access to lab nodes via freely selectable OOB management protocols is implemented by providing a Kubernetes service of type LoadBalancer for each lab node, which is configured to expose the ports specified in the lab template.
Access control needs to be implemented by the lab node itself, because the Kubernetes service of type LoadBalancer does not provide any authentication or authorization features.
An example for this would be to provide SSH keys for the lab nodes inside the lab template config field.
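A hypothetical sketch of this approach: embed an SSH public key in the cloud-init configuration and base64-encode it into the lab template's config field (the user name and key below are placeholders):
cat <<'EOF' > cloud-init.yaml
#cloud-config
users:
  - name: ubuntu
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... user@example.com
EOF
base64 -w0 cloud-init.yaml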
Scheduling lab instances and resource reservation¶
A feature to schedule the deployment and deletion of a lab instance to a specific time is not implemented, but could be implemented by adding additional fields (creationTimestamp, deletionTimestamp) to the lab instance's CRD. Then, the lab instance controller can examine these fields and proceed to deploy or delete the lab instance at the specified time. There are multiple ways to implement this: either by regularly checking the lab instance, or by requeuing the creation/deletion event of the lab instance to the specified time.
If there are any issues with the requeuing of these events over such a long period of time, writing a Kubernetes informer could be a solution.
Resource reservation in a capacity planning sense is not provided by Kubernetes. A manual solution could be implemented by using limit ranges, resource quotas and the Kubernetes node resources. Planned resource management is a huge topic, and we would recommend creating a dedicated project for it.
Comparison to the KVM/Docker-based LTB¶
The diagram below illustrates the components of the KVM/Docker-based LTB, highlighting the changes introduced by the Kubernetes LTB.
C4 Model¶
The following diagrams show the C4 model of the Kubernetes-based LTB, offering a high-level overview of the application's architecture.
System Context Diagram¶
Container Diagram¶
Component Diagram¶
Legend¶
- Dark blue: represents Personas (User, Admin)
- Blue: represents Internal Components (Frontend Web UI, LTB K8s Backend)
- Light blue: represents Components which will be implemented in this project (LTB Operator, LTB Operator API)
- Dark gray: represents External Components (K8s, Keycloak)
Preliminary Work KVM/Docker-based LTB Architecture¶
The Kubernetes based LTB was inspired by a previous implementation of the LTB, which was based on direct KVM and Docker usage.
The following diagram shows the components of the KVM/Docker-based LTB; the lines indicate communication between the components:
Currently the KVM/Docker-based LTB is composed of the following containers:
- Frontend built with React
- Backend built with Django
- Databases (PostgreSQL, Redis)
- Beat
- Celery
- Web-SSH
- Prometheus
- Traefik
- Nginx
Backend¶
The backend is accessible via API and an Admin Web UI. It is responsible for the following tasks:
- Parse the yaml topology files
- Deploy/destroy the containers and vms
- Expose status of lab deployments
- Expose information on how to access the deployed containers and vms
- Provide remote ssh capabilities
- Provide remote Wireshark capture capabilities
- Manage reservations (create, delete, etc.)
- Expose node resource usage
- User management
- Expose information about a device (version, groups, configuration, etc)
It is composed of the following components:
The orchestration component is responsible for creating different tasks using Celery and executing them on a remote host. There are 4 different types of tasks:
- DeploymentTask
- Deploys containers in docker
- Deploys VMs using KVM
- Creates connections between containers and VMs using an OVS bridge
- RemovalTask
- Removes a running lab
- MirrorInterfaceTask
- Creates a mirror interface on a connection
- SnapshotTask
- Takes a snapshot of a running lab
Reservations¶
The reservation component is responsible for reserving system resources in advance, comprising the following tasks:
- Create a reservation
- Delete a reservation
- Update a reservation
Running lab store¶
This component is responsible for storing information about running labs, such as:
- The devices taking part in the running lab, including the interfaces
- Connection information
Template store¶
This component is responsible for storing lab templates.
Authentication¶
This component is responsible for user authentication and management.
Databases¶
The following databases are used:
- PostgreSQL
- Redis for caching
The Databases are used by the following components:
- Backend
- Beat
- Celery
Beat¶
The beat component is responsible for scheduling periodic tasks. To be more precise, it is responsible for scheduling the deployment and deletion of labs, according to the reservation information.
Celery¶
Celery is used to execute the commands to create and delete the lab nodes and connections.
Web-SSH¶
Web-SSH is used to provide a web-based SSH client that can be used to access the deployed containers and VMs, i.e. the lab nodes.
Prometheus¶
Prometheus is used to collect metrics about CPU, memory and disk usage of the hypervisor nodes.
Traefik¶
Traefik is used as a proxy for the following components:
- Frontend
- Web-SSH
- Prometheus
- Nginx
Nginx¶
Nginx is used as a reverse proxy for the backend.
Frontend¶
The frontend provides a web UI with the following features:
- User authentication
- Management of lab templates
- Management of reservations for labs
- Start/stop running labs
- Resource usage overview
- Provides information on how to access the deployed containers and vms
- Create wireshark capture interfaces
Decisions¶
Use Markdown Architectural Decision Records¶
Context and Problem Statement¶
We want to record design decisions made in this project. Which format and structure should these records follow?
Considered Options¶
- MADR 2.1.2 – The Markdown Architectural Decision Records
- Michael Nygard's template – The first incarnation of the term "ADR"
- Sustainable Architectural Decisions – The Y-Statements
- Other templates listed at https://github.com/joelparkerhenderson/architecture_decision_record
- Formless – No conventions for file format and structure
Decision Outcome¶
Chosen option: "MADR 2.1.2", because
- Implicit assumptions should be made explicit. Design documentation is important to enable people to understand the decisions later on. See also A rational design process: How and why to fake it.
- The MADR format is lean and fits our development style.
- The MADR structure is comprehensible and facilitates usage & maintenance.
- The MADR project is vivid.
- Version 2.1.2 is the latest one available when starting to document ADRs.
- A Visual Studio Code extension, ADR Manager, exists for MADR, which makes managing ADRs easy.
Operator-SDK¶
Context and Problem Statement¶
It's best practice to use an SDK to build operators for Kubernetes. The SDK provides a higher level of abstraction for creating Kubernetes operators, making it easier to write and manage operators. There are multiple SDKs available for building operators. We need an SDK that's flexible, easy to use, and can be used with Go.
Considered Options¶
- Operator-SDK (Operator Framework)
- KubeBuilder
- Kopf
- KUDO
- Metacontroller
Decision Outcome¶
Chosen option: "Operator-SDK", because it provides a high level of abstraction for creating Kubernetes operators, making it easier to write and manage operators. Additionally, the Operator-SDK incorporates tools and libraries for building, testing and packaging operators, offering a user-friendly experience and is compatible with Go.
Links¶
Remote-Access¶
Context and Problem Statement¶
For the lab instances to be useful for the students, they need to be able to access the pods (containers) and VMs. Access to pods/VMs should only be granted to users with the appropriate access rights. It should be possible to access the console of the pods/VMs and employ various out-of-band (OOB) protocols such as SSH, RDP, VNC, and more.
Considered Options¶
- Kubernetes Service
- Gotty
- ttyd
Decision Outcome¶
Chosen option: "ttyd and Kubernetes Service", because ttyd can be used as a jump host to access the pods/VMs' console. A Kubernetes service can be used to allow access to the pods/VMs via OOB protocols. Security for the console access will be easy to implement. Secure access for OOB protocols was considered, but needs to be researched further. Currently, it depends on the chosen OOB protocol and the security features it provides.
Operator Scope¶
Context and Problem Statement¶
The operator could be namespace-scoped or cluster-scoped.
Considered Options¶
- Namespace-scoped
- Cluster-scoped
Decision Outcome¶
Chosen option: "Cluster-scoped", because cluster-scoped operators enable you to manage namespaces or resources in the entire cluster. This is needed to ensure that each lab instance can be deployed within its own namespace. Cluster-scoped operators are also capable of managing infrastructure-level resources, such as nodes. Additionally, cluster-scoped operators provide greater visibility and control over the entire cluster.
Links¶
API and Operator Deployment¶
Context and Problem Statement¶
The operator and the API could be separated and deployed as two services/containers or they could be deployed as one service/container.
Considered Options¶
- One container
- Separate containers
Decision Outcome¶
Chosen option: "Separate containers", because it provides more flexibility and scalability. It also makes it easier to update the operator and the API separately. Additionally, it is easy to separate the API and the operator into two different services, as they talk to each other via the Kubernetes API.
Dev Container¶
Context and Problem Statement¶
Every team member could set up their development environment manually or we could use a dev container to provide a consistent development environment for all team members and future contributors.
Considered Options¶
- Dev Container
- Manual Setup
Decision Outcome¶
Chosen option: "Dev Container", because a dev container setup lets you create the same development environment for all team members to ensure consistency. It also provides a completely isolated development environment, which helps to avoid software incompatibility issues, such as Operator-SDK not working on Windows. Moreover, a dev container is easily portable and works on all operating systems that support Docker. The only downside is that not all IDEs support dev containers, but at least two of the currently most popular IDEs, namely VS Code and Visual Studio, support dev containers.
Links¶
Replace KVM/Docker-based LTB Backend¶
Context and Problem Statement¶
The LTB K8s Backend could replace the KVM/Docker-based LTB Backend fully or partially by reusing parts, such as
Considered Options¶
- Replace KVM/Docker-based LTB Backend fully
- Replace KVM/Docker-based LTB Backend partially
Decision Outcome¶
Chosen option: "Replace KVM/Docker-based LTB Backend fully", because huge parts of the KVM/Docker-based LTB Backend would need to be rewritten to be compatible with the new LTB K8s operator, and it would be easier to rewrite the whole backend. Additionally, the same programming language can be used throughout the whole backend, namely Go.
Lab Instance Set¶
Context and Problem Statement¶
One approach could involve creating a custom resource (CR) named LabInstanceSet and specifying our desired quantity of LabInstances to the operator. For instance, we could provide a single LabInstance along with a generator, such as a list of names, to indicate that we want 10 LabInstances. Alternatively, we could directly provide the LTB Operator with 10 separate LabInstances to create the desired quantity of 10.
Considered Options¶
- With LabInstanceSet
- Without LabInstanceSet
Decision Outcome¶
Chosen option: "Without LabInstanceSet", because we currently don't see a need for it. This could change in the future, but for now we will not implement it.
Interaction with Operator¶
Context and Problem Statement¶
There are multiple ways to interact with the operator, such as a GitOps approach, using the frontend via the API, or using kubectl.
Considered Options¶
- GitOps
- Use frontend
- Use kubectl
Decision Outcome¶
Chosen option: "All considered options", because these options are not mutually exclusive and can be used together, therefore we want to support all of them.
Namespace Per LabInstance¶
Context and Problem Statement¶
Lab instances could be created in separate namespaces or one namespace for all lab instances.
Considered Options¶
- One Namespace for all lab instances
- Namespace per lab instance
Decision Outcome¶
Chosen option: "Namespace per lab instance", because it will be easier in the future to implement features like network policies and resource quotas and limits. This approach ensures easier management and isolation of each lab instance within its dedicated namespace.
K8s Aggregated API Over Standalone API¶
Context and Problem Statement¶
We want a declarative way of creating LTB labs inside Kubernetes using Kubernetes native pods and KubeVirt virtual machines.
We could either create a standalone API which interacts with the Kubernetes API and does not follow the Kubernetes API conventions. It would therefore not be compatible with Kubernetes tools, such as dashboards or kubectl, but would allow complete control over the API design.
Or we could create an aggregated API which uses the Kubernetes aggregation layer to extend the Kubernetes API. This would allow us to use Kubernetes tools such as dashboards or kubectl, but would limit the control over the API design.
Considered Options¶
- Standalone API
- Aggregated API
Decision Outcome¶
Chosen option: "Aggregated API", because our new types will be readable and writable using kubectl and other Kubernetes tools, such as dashboards. We can also leverage Kubernetes API support features this way. Additionally, our resources are scoped to a cluster or namespaces of a cluster. Finally, the Operator Pattern is simpler to implement this way.
Links¶
Extending LTB with New Node Types¶
Context and Problem Statement¶
It should be possible to create, update and delete node types (e.g. Ubuntu, XRD, XR, IOS, Cumulus, etc.). Node types should be used inside lab templates and expose a way to provide a node with configuration (cloud-init, zero-touch, etc.). The number of available network interfaces is dynamic and depends on how many connections a node has according to a specific lab template.
Decision Drivers¶
- Certain operating systems' images like XR, and XRD need a specific interface configuration which depends on how many interfaces a certain node will receive.
- The chosen solution should support multiple versions of a type in an easy to use way (e.g. Ubuntu 22.04, 20.04, ...).
- For XRd images, interfaces need to have environment variables set for each interface they use, and the interface count needs to be dynamically set according to the lab template.
- For XR virtual machine images, the first interface is the management interface and then there are two empty interfaces that need a special configuration.
- For mount from config might be different
- Cumulus VX images need a privileged container
- XRd needs additional privileges
Considered Options¶
- Custom Resources
- Go
Decision Outcome¶
Chosen option: "Custom Resources", because it will be possible to support all the cases mentioned in the decision drivers using Go templates and CRs. Implementing the types in Go does not seem to bring any major advantages, whereas using CRs will be easier for external users to extend the system with new node types.
Positive Consequences¶
- Easy to extend during runtime
- Easy to extend for external users
- All decision drivers will be supported
Negative Consequences¶
- Go templates are not as powerful as Go, which could make it harder to implement certain node types.
- A Custom Resource is also a little bit less flexible than a Go type, but this should not be a problem for the use cases we have.
Links¶
Testing Framework¶
Context and Problem Statement¶
Every project needs to be tested. There are multiple testing libraries and frameworks to test Go applications, which can be used in addition to the default Go testing library.
Considered Options¶
- Testify
- Ginkgo/Gomega
- GoSpec
- GoConvey
Decision Outcome¶
Chosen option: "Ginkgo/Gomega", because it is widely used in the Kubernetes community to test Kubernetes operators. It is also used in the Kubevirt project, which is used by the LTB Operator. Additionally, tests written with Ginkgo/Gomega are easy to read and understand.
Programming Language¶
Context and Problem Statement¶
We need to choose a programming language for the project. The considered options are based on the supported languages of the Operator-SDK.
Considered Options¶
- Go
- Helm
- Ansible
Decision Outcome¶
Chosen option: "Go", because many cloud native projects are written in Go, and it is a compiled language, which is more performant than interpreted languages. Also, Go is a statically typed language, making it easier to maintain and refactor the code. It is also easier to write complicated logic and tests in Go than in Helm or Ansible.
Comparison to other similar projects¶
The Lab Topology Builder is just one among many open source projects available for building emulated network topologies. The aim of this project comparison is to provide a concise overview of the key features offered by some of the most well-known projects, assisting users in selecting the best suited solution for their use case.
vrnetlab - VR Network Lab¶
vrnetlab is a network emulator that runs virtual routers using KVM and Docker. It is similar to the KVM/Docker-based LTB, but simpler, and only provides the deployment functionality.
Containerlab¶
Containerlab is a tool to deploy network labs with virtual routers, firewalls, load balancers, and more, using Docker containers. It is based on vrnetlab and provides a declarative way to define the lab topology using a YAML file. Containerlab is not capable of deploying lab topologies over multiple host nodes, which is a key feature that the K8s-based LTB aims to provide in the future.
Netlab¶
Netlab is an abstraction layer to deploy network topologies based on containerlab or Vagrant. It provides a declarative way to define the lab topology using a YAML file. It mainly provides an abstracted way to define lab topologies with preconfigured lab nodes.
Kubernetes Network Emulator¶
Kubernetes Network Emulator is a network emulator that aims to provide a standard interface so that vendors can produce a standard container implementation of their network operating system that can be used in a network emulation environment. Currently, it does not seem to support many network operating systems and additional operators are required to support different vendors.
Mininet¶
Mininet is a network emulator that runs a collection of end-hosts, switches, routers, and links on a single Linux kernel. It is mainly used for testing SDN controllers and can not deploy a lab with a specific vendor image.
GNS3¶
GNS3 is a network emulator that can run network devices as virtual machines or Docker containers. It primarily focuses on providing an emulated network environment for a single user and its deployment and usage can be quite complex. Additionally, it does not provide a way to scale labs over multiple host nodes.
About¶
This project was created as part of a bachelor thesis at the Eastern Switzerland University of Applied Sciences in Rapperswil, in cooperation with the Institute for Network and Security.