Kubernetes plugin for RoCE NIC
Caution
The Kubernetes plugin for the RoCE NIC is supported only on Ubuntu clusters (not on RHEL/Rocky Linux clusters).
Keep the following in mind before proceeding with the installation:
Network interfaces must be defined and assigned their respective IP ranges (start and end).
The number of entries in input/roce_plugin_config.yml should be equal to the number of RoCE interfaces available in the RoCE pod.
VLAN NICs are not supported.
This playbook supports the deployment of up to 8 RoCE NIC interfaces.
In a scenario where two nodes have two separate NICs, the admin must use aliasing to make the NIC names consistent before executing deploy_roce_plugin.yml.
Omnia does not validate any parameter entries in input/roce_plugin_config.yml. It is the user's responsibility to provide correct inputs for the required parameters. If errors occur because of incorrect entries, delete and re-install the plugin with the correct inputs. For more information, click here.
Install the plugin
Prerequisites
Ensure Kubernetes is set up on the cluster with flannel or calico as the input for the k8s_cni parameter present in input/omnia_config.yml. For the complete list of parameters, click here.
Ensure that the Broadcom RoCE drivers are installed on the nodes.
Ensure that additional NICs have been configured using the server_spec_update.yml playbook. For more information on how to configure additional NICs, click here.
Ensure that the {"name": "roce_plugin"} entry is present in software_config.json and that the same config has been used while executing the local_repo.yml playbook.
Update the following parameters in input/roce_plugin_config.yml:
| Parameters | Details |
|---|---|
| name (Required) | This field captures the interface name of the RoCE NIC. Example: eth1 |
| range (Required) | This field captures the IP range for the RoCE interface. The IPs within this range are assigned to the RoCE pod. This parameter is provided in CIDR format. Example: 192.168.1.0/24 |
| range_start (Optional) | This field specifies the starting IP address within the defined IP range. |
| range_end (Optional) | This field specifies the ending IP address within the defined IP range. |
| gateway (Optional) | This field captures the gateway value for the RoCE NIC interface. Example: 192.168.1.1 |
| route (Optional) | Example: 192.168.1.0/24 |
Here is an example of the input/roce_plugin_config.yml:
interfaces:
  - name: eth1
    range: 192.168.1.0/24
    range_start:
    range_end:
    gateway: 192.168.1.1
    route: 192.168.1.0/24
  - name: eth2
    range: 192.168.2.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth3
    range: 192.168.3.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth4
    range: 192.168.4.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth5
    range: 192.168.5.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth6
    range: 192.168.6.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth7
    range: 192.168.7.0/24
    range_start:
    range_end:
    gateway:
    route:
  - name: eth8
    range: 192.168.8.0/24
    range_start:
    range_end:
    gateway:
    route:
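Because Omnia does not validate the entries in input/roce_plugin_config.yml, a small pre-flight check can catch obvious mistakes before the playbook runs. The following is a minimal sketch, not part of Omnia: the function name check_roce_config is hypothetical, and it assumes the file has already been parsed into a Python dictionary. It only checks rules stated in this guide (at most 8 interfaces, required name and range, CIDR format, optional start and end addresses inside the range) plus basic IP syntax for gateway.

```python
import ipaddress

MAX_INTERFACES = 8  # upper limit stated in this guide


def check_roce_config(config):
    """Return a list of problems found in a parsed roce_plugin_config.yml.

    Hypothetical helper for illustration; Omnia itself performs no validation.
    """
    problems = []
    interfaces = config.get("interfaces") or []
    if not interfaces:
        problems.append("no interfaces defined")
    if len(interfaces) > MAX_INTERFACES:
        problems.append(
            f"{len(interfaces)} interfaces defined; the limit is {MAX_INTERFACES}"
        )

    for index, entry in enumerate(interfaces, start=1):
        label = entry.get("name") or f"entry {index}"
        if not entry.get("name"):
            problems.append(f"{label}: name is required")

        # range is required and must parse as CIDR (for example 192.168.1.0/24).
        try:
            network = ipaddress.ip_network(str(entry.get("range")), strict=False)
        except ValueError:
            problems.append(
                f"{label}: range must be valid CIDR, got {entry.get('range')!r}"
            )
            continue

        # Optional bounds, when set, must be addresses inside the range.
        bounds = {}
        for key in ("range_start", "range_end"):
            value = entry.get(key)
            if value in (None, ""):
                continue
            try:
                address = ipaddress.ip_address(str(value))
            except ValueError:
                problems.append(f"{label}: {key} is not an IP address: {value!r}")
                continue
            if address not in network:
                problems.append(f"{label}: {key} {address} is outside {network}")
            else:
                bounds[key] = address
        if len(bounds) == 2 and bounds["range_start"] > bounds["range_end"]:
            problems.append(f"{label}: range_start is after range_end")

        # gateway is optional; only its syntax is checked here.
        gateway = entry.get("gateway")
        if gateway not in (None, ""):
            try:
                ipaddress.ip_address(str(gateway))
            except ValueError:
                problems.append(f"{label}: gateway is not an IP address: {gateway!r}")
    return problems


# First entry of the sample file above, written as the dictionary a YAML parser would produce.
sample = {
    "interfaces": [
        {
            "name": "eth1",
            "range": "192.168.1.0/24",
            "range_start": None,
            "range_end": None,
            "gateway": "192.168.1.1",
            "route": "192.168.1.0/24",
        }
    ]
}
print(check_roce_config(sample))  # prints []
```

If PyYAML is available, the dictionary can be obtained with yaml.safe_load on the opened file. An empty result only means that none of these basic checks failed; it does not confirm that the interface names match the NICs on the nodes.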
To install the plugin, run the deploy_roce_plugin.yml playbook using the following command:
cd omnia/scheduler
ansible-playbook deploy_roce_plugin.yml -i inventory
Here, inventory should be the same inventory file that was used to set up Kubernetes on the cluster.
Note
A config file named roce_plugin.json is located in omnia/input/config/ubuntu/<os version>/. This config file contains all the details about the Kubernetes plugin for the RoCE NIC. Here is an example of the config file:
{
  "roce_plugin": {
    "cluster": [
      {
        "package": "k8s-rdma-shared-dev-plugin",
        "url": "https://github.com/Mellanox/k8s-rdma-shared-dev-plugin.git",
        "type": "git",
        "version": "v1.5.2"
      },
      {
        "package": "ghcr.io/k8snetworkplumbingwg/multus-cni",
        "tag": "v4.1.4-thick",
        "type": "image"
      },
      {
        "package": "ghcr.io/k8snetworkplumbingwg/whereabouts",
        "tag": "v0.8.0",
        "type": "image"
      },
      {
        "package": "ghcr.io/mellanox/k8s-rdma-shared-dev-plugin",
        "tag": "v1.5.2",
        "type": "image"
      },
      {
        "package": "docker.io/roman8rcm/roce-test",
        "tag": "229.2.32.0",
        "type": "image"
      }
    ]
  }
}
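To see at a glance what the config file above pulls in, the entries can be flattened into a short list. The following is a rough sketch, assuming the file has the structure shown in the example; the function name list_artifacts is hypothetical, and the url@version notation for git entries is only a display convention chosen here, not something Omnia uses.

```python
import json


def list_artifacts(config_text):
    """Return (type, reference) pairs for each entry under roce_plugin.cluster.

    Hypothetical helper for illustration; assumes the structure shown above.
    """
    entries = json.loads(config_text)["roce_plugin"]["cluster"]
    artifacts = []
    for entry in entries:
        if entry["type"] == "image":
            # Container images are referenced as <package>:<tag>.
            reference = f"{entry['package']}:{entry['tag']}"
        elif entry["type"] == "git":
            # Display-only notation chosen for this sketch.
            reference = f"{entry['url']}@{entry['version']}"
        else:
            reference = entry["package"]
        artifacts.append((entry["type"], reference))
    return artifacts


# Two entries copied from the example config above.
sample_text = """
{
  "roce_plugin": {
    "cluster": [
      {
        "package": "k8s-rdma-shared-dev-plugin",
        "url": "https://github.com/Mellanox/k8s-rdma-shared-dev-plugin.git",
        "type": "git",
        "version": "v1.5.2"
      },
      {
        "package": "ghcr.io/k8snetworkplumbingwg/whereabouts",
        "tag": "v0.8.0",
        "type": "image"
      }
    ]
  }
}
"""

for kind, reference in list_artifacts(sample_text):
    print(kind, reference)
```

In practice the text would be read from the roce_plugin.json file itself rather than from an inline string.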
Caution
After the deploy_roce_plugin.yml playbook runs, the RDMA pods will be in the CrashLoopBackOff state and the RoCE pods will be in the Pending state. This is a known issue, and the resolution can be found here.
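To spot the pods affected by this known issue, the output of kubectl get pods can be filtered by status. The following is a minimal sketch, not an Omnia tool: the function name pods_needing_attention is hypothetical, and the namespaces and pod names in the sample text are made up for illustration. It only reports pods in the two states named above and does not apply the resolution.

```python
def pods_needing_attention(kubectl_output, states=("CrashLoopBackOff", "Pending")):
    """Return (namespace, pod, status) for pods whose STATUS is in states.

    Expects the text printed by `kubectl get pods -A --no-headers`, whose
    columns are NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE.
    """
    flagged = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        namespace, name, _ready, status = fields[:4]
        if status in states:
            flagged.append((namespace, name, status))
    return flagged


# Hypothetical output; real namespaces and pod names will differ.
sample_output = """\
kube-system   rdma-shared-dp-ds-abcde   0/1   CrashLoopBackOff   5 (30s ago)   4m
default       roce-test-pod-1           0/1   Pending            0             4m
kube-system   coredns-xyz               1/1   Running            0             2d
"""

for namespace, name, status in pods_needing_attention(sample_output):
    print(f"{namespace}/{name}: {status}")

# On a node with kubectl access, the text could be captured with:
#   import subprocess
#   text = subprocess.run(
#       ["kubectl", "get", "pods", "-A", "--no-headers"],
#       capture_output=True, text=True, check=True,
#   ).stdout
```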
Delete the plugin
To delete the plugin, run the delete_roce_plugin.yml playbook using the following command:
cd omnia/scheduler
ansible-playbook delete_roce_plugin.yml -i inventory