Adding new nodes
Provisioning the new node
A new node can be provisioned in one of the following ways, depending on the discovery mechanism used:
Using a mapping file:
Update the existing mapping file by appending the new entry (without disrupting the older entries), or provide a new mapping file by pointing pxe_mapping_file_path in provision_config.yml to the new location.
Note
Any IP overlap between the mapping files will result in PXE boot failure. This can be resolved by running the Delete Node script or the Clean Up script. Re-run discovery_provision.yml once the node has been deleted.
Run discovery_provision.yml:
ansible-playbook discovery_provision.yml
If bmc_ip is not provided in the mapping file, manually PXE boot the target servers after the discovery_provision.yml playbook is executed and the target node is listed as booted in the nodeinfo table.
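Because overlapping IPs between mapping files lead to PXE boot failure, it can help to compare the files before running the playbook. The following is a minimal sketch, not part of Omnia. It assumes the mapping files are CSV files with a header row and that the admin IP column is named ADMIN_IP; both the column name and the sample hostnames are assumptions, so adjust them to match your own files.

```python
from __future__ import annotations

import csv
import io


def find_ip_overlaps(existing_csv: str, new_csv: str, column: str = "ADMIN_IP") -> set[str]:
    """Return the IPs that appear in both mapping files.

    `column` is an assumed header name; change it to match your mapping file.
    """

    def ips(text: str) -> set[str]:
        # Read the CSV text and collect every non-empty value in `column`.
        reader = csv.DictReader(io.StringIO(text))
        return {row[column].strip() for row in reader if row.get(column)}

    return ips(existing_csv) & ips(new_csv)


# Hypothetical file contents, used only for illustration.
existing = "HOSTNAME,ADMIN_IP\nnode1,10.5.0.102\nnode2,10.5.0.103\n"
new = "HOSTNAME,ADMIN_IP\nnode3,10.5.0.103\nnode4,10.5.0.105\n"

overlaps = find_ip_overlaps(existing, new)
if overlaps:
    print("Overlapping IPs:", sorted(overlaps))
```

An empty result suggests that the new mapping file does not reuse any IP from the existing one.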
Using the BMC method:
Update discover_ranges under bmc_network in input/network_spec.yml with the desired range of IPs to be discovered.
Run discovery_provision.yml:
ansible-playbook discovery_provision.yml
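Before re-running the playbook, you may want to confirm that the BMC IP of the new node falls inside the range you entered. The sketch below is illustrative only. It assumes the range is written as a single "start-end" string, which is an assumption about the discover_ranges format, and the IP values shown are hypothetical.

```python
import ipaddress


def ip_in_range(ip: str, ip_range: str) -> bool:
    """Check whether `ip` lies inside an inclusive 'start-end' range string.

    The 'start-end' notation is an assumed format, not confirmed by the text above.
    """
    start_text, end_text = ip_range.split("-")
    start = ipaddress.ip_address(start_text.strip())
    end = ipaddress.ip_address(end_text.strip())
    if start > end:
        raise ValueError(f"range start {start} is after range end {end}")
    return start <= ipaddress.ip_address(ip) <= end


# Hypothetical BMC range and node IP.
if not ip_in_range("10.3.0.105", "10.3.0.100-10.3.0.110"):
    print("The new node's BMC IP is outside the discovery range")
```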
Using the switch-based method:
Edit or append to the JSON list stored in switch_based_details in input/provision_config.yml.
Note
All ports residing on the same switch should be listed in the same JSON list element.
Ports configured via Omnia should not be removed from switch_based_details in input/provision_config.yml.
Run discovery_provision.yml:
ansible-playbook discovery_provision.yml
Manually PXE boot the target servers after the discovery_provision.yml playbook is executed and the target node is listed as booted in the nodeinfo table.
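The two rules in the note for the switch-based method (one list element per switch, and no removal of previously configured ports) can be sketched as a small helper. This is only an illustration: it assumes each element of switch_based_details is a dictionary with an "ip" key and a comma-separated "ports" string, which is an assumption about the structure, and the switch IP and port values are hypothetical.

```python
from __future__ import annotations


def add_ports(details: list[dict], switch_ip: str, new_ports: str) -> list[dict]:
    """Return a copy of `details` with `new_ports` added for `switch_ip`.

    Existing ports are never removed. A new element is appended only when
    the switch is not listed yet, so each switch keeps a single element.
    """
    updated = [dict(entry) for entry in details]
    for entry in updated:
        if entry["ip"] == switch_ip:
            current = [p.strip() for p in entry["ports"].split(",") if p.strip()]
            for port in (p.strip() for p in new_ports.split(",")):
                if port and port not in current:
                    current.append(port)
            entry["ports"] = ",".join(current)
            return updated
    updated.append({"ip": switch_ip, "ports": new_ports})
    return updated


# Hypothetical switch IPs and port numbers.
details = [{"ip": "172.96.28.12", "ports": "1,2,3"}]
details = add_ports(details, "172.96.28.12", "4,5")
print(details)
```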
Verify that the node has been provisioned successfully by checking the Omnia nodeinfo table.
Adding new compute nodes to the cluster
Insert the new IPs in the existing inventory file, following the examples below.
Existing Kubernetes inventory
[kube_control_plane]
10.5.0.101
[kube_node]
10.5.0.102
10.5.0.103
[auth_server]
10.5.0.101
[etcd]
10.5.0.110
Updated Kubernetes inventory with the new node information
[kube_control_plane]
10.5.0.101
[kube_node]
10.5.0.102
10.5.0.103
10.5.0.105
10.5.0.106
[auth_server]
10.5.0.101
[etcd]
10.5.0.110
Existing Slurm inventory
[slurm_control_node]
10.5.0.101
[slurm_node]
10.5.0.102
10.5.0.103
[login]
10.5.0.104
[auth_server]
10.5.0.101
Updated Slurm inventory with the new node information
[slurm_control_node]
10.5.0.101
[slurm_node]
10.5.0.102
10.5.0.103
10.5.0.105
10.5.0.106
[login]
10.5.0.104
[auth_server]
10.5.0.101
In the above examples, nodes 10.5.0.105 and 10.5.0.106 have been added to the cluster as compute nodes.
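The inventory update shown above can also be scripted. The following is a minimal sketch, assuming a simple inventory that contains only group headers and one IP per line, as in the examples. The add_hosts helper is hypothetical and is not part of Omnia.

```python
from __future__ import annotations


def add_hosts(inventory_text: str, group: str, new_hosts: list[str]) -> str:
    """Append `new_hosts` to `[group]`, leaving every other group untouched."""
    lines = inventory_text.splitlines()
    stripped = [line.strip() for line in lines]
    header = f"[{group}]"
    if header not in stripped:
        raise ValueError(f"group {header} not found in inventory")

    # The group runs from the line after its header to the next header.
    start = stripped.index(header) + 1
    end = start
    while end < len(lines) and not stripped[end].startswith("["):
        end += 1

    # Skip hosts that are already listed in the group.
    existing = set(stripped[start:end])
    additions = [host for host in new_hosts if host not in existing]

    # Insert after the last non-blank line of the group.
    insert_at = end
    while insert_at > start and not stripped[insert_at - 1]:
        insert_at -= 1
    lines[insert_at:insert_at] = additions
    return "\n".join(lines) + "\n"


slurm_inventory = (
    "[slurm_control_node]\n10.5.0.101\n"
    "[slurm_node]\n10.5.0.102\n10.5.0.103\n"
    "[login]\n10.5.0.104\n"
    "[auth_server]\n10.5.0.101\n"
)
print(add_hosts(slurm_inventory, "slurm_node", ["10.5.0.105", "10.5.0.106"]))
```

Only the named group is modified, which keeps the control plane, login, and authentication groups exactly as they were.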
Note
The [etcd] group only supports an odd number of servers in the group. Adding nodes to [etcd] groups is not supported in re-run scenarios.
Do not change the kube_control_plane, slurm_control_node and/or auth_server in the existing inventory file. Simply add the new node information in the kube_node and/or slurm_node group.
When re-running omnia.yml to add a new node, ensure that the input/security_config.yml and input/omnia_config.yml are not edited between runs.
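The inventory rules in this note can be checked before re-running omnia.yml. The sketch below compares the inventory from the previous run with the updated one. It is illustrative only: the helper names are hypothetical, and it assumes a simple inventory with group headers and one host per line.

```python
from __future__ import annotations

# Groups that the note says must stay the same between runs.
PROTECTED = ("kube_control_plane", "slurm_control_node", "auth_server", "etcd")


def parse_groups(text: str) -> dict[str, list[str]]:
    """Map each [group] header to the list of hosts that follow it."""
    groups: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(line)
    return groups


def check_rerun(old_text: str, new_text: str) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    old, new = parse_groups(old_text), parse_groups(new_text)
    problems = []
    for group in PROTECTED:
        if old.get(group, []) != new.get(group, []):
            problems.append(f"[{group}] must not change between runs")
    etcd = new.get("etcd", [])
    if etcd and len(etcd) % 2 == 0:
        problems.append("[etcd] must contain an odd number of servers")
    return problems


old_inv = "[kube_control_plane]\n10.5.0.101\n[kube_node]\n10.5.0.102\n[etcd]\n10.5.0.110\n"
new_inv = "[kube_control_plane]\n10.5.0.101\n[kube_node]\n10.5.0.102\n10.5.0.105\n[etcd]\n10.5.0.110\n"
for problem in check_rerun(old_inv, new_inv):
    print(problem)
```

This check covers only the inventory file. It does not detect edits to input/security_config.yml or input/omnia_config.yml.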
Once the new node IPs have been provided in the inventory, you can install security tools (OpenLDAP, FreeIPA), job schedulers (Kubernetes, Slurm), and storage tools (NFS, BeeGFS) on the nodes by executing omnia.yml with the updated inventory file:
ansible-playbook omnia.yml -i inventory
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.