Omnia: Everything at once!
Ansible playbook-based deployment of Slurm and Kubernetes on servers running an RPM-based Linux OS.
Omnia, derived from the Latin word for “all” or “everything”, serves as a deployment tool designed to transform servers equipped with RPM-based Linux images into fully operational Slurm/Kubernetes clusters.
Omnia is an open source project hosted on GitHub. Go to GitHub to view the source, open issues, ask questions, and participate in the project.
Licensing
Omnia is made available under the Apache 2.0 license.
Note
Omnia playbooks are licensed under the Apache 2.0 license. Once an end-user initiates Omnia, that end-user will deploy other open-source and/or third-party software that is licensed separately by their respective developer communities and/or third parties. For a comprehensive list of software and their licenses, click here. Dell (or any other contributors) shall have no liability regarding (and no responsibility to provide support for) an end-user’s use of any open-source and/or third-party software, and Omnia users are solely responsible for ensuring that they are complying with all such licenses. Omnia is provided “as is” without any warranty, express or implied. Dell (or any other contributors) shall have no liability for any direct, indirect, incidental, punitive, special, or consequential damages for an end-user’s use of Omnia.
For a better understanding of what Omnia does, check out our docs!
Omnia Community Members
Table Of Contents
- Omnia: Overview
- Upgrade Omnia
- Quick Installation Guide
- Running prereq.sh
- Local repositories for the cluster
- Installing the provision tool
- Creating node inventory
- Configuring the cluster
- Input parameters for the cluster
- Before you build clusters
- Building clusters
- Install Kubernetes
- Kubernetes plugin for RoCE NIC
- Install Slurm
- Configuring UCX and OpenMPI on the cluster
- Centralized authentication on the cluster
- Granting Kubernetes access
- BeeGFS bolt on
- NFS
- Install the ROCm platform for AMD GPUs
- Installing AI tools
- Adding new nodes
- Re-provisioning the cluster
- Configuring switches
- Configuring PowerVault
- Running HPC benchmarks on Omnia clusters
- Download custom packages/images to the cluster
- Remove Slurm/K8s configuration from a node
- Soft reset the cluster
- Delete provisioned node
- Uninstalling the provision tool
- Features
- Omnia Logs
- Troubleshooting
- Known issues
- Frequently asked questions
- Troubleshooting guide
- Troubleshooting Kubeadm
- Connecting to internal databases
- Checking and updating encrypted parameters
- Checking pod status on the control plane
- Using telemetry information to diagnose node issues
- Troubleshooting image download failures while executing local_repo.yml playbook
- Troubleshooting task failures during omnia.yml playbook execution
- Security Configuration Guide
- Sample Files
- inventory file
- software_config.json for Ubuntu
- software_config.json for RHEL/Rocky Linux
- inventory file for IP rule assignment
- inventory file for additional NIC configuration
- inventory file to delete node from the cluster
- pxe_mapping_file.csv
- switch_inventory
- powervault_inventory
- NFS Server inventory file
- Inventory for iDRAC telemetry
- Limitations
- Best Practices
- Contributing To Omnia
If you have any feedback about Omnia documentation, please reach out at omnia.readme@dell.com.