MPI-Operator configuration for DeepSpeed deployment
When Omnia deploys Kubernetes on a cluster, it sets the mpi-operator API version to v2beta1. However, if you deploy Kubeflow on that same Kubernetes cluster, the mpi-operator API version automatically changes to v1.
To configure Kubeflow with mpi-operator API version v2beta1, execute the following commands:
cd tools
ansible-playbook configure_mpi_operator.yml -i <kubeflow inventory> --tags mpiv2beta1
Expected result: The mpi-operator API version v1 and the Kubeflow training operator are uninstalled, and the mpi-operator API version v2beta1 is installed.
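To confirm the expected result above, one option is to list the API versions exposed by the MPIJob custom resource definition. The following is a minimal sketch, not part of Omnia: has_api_version is a hypothetical helper, and the CRD name mpijobs.kubeflow.org is an assumption about how the mpi-operator registers MPIJob on your cluster. The kubectl query only runs if kubectl is available.

```shell
# Hypothetical helper (not part of Omnia): succeeds if the
# space-separated version list in $1 contains the version in $2.
has_api_version() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Query the cluster only when kubectl is available.
# Assumption: the MPIJob CRD is named mpijobs.kubeflow.org.
if command -v kubectl >/dev/null 2>&1; then
  versions=$(kubectl get crd mpijobs.kubeflow.org \
    -o jsonpath='{.spec.versions[*].name}' 2>/dev/null || true)
  if has_api_version "$versions" v2beta1; then
    echo "MPIJob CRD exposes v2beta1"
  else
    echo "v2beta1 not found (versions listed: ${versions:-none})"
  fi
fi
```

If the check reports that v2beta1 is not found, review the playbook output before deploying a DeepSpeed MPIJob.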
[Optional] Revert to the default configuration
To revert to the default configuration, execute the following commands in order:
Step 1:
kubectl delete -f <DeepSpeed_configuration_filename>.yml
where <DeepSpeed_configuration_filename>.yml is the YAML configuration file applied to deploy the DeepSpeed MPIJob.
Step 2:
kubectl delete -f <PVC_filename>.yml
where <PVC_filename>.yml is the PVC configuration file applied to deploy the DeepSpeed MPIJob.
Step 3:
kubectl delete ns workloads
Step 4:
cd tools
ansible-playbook configure_mpi_operator.yml -i <kubeflow inventory> --tags mpiv1
Expected result:
In the process, the following actions are performed:
The resources created from the DeepSpeed MPIJob YAML configuration file are deleted.
The resources created from the PVC configuration file are deleted.
The namespace for DeepSpeed jobs (workloads) is deleted.
The mpi-operator API version v2beta1 is uninstalled.
The mpi-operator API version v1 is installed.
The training operator of Kubeflow is also installed.
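Because the four revert steps above must run in a fixed order, it can help to print them with your actual file names filled in and review them before executing anything. The following is a minimal sketch under stated assumptions: revert_commands is a hypothetical helper (not part of Omnia), its three arguments stand in for the placeholders used in the steps, and the paths are assumed to contain no spaces. It only prints the commands and does not run them.

```shell
# Hypothetical helper (not part of Omnia): prints the revert commands in order.
#   $1 = DeepSpeed MPIJob YAML file
#   $2 = PVC YAML file
#   $3 = Kubeflow inventory file
revert_commands() {
  printf 'kubectl delete -f %s\n' "$1"
  printf 'kubectl delete -f %s\n' "$2"
  printf 'kubectl delete ns workloads\n'
  printf 'cd tools && ansible-playbook configure_mpi_operator.yml -i %s --tags mpiv1\n' "$3"
}

# Example with illustrative file names; substitute your own.
revert_commands deepspeed_job.yml deepspeed_pvc.yml kubeflow_inventory
```

Printing rather than executing keeps the sketch safe to try on any machine, and the output can be copied line by line once the file names look right.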