Intel Gaudi metrics collected by the Prometheus exporter

The supported Intel Gaudi metrics are:

habanalabs_clock_soc_max_mhz

habanalabs_clock_soc_mhz

habanalabs_device_config

habanalabs_ecc_feature_mode

habanalabs_energy

habanalabs_kube_info

habanalabs_memory_free_bytes

habanalabs_memory_total_bytes

habanalabs_memory_used_bytes

habanalabs_pci_link_speed

habanalabs_pci_link_width

habanalabs_pcie_receive_throughput

habanalabs_pcie_replay_count

habanalabs_pcie_rx

habanalabs_pcie_transmit_throughput

habanalabs_pcie_tx

habanalabs_pending_rows_state

habanalabs_pending_rows_with_double_bit_ecc_errors

habanalabs_pending_rows_with_single_bit_ecc_errors

habanalabs_power_default_limit_mW

habanalabs_power_mW

habanalabs_temperature_onboard

habanalabs_temperature_onchip

habanalabs_temperature_threshold_gpu

habanalabs_temperature_threshold_memory

habanalabs_temperature_threshold_shutdown

habanalabs_temperature_threshold_slowdown

habanalabs_utilization
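
As a minimal sketch of how these metric names might be consumed, the Python example below filters `habanalabs_` samples out of text in the Prometheus text exposition format and derives a memory usage ratio from `habanalabs_memory_used_bytes` and `habanalabs_memory_total_bytes`. The sample payload, the `device` label, and the helper names `parse_gaudi_samples` and `memory_used_fraction` are assumptions made for illustration; the labels the exporter actually attaches may differ.

```python
import re

# Hypothetical exporter output in the Prometheus text exposition format.
# The label name ("device") and all values are illustrative assumptions.
SAMPLE = """\
# HELP habanalabs_memory_used_bytes Illustrative help text
# TYPE habanalabs_memory_used_bytes gauge
habanalabs_memory_used_bytes{device="0"} 1024
habanalabs_memory_total_bytes{device="0"} 4096
habanalabs_utilization{device="0"} 37
node_cpu_seconds_total{cpu="0"} 12.5
"""

# metric name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(r'^(habanalabs_\w+)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_gaudi_samples(text):
    """Return (name, labels, value) tuples for habanalabs_* samples only."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            # Not a habanalabs_* sample (or not parseable by this sketch).
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def memory_used_fraction(samples, device):
    """Compute used/total memory for one device, or None if data is missing."""
    used = total = None
    for name, labels, value in samples:
        if labels.get("device") != device:
            continue
        if name == "habanalabs_memory_used_bytes":
            used = value
        elif name == "habanalabs_memory_total_bytes":
            total = value
    if used is None or not total:
        return None
    return used / total


if __name__ == "__main__":
    gaudi = parse_gaudi_samples(SAMPLE)
    for name, labels, value in gaudi:
        print(name, labels, value)
    print("memory used fraction:", memory_used_fraction(gaudi, "0"))
```

In a real deployment the text would come from the exporter's metrics endpoint or from Prometheus queries rather than from an inline string; the parsing step stays the same.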
