How to Check Node CPU Utilization in OpenShift
Monitoring CPU utilization at the node level is one of the most important operational tasks in any OpenShift cluster. Whether you're troubleshooting performance bottlenecks, planning capacity, or simply keeping an eye on cluster health, knowing where to look — and what the numbers actually mean — makes a significant difference in how effectively you manage your infrastructure.
What "Node CPU Utilization" Actually Means in OpenShift
In OpenShift (built on Kubernetes), a node is a physical or virtual machine that runs workloads as pods. Each node has a fixed amount of CPU capacity, measured in cores or millicores (1 core = 1000 millicores).
CPU utilization refers to how much of that capacity is actively being consumed at any given moment. This is distinct from CPU requests and limits, which are resource reservations and caps defined in pod specs; real-time usage can fluctuate significantly above or below those values depending on workload patterns.
OpenShift surfaces CPU data through several layers: the underlying Linux kernel metrics, the cAdvisor agent embedded in the kubelet, Prometheus (the built-in metrics engine), and the OpenShift web console's monitoring dashboards.
Method 1: Using the oc Command-Line Interface
The fastest way to get a snapshot of node CPU usage is through the oc CLI with the adm top command:
```shell
oc adm top nodes
```

This returns output like:
```
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node-1   520m         13%    4200Mi          54%
node-2   1200m        30%    7100Mi          89%
node-3   85m          2%     1800Mi          23%
```

CPU(cores) shows actual current consumption in millicores. CPU% shows that consumption as a percentage of the node's total allocatable CPU.
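To make the CPU% column concrete, here is the arithmetic behind it as a quick shell sketch. The 4-core (4000m) allocatable figure is an assumption chosen so the numbers match the node-1 sample row above:

```shell
# Sketch: how CPU% is derived from CPU(cores) and allocatable capacity.
usage_millicores=520          # from the node-1 sample row above
allocatable_millicores=4000   # hypothetical: a node with 4 allocatable cores

cpu_percent=$(( usage_millicores * 100 / allocatable_millicores ))
echo "${cpu_percent}%"   # 520m of 4000m -> 13%
```

The same ratio explains why two nodes with identical millicore consumption can show very different CPU% values if their core counts differ.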
This command relies on the cluster's metrics API being available (served by the Metrics Server, or by the Prometheus Adapter on earlier OpenShift 4 releases). If that component isn't deployed or isn't functioning, you'll receive an error instead of data.
To drill into a specific node's resource usage by pod:
```shell
oc adm top pods --all-namespaces --sort-by=cpu
```

This helps you identify which pods on a given node are driving the CPU load.
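Because that output is plain columnar text, standard text tools can narrow it further. A sketch, using a hard-coded sample in place of live command output so it can be shown offline (the namespaces, pod names, and numbers are invented); in a real cluster you would pipe `oc adm top pods --all-namespaces` straight into the same awk filter:

```shell
# Hypothetical `oc adm top pods` output, stubbed in via heredoc.
sample_output=$(cat <<'EOF'
NAMESPACE   NAME        CPU(cores)   MEMORY(bytes)
app-prod    web-1       450m         900Mi
app-prod    worker-3    300m         1200Mi
logging     fluentd-x   120m         400Mi
EOF
)

# Keep pods consuming more than 200m; skip the header row.
# `$3+0` coerces "450m" to the number 450 for the comparison.
heavy=$(echo "$sample_output" | awk 'NR > 1 && $3+0 > 200 { print $2, $3 }')
echo "$heavy"
```

The 200m cutoff is arbitrary here; pick a threshold that is meaningful for your node sizes.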
Method 2: OpenShift Web Console Monitoring Dashboards
The OpenShift web console includes built-in monitoring dashboards powered by Prometheus and Grafana-style visualizations. To access node CPU data:
- Navigate to Observe → Dashboards in the Administrator view
- Select the "Node Exporter / USE Method / Node" or "Kubernetes / Compute Resources / Node" dashboard
- Filter by node name to isolate the metrics you need
These dashboards show historical CPU utilization trends, not just point-in-time snapshots. This matters when diagnosing whether a spike is persistent or transient. 📊
Method 3: Querying Prometheus Directly
For more granular or custom analysis, OpenShift exposes a Prometheus endpoint that you can query using PromQL (Prometheus Query Language).
Access it via Observe → Metrics in the console, or use the Prometheus route if you have direct access.
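As a sketch of what CLI access can look like, assuming the standard OpenShift 4 monitoring stack (a `thanos-querier` route in the `openshift-monitoring` namespace): the cluster-dependent lines are commented out so the snippet stands alone, and curl's `--data-urlencode` handles escaping the PromQL expression:

```shell
# A PromQL expression to run (one of the queries from the table below).
query='instance:node_cpu_utilisation:rate5m'

# Against a live cluster (requires oc login; not executed here):
# host=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
# token=$(oc whoami -t)
# curl -skG -H "Authorization: Bearer $token" \
#   --data-urlencode "query=$query" \
#   "https://$host/api/v1/query"
echo "would query: $query"
```

The response comes back as JSON with a `data.result` array, one entry per node, which is convenient to post-process with jq.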
Useful PromQL queries for node CPU:
| Query | What It Shows |
|---|---|
| `instance:node_cpu_utilisation:rate5m` | 5-minute CPU utilization rate per node |
| `100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)` | Percentage of CPU time not idle |
| `sum by (node) (kube_pod_container_resource_requests{resource="cpu"})` | Total CPU requests per node |
| `node_load1` | 1-minute load average per node |
The distinction between requested CPU and actual CPU consumption is particularly important here. A node can show low utilization despite high CPU requests if pods are underutilizing their reservations — or conversely, be under CPU pressure even with modest requests if workloads burst aggressively.
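The requests-versus-usage gap is easiest to see with numbers side by side. A sketch with made-up figures for a single hypothetical 4-core node (requests reserve capacity for the scheduler; usage is what actually runs):

```shell
requested_m=3200    # hypothetical: sum of pod CPU requests on the node
used_m=520          # hypothetical: actual consumption, as oc adm top would report
allocatable_m=4000  # hypothetical: 4 allocatable cores

requested_pct=$(( requested_m * 100 / allocatable_m ))
used_pct=$(( used_m * 100 / allocatable_m ))

echo "requested: ${requested_pct}% of allocatable"
echo "used:      ${used_pct}% of allocatable"
```

A node like this looks nearly full to the scheduler (80% requested) while sitting almost idle (13% used), which is exactly the pattern that makes request right-sizing worthwhile.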
Method 4: Node Describe and Events
For context around CPU pressure events rather than raw metrics, the oc describe node command is useful:
```shell
oc describe node <node-name>
```

This surfaces Conditions (including MemoryPressure, DiskPressure, and PIDPressure — though not a direct CPU pressure flag), Allocated resources, and Events that may indicate throttling or eviction activity related to resource contention.
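Since `oc describe node` output is long, it's common to cut it down to just the Allocated resources section. A sketch using a heavily trimmed, hypothetical sample in place of real output; against a live cluster you would pipe `oc describe node <node-name>` into the same awk range filter:

```shell
# Trimmed stand-in for `oc describe node` output.
sample=$(cat <<'EOF'
Name:               node-1
Conditions:
  MemoryPressure    False
Allocated resources:
  cpu               3200m (80%)
  memory            9Gi (58%)
Events:
  <none>
EOF
)

# Print everything between "Allocated resources:" and "Events:",
# then drop the two boundary lines themselves.
alloc=$(echo "$sample" | awk '/^Allocated resources:/,/^Events:/' \
          | grep -v -e '^Allocated' -e '^Events')
echo "$alloc"
```

Real output includes a Requests/Limits breakdown per resource; the same range filter still applies.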
CPU throttling at the container level is tracked separately via the metric container_cpu_cfs_throttled_seconds_total, queryable in Prometheus.
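Because that metric is a cumulative counter, what matters is its rate of change, which is what `rate(container_cpu_cfs_throttled_seconds_total[5m])` computes. A sketch of the underlying arithmetic with two hypothetical counter samples taken 300 seconds apart:

```shell
t0=1250    # hypothetical counter value at the start of the window (seconds throttled)
t1=1310    # hypothetical counter value 300s later
window=300

# Fraction of the window the container spent throttled, as a percentage.
throttled_pct=$(( (t1 - t0) * 100 / window ))
echo "throttled ${throttled_pct}% of the window"
```

A container throttled 20% of the time is starved even if its node's overall CPU utilization looks comfortable, which is why node-level and container-level views both matter.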
Key Variables That Affect What You're Seeing 🔍
Several factors shape how you should interpret CPU utilization data in OpenShift:
- Cluster version: OpenShift 4.x clusters have deeper native Prometheus integration than older 3.x deployments, affecting what metrics are available by default
- Node type: Worker nodes, control plane nodes (masters), and infrastructure nodes have different baseline CPU consumption profiles
- Workload type: Batch jobs, web services, and ML workloads all exhibit very different CPU usage patterns
- CPU limits enforcement: Containers with strict CPU limits may be throttled even when node-level utilization appears low
- Overcommit ratio: If your cluster is configured to allow CPU overcommitment, node-level utilization percentages require more careful interpretation
- Monitoring stack health: If Prometheus scrapers or the Metrics Server are lagging or misconfigured, the numbers you see may be stale or incomplete
The Spectrum of Setups You Might Be Running
A developer running a small OpenShift Local (formerly CodeReady Containers) cluster on a laptop will interact with these tools very differently than an SRE managing a multi-zone production cluster with hundreds of nodes. On the small end, oc adm top nodes may be entirely sufficient. On the larger end, teams typically build custom alerting rules in Prometheus, set HorizontalPodAutoscaler thresholds tied to CPU metrics, and integrate with external observability platforms.
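For the HorizontalPodAutoscaler case mentioned above, the core of the Kubernetes scaling algorithm is `desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`. A sketch with made-up numbers showing how a CPU target drives a scaling decision:

```shell
current_replicas=4
current_util=90   # hypothetical: average CPU utilization across the pods (%)
target_util=70    # hypothetical HPA target utilization (%)

# Integer ceiling division: (a + b - 1) / b.
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "scale to $desired replicas"
```

With a 70% target and pods averaging 90%, four replicas scale out to six; the same formula scales back in when utilization drops well below target.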
The method that's most useful — and the threshold at which CPU utilization becomes "a problem" — depends heavily on your node sizing, workload SLAs, scheduling policies, and whether you're running on bare metal, on-premises VMs, or a managed cloud provider's infrastructure. ⚙️
What your numbers actually mean for your cluster is the piece only your specific environment can answer.