[monitoring] Add KubeVirt VM dashboards #2198

Open

mattia-eleuteri wants to merge 1 commit into cozystack:main from mattia-eleuteri:monitoring/vm-dashboards

Conversation

@mattia-eleuteri (Contributor) commented Mar 10, 2026

Summary

  • Adds dashboards/vm/virtual-machines.json — KubeVirt VM resource metrics (CPU, memory, network, storage) with namespace and VM template variables
  • Adds dashboards/vm/console-logs.json — VM serial console log viewer via VictoriaLogs with namespace filtering
  • Adds vm/virtual-machines and vm/console-logs entries to dashboards.list
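For reference, the registry additions might look roughly like this (a sketch; the exact dashboards.list entry format isn't shown on this page):

```text
vm/virtual-machines
vm/console-logs
```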

Problem

dashboards.list references vm/virtual-machines and vm/console-logs, but the corresponding JSON files don't exist.

The console-logs dashboard includes kubernetes_namespace_name filtering to prevent cross-tenant log exposure.

Relates to #2194

Test plan

  • Verify JSON is valid for both dashboards
  • Import dashboards in Grafana and verify panels render with KubeVirt metrics
  • Verify console-logs queries include namespace filter
Add KubeVirt VM dashboards (virtual-machines and console-logs)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • VM Console Logs dashboard: view live Kubernetes VM/container console output with namespace and VM filtering, time range and fast refresh, log formatting and details.
    • Virtual Machines dashboard: comprehensive per-VM monitoring (CPU, memory, network, disk I/O, latency, IOPS, swap/page faults) with templated namespace/VM selection, overview table, time series, thresholds and color-coded visualizations.
    • Monitoring index updated to include the new VM and additional dashboards for discovery.

@dosubot added the size:XXL (This PR changes 1000+ lines, ignoring generated files) and enhancement (New feature or request) labels on Mar 10, 2026
@coderabbitai bot (Contributor) commented Mar 10, 2026

📝 Walkthrough

Walkthrough

Adds two new Grafana dashboards for VM monitoring: a "VM Console Logs" logs panel and a comprehensive "Virtual Machines" Prometheus-backed dashboard; registers both in the monitoring dashboards list.

Changes

Cohort / File(s) Summary
VM Dashboards
dashboards/vm/console-logs.json, dashboards/vm/virtual-machines.json
Adds two Grafana JSON dashboards. console-logs.json is a logs panel using victoriametrics-logs-datasource with templated namespace/VM variables. virtual-machines.json is a large Prometheus-backed dashboard (templating, multi-select vars, many panels) covering CPU, memory, network, disk I/O, per-drive metrics, and extensive transformations and field configs.
Dashboard Registry
packages/system/monitoring/dashboards.list
Adds multiple new dashboard entries including vm/virtual-machines and vm/console-logs (plus other dashboard registrations).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through JSON fields today,
I stitched logs, metrics, charts in play.
CPU ticks and disk-heart beams,
I wove them into VM dreams.
Hooray — let dashboards nibble away! 🥕

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title '[monitoring] Add KubeVirt VM dashboards' directly and clearly describes the main change: adding two new Grafana dashboards for KubeVirt VM monitoring and console logs.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the check.


@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
dashboards/vm/virtual-machines.json (1)

120-145: Clarify whether this is CPU % or CPU cores.

These panels plot raw rate(..._seconds_total) values, so they scale with the number of busy vCPUs. Rendering them as percentunit makes the result ambiguous for multi-vCPU guests. Either normalize by vCPU count before formatting as %, or rename/unit the panels as cores.

Also applies to: 375-414
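If the percent option is chosen, the normalization could be sketched like this (an assumption-laden sketch: kubevirt_vmi_vcpu_count stands in for whatever per-VM vCPU-count series this KubeVirt version actually exports):

```promql
# Sketch: CPU utilization as a 0-1 fraction (percentunit-compatible).
# kubevirt_vmi_vcpu_count is an assumed vCPU-count metric; substitute the
# per-VM vCPU count your KubeVirt version exposes.
rate(kubevirt_vmi_cpu_usage_seconds_total{namespace="$namespace", name=~"$vm"}[$__rate_interval])
  / on (namespace, name)
    kubevirt_vmi_vcpu_count{namespace="$namespace", name=~"$vm"}
```

With an expression of this shape, the existing percentunit unit and fractional thresholds stay meaningful even for multi-vCPU guests.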

Inline comments:

In dashboards/vm/console-logs.json, lines 21-30: The "namespace" textbox variable defaults to "*", which lets users override tenant scope and run cluster-wide log queries. Make it non-editable (e.g., replace the textbox with a hidden constant set to a tenant-specific namespace), or remove it and enforce the namespace via the datasource or RBAC, so the panel query's namespace filter cannot be set to "*".

In dashboards/vm/virtual-machines.json, lines 194-220: The queries with refIds net_rx, net_tx, iops_r, and iops_w return one series per interface/drive and must be aggregated before the downstream table merge. Wrap each rate(...) expression in sum by (name)(...), e.g. sum by (name)(rate(kubevirt_vmi_network_receive_bytes_total{...}[$__rate_interval])), and similarly for net_tx, iops_r, and iops_w (keeping the drive!="cloudinitdisk" filter), leaving legendFormat, refId, and instant settings unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a55ed75d-9122-4948-bb8c-1deedc257c76

📥 Commits

Reviewing files that changed from the base of the PR and between a13481b and 96cd231.

📒 Files selected for processing (3)
  • dashboards/vm/console-logs.json
  • dashboards/vm/virtual-machines.json
  • packages/system/monitoring/dashboards.list

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
dashboards/vm/console-logs.json (1)

29-40: ⚠️ Potential issue | 🟠 Major

Namespace filtering still isn't an isolation boundary.

The textbox wildcard is gone, but namespace is still user-selectable via label_values(..., namespace). The filter on Line 74 only constrains the query to the chosen namespace; it does not stop a user from switching to another namespace that the datasource can see. If tenant isolation is part of this dashboard's contract, scope still needs to be enforced outside this variable.
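If the variable is kept at all, one possible shape for a non-selectable, tenant-scoped variable (a sketch: "tenant-a" is a hypothetical namespace, and real isolation still requires datasource- or RBAC-level enforcement outside the dashboard):

```json
{
  "name": "namespace",
  "type": "constant",
  "query": "tenant-a",
  "hide": 2
}
```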

🧹 Nitpick comments (1)
dashboards/vm/console-logs.json (1)

29-57: Use a selector source that survives VM shutdown.

Lines 36 and 50 drive both dropdowns from kubevirt_vmi_memory_available_bytes, so a VM disappears from the selector as soon as it stops exporting that metric even if its recent console logs still exist. For a log dashboard, it's safer to back these variables with a source that outlives the VM instead of a live memory series.
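For example, backing both variables with a longer-lived series such as kubevirt_vmi_info (assuming that series is scraped in this cluster) keeps recently stopped VMs selectable:

```promql
label_values(kubevirt_vmi_info, namespace)
label_values(kubevirt_vmi_info{namespace="$namespace"}, name)
```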

Inline comments:

In dashboards/vm/virtual-machines.json, lines 577-655: The network charts use per-interface metrics (kubevirt_vmi_network_receive_bytes_total, kubevirt_vmi_network_transmit_bytes_total, and the packets, errors, and dropped-packets counterparts), but their legendFormat shows only {{ name }}, collapsing multiple NIC series. Either aggregate each query with sum by (name)(...) to show per-VM totals, or preserve per-interface visibility by adding the interface label to the legends of all affected targets (Inbound Throughput, Outbound Throughput (TX), Packets/s, Errors & Drops), e.g. "{{ name }} {{ interface }}".

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 74986ee1-b4b3-4f5d-bfca-4b09402aab36

📥 Commits

Reviewing files that changed from the base of the PR and between 96cd231 and 60d258b.

📒 Files selected for processing (2)
  • dashboards/vm/console-logs.json
  • dashboards/vm/virtual-machines.json

@mattia-eleuteri force-pushed the monitoring/vm-dashboards branch from 60d258b to bcf42b3 on March 10, 2026 15:57
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
dashboards/vm/virtual-machines.json (1)

577-655: ⚠️ Potential issue | 🟡 Minor

Network legends still collapse multi-NIC series.

These targets are still per-interface, but the legends only show the VM and direction. On multi-NIC VMs Grafana will plot multiple indistinguishable lines. Either aggregate to per-VM totals with sum by (name) or include {{ interface }} in every RX/TX/error/drop legend.

Example fix if you want to keep per-interface visibility
-          "legendFormat": "{{ name }}"
+          "legendFormat": "{{ name }} ({{ interface }})"

-          "legendFormat": "{{ name }}"
+          "legendFormat": "{{ name }} ({{ interface }})"

-          "legendFormat": "{{ name }} RX"
+          "legendFormat": "{{ name }} RX ({{ interface }})"

-          "legendFormat": "{{ name }} TX"
+          "legendFormat": "{{ name }} TX ({{ interface }})"

-          "legendFormat": "{{ name }} RX errors"
+          "legendFormat": "{{ name }} RX errors ({{ interface }})"

-          "legendFormat": "{{ name }} TX errors"
+          "legendFormat": "{{ name }} TX errors ({{ interface }})"

-          "legendFormat": "{{ name }} RX dropped"
+          "legendFormat": "{{ name }} RX dropped ({{ interface }})"

-          "legendFormat": "{{ name }} TX dropped"
+          "legendFormat": "{{ name }} TX dropped ({{ interface }})"
🧹 Nitpick comments (1)
dashboards/vm/virtual-machines.json (1)

132-143: Revisit CPU thresholds now that these panels show cores.

The dashboard now renders CPU as short, but the cutoffs stayed at 0.7/0.9 and 0.5/0.8. Those values behave like leftover percentage thresholds and will color ordinary load as saturation on larger VMs. If the intent is utilization, normalize by allocated vCPUs; otherwise drop the color thresholds.

Also applies to: 278-294
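If utilization coloring is wanted, one sketch is to render percent and scale the thresholds to match (values are illustrative, and the query itself would also need normalizing to a percentage):

```json
"fieldConfig": {
  "defaults": {
    "unit": "percent",
    "thresholds": {
      "mode": "absolute",
      "steps": [
        { "color": "green", "value": null },
        { "color": "yellow", "value": 70 },
        { "color": "red", "value": 90 }
      ]
    }
  }
}
```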

Inline comments:

In dashboards/vm/virtual-machines.json, lines 423-453: The "vCPU Wait" and "vCPU Delay (Steal Time)" panels plot rate(kubevirt_vmi_vcpu_wait_seconds_total[...]) and rate(kubevirt_vmi_vcpu_delay_seconds_total[...]) but set the unit to "s", which is incorrect: rate() over a seconds counter yields seconds per second, a dimensionless ratio. Change fieldConfig.defaults.unit from "s" to a dimensionless unit such as "short" (or an explicit "s/s"), or switch the query to increase(...) if elapsed seconds over the range is actually wanted, so the axis and tooltips display correctly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd64ca1f-3810-462b-b6fc-ef2a53608e7d

📥 Commits

Reviewing files that changed from the base of the PR and between 60d258b and bcf42b3.

📒 Files selected for processing (3)
  • dashboards/vm/console-logs.json
  • dashboards/vm/virtual-machines.json
  • packages/system/monitoring/dashboards.list
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/system/monitoring/dashboards.list
  • dashboards/vm/console-logs.json

@kvaps (Member) left a comment

LGTM

@dosubot added the lgtm (This PR has been approved by a maintainer) label on Mar 10, 2026
@kvaps (Member) commented Mar 10, 2026

Hey @mattia-eleuteri, could you please fix the conflict after merging?

Signed-off-by: Mattia Eleuteri <mattia@hidora.io>
Signed-off-by: mattia-eleuteri <mattia@hidora.io>
@mattia-eleuteri force-pushed the monitoring/vm-dashboards branch from bcf42b3 to 46104de on March 11, 2026 09:39
@mattia-eleuteri (Contributor, Author) commented:

Rebased on upstream/main — conflict with #2197 in dashboards.list resolved. Also applied the legend and unit fixes from review comments.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 38339b5d-6bf4-4f6e-a2f8-30b96e01da1c

📥 Commits

Reviewing files that changed from the base of the PR and between bcf42b3 and 46104de.

📒 Files selected for processing (3)
  • dashboards/vm/console-logs.json
  • dashboards/vm/virtual-machines.json
  • packages/system/monitoring/dashboards.list
🚧 Files skipped from review as they are similar to previous changes (1)
  • dashboards/vm/console-logs.json

Comment on lines +127 to +134 of dashboards/vm/virtual-machines.json:
"expr": "avg(rate(kubevirt_vmi_cpu_usage_seconds_total{namespace=\"$namespace\", name=~\"$vm\"}[$__rate_interval]))",
"legendFormat": ""
}
],
"options": { "colorMode": "value", "graphMode": "area", "textMode": "value" },
"fieldConfig": {
"defaults": {
"unit": "percentunit",

⚠️ Potential issue | 🟡 Minor

CPU percentunit may be misleading for multi-vCPU VMs.

rate(kubevirt_vmi_cpu_usage_seconds_total) returns CPU cores consumed (seconds/second), not a 0-1 ratio. For a VM with 4 vCPUs at full utilization, this returns ~4.0, which Grafana displays as "400%" with percentunit.

Consider using short with a label like "cores" or normalize by vCPU count if you truly want percentage:

rate(kubevirt_vmi_cpu_usage_seconds_total{...}[$__rate_interval]) 
  / on(name, namespace) kubevirt_vmi_vcpu_seconds_total{state="total"}

Alternatively, adjust thresholds to reflect core counts rather than percentages.


@kvaps enabled auto-merge on March 18, 2026 17:56

Labels

enhancement (New feature or request) · lgtm (This PR has been approved by a maintainer) · size:XXL (This PR changes 1000+ lines, ignoring generated files)

