Problem / root cause
The go.d `prometheus` collector is still Framework V1: it registers `Create` → `CollectorV1` and returns `Collect(ctx) map[string]int64` (`src/go/plugin/go.d/collector/prometheus/init.go:29`, `collector.go:128`). V2 (metrix store, chart templates, host scopes) is required for relabeling and profiles. The durable compatibility contract is the chart context (`prometheus.<metric>` / `prometheus.<app>.<metric>`, `charts.go:262-267`); chart-ID strings are not a contract. Two framework gaps block a clean V2 migration:
- FW-1: autogen emits a bare `<metric>` context; `context_namespace` is honored only by the template compiler, not by autogen (`chartengine/autogen.go:604`, `compiler.go:35`).
- FW-2: strict metrix rejects or panics on NaN/sparse summary/histogram input (`pkg/metrix/summary.go:188-230`, `histogram.go:200`), but scraped summaries legitimately have NaN quantiles.
Clean end state
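A V2 collector (`CreateV2`, `metrix.CollectorStore`, `Collect(ctx) error`, `MetricStore()`, `ChartTemplateYAML()`; no reachable V1 path) that:
- uses native `metrix.Histogram`/`Summary` and float dimensions;
- preserves `_info` skip, selector/limit ordering, fallback types, `update_every:10`, lifecycle (`expire_after_cycles:10`), and `label_prefix`/`app`;
- handles native histograms/summaries robustly: it pre-validates before instrument declaration and before `ObservePoint` (metrix panics at both), caches the accepted schema by name, skips schema drift with a rate-limited log, and turns an absent/NaN component into a gap (FW-2);
- accepts chart-ID drift, with the `src/health/REFERENCE.md` examples updated to the real post-migration IDs.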
Acceptance criteria
PR2: compat manifest + spike (baseline, no runtime change; the first PR of this issue, following the parser rewrite). It merges durable golden tests that capture current V1 behavior: contexts are primary and values are float-tolerant; the tests cover dimensions, divisors, labels and `label_prefix`, lifecycle, `_info` handling, fallback types, selector + limit order, and config defaults including `update_every:10`. It also enumerates ALL current config drift: `autodetection_retry` (schema 60 / metadata 0 / runtime unset), `expected_prefix` + `app` (present in `config_schema.json:39,49`, omitted from `metadata.yaml`), and `Counter` vs `counter` casing (`config_schema.json:130` vs `collector.go:63`). The spike is investigation only (design notes for the autogen-context mechanism and the groups-less question); it does NOT merge an emitted-context proof, which lands in PR3.
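A float-tolerant, context-keyed comparison for the golden manifest might look like the sketch below. `goldenChart`, `diffManifest`, and the tolerance values are assumptions for illustration, since the real manifest format is not specified in this issue. Keying on the context rather than the chart ID follows the compatibility contract stated above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// goldenChart is a hypothetical manifest entry: the chart context is the
// durable key; dimension values are compared with a tolerance.
type goldenChart struct {
	Context string
	Dims    map[string]float64
}

// almostEqual reports whether a and b agree within relTol (relative) or
// absTol (absolute, for values near zero).
func almostEqual(a, b, relTol, absTol float64) bool {
	diff := math.Abs(a - b)
	return diff <= absTol || diff <= relTol*math.Max(math.Abs(a), math.Abs(b))
}

// diffManifest keys the actual charts by context (not chart ID) and
// returns sorted, human-readable mismatches against the expected ones.
func diffManifest(want, got []goldenChart, relTol, absTol float64) []string {
	index := make(map[string]goldenChart, len(got))
	for _, c := range got {
		index[c.Context] = c
	}
	var problems []string
	for _, w := range want {
		g, ok := index[w.Context]
		if !ok {
			problems = append(problems, "missing context "+w.Context)
			continue
		}
		for dim, wv := range w.Dims {
			gv, ok := g.Dims[dim]
			switch {
			case !ok:
				problems = append(problems, fmt.Sprintf("%s: missing dim %s", w.Context, dim))
			case !almostEqual(wv, gv, relTol, absTol):
				problems = append(problems, fmt.Sprintf("%s: dim %s = %g, want %g", w.Context, dim, gv, wv))
			}
		}
	}
	sort.Strings(problems)
	return problems
}

func main() {
	want := []goldenChart{{Context: "prometheus.go_goroutines", Dims: map[string]float64{"go_goroutines": 12}}}
	got := []goldenChart{{Context: "prometheus.go_goroutines", Dims: map[string]float64{"go_goroutines": 12.0000000001}}}
	fmt.Println(diffManifest(want, got, 1e-9, 1e-12)) // []
}
```

This only checks that the expected contexts and dimensions are present. Flagging unexpected extra contexts would be a small addition.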
PR3: FW-1 (autogen context namespace), Full Design Gate. Autogen emits `prometheus[.app].<metric>` via a per-job override; a test asserts the emitted context.
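The naming rule PR3 targets fits in a few lines. In this hypothetical sketch, `autogenContext` stands in for the autogen naming step and `jobNamespace` for the per-job override. Neither is the actual chartengine API. An empty namespace reproduces today's bare-`<metric>` behavior (FW-1).

```go
package main

import "fmt"

// autogenContext is a hypothetical stand-in for the autogen naming step.
// With no namespace it emits the bare metric name (current FW-1 behavior);
// with a per-job namespace override it emits "<namespace>.<metric>".
func autogenContext(namespace, metric string) string {
	if namespace == "" {
		return metric
	}
	return namespace + "." + metric
}

// jobNamespace builds the prometheus collector's namespace for one job:
// "prometheus", or "prometheus.<app>" when the job sets app.
func jobNamespace(app string) string {
	if app == "" {
		return "prometheus"
	}
	return "prometheus." + app
}

func main() {
	fmt.Println(autogenContext("", "go_goroutines"))                     // go_goroutines
	fmt.Println(autogenContext(jobNamespace(""), "go_goroutines"))      // prometheus.go_goroutines
	fmt.Println(autogenContext(jobNamespace("myapp"), "go_goroutines")) // prometheus.myapp.go_goroutines
}
```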
PR4: FW-2 (metrix gap-on-absent), Full Design Gate. Native Summary/Histogram emit a dimension gap for an absent or NaN quantile/bucket, covered by partial and all-empty tests; strict validation stays intact for other collectors.
PR5: V2 migration. The PR2 manifest passes against V2. Native hist/summary use pre-validation (NaN/Inf count/sum, duplicate quantiles, non-monotonic buckets, and reserved `le`/`quantile` user labels → drop + log) and schema-drift skip + log; other metrics map to flat gauges/counters; values are floats; `label_prefix`/`app` are preserved; raw label values are accepted (confirm this is cosmetic); `update_every` stays 10; no V1 path remains; `REFERENCE.md` is updated; the `Counter`/`counter` casing is fixed; per-integration consistency artifacts are produced.
Category
refactor
Scope boundaries
IN: compat-only V2 migration + FW-1 + FW-2 + the manifest/spike baseline. OUT: relabeling, profiles, the parser rewrite (Issue 1), new labels/host scopes/Functions. FW-1/FW-2 are Full Design Gate framework changes (separate gated PRs; design note + approval); they are general-purpose and may be split into standalone framework issues.
Validation
Golden manifest (contexts primary, values float-tolerant); emitted-context tests (FW-1); gap tests (FW-2); pre-validation + drift tests; real-node run; consistency/CI.
Risks / compatibility
Chart-ID drift breaks anything that references chart IDs (contexts are preserved; docs are updated). Native histogram/summary instruments panic if fed malformed input, so pre-validation is mandatory. Schema drift drops and logs the affected series (rare; accepted minor data loss). FW-1/FW-2 have a high blast radius (they affect all collectors), so a design note and approval come first.