A command-line tool for managing backups and restores for the SUSE Observability platform running on Kubernetes.
This CLI tool replaces the legacy Bash-based backup/restore scripts with a single Go binary that can be run from an operator host. It uses Kubernetes port-forwarding to connect to services and automatically discovers configuration from ConfigMaps and Secrets.
Current Support:
- Elasticsearch snapshots and restores
- ClickHouse backups and restores
- Stackgraph backups and restores
- VictoriaMetrics backups and restores
- Settings backups and restores
Download pre-built binaries from the releases page.
To build from source:
go build -o sts-backup -ldflags '-s -w -X github.com/stackvista/stackstate-backup-cli/cmd/version.Version=0.0.1 -X github.com/stackvista/stackstate-backup-cli/cmd/version.Commit=abce -X github.com/stackvista/stackstate-backup-cli/cmd/version.Date=2025-10-15'
sts-backup [command] [subcommand] [flags]
Global flags:
- --namespace - Kubernetes namespace (required)
- --kubeconfig - Path to kubeconfig file (default: ~/.kube/config)
- --configmap - ConfigMap name containing backup configuration (default: suse-observability-backup-config)
- --secret - Secret name containing backup credentials (default: suse-observability-backup-config)
- --output, -o - Output format: table, json (default: table)
- --quiet, -q - Suppress operational messages
- --debug - Enable debug output
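The -X linker flags in the build command overwrite package-level string variables at link time. The sketch below shows the general mechanism; the variable names match the flags above, but the defaults and the versionString helper are assumptions for illustration, not the actual contents of cmd/version.

```go
package main

import "fmt"

// Assumed defaults for illustration. The -X linker flag can only overwrite
// package-level string variables like these.
var (
	Version = "dev"
	Commit  = "none"
	Date    = "unknown"
)

// versionString is a hypothetical helper that formats the build metadata.
func versionString() string {
	return fmt.Sprintf("sts-backup %s (commit %s, built %s)", Version, Commit, Date)
}

func main() {
	fmt.Println(versionString())
}
```

Because this sketch lives in package main, the matching flag would be -X main.Version=0.0.1; in the real build the full import path of the version package is used, as shown in the command above.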
Display version information.
sts-backup version
Manage Elasticsearch snapshots and restores.
Configure Elasticsearch snapshot repository and SLM policy.
sts-backup elasticsearch configure --namespace <namespace>
List Elasticsearch indices.
sts-backup elasticsearch list-indices --namespace <namespace>
List available Elasticsearch snapshots.
sts-backup elasticsearch list --namespace <namespace>
Restore Elasticsearch snapshot. Automatically scales down affected deployments before restore and scales them back up afterward.
sts-backup elasticsearch restore --namespace <namespace> [--snapshot <name> | --latest] [flags]
Flags:
- --snapshot, -s - Name of snapshot to restore (mutually exclusive with --latest)
- --latest - Restore from the most recent snapshot (mutually exclusive with --snapshot)
- --background - Run restore in background without waiting for completion
- --yes, -y - Skip confirmation prompt
Note: Either --snapshot or --latest must be specified (mutually exclusive).
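The rule in the note above (exactly one of --snapshot or --latest) can be sketched as a small validation function. The function name and error messages are hypothetical; the CLI's own implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// validateSnapshotSelection is a hypothetical helper: it accepts exactly one
// of a snapshot name or the latest flag, and rejects both or neither.
func validateSnapshotSelection(snapshot string, latest bool) error {
	switch {
	case snapshot != "" && latest:
		return errors.New("--snapshot and --latest are mutually exclusive")
	case snapshot == "" && !latest:
		return errors.New("either --snapshot or --latest must be specified")
	}
	return nil
}

func main() {
	// One valid and one invalid combination.
	fmt.Println(validateSnapshotSelection("sts-backup-20251117-1404", false))
	fmt.Println(validateSnapshotSelection("", false))
}
```

The same check applies unchanged to the --archive / --latest pair used by the other restore commands.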
Check the status of a restore operation and finalize if complete.
sts-backup elasticsearch check-and-finalize --namespace <namespace> --operation-id <snapshot> [--wait]
Flags:
- --operation-id - Operation ID of the restore operation (snapshot name) (required)
- --wait - Wait for restore to complete if still running
Use Case: This command is useful when a restore was started with the --background flag or was interrupted (Ctrl+C).
Manage Stackgraph backups and restores.
List available Stackgraph backups from S3/Minio.
sts-backup stackgraph list --namespace <namespace>
Restore Stackgraph from a backup archive. Automatically scales down affected deployments before restore and scales them back up afterward.
sts-backup stackgraph restore --namespace <namespace> [--archive <name> | --latest] [flags]
Flags:
- --archive - Specific archive name to restore (e.g., sts-backup-20210216-0300.graph)
- --latest - Restore from the most recent backup
- --background - Run restore job in background without waiting for completion
- --yes, -y - Skip confirmation prompt
Note: Either --archive or --latest must be specified (mutually exclusive).
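One way to think about --latest is as picking the archive with the newest timestamp in its name. The sketch below assumes the naming pattern from the example archive (sts-backup-<yyyymmdd>-<hhmm>.graph) and hypothetical helper names; the CLI itself may instead rely on object metadata from S3/Minio.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// archiveTime extracts the timestamp from a name such as
// sts-backup-20210216-0300.graph (assumed naming pattern).
func archiveTime(name string) (time.Time, error) {
	s := strings.TrimSuffix(strings.TrimPrefix(name, "sts-backup-"), ".graph")
	return time.Parse("20060102-1504", s)
}

// latestArchive returns the archive with the most recent embedded timestamp.
// Names that do not match the assumed pattern are skipped.
func latestArchive(names []string) (string, error) {
	var best string
	var bestTime time.Time
	for _, n := range names {
		t, err := archiveTime(n)
		if err != nil {
			continue
		}
		if best == "" || t.After(bestTime) {
			best, bestTime = n, t
		}
	}
	if best == "" {
		return "", fmt.Errorf("no archive matching the expected pattern found")
	}
	return best, nil
}

func main() {
	name, err := latestArchive([]string{
		"sts-backup-20210215-0300.graph",
		"sts-backup-20210216-0300.graph",
	})
	fmt.Println(name, err)
}
```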
Check the status of a background Stackgraph restore job and clean up resources.
sts-backup stackgraph check-and-finalize --namespace <namespace> --job <job-name> [--wait]
Flags:
- --job, -j - Stackgraph restore job name (required)
- --wait, -w - Wait for job to complete before cleanup
Use Case: This command is useful when a restore job was started with the --background flag or was interrupted (Ctrl+C).
Manage VictoriaMetrics backups and restores.
List available VictoriaMetrics backups from S3/Minio.
sts-backup victoriametrics list --namespace <namespace>
Note: In HA mode, backups from both instances (victoria-metrics-0 and victoria-metrics-1) are listed. The restore command accepts either backup to restore both instances.
Restore VictoriaMetrics from a backup archive. Automatically scales down affected StatefulSets before restore and scales them back up afterward.
sts-backup victoriametrics restore --namespace <namespace> [--archive <name> | --latest] [flags]
Flags:
- --archive - Specific backup name to restore (e.g., sts-victoria-metrics-backup/victoria-metrics-0-20251030143500)
- --latest - Restore from the most recent backup
- --background - Run restore job in background without waiting for completion
- --yes, -y - Skip confirmation prompt
Note: Either --archive or --latest must be specified (mutually exclusive).
Check the status of a background VictoriaMetrics restore job and clean up resources.
sts-backup victoriametrics check-and-finalize --namespace <namespace> --job <job-name> [--wait]
Flags:
- --job, -j - VictoriaMetrics restore job name (required)
- --wait, -w - Wait for job to complete before cleanup
Use Case: This command is useful when a restore job was started with the --background flag or was interrupted (Ctrl+C).
Manage Settings backups and restores.
List available Settings backups from S3/Minio.
sts-backup settings list --namespace <namespace>
Restore Settings from a backup archive. Automatically scales down affected deployments before restore and scales them back up afterward.
sts-backup settings restore --namespace <namespace> [--archive <name> | --latest] [flags]
Flags:
- --archive - Specific archive name to restore (e.g., sts-backup-20251117-1404.sty)
- --latest - Restore from the most recent backup
- --background - Run restore job in background without waiting for completion
- --yes, -y - Skip confirmation prompt
Note: Either --archive or --latest must be specified (mutually exclusive).
Check the status of a background Settings restore job and clean up resources.
sts-backup settings check-and-finalize --namespace <namespace> --job <job-name> [--wait]
Flags:
- --job, -j - Settings restore job name (required)
- --wait, -w - Wait for job to complete before cleanup
Use Case: This command is useful when a restore job was started with the --background flag or was interrupted (Ctrl+C).
Manage ClickHouse backups and restores.
List available ClickHouse backups from the backup API.
sts-backup clickhouse list --namespace <namespace>
Restore ClickHouse from a backup. Automatically scales down affected StatefulSets before restore and scales them back up afterward.
sts-backup clickhouse restore --namespace <namespace> --backup-name <name> [flags]
Flags:
- --backup-name - Name of the backup to restore (required)
- --wait - Wait for restore to complete (default: true)
Check the status of a ClickHouse restore operation and finalize if complete.
sts-backup clickhouse check-and-finalize --namespace <namespace> --operation-id <id> [--wait]
Flags:
- --operation-id - Operation ID of the restore operation (required)
- --wait - Wait for restore to complete if still running
Use Case: This command is useful when checking the status of a restore operation or finalizing after completion.
The CLI uses configuration from Kubernetes ConfigMaps and Secrets with the following precedence:
- CLI flags (highest priority)
- Environment variables (prefix: BACKUP_TOOL_)
- Kubernetes Secret (overrides sensitive fields)
- Kubernetes ConfigMap (base configuration)
- Defaults (lowest priority)
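The precedence list above amounts to a first-match lookup across ordered layers. The following is a minimal sketch of that idea, assuming each source has already been flattened into a map of dotted keys; the layer type, the resolve function, and the sample values are hypothetical and do not describe the CLI's actual config loader.

```go
package main

import "fmt"

// layer is a hypothetical flattened view of one configuration source,
// keyed by dotted path (e.g. elasticsearch.service.port).
type layer map[string]string

// resolve returns the value from the first layer that defines key.
// Callers pass layers ordered from highest to lowest priority.
func resolve(key string, layers ...layer) (string, bool) {
	for _, l := range layers {
		if v, ok := l[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	defaults := layer{"elasticsearch.service.localPortForwardPort": "9200"}
	configMap := layer{
		"elasticsearch.snapshotRepository.bucket":    "sts-elasticsearch-backup",
		"elasticsearch.snapshotRepository.accessKey": "placeholder",
	}
	secret := layer{"elasticsearch.snapshotRepository.accessKey": "from-secret"}
	env := layer{}
	flags := layer{}

	for _, key := range []string{
		"elasticsearch.snapshotRepository.accessKey",
		"elasticsearch.snapshotRepository.bucket",
		"elasticsearch.service.localPortForwardPort",
	} {
		// Order mirrors the list: flags, env, Secret, ConfigMap, defaults.
		v, _ := resolve(key, flags, env, secret, configMap, defaults)
		fmt.Printf("%s = %s\n", key, v)
	}
}
```

In this sketch the Secret value wins over the ConfigMap value for accessKey, while keys absent from higher layers fall through to the ConfigMap or the defaults.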
Create a ConfigMap with the following structure:
elasticsearch:
  snapshotRepository:
    name: sts-backup
    bucket: sts-elasticsearch-backup
    endpoint: suse-observability-minio:9000
    basepath: ""
  slm:
    name: auto-sts-backup
    schedule: "0 0 3 * * ?"
    snapshotTemplateName: "<sts-backup-{now{yyyyMMdd-HHmm}}>"
    repository: sts-backup
    indices: "sts*"
    retentionExpireAfter: 30d
    retentionMinCount: 5
    retentionMaxCount: 30
  service:
    name: suse-observability-elasticsearch-master-headless
    port: 9200
    localPortForwardPort: 9200
  restore:
    repository: sts-backup
    scaleDownLabelSelector: "observability.suse.com/scalable-during-es-restore=true"
    indexPrefix: sts
    datastreamIndexPrefix: .ds-sts_k8s_logs
    datastreamName: sts_k8s_logs
    indicesPattern: sts*,.ds-sts_k8s_logs*
Apply to Kubernetes:
kubectl create configmap suse-observability-backup-config \
  --from-file=config=config.yaml \
  -n <namespace>
For sensitive credentials, create a Secret with S3/Minio access keys:
kubectl create secret generic suse-observability-backup-config \
  --from-literal=elasticsearch.snapshotRepository.accessKey=<access-key> \
  --from-literal=elasticsearch.snapshotRepository.secretKey=<secret-key> \
  -n <namespace>
See internal/foundation/config/testdata/validConfigMapConfig.yaml for a complete example.
.
├── cmd/                              # CLI commands (Layer 4)
│   ├── root.go                       # Root command and global flags
│   ├── version/                      # Version command
│   ├── elasticsearch/                # Elasticsearch subcommands
│   │   ├── configure.go              # Configure snapshot repository
│   │   ├── list-indices.go           # List indices
│   │   ├── list.go                   # List snapshots
│   │   ├── restore.go                # Restore snapshot
│   │   └── check-and-finalize.go     # Check and finalize restore
│   ├── clickhouse/                   # ClickHouse subcommands
│   │   ├── list.go                   # List backups
│   │   ├── restore.go                # Restore backup
│   │   └── check-and-finalize.go     # Check and finalize restore
│   ├── stackgraph/                   # Stackgraph subcommands
│   │   ├── list.go                   # List backups
│   │   ├── restore.go                # Restore backup
│   │   └── check-and-finalize.go     # Check and finalize restore job
│   ├── victoriametrics/              # VictoriaMetrics subcommands
│   │   ├── list.go                   # List backups
│   │   ├── restore.go                # Restore backup
│   │   └── check-and-finalize.go     # Check and finalize restore job
│   └── settings/                     # Settings subcommands
│       ├── list.go                   # List backups
│       ├── restore.go                # Restore backup
│       └── check-and-finalize.go     # Check and finalize restore job
├── internal/                         # Internal packages (Layers 0-3)
│   ├── foundation/                   # Layer 0: Core utilities
│   │   ├── config/                   # Configuration management
│   │   ├── logger/                   # Structured logging
│   │   └── output/                   # Output formatting
│   ├── clients/                      # Layer 1: Service clients
│   │   ├── k8s/                      # Kubernetes client
│   │   ├── elasticsearch/            # Elasticsearch client
│   │   ├── clickhouse/               # ClickHouse client
│   │   └── s3/                       # S3/Minio client
│   ├── orchestration/                # Layer 2: Workflows
│   │   ├── portforward/              # Port-forwarding lifecycle
│   │   ├── scale/                    # Deployment/StatefulSet scaling
│   │   ├── restore/                  # Restore job orchestration
│   │   │   ├── confirmation.go       # User confirmation prompts
│   │   │   ├── finalize.go           # Job status check and cleanup
│   │   │   ├── job.go                # Job lifecycle management
│   │   │   └── resources.go          # Restore resource management
│   │   └── restorelock/              # Parallel restore prevention
│   ├── app/                          # Layer 3: Dependency container
│   │   └── app.go                    # Application context and DI
│   └── scripts/                      # Embedded bash scripts
├── main.go                           # Entry point
└── ARCHITECTURE.md                   # Detailed architecture documentation
- Layered Architecture: Clear separation between commands (Layer 4), dependency injection (Layer 3), workflows (Layer 2), clients (Layer 1), and utilities (Layer 0)
- Dependency Injection: Centralized dependency creation via internal/app/ eliminates boilerplate from commands
- Testability: All layers use interfaces for external dependencies, enabling comprehensive unit testing
- Clean Commands: Commands are thin (50-100 lines) and focused on business logic
- Restore Lock Protection: Prevents parallel restore operations that could corrupt data
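The dependency injection and testability points above can be illustrated with a small interface-based sketch. All names here (SnapshotLister, App, runList, fakeLister) are hypothetical and only show the general pattern; see ARCHITECTURE.md for the project's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// SnapshotLister is a hypothetical interface a command depends on,
// instead of depending on a concrete Elasticsearch client.
type SnapshotLister interface {
	ListSnapshots(repository string) ([]string, error)
}

// App is a hypothetical dependency container handed to commands.
type App struct {
	Snapshots SnapshotLister
}

// runList stays thin: it asks the injected client for data and formats it.
func runList(app *App, repository string) (string, error) {
	names, err := app.Snapshots.ListSnapshots(repository)
	if err != nil {
		return "", fmt.Errorf("list snapshots in %s: %w", repository, err)
	}
	return strings.Join(names, "\n"), nil
}

// fakeLister is a test double that returns canned snapshot names.
type fakeLister struct{ names []string }

func (f fakeLister) ListSnapshots(string) ([]string, error) { return f.names, nil }

func main() {
	app := &App{Snapshots: fakeLister{names: []string{
		"sts-backup-20251116-0300",
		"sts-backup-20251117-0300",
	}}}
	out, err := runList(app, "sts-backup")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Because runList only sees the interface, a unit test can swap in fakeLister without a cluster or port-forward.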
The CLI prevents parallel restore operations that could corrupt data by using Kubernetes annotations on Deployments and StatefulSets. When a restore starts:
- The CLI checks for existing restore locks before proceeding
- If another restore is in progress for the same datastore, the operation is blocked
- Mutually exclusive datastores are also protected (e.g., Stackgraph and Settings cannot restore simultaneously because they share HBase data)
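The lock check described above can be sketched as follows. The annotation key is the one used in this document's manual cleanup command; everything else is an assumption for illustration, in particular that the annotation value names the datastore holding the lock, and the simplified workload type standing in for Deployments and StatefulSets.

```go
package main

import "fmt"

// lockAnnotation is the key that appears in the manual cleanup command.
const lockAnnotation = "stackstate.com/restore-in-progress"

// conflicts lists datastores that must not restore simultaneously.
// Stackgraph and Settings share HBase data, per the text above.
var conflicts = map[string][]string{
	"stackgraph": {"settings"},
	"settings":   {"stackgraph"},
}

// workload is a simplified stand-in for a Deployment or StatefulSet.
type workload struct {
	Name        string
	Annotations map[string]string
}

// checkLock returns an error if any workload is locked by the same datastore
// or by one that conflicts with it. Assumes the annotation value is the
// name of the datastore being restored.
func checkLock(datastore string, workloads []workload) error {
	blocked := map[string]bool{datastore: true}
	for _, d := range conflicts[datastore] {
		blocked[d] = true
	}
	for _, w := range workloads {
		if holder, ok := w.Annotations[lockAnnotation]; ok && blocked[holder] {
			return fmt.Errorf("restore of %q already in progress on %s", holder, w.Name)
		}
	}
	return nil
}

func main() {
	workloads := []workload{
		{Name: "deployment/example", Annotations: map[string]string{lockAnnotation: "settings"}},
	}
	fmt.Println(checkLock("stackgraph", workloads))
	fmt.Println(checkLock("elasticsearch", workloads))
}
```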
If a restore operation is interrupted or fails, the lock annotations may remain. To manually remove a stuck lock:
kubectl annotate deployment,statefulset -l <label-selector> \
  stackstate.com/restore-in-progress- \
  stackstate.com/restore-started-at- \
  -n <namespace>
See ARCHITECTURE.md for detailed information about the layered architecture and design patterns.
This project uses GitHub Actions and GoReleaser for automated releases:
- Push a new tag (e.g., v1.0.0)
- GitHub Actions automatically builds binaries for multiple platforms
- GoReleaser creates a GitHub release and uploads artifacts to S3
go test ./...
golangci-lint run --config=.golangci.yml ./...
We're exploring AI-assisted development with OpenCode. If you'd like to try it for your tasks, here's how to get started.
- Install OpenCode following their installation guide
- Run opencode in the repository root
- Start asking questions or requesting changes
OpenCode can help with:
- Understanding the codebase: Ask about architecture, patterns, or how specific features work
- Implementing new features: Describe what you need, and it will follow project conventions
- Writing tests: Request table-driven tests following our testify patterns
- Code reviews: Ask it to review changes against project guidelines
Here's an example workflow for implementing a new feature:
You: I need to add a "prune" command to elasticsearch that deletes snapshots older than N days.
It should follow the existing patterns in this codebase.
OpenCode will:
1. Explore existing commands to understand patterns
2. Create cmd/elasticsearch/prune.go following the command runner pattern
3. Add any needed client methods to internal/clients/elasticsearch/
4. Generate table-driven tests
5. Run linting to verify compliance
- Reference the docs: Mention ARCHITECTURE.md or AGENTS.md if you want it to follow specific guidelines
- Ask for reviews: After implementing, ask OpenCode to review the code against project standards
- Iterate: If something doesn't look right, ask for adjustments
This repository includes a code review agent (.opencode/agents/code-reviewer.md) that understands our architecture and coding standards. Use it to validate changes before submitting PRs.
The OpenCode configuration files in .opencode/ are not set in stone. If you find ways to improve the agents or add new ones, feel free to update them. Better prompts, additional review checks, or new specialized agents are all welcome contributions.
Note: We're still experimenting with AI-assisted development. Share your experiences with the team - what works, what doesn't, and any tips you discover.
Copyright (c) 2025 SUSE