🌿 WillowVibe

Open-source data infrastructure tools. Built in India. Used everywhere.
We build lightweight, self-hosted tooling that gives small data teams enterprise-grade pipeline observability, auditing, and automation — without vendor lock-in or SaaS bills.

         


🔍 What We Build

WillowVibe is a data engineering & AI tooling studio — solo-founded, contributor-driven, OSS-first.

  • Pipeline Auditing — point-in-time health checks on Airflow + dbt + warehouses; one command, one report
  • Data Observability — continuous monitoring for pipeline health, data freshness, volume anomalies, and schema drift
  • FinOps for Data — tracking Snowflake credits and BigQuery bytes billed, turning cloud cost chaos into actionable visibility
  • AI-Augmented Pipelines — embedding AI at the right layer of the data stack without replacing what already works
  • Open-Source First — every internal tool we build, we ship as OSS so the community benefits

We operate a solo + contributor model — lean by design, moving fast, building things that solve real problems for data teams.


🚀 Projects

🔬 PipelineProbe

Instant Data Pipeline Audit Report for Airflow + dbt + modern warehouses

Run a single command, get a full HTML audit report. PipelineProbe is a read-only CLI audit tool for data engineers who want a fast, objective health check of their pipeline stack — before a migration, after an incident, or as a recurring CI gate.

    pip install pipelineprobe
    pipelineprobe init      # generates pipelineprobe.yml
    pipelineprobe audit     # produces pipelineprobe-report.html

  • Airflow checks — high failure-rate DAGs, missing retries, missing SLAs, stale pipelines
  • dbt checks — models with zero tests, failing test runs, orphaned models
  • Warehouse checks — oversized tables, missing audit timestamps (Postgres, BigQuery, Snowflake)
  • HTML + JSON report — traffic-light severity, health score 0–100, per-issue recommendations
  • CI-ready — fail_on_critical exit code gates for GitHub Actions / GitLab CI
  • Zero mutations — 100% read-only; safe to run against production
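
The health score and the fail_on_critical gate listed above can be pictured with a short sketch. This is an illustration only, not PipelineProbe's actual implementation: the `Issue` class, the `PENALTY` weights, and both function names are hypothetical, and the real scoring formula may differ.

```python
from dataclasses import dataclass

# Hypothetical severity weights; PipelineProbe's real scoring may differ.
PENALTY = {"critical": 15, "warning": 5, "info": 1}


@dataclass
class Issue:
    rule: str
    severity: str  # "critical", "warning", or "info"


def health_score(issues):
    """Start at 100 and subtract a penalty per issue, clamped at 0."""
    total = sum(PENALTY[issue.severity] for issue in issues)
    return max(0, 100 - total)


def exit_code(issues, fail_on_critical=True):
    """Return 1 when gating is enabled and any critical issue exists, else 0."""
    has_critical = any(issue.severity == "critical" for issue in issues)
    return 1 if fail_on_critical and has_critical else 0


# Made-up findings, only to exercise the two functions.
issues = [
    Issue("airflow.missing_retries", "warning"),
    Issue("dbt.model_without_tests", "critical"),
]
print(health_score(issues), exit_code(issues))
```

A CI job would then fail whenever the process exits non-zero, which is the usual way a GitHub Actions or GitLab CI step gates a pipeline.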

Stack: Python · Typer · Pydantic · httpx · Jinja2 · psycopg2 · dbt artifacts

PipelineProbe Repo · MIT License


🔭 ObservaKit

Self-hosted Data Observability & FinOps Starter Kit for small data teams

ObservaKit gives 1–5 person data teams the 5 core observability pillars — Freshness, Volume, Quality, Schema Drift, and Pipeline Health — in a single docker-compose up. No Monte Carlo. No Metaplane. No SaaS bill.

  • ✅ Freshness Monitor — detects stale tables by tracking max(updated_at)
  • ✅ Volume Anomaly — Z-score detection against 7-day rolling averages
  • ✅ Quality Checks — Soda Core & Great Expectations templates, ready to use
  • ✅ Schema Drift Detector — snapshots information_schema, diffs on every run
  • ✅ Pipeline Health — Airflow/Prefect REST API + OpenTelemetry + Grafana
  • ✅ FinOps Tracker — Snowflake credits & BigQuery bytes billed, natively
  • ✅ Native dbt Integration — parses run_results.json directly, no extra packages
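
As a rough illustration of the Volume Anomaly item above, the sketch below computes a Z-score for today's row count against the previous seven daily counts. It assumes the counts have already been collected; the function names, the 3.0 threshold, and the sample numbers are hypothetical and not taken from ObservaKit.

```python
from statistics import mean, stdev


def volume_zscore(history, today):
    """Z-score of today's row count against the trailing window."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs >= 2 points
    if sigma == 0:
        # A flat history: any deviation at all is treated as extreme.
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma


def is_anomaly(history, today, threshold=3.0):
    """Flag the count when it sits more than `threshold` deviations away."""
    return abs(volume_zscore(history, today)) > threshold


# Made-up daily row counts for one table over the last 7 days.
last_7_days = [1000, 1020, 980, 1010, 990, 1005, 995]
print(is_anomaly(last_7_days, 1003))
print(is_anomaly(last_7_days, 400))
```

A count near the recent average passes, while a sudden drop is flagged. The threshold trades alert noise against sensitivity and would normally be tuned per table.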

Stack: Python · FastAPI · SQLAlchemy · Alembic · Prometheus · Grafana · Docker Compose · dbt · Airflow / Prefect
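
The Schema Drift Detector is described above as snapshotting information_schema and diffing on every run. A minimal sketch of the diffing step follows, assuming each snapshot has been loaded into a dict keyed by (table, column) with the data type as the value. That representation and the `diff_schema` name are assumptions for illustration, not ObservaKit's API.

```python
def diff_schema(previous, current):
    """Compare two {(table, column): data_type} snapshots."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    changed = sorted(
        key
        for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up snapshots, e.g. built from rows of information_schema.columns.
yesterday = {
    ("orders", "id"): "integer",
    ("orders", "amount"): "numeric",
    ("orders", "note"): "text",
}
today = {
    ("orders", "id"): "bigint",
    ("orders", "amount"): "numeric",
    ("orders", "created_at"): "timestamp",
}
drift = diff_schema(yesterday, today)
print(drift)
```

Keeping the diff as plain data makes it straightforward to turn into an alert or a metric.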

ObservaKit Repo · MIT License


🗂️ All Repositories

| Repo | Description | Language | Status |
| --- | --- | --- | --- |
| 🔬 pipelineprobe | Instant pipeline audit CLI — Airflow + dbt + warehouse | Python | active |
| 🔭 ObservaKit | Self-hosted data observability & FinOps starter kit | Python | active |
| 🧰 toolscontainer | Multi-purpose Python utility scripts & automations | Python | maintained |
| 🕷️ scrapy-bot | Scrapy + Flask web scraping bot experiment | Python | archived |
| 💻 online-ide | Lightweight online Python execution environment | Python | experimental |

🛠️ Tech We Work With

| Layer | Tools |
| --- | --- |
| Data Engineering | Python · dbt · Apache Airflow · Prefect · Apache Spark |
| Warehouses | PostgreSQL · Snowflake · BigQuery · DuckDB |
| Observability | Prometheus · Grafana · OpenTelemetry · Soda Core |
| Backend | FastAPI · SQLAlchemy · Alembic · Pydantic |
| Infra & DevOps | Docker · Docker Compose · Terraform · GitHub Actions |
| AI / ML | LangChain · OpenAI APIs · Vector DBs (Qdrant / ChromaDB) |

🌱 Our Open-Source Philosophy

"Build what the ecosystem needs. Share what you build. Let the community make it better."

Every project we open-source follows three rules:

  1. Zero vendor lock-in — runs on infra you own and control
  2. Quickstart in under 10 minutes — if onboarding is painful, it won't get adopted
  3. Progressive complexity — adopt one layer at a time; no all-or-nothing commitment

We actively maintain what we ship. Issues get responses. PRs get reviewed. Roadmaps get published.


🤝 Contributing

All public repos welcome contributions. Best places to start:

  • 🔬 PipelineProbe good first issues

    • Add a new warehouse connector (Redshift, DuckDB)
    • Add a new rule (task duration outliers, dbt source freshness)
    • Improve the HTML report template
  • 🔭 ObservaKit good first issues

    • Add a new warehouse connector (Redshift, Delta Lake)
    • Write a Grafana dashboard for a new observability use case
    • Improve documentation or add a real-world example

Read CONTRIBUTING.md before opening a PR.


📬 Get In Touch

We are open to:

  • Collaborations on data tooling, AI pipelines, or observability infra
  • Consulting engagements — data platform audits, pipeline migrations, cost optimization
  • Freelance / contract data engineering for startups and scaleups

| Channel | Link |
| --- | --- |
| 🐙 GitHub | @willowvibe |
| 🔬 PipelineProbe Issues | Open an issue |
| 🔭 ObservaKit Issues | Open an issue |
| 🔐 Security Reports | See SECURITY.md |

🌿 WillowVibe — Bengaluru, India  ·  Building in the open since 2024  ·  Try PipelineProbe 🔬  ·  Star ObservaKit ⭐
