Fluxloop-AI/fluxloop


FluxLoop OSS


Agentic Testing for AI Agents

"Hey, test my agent for order cancellation with angry customers"
→ FluxLoop handles the rest: setup, CLI install, synthesis, execution, and analysis.

🤖 Agent-First Workflow

Your coding agent (Claude Code) orchestrates the entire testing flow. Just describe what you want to test—FluxLoop does the heavy lifting.

🎯 Simulate at Scale

Run thousands of realistic multi-turn scenarios in parallel. Find edge cases before production.

📊 Align to Your Standards

Capture your implicit decision criteria. Turn intuition into automated evaluation.


Getting Started

⭐ Claude Code Plugin (Recommended)

Install the plugin, then just talk.

/plugin install Fluxloop-AI/fluxloop-claude-plugin

That's it. Now say:

"test my agent for refund scenarios"

The Agent Test Skill handles everything:

  • ✅ Installs FluxLoop CLI (if needed)
  • ✅ Logs you in
  • ✅ Creates project/scenario
  • ✅ Synthesizes test inputs
  • ✅ Runs simulations
  • ✅ Analyzes results and suggests fixes

No commands to memorize. No manual setup. Just ask.

Example Conversation

User: "Test my chatbot for refund scenarios with frustrated customers"

Agent: Let me set up FluxLoop and run tests...

       ✓ FluxLoop CLI installed
       ✓ Logged in
       ✓ Project created
       ✓ 10 test inputs synthesized (40% hard cases)
       ✓ Running simulation...
       
       📊 Results: 8/10 passed (80%)
       ⚠️ Failed on edge case: customer requesting partial refund
       
       💡 Suggested fix: Add handling for partial refund requests
       
       Would you like me to analyze the failures in detail?

📖 Documentation: docs.fluxloop.ai/claude-code


📦 Packages

1. Claude Code Plugin ⭐

The primary way to use FluxLoop. Your coding agent orchestrates the entire testing workflow through natural conversation.

| Feature | Description |
| --- | --- |
| Agent Test Skill | Auto-activates on "test my agent", handles everything |
| Zero Config | Skill installs CLI, logs in, creates projects automatically |
| Context-Aware | Knows your setup state, guides you through missing steps |

📖 Location: packages/fluxloop-plugin/
📖 Docs: docs.fluxloop.ai/claude-code

2. CLI

For power users and CI/CD pipelines. Direct command-line control when you need it.

pip install fluxloop-cli
fluxloop test --scenario my-test
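
In a CI job, the command above can be wrapped in a small script. The sketch below assumes, without confirmation from the FluxLoop docs, that `fluxloop test` exits with a non-zero status when a run fails. `build_test_command` and `run_scenario` are hypothetical helper names, not part of FluxLoop.

```python
import subprocess


def build_test_command(scenario: str) -> list[str]:
    """Build the argument list for the `fluxloop test` command shown above."""
    return ["fluxloop", "test", "--scenario", scenario]


def run_scenario(scenario: str) -> int:
    """Run one scenario and return the process exit code.

    Assumption: a non-zero exit code signals a failed run. Check the CLI
    docs before relying on this in a pipeline.
    """
    completed = subprocess.run(build_test_command(scenario), check=False)
    return completed.returncode
```

A pipeline step could then call `sys.exit(run_scenario("my-test"))` so that the job status follows the CLI's exit code.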

📖 Docs: docs.fluxloop.ai/cli
📦 PyPI: fluxloop-cli

3. SDK (Python 3.11+)

Core instrumentation library. Add the @fluxloop.agent() decorator to a function to trace agent execution.

import fluxloop

@fluxloop.agent()
def my_agent(input: str) -> str:
    # Your agent logic goes here; this placeholder just echoes the input
    response = f"Echo: {input}"
    return response

📖 Docs: docs.fluxloop.ai/sdk
📦 PyPI: fluxloop


Key Features

🤖 Agentic Testing with Claude Code

Just talk naturally:

"Test my order-bot for cancellation scenarios"
"Generate edge cases for payment failures"
"Why did the last test fail?"

The skill understands context and adapts to your state.

🎯 Simple Instrumentation

Works with any Python agent framework:

import fluxloop

@fluxloop.agent()
def my_agent(input: str) -> str:
    # Call LangChain, LlamaIndex, or custom code here; placeholder below
    response = f"Echo: {input}"
    return response
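
To make the idea concrete, here is a minimal sketch of how a decorator of this shape can record calls without changing the wrapped function. It is not FluxLoop's implementation: `trace_agent` and `TRACES` are hypothetical stand-ins, and the real `@fluxloop.agent()` may capture different data.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory trace store; a real SDK would send this elsewhere.
TRACES: list[dict[str, Any]] = []


def trace_agent() -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator factory, applied with parentheses like `@fluxloop.agent()`."""

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args: Any, **kwargs: Any) -> str:
            start = time.perf_counter()
            output = func(*args, **kwargs)
            # Record one entry per successful call.
            TRACES.append(
                {
                    "agent": func.__name__,
                    "args": args,
                    "kwargs": kwargs,
                    "output": output,
                    "seconds": time.perf_counter() - start,
                }
            )
            return output

        return wrapper

    return decorator


@trace_agent()
def echo_agent(text: str) -> str:
    # Placeholder logic; any framework call could go here.
    return f"You said: {text}"
```

Because the wrapper only observes arguments and return values, the agent body needs no framework-specific changes, which is what makes this style of instrumentation framework-agnostic.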

📊 Evaluation-First Testing

Define criteria, run reproducible experiments, get actionable insights.
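
As an illustration of the idea, the sketch below expresses criteria as named predicate functions and computes a pass rate over a batch of outputs. `Criterion`, `evaluate`, and the sample criteria are hypothetical and are not part of the FluxLoop API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """A named check applied to a single agent output."""

    name: str
    check: Callable[[str], bool]


def evaluate(outputs: list[str], criteria: list[Criterion]) -> dict:
    """Return the pass rate and, for each failing output, the criteria it failed."""
    failures: dict[int, list[str]] = {}
    for index, output in enumerate(outputs):
        failed = [c.name for c in criteria if not c.check(output)]
        if failed:
            failures[index] = failed
    passed = len(outputs) - len(failures)
    pass_rate = passed / len(outputs) if outputs else 0.0
    return {
        "passed": passed,
        "total": len(outputs),
        "pass_rate": pass_rate,
        "failures": failures,
    }


# Illustrative criteria only.
criteria = [
    Criterion("mentions_refund", lambda text: "refund" in text.lower()),
    Criterion("is_polite", lambda text: "sorry" in text.lower() or "please" in text.lower()),
]

outputs = [
    "Sorry about that, your refund is on its way.",
    "Refund denied.",
]
report = evaluate(outputs, criteria)
```

Keeping each criterion as a plain function makes a run reproducible: the same outputs and the same criteria always produce the same report.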

🧪 Offline-First Simulation

Run experiments locally with full control. No cloud dependency for testing.
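
A minimal sketch of what running a batch of inputs locally and in parallel can look like, using only the standard library. `toy_agent`, `run_batch`, and the sample inputs are hypothetical; FluxLoop's own runner is driven through its CLI and may work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def toy_agent(message: str) -> str:
    """Stand-in agent: handles cancellations, escalates everything else."""
    if "cancel" in message.lower():
        return "Your order has been cancelled."
    return "Let me connect you with a human."


def run_batch(
    agent: Callable[[str], str], inputs: list[str], max_workers: int = 4
) -> list[tuple[str, str]]:
    """Run the agent on every input concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        outputs = list(pool.map(agent, inputs))
    return list(zip(inputs, outputs))


inputs = [
    "Cancel my order NOW.",
    "Where is my package?",
    "I want to CANCEL, this is ridiculous!",
]
results = run_batch(toy_agent, inputs)
```

Threads are a reasonable default here because real agents usually spend most of their time waiting on network calls to a model provider.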


☁️ Seamless Web Integration

FluxLoop combines local execution with cloud intelligence for a powerful testing workflow.

1. Cloud-Powered Synthesis

When you say "generate edge cases", FluxLoop Web synthesizes realistic, diverse test data using advanced LLMs. This data is instantly synced to your local environment for testing.

2. Deep Evaluation & Analysis

Test results are automatically uploaded to alpha.app.fluxloop.ai for deep inspection:

  • 🕵️ Trace Analysis: Step-by-step debugging of agent conversations
  • 📈 Performance Metrics: Success rates, latency, token usage trends
  • ⚖️ Comparison: Side-by-side view of how recent changes affected behavior

3. The Perfect Loop

  1. You: "Test my agent" (Claude Code)
  2. Web: Generates test scenarios (Cloud)
  3. CLI: Runs tests locally (Local)
  4. Web: Analyzes results (Cloud)
  5. You: Review summary in IDE & detailed report on Web

What You Can Do

| Capability | How |
| --- | --- |
| 🤖 Conversational Testing | "test my agent with angry customers" |
| 🎯 Instrument Agents | @fluxloop.agent() decorator |
| 📝 Synthesize Inputs | Skill generates realistic test data |
| 🧪 Run Simulations | Batch experiments with parallel execution |
| 💬 Multi-Turn Conversations | Auto-extend into dialogues |
| 📊 Analyze Results | Get insights and fix suggestions |

Links

| Resource | URL |
| --- | --- |
| FluxLoop Web | alpha.app.fluxloop.ai |
| Documentation | docs.fluxloop.ai |
| Claude Code Plugin | docs.fluxloop.ai/claude-code |
| CLI Docs | docs.fluxloop.ai/cli |
| SDK Docs | docs.fluxloop.ai/sdk |

🤝 Why Contribute?

We're building the future of AI agent testing—where your coding agent tests your AI agents.

  • Improve agentic workflows: Make the Claude Code skill smarter
  • Build framework adapters: LangChain, LlamaIndex, CrewAI
  • Enhance synthesis: Better intent-to-input generation
  • Develop evaluation methods: Novel agent performance metrics

Check out our contribution guide and open issues.


🚨 Community & Support


📄 License

FluxLoop is licensed under the Apache License 2.0.

About

Open-source toolkit for running reproducible, offline-first simulations of AI agents against dynamic scenarios
