RecHarness is an agent-agnostic recommendation quality layer for making general-purpose agents more reliable product recommenders. It is not a full shopping agent.
v0.2-alpha focuses on agent integration: stable agent-facing schemas, multi-catalog routing, MCP tools, framework-neutral tool callables, deterministic agent-loop demos, and local dogfooding utilities.
General agents are becoming shopping interfaces, but they can recommend products that violate user constraints, hallucinate attributes, overfit vague preferences, or lack evidence. RecHarness provides the harness-level structure needed to make recommendation flows inspectable and testable.
The current package metadata version is 0.2.0a0, corresponding to the
v0.2-alpha release notes in CHANGELOG.md. See
docs/release_notes_v0.2-alpha.md for a concise summary of the v0.2-alpha
release. The main branch includes:
- typed product, preference, constraint, recommendation, verification, and trace schemas
- deterministic local JSONL catalog loading
- catalog validation and field coverage stats
- rule-based preference extraction for common shopping constraints
- dot-path constraint verification against product records
- structured claim verification for prices, laptop fit, water resistance, weight, and availability
- deterministic local keyword and attribute-aware retrieval
- transparent simple ranking
- RecHarness.assist() with recommended and rejected candidate bundles
- verify_agent_recommendation() SDK flow
- AgentHarnessRouter for stable agent-facing parse, assist, and verify envelopes
- framework-neutral tool callables through make_recharness_tool_functions()
- CLI commands for catalog validation, assist, verify, eval, eval-assist, and optional MCP serving
- JSONL trace logging
- batch evaluation
- backpack and headphones example domains
- deterministic agent-loop demos and headphones dogfooding utilities
- pytest coverage for the foundation behavior
No external LLM API is required for the current deterministic harness. Retrieval is local and catalog-based. The MCP server is optional.
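To make the dot-path constraint verification listed above more concrete, here is a minimal standard-library sketch of the idea. It is not the RecHarness implementation: `resolve_dot_path`, `check_constraint`, the `OPERATORS` table, and the `"pass"`/`"fail"`/`"unknown"` labels are all hypothetical names chosen for illustration, and the real ConstraintVerifier may behave differently (for example, in how it treats missing fields).

```python
import operator
from typing import Any

# Hypothetical operator table; the real verifier may support other operators.
OPERATORS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
}

# Sentinel that marks a path which does not exist in the record.
_MISSING = object()


def resolve_dot_path(record: dict[str, Any], path: str) -> Any:
    """Walk a nested dict along a dot-separated path such as 'price.amount'."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def check_constraint(record: dict[str, Any], field: str, op: str, value: Any) -> str:
    """Return 'pass', 'fail', or 'unknown' (assumed labels) for one constraint."""
    actual = resolve_dot_path(record, field)
    if actual is _MISSING:
        # A missing attribute is neither satisfied nor violated.
        return "unknown"
    return "pass" if OPERATORS[op](actual, value) else "fail"


product = {
    "product_id": "bag_001",
    "price": {"amount": 899, "currency": "CNY"},
    "attributes": {"laptop_size_inches": 14, "weight_kg": 0.85},
}

print(check_constraint(product, "price.amount", "<=", 1500))
print(check_constraint(product, "attributes.laptop_size_inches", ">=", 14))
print(check_constraint(product, "attributes.water_resistance", "==", True))
```

Keeping "unknown" separate from "fail" is one possible design choice: it lets a caller distinguish a product that violates a constraint from one whose catalog record simply lacks the field.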
The project is configured for uv and hatchling.
uv sync --extra dev
uv run pytest

If uv is unavailable, use a local virtual environment:
python3 -m venv .venv
.venv/bin/python -m pip install -e '.[dev]'
.venv/bin/python -m pytest

from recharness import JsonlCatalog

catalog = JsonlCatalog.load("examples/backpacks/catalog.jsonl")
report = catalog.validate()
stats = catalog.stats()
print(report.product_count)
print(stats.field_coverage["price"])

from recharness import RuleBasedPreferenceParser

parser = RuleBasedPreferenceParser()
need = parser.parse(
    "Find a commuting backpack under 1500 RMB that fits a 14-inch laptop and is not too business."
)
print(need.hard_constraints)
print(need.negative_preferences)

from recharness import Constraint, ConstraintVerifier, Money, ProductItem

product = ProductItem(
    product_id="bag_001",
    title="UrbanLite Commuter Backpack 22L",
    category="backpack",
    price=Money(amount=899, currency="CNY"),
    attributes={"laptop_size_inches": 14, "style": ["minimal", "casual"]},
)
# Each constraint addresses a product field by dot-path.
report = ConstraintVerifier().verify_product(
    product,
    [
        Constraint(field="price.amount", operator="<=", value=1500),
        Constraint(field="attributes.laptop_size_inches", operator=">=", value=14),
    ],
)
print(report.status)

from recharness import RecHarness

harness = RecHarness.from_jsonl_catalog(
    "examples/backpacks/catalog.jsonl",
    trace_path="runs/assist_traces.jsonl",
)
# The answer states a price of 1599 RMB, above the 1500 RMB budget in the query.
report = harness.verify_agent_recommendation(
    user_query="Find a commuting backpack under 1500 RMB that fits a 14-inch laptop.",
    agent_answer="I recommend RainGuard Metro Pack 24L. It costs 1599 RMB.",
)
print(report.status)
print(report.violations)
print(report.claim_issues)

recharness verify \
--catalog examples/backpacks/catalog.jsonl \
--query "Find a commuting backpack under 1500 RMB that fits a 14-inch laptop" \
--answer "I recommend RainGuard Metro Pack 24L. It costs 1599 RMB."

recharness verify \
--catalog examples/backpacks/catalog.jsonl \
--query "1500元以内、防水、适合通勤的双肩包" \
--answer "我推荐 NorthPeak Office Pack 28L,售价1299元,完全防水,而且很轻量。" \
--json

recharness verify \
--catalog examples/backpacks/catalog.jsonl \
--query "Find a commuting backpack under 1500 RMB" \
--answer "I recommend UrbanLite Commuter Backpack 22L. It costs 899 RMB." \
--trace-path runs/verify.jsonl

recharness verify \
--catalog examples/headphones/catalog.jsonl \
--query "想找1000元以内,适合通勤,有降噪的蓝牙耳机" \
--answer "我推荐 OfficeClear Call 32,售价799元,有主动降噪。"

recharness verify \
--catalog examples/headphones/catalog.jsonl \
--query "想找1000元以内,适合通勤,有降噪的蓝牙耳机" \
--answer "我推荐 SonicLite AirBuds,售价699元,有主动降噪,续航30小时。" \
--repair

from recharness import RecHarness

harness = RecHarness.from_jsonl_catalog("examples/backpacks/catalog.jsonl")
bundle = harness.assist(
    user_query="Find a commuting backpack under 1500 RMB that fits a 14-inch laptop and is not too business.",
    top_k=2,
)
# Print each recommended candidate with its final ranking score.
for candidate in bundle.recommended:
    print(candidate.product.title, candidate.final_score)
print(bundle.summary_for_agent)

recharness assist \
--catalog examples/backpacks/catalog.jsonl \
--query "Find a commuting backpack under 1500 RMB that fits a 14-inch laptop and is not too business" \
--top-k 2

recharness assist \
--catalog examples/backpacks/catalog.jsonl \
--query "1500元以内,适合通勤,能放14寸电脑,不要太商务的双肩包" \
--top-k 3 \
--json

recharness assist \
--catalog examples/backpacks/catalog.jsonl \
--query "Find a commuting backpack under 1500 RMB" \
--trace-path runs/assist.jsonl \
--variant full

Use AgentHarnessRouter when a general-purpose agent needs stable request and
response envelopes across multiple local catalogs:

from recharness import AgentHarnessRouter, AssistRequest, VerifyRequest

router = AgentHarnessRouter.from_config_file("examples/mcp/catalogs.json")
catalogs = router.list_catalogs()
# Pass the domain explicitly so routing does not depend on query parsing.
assist = router.assist(
    AssistRequest(
        user_query="想找1000元以内,适合通勤,有降噪的蓝牙耳机",
        domain="headphones",
        top_k=3,
    )
)
verify = router.verify(
    VerifyRequest(
        user_query="想找1000元以内,适合通勤,有降噪的蓝牙耳机",
        domain="headphones",
        agent_answer="我推荐 OfficeClear Call 32,售价799元,有主动降噪。",
    )
)

For best reliability, general agents should list catalogs first, choose the
most appropriate domain, and pass domain explicitly to assist and verify
calls. If no domain is provided, RecHarness tries parsed category routing and
then default-catalog fallback, which is convenient but less reliable for
ambiguous queries.
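The routing order described above (explicit domain, then parsed category, then default catalog) can be sketched as a small pure function. This is an illustration under stated assumptions, not the AgentHarnessRouter code: `choose_catalog`, the reason labels, and the idea that parsed category names match catalog domain names are all hypothetical.

```python
from typing import Optional


def choose_catalog(
    requested_domain: Optional[str],
    parsed_category: Optional[str],
    catalogs: dict[str, str],
    default_domain: str,
) -> tuple[str, str]:
    """Return (domain, reason) following the documented fallback order."""
    # 1. An explicit domain from the agent always wins.
    if requested_domain is not None:
        if requested_domain not in catalogs:
            raise KeyError(f"unknown domain: {requested_domain}")
        return requested_domain, "explicit_domain"
    # 2. Otherwise try the category parsed from the user query
    #    (assumes category names line up with domain names).
    if parsed_category is not None and parsed_category in catalogs:
        return parsed_category, "parsed_category"
    # 3. Last resort: the default catalog, which may be wrong for ambiguous queries.
    return default_domain, "default_fallback"


catalogs = {
    "backpacks": "examples/backpacks/catalog.jsonl",
    "headphones": "examples/headphones/catalog.jsonl",
}

print(choose_catalog("headphones", None, catalogs, "backpacks"))
print(choose_catalog(None, "headphones", catalogs, "backpacks"))
print(choose_catalog(None, None, catalogs, "backpacks"))
```

Returning the reason alongside the chosen domain is one way to make silent fallbacks visible in traces, so an agent or test can notice when a query was routed by default rather than by intent.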
See docs/agent_integration.md, docs/mcp_config.md,
docs/agent_loop_contract.md, docs/repair.md, and docs/domain_adapters.md
for integration details.
RecHarness can be used through:
- Python SDK
- MCP tools
- deterministic MCP-style demo
- deterministic tool-calling agent demo
- optional OpenAI Agents SDK demo
Useful links:
- docs/agent_integration.md
- docs/agent_loop_contract.md
- docs/repair.md
- docs/mcp_config.md
- docs/mcp_client_demo.md
- docs/tool_calling_agent_demo.md
- docs/openai_agents.md
- examples/integrations/mcp_client_demo.py
- examples/integrations/tool_calling_agent_demo.py
- examples/integrations/run_headphones_dogfood.py
- examples/integrations/openai_agents_demo.py
Headphones dogfooding tasks can be run locally as a development utility:
python examples/integrations/run_headphones_dogfood.py
python examples/integrations/run_headphones_dogfood.py \
--out runs/headphones_dogfood/results.jsonl

The dogfood runner prints task-level diagnostics and optional raw JSONL output; it is not a benchmark report generator.

recharness eval \
--catalog examples/backpacks/catalog.jsonl \
--missions examples/backpacks/missions.jsonl \
--agent-outputs examples/backpacks/agent_outputs.jsonl \
--out runs/eval_baseline

The eval command writes metrics.json, per_mission_results.jsonl,
leaderboard.csv, and traces.jsonl.
Trace records include structured verification reports, including claim_issues
for factual claim diagnostics. Claim metrics distinguish unsupported,
overstated, and incorrect claims.
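As a rough illustration of how the three claim categories could be tallied from a JSONL trace file, the sketch below counts issues by kind. The record layout is an assumption made for this example: the `report` and `kind` keys and the `mission_id` values are hypothetical, and only the `claim_issues` field name comes from the text above. Check a real traces.jsonl before relying on this shape.

```python
import json
from collections import Counter
from typing import Iterable

# Hypothetical trace rows; real trace records may be structured differently.
SAMPLE_TRACES = """\
{"mission_id": "m1", "report": {"claim_issues": [{"kind": "overstated"}]}}
{"mission_id": "m2", "report": {"claim_issues": []}}
{"mission_id": "m3", "report": {"claim_issues": [{"kind": "incorrect"}, {"kind": "unsupported"}]}}
"""


def count_claim_issues(lines: Iterable[str]) -> Counter:
    """Count claim issues by kind across JSONL trace lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        for issue in record.get("report", {}).get("claim_issues", []):
            counts[issue.get("kind", "unknown")] += 1
    return counts


counts = count_claim_issues(SAMPLE_TRACES.splitlines())
for kind in ("unsupported", "overstated", "incorrect"):
    print(kind, counts[kind])
```

To run the same tally over a file on disk, pass an open file handle instead of the sample string, since a text file iterates line by line.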
RecHarness includes local evaluation utilities for checking recommendation quality, constraint satisfaction, claim issues, and failure labels.
Use eval-assist to evaluate RecHarness output directly against local mission
files:
recharness eval-assist \
--catalog examples/backpacks/catalog.jsonl \
--missions examples/backpacks/missions.jsonl \
--out runs/assist_eval \
--top-k 3 \
--variant full

Diagnostic variants help users compare retrieval behavior:
- full: hybrid keyword + constraint-aware retrieval
- keyword_only: keyword retrieval only
- constraint_only: constraint-aware scoring only

recharness eval-assist \
--catalog examples/backpacks/catalog.jsonl \
--missions examples/backpacks/missions.jsonl \
--out runs/assist_eval_keyword \
--top-k 3 \
--variant keyword_only

recharness eval-assist \
--catalog examples/backpacks/catalog.jsonl \
--missions examples/backpacks/missions.jsonl \
--out runs/assist_eval_constraint \
--top-k 3 \
--variant constraint_only

Assist evaluation reports recommendation_count_avg,
hard_constraint_satisfaction_rate, hard_violation_rate,
gold_recall_at_k, avg_final_score, and avg_rejected_candidates.
Per-mission outputs include failure_labels such as
product_hallucination, hard_constraint_violation, overstated_claim,
incorrect_claim, and candidate_pool_contains_violations.
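For readers unfamiliar with these metrics, the sketch below shows one conventional way to compute a recall-at-k value and a hard-constraint satisfaction rate for a single mission. The formulas are assumptions based on the metric names, not a description of how RecHarness computes gold_recall_at_k or hard_constraint_satisfaction_rate; the function names and the product IDs other than bag_001 are hypothetical.

```python
def recall_at_k(recommended_ids: list[str], gold_ids: set[str], k: int) -> float:
    """Fraction of gold products that appear in the top-k recommendations."""
    if not gold_ids:
        return 0.0  # assumed convention when a mission has no gold products
    hits = gold_ids.intersection(recommended_ids[:k])
    return len(hits) / len(gold_ids)


def satisfaction_rate(has_hard_violation: list[bool]) -> float:
    """Fraction of recommended items with no hard-constraint violation."""
    if not has_hard_violation:
        return 0.0  # assumed convention for an empty recommendation list
    satisfied = sum(1 for violated in has_hard_violation if not violated)
    return satisfied / len(has_hard_violation)


# One mission: three recommendations, two gold products, one of them found.
recommended = ["bag_001", "bag_007", "bag_013"]
gold = {"bag_001", "bag_020"}

print(recall_at_k(recommended, gold, k=3))  # 0.5
print(satisfaction_rate([False, False, True]))
```

Under these assumed definitions, a run-level figure would be the mean of the per-mission values.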
The checked-in backpack benchmark contains:
- 50 catalog products
- 50 recommendation missions
- 50 baseline agent outputs covering valid, over-budget, hallucinated, and overstated-claim answers
Example domains:
- examples/backpacks
- examples/headphones
SDK demos:
- examples/assist_demo.py
- examples/verify_demo.py
- examples/evaluation_demo.py
- examples/agent_loops/verify_before_final_answer.py
- examples/agent_loops/repair_loop_demo.py
Additional docs:
- docs/catalog_schema.md
- docs/recommendation_bundle.md
- docs/verification.md
- docs/evaluation.md
- docs/adding_a_new_domain.md
- docs/agent_integration.md
- docs/agent_loop_contract.md
- docs/repair.md
- docs/mcp_config.md
- docs/mcp_client_demo.md
- docs/tool_calling_agent_demo.md
- docs/domain_adapters.md
- docs/http_server.md
- docs/openai_agents.md
- No real-time price or inventory lookup
- Rule-based parser covers common shopping constraints but is not comprehensive
- Chinese and English claim checks are pattern-based
- Product mention resolution is deterministic and may miss ambiguous references
- No checkout, payment, or order workflow
- No high-risk product suitability checking
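The note above that Chinese and English claim checks are pattern-based can be illustrated with a small price-claim sketch. The regular expressions, function names, and result labels here are hypothetical and far narrower than whatever patterns RecHarness actually ships; the point is only to show why a pattern-based check can miss phrasings it was not written for.

```python
import re
from typing import Optional

# Hypothetical patterns: one English phrasing and one Chinese phrasing.
PRICE_PATTERNS = [
    re.compile(r"costs?\s+(\d+(?:\.\d+)?)\s*(?:RMB|CNY|yuan)", re.IGNORECASE),
    re.compile(r"售价\s*(\d+(?:\.\d+)?)\s*元"),
]


def extract_claimed_price(answer: str) -> Optional[float]:
    """Return the first price stated in the answer, or None if no pattern matches."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(answer)
        if match:
            return float(match.group(1))
    return None


def check_price_claim(answer: str, catalog_price: float) -> str:
    """Compare a stated price with the catalog price (labels are illustrative)."""
    claimed = extract_claimed_price(answer)
    if claimed is None:
        return "no_price_claim"
    return "supported" if claimed == catalog_price else "incorrect"


print(extract_claimed_price("I recommend RainGuard Metro Pack 24L. It costs 1599 RMB."))
print(extract_claimed_price("我推荐 NorthPeak Office Pack 28L,售价1299元,完全防水。"))
# A phrasing outside the patterns goes undetected, which is the limitation in question.
print(extract_claimed_price("It is priced at roughly nine hundred."))
print(check_price_claim("It costs 899 RMB.", catalog_price=899.0))
```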
The MCP integration is optional:
uv sync --extra dev --extra mcp
recharness mcp serve \
--catalog examples/backpacks/catalog.jsonl

The MCP server exposes tools for preference parsing, assist-mode bundles, and recommendation verification.

Catalog rows are JSON objects that validate as ProductItem records:
{"product_id":"bag_001","title":"UrbanLite Commuter Backpack 22L","category":"backpack","price":{"amount":899,"currency":"CNY"},"attributes":{"laptop_size_inches":14,"weight_kg":0.85}}

License: MIT