feat(evaluation): add task-aware metrics, reporting, and workflow automation#7

Open

cto-new[bot] wants to merge 1 commit into Tingelam/DeepLearningExamples:master from Tingelam/DeepLearningExamples:feature-eval-reporting-workflow

Conversation


cto-new[bot] commented Nov 23, 2025

Summary

  • Introduce a comprehensive evaluation and reporting framework for Street Scene, including task-aware metrics, report generation, and end-to-end workflow automation.

Details

  • Implement src/evaluation/metrics.py with detection, tracking, and classification metrics
  • Implement src/evaluation/reporting.py to generate JSON/Markdown/HTML reports and comparisons
  • Wire the pipeline to emit metric plots and reproduction checklists after training and evaluation
  • Add CLI scripts: run_workflow.py, compare_runs.py, verify_repro.py, and deploy.py (stub)
  • Update documentation (docs/evaluation_and_reporting.md, README) and dependencies (scikit-learn, motmetrics, seaborn)

Impact

  • Enables reproducible experiments and cross-run analysis with archived reports per run.
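The per-run archived reports could be produced along these lines; this is a hedged sketch assuming a flat metrics dict, and `write_reports` plus the `reports/<run_id>/` layout are illustrative names, not the actual src/evaluation/reporting.py API.

```python
# Minimal sketch of per-run report archiving (JSON + Markdown);
# the real reporting.py also emits HTML and comparisons.
import json
from pathlib import Path

def write_reports(run_id, metrics, out_dir="reports"):
    """Archive a metrics.json and a Markdown table under <out_dir>/<run_id>/."""
    run_dir = Path(out_dir) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    lines = [f"# Evaluation report: {run_id}", "",
             "| Metric | Value |", "| --- | --- |"]
    lines += [f"| {k} | {v:.4f} |" for k, v in metrics.items()]
    (run_dir / "report.md").write_text("\n".join(lines) + "\n")
    return run_dir
```

Keeping one directory per run is what makes the cross-run comparison and reproducibility checks possible later.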

Warning: The task VM test is not passing; cto.new will perform much better if you fix the setup.

feat(evaluation): add task-aware metrics, reporting, and workflow automation

This adds a full evaluation and reporting framework for Street Scene, introducing task-aware metrics, a reporting system, and end-to-end workflow automation.

- Implemented src/evaluation/metrics.py with detection, tracking, and classification metrics
- Implemented src/evaluation/reporting.py to generate JSON/Markdown/HTML reports and comparisons
- Wired the pipeline to emit metric plots and reproduction checklists after training and evaluation
- Added CLI scripts: run_workflow.py, compare_runs.py, verify_repro.py, and deploy.py (stub)
- Updated documentation (docs/evaluation_and_reporting.md, README) and dependencies (scikit-learn, motmetrics, seaborn)

Impact: ensures reproducibility and easier cross-run analysis; reports saved per run.
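The cross-run analysis could reduce to a per-metric delta over the archived metrics, in the spirit of compare_runs.py. This is a sketch under the assumption that each run stores a flat metrics dict; `compare_runs` is an illustrative name, not necessarily the script's actual interface.

```python
# Hypothetical cross-run comparison: per-metric deltas (candidate minus
# baseline) over the keys both runs report.

def compare_runs(baseline, candidate):
    """Return sorted per-metric deltas for metrics present in both runs."""
    shared = baseline.keys() & candidate.keys()
    return {k: round(candidate[k] - baseline[k], 6) for k in sorted(shared)}
```

For example, `compare_runs({"mAP": 0.5}, {"mAP": 0.55})` reports a +0.05 mAP delta; metrics missing from either run are simply omitted from the comparison.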
