Conversation

@vayoa vayoa commented Oct 4, 2025

This is a fix for my own issue. Fixes #260.

  • Edited apps/worker-py/.venv/lib/python3.11/site-packages/langextract/annotation.py.

    1. Inside _annotate_documents_single_pass, when creating annotated_doc before yielding (around the loop that flushes finished
      documents), wrap annotated_extractions in list(...) (or use copy()), then immediately reset annotated_extractions = [] so
      the next document gets its own list.
    2. Apply the same treatment in the final flush block at the end of the function so the last document is isolated as well.
    3. Ensure any other yield site in this function (including the sequential-pass helper if it reuses the same collector) also
      hands out a fresh list.
    4. Write a simple test.
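
The aliasing problem the steps above address can be reproduced in isolation. The sketch below is a simplified model, not the actual langextract code: `Doc`, `annotate_shared`, and `annotate_isolated` are hypothetical names, and the buggy variant only assumes that the generator keeps reusing one collector list across yields.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass
class Doc:
    # Hypothetical stand-in for an annotated document (not langextract's class).
    doc_id: str
    extractions: List[str]


def annotate_shared(chunks: Iterable[Tuple[str, str]]) -> Iterator[Doc]:
    """Buggy variant: every yielded Doc aliases the same collector list."""
    collected: List[str] = []
    current = None
    for doc_id, extraction in chunks:
        if current is not None and doc_id != current:
            yield Doc(current, collected)
            # Clearing in place also empties the list already handed out.
            collected.clear()
        current = doc_id
        collected.append(extraction)
    if current is not None:
        yield Doc(current, collected)


def annotate_isolated(chunks: Iterable[Tuple[str, str]]) -> Iterator[Doc]:
    """Fixed variant: copy before each yield, then rebind to a fresh list."""
    collected: List[str] = []
    current = None
    for doc_id, extraction in chunks:
        if current is not None and doc_id != current:
            yield Doc(current, list(collected))
            collected = []  # the next document gets its own list
        current = doc_id
        collected.append(extraction)
    if current is not None:
        # Final flush: isolate the last document as well.
        yield Doc(current, list(collected))


if __name__ == "__main__":
    chunks = [("a", "x1"), ("a", "x2"), ("b", "y1")]
    print(list(annotate_shared(chunks)))
    print(list(annotate_isolated(chunks)))
```

With the shared collector, both documents end up pointing at one list that holds only the last document's extractions, which matches the symptom in the issue title. Copying and then rebinding (rather than clearing in place) is what keeps earlier results intact after the consumer has received them.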

Commits:

  • langextract/langextract/annotation.py:355 and langextract/langextract/annotation.py:404 now hand out a copied extraction list before each yield and immediately reset annotated_extractions, so every document receives its own list without bleed-through.
  • langextract/tests/annotation_test.py:745 introduces a regression test with a fake resolver that asserts each annotated document keeps its own extraction payload and that the lists are distinct.

github-actions bot added the label "size/S" (Pull request with 50-150 lines changed) on Oct 4, 2025

github-actions bot commented Oct 4, 2025

No linked issues found. Please link an issue in your pull request description or title.

Per our Contributing Guidelines, all PRs must:

  • Reference an issue with one of:
    • Closing keywords: Fixes #123, Closes #123, Resolves #123 (auto-closes on merge in the same repository)
    • Reference keywords: Related to #123, Refs #123, Part of #123, See #123 (links without closing)
  • The linked issue should have 5+ 👍 reactions from unique users (excluding bots and the PR author)
  • Include discussion demonstrating the importance of the change

You can also use cross-repo references like owner/repo#123 or full URLs.

github-actions bot commented Oct 5, 2025

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

Labels

size/S Pull request with 50-150 lines changed

Development

Successfully merging this pull request may close these issues.

Multi-Document extraction bleed (only last result captured)

1 participant