Fix test single_example_collator to wrap index as tensor #1477
finbarrtimbers merged 3 commits into allenai/open-instruct:main from allenai/open-instruct:fix-test-collator-index
The test collator returned raw int for index, causing TypeError in _iter_batches which calls len(batch["index"]). Match the production collator by wrapping as torch.tensor([index]). Co-authored-by: Cursor <cursoragent@cursor.com>
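The failure described above can be reproduced in isolation. The following is a minimal sketch, not the repository's code: `raw_collator`, `tensor_collator`, and `batch_size` are hypothetical stand-ins, and the example dict's fields other than `index` are assumed for illustration.

```python
import torch


def raw_collator(examples):
    # Hypothetical stand-in for the old test helper: returns the example
    # dict untouched, so "index" stays a plain int.
    return examples[0]


def tensor_collator(examples):
    # Hypothetical stand-in for the fixed helper: wraps the index in a
    # one-element tensor so that len() works on it.
    example = examples[0]
    return {**example, "index": torch.tensor([example["index"]])}


def batch_size(batch):
    # Mirrors the len(batch["index"]) call described in the PR.
    return len(batch["index"])


example = {"index": 7, "prompt": "hello"}  # "prompt" is an assumed field

try:
    batch_size(raw_collator([example]))
    raised = False
except TypeError:
    # len() is not defined for a plain int.
    raised = True

# With the wrapped index, len() returns the size of the first dimension.
fixed_size = batch_size(tensor_collator([example]))
```

Wrapping the index in a list before building the tensor is what gives it a first dimension; a zero-dimensional tensor would also be rejected by `len()`.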
Summary of Changes
This pull request resolves a runtime error in the test suite's data loading mechanism.
Code Review
This pull request addresses a bug in test_data_loader_gpu.py where the single_example_collator function returned an index as a plain integer, leading to a TypeError. The fix wraps the index in a torch.tensor, which is the correct approach and aligns the test helper with its production counterpart in data_loader.py. The change is clear, correct, and improves the robustness of the test suite.
Co-authored-by: Cursor <cursoragent@cursor.com>
Now that single_example_collator wraps index as a tensor, test assertions need .item() or .tolist() to compare with plain ints. Co-authored-by: Cursor <cursoragent@cursor.com>
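As a rough illustration of that follow-up change, here is a sketch of converting a wrapped index back to plain Python values before asserting. The `batch` dict is made up for the example and is not taken from the test file.

```python
import torch

# Assumed shape of a batch after the collator fix: index is a 1-element tensor.
batch = {"index": torch.tensor([3])}

# .item() extracts the single element as a Python int.
index_as_int = batch["index"].item()

# .tolist() converts the whole tensor to a Python list of ints.
index_as_list = batch["index"].tolist()

assert index_as_int == 3
assert index_as_list == [3]
```

`.item()` only works on tensors with exactly one element, so `.tolist()` is the safer choice if a batch could ever hold more than one index.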
ahhh can you re-run the single GPU script? I can do it tomorrow. I want to make sure that doesn't break

ok ran the single GPU GRPO script: Beaker.

And it passed! So I'm re-adding to the merge queue. |
Summary
The single_example_collator in test_data_loader_gpu.py returned the raw example dict with index as a plain int. HFDataLoader._iter_batches calls len(batch["index"]), which fails with TypeError: object of type 'int' has no len().

Fix: wrap index as torch.tensor([index]) to match the production single_example_collator in data_loader.py.

Test plan
Made with Cursor