lxg2015
Contributor

@lxg2015 lxg2015 commented Jun 13, 2025

Checklist Before Starting

  • Searched for similar PR(s).
  • Checked PR Title format
    • In format of: [modules] type: Title
    • modules are in fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc
    • type is in feat, fix, refactor, chore
    • can involve multiple modules, separated by , or space, like [megatron, fsdp, doc] feat: xxx

What does this PR do?

When I convert an HF checkpoint to mcore with --test, an AttributeError is raised. This PR fixes it.

[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 305, in convert_hf_to_mcore
[rank0]:     test_conversion(megatron_model_provider, tfconfig, output_path, model)
[rank0]:   File "verl/scripts/converter_hf_to_mcore.py", line 78, in test_conversion
[rank0]:     assert dut_data.shape == ref_state_dict.shape, f"{name=} {dut_data.shape=} {ref_data.shape=}"
[rank0]: AttributeError: 'dict' object has no attribute 'shape'
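
The traceback shows the assertion reading `.shape` from `ref_state_dict`, which is the whole dict, while the error message already formats `ref_data.shape`. A minimal sketch of the failure and the likely correction follows. It assumes `ref_data` is the per-parameter entry looked up from `ref_state_dict`. `FakeTensor` and `check_shapes` are hypothetical names used only for illustration; this is not the actual code in `converter_hf_to_mcore.py`.

```python
from dataclasses import dataclass


@dataclass
class FakeTensor:
    """Stand-in for a tensor; only the `shape` attribute matters here."""

    shape: tuple


def check_shapes(dut_state_dict, ref_state_dict):
    """Compare per-parameter shapes between two state dicts.

    Hypothetical helper for illustration only.
    """
    for name, dut_data in dut_state_dict.items():
        # Assumption: the reference entry is looked up by parameter name.
        ref_data = ref_state_dict[name]
        # The failing line compared against `ref_state_dict.shape`; a dict has
        # no `shape`, hence the AttributeError. Compare against the entry.
        assert dut_data.shape == ref_data.shape, (
            f"{name=} {dut_data.shape=} {ref_data.shape=}"
        )
    return True


ref = {"w": FakeTensor((4, 8)), "b": FakeTensor((8,))}
dut = {"w": FakeTensor((4, 8)), "b": FakeTensor((8,))}
print(check_shapes(dut, ref))
```

Accessing `ref.shape` on the dict itself reproduces the `AttributeError` from the log, while the per-entry comparison runs through.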

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

High-Level Design

Demonstrate the high-level design if this PR is complex.

Specific Changes

List the specific changes.

API

Demonstrate how the API changes if any.

Usage Example

Provide usage example(s) for easier usage.

# Add code snippet or script demonstrating how to use this 

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title description if it breaks any API.
  • Update the documentation about your changes in the docs.
  • New CI unit test(s) are added to cover the code path.
  • Rely on existing unit tests on CI that cover the code path.

@ETOgaosion
Collaborator

ETOgaosion commented Jun 13, 2025

Thanks for the contribution!

Actually, it may be hard to test the converter since there is no reference, but testing whether it is runnable is OK; we can enable the test here.

@ETOgaosion
Collaborator

@lxg2015 Could you help fix the checkpoint tests?

@ETOgaosion ETOgaosion merged commit 6681e25 into volcengine:main Jun 13, 2025
34 of 37 checks passed
yellowbee686 pushed a commit to yellowbee686/verl that referenced this pull request Jun 18, 2025
…buteError (volcengine#2010)
---------

Co-authored-by: lixiaoguang12 <lixiaoguang12@meituan.com>
Co-authored-by: ETOgaosion <gaoziyuan19@mails.ucas.ac.cn>
Tyizhanshen pushed a commit to HyperdriveHustle/verl that referenced this pull request Jul 1, 2025
whatadayG pushed a commit to whatadayG/verl that referenced this pull request Sep 5, 2025