Adding verl support #5498

Merged

rsareddy0329 merged 2 commits into aws:master-v2 from Harsh270519:verl-support-master-v2 on Feb 3, 2026

Conversation


@Harsh270519 Harsh270519 commented Jan 20, 2026

Issue #, if available:

Description of changes:
Adding verl support to v2

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Steps for manual testing

  1. From the private-sagemaker-hyperpod-recipes-staging repo (https://github.com/aws/private-sagemaker-hyperpod-recipes-staging), use the commands below to launch the jobs on SMTJ.
  2. Make sure to set cluster.sm_jobs_config.api_type=model_trainer to launch a job using the ModelTrainer; the other option is the estimator.
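Throughout the launch commands below, ++key=value is Hydra's force-add override syntax: it sets a dotted config key, creating it if the recipe does not already define it (a bare key=value only overrides existing keys). A minimal sketch of the idea, not Hydra's actual parser, with values kept as plain strings:

```python
def apply_override(cfg, override):
    """Apply one Hydra-style dotted override (e.g. '++a.b.c=1') to a nested dict.

    Leading '+'/'++' markers are stripped; intermediate dicts are created as
    needed, mirroring how force-add overrides extend a recipe config.
    """
    key, _, value = override.lstrip("+").partition("=")
    parts = key.split(".")
    node = cfg
    for part in parts[:-1]:
        node = node.setdefault(part, {})
    node[parts[-1]] = value  # real Hydra also coerces types; kept as str here
    return cfg

cfg = {}
apply_override(cfg, "++cluster.sm_jobs_config.wait=False")
# cfg == {"cluster": {"sm_jobs_config": {"wait": "False"}}}
```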

Launch command for llmft job

HYDRA_FULL_ERROR=1 python main.py \
  cluster=sm_jobs \
  cluster_type=sm_jobs \
  cluster.sm_jobs_config.api_type=model_trainer \
  instance_type="ml.p4de.24xlarge" \
  recipes=fine-tuning/llama/llmft_llama3_2_1b_instruct_seq4k_gpu_sft_lora \
  base_results_dir="$(pwd)/results" \
  ++cluster.sm_jobs_config.inputs.s3=null \
  ++cluster.sm_jobs_config.inputs.file_system.id=fs-079b3411789c02c3f \
  ++cluster.sm_jobs_config.inputs.file_system.type=FSxLustre \
  ++cluster.sm_jobs_config.inputs.file_system.directory_path=/olyr5bev \
  cluster.sm_jobs_config.output_path="s3://hyperpod-recipes-validation-artifacts/validation_run" \
  "cluster.sm_jobs_config.tensorboard_config=''" \
  cluster.sm_jobs_config.wait=False \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.max_run=30000 \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.instance_count=1 \
  '++cluster.sm_jobs_config.additional_estimator_kwargs.subnets=["subnet-0193afce112b25931"]' \
  '++cluster.sm_jobs_config.additional_estimator_kwargs.security_group_ids=["sg-0cd8958d241530753"]' \
  ++cluster.sm_jobs_config.recipe_overrides.run.results_dir="/opt/ml/model" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.datasets.train_data.name="tatqa_train" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.datasets.train_data.file_path="/opt/ml/input/data/training/hp-recipe-validator/datasets/tatqa/zc_train_10k.jsonl" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.datasets.val_data.name="tatqa_val" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.datasets.val_data.file_path="/opt/ml/input/data/training/hp-recipe-validator/datasets/tatqa/zc_dev.jsonl" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.model_config.model_name_or_path="/opt/ml/input/data/training/users/changnit/models/meta-llama/Llama-3.2-1B-Instruct" \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.image_uri="839249767557.dkr.ecr.us-west-2.amazonaws.com/hyperpod-recipes:llmft-v1.0.0" \
  +model.model_type=llm_finetuning_aws \
  container="839249767557.dkr.ecr.us-west-2.amazonaws.com/hyperpod-recipes:llmft-v1.0.0"

Launch command for verl job

HYDRA_FULL_ERROR=1 python main.py \
  cluster=sm_jobs \
  cluster_type=sm_jobs \
  cluster.sm_jobs_config.api_type=model_trainer \
  instance_type="ml.p4de.24xlarge" \
  recipes=fine-tuning/llama/verl-grpo-rlvr-llama-3-dot-2-1b-instruct-lora \
  base_results_dir="$(pwd)/results" \
  ++cluster.sm_jobs_config.inputs.s3=null \
  ++cluster.sm_jobs_config.inputs.file_system.id=fs-079b3411789c02c3f \
  ++cluster.sm_jobs_config.inputs.file_system.type=FSxLustre \
  ++cluster.sm_jobs_config.inputs.file_system.directory_path=/olyr5bev \
  cluster.sm_jobs_config.output_path="s3://hyperpod-recipes-validation-artifacts/validation_run" \
  "cluster.sm_jobs_config.tensorboard_config=''" \
  cluster.sm_jobs_config.wait=False \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.image_uri="920498770698.dkr.ecr.us-west-2.amazonaws.com/hyperpod-recipes:verl-v1.0.0-smtj" \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.use_training_recipe=true \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.max_run=86400 \
  ++cluster.sm_jobs_config.additional_estimator_kwargs.instance_count=1 \
  '++cluster.sm_jobs_config.additional_estimator_kwargs.subnets=["subnet-0193afce112b25931"]' \
  '++cluster.sm_jobs_config.additional_estimator_kwargs.security_group_ids=["sg-0cd8958d241530753"]' \
  ++cluster.sm_jobs_config.recipe_overrides.run.results_dir="/opt/ml/model" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.data.train_files="/opt/ml/input/data/training/hp-recipe-validator/datasets/gsm8k/train.parquet" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.data.val_files="/opt/ml/input/data/training/hp-recipe-validator/datasets/gsm8k/test.parquet" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.actor_rollout_ref.model.path="/opt/ml/input/data/training/users/changnit/models/meta-llama/Llama-3.2-1B-Instruct" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.critic.model.path="/opt/ml/input/data/training/users/changnit/models/deepseek-ai/deepseek-llm-7b-chat" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.critic.model.tokenizer_path="/opt/ml/input/data/training/users/changnit/models/meta-llama/Llama-3.2-1B-Instruct" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.reward_model.model.path="/opt/ml/input/data/training/users/changnit/models/sfairX/FsfairX-LLaMA3-RM-v0.1" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.reward_model.model.input_tokenizer="/opt/ml/input/data/training/users/changnit/models/meta-llama/Llama-3.2-1B-Instruct" \
  ++cluster.sm_jobs_config.recipe_overrides.training_config.custom_reward_function.lambda_arn="" \
  container="920498770698.dkr.ecr.us-west-2.amazonaws.com/hyperpod-recipes:verl-v1.0.0-smtj"

Monitor the job on SMTJ.
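Since the commands set cluster.sm_jobs_config.wait=False, the launcher returns without blocking, and the job is polled separately. A hedged sketch of one way to check job state with boto3's standard describe_training_job call (the job name here is hypothetical; substitute the name the launcher prints):

```python
def training_job_status(job_name, client=None):
    """Return the current TrainingJobStatus for a SageMaker training job.

    The boto3 client is created lazily so the helper can also be driven by a
    stub in tests; describe_training_job is the SageMaker API call for
    polling job state ("InProgress", "Completed", "Failed", ...).
    """
    if client is None:
        import boto3  # deferred so the function imports without AWS credentials
        client = boto3.client("sagemaker")
    response = client.describe_training_job(TrainingJobName=job_name)
    return response["TrainingJobStatus"]

# Hypothetical usage (job name is an assumption, not from this PR):
# training_job_status("verl-grpo-llama-3-2-1b-2026-01-20")
```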

@Harsh270519 Harsh270519 requested a review from a team as a code owner January 20, 2026 18:47
@Harsh270519 Harsh270519 requested a review from jam-jee January 20, 2026 18:47
@rsareddy0329 rsareddy0329 merged commit 2e37c82 into aws:master-v2 Feb 3, 2026
1 of 2 checks passed