Tags: allenai/open-instruct
Add model merging scripts for Beaker (#1459)

* Add model merging scripts for Beaker

  Two approaches for merging HuggingFace models on Beaker:
  - mergekit_merge.sh: Uses mergekit for standard architectures
  - direct_merge.sh: Direct safetensors averaging for all architectures, including hybrid models that mergekit doesn't support

* Add changelog entry for merge scripts

* Validate weights length and zero-sum in direct_merge.py

* Print merge config in Beaker logs, add test run links to README

* Fix direct_merge.py: update docstring path, remove unused import, use +=, allow tokenizer overwrite

* Fix direct_merge.sh: rename /tmp/linear_merge.py to /tmp/direct_merge.py

* Address PR review: refactor merge_models, add tests, document dual approach

  - Move scripts/merge/direct_merge.py to open_instruct/merge_models.py
  - Apply all code review fixes: Path types, logger, math.isclose, model_weights rename, helper functions, module-level parser
  - Add 14 unit tests (synthetic + pythia-14m integration)
  - Update shell scripts to reference new module path
  - Fix naming inconsistencies (linear_merge -> direct_merge)
  - Document why both mergekit and direct merge are needed

* Move launch_merges.sh example into README

  Address PR feedback: launch_merges.sh is an example script, not something that should be checked in. Moved its content into a "Batch launching" section in the README.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
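The direct-merge approach above boils down to a validated weighted average of model state dicts. Below is a minimal pure-Python sketch of that idea, assuming the validation described in the changelog (weights length must match the model count, and weights must not sum to approximately zero, checked with math.isclose). The function name and the normalization step are illustrative assumptions; the real open_instruct/merge_models.py operates on safetensors tensors.

```python
import math


def merge_state_dicts(state_dicts, weights):
    """Weighted average of model state dicts (here: dicts of float lists).

    Validates that there is one weight per model and that the weights do
    not sum to (approximately) zero before normalizing and averaging.
    """
    if len(weights) != len(state_dicts):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if math.isclose(total, 0.0, abs_tol=1e-9):
        raise ValueError("weights must not sum to zero")
    normalized = [w / total for w in weights]
    merged = {}
    for key in state_dicts[0]:
        # elementwise weighted sum across all models for this parameter
        merged[key] = [
            sum(w * sd[key][i] for w, sd in zip(normalized, state_dicts))
            for i in range(len(state_dicts[0][key]))
        ]
    return merged
```

Merging {"w": [1.0, 3.0]} and {"w": [3.0, 5.0]} with equal weights yields {"w": [2.0, 4.0]}.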
Add OLMo-core based DPO training module (#1391)

* Add OLMo-core based DPO training module

  - Add dpo.py: New DPO training module using OLMo-core's TrainModule with HSDP support
  - Add build_reference_logprobs_cache_olmo: Generic reference logprobs caching for OLMo-core
  - Add compute_loss_olmo: Wrapper for DPO loss computation with ExperimentConfig
  - Add concatenated_forward_olmo and separate_forward_olmo: OLMo-core forward functions
  - Update mason.py: Add dpo.py to OPEN_INSTRUCT_COMMANDS
  - Update debug scripts to use torchrun with OLMo-core models

* Cleaned up PR.

* Add OLMo-core train modules for DPO training

* Fix SpeedMonitorCallback parameter name

  Change device_peak_flops_per_second to device_peak_flops to match the OLMo-core API.

* Fix CheckpointerCallback save_interval validation

  Set default checkpointing_steps to 500 when not specified, since the OLMo-core API requires save_interval >= 1.

* Move checkpointing_steps default value to config class

  Move the default value for checkpointing_steps (500) from dpo.py to the CheckpointConfig dataclass in dpo_utils.py. This centralizes the default and removes the conditional logic in the callback setup.

* Remove duplicate checkpointing_steps field from ExperimentConfig

  The checkpointing_steps field was defined in both CheckpointConfig (the parent class) and ExperimentConfig. The duplicate field in ExperimentConfig had default=None, which overrode the parent class's default of 500, causing a TypeError when int() was called on None in dpo.py.

* Add Saturn cluster to medium_dpo.sh script

  Add Saturn as an alternative cluster to help with multi-node scheduling reliability.
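The duplicate checkpointing_steps bug described above is a general dataclass pitfall: redeclaring an inherited field replaces the parent's default. A self-contained illustration, with simplified stand-in class names (not the actual open-instruct config classes):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointConfigSketch:  # stand-in for the real CheckpointConfig
    checkpointing_steps: Optional[int] = 500


@dataclass
class ExperimentConfigBuggy(CheckpointConfigSketch):
    # Redeclaring the field shadows the parent's default of 500, so
    # int(cfg.checkpointing_steps) raises TypeError downstream.
    checkpointing_steps: Optional[int] = None


@dataclass
class ExperimentConfigFixed(CheckpointConfigSketch):
    # Inherit the parent's default; do not redeclare the field.
    pass
```

Here ExperimentConfigBuggy().checkpointing_steps is None, while ExperimentConfigFixed().checkpointing_steps is 500.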
* updated changelog

* Remove explicit torchrun multi-node args from DPO scripts

  OLMo-core's prepare_training_environment() handles multi-node setup internally using Beaker's environment variables. The explicit --nnodes, --standalone, and --rdzv_backend=c10d arguments interfere with this and cause RendezvousTimeoutError on multi-node runs.

* fixed linter errors

* Refactor DPO OLMo-core: add parallelism support, fix HSDP order

  - Move OLMO_MODEL_CONFIG_MAP and get_transformer_config to olmo_core_utils.py
  - Add tensor_parallel_degree, context_parallel_degree, pipeline_parallel_degree
  - Replace _apply_hsdp with _apply_parallelism supporting TP/CP/PP
  - Fix critical bug: apply HSDP before computing reference logprobs cache
  - Add LoRA error check (not supported with OLMo-core)
  - Remove unreachable make_disable_adapter_context function
  - Reorganize DPO scripts to scripts/train/debug/dpo/
  - Add local.sh for testing without Beaker

* Fix race condition in reference logprobs cache directory creation

  Only the main process should create the cache directory and test write permissions. Other ranks now wait at a barrier until this is complete.

* Fix multi-node DPO post-training barrier failures

  Two barrier issues caused "Connection closed by peer" gloo errors during post-training cleanup:

  1. An unconditional barrier at the start of _handle_post_training, called even when distributed training wasn't active
  2. An asymmetric barrier inside the Beaker save conditional: only the main process reached this code due to the is_main_process check, so non-main processes hung at the barrier while the main process did file I/O

  Fix: gate the initial barrier on is_distributed() and remove the asymmetric inner barrier entirely, since only the main process enters that code block anyway.
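The asymmetric-barrier bug above generalizes to any SPMD code: every rank must reach the same barriers in the same order, and a barrier inside a rank-0-only branch hangs all other ranks. A sketch of the corrected pattern, using threading.Barrier as a stand-in for torch.distributed.barrier() (the worker structure and names are illustrative, not the actual open-instruct code):

```python
import threading


def worker(rank, barrier, results):
    """Each 'rank' runs this function, mimicking one distributed process."""
    results[rank] = "pre-save work done"
    # Correct: ALL ranks hit the same barrier before rank 0 does I/O.
    barrier.wait()
    if rank == 0:
        # Only rank 0 saves to disk -- and crucially, there is no
        # barrier inside this branch, since other ranks never enter it.
        results["saved"] = True


world_size = 4
barrier = threading.Barrier(world_size)
results = {}
threads = [
    threading.Thread(target=worker, args=(r, barrier, results))
    for r in range(world_size)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Had barrier.wait() been placed inside the `if rank == 0:` block instead, ranks 1-3 would never reach it and rank 0 would block forever, which is the deadlock the fix removes.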
* Remove redundant compute_loss_olmo wrapper function

  ExperimentConfig inherits from DPOConfig, so compute_loss() accepts ExperimentConfig directly. The wrapper was unnecessarily creating a new DPOConfig object when one wasn't needed.

* run urgent tests

* Fix case-insensitive beaker secret lookup

  Beaker stores secret names case-insensitively, but Python's `in` operator is case-sensitive. This caused lookups for `finbarrt_WANDB_API_KEY` to fail when the secret was stored as `FINBARRT_WANDB_API_KEY`.

* Updated mason.py

* Add uv run prefix to local DPO script

* Save DPO models in HuggingFace format for evals

  DPO training was saving models in olmo-core format, but eval jobs and push_folder_to_hub expect HuggingFace format. Use olmo-core's save_hf_model() to convert the trained model to HF format in output_dir/hf_model/ before launching evals or pushing to hub.

* Fix WEKA_CLUSTERS import in submit_eval_jobs.py

  WEKA_CLUSTERS is defined in launch_utils, not utils. Import launch_utils and use launch_utils.WEKA_CLUSTERS instead of utils.WEKA_CLUSTERS.

* Update GRPO single GPU script to use DPO-trained model

  Use the DPO-trained OLMo model from allenai/open_instruct_dev with revision dpo_olmo_core_debug_test instead of Qwen/Qwen3-1.7B.

* Add --add_bos flag for OLMo model in GRPO script

  OLMo models require the --add_bos flag to be set.

* Copy original HF config when saving DPO model

  The save_hf_model() function creates an incorrect config.json with wrong values for num_hidden_layers, eos_token_id, etc.
  Copy the original model's config.json to preserve the correct values.

* Use Weka path directly for DPO model in GRPO test

  The HuggingFace model config was still incorrect, so use the Weka path directly where the model was saved.

* Add logging for config.json save in DPO

  Helps debug issues with the model config not being saved correctly.

* Update GRPO script to use new DPO model path

  Use the latest DPO model that was saved with a correct config.json.

* Fix DPO HF model saving to use correct layer count

  The save_hf_model function from olmo-core was creating extra layers in the output. Instead, use convert_state_to_hf with the original HuggingFace config and save using transformers' native save_pretrained.

* Fix OLMo-2-0425-1B config mapping to use correct layer count

  The olmo2_1B config has 18 layers, but the actual HuggingFace model has 16 layers. Use olmo2_1B_v2, which has the correct 16 layers.

* Fix HF model loading to use from_config instead of from_pretrained

  Cannot pass state_dict together with a model name. Use from_config to create the model, then load_state_dict to load the weights.

* Revert to using save_hf_model for DPO model saving

  The convert_state_to_hf approach doesn't work with DTensors from distributed training. Use save_hf_model, which handles DTensors properly. The config mapping has been fixed, so save_hf_model should now produce correct layer counts.

* Update GRPO script to use DPO model with correct 16 layers

  Use the model saved from the DPO run with fixed config mapping.
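The case-insensitive secret lookup fix noted above comes down to comparing lowercased names instead of relying on Python's case-sensitive `in` operator. A minimal sketch (find_secret is a hypothetical helper name, not the actual function in the repo):

```python
def find_secret(secret_names, wanted):
    """Case-insensitive lookup of a secret name.

    Beaker stores secret names case-insensitively, so a plain
    `wanted in secret_names` check can miss a match that differs
    only in case. Returns the stored spelling, or None.
    """
    wanted_lower = wanted.lower()
    for name in secret_names:
        if name.lower() == wanted_lower:
            return name
    return None
```

For example, find_secret(["FINBARRT_WANDB_API_KEY"], "finbarrt_WANDB_API_KEY") returns "FINBARRT_WANDB_API_KEY", where a plain `in` test would report no match.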
* Copy original HF config after save_hf_model

  The save_hf_model function creates an incomplete config.json that is missing fields like max_position_embeddings. Copy the original model's config to ensure vLLM can load the model.

* Update GRPO script to use DPO model with complete config

  Use the model from the DPO run with the copied original config that includes max_position_embeddings.

* Add OLMo3-7B DPO script using OLMo-core trainer

  New script that uses dpo.py (OLMo-core + FSDP) instead of dpo_tune_cache.py (Accelerate + DeepSpeed) for DPO training. Configured for 2 nodes with 8k sequence length.

* Add documentation for adding OLMo-core models

* Add --no_auto_dataset_cache to DPO script

* Fix multi-node torchrun configuration for DPO

  Add missing torchrun multi-node parameters:
  - --nnodes to specify the total number of nodes
  - --node_rank for each node's rank
  - --master_addr for the coordinator address
  - --master_port for the coordinator port

  These use Beaker environment variables that get substituted at runtime. Without them, each node ran independently without distributed communication.

* Fix nnodes to use hardcoded value instead of BEAKER_NUM_REPLICAS

  BEAKER_NUM_REPLICAS is not a valid Beaker environment variable. Use a hardcoded value of 2 to match --num_nodes.

* Add torchrun multi-node parameters to debug DPO multi_node.sh

  Same fix as 7b_instruct_dpo_olmo_core.sh: add nnodes, node_rank, master_addr, and master_port for proper multi-node coordination.
* Add OLMO_SHARED_FS=1 env var for multi-node DPO scripts

  OLMo-core's checkpointing code requires this env var to be set when using a shared filesystem (like Weka) to avoid unnecessary distributed coordination for filesystem operations.

* Add comment about cache cleanup for corrupted dataset cache

* Remove cache cleanup comment

* Support separate model config and weights for OLMo-core DPO

  Allow users to specify a config_name separately from model_name_or_path, enabling local model paths to work with OLMo-core DPO training.

* Fix save_hf_model for FSDP-wrapped models in DPO

  Add an export_to_hf() function that builds an unwrapped model from config and loads the FSDP state dict before saving. This avoids the type-check failure in olmo-core's get_hf_config() for FSDP-wrapped models.

* Fix DTensor to Tensor conversion in export_to_hf

  Convert DTensors from the FSDP state dict to regular CPU tensors before loading into the unwrapped model.

* Fix FSDP state_dict collective operation for multi-node export

  All ranks must participate in model.state_dict(), as it is a collective operation for FSDP models. Only rank 0 now saves to disk.

* Add detailed logging to export_to_hf for debugging

  Log entry/exit for all ranks and each step in the export process.

* Fix DTensor full_tensor() collective operation in export

  The full_tensor() call on DTensors is a collective operation that requires all ranks to participate.
  Move the conversion outside the is_main_process check so all ranks call full_tensor().

* Clean up debug logging in export_to_hf

* Fix missing indices in DPO reference logprobs caching

  Add a drop_last parameter to HFDataLoader. When drop_last=False, pad the remainder with repeated indices to fill a complete batch, ensuring all dataset indices are processed. Use drop_last=False for the cache-building dataloader to prevent -inf values in the reference logprobs cache.

* Add MFU/memory/token metrics to cache building + 3x cache batch size

  The forward-only cache pass doesn't store activations, so we can use 3x the training batch size. Also display avg_tok/ex, MFU%, and mem_GB in the tqdm progress bar during cache building.

* Add --cache_logprobs_only flag for DPO cache forward-pass benchmarking

* Update DPO cache benchmark to match production OLMo3-7B config

* Avoid the torch warning

* 6x cache batch size + mem% in DPO cache tqdm

* Reduce cache batch multiplier to 4x (6x OOMed)

* Try unsharded cache build, fall back to FSDP on OOM

  The DPO reference logprobs cache is forward-only (no backward pass), so the full unsharded model may fit in GPU memory and avoids allgather communication overhead. If it OOMs, we catch the error, clear the CUDA cache, apply FSDP, and retry.
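The drop_last=False padding described in the reference logprobs caching fix can be sketched as follows. This is a hypothetical stand-in for HFDataLoader's index batching, showing how the final partial batch is padded with repeated indices so every dataset row is visited:

```python
def batch_indices(num_examples, batch_size, drop_last):
    """Yield fixed-size batches of dataset indices.

    With drop_last=True the trailing partial batch is discarded, leaving
    some rows unvisited. With drop_last=False the remainder is padded by
    repeating its own indices until the batch is full, so every index is
    processed at least once (preventing, e.g., -inf entries in a cache
    built from these batches).
    """
    indices = list(range(num_examples))
    full_end = (num_examples // batch_size) * batch_size
    for start in range(0, full_end, batch_size):
        yield indices[start:start + batch_size]
    remainder = indices[full_end:]
    if remainder and not drop_last:
        # Cycle through the remainder until the batch is full.
        padded = (remainder * batch_size)[:batch_size]
        yield padded
```

With 5 examples and batch size 2, drop_last=True yields [0, 1] and [2, 3] (index 4 is never cached), while drop_last=False additionally yields the padded batch [4, 4].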
* Fix data loader tests that used single_example_collator with batch_size > 1

* Fix attn_backend auto-detection: check flash_attn_3 availability

  The auto-detection was selecting flash_3 for H100 GPUs without checking whether the package is actually installed, causing a RuntimeError on startup.

* added export to HF function

* Added script to convert olmo core to HF format.

* Add example usage to olmo-core to HF conversion script

* Fix code review issues in convert_olmo_core_to_hf.py

  - Use logger instead of print for output
  - Remove unused model.load_state_dict() call

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
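The attn_backend auto-detection fix above can be sketched with importlib.util.find_spec, which reports whether a package is importable without importing it. The backend names, the capability threshold, and the function name here are illustrative assumptions, not the exact open-instruct logic:

```python
import importlib.util


def pick_attn_backend(device_capability):
    """Choose an attention backend, preferring flash-attention only when
    the corresponding package is actually installed.

    Selecting flash_3 purely from the GPU's compute capability (e.g.
    (9, 0) for H100) without this check caused a RuntimeError at startup
    when the flash_attn_3 package was missing.
    """
    if device_capability >= (9, 0) and importlib.util.find_spec("flash_attn_3"):
        return "flash_3"
    if importlib.util.find_spec("flash_attn"):
        return "flash_2"
    return "torch_sdpa"  # always-available fallback
```

On a machine without either flash-attention package, this degrades gracefully to the fallback instead of failing at startup.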