[rollout] feat: add agent loop #2124
Conversation
@ccclyu @SwordFaith Could you help review this design? Basically, it converts recursive LLM interactions into a loop-based design to make writing complex agent loops easier. The original idea is from @zw0610.
@casper-hansen Could you please also take a look and give some comments? Thanks.
@vermouth1992 This is a compromise version of what we've discussed. I would still suggest moving the agent loop out of the AsyncLLMServer and manipulating it directly on the driver process.
It runs on another Ray actor. If we remove the Ray actor, it directly runs on the driver process.
My concern lies more with the flexibility of debugging and customizing the agent loop. Moreover, if we consider agent loops that consume much more local resources, transforming them into distributed Ray actors would relieve the resource pressure on either the Ray actor you mentioned under this implementation or the driver process.
My feedback below.
I hope the feedback is useful. Point 3 is a general challenge that I think can be overcome in a future PR (the scope is too big), so I would suggest adding it to the roadmap.
Thanks for your feedback! For number 2, I guess we can provide a registration mechanism for the agent loop so that you can define it yourself outside the verl codebase? The main motivation for the loop-based agent is easy implementation when the agent interacts with an environment (e.g., a Pokemon game). Just wondering, do you still prefer the recursive style or the loop style? For number 3, I agree that it requires substantial optimization to make it fast.
Looking forward to discussing further.
verl/agent/agent_loop.py (outdated)

```python
self.server_addresses = server_addresses

# Least requests load balancing
self.weighted_addresses = [[0, address] for address in server_addresses]
```
Least-requests load balancing may still shard an equal data size to each DP group if there is no rate limit or running-requests-aware back-pressure mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need a more server-pressure-aware load-balancing strategy; @chenhaiq is investigating it.
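For readers following the thread, here is a minimal, self-contained sketch of the least-requests idea behind `weighted_addresses`. The class and method names (`LeastRequestsBalancer`, `acquire`, `release`) are illustrative assumptions, not the PR's API:

```python
import heapq

class LeastRequestsBalancer:
    """Pick the server with the fewest in-flight requests (minimal sketch)."""

    def __init__(self, server_addresses):
        # Each heap entry is [in_flight_count, address]; heapq orders by the
        # count, so the heap root is always the least-loaded server.
        self.weighted_addresses = [[0, addr] for addr in server_addresses]
        heapq.heapify(self.weighted_addresses)

    def acquire(self) -> str:
        # Pop the least-loaded server, bump its count, push it back.
        entry = heapq.heappop(self.weighted_addresses)
        entry[0] += 1
        heapq.heappush(self.weighted_addresses, entry)
        return entry[1]

    def release(self, address: str) -> None:
        # Decrement the count when a request finishes; without completion
        # feedback (or back pressure), the strategy degenerates to sharding
        # roughly equal request counts to every server, which is exactly
        # the concern raised above.
        for entry in self.weighted_addresses:
            if entry[1] == address:
                entry[0] -= 1
                break
        heapq.heapify(self.weighted_addresses)
```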
verl/agent/agent_loop.py
Outdated
heapq.heapify(self.weighted_addresses) | ||
|
||
# LRU cache to map request_id to address | ||
self.request_id_to_address = LRUCache(maxsize=max_cache_size) |
Is it possible to use radix-aware or prefix-aware routing, hybridized with least-requests and max-running-requests-aware strategies?
By collecting server performance metrics (e.g., current requests, GPU cache usage, CPU cache usage), we can implement a better routing strategy.
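As a sketch of the sticky-session behavior that the LRU cache enables: route the first turn via load balancing, then pin every follow-up turn of the same `request_id` to the same server so it can hit that server's prefix/KV cache. Apart from `LRUCache` and `request_id_to_address`, the names here are assumptions:

```python
from cachetools import LRUCache

class StickySessionRouter:
    """Pin all turns of one request_id to the same server (sketch)."""

    def __init__(self, balancer, max_cache_size: int = 10000):
        self.balancer = balancer  # e.g., a least-requests balancer
        # request_id -> address, so multi-turn conversations keep hitting
        # the server that already holds their prefix/KV cache.
        self.request_id_to_address = LRUCache(maxsize=max_cache_size)

    def route(self, request_id: str) -> str:
        address = self.request_id_to_address.get(request_id)
        if address is None:
            # First turn of this session: fall back to load balancing.
            address = self.balancer.acquire()
            self.request_id_to_address[request_id] = address
        return address
```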
verl/agent/agent_loop.py (outdated)

```python
return await self._chat_completions(address, request_id, messages=messages, tools=tools, **sampling_params, **kwargs)

async def _chat_completions(self, address: str, request_id: str, **chat_complete_request) -> ChatCompletion:
```
So we finally chose to keep HTTP server routing rather than Ray-based routing?
Did you observe a performance diff?
It suffered from concurrency issues with sgl-router, which made Zilin Zhu switch to HTTP/2 several weeks ago. But it may be OK here because we only use it between servers.
Another consideration is resource affinity inside the message list, e.g., a tool creating resources like images in each turn.
Using Ray would help with auto-affinity scheduling, whereas using HTTP requires users to set up an S3-like OSS service and causes heavy network traffic during multi-turn generation. I guess supporting Ray-based routing could improve the user experience with fewer infrastructure requirements.
On the other hand, HTTP-based routing makes it easier to port existing open-source OpenAI-compatible server implementations and keeps verl simpler, which is also an advantage.
> So we finally chose to keep HTTP server routing rather than Ray-based routing?

The message protocol `chat_complete_request` between client and server should be OpenAI-compatible, but the transport protocol can be either HTTP or a Ray remote call. So there will be two `_chat_completions` implementations; the second is in development:
- `_chat_completions_http`
- `_chat_completions_ray`
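To make the two-transport plan concrete, a hedged sketch follows. The method names come from the comment above, but the bodies, the standard `/v1/chat/completions` route, and the `address_to_actor` lookup are illustrative assumptions, not the merged code:

```python
import aiohttp

class AsyncLLMServerManager:
    # ...load-balancing and sticky-session state elided...

    async def _chat_completions_http(self, address: str, request_id: str,
                                     **chat_complete_request):
        # Transport 1: plain HTTP against an OpenAI-compatible endpoint.
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"http://{address}/v1/chat/completions",
                json=chat_complete_request,
            ) as resp:
                return await resp.json()

    async def _chat_completions_ray(self, address: str, request_id: str,
                                    **chat_complete_request):
        # Transport 2: a Ray remote call to the server actor. The payload
        # stays OpenAI-compatible; only the transport changes.
        server = self.address_to_actor[address]  # hypothetical actor lookup
        return await server.chat_completion.remote(request_id, **chat_complete_request)
```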
```python
num_turns: int


class AgentLoopBase(ABC):
```
At least we need an abort here. I also wonder whether the main difference between an agent loop and an async rollout request is just a normal class vs. a data class working with the rollout or server + tool registry; why do we need yet another environment-interaction implementation here?
There is a fundamental difference between the two programming abstractions. In real-world agent applications, the rollout is just a server. The agent receives a prompt and starts to query LLMs for instructions in an agent loop. Tool calling is implemented inside the agent loop because there is no way to implement it in the server (e.g., for OpenAI models). The agent loop abstraction makes existing agent systems easily pluggable, and is not restricted to tools (e.g., agentless or predefined workflows).
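To illustrate the pluggability argument, here is a minimal sketch of a loop-style agent under this abstraction; `max_turns`, `server_manager`, and `call_tool` are assumed helpers, not the PR's exact interface:

```python
from uuid import uuid4

class ToolAgentLoop(AgentLoopBase):
    """Loop-style rollout: query the LLM, run tools client-side, repeat."""

    async def run(self, messages: list[dict], tools: list[dict], **sampling_params):
        request_id = uuid4().hex  # reused across turns for sticky sessions
        for _ in range(self.max_turns):  # hypothetical turn budget
            completion = await self.server_manager.chat_completions(
                request_id=request_id, messages=messages, tools=tools,
                **sampling_params)
            message = completion.choices[0].message
            messages.append(message)
            if not message.tool_calls:
                break  # the model produced a final answer; the loop ends
            # Tool calling lives inside the loop, not in the server,
            # mirroring how agents work against hosted models.
            for tool_call in message.tool_calls:
                result = await self.call_tool(tool_call)  # hypothetical helper
                messages.append({"role": "tool",
                                 "tool_call_id": tool_call.id,
                                 "content": result})
        return messages
```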
For predefined workflows, in my opinion, we should constrain our discussion to shared context vs. separate context, and hybrid model vs. single model. Of course, the current loop abstraction with an HTTP server makes it easier to port existing workflows, but using a messages list rather than a dict of message lists seems unable to support the separate-context case well. Using the current AsyncLLMServerManager also makes it hard to add yet another model that is not included in current verl training. In #1630 we can support a user training a single node in a hybrid workflow, no matter which model the other nodes depend on. For an e2e agent with a shared model and shared context, the current impl works fine. A key challenge is how to train a separate-context, single-model workflow in verl, which, if there is an agentic loop, I want it to accomplish.
Let me take a close look at
```python
return DataProto(batch=batch, non_tensor_batch={"__num_turns__": num_turns})


class AgentLoopManager:
```
It would be better to consider this as a replay buffer or rollout manager, which is the coordinator for partial rollout.
Yes, allow users to define a custom agent loop easily.
Loop style is preferred. It's easier to implement your own loop than it is to implement with recursion. Also, looking at the code, the loop style is easier to transfer to inference after training.
@casper-hansen Thanks for your feedback! For point 2, I'm adding an AgentLoop registry, allowing users to register their custom AgentLoop.
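A registry along these lines could look like the following sketch; the decorator name and config field are assumptions, not the merged API:

```python
_AGENT_LOOP_REGISTRY: dict[str, type] = {}

def register_agent_loop(name: str):
    """Class decorator that makes a custom AgentLoop discoverable by name."""
    def decorator(cls):
        _AGENT_LOOP_REGISTRY[name] = cls
        return cls
    return decorator

@register_agent_loop("pokemon")
class PokemonAgentLoop(AgentLoopBase):
    ...

# A worker could then instantiate the loop named in the config, e.g.:
# agent_loop_cls = _AGENT_LOOP_REGISTRY[config.agent_loop_name]
```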
@wuxibin89 Two minor issues to consider:
I see now, great update on this! The removal of tokenization is reassuring and should be compatible with all models now.
After I train, I want to move to an inference system where I can see the intermediate messages from the trained agent, as this can take 5-10 minutes. This loop makes it easier to create an API in front of the trained model.
```python
# Copyright 2024 Bytedance Ltd. and/or its affiliates
```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename this file to `single_turn_agent_loop`? `single_agent_loop` is confusing.
@wuxibin89 I see this has been merged now. Will you follow up with a PR on extensibility so we can use our CustomAgentLoop by registering it through the config?
@wuxibin89 Does this change mean we can no longer use a chat scheduler or completion_callback? That was quite helpful for our use case. Is there any way to restore or support this functionality again?
### Checklist Before Starting

- [ ] Searched for similar PR(s).
- [ ] Checked PR Title format
  - In format of: [modules] type: Title
  - modules are in `fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data`
  - type is in `feat, fix, refactor, chore, test`
  - can involve multiple modules, separated by `,` or space, like `[megatron, fsdp, doc] feat: xxx`

### What does this PR do?

Add AgentLoopBase and AgentLoopManager for agentic rollout.

### Test

> For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

### High-Level Design

New Components:

- **AgentLoopBase**: abstract class representing the loop of a single prompt's rollout; in the loop, the agent may chat with an OpenAI-compatible LLM server and interact with various environments.
- **AsyncLLMServerManager**: sends chat completion requests to multiple LLM servers, providing load balancing and sticky sessions.
- **AgentLoopManager**: gets a batch of prompts from the dataloader and splits it across multiple AgentLoopWorkers.
- **AgentLoopWorker**: for each prompt, creates an AgentLoopBase instance and runs the loop task.

<img width="885" alt="image" src="https://github.com/user-attachments/assets/1f949719-c000-4b94-9ee2-c8a8ff71b109" />

### Specific Changes

> List the specific changes.

### API

> Demonstrate how the API changes if any.

### Usage Example

> Provide usage example(s) for easier usage.

```python
# Add code snippet or script demonstrating how to use this
```

### Checklist Before Submitting

- [ ] Read the [Contribute Guide](https://github.com/volcengine/verl?tab=readme-ov-file#contribution-guide).
- [ ] Apply [pre-commit checks](https://github.com/volcengine/verl?tab=readme-ov-file#code-linting-and-formatting).
- [ ] Add `[BREAKING]` to the PR title `description` if it breaks any API.
- [ ] Update the documentation about your changes in the [docs](https://github.com/volcengine/verl/tree/main/docs).
- [ ] New CI unit test(s) are added to cover the code path.
- [ ] Rely on existing unit tests on CI that cover the code path.
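How the components in the High-Level Design might fit together, as a single-process sketch. In the PR the workers are Ray actors and the interfaces differ; the constructor signatures, `generate_sequences`, and the round-robin split below are assumptions:

```python
import asyncio

class AgentLoopWorker:
    """Runs one agent loop per prompt in its sub-batch (sketch)."""

    def __init__(self, agent_loop_cls, server_manager):
        self.agent_loop_cls = agent_loop_cls
        self.server_manager = server_manager

    async def generate_sequences(self, prompts: list[dict]) -> list:
        # One AgentLoopBase instance per prompt; all loops run concurrently.
        loops = [self.agent_loop_cls(server_manager=self.server_manager)
                 for _ in prompts]
        return await asyncio.gather(
            *(loop.run(messages=p["messages"]) for loop, p in zip(loops, prompts)))

class AgentLoopManager:
    """Splits a dataloader batch across the workers (sketch)."""

    def __init__(self, workers: list[AgentLoopWorker]):
        self.workers = workers

    async def generate_sequences(self, batch: list[dict]) -> list:
        n = len(self.workers)
        chunks = [batch[i::n] for i in range(n)]  # round-robin split; a real
        # implementation would restore the original batch order afterwards.
        results = await asyncio.gather(
            *(w.generate_sequences(c) for w, c in zip(self.workers, chunks)))
        return [item for chunk in results for item in chunk]
```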
Perfect work! Is there any guide on how to use the tool_agent_loop in training? Thanks~