reacher-z/README.md

Hi, I'm Yuxuan Zhang

PhD student at Vector Institute & University of British Columbia · Research on AI agents, LLMs, and RL

Website · Twitter · Google Scholar


  Top Projects

ClawBench — Can AI Agents Complete Everyday Online Tasks?

153 tasks · 144 live websites · 8 categories · Best model: 33.3%

Paper · Dashboard · Dataset · PyPI

VidGround — Watch Before You Answer

Visually grounded post-training for video LLMs.

Paper · HF Paper


  News


  Contact

yuxuan.zhang(at)ubc.ca · Google Scholar · GitHub · Twitter · Website

Pinned

  1. ClawBench (Public)

    Open-source benchmark for browser AI agents on 153 everyday online tasks across 144 live websites. 5-layer recording + DOM-match + LLM judge. Top score 33.3%.

    Python · 199 stars · 12 forks

  2. vidground (Public)

    Watch Before You Answer: Learning from Visually Grounded Post-Training (arXiv 2604.05117)

    Python · 3 stars

  3. HarnessBench (Public)

    Python · 6 stars · 1 fork
