Conversation

@vaporhug

No description provided.

import re
from typing import Any, Dict, List
from datasets import load_dataset
from vlmeval import *
Collaborator

Avoid multiple wildcard imports (from xxx import *).
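
To illustrate why stacked wildcard imports are risky, here is a minimal sketch. It uses two standard-library modules instead of the PR's own modules, which is an assumption made so the example runs anywhere. The helper `public_names` is hypothetical and only approximates what `from module import *` would bind.

```python
import cmath
import math


def public_names(module):
    """Approximate the names that `from module import *` would bind."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return set(declared)
    return {name for name in dir(module) if not name.startswith("_")}


# Names that a second wildcard import would silently overwrite.
collisions = sorted(public_names(math) & public_names(cmath))

# Explicit imports keep the origin of each name visible.
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt
```

With two wildcard imports, whichever module is imported last decides what `sqrt` means; explicit imports remove that ambiguity.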

return assistant_response


judge = VLM('o4-mini')
Collaborator

Use judge_kwargs and build_judge to construct the judge model.
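
A rough sketch of the suggested pattern follows: the judge is built inside `evaluate` from the keyword arguments it receives, instead of being created at module import time. The `build_judge` defined here is a local stand-in so the example is runnable; the real helper lives in VLMEvalKit and its exact signature is not shown in this thread. The class name `SketchDataset` and the prompt text are hypothetical.

```python
from typing import Any, Callable


def build_judge(**kwargs: Any) -> Callable[[str], str]:
    """Local stand-in (assumption): returns a callable judge configured by kwargs."""
    model_name = kwargs.get("model", "unknown")

    def judge(prompt: str) -> str:
        # A real judge would call an API; this one just echoes for illustration.
        return f"[{model_name}] {prompt}"

    return judge


class SketchDataset:
    def evaluate(self, eval_file: str, **judge_kwargs: Any) -> str:
        # Fall back to a default model only when the caller did not pick one.
        judge_kwargs.setdefault("model", "o4-mini")
        judge = build_judge(**judge_kwargs)
        return judge("Is the answer correct?")
```

The caller then controls the judge model, for example `SketchDataset().evaluate("pred.xlsx", model="gpt-4o")`.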


try:
    images = [decode_base64_to_image(b64) for b64 in row['step_images']]
    llm_judge = judge(images, judge_prompt).strip()
Collaborator

Better to use track_progress_rich to run the API judge in parallel.
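
The idea of running judge calls concurrently can be sketched as below. This uses the standard library's `ThreadPoolExecutor` as a stand-in, since the signature of `track_progress_rich` is not shown in this thread; it is not the repository's helper. `judge_all` and `fake_judge` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def judge_all(judge: Callable[[str], str], prompts: List[str], nproc: int = 4) -> List[str]:
    """Run `judge` over all prompts concurrently, keeping input order."""

    def safe_call(prompt: str) -> str:
        try:
            return judge(prompt).strip()
        except Exception as err:  # one failed API call should not abort the batch
            return f"JUDGE_ERROR: {err}"

    with ThreadPoolExecutor(max_workers=nproc) as pool:
        return list(pool.map(safe_call, prompts))


def fake_judge(prompt: str) -> str:
    """Hypothetical stand-in for an API-backed judge."""
    if not prompt:
        raise ValueError("empty prompt")
    return f" verdict for {prompt} "
```

Threads suit this case because API judging is I/O bound; results come back in the same order as the input rows.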

def evaluate(self, eval_file, **judge_kwargs):
    save_dir_last = 'sgi_code_logs'
    global save_dir
    work_dir = judge_kwargs.get('work_dir', './outputs')
Collaborator

Do not define the work dir in the dataset code. Derive the evaluation work dir from the eval_file path instead.
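
One way to follow this, sketched under the assumption that the log directory should sit next to the prediction file: take the directory of `eval_file` and append the log subdirectory. The function name `log_dir_from_eval_file` is hypothetical; the default `sgi_code_logs` is taken from the snippet above.

```python
import os


def log_dir_from_eval_file(eval_file: str, subdir: str = "sgi_code_logs") -> str:
    """Return a log directory located beside the evaluation file."""
    work_dir = os.path.dirname(eval_file)  # directory that already holds eval_file
    return os.path.join(work_dir, subdir)
```

This avoids both the hard-coded `./outputs` default and the module-level global, since the path is recomputed from the argument on every call.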

os.makedirs(os.path.join(tmp_data_dir, "0200"), exist_ok=True)
os.makedirs(os.path.join(tmp_data_dir, "0236"), exist_ok=True)

download_file("https://raw.githubusercontent.com/InternScience/SGI-Bench/main/evaluation/task_3_dry_experiment/data/SGI_DryExperiment_0206/t10k-images-idx3-ubyte.gz", tmp_data_dir+"/0206")
Collaborator

Download the extra resource files to a subdir of LMUDataRoot() instead of the evaluation work dir.
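
A minimal sketch of caching resources under a shared data root is given below. `data_root` is a hypothetical stand-in for `LMUDataRoot()`; its behavior (an `LMUData` environment variable with a home-directory fallback) is an assumption, as is the `SGI_DryExperiment` subdirectory name. In the actual dataset code, the repository's own `LMUDataRoot()` and `download_file` would be used instead.

```python
import os
import urllib.request


def data_root() -> str:
    """Hypothetical stand-in for LMUDataRoot() (assumed behavior)."""
    return os.environ.get("LMUData") or os.path.join(os.path.expanduser("~"), "LMUData")


def resource_path(root: str, task_id: str, filename: str) -> str:
    """Build the cache location for one resource file of one task."""
    return os.path.join(root, "SGI_DryExperiment", task_id, filename)


def ensure_resource(url: str, root: str, task_id: str) -> str:
    """Download `url` into the shared cache unless it is already there."""
    target = resource_path(root, task_id, os.path.basename(url))
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        urllib.request.urlretrieve(url, target)
    return target
```

Because the cache lives outside the per-run work dir, repeated evaluations reuse the files instead of downloading them again.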
