Make ilab able to download safetensors#1033
Make ilab able to download safetensors#1033mergify[bot] merged 1 commit intoinstructlab:maininstructlab/instructlab:mainfrom cdoern:model-agnostic-downloadcdoern/instructlab:model-agnostic-downloadCopy head branch name to clipboard
ilab able to download safetensors#1033Conversation
349e033 to
b7140b1
Compare
b7140b1 to
19a6b3e
Compare
when downloading from HF, check check if the repo has .safetensor files. If it does, and `--filename` is default, download the entire repo and put it into a <repo_name> folder inside of models Signed-off-by: Charlie Doern <cdoern@redhat.com>
19a6b3e to
6af9db5
Compare
russellb
left a comment
There was a problem hiding this comment.
lgtm! I know @alimaredia is testing this, so I requested his review. It should block merging until he approves as well.
| click.echo(f"Downloading model from {repository}@{release} to {model_dir}...") | ||
| if ( | ||
| "HF_TOKEN" not in os.environ | ||
| and repository != "instructlab/merlinite-7b-lab-GGUF" |
There was a problem hiding this comment.
do we need this check? I think it might be best not to hardcode merlinite for such checks - it would also be good practice to always have an HF token exposed and would make the code easier and cleaner to maintain (even if technically you dont need a token for the instructlab org)
just my 0.02
especially considering this would fail if we try to pull the granite model even though it exists in the same instructlab HF org
There was a problem hiding this comment.
I assume this was so at least the default config didn't require it, which is nice.
An even better UX would determine whether the token is required for a given model before then erroring out because it's not set, but that could be a future improvement ...
There was a problem hiding this comment.
right - then for now should we at least only check if "instructlab" is part of the repo string, and we allow the download without token if true? since we would want to at least have uniform behavior for all the models in the instructlab HF org
There was a problem hiding this comment.
How about you use clicks' feature to read values from env vars?
@click.option('--hf-token', envvar='HF_TOKEN')
There was a problem hiding this comment.
merl is the default. This is my way of checking If NO HF TOKEN and THE USER SET REPO
There was a problem hiding this comment.
I was surprised by it asking me for a HF_TOKEN when I did ilab download --repository instructlab/merlinite-7b-lab as that repository does not need one. In my case, I specifically wanted it to download the non-gguf version of this model, but perhaps the logic around HF_TOKEN usage should be a bit broader?
| filename=filename, | ||
| local_dir=model_dir, | ||
| ) | ||
| files = list_repo_files(repo_id=repository, token=os.getenv("HF_TOKEN")) |
There was a problem hiding this comment.
@cdoern why not download the entire repository via snapshot_download() instead of going file by file? https://huggingface.co/docs/huggingface_hub/en/guides/download#download-an-entire-repository
There was a problem hiding this comment.
This looks like a good idea to me +1
There was a problem hiding this comment.
Apologies @alimaredia -- once you posted this review, that made mergify happy that you looked at it. If you had done "request changes" it would have blocked it, just FYI.
I probably should just set a hold label and let you remove it
|
uh oh, seems mergify has gone rogue!!!! I will open another PR to address folks' comments |
when downloading from HF, check check if the repo has .safetensor files. If it does, and
--filenameis default, download the entire repo and put it into a <repo_name> folder inside of modelsSigned-off-by: Charlie Doern cdoern@redhat.com