ilab cuda containerfile throws ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory error #2209

@pacificera

Description


Describe the bug
Missing CUDA library (libcudnn.so.9) after a successful container build.

To Reproduce
I built the Containerfile with podman on Fedora without errors. I can enter the container with an interactive terminal and run nvidia-smi successfully. When I run the ilab command, the following error is produced:
```
❯ podman run -it --rm --security-opt=label=disable --device=nvidia.com/gpu=all 3b098c5ec284 /bin/bash

==========
== CUDA ==

CUDA Version 12.4.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[root@4245cf88a6e9 instructlab]# nvidia-smi
Thu Sep 5 19:41:52 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2070 ...    On  |   00000000:09:00.0  On |                  N/A |
|  0%   54C    P0             52W /  260W |    1401MiB /   8192MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
[root@4245cf88a6e9 instructlab]# ilab
Traceback (most recent call last):
  File "/usr/local/bin/ilab", line 5, in <module>
    from instructlab.lab import ilab
  File "/usr/local/lib/python3.11/site-packages/instructlab/lab.py", line 20, in <module>
    from .model import model as model_group
  File "/usr/local/lib/python3.11/site-packages/instructlab/model/model.py", line 13, in <module>
    from .train import train
  File "/usr/local/lib/python3.11/site-packages/instructlab/model/train.py", line 11, in <module>
    import torch
  File "/usr/local/lib64/python3.11/site-packages/torch/__init__.py", line 290, in <module>
    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: libcudnn.so.9: cannot open shared object file: No such file or directory
[root@4245cf88a6e9 instructlab]#
```
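
The ImportError above is raised when the dynamic loader cannot open libcudnn.so.9 while torch is being imported. The sketch below may help narrow this down without importing torch: it asks the loader directly, via Python's standard ctypes module, whether a given shared library can be opened. The helper name `can_load` is hypothetical and not part of ilab or torch. Running it inside the container is an assumption about how one might debug this, not a confirmed fix.

```python
import ctypes
import sys


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the shared library `soname`."""
    try:
        # CDLL uses the same loader search path (LD_LIBRARY_PATH, ld.so cache)
        # that a compiled extension module such as torch._C relies on.
        ctypes.CDLL(soname)
        return True
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False


if __name__ == "__main__":
    # Default to the library named in the ImportError; other sonames can be
    # passed on the command line.
    names = sys.argv[1:] or ["libcudnn.so.9"]
    for name in names:
        status = "found" if can_load(name) else "MISSING"
        print(f"{name}: {status}")
```

If this reports the library as missing inside the container, the image most likely lacks a cuDNN 9 runtime, or the library lives in a directory the loader does not search.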

Expected behavior
ilab should start without raising an ImportError.

Device Info (please complete the following information):
I built the image from the Containerfile at commit 945fd51:
https://github.com/instructlab/instructlab/commits/main/containers/cuda/Containerfile

Labels: bug, stale
