Description
Lately I work mainly in SageMaker Studio, and I'd really like to be able to debug / interact with a running job using the same UI.
Solution idea
Create a custom Studio kernel image using an IPython extension and/or custom magic through which users can connect to a running SSH Helper job and run notebook cells on that job's instance instead of in the Studio app.
The user experience would be something like using EMR clusters in Studio:
- One-time up-front job to build/register the custom "SageMakerSSH" image (maybe?)
- User launches their SSH-helper-enabled job from "normal" notebook A and fetches the managed instance ID `mi-1234567890abcdef0`
- User opens / switches to a notebook with SageMakerSSH kernel and runs something like:
  - `%load_ext sagemaker_ssh_helper.notebook` to initialize the IPython extension
  - `%sagemaker_ssh connect mi-1234567890abcdef0` to connect to the instance
- From here on out, cells should run on the connected instance rather than the local Studio app unless a `%%local` cell magic is used: same as how the SageMaker Studio SparkMagic kernel works
- Probably some kind of `%sagemaker_ssh disconnect` command would also be useful
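To make the flow above a bit more concrete, here's a rough sketch of how the extension's routing could look. Everything named here is hypothetical: `RemoteSession`, the internal `%%sagemaker_ssh_remote` cell magic, and `run_remote` are made up for illustration and are not part of `sagemaker_ssh_helper`. The only IPython pieces it relies on are `register_magic_function` and the `input_transformers_cleanup` list (IPython 7+), and actually executing code on the instance is left as a stub.

```python
from typing import List, Optional


class RemoteSession:
    """Tracks which managed instance (if any) cells should be routed to."""

    def __init__(self) -> None:
        self.instance_id: Optional[str] = None

    def handle(self, line: str) -> str:
        """Handle the arguments of a hypothetical `%sagemaker_ssh ...` line magic."""
        args = line.split()
        if len(args) == 2 and args[0] == "connect":
            self.instance_id = args[1]
            return f"connected to {self.instance_id}"
        if args == ["disconnect"]:
            self.instance_id = None
            return "disconnected"
        raise ValueError("usage: %sagemaker_ssh connect <instance-id> | disconnect")

    def route(self, lines: List[str]) -> List[str]:
        """Input transformer: list of cell lines in, list of cell lines out."""
        if not lines:
            return lines
        first = lines[0].strip()
        if first == "%%local":
            # Drop the marker and run the rest of the cell in the Studio app.
            return lines[1:]
        if self.instance_id is None or first.startswith("%"):
            # Not connected, or the cell starts with a magic: leave it alone.
            return lines
        # Wrap the cell in a (hypothetical) cell magic that runs it remotely.
        return ["%%sagemaker_ssh_remote\n"] + lines


def run_remote(line: str, cell: str) -> None:
    """Stub: sending `cell` to the instance over SSH/SSM is out of scope here."""
    raise NotImplementedError("remote execution is not sketched here")


def load_ipython_extension(ipython) -> None:
    """Entry point that `%load_ext` calls with the running shell."""
    session = RemoteSession()

    def sagemaker_ssh(line: str) -> None:
        print(session.handle(line))

    ipython.register_magic_function(
        sagemaker_ssh, magic_kind="line", magic_name="sagemaker_ssh"
    )
    ipython.register_magic_function(
        run_remote, magic_kind="cell", magic_name="sagemaker_ssh_remote"
    )
    # Cleanup transformers run before IPython's own cell-magic detection.
    ipython.input_transformers_cleanup.append(session.route)
```

One simplification to note: any cell whose first line starts with `%` is left to run locally, so that `%sagemaker_ssh disconnect` still works while connected. A real implementation would probably want something smarter than that.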
Since the `sagemaker_ssh_helper` library is pip-installable, it might even be possible to get this working with default (e.g. `Data Science 3.0`) kernels? I'm not sure; I assume it depends on how much hacking is possible during IPython extension load vs what needs setting up in advance.
Why this route
To my knowledge, JupyterLab is a bit more fragmented in support for remote kernels than IDEs like VSCode/PyCharm/etc. It seems like there are ways to set up SSH kernels, but it's also a tricky topic to navigate because so many pages online are talking about "accessing your remotely-running Jupyter server" instead. Investigating the Jupyter standard kernel spec paths, I see `/opt/conda/envs/studio/share/jupyter/kernels` exists but contains only a single `python3` kernel which doesn't appear in the Studio UI. It looks like there's a custom `sagemaker_nb2kg` Python library that manages kernels, but there are no obvious integration points there for alternative kernel sources besides the Studio "Apps" system, and it's sufficiently internal/complex that patching it seems like a bad idea.
...So it looks like directly registering the remote instance as a kernel in JupyterLab would be a non-starter.
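For anyone wanting to repeat the kernel spec check above, a minimal stdlib-only sketch is below. It relies only on the standard Jupyter layout where each kernel lives in its own subdirectory containing a `kernel.json`; the helper name `list_kernelspecs` is just for illustration.

```python
import json
from pathlib import Path
from typing import Dict


def list_kernelspecs(kernels_dir: str) -> Dict[str, str]:
    """Map kernel name -> display name for each kernel.json under kernels_dir."""
    specs: Dict[str, str] = {}
    root = Path(kernels_dir)
    if not root.is_dir():
        return specs
    for spec_file in sorted(root.glob("*/kernel.json")):
        with spec_file.open() as f:
            spec = json.load(f)
        name = spec_file.parent.name
        # Fall back to the directory name if display_name is missing.
        specs[name] = spec.get("display_name", name)
    return specs


if __name__ == "__main__":
    # Path taken from the investigation above; returns {} if it doesn't exist.
    print(list_kernelspecs("/opt/conda/envs/studio/share/jupyter/kernels"))
```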
If the magic-based approach works, it might also be possible to use with other existing kernel images (as mentioned above) and even inline in the same notebook after a training job is kicked off. Hopefully it would also enable toggling over to a new job/instance without having to run CLI commands to change the installed Jupyter kernels.