Running Jupyter Notebooks on the GPU Cluster

From TAMUQ Research Computing User Documentation Wiki


Overview

As with other types of computation, policy does not allow users to run Jupyter notebooks on the login node (raad2-gfx); they must run on one of the GPU nodes (gfx1 through gfx4). A Jupyter notebook is essentially a web server application that the user interacts with through a standard web browser. Because this application runs on a GPU node that sits on a private network internal to the HPC cluster, port forwarding is needed so that a user outside the HPC system can reach the application on the inside. We have partially automated this process to make it friendlier than it normally is.
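The port-forwarding idea described above can be sketched as a single ssh option. The helper below only prints the command so that its shape is visible; it does not open a connection. The function name tunnel_cmd is hypothetical, and the username, host name and port are taken from the launcher script and sample session later on this page. This covers only the hop from your PC to the login node; the hop from the login node to the GPU node is handled by the srun command in the launcher script.

```shell
# Hypothetical helper: print (do not run) an ssh command that forwards a
# local port to the same port number on the login node.
tunnel_cmd() {
    user=$1
    host=$2
    port=$3
    printf 'ssh -L %s:localhost:%s %s@%s\n' "$port" "$port" "$user" "$host"
}

tunnel_cmd fachaud74 raad2-login1 55751
# prints: ssh -L 55751:localhost:55751 fachaud74@raad2-login1
```

With a forward like this in place, a browser on the PC that opens http://127.0.0.1:55751 is talking to port 55751 on the far side of the ssh connection.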

Before we can launch the Jupyter notebook, though, we first need to install it in our home directory.

Make the Conda Tool Accessible

We can use the conda tool to create a custom environment within our home directory where we will install our own instance of a jupyter notebook. However, before the conda tool can be run, we must source the appropriate file to make it accessible to our bash shell.

source /ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh

Alternatively, the line above may be appended to your ~/.bashrc file so you do not have to run this command manually every time you need to use the conda tool. This can be accomplished with:

echo "source /ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

Add this line to the .bashrc file only once. Afterwards, log out of your account and log back in so that the conda.sh file is actually sourced for your current bash shell.
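Because running the echo command more than once leaves duplicate lines in .bashrc, a guarded append may be preferable. The following is a minimal sketch, not part of the official setup: the helper name append_once is hypothetical, and the call against your real ~/.bashrc is left commented out so that nothing is modified until you choose to run it.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

CONDA_SH=/ddn/sw/xc40/cle7/anaconda/2023.03-1/anaconda3/etc/profile.d/conda.sh
# Uncomment to apply to your own ~/.bashrc:
# append_once "source $CONDA_SH" "$HOME/.bashrc"
```

Running the helper a second time leaves the file unchanged, because grep finds the existing line first.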

Install the Jupyter Package

conda create -n myJupyter python=3.11 jupyter

When creating the environment with the command above, we assign it a name with the -n option ("myJupyter") and specify the version of Python to use as the base for this environment with the python=3.11 option. We also request installation of the jupyter conda package within this environment.
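To confirm that the environment was created, its name can be looked up in the output of conda env list. The sketch below assumes the environment is called myJupyter, as in the command above; the helper name conda_env_exists is hypothetical.

```shell
# Hypothetical helper: succeed if `conda env list` shows an environment whose
# name (first column) matches the argument exactly.
conda_env_exists() {
    conda env list | awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

if command -v conda >/dev/null 2>&1; then
    if conda_env_exists myJupyter; then
        echo "myJupyter environment found"
    else
        echo "myJupyter environment not found; re-run conda create" >&2
    fi
else
    echo "conda is not on PATH; source conda.sh first" >&2
fi
```

The last branch is the one you will see if conda.sh has not yet been sourced in the current shell.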

Launch the Jupyter Notebook on a Raad2 Compute Node

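The following script may be downloaded to your PC and saved as jupyter-gfxlauncher.sh. (Its usage message and the sample session further down refer to it as jupyter-launcher.sh; invoke it under whichever name you saved it.)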

#!/bin/bash
# Jupyter Launcher version 1.0
# Author: faisal.chaudhry@qatar.tamu.edu
# Group: Research Computing @ TAMUQ

if [ $# -eq 0 ]
  then
      printf "\nPlease provide your raad2-login1 username.\n   Usage: jupyter-launcher.sh <username> \n Example: jupyter-launcher.sh fachaud74\n\n"
      read -p 'Username: ' uservar
      usr=$uservar
  else
      usr=$1
  fi

printf "Connecting to raad2-login1 to get port number for Jupyter Lab \n"

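# Single quotes make the remote shell evaluate the arithmetic; inside double
# quotes the local shell would expand $((...)) and $UID before ssh ever runs.
port=$(ssh -t -Y "$usr@raad2-login1" 'echo $((50000 + RANDOM % $UID))')
# ssh -t allocates a terminal, which leaves a carriage return on the output.
port=$(echo "$port" | tr -d '\r')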
printf "Port number fetched: $port\n\n"

printf "In order to launch the Jupyter notebook, you must manually:\n\n"
printf "1) Activate your specific conda environment (in which you installed your instance of jupyter notebook)\n"
printf "   with a command of the form \"conda activate <your_env_name>\". For example:\n"
printf "   conda activate myJupyter\n\n"
printf "2) Start the jupyter notebook server with this specific command:\n"
printf "   jupyter-notebook --no-browser --port=$port --ip='0.0.0.0' \n\n"
printf "3) Start a web browser on your local PC and point it to the URL suggested by the output of the previous command.\n"
printf "   It will look something like this:\n"
printf "   http://127.0.0.1:$port/?token=1e493a3016c55337ac1278a7ab320005749e86ea66d1a19d \n\n"
printf "4) Notebook shutdown instructions:\n"
printf "   a) click Quit on the notebook web page to stop the notebook server\n"
printf "   b) type \"conda deactivate\" & hit enter at the command prompt within the compute node terminal\n"
printf "   c) type \"exit\" & hit enter to end the interactive job on the compute node\n"
printf "\n"
printf "We will now launch an interactive job (4 hrs time limit) on one of the Raad2 GPU nodes...\n"
printf "(follow steps 1 to 3 outlined above, once a command prompt becomes available)\n\n"

ssh -t -Y -L $port:localhost:$port $usr@raad2-login1 "srun --pty --tunnel=$port:$port --time=04:00:00  --job-name=Jupyter --ntasks=18 --gres=gpu:v100:1  /bin/bash"
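The script uses the fetched port number without checking it. Since bash's RANDOM ranges from 0 to 32767, the expression 50000 + RANDOM % $UID can exceed the largest TCP port (65535) when the UID is greater than 15536. The following sketch shows one way the value could be cleaned and checked before use; the function names clean_port and is_valid_port are hypothetical and are not part of the launcher script, and the raw value is simulated rather than fetched over ssh.

```shell
# Strip the carriage return and newline that a terminal-backed ssh session
# leaves on captured output.
clean_port() {
    printf '%s' "$1" | tr -d '\r\n'
}

# Succeed only if the argument is a whole number in the unprivileged TCP range.
is_valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

raw=$(printf '55751\r')          # simulated output of the first ssh call
port=$(clean_port "$raw")
if is_valid_port "$port"; then
    echo "Port $port is usable"
else
    echo "Port '$port' is not a valid TCP port" >&2
fi
```

If the check fails, re-running the launcher draws a new random number.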

To run this script from a local terminal in MobaXterm, invoke it with your username as its only argument. A sample session looks like this:

17/10/2022   13:32.59   /home/mobaxterm  ./jupyter-launcher.sh fachaud74
Connecting to raad2-login1 to get port number for Jupyter Lab
Connection to raad2-login1 closed.
Port number fetched: 55751

In order to launch the Jupyter notebook, you must manually:

1) Activate your specific conda environment (in which you installed your instance of jupyter notebook)
   with a command of the form "conda activate <your_env_name>". For example:
   conda activate myJupyter

2) Start the jupyter notebook server with this specific command:
   jupyter-notebook --no-browser --port=55751 --ip='0.0.0.0'

3) Start a web browser on your local PC and point it to the URL suggested by the output of the previous command.
   It will look something like this:
   http://127.0.0.1:55751/?token=1e493a3016c55337ac1278a7ab320005749e86ea66d1a19d

4) Notebook shutdown instructions:
   a) click Quit on the notebook web page to stop the notebook server
   b) type "conda deactivate" & hit enter at the command prompt within the compute node terminal
   c) type "exit" & hit enter to end the interactive job on the compute node

We will now launch an interactive job (4 hrs time limit) on one of the Raad2 GPU nodes...
(follow steps 1 to 3 outlined above, once a command prompt becomes available)

[fachaud74@nid00053 ~]$

If the following warning appears in the output of the script:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ECDSA key sent by the remote host is
SHA256:Vhit7kLwb9tE1sexEJ/mW030U1FqDP9Nj/RG8fZrp98.
Please contact your system administrator.
Add correct host key in /ddn/home/fachaud74/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /ddn/home/fachaud74/.ssh/known_hosts:1
Password authentication is disabled to avoid man-in-the-middle attacks.
Keyboard-interactive authentication is disabled to avoid man-in-the-middle attacks.
Port forwarding is disabled to avoid man-in-the-middle attacks.

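...then edit the known_hosts file named in the warning and remove the offending line from it. The line number is the figure after the colon in the "Offending ECDSA key" message (line 1 in this example).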
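Instead of opening the file in an editor, the offending line can be deleted from the command line. The sketch below is one way to do it, assuming the default location ~/.ssh/known_hosts; the helper name remove_known_hosts_line is hypothetical, and it keeps a backup copy with a .bak suffix before changing anything.

```shell
# Hypothetical helper: delete line N from a known_hosts file, keeping a backup
# copy next to it with a .bak suffix.
remove_known_hosts_line() {
    lineno=$1
    file=${2:-$HOME/.ssh/known_hosts}
    case "$lineno" in
        ''|*[!0-9]*) echo "line number must be a positive integer" >&2; return 1 ;;
    esac
    [ "$lineno" -ge 1 ] || { echo "line number must be at least 1" >&2; return 1; }
    cp -p "$file" "$file.bak" || return 1
    sed "${lineno}d" "$file.bak" > "$file"
}

# Example for the warning above, which reports the offending key on line 1:
# remove_known_hosts_line 1
```

OpenSSH also provides ssh-keygen -R hostname, which removes every stored key for the named host. The host name is not shown in the warning excerpt above, so it is not filled in here.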