Raad2: Python

From Research Computing @ TAMUQ

Using Python on raad2 is a three-step process:

a. Create a virtual environment (with either Python virtualenv or Anaconda virtual environments) and install all the requirements.
b. Run interactively on a compute node and test your code.
c. Submit a batch job on the system for longer runs.

Decide on Python version and package requirements

Users should identify which version of Python their application requires and which packages it needs. Once the requirements are identified, users can create a Python virtual environment using the Anaconda distribution of Python.
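Once you have written the requirements down, a short script can check them against whichever interpreter you end up running. The following is a minimal sketch, not part of the raad2 setup itself: the version (3, 6) and the package names are placeholder assumptions, and python_ok and missing_packages are hypothetical helper names.

```python
import importlib.util
import sys

# Hypothetical requirements for illustration; replace with your project's needs.
REQUIRED_PYTHON = (3, 6)
REQUIRED_PACKAGES = ["numpy", "matplotlib"]


def python_ok(required=REQUIRED_PYTHON, current=None):
    """Return True if the interpreter is at least `required` (major, minor)."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current) >= tuple(required)


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    status = "OK" if python_ok() else "too old"
    print("Python", sys.version.split()[0], status)
    print("Missing packages:", missing_packages(REQUIRED_PACKAGES) or "none")
```

Running it inside an activated environment shows which of the listed packages still need to be installed.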

Users can use Python on raad2 in the following ways:

a. Anaconda distribution of Python
b. Python via Modules (Not recommended)

Anaconda Distribution

The Anaconda distribution allows users to create a virtual Python environment with a specific release, and makes it easier to install custom packages that have many library dependencies. For example, if your project 'X' requires Python 3.6.7 with TensorFlow 1.0, you can create a Conda virtual environment with the dependencies specific to this project. Similarly, for another project 'Y', you can create another virtual environment with a completely different version of Python and TensorFlow. Both virtual environments will co-exist on your system, and you can activate either of them as each project requires.

In the example below, we demonstrate how to create a Conda virtual environment and install the required packages. Once the environment is created, you will learn how to use it in an interactive session or in batch mode while working on raad2.

Note: If you followed the old instructions to run anaconda3 on raad2, you may need to update your .bashrc. Edit your .bashrc file and remove the line "source /lustre/sw/xc40ac/anaconda3/etc/profile.d/conda.sh"

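Step 01: The first step is to make the conda executable available in your environment. The command below appends a line to your .bashrc that sources Anaconda's conda.sh at every login. Run "source ~/.bashrc" or log in again for it to take effect in your current session.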

echo "source /lustre/sw/xc40/cle7/anaconda/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc

Step 02: Now create a virtual environment and choose the Python version you require. In this example, we create an environment with Python 3.6. During this process, the Conda installer will install any dependencies required to make your Python version work. The requested version of Python and its packages will be installed in your home directory.

muarif092@raad2a:~> conda create -n DLProj python=3.6

Step 03: Once the Python environment is created, you have to activate it before installing new packages. Activating the environment sets the user environment variables and makes the respective version of Python available for use.

muarif092@raad2a:~> conda activate DLProj
(DLProj) muarif092@raad2a:~>

Step 04: In the terminal, you will notice that (DLProj) has been added to the prompt. This means that you are working inside the Conda Python virtual environment. Now we can verify that the requested version of Python is correctly installed.

(DLProj) muarif092@raad2a:~> which python3
/lustre/home/muarif092/.conda/envs/DLProj/bin/python3

(DLProj) muarif092@raad2a:~> python3 --version
Python 3.6.10 :: Anaconda, Inc.
(DLProj) muarif092@raad2a:~>
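The same check can be made from inside Python, which is handy at the top of a script. This is a minimal sketch: it relies on conda activate exporting the CONDA_DEFAULT_ENV variable and on the environment living under an envs/<name> directory, as in the path shown above. The function names are hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active Conda environment name, or None if none is active."""
    if environ is None:
        environ = os.environ
    # `conda activate` exports CONDA_DEFAULT_ENV in the shell.
    return environ.get("CONDA_DEFAULT_ENV") or None


def interpreter_in_env(env_name, executable=None):
    """Return True if the interpreter path contains envs/<env_name>."""
    if executable is None:
        executable = sys.executable
    parts = os.path.normpath(executable).split(os.sep)
    return any(a == "envs" and b == env_name for a, b in zip(parts, parts[1:]))


if __name__ == "__main__":
    print("Active Conda environment:", active_conda_env())
    print("Interpreter:", sys.executable)
    print("Running from DLProj:", interpreter_in_env("DLProj"))
```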

Step 05: Install any additional Python packages. The Conda installer will list all the packages that will be downloaded, upgraded and installed.

(DLProj) muarif092@raad2a:~> conda install tensorflow keras matplotlib
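After the installation finishes, you may want to confirm that the packages import cleanly and see which versions were resolved. The sketch below is one way to do that. It assumes each package exposes a __version__ attribute (many do, but not all) and that the import name matches the Conda package name, which is not guaranteed. package_versions is a hypothetical helper.

```python
import importlib


def package_versions(names):
    """Map each name to its version string, 'unknown', or None if not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in the active environment.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    # Package names taken from the conda install example above.
    for name, version in package_versions(["tensorflow", "keras", "matplotlib"]).items():
        print(name, "->", version if version is not None else "NOT INSTALLED")
```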

Step 06: Your virtual environment is ready and can be used for testing your code and then for batch submission. The later sections of this page explain how to submit test runs and batch jobs. At this point, you can deactivate your virtual environment.

(DLProj) muarif092@raad2a:~> conda deactivate

Tip 01: To see the list of Conda environments you have:

muarif092@raad2a:~> conda env list

Tip 02: To delete any Conda environment ("DLProj" in the example below) which is no longer required:

conda remove --name DLProj --all

Python via Modules

On HPC systems, multiple versions of applications and packages are maintained via Modules. Users can use a specific version of an application by loading the respective module. Modules ensure that different versions of the same application can co-exist on a system.

Available versions

a. Python 2.6.9
b. Python 3.7.9
c. Python 3.6.3
d. Python 3.8.1

Using Python via Modules

To use a specific version of Python, users can issue the following:

muarif092@raad2a:~> module load python/363 
muarif092@raad2a:~> python3 --version
Python 3.6.3

If you then want to use a different version of Python, unload the current module and load the new one:

muarif092@raad2a:~> module unload python/363 
muarif092@raad2a:~> module load python/381
muarif092@raad2a:~> python3 --version
Python 3.8.1

Installing additional Python Packages

Users can install additional Python packages by creating a Python virtual environment. Below is an example of creating a virtual environment for a project that requires certain packages.

Step 01: Decide on the base version of Python you want to use and load the respective module. Here we are using Python 3.6.3.

muarif092@raad2a:~> module load python/363 
muarif092@raad2a:~> python3 --version
Python 3.6.3

Step 02: Create Virtual Environment


muarif092@raad2a:~> mkdir myPythonProject

muarif092@raad2a:~> cd myPythonProject/

muarif092@raad2a:~/myPythonProject> virtualenv .

Using base prefix '/lustre/sw/xc40ac/python/363'
New python executable in /lustre/home/muarif092/myPythonProject/bin/python3.6
Also creating executable in /lustre/home/muarif092/myPythonProject/bin/python
Installing setuptools, pip, wheel...done.
muarif092@raad2a:~/myPythonProject> 

Step 03: Activate the virtual environment that was created in the last step. Notice the change in the terminal prompt, which now shows that you are in a virtual environment. Any Python packages you install will be installed within this environment.

muarif092@raad2a:~/myPythonProject> source bin/activate
(myPythonProject) muarif092@raad2a:~/myPythonProject> 
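A script can also detect whether it is running inside a virtual environment, rather than relying on the prompt. The sketch below uses the interpreter's prefix attributes: older virtualenv releases set sys.real_prefix, while venv and newer virtualenv releases make sys.base_prefix differ from sys.prefix. The function names are hypothetical.

```python
import sys


def is_virtualenv(prefix, base_prefix, real_prefix=None):
    """Decide from interpreter prefixes whether a virtual environment is active."""
    # Legacy virtualenv sets real_prefix; otherwise compare prefix and base_prefix.
    return real_prefix is not None or prefix != base_prefix


def in_virtualenv():
    """Apply is_virtualenv to the running interpreter."""
    return is_virtualenv(
        sys.prefix,
        getattr(sys, "base_prefix", sys.prefix),
        getattr(sys, "real_prefix", None),
    )


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtualenv())
    print("sys.prefix =", sys.prefix)
```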


Step 04: Install any required Python packages.

(myPythonProject) muarif092@raad2a:~/myPythonProject> pip install matplotlib

Note: If your code has a lot of library dependencies and you require multiple additional Python packages, we recommend using the Anaconda distribution of Python.

Test your application via Interactive Job on raad2

When we created the Python virtual environment, all the work was done on a login node, i.e. raad2a/raad2b. The login nodes are meant for development purposes only, where you can build and compile your application. A login node is shared among many users of the HPC system, so users should submit interactive jobs to get an allocation on a compute node for code testing.

Step 01: Request an interactive session on a compute node. The interactive session gives you terminal access to a compute node where you can do test runs. Notice the change in node name after you issue the sinteractive command. This confirms that you are now on one of the compute nodes, i.e. nid00xx.

muarif092@raad2a:~> sinteractive
muarif092@nid00444:~> 

Step 02: Once you have an allocation on a compute node, you can activate the Python virtual environment which you created earlier.

muarif092@raad2a:~> sinteractive 
muarif092@nid00388:~> conda activate DLProj
(DLProj) muarif092@nid00388:~> which python3
/lustre/home/muarif092/.conda/envs/DLProj/bin/python3

(DLProj) muarif092@nid00388:~> python3
Python 3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 23:51:54) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()

(DLProj) muarif092@nid00388:~> python3 mycode.py
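If you want your test script to warn you when it is started on a login node by mistake, it can inspect the hostname. This is a rough sketch that assumes the naming seen in the prompts on this page (compute nodes start with "nid", login nodes with "raad2"). node_kind is a hypothetical helper, and the prefixes may need adjusting.

```python
import socket


def node_kind(hostname=None):
    """Classify a hostname as 'compute', 'login' or 'unknown' by its prefix."""
    if hostname is None:
        hostname = socket.gethostname()
    short = hostname.split(".")[0]  # drop any domain suffix
    if short.startswith("nid"):
        return "compute"
    if short.startswith("raad2"):
        return "login"
    return "unknown"


if __name__ == "__main__":
    kind = node_kind()
    print("Running on", socket.gethostname(), "(" + kind + " node)")
    if kind == "login":
        print("Warning: test runs should use a compute node (see sinteractive).")
```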

Step 03: Once you are done testing your code and ready for production runs, you can exit from the current session.

(DLProj) muarif092@nid00388:~> exit
exit
Connection to raad2-login2 closed.
muarif092@raad2a:~> 

Submit a batch job for longer runs

A batch job allows you to package your application and define all its requirements in a file called a "job file". You then submit this job file to the system, which runs your simulation. The results of the simulation are written either to the file(s) you create in your source code or to the Slurm output/error files generated in the directory from which you submitted your job file.

Step 01: Create a directory which will have all the input files.

muarif092@raad2a:~> mkdir myDlRun
muarif092@raad2a:~> cd myDlRun/

Step 02: Create a slurm.job file with the following contents:

#!/bin/bash
#SBATCH -J Python_job
#SBATCH -p s_debug
#SBATCH --qos sd
#SBATCH --time=08:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --output=slurm.o%j
#SBATCH --error=slurm.e%j
#SBATCH --hint=nomultithread
#SBATCH --gres=craynetwork:0 # Do not remove this line if you are submitting to s_* queues.

echo "Starting at "`date`
echo "SLURM_JOBID="$SLURM_JOBID
echo "SLURM_JOB_NODELIST="$SLURM_JOB_NODELIST
echo "SLURM_NNODES="$SLURM_NNODES
echo "SLURMTMPDIR="$SLURMTMPDIR
echo "working directory = "$SLURM_SUBMIT_DIR

# Source conda.sh to make conda executables available in the environment
# (same path as the line added to ~/.bashrc when setting up Anaconda)
source /lustre/sw/xc40/cle7/anaconda/anaconda3/etc/profile.d/conda.sh

# Activate the Conda environment which you created and tested earlier
conda activate DLProj

# Launch computation
srun --ntasks=1 python3 DLcode.py

echo "Simulation Ending.."
echo "Ending at "`date`
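The Python script launched by the job file can read the same Slurm variables that the job file echoes. The following is a minimal sketch of how a script such as DLcode.py might begin. It assumes Slurm exports SLURM_JOB_ID, SLURM_CPUS_PER_TASK and SLURM_SUBMIT_DIR to the job, and falls back to defaults when run outside a job. slurm_context is a hypothetical helper.

```python
import os


def slurm_context(environ=None):
    """Collect basic job information from Slurm environment variables."""
    if environ is None:
        environ = os.environ
    return {
        "job_id": environ.get("SLURM_JOB_ID", "interactive"),
        "cpus": int(environ.get("SLURM_CPUS_PER_TASK", "1")),
        "submit_dir": environ.get("SLURM_SUBMIT_DIR", os.getcwd()),
    }


def main():
    ctx = slurm_context()
    # flush=True pushes the line to the Slurm output file immediately,
    # instead of waiting for Python's output buffer to fill.
    print("Job {job_id}: {cpus} CPU(s), submitted from {submit_dir}".format(**ctx),
          flush=True)
    # ... your computation goes here ...


if __name__ == "__main__":
    main()
```

Flushing progress messages this way makes the output file more useful while the job is still running.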

Step 03: Submit the slurm.job file to raad2

muarif092@raad2a:~/myDlRun> sbatch slurm.job 

Step 04: Monitor your job

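You can monitor your job by inspecting the output and error files (slurm.o<jobid> and slurm.e<jobid>, as set by --output and --error in the job file), which are created in the directory from which you submitted the job.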
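If you submit many jobs from the same directory, a small helper can pick out the most recent output file and show its last lines. This is a minimal sketch that assumes the slurm.o%j naming used in the job file above. newest_output and tail are hypothetical helper names.

```python
import glob
import os


def newest_output(directory=".", pattern="slurm.o*"):
    """Return the most recently modified file matching pattern, or None."""
    paths = glob.glob(os.path.join(directory, pattern))
    return max(paths, key=os.path.getmtime) if paths else None


def tail(path, lines=10):
    """Return the last `lines` lines of a text file as a list of strings."""
    with open(path, "r", errors="replace") as handle:
        return handle.readlines()[-lines:]


if __name__ == "__main__":
    latest = newest_output()
    if latest is None:
        print("No slurm.o* files found in the current directory.")
    else:
        print("Last lines of", latest)
        print("".join(tail(latest)), end="")
```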