
Jupyter Lab/Notebook

Introduction

Background

Project Jupyter is an open-source software stack that supports interactive data science and scientific computing across a wide array of programming languages (more than 130 supported kernels). The primary applications within Jupyter are:

  1. JupyterHub: Jupyter’s multi-user server. This application spawns, manages, and proxies multiple instances of the single-user JupyterLab server.

  2. JupyterLab: Jupyter’s next-generation notebook interface, which includes Jupyter notebooks, a text editor, a terminal, a file browser (with upload/download capability), a data viewer, Markdown rendering, contextual help, and support for external extensions.

[Screenshot of the JupyterLab interface]

Why Jupyter

Jupyter is popular among data scientists and researchers (Perkel, 2018).

For more details about Jupyter and why you may want to use it for computational research, see Why Jupyter.


Launching JupyterLab

There are multiple approaches for accessing the Jupyter stack on Ceres.

The simplest way to launch JupyterLab is through the JupyterHub interface. To access it, you will need working SCINet credentials. To set up a SCINet account, see the quickstart guide. Below are the instructions for JupyterHub.

  1. Go To: https://jupyterhub.scinet.usda.gov/
  2. Log into JupyterHub (SCINet credentials)
    • Username: SCINet username
    • Verification Code: 6-digit, time-sensitive code
    • Password: SCINet password
  3. Spawning a JupyterLab Instance

    The Spawning page includes a comprehensive set of options for customizing JupyterLab and the compute environment. There are two ways to spawn JupyterLab: with the standard environment (default) or with a user-defined container (optional).

    Standard Options

    • Node Type (Required): Which partition (see Ceres partitions) to spawn JupyterLab on.
    • Number of Cores (Required): How many cores to allocate (must be an even number).
    • Job Duration (Required): How long Slurm (the Ceres resource allocation software) should reserve resources for this task.
    • Slurm Sbatch Args (Optional): Additional options for Slurm (see sbatch options). An example may be --mem-per-cpu=6GB.
    • Working Directory (Optional): The directory in which to launch JupyterLab; defaults to your $HOME directory. An example may be /lustre/project/name_of_project.

    Container Options

    • Full Path to the Container (Optional): If you wish to launch JupyterLab with a container, specify the Ceres path or Hub URL to the container.
    • Container Exec Args (Optional): Additional options for executing the container (see the singularity exec options). An example may be --bind /lustre/project/name_of_project.
  4. Terminating JupyterLab

    To end the JupyterLab instance, go to: File -> Hub Control Panel -> Stop Server

Below is a video (COMING SOON) showing the above process.
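
Once JupyterLab has spawned, you can sanity-check the resources you requested on the Spawner page. Below is a minimal sketch (not part of the official instructions) that reads the standard environment variables Slurm sets for a job; it assumes the JupyterLab server is running inside a Slurm allocation.

```python
# Minimal sketch: confirm the Slurm allocation backing this JupyterLab session.
# Assumes the server was spawned inside a Slurm job, so these standard Slurm
# environment variables are set.
import os

for var in ("SLURM_JOB_ID", "SLURM_JOB_PARTITION",
            "SLURM_CPUS_ON_NODE", "SLURM_MEM_PER_CPU"):
    print(f"{var} = {os.environ.get(var, '<not set>')}")
```

If the values do not match what you selected, terminate the server (see step 4 above) and respawn with the desired options.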


Environments and Software

Default Environment

The default environment includes:

Bring Your Own Environment

If you have an environment (e.g. a conda environment) in your $HOME directory (e.g. ~/.conda/envs/my_env) with a Jupyter kernel installed (e.g. IPyKernel, IRKernel, IJulia, idl_kernel, etc.), JupyterLab will detect this environment as a separate kernel (assuming it is not the base environment). For instance, a conda environment named my_env with the IPyKernel will appear as Python [conda env:my_env] in the list of available kernels in JupyterLab. The one exception to this is the base environment, which already exists in the default Jupyter environment and will not be loaded from your home directory.
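
As a quick check, you can list the kernels the Jupyter server discovers. The sketch below uses jupyter_client's KernelSpecManager; whether your home-directory conda environments appear depends on how the default environment performs kernel discovery (e.g. via nb_conda_kernels), which is an assumption here. The my_env name is just the example from above.

```python
# Minimal sketch: list the kernel specs visible to the Jupyter server.
# Display names such as "Python [conda env:my_env]" indicate that a
# home-directory conda environment with ipykernel installed was detected.
from jupyter_client.kernelspec import KernelSpecManager

for name, info in KernelSpecManager().get_all_specs().items():
    print(f"{name}: {info['spec']['display_name']}")
```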

Use Ceres Maintained Software

The default environment includes an extension (located in the left sidebar of JupyterLab) for loading Ceres-maintained software into the current environment. This is the software visible with the module avail command.
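
After loading a module through the extension, you can verify from a notebook that the software is visible to your kernel. This is a minimal sketch, assuming the module system exports the conventional LOADEDMODULES variable and prepends the module's binaries to PATH; samtools is only a hypothetical module-provided command, not necessarily available on Ceres.

```python
# Minimal sketch: check what the module system has loaded into this session.
# LOADEDMODULES is the conventional variable set by Environment Modules/Lmod.
import os
import shutil

print("Loaded modules:", os.environ.get("LOADEDMODULES", "<none>"))
# 'samtools' is a hypothetical example; substitute a command provided by the
# module you actually loaded.
print("samtools found at:", shutil.which("samtools"))
```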

Containerized Environment

JupyterHub can spawn an instance of JupyterLab using a Singularity container (see the container options above). The selected container must have JupyterLab installed. Users can specify a container in the Container Path section of the Spawner Options page. There are several ways to access containers on Ceres:


Best Practices

Resource Conservation

Reproducible Research

A detailed tutorial about conducting reproducible research can be found at: Coming Soon!

Tutorials and Packages for Parallel Computing

Developing code and scripts that utilize the resources of a cluster can be challenging. Below are some software packages that may assist in parallelizing computations, as well as links to some Ceres-specific examples.

  1. Python - Dask, ipyparallel, Ray, joblib
  2. R - rslurm, parallel, doParallel, snow
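
As a starting point, here is a minimal sketch using joblib (one of the Python packages listed above, assumed to be installed in your environment) to spread an embarrassingly parallel loop across the cores allocated to the JupyterLab job. Sizing the worker pool from SLURM_CPUS_ON_NODE is an assumption about the spawned environment, and the square function is purely illustrative.

```python
# Minimal sketch: parallelize an embarrassingly parallel loop with joblib.
import os
from joblib import Parallel, delayed

# Assume the JupyterLab server runs inside a Slurm job; fall back to 2 workers.
n_workers = int(os.environ.get("SLURM_CPUS_ON_NODE", "2"))

def square(x):
    # Illustrative stand-in for a real per-item computation.
    return x * x

results = Parallel(n_jobs=n_workers)(delayed(square)(i) for i in range(1000))
print(f"Computed {len(results)} results on {n_workers} workers; sum = {sum(results)}")
```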

Known Issues