Jupyter
Jupyter is a web-based interactive development environment, widely used in ML and data science. It enables us to quickly experiment with code, without having to rerun entire scripts from the top when something breaks. It also allows us to visually examine and inspect the data that we use, as well as information about the training process. Read more about it here.
We will run Jupyter on the fep machines and forward the network traffic so that we can access it from a local browser (all the computation is still done on the fep partitions).
Installing Jupyter
Activate the conda environment you want to use, and install jupyter:
$ conda install jupyter
It might take a while, as it will bring with it a lot of dependencies.
Running Jupyter
Before running Jupyter, we need to determine the IP address at which the compute node is accessible from fep.
On the compute node, run:
$ ip a
Then look for the first bond interface; it might look like:
17: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6c:fe:54:48:3b:b8 brd ff:ff:ff:ff:ff:ff
inet 172.24.12.2/16 brd 172.24.255.255 scope global bond0
valid_lft forever preferred_lft forever
inet6 fe80::6efe:54ff:fe48:3bb8/64 scope link
valid_lft forever preferred_lft forever
The value we're interested in is right after inet (172.24.12.2 in this case).
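If you prefer not to read the address by eye, the lookup can be scripted. The following is a minimal sketch, assuming the interface name starts with bond as in the output above; first_bond_ip is a hypothetical helper name, not something provided on the cluster. It expects the one-line-per-address format produced by ip -o -4 addr show.

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and prints
# the IPv4 address (without the /prefix) of the first interface named bond*.
first_bond_ip() {
    awk '$2 ~ /^bond/ && $3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Usage on the compute node:
#   ip -o -4 addr show | first_bond_ip
```

If the node has no bond interface, the function prints nothing, in which case fall back to inspecting the ip a output manually.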
We need to run Jupyter on a compute node with access to GPUs: use srun to start an interactive shell there, and don't forget to activate the conda environment in which you installed Jupyter.
$ jupyter notebook --ip 0.0.0.0 --port <port>
Remember that there are other people using these machines at the same time, so we need to avoid conflicting ports. You can use the following bash function that generates a random, unused port (adapted from here):
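randport() {
    # Candidate ports minus the ports already in use, then pick one at random.
    # With -t and -u together, ss prints a leading Netid column,
    # so the local address:port is the fifth field.
    comm -23 <(seq 10000 65000) \
             <(ss -tuan | \
               awk '{print $5}' | \
               sed 's/.*://' | \
               grep -E '^[0-9]{1,5}$' | \
               sort -u) \
        | shuf | head -n 1
}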
So we can run:
$ jupyter notebook --ip 0.0.0.0 --port $(randport)
In the output generated, look for the lines that say something like:
[I 2024-10-22 11:04:27.600 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-10-22 11:04:27.600 ServerApp] http://dgxh100-precis-wn02.grid.pub.ro:17872/tree?token=3a538e99683e78c740acaa560ad185fa16d59001bed8e17b
[I 2024-10-22 11:04:27.601 ServerApp] http://127.0.0.1:17872/tree?token=3a538e99683e78c740acaa560ad185fa16d59001bed8e17b
The relevant information here is the actual port and the hexadecimal token.
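Both values can also be pulled out of the log with a small filter. This is only a sketch, assuming the URL lines look like the ones shown above (http://host:port/...?token=hex); jupyter_port_token is a hypothetical helper name.

```shell
# Hypothetical helper: reads Jupyter's startup log on stdin and prints
# "<port> <token>" taken from the first URL that carries a token.
jupyter_port_token() {
    sed -n 's|.*http://[^:/]*:\([0-9][0-9]*\)/.*[?&]token=\([0-9a-f][0-9a-f]*\).*|\1 \2|p' | head -n 1
}

# Usage (paste or pipe the relevant log lines):
#   jupyter_port_token < jupyter.log
```

Lines that contain no tokenized URL are ignored, so the whole log can be passed in unchanged.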
Accessing Jupyter
Now the jupyter notebook is running remotely on a compute node and we want to access it from our local browser.
We will use ssh to securely forward a local port to the Jupyter port on the compute node, using fep as a relay.
$ ssh -L 8080:<address>:<port> <fep>
ssh will then leave you logged in to fep, but we can ignore this shell for the following steps. Just keep it open: once the session is closed, the port forwarding stops.
In the ssh command above, <address> is the IP address of the bond interface of the compute node and <port> is the randomly generated port on which Jupyter is served.
<fep> represents your login: either <username>@fep.grid.pub.ro or the name of a ~/.ssh/config entry that you previously configured.
For the examples we provided in this tutorial, the concrete command would look like:
$ ssh -L 8080:172.24.12.2:17872 mihai.dumitru2201@fep.grid.pub.ro
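To avoid retyping the pieces, the command can be assembled from its three inputs. The sketch below only prints the command so that it can be checked before running; jupyter_tunnel_cmd is a hypothetical helper name, and the -N option (a standard ssh flag that skips running a remote command) is our addition for the case where you do not need the shell on fep.

```shell
# Hypothetical helper: prints the ssh port-forwarding command.
#   $1 = compute node address, $2 = Jupyter port, $3 = fep login,
#   $4 = local port (optional, defaults to 8080 as in this tutorial)
jupyter_tunnel_cmd() {
    address=$1
    port=$2
    login=$3
    local_port=${4:-8080}
    # -N: forward ports only, do not open a remote shell
    printf 'ssh -N -L %s:%s:%s %s\n' "$local_port" "$address" "$port" "$login"
}

# Usage:
#   jupyter_tunnel_cmd 172.24.12.2 17872 <username>@fep.grid.pub.ro
```

With -N the terminal appears to hang while the tunnel is up; press Ctrl-C to close it.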
Now go to your local browser and access:
http://localhost:8080/tree?token=<hexadecimal token>
In the example from this tutorial, the correct URL would be:
http://localhost:8080/tree?token=3a538e99683e78c740acaa560ad185fa16d59001bed8e17b
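The local URL is simply the remote one with the host and port swapped for localhost and the forwarded port. As a small sketch (local_jupyter_url is a hypothetical helper name, and it assumes an http:// URL like the ones Jupyter printed above):

```shell
# Hypothetical helper: rewrites the URL printed by Jupyter so that it points
# at the locally forwarded port.
#   $1 = URL from the Jupyter log, $2 = local port (optional, defaults to 8080)
local_jupyter_url() {
    remote_url=$1
    local_port=${2:-8080}
    printf '%s\n' "$remote_url" | sed "s|^http://[^/]*/|http://localhost:${local_port}/|"
}

# Usage:
#   local_jupyter_url 'http://<compute node>:<port>/tree?token=<hexadecimal token>'
```

The path and the token are left untouched, so the result can be pasted directly into the browser.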
In short, the effect of our ssh forwarding is that the Jupyter notebook is now available as if it were served locally on port 8080.
The token is there for authentication purposes; without it, you will not be able to access Jupyter.