JupyterHub with K8s: Shared /home volume?

Hello Jovyans,

I’m trying to figure out a way to have /home exist as a shared volume mount, with users’ home directories, i.e. /home/$USER, all found on this mount.

Here’s my motivation. I want to:

  1. use fewer volumes (provided by the underlying cloud), as all user data will exist on a single volume as opposed to one volume per user
  2. simplify volume backups/snapshots; 1 vs many
  3. allow JHub admins (in my case, classroom instructors) to more easily access student files for grading assignments, while students cannot peek at each other’s work

Roughly, I see a path forward to get me most of the way there:

  1. Export an NFS volume from a dedicated NFS server Pod
  2. Use extraVolumes and extraVolumeMounts to mount this shared volume at /home for each single-user server
  3. Use hub.extraConfig to subclass KubeSpawner and:
    • define a function that returns some necessary environment variables as a dictionary*
    • set KubeSpawner.environment to what’s returned by this function
  4. Start the single user container as root

* The necessary environment variables are:

  • NB_USER = <jhub-username>
  • NB_GROUP = "users" or NB_GROUP = "admin" (for JHub administrators)
  • NB_UID = <some unique uid>
  • NB_GID = "100" (users) or NB_GID = "200" (admin)

The last two steps are so that the user-switching section of start.sh, provided as part of jupyter-docker-stacks, runs appropriately and starts the JupyterLab server as the right user, UID, group, and GID, with a home directory at /home/$NB_USER.
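
Roughly, the values.yaml I have in mind looks something like this. It’s only a sketch: the PVC name (nfs-home-pvc) and the lookup_uid() helper are placeholders for whatever ends up providing the UIDs, which is exactly the part I haven’t figured out, and instead of setting KubeSpawner.environment directly it overrides get_env(), which amounts to the same thing:

singleuser:
  uid: 0                          # start the container as root so start.sh can switch users
  storage:
    type: none                    # skip the default per-user PVC
  extraVolumes:
    - name: shared-home
      persistentVolumeClaim:
        claimName: nfs-home-pvc   # placeholder: the PVC backed by the NFS export
  extraVolumeMounts:
    - name: shared-home
      mountPath: /home

hub:
  extraConfig:
    shared-home-env: |
      from kubespawner import KubeSpawner

      def lookup_uid(username):
          # placeholder: return a stable, unique UID for this user
          raise NotImplementedError

      class SharedHomeSpawner(KubeSpawner):
          def get_env(self):
              env = super().get_env()
              name = self.user.name
              env["NB_USER"] = name
              env["NB_UID"] = str(lookup_uid(name))
              env["NB_GROUP"] = "admin" if self.user.admin else "users"
              env["NB_GID"] = "200" if self.user.admin else "100"
              env["CHOWN_HOME"] = "yes"   # have start.sh fix ownership of the home directory
              return env

      c.JupyterHub.spawner_class = SharedHomeSpawner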

I’m sure there’s some unintentional hand-waving in the steps I’ve described.


The part that I’m having trouble figuring out is how to give users <some unique uid>, which will allow us to set home directory ownership to <uid>:200 (i.e. <user>:admin) and permissions to 770 (rwx for <user> and admins, inaccessible to everybody else).

My gut tells me that I should store user UIDs in the JupyterHub database (specifically in each user’s Spawner state). My function that returns KubeSpawner.environment would then have to either read this value from the database or, if it doesn’t exist, create the next available UID. I don’t know how to do this!

After reading through the docs and some of the source for JupyterHub and KubeSpawner, I’ve decided I should reach out for help, since I’m having trouble understanding how data gets to/from the database and the spawner instances.


To be explicit in what my questions are:

  1. First of all, based on my motivations, is having a shared /home directory an appropriate solution?
  2. Is this an appropriate implementation?
  3. If yes to the above, how can I interact with the JHub database to create/read unique user UIDs?

Thanks!

ana v e

Which authenticator are you using, and which is the identity provider? Ideally this sort of user-related information should come from the IdP. I am assuming your IdP does not store/provide a uid for the users. In this case, creating the uid in the authenticator and passing it to the spawner via auth_state can be a solution. You can save the created uid info in a simple text file or a SQLite DB in the post_auth_hook of the Authenticator.
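
Something like this, roughly (completely untested; the SQLite path and the 2000 starting UID are just examples, and auth_state needs JUPYTERHUB_CRYPT_KEY to be configured):

hub:
  extraConfig:
    uid-allocation: |
      import sqlite3

      DB = "/srv/jupyterhub/uids.sqlite"   # example path on the hub's persistent volume

      def allocate_uid(username):
          # return a stable UID for this username, creating the next free one if needed
          con = sqlite3.connect(DB)
          con.execute(
              "CREATE TABLE IF NOT EXISTS uids (name TEXT PRIMARY KEY, uid INTEGER)"
          )
          row = con.execute("SELECT uid FROM uids WHERE name=?", (username,)).fetchone()
          if row is None:
              max_uid = con.execute("SELECT COALESCE(MAX(uid), 1999) FROM uids").fetchone()[0]
              uid = max_uid + 1
              con.execute("INSERT INTO uids VALUES (?, ?)", (username, uid))
              con.commit()
          else:
              uid = row[0]
          con.close()
          return uid

      def post_auth_hook(authenticator, handler, authentication):
          # stash the uid in auth_state so the spawner can read it later
          auth_state = authentication.get("auth_state") or {}
          auth_state["uid"] = allocate_uid(authentication["name"])
          authentication["auth_state"] = auth_state
          return authentication

      def auth_state_hook(spawner, auth_state):
          # pass the uid on to the single-user container
          if auth_state:
              spawner.environment["NB_UID"] = str(auth_state["uid"])

      c.Authenticator.enable_auth_state = True
      c.Authenticator.post_auth_hook = post_auth_hook
      c.KubeSpawner.auth_state_hook = auth_state_hook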

2 Likes

Mahendra,

Thanks for the response!

We’re using GitHub OAuth.

Indeed, your suggestion seems simpler than my approach. I’ll be looking further into this and will share out when I have something of substance.

Best,

ave

1 Like

Hi Ava,

I do something similar to what I think you are asking. First I create a PV, then a PVC in the namespace. This is my PVC YAML:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fcba-su25-pvc
spec:
  storageClassName: fcba-su25-sc
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
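
and, for reference, the PV it binds to looks roughly like this (the name, server address, and export path are placeholders, not my actual values):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fcba-su25-pv
spec:
  storageClassName: fcba-su25-sc
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs.example.internal   # placeholder NFS server address
    path: /export/fcba-su25        # placeholder export path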

Then in the jupyterhub config I have:

  storage:
    type: static
    static:
      pvcName: 'fcba-su25-pvc'
      subPath: '{username}'

All users’ files are under one folder in my NFS-mounted storage on the cluster. For example:

/home/fcba-su25/user1
/home/fcba-su25/user2
/home/fcba-su25/user3

The folders are automatically created when users log in. I have used this method with GitHub auth and Google OAuth2.

This may or may not be the correct way to do it, but it works well for me.

Tony

3 Likes

Hey Tony,

Thanks for the feedback. This seems like a handy method and I’ll give it a try when I have a chance. I assume, however, that each of these directories is owned by the jovyan user, as is the nature of the JupyterLab containers. Would this not give users rwx permissions on each other’s files? The motivation behind giving users their own UID is to prevent this.

Best,

ana v e

Hi Ana,

When I started using this method, I had Yuvi (a Zero to JupyterHub developer) look at it and he did not see any security issues. Check this out if I open a terminal:

jovyan@jupyter-montereytony-40gmail-2ecom:~$ pwd
/home/jovyan
jovyan@jupyter-montereytony-40gmail-2ecom:~$ cd ..
jovyan@jupyter-montereytony-40gmail-2ecom:/home$ ls
jovyan
jovyan@jupyter-montereytony-40gmail-2ecom:/home$ cd /
jovyan@jupyter-montereytony-40gmail-2ecom:/$ ls
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var
jovyan@jupyter-montereytony-40gmail-2ecom:/$ cd
jovyan@jupyter-montereytony-40gmail-2ecom:~$ pwd
/home/jovyan
jovyan@jupyter-montereytony-40gmail-2ecom:~$

I do not see anyone else’s folder.

Tony

1 Like

Tony,

Excellent! I think I understand what is going on here now. Only the subPath of the PVC is being mounted, not the full volume (with every user’s data) itself?

Thanks again for the input, I’ll be looking more into this to make sure I see the full picture.

ana v e

1 Like

Hey Ana,

I am not sure of the terminology, best to wait for experts to chime in. But for sure, the users can only see their own data.

Tony

1 Like

You are right. We use the same mechanism to manage submissions: The lecturer receives the “full volume” (i.e., a directory with sub-directories for each student), and the students receive the volume with a subPath (i.e., their username).
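
In values.yaml terms: the students get the storage.static block with subPath: '{username}' that Tony showed, and the lecturer’s server additionally gets the same claim mounted without a subPath, something like this sketch (the claim name and mount path are placeholders, and in our case the extra mount is attached only to the lecturer’s spawner):

singleuser:
  extraVolumes:
    - name: all-homes
      persistentVolumeClaim:
        claimName: course-home-pvc    # the same claim that backs the per-user subPath mounts
  extraVolumeMounts:
    - name: all-homes
      mountPath: /home/all-students   # the lecturer sees every student directory under here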

1 Like

Thanks for the help everybody!

I’ve successfully been able to get a similar setup working in our cluster. Next week I’ll reply with a more substantial description for others to have a look at and replicate if they come across this thread. In short, I’ve:

  • Created a cloud-backed PVC that acts as the shared storage
  • Created an NFS server Deployment and Service; I followed one of the archived Kubernetes examples, adapted to my needs
  • Created a PV of NFS type (rough sketch after this list)
  • In the same namespace as my JupyterHub, created a PVC that binds to the above PV
  • Edited my values.yaml to have every pod mount a subPath of the above PVC
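
As a preview, the NFS-type PV and the PVC that binds to it look roughly like this (names, size, and namespace are placeholders; the server field points at the NFS Service):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-home-nfs
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    # some clusters cannot resolve Service DNS names for NFS mounts,
    # in which case the Service's ClusterIP goes here instead
    server: nfs-server.jhub.svc.cluster.local
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-home-nfs
  namespace: jhub
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""        # bind to the pre-created PV rather than a dynamic provisioner
  volumeName: shared-home-nfs
  resources:
    requests:
      storage: 100Gi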

There are still a few things I’ve yet to figure out:

  1. Tony’s example shows a storage request of 10Gi. Does this correspond to an enforced storage quota? I suppose this is easy enough to test myself!

  2. How does one give instructor/admin-only access to the full volume? In principle, it’s as easy as mounting it as an extraVolume without the subPath. Is this accomplished by modifying KubeSpawner to configure the extra volume (and volumeMount) when the authenticated user is an admin? Or is this yet another case of me over-complicating things?

Best,

ana v e

NFS quotas on Kubernetes depend on the NFS implementation; many implementations ignore the requested size and give you unlimited storage.

Adding a conditional extra volume for instructors should work. group_overrides might work; if not, you should be able to add the volume using modify_pod_hook.
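
A minimal sketch of the modify_pod_hook approach (untested; the claim name and mount path are placeholders, and older kubespawner releases import the models from kubernetes.client instead of kubernetes_asyncio.client):

hub:
  extraConfig:
    admin-full-home: |
      from kubernetes_asyncio.client.models import (
          V1PersistentVolumeClaimVolumeSource,
          V1Volume,
          V1VolumeMount,
      )

      def add_admin_volume(spawner, pod):
          # mount the whole shared claim (no subPath) for admin users only
          if spawner.user.admin:
              pod.spec.volumes = (pod.spec.volumes or []) + [
                  V1Volume(
                      name="all-homes",
                      persistent_volume_claim=V1PersistentVolumeClaimVolumeSource(
                          claim_name="shared-home-nfs"   # placeholder claim name
                      ),
                  )
              ]
              pod.spec.containers[0].volume_mounts.append(
                  V1VolumeMount(name="all-homes", mount_path="/mnt/all-homes")
              )
          return pod

      c.KubeSpawner.modify_pod_hook = add_admin_volume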

1 Like