Hello Jovyans,
I’m trying to figure out a way to have /home exist as a shared volume mount, and have users’ home directories, i.e. /home/$USER, all live on this mount.
Here’s my motivation. I want to:
- use fewer volumes (provided by the underlying cloud), as all user data will exist on a single volume as opposed to one volume per user
- simplify volume backups/snapshots; 1 vs many
- allow JHub admins (in my case, classroom instructors) to more easily access student files for grading of assignments, while students cannot peek at each other’s work
Roughly, I see a path forward to get me most of the way there:
- Export an NFS volume from a dedicated NFS server Pod
- Use the extraVolumeMounts and extraVolumes to mount this shared volume at /home for each single user server
- Use the hub.extraConfig to subclass kubespawner and:
  - define a function that returns some necessary environment variables as a dictionary*
  - set KubeSpawner.environment to what’s returned by this function
- Start the single user container as root
* The necessary environment variables are:
- NB_USER = <jhub-username>
- NB_GROUP = "users" or NB_GROUP = "admin" (for JHub administrators)
- NB_UID = <some unique uid>
- NB_GID = "100" (users) or NB_GID = "200" (admin)
The last two steps are so that the section of start.sh (provided as part of jupyter-docker-stacks) that sets up the user will run appropriately and start the jupyterlab server as the appropriate user, uid, group, and gid, with the home directory in /home/$NB_USER.
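To make the environment concrete, here is a minimal sketch of the function I have in mind. The name build_user_env and the GROUPS table are hypothetical. The group names and gids are the ones listed above, and the uid is passed in as an argument because how to obtain it is still the open question.

```python
# Hypothetical group table: names and gids follow the values listed above.
GROUPS = {"users": "100", "admin": "200"}


def build_user_env(username, uid, is_admin=False):
    """Return the environment variables that start.sh reads for one user.

    All values are strings because they end up in the container environment.
    """
    group = "admin" if is_admin else "users"
    return {
        "NB_USER": username,
        "NB_UID": str(uid),
        "NB_GROUP": group,
        "NB_GID": GROUPS[group],
    }


# Example: a regular student and an instructor.
student_env = build_user_env("alice", 1001)
instructor_env = build_user_env("bob", 1002, is_admin=True)
```

I assume the returned dictionary could then be merged into the spawner’s environment from the KubeSpawner subclass in hub.extraConfig, but I have not verified that wiring.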
I’m sure there’s some unintentional hand-waving in the steps I’ve described.
The part that I’m having trouble figuring out is how to give users <some unique uid>, which would then let us set each home directory’s ownership to <uid>:200 (i.e. <user>:admin) and its permissions to 770 (rwx for <user> and admins, inaccessible to everybody else).
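The ownership and permission step itself is simple once a uid exists. A minimal sketch, assuming it runs somewhere with root access to the shared volume (the helper name prepare_home is hypothetical, and 200 is the admin gid from above):

```python
import os
import stat

ADMIN_GID = 200  # gid of the "admin" group, as listed above


def prepare_home(path, uid, gid=ADMIN_GID):
    """Create the home directory if needed, then set <uid>:<gid> and mode 770.

    Returns the resulting permission bits so the caller can check them.
    """
    os.makedirs(path, exist_ok=True)
    # chown to an arbitrary uid/gid requires root; a non-root caller can only
    # use its own uid and a group it belongs to.
    os.chown(path, uid, gid)
    # 770: rwx for the owner and the group, nothing for everybody else.
    os.chmod(path, 0o770)
    return stat.S_IMODE(os.stat(path).st_mode)
```

With this layout, a student’s uid owns the directory, members of the admin group can read and write it for grading, and other students get no access at all.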
My gut tells me that I should store user uids in the JHub database (specifically in each user’s Spawner state). My function that returns the KubeSpawner.environment would then have to either read this value from the database or, if it doesn’t exist, create the next available uid. I don’t know how to do this!
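The allocation logic, separate from where the values are persisted, could look like the sketch below. Here a plain dictionary (uid_table) stands in for the persistent store, and the names get_or_create_uid and FIRST_UID are hypothetical. What I am missing is how to back this with the JHub database.

```python
FIRST_UID = 1000  # hypothetical starting point for allocated uids


def get_or_create_uid(username, uid_table, first_uid=FIRST_UID):
    """Return the stored uid for username, allocating the next free one if absent.

    uid_table maps usernames to uids and is modified in place, so it plays
    the role of the persistent store in this sketch.
    """
    if username in uid_table:
        return uid_table[username]
    # Next available uid: one past the highest uid handed out so far,
    # or first_uid if nothing has been allocated yet.
    next_uid = max(uid_table.values(), default=first_uid - 1) + 1
    uid_table[username] = next_uid
    return next_uid


# Example: two new users, then a returning one.
table = {}
alice_uid = get_or_create_uid("alice", table)
bob_uid = get_or_create_uid("bob", table)
alice_again = get_or_create_uid("alice", table)
```

One caveat with this approach: "next available" needs a view across all users, so per-spawner state alone may not be enough, and concurrent spawns would need some locking that this sketch does not attempt.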
After reading through the docs and some of the source for JupyterHub and Kubespawner, I’ve decided to reach out for help, since I’m having trouble understanding how data gets to and from the database and the spawner instances.
To be explicit in what my questions are:
- First of all, based on my motivations, is having a shared /home directory an appropriate solution?
- Is this an appropriate implementation?
- If yes to the above, how can I interact with the JHub database to create/read unique user uids?
Thanks!
ana v e