Scalable Open Source Data Science on Bare Metal Kubernetes

Using DevOps and cloud-native practices, UCSB's College of Letters and Science hosts JupyterHub instances for over 50 courses and over 4600 persistent individual developer environments per year on bare metal Kubernetes.  This talk will highlight the methods used to accomplish this: automated Jenkins pipelines that keep all container images continuously updated and tested; provisioning bare metal Kubernetes hosts with immutable Linux via SUSE Rancher Elemental; using Ansible and Git to automate an open source load balancer stack in front of the clusters, so that environments are served reliably and securely over the public internet; optimizing storage provisioning for performance, cost, and scalability with Longhorn and Ceph; Git-based, CI-driven pipelines that manage the lifecycle of Jupyter Notebook and RStudio environments in line with SLAs, through a Helm-based dev-staging-production deployment workflow capable of automatic rollback; and operational logging and monitoring tools such as Prometheus, Grafana, Fluentd, and OpenSearch Dashboards, which provide insight into the health of deployments and help troubleshoot incidents.

Supporting containerized Python and R development environments is inherently challenging because dependencies break often.  This talk will demonstrate the value of using CI to identify potential library or container image build problems ahead of time, while giving stakeholders, such as faculty or researchers, an optional means of manually testing current and upcoming changes to their container-image-backed environments.  Because CI often surfaces problems quickly, it has also given us the opportunity to contribute back to the open source community by submitting issues and patches, so that other users and organizations can benefit as well.

The move to bare metal Kubernetes was a DevOps success story in reducing latency, cost, technical debt, and overall time-to-live.

Time:
Thursday, October 31, 2024 - 14:45