Running batch workloads on K8s at scale
We live in the age of data: we generate and consume more of it than ever before, yet raw data is useless without context. We run ETL jobs to process it, HPC workloads to analyze it, and machine learning models to automate processes and make better decisions. Batch processing makes all of this possible; it is the backbone of data science. But we need to rethink how we approach batch processing in order to take full advantage of modern hardware, containers, and cloud infrastructure.
Kubernetes makes it easy to run Jobs and CronJobs, but that alone is not enough to run batch workloads at scale. We need to consider scalability, cost optimization, and performance, and we also need to think about the day-2 operations of managing and upgrading the platform.
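For context, the baseline the built-in API gives us is a short Job manifest like the sketch below (the name, image, and command are placeholders, not from the talk):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-batch-job            # placeholder name
spec:
  completions: 100                   # total successful pod completions required
  parallelism: 10                    # pods allowed to run concurrently
  backoffLimit: 3                    # retries before the Job is marked failed
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: registry.example.com/etl-worker:latest  # placeholder image
          command: ["python", "process.py"]              # placeholder command
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
```

Fields like `completions` and `parallelism` take a single Job part of the way, but queueing, quotas, and autoscaling across many such Jobs are where the scale concerns above begin.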
In this talk we will discuss how to run batch workloads on k8s at scale: which types of workloads are good candidates for k8s, how to design them to be easy to manage and scale, and common pitfalls to avoid.