Building a self-service data pipeline with Apache Spark
At ZipRecruiter, we are building our next-generation in-house streaming data platform, a self-service system that lets our 10-person data services team support 20 distinct dev teams.
I’ll share the architecture we designed, along with the trade-offs we considered and the choices we made.
Building a data pipeline for stats and analysis is a big job. We have a cornucopia of open-source tools to choose from and many decisions to make regarding:
- tools
- orchestration
- storage formats
- streaming compute
- SQL integration
- data ingress, egress
- job vetting
- data integrity
Room:
Ballroom H
Time:
Saturday, March 10, 2018 - 15:00 to 16:00