Building a self-service data pipeline with Apache Spark

At ZipRecruiter, we are building our next-generation, in-house streaming data platform. Its goal is to let our 10-person data services team support 20 distinct dev teams through a self-service system.

I’ll share the architecture we designed, the trade-offs we considered, and the choices we’ve made.

Building a data pipeline for stats and analysis is a big job. We have a cornucopia of open-source tools to choose from, and many decisions to make regarding:

  • tools
  • orchestration
  • storage formats
  • streaming compute
  • SQL integration
  • data ingress and egress
  • job vetting
  • data integrity

 

Room:
Ballroom H
Time:
Saturday, March 10, 2018 - 15:00 to 16:00