Real-world Big Data workloads in the cloud
There's a flood of open data out there from organizations and governments large and small. With such easy access to it, solving common Big Data problems seems simpler than ever before. Got a problem with traffic, weather, or money? Analyze the right datasets, and you just might learn that 2pm on a sunny Tuesday is the best time to drive to the bank. Thanks, Big Data!
The rub with trying to solve these problems is deploying and configuring all the services that need to work together to get to an answer. Wouldn't it be great if there were an easy way to model a Big Data platform (complete with ingestion, processing, and visualization components), stand it up in a cloud, and get down to business? "Yes" is the right answer, and fortunately, Juju does just that.
In this talk, we'll cover some of the Big Data services available in the Juju ecosystem (Hadoop, Spark, Kafka, Zeppelin, etc.) and then discuss how these can be bundled together as a platform for grinding on Big Data problems. We'll then demo some of the real-world problem-solving bundles that we've found most popular, including real-time log analytics and finding meaning in financial market data. We may even tell you which is better: Emacs or vi. (Just kidding, you don't need Hadoop to know it's vi.)
This talk is intended for an intermediate audience and will best serve people who have a basic understanding of the common components in a Big Data platform. If I say "NameNode", you should not need to google that. Ideally, you've experienced the pain of standing up Big Data firsthand, or at least heard of someone who changed careers because configuring their Big Data cluster was that awful.