Netflix and Twitter are teaming up to host this data engineering Meetup, focused on how our companies solve the challenge of transforming and logging data at scale and in real time. Join us to network, enjoy some food and drinks, and share the triumphs and challenges of what we've been working on.
Agenda:
6:00 - 7:00 Registration, Happy Hour & Networking
7:00 - 8:30 Presentations (details below)
8:30 - 9:00 Q&A, Networking, Wrap-up
Topic: You Are What You Log - Attacking Big Data at the Source
One area of modern data pipelines that has received little attention is the logger itself. By providing a richer logging API, however, we can enable source applications to do much of the heavy lifting for us, eliminating the need for expensive downstream joins and aggregations.
Topic: TSAR (the TimeSeries AggregatoR) - How to Count Tens of Billions of Daily Events in Real Time Using Open Source Technologies
Twitter’s 300+ million users generate tens of billions of tweet views per day. In this talk I’ll introduce TSAR (the TimeSeries AggregatoR), a robust, flexible, and scalable service for real-time event aggregation designed to solve this problem and a range of similar ones. I’ll discuss how we built TSAR using Python and Scala from the ground up, almost entirely on open-source technologies (Storm, Summingbird, Kafka, Aurora, and others), and describe some of the challenges we faced in scaling it to process tens of billions of events per day.
Topic: Big Data Processing @Scale
Take a deep dive into the internals of the Netflix Big Data Platform and discover what it takes to process petabytes of data every day in the cloud. Learn how we deploy Spark and Presto for everything from ETL to ML, and how we leverage Parquet for advanced storage and processing.
