Cruising in data lake with Apache Spark: From zero to scale

Nikita Voronin, Sneha Chaphalkar from HERE Technologies

May 13th, 2019, 3:10 PM (duration: 40 minutes)

Location: The Native

As part of the Highly Automated Driving (HAD) group at HERE Technologies, we build a High-Definition Map (HDMap) of the real world to make autonomous driving possible. Given the complexity of the data enrichment pipelines and the petabyte scale of rich, unstructured content, we need a mechanism that avoids data silos and provides one centralized way to access, evaluate and analyze data across multiple systems. In this talk we will outline the principles and the technology behind our approach to building a data lake that addresses these challenges. We will provide guidelines for implementing and scaling up the data lake using Apache Spark in the cloud.

Presenters:

Nikita Voronin - Lead Architect @HERE. He specializes in cloud-based solutions for distributed data processing. Specifically, he designs and develops data persistence components with the goal of achieving logical data consistency, reliability and performance under load. He coordinated the initiative to build the heterogeneous data lake from the ground up.

Sneha Chaphalkar - Senior Engineering Manager @HERE. She leads the Content Management infrastructure teams that power the HDMap product. She has over 13 years of software engineering experience in building scalable, resilient microservices and distributed data architectures.

A Technical (How To) presented as a Regular Talk in the Made in Chicago track.