What it takes to use machine learning in streaming data pipelines

Dean Wampler from Lightbend

May 14th, 2019, 3:10 PM (duration: 40 minutes)

Location: The Native

If you want to use ML/AI in streaming data pipelines, you face three challenges: 1) a gap between popular data science tools and methods (e.g., Python- and R-based) and typical production deployment tools (e.g., Java-based); 2) the need for data scientists to periodically deploy updated models into running streams without forcing restarts; and 3) higher demands for reliability, resiliency, dynamic scalability, etc., on production streaming data pipelines than on their batch counterparts. I'll explain these problems and discuss established and emerging open-source techniques and tools (Apache and beyond) to solve them.
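
To make the second challenge concrete, here is a minimal sketch of swapping a model into a running stream without a restart. It is an illustration only, not material from the talk: the names `ModelHolder`, `swap`, and `score_stream` are hypothetical, a "model" is assumed to be any plain function from a number to a score, and an in-memory list stands in for a real stream.

```python
import threading
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumption: a "model" is just a function from one record to a score.
Model = Callable[[float], float]


class ModelHolder:
    """Hypothetical holder for the model currently used by the stream."""

    def __init__(self, model: Model, version: int = 1) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def swap(self, model: Model, version: int) -> None:
        """Replace the model and its version together, under the lock."""
        with self._lock:
            self._model = model
            self._version = version

    def score(self, record: float) -> Tuple[int, float]:
        """Score one record and report which model version produced it."""
        with self._lock:
            model, version = self._model, self._version
        return version, model(record)


def score_stream(records: Iterable[float],
                 holder: ModelHolder) -> Iterator[Tuple[int, float]]:
    """Lazily score records, reading the current model for each one."""
    for record in records:
        yield holder.score(record)


holder = ModelHolder(lambda x: 2.0 * x, version=1)
results: List[Tuple[int, float]] = []
for i, scored in enumerate(score_stream([1.0, 2.0, 3.0, 4.0], holder)):
    results.append(scored)
    if i == 1:
        # Simulate a data scientist deploying an updated model mid-stream.
        holder.swap(lambda x: x + 10.0, version=2)

print(results)  # [(1, 2.0), (1, 4.0), (2, 13.0), (2, 14.0)]
```

The first two records are scored by version 1 and the last two by version 2, with no interruption to the loop. In a production pipeline the update would more plausibly arrive over a separate control stream or from a model-serving service, and the holder would also need to handle validation and rollback.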

A Technical (How To) session presented as a Regular Talk in the Made in Chicago track.