Building an open source high-performance data analytics framework

Supun Kamburugamuve from Indiana University

May 13th, 2019, 3:10 PM (duration: 40 minutes)

Location: Emporium Logan Square Arcade Bar

Big data computing and high-performance computing (HPC) have evolved over the years as separate paradigms. With the explosion of data and the demand for machine learning algorithms, these two paradigms are increasingly embracing each other for data management and algorithms. For example, public clouds such as Microsoft Azure are adding high-performance compute instances with InfiniBand, and large-scale deployments of GPUs in HPC clusters enable artificial intelligence algorithms on large data sets. In the future, we can expect more and more applications to explore the benefits of HPC while taking advantage of big data systems. This presentation introduces a high-performance open source data analytics platform developed at Indiana University, its benefits compared to existing solutions such as Spark and Flink, and how we are using the Apache way of software development to create a global community. The project is called Twister2 and source code can be found at

A Case Study (What We Did) presented as Regular Talk in the Startups track.