Wednesday 15:00 UTC
Democratizing ML Pipelines using StreamPipes: Flexible Model Integration and Serving for Industrial IoT Applications
Philipp Zehnder, Marco Heyden
Apache StreamPipes (incubating) is an industrial IoT toolbox which enables non-technical users to flexibly connect, analyze and exploit continuous data streams. Under the hood, StreamPipes integrates an event-driven microservice architecture with a rich graphical user interface that lets users create stream processing pipelines. In this talk, we focus on analyzing image data from industrial cameras with Machine Learning (ML) and Apache StreamPipes. Based on an application example for Visual Quality Inspection, techniques for integrating ML models into StreamPipes are presented and a demo shows how product defects can be easily recognized in real time using a no-code approach.
The talk gives an overview of Apache StreamPipes, highlights the positioning of StreamPipes within the ASF IoT ecosystem and demonstrates how machine learning models can be easily integrated and evaluated within a StreamPipes application.
Philipp Zehnder is a research scientist at the FZI Research Center for Information Technology. His current research interests are in the areas of Distributed Stream Processing and Streaming Machine Learning. He is very interested in open source software, especially in the field of IIoT, and is involved in the Apache StreamPipes (incubating) project.
Marco Heyden currently works at the FZI Research Center for Information Technology in Karlsruhe. His research interests include Federated Learning, Data Stream Processing and Unsupervised Learning. He has worked in several public-funded research projects related to Machine Learning and Data Stream Processing in industrial IoT.
StreamPipes’ New Kids on the Block: An Introduction to Edge Extensions, Client API and Data Explorer
Dominik Riemer, Patrick Wiener
Several community initiatives are working on major feature extensions to Apache StreamPipes which aim to ease the time-consuming task of analyzing Industrial IoT data in a self-service manner. In this talk, we focus on a feature tour of the three latest Apache StreamPipes features that are to be introduced in 2021: First, we introduce StreamPipes Edge Extensions, which allow users to flexibly select deployment targets of pipeline elements, enabling geographic distribution of pipeline elements over the edge-to-cloud continuum. Second, integration of StreamPipes with external applications is significantly improved with the Apache StreamPipes API & Client libraries, a new developer-oriented way to interact with StreamPipes concepts programmatically. Finally, we demonstrate the recently introduced data explorer which eases the interaction and visual exploration of persisted data streams in StreamPipes. The talk will be accompanied by several live demos that illustrate the practical outcome for Apache StreamPipes users.
Dominik Riemer is a senior researcher and division manager at the FZI Research Center for Information Technology. His main work is focused around stream processing and data management in IoT applications. He is co-initiator and PPMC member of Apache StreamPipes (incubating), a self-service industrial IoT toolbox.
Patrick Wiener currently works at the FZI Research Center for Information Technology in Karlsruhe. His research interests include Distributed Computing (Cloud, Edge/Fog Computing), IoT, and Stream Processing. Patrick is an expert for infrastructure management such as containers and container orchestration frameworks. He has worked in several public-funded research projects related to Big Data Management and Stream Processing in domains such as manufacturing, logistics and geographical information systems.
Cracking the Nut, Solving Edge AI with Apache Tools and Frameworks
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the edge before we start our real-time streaming flows. Fortunately, using the all-Apache Mm FLaNK stack, we can do this with ease! Streaming AI-powered analytics from the edge to the data center is now a simple use case. With MiNiFi we can ingest the data, run data checks and cleansing, run machine learning and deep learning models, and route our data in real time to Apache NiFi and Apache Kafka for further transformations and processing. Apache Flink will provide our advanced streaming capabilities, fed in real time via Apache Kafka topics. Apache MXNet models will run both at the edge and in our data centers via Apache NiFi and MiNiFi. Our final data will be stored in Apache Kudu via Apache NiFi for final SQL analytics. We add microservices in Kafka Streams.
Apache Flink, Apache Kafka, Apache NiFi, MiNiFi, Apache MXNet, Apache Kudu, Apache Impala, Apache HDFS
Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiNiFi, Kafka, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
Wednesday 18:00 UTC
Building and Scaling Robust Zero-code IoT Streaming Data Pipelines with Open Source Technologies
With the rapid onset of the global Covid-19 Pandemic in 2020 the USA Centers for Disease Control and Prevention (CDC) quickly implemented a new Covid-19 pipeline to collect testing data from all of the USA’s states and territories, and produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.
Inspired by this story, we built two demonstration streaming pipelines for ingesting, storing, and visualizing public IoT data (Tidal data from NOAA, the National Oceanic and Atmospheric Administration) using multiple open source technologies. The common ingestion technologies were Apache Kafka, Apache Kafka Connect, and Apache Camel Kafka Connector, supplemented with Prometheus and Grafana for monitoring. The initial experiment used Open Distro for Elasticsearch and Kibana as the target storage and visualisation technologies, while the second experiment used PostgreSQL and Apache Superset.
In this talk we introduce each technology and the pipeline architecture, and walk through the steps followed, challenges encountered, and solutions used to build reliable and scalable pipelines, and visualize the results (including Tidal periods, ranges and locations). We compare and contrast the two approaches, focussing on exception handling, scalability, performance and monitoring, and the pros and cons of the two visualization technologies (Kibana and Superset).
Since learning to program on a VAX 11/780, Paul has extensive R&D and consulting experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning.
Paul is the Technology Evangelist at Instaclustr. He’s been learning new scalable technologies, solving realistic problems, building applications, and blogging about an ever-increasing list of Open Source technologies.
Paul has worked at UNSW, several tech start-ups, CSIRO, UCL (UK), and NICTA. Paul has an MSc in Machine Learning and a BSc (Computer Science and Philosophy).
PMEM-backed Repositories for more high-performance Apache NiFi
Low latency and high throughput are desirable characteristics for dataflow systems such as Apache NiFi, which must process large-scale IoT data quickly. NiFi keeps track of all data flowing through the system by storing its content, attributes, and provenance on disk, which requires I/O and tends to become a performance bottleneck. Persistent memory (PMEM), a non-volatile, byte-addressable memory installed in DIMM slots, is a new alternative to disk and could deliver better performance. However, to benefit fully from PMEM, programs need to be changed to be PMEM-aware. In this talk, I’ll present my proposal to NiFi, which introduces PMEM-backed Content, FlowFile, and Provenance Repositories, based on PMEM-related technologies such as Filesystem DAX (Direct Access) and PMDK (Persistent Memory Development Kit). Note that it is currently a proposal for NiFi, but the approach could be applied to other data management systems such as Kafka. This talk covers how my proposal improves NiFi’s latency and throughput, how it modifies NiFi to use DAX and PMDK, the results of performance tests using Optane PMem in App Direct Mode, and one more proposal for making NiFi more PMEM-aware.
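To illustrate what "byte-addressable" access means in practice, here is a minimal sketch that stores fixed-size records through a memory mapping instead of explicit write calls. It is not NiFi code and does not use PMDK: an ordinary temporary file stands in for a file on a DAX-mounted filesystem, and `RECORD_SIZE`, `write_record`, `read_record` and `demo` are hypothetical names chosen for this example.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 64  # hypothetical fixed record size in bytes


def write_record(region: mmap.mmap, index: int, payload: bytes) -> None:
    """Store one record with a plain byte-range assignment into the mapping."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than record size")
    start = index * RECORD_SIZE
    # Pad to the full record so bytes of an older, longer record are overwritten.
    region[start:start + RECORD_SIZE] = payload.ljust(RECORD_SIZE, b"\0")


def read_record(region: mmap.mmap, index: int) -> bytes:
    """Read one record back and strip the zero padding."""
    start = index * RECORD_SIZE
    return region[start:start + RECORD_SIZE].rstrip(b"\0")


def demo() -> bytes:
    # Assumption: on real hardware this file would live on a DAX-mounted
    # filesystem; here a regular temporary file is used as a stand-in.
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, RECORD_SIZE * 16)  # reserve room for 16 records
        with mmap.mmap(fd, RECORD_SIZE * 16) as region:
            write_record(region, 3, b"flowfile-attributes")
            region.flush()  # ask the OS to make the mapped bytes durable
            return read_record(region, 3)
    finally:
        os.close(fd)
        os.remove(path)


print(demo())
```

On a regular file the mapping still goes through the page cache, so this sketch shows only the programming model, not the performance characteristics discussed in the talk.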
Takashi Menjo is a Researcher at NTT Software Innovation Center located in Tokyo, Japan. He works in the area of PMEM-aware applications, especially redesigning traditional disk-based software such as PostgreSQL or Apache NiFi.
Wednesday 19:40 UTC
Creating an open, vendor-independent Industry 4.0 solution with IndustryFusion
IndustryFusion is a vendor-independent, easy-to-adopt, multi-party open source IoT connectivity solution for smart factories and smart products. The main goal of the project is to provide simple mechanisms to share data among multiple stakeholders, enabling new business models like production sharing, machine leasing, and carbon neutrality monitoring. The IndustryFusion architecture has been designed from the ground up to run locally in the factory as well as in the cloud.
Konstantin Kernschmidt is passionate about the digital transformation of small and medium-sized enterprises. He has a broad experience in mechanical engineering, IT, and smart factory solutions. Konstantin is the Head of Research & Development / Industry 4.0 at MicroStep Europa GmbH and the technical lead of IndustryFusion. Prior to his current position, he was the General Manager of a cross-disciplinary research center focusing on innovation processes and new business models in the context of Industry 4.0. He holds a PhD in automation and information systems as well as a diploma in mechanical engineering and management from the Technical University of Munich (TUM).
At MicroStep Europa - a manufacturer of high-end CNC cutting systems - he accompanied the digital transformation of key business processes. In his new role within the IndustryFusion Team, besides the brand communication, he passionately takes care of the user experience & application design, streamlining processes and building a scalable solution.
Marcel Wagner is a Software Application Engineer in Intel's IoT Group. In this role, he works with customers on Open Source Edge-Cloud platforms and Cloud Native architectures, with a focus on Industrial IoT. He has contributed to open source projects like StarlingX, the OpenStack open source Edge-Cloud, and Open IoT Service Platform, an open source cloud platform which is based on Apache projects like Kafka, Beam, Flink, and Cassandra. Before joining Intel, Marcel did research at Siemens Corporate Technology and Nokia Networks on video transmission protocols and distributed applications. Marcel holds a Dr. rer. nat. from the University of Freiburg, Germany, and a master of science (Dipl. Inform.) from the Karlsruhe Institute of Technology.
A machine learning hub for IoT data using Apache MADlib
Apache MADlib adds comprehensive statistics and machine learning capabilities to Postgres. A Cambrian explosion of IoT data sources has meant massive data integration overhead long before you can get to ML. But what if the ML step itself was very low friction and done entirely inside the database? And what if it was possible to access raw data files from sensors, so that a lot of ETL was actually LTE - load, then transform, then extract? Combining Apache MADlib with the super powers of Postgres extensions allows such... well... "massive synergies".
We look at data from human sleep monitoring devices and heart rate monitors to explore and exercise the capabilities of this powerful combination and see how we can set alarms and thresholds to warn of dangerous situations. Predictive analytics and visualization are also made much simpler which we demonstrate using this health data.
Nitin Borwankar has played almost every role in the software development lifecycle over 25 years from QA Engineer, to App developer, Database Architect, Product Manager, Engineering Manager and Data Scientist. He is a founder and CTO of Numericc.
Thursday 15:00 UTC
IoTDB - Moving from Relational to Timeseries
I am developing a remote monitoring solution for IT infrastructure. The solution was originally based on an RDBMS but, through necessity, migrated to the IoTDB timeseries database. I would like to share my experiences of migrating to IoTDB.
Trevor is a seasoned geospatial professional. More recently he has developed a passion for timeseries databases!
Thursday 15:50 UTC
Better and better: new features of Apache IoTDB
In this talk, we will introduce the new features that IoTDB has gained over the past year, including the cluster module, UDFs, triggers, multi-variable time-series support, etc.
We will also share some practical solutions, such as how to implement hot backup.
Dr. Xiangdong Huang is a Research Assistant at the School of Software, Tsinghua University. He is the PMC Chair of Apache IoTDB.
Thursday 17:10 UTC
Unified fieldbus API with Apache PLC4X
The universe is full of standards which compete in multiple dimensions. The same is true of the manufacturing industry. The space between standards is filled by various software libraries which are intended to let both computers and microcontrollers interact with end hardware.
During this presentation we will have a look at the Open Systems Interconnection (OSI) model and see where typical fieldbus technology fits. We will briefly see the major differences in the usage of fieldbuses compared to traditional TCP- or UDP-based communications.
In the second part of this talk we will take a look at typical manufacturing solutions such as Profinet, CANopen and Ethernet/IP. We will learn the basics of one of these standards and see how to acquire data through the Apache PLC4X client API. Finally, we will compare how to interact with the remaining two standards using the APIs introduced earlier.
Łukasz has worked in the field of software integration for more than a decade. Since 2010 he has been involved in multiple projects involving middleware coming out of the Apache Software Foundation.
Starting in 2015, Łukasz became increasingly interested in the area of building automation, which eventually led him to contribute a BACnet binding to the openHAB project. Later he joined the cooperation under the Apache PLC4X project, which resulted in the implementation of a CANopen driver for the project.
So far Łukasz has worked with BACnet, Wireless M-Bus, Modbus as well as CAN and CANopen, and has experimented with Profinet and M-Bus.
Polyglot Apache PLC4X
Apache PLC4X had the "X" in its name from the start, as it was always intended to provide APIs for accessing PLCs and industrial hardware in multiple languages.
But creating drivers for "N" protocols in "M" languages is a maintenance nightmare.
In this session I will not only take you through the available APIs in Java, Go, C and possibly C#/.Net and Python, but also explain how we manage to do this via code-generation.
In the past 2 years we have created a framework and the necessary tooling for generating more than 90% of a driver's code from a machine- and human-readable format.
In the second part I'll take you on a journey through this toolset.
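The general idea of generating driver code from a declarative description can be sketched in a few lines. This is only an illustration under stated assumptions: the `SPEC` dictionary, its field names and the helper functions are hypothetical and are not the PLC4X description format or its generator.

```python
import struct

# Hypothetical message description: (field name, struct format code).
SPEC = {
    "name": "ReadRequest",
    "fields": [
        ("transaction_id", "H"),  # unsigned 16-bit
        ("unit_id", "B"),         # unsigned 8-bit
        ("address", "H"),
        ("quantity", "H"),
    ],
}


def generate_parser(spec: dict) -> str:
    """Return Python source code for a big-endian parser of the message."""
    fmt = ">" + "".join(code for _, code in spec["fields"])
    names = ", ".join(repr(name) for name, _ in spec["fields"])
    return (
        f"def parse_{spec['name']}(data):\n"
        f"    values = struct.unpack({fmt!r}, data)\n"
        f"    return dict(zip(({names},), values))\n"
    )


def build_parser(spec: dict):
    """Compile the generated source and return the parser function."""
    namespace = {"struct": struct}
    exec(generate_parser(spec), namespace)
    return namespace[f"parse_{spec['name']}"]


parse_read_request = build_parser(SPEC)
frame = struct.pack(">HBHH", 1, 17, 100, 2)
print(generate_parser(SPEC))
print(parse_read_request(frame))
```

In this sketch, supporting a second target language would mean adding another source template next to `generate_parser` while keeping `SPEC` unchanged, which is why one description per protocol scales better than hand-writing "N" times "M" drivers.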
Full-blooded Apache and open-source enthusiast. Invests all of his work and private time in multiple Apache projects. Deeply interested in the IoT area, he is currently VP of the Apache PLC4X project and is deeply involved in multiple Apache projects, as well as a mentor to multiple podlings in the Apache Incubator.