Researchers, scientists, and companies alike increasingly leverage semantically enriched, linked datasets to train machine learning models for tasks ranging from discovering new vaccines and materials, to recommending products and services, to building virtual personal assistants. At the same time, big-data analytics engines are increasingly adopted to store and process ever-increasing volumes of data efficiently at scale. Until recently, however, the Semantic Web, big-data analytics, and machine learning communities remained largely separate, since big-data analytics engines could not process Knowledge Graphs (KGs).
This tutorial aims to provide an up-to-date overview of recent advances that allow end-to-end processing pipelines to be constructed so that analytics and machine learning tasks can be performed without the need for intermediate, computationally expensive and/or time-consuming data transfers and transformations. Hands-on activities covering statistical analytics and inferencing over KGs, based on simple use cases, will be provided.
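As a toy illustration of the kind of end-to-end pipeline the tutorial targets, the sketch below computes a simple statistic directly over an in-memory knowledge graph, with no intermediate export or transformation step. The graph, predicates, and statistic are hypothetical placeholders; the tutorial's actual pipelines run over real KGs on scalable engines.

```python
from collections import Counter

# Toy knowledge graph: a set of (subject, predicate, object) triples.
KG = {
    ("alice", "worksAt", "UAlbany"),
    ("bob", "worksAt", "UAlbany"),
    ("carol", "worksAt", "MIT"),
    ("alice", "knows", "bob"),
}

def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL triple pattern."""
    return [(ts, tp, to) for (ts, tp, to) in graph
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Statistical analytics directly over the graph, with no export step:
# count employees per organization.
per_org = Counter(o for (_, _, o) in match(KG, p="worksAt"))
print(per_org.most_common())  # e.g. [('UAlbany', 2), ('MIT', 1)]
```

In a real pipeline, the pattern matching would be pushed down to a SPARQL endpoint or a distributed analytics engine, and the aggregation would feed directly into downstream statistical or machine learning code.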
All examples, code snippets, and hands-on exercises will be shown and prepared using interactive Jupyter and/or Spark Notebooks. Docker images, which enable easy deployment and configuration, will be used to demonstrate the various pipelines throughout this tutorial.
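As a rough sketch of how such a Docker-based deployment might look (the service layout below is a hypothetical placeholder, not the tutorial's actual configuration; `jupyter/pyspark-notebook` is a public Jupyter + PySpark image), a short Compose file is typically all that participants need:

```yaml
# docker-compose.yml -- illustrative only; the actual images will be provided.
version: "3"
services:
  notebook:
    image: jupyter/pyspark-notebook    # public Jupyter + PySpark image
    ports:
      - "8888:8888"                    # Jupyter web interface
    volumes:
      - ./notebooks:/home/jovyan/work  # tutorial notebooks mounted from host
```

Running `docker-compose up` with a file of this shape starts a ready-to-use notebook environment without any manual dependency installation.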
This tutorial is targeted at attendees from several research areas, since the topic lies at the intersection of several domains within the Semantic Web, namely: ontology-based data access, scalable analytics, and machine learning. Thus, we expect that most (if not all) sessions offered as part of the tutorial will appeal to almost everyone attending the conference. Beginners will learn the basics of scalable semantic data processing, whereas experts will have a chance for an in-depth review of existing solutions and how these compare against each other for end-to-end processing pipelines.
Knowledge of Python is required, while basic familiarity with Java and an understanding of distributed computing frameworks and Docker are preferred. Participants interested in following the hands-on activities should bring their own laptop with a Linux-based OS, preferably with Docker Engine 1.13.0 and docker-compose 1.10.0.
Dr. Charalampos Chelmis is an Assistant Professor of Computer Science at the University at Albany, where he leads the Intelligent Big Data Analytics, Applications, and Systems group. His research interests lie in socially important data science and democratizing the Semantic Web. He is actively working on adding exploratory statistical analysis and prediction support to SPARQL in the context of a project supported by the U.S. National Science Foundation. He serves as an Associate Editor of the SNAM journal, and as a program committee member of international conferences including, but not limited to, AAAI, The Web Conference, ICWSM, and ASONAM. He is also a reviewer for journals including IEEE TKDE and TCSS.
Mr. Bedirhan Gergin is a second-year Ph.D. student in Computer Science at the University at Albany. He holds a bachelor's degree in Computer and Industrial Engineering and has prior experience at leading defense-industry companies in Turkey. He is currently a junior researcher in the Intelligent Big Data Analytics, Applications, and Systems lab, where he works with Dr. Chelmis on adding exploratory statistical analysis and prediction support to SPARQL in a project funded by the U.S. National Science Foundation.