BigSem: Big Data Analytics for Semantic Data

Big Data Analytics for Semantic Data

Half-Day Tutorial at the 23rd International Semantic Web Conference

Tutorial information

Date and Time: Monday, November 12th (Part I: 2:00PM - 3:40PM, Part II: 4:00PM - 5:40PM)

Motivation

Researchers, scientists and companies alike increasingly leverage semantically enriched, linked datasets to train machine learning models for tasks ranging from discovering new vaccines and materials, to recommending products and services, to building virtual personal assistants. At the same time, big-data analytics engines are increasingly adopted to store and process ever-increasing volumes of data efficiently at scale. Until recently, however, the Semantic Web, big-data analytics and machine learning communities were separated, since big-data analytics engines could not process Knowledge Graphs (KGs).

This tutorial aims to provide an up-to-date overview of recent advances that allow end-to-end processing pipelines to be constructed, so that analytics and machine learning tasks can be performed without intermediate data transfers or transformations that are computationally expensive or time-consuming. Hands-on activities covering statistical analytics and inferencing over KGs will be provided, using simple use cases.

Schedule (subject to change)

Introduction (2:00 PM)

Overview and setup instructions

Module 1 (2:20 PM)

Libraries for analytics and machine learning
Basic examples using interactive Jupyter Notebooks

Module 2 (2:50 PM)

Libraries for semantic data access
Basic examples using interactive Jupyter Notebooks

Coffee Break (3:40 PM)

Module 3 (4:00 PM)

Engines and frameworks for semantic data analytics
Introduction to Apache Spark
Introduction to the SANSA stack
Hands-on demonstration of SANSA on the Linked Movie Database
Introduction to SparkKG
Hands-on demonstration of SparkKG-ML on a large-scale Knowledge Graph of recipes

Discussion and conclusion (5:00 PM)

Recap of key learnings
Lessons learned and potential future directions
Closing discussion

Prerequisites

Knowledge of Python is required. A basic understanding of distributed computing frameworks is preferred.

Please come prepared

During the tutorial, participants will be invited to work actively on hands-on exercises using interactive Jupyter and/or Spark Notebooks. We therefore ask participants to bring their own laptops. We offer a choice between using Google Colab (in which case a web browser and an internet connection will suffice) and pre-installing certain libraries. Please refer to our setup instructions.

Resources

All resources for this tutorial, including slides, code samples, and additional reading materials, are available at our repository.

Speakers

Dr. Charalampos Chelmis is an Associate Professor in Computer Science at the University at Albany, where he leads the Intelligent Big Data Analytics, Applications, and Systems (IDIAS) group. His research interests lie in socially important data science and democratizing the Semantic Web. He is actively working towards adding exploratory statistical analysis and prediction support to SPARQL in the context of a project supported by the U.S. National Science Foundation. He is acting Associate Editor of the SNAM journal, and a program committee member of international conferences including, but not limited to, AAAI, the WebConf, ICWSM, and ASONAM. He is also a reviewer for journals including IEEE TKDE and TCSS. Dr. Chelmis has taught over 15 courses at the University at Albany, including a special topics course on Semantic Web Technologies during the Spring 2021 semester. He has additionally organized and presented a tutorial on cyberbullying detection at WebSci21 and ICWSM18, and a tutorial on the prediction and estimation of web content popularity at IEEE Big-Data17.

Mr. Bedirhan Gergin is a Ph.D. candidate in Computer Science at the University at Albany. He has a Bachelor’s in Computer and Industrial Engineering, and experience in leading defense industry companies in Turkey. He is a member of the Intelligent Big Data Analytics, Applications, and Systems lab, where he is working with Dr. Chelmis towards facilitating large-scale analytics over semantic data.