Date and Time: Monday, November 12th (Part I: 2:00PM - 3:40PM, Part II: 4:00PM - 5:40PM)
Researchers, scientists and companies alike increasingly leverage semantically enriched, linked datasets to train machine learning models for tasks ranging from discovering new vaccines and materials, to recommending products and services, to building virtual personal assistants. At the same time, big-data analytics engines are increasingly adopted to store and process ever-increasing volumes of data efficiently at scale. Until recently, however, the Semantic Web, big-data analytics and machine learning communities remained separate, since big-data analytics engines could not process Knowledge Graphs (KGs).
This tutorial aims to provide an up-to-date overview of recent advances that allow end-to-end processing pipelines to be constructed, so that analytics and machine learning tasks can be performed without intermediate, computationally expensive and/or time-consuming data transfers and/or transformations. Hands-on activities covering statistical analytics and inferencing over KGs will be provided, built around simple use cases.
Knowledge of Python is required. A basic understanding of distributed computing frameworks is preferred.
During the tutorial, participants will be invited to work on hands-on exercises using interactive Jupyter and/or Spark Notebooks. We therefore ask participants to bring their own laptop. Participants can either use Google Colab (in which case a web browser and an internet connection will suffice) or install certain libraries in advance. Please refer to our setup instructions.
All resources for this tutorial, including slides, code samples, and additional reading materials, are available at our repository.
Dr. Charalampos Chelmis is an Associate Professor in Computer Science at the University at Albany, where he leads the Intelligent Big Data Analytics, Applications, and Systems (IDIAS) group. His research interests lie in socially important data science and democratizing the Semantic Web. He is actively working towards adding exploratory statistical analysis and prediction support to SPARQL in the context of a project supported by the U.S. National Science Foundation. He is acting Associate Editor of the SNAM journal, and a program committee member of international conferences including but not limited to AAAI, the WebConf, ICWSM, and ASONAM. He is also a reviewer for journals including IEEE TKDE and TCSS. Dr. Chelmis has taught over 15 courses at the University at Albany, including a special topics course on Semantic Web Technologies during the Spring 2021 semester. He has additionally organized and presented a tutorial on cyberbullying detection at WebSci21 and ICWSM18, and a tutorial on the prediction and estimation of web content popularity at IEEE Big-Data17.
Mr. Bedirhan Gergin is a Ph.D. candidate in Computer Science at the University at Albany. He has a Bachelor’s in Computer and Industrial Engineering, and experience in leading defense industry companies in Turkey. He is a member of the Intelligent Big Data Analytics, Applications, and Systems lab, where he is working with Dr. Chelmis towards facilitating large-scale analytics over semantic data.