Keynote Speakers

This year we will have five keynote speeches.

Keynote I: Mubarak Shah

UCF Trustee Chair Professor, Director
Center for Research in Computer Vision
University of Central Florida, USA

September 18 Wednesday 8:40-9:40am

Mubarak Shah

Video Object Segmentation and Human Action Localization


Video object segmentation and human action localization are two related problems with a wide range of applications across multiple areas. Video object segmentation deals with segmenting the primary moving objects in a video, while the goal of action detection is to detect every occurrence of a given action within a long video and to localize each detection in both space and time. My group has been working on these two problems for some time and has proposed several different methods. In this talk, I will present the methods we have developed to solve these two problems.


Dr. Mubarak Shah, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). Dr. Shah is a fellow of IEEE, IAPR, AAAS and SPIE. He has published extensively on topics related to visual surveillance, tracking, human activity and action recognition, object detection and categorization, shape from shading, geo-registration, visual crowd analysis, etc. He has been an ACM and IEEE Distinguished Visitor Program speaker and is often invited to present seminars, tutorials and talks all over the world. He is a recipient of the ACM SIGMM Technical Achievement Award; the IEEE Outstanding Engineering Educator Award; the Harris Corporation Engineering Achievement Award; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; 2nd place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the best paper award at the ACM Multimedia Conference in 2005 and 2010. At UCF he has received the Pegasus Professor Award; the University Distinguished Research Award; the Faculty Excellence in Mentoring Doctoral Students award; the Scholarship of Teaching and Learning award; the Teaching Incentive Program award; and the Research Incentive Award.

Keynote II: Jenq-Neng Hwang

Professor and Associate Chair
Department of Electrical and Computer Engineering
University of Washington, Seattle, WA, USA

September 18 Wednesday 1:30-2:30pm

Jenq-Neng Hwang

Coordinated 3D World Exploration for Smart Surveillance and Autonomous Driving


With the huge number of networked static surveillance cameras and moving video cameras available everywhere nowadays, such as the cameras on vehicles and drones for autonomous driving or aerial surveillance applications, there is an urgent need for systematic and coordinated mining of the detected video objects in the 3D physical world, so that the explored information can be exploited for various smart city applications. To achieve this goal, several critical challenges need to be overcome: reliable SLAM-based visual odometry for pose estimation (self-calibration) of moving cameras; robust tracking-by-detection and detection-by-tracking for associating detected objects in the presence of missing or erroneous detections; reliable ground plane estimation for 2D-to-3D inference; and, finally, efficient 3D pose estimation for describing the actions of detected humans. In this talk, I will cover all these topics and propose our optimized strategies for integrating these research components. Practical applications for AI City and autonomous driving will be demonstrated.


Dr. Jenq-Neng Hwang received the BS and MS degrees, both in electrical engineering, from the National Taiwan University, Taipei, Taiwan, in 1981 and 1983, respectively. He then received his Ph.D. degree from the University of Southern California. In the summer of 1989, Dr. Hwang joined the Department of Electrical and Computer Engineering (ECE) of the University of Washington in Seattle, where he has been a Full Professor since 1999. He served as the Associate Chair for Research from 2003 to 2005 and from 2011 to 2015. He is currently the Associate Chair for Global Affairs and International Development in the ECE Department. He is the founder and co-director of the Information Processing Lab, which has won CVPR AI City Challenge awards in consecutive recent years. He has written more than 350 journal papers, conference papers and book chapters in the areas of machine learning, multimedia signal processing, and multimedia system integration and networking, including an authored textbook, "Multimedia Networking: from Theory to Practice," published by Cambridge University Press. Dr. Hwang has a close working relationship with industry on multimedia signal processing and multimedia networking.

Dr. Hwang received the 1995 IEEE Signal Processing Society's Best Journal Paper Award. He is a founding member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society and was the Society's representative to the IEEE Neural Network Council from 1996 to 2000. He is currently a member of the Multimedia Technical Committee (MMTC) of the IEEE Communications Society and also a member of the Multimedia Signal Processing Technical Committee (MMSP TC) of the IEEE Signal Processing Society. He has served as an associate editor for IEEE T-SP, T-NN, T-CSVT, T-IP and Signal Processing Magazine (SPM). He is currently on the editorial boards of the ZTE Communications, ETRI, IJDMB and JSPS journals. He served as the Program Co-Chair of IEEE ICME 2016, ICASSP 1998 and ISCAS 2009. Dr. Hwang has been a Fellow of the IEEE since 2001.

Keynote III: Jeff Alstott

Deep Intermodal Video Analytics (DIVA) Program Manager
Intelligence Advanced Research Projects Activity (IARPA), USA

September 19 Thursday 8:40-9:40am

Jeff Alstott

The IARPA DIVA Program


The task of monitoring video in airports, at border crossings, or at government facilities is increasingly critical for security, public safety, transportation and infrastructure monitoring. Security personnel and operators of camera networks are overwhelmed by the volume of video they must monitor, and cannot afford to view or analyze even a small fraction of the collected footage. In addition, when incidents occur and officials are tasked with forensically analyzing large volumes of video, identifying relevant activities and the subjects of those activities requires intensive manual effort. DIVA aims to develop technology to automate much of this video intelligence analysis by automatically detecting activities in video together with their constituent components and attributes. The DIVA program will produce a common framework and software prototype for activity detection and person/object detection and recognition across a multi-camera network. The development of tools for forensic analysis, as well as real-time alerting for user-defined threat scenarios, will have a significant impact on surveillance monitoring workflow and processing.


Dr. Jeff Alstott is a program manager at IARPA, where he runs programs on using AI for security and on the security of AI. He previously worked for MIT, Singapore University of Technology and Design, the World Bank and the University of Chicago. He obtained his PhD studying complex networks at the University of Cambridge, and his MBA and bachelor’s degrees from Indiana University. He has published research in such areas as animal behavior, computational neuroscience, complex networks, design science, statistical methods, and S&T forecasting.

Keynote IV: Hong-Yuan Liao

Distinguished Research Fellow and Director
Institute of Information Science, Academia Sinica, Taiwan

September 19 Thursday 1:30-2:30pm

Hong-Yuan Mark Liao

When AI meets Multimedia


The development of artificial intelligence has gone through three main periods. The first period ran from 1950 to 1970, and its main focus was logic reasoning. The second period ran from 1980 to 1990, and its main emphasis was knowledge representation. The development of ImageNet in 2010 inspired the third phase of artificial intelligence, which is mainly the age of machine learning. As for multimedia, it covers image, video, music, speech, graphics, text, etc. This talk will mainly cover our research results since 2010 on how to use AI technology to solve multimedia content processing problems and related applications. The techniques used cover knowledge representation and deep learning. The five topics I will cover are: (1) born conductor; (2) automatic concert video mashup; (3) two-pass regression framework for people counting; (4) tactics analysis on NBA broadcast videos; and (5) fisheye surveillance camcorder-based traffic flow computation.


Mark Liao received his Ph.D. degree in electrical engineering from Northwestern University in 1990. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taiwan, where he is currently a Distinguished Research Fellow and Director. He has worked in the fields of multimedia signal processing, computer vision, pattern recognition, multimedia protection, and artificial intelligence for more than 30 years. He is jointly appointed as an Honorary Chair Professor of National Chiao-Tung University. During 2009-2012, he was jointly appointed as the Multimedia Information Chair Professor of National Chung Hsing University. Since August 2010, he has been an Adjunct Chair Professor of Chung Yuan Christian University. From August 2014 to July 2016, he was an Honorary Chair Professor of National Sun Yat-sen University. He received the Young Investigators' Award from Academia Sinica in 1998; the Distinguished Research Award from the National Science Council in 2003, 2010 and 2013; the Academia Sinica Investigator Award in 2010; and the TECO Award from the TECO Foundation in 2016. His professional activities include: Co-Chair, 2004 International Conference on Multimedia and Expo (ICME); Technical Co-chair, 2007 ICME; President, Image Processing and Pattern Recognition Society of Taiwan (2006-08); Editorial Board Member, ACM Computing Surveys (2018-present) and IEEE Signal Processing Magazine (2010-13); Associate Editor, IEEE Transactions on Image Processing (2009-13), IEEE Transactions on Information Forensics and Security (2009-12) and IEEE Transactions on Multimedia (1998-2001). He has been a Fellow of the IEEE since 2013.

Keynote V: Rama Chellappa

Distinguished University Professor and Minta Martin Professor of Engineering
Department of Electrical and Computer Engineering
University of Maryland, College Park, USA

September 20 Friday 8:40-9:40am

Rama Chellappa

Re-Identification of Humans and Vehicles from Stationary and Moving Sensors


In this talk, I will discuss methods based on handcrafted and data-driven representations for re-identification of humans and vehicles from images and videos collected by stationary and moving sensors. The handcrafted representations are derived using spherical harmonics, structure from motion and dictionaries. The data-driven representations are derived using deep networks that incorporate spatial and temporal attention models. I will conclude the talk by presenting results on several publicly available datasets.


Prof. Rama Chellappa is a Distinguished University Professor and a Minta Martin Professor of Engineering in the Department of Electrical and Computer Engineering at the University of Maryland. He is a recipient of the K.S. Fu Prize from the International Association for Pattern Recognition (IAPR); the Society, Technical Achievement and Meritorious Service Awards from the IEEE Signal Processing Society; and the Technical Achievement and Meritorious Service Awards from the IEEE Computer Society. He also received the Inaugural Leadership Award from the IEEE Biometrics Council. At UMD, he has received college- and university-level recognitions for research, teaching, innovation and mentoring of undergraduate students. He received the Outstanding ECE Award and a Distinguished Alumni Award from Purdue University and the Indian Institute of Science, respectively. He is a Fellow of IEEE, IAPR, OSA, AAAS, ACM, and AAAI and holds six patents. His current research interests are computer vision, pattern recognition and machine intelligence.