Web applications have become the de facto way to access services and functionalities. It is vital to ensure that different users of web applications are only allowed to access what they are supposed to, i.e., that authorization is implemented correctly. Unfortunately, broken access control and authorization have consistently been ranked among the top web application vulnerabilities. In fact, many web applications cannot provide an accurate specification of their enforced authorization policies due to challenges such as code complexity and fast-paced development. Neither end users nor even the developers of the applications can reason about data protection in this environment. To address this problem, this research project devises a novel framework for learning fine-grained authorization policies from web applications without relying on access to their source code or an understanding of other internal complexities. The project develops an integrated research and education program to train the next generation of the cybersecurity workforce at the intersection of security/privacy, machine learning, and web technologies. Since web-based systems are pervasive in our society, the developed framework and associated solutions will significantly contribute to system safety and user privacy. Furthermore, due to their black-box design, the developed techniques will be critical assets for investigating the data authorization practices of applications outside their development environments, both by application adopters (e.g., companies deploying outsourced applications) and by third parties acting in the interest of end users (e.g., security/privacy researchers and regulators investigating compliance with privacy laws and expectations). The project engages a diverse body of students, especially from underrepresented groups, in security and privacy research and exposes the broad community to security and privacy topics through outreach activities.

This research project develops a novel paradigm for automated learning of web application authorization policies that significantly improves the ability to ensure the security and privacy of web applications. A key characteristic of this research is to treat web applications as black boxes, i.e., learning authorizations by interacting with and observing them as regular end users would. The black-box approach allows abstracting away internal complexities of web applications and focusing instead on what matters: learning what policies are enforced on users as they access application data. The research is carried out in three thrusts. First, a theoretical policy learning framework will be devised for efficiently probing the authorization space of applications as black boxes and constructing formal specifications of their policies. Second, a methodology and associated techniques for learning representations of data objects, relationships, and operations from black-box web applications will be developed in order to realize practical deployment of the theoretical framework in the web domain. Third, the project will develop techniques for analysis and integration of the learned authorization policies to improve the security and privacy of web applications. This paradigm will be transformative for web security/privacy research and practice by providing researchers, developers, and analysts with an automated approach to learn the specifications of authorization policies. In addition to enabling them to understand the authorization behavior of web applications, it will revitalize research in formal policy testing and verification techniques that rely on concrete policy specifications. Furthermore, the general framework will be applicable beyond web applications to other domains such as mobile app ecosystems.
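
As a purely illustrative sketch of the black-box idea described above, and not the project's actual framework, the following Python toy probes an application through a single access oracle and then picks, for each operation, the first candidate rule that agrees with every observed decision. All names here (`toy_app`, `USERS`, `DOCS`, `PREDICATES`, `probe`, `learn_policy`) are hypothetical, the probing is exhaustive rather than efficient, and the small fixed set of candidate rules is an assumption made only to keep the example short.

```python
from itertools import product

# Hypothetical toy data: users with roles, documents with owners.
USERS = {"alice": "member", "bob": "member", "carol": "admin"}
DOCS = {"doc1": "alice", "doc2": "bob"}
OPERATIONS = ["read", "delete"]


def toy_app(user, op, doc):
    """Stand-in for the black-box application.

    The learner only calls this function and never inspects its body.
    """
    if op == "read":
        return True
    return DOCS[doc] == user or USERS[user] == "admin"


# Assumed hypothesis space: candidate rules over user/document attributes,
# listed from simplest to most complex.
PREDICATES = {
    "anyone": lambda u, d: True,
    "owner": lambda u, d: DOCS[d] == u,
    "admin": lambda u, d: USERS[u] == "admin",
    "owner_or_admin": lambda u, d: DOCS[d] == u or USERS[u] == "admin",
}


def probe(oracle):
    """Exhaustively query the oracle and record each allow/deny decision."""
    return {
        (u, op, d): oracle(u, op, d)
        for u, op, d in product(USERS, OPERATIONS, DOCS)
    }


def learn_policy(observations):
    """For each operation, keep the first rule consistent with all probes."""
    policy = {}
    for op in OPERATIONS:
        relevant = [
            (u, d, allowed)
            for (u, o, d), allowed in observations.items()
            if o == op
        ]
        for name, pred in PREDICATES.items():
            if all(pred(u, d) == allowed for u, d, allowed in relevant):
                policy[op] = name
                break
    return policy


observations = probe(toy_app)
policy = learn_policy(observations)
print(policy)
```

On this toy oracle, the learner settles on `anyone` for `read` and `owner_or_admin` for `delete`. A realistic setting would replace `toy_app` with real requests issued as different users, and would need a far richer hypothesis space and a probing strategy that avoids enumerating every user, operation, and object.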