CSI 445/660. Topics in Data Management Systems

Tips on Writing, Presentation, and Reviewing

Parallel Data Processing Systems

•[dean04] Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

Parallel/Distributed Database Systems

•[chang06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!). OSDI 2006: 205-218
•[thomson12] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi: Calvin: fast distributed transactions for partitioned database systems. SIGMOD Conference 2012: 1-12
•[xin13] Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica: Shark: SQL and rich analytics at scale. SIGMOD Conference 2013: 13-24
•[zou13] Tao Zou, Ronan Le Bras, Marcos Antonio Vaz Salles, Alan J. Demers, Johannes Gehrke: ClouDiA: A Deployment Advisor for Public Clouds. PVLDB 6(2): 121-132 (2012)
•[mahmoud13] Hatem A. Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi: Low-Latency Multi-Datacenter Databases using Replicated Commit. PVLDB 6(9): 661-672 (2013)
•[shute13] Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte: F1: A Distributed SQL Database That Scales. PVLDB 6(11): 1068-1079 (2013)

Graph Data Management

•[malewicz10] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski: Pregel: a system for large-scale graph processing. SIGMOD Conference 2010: 135-146
•[karypis98] George Karypis, Vipin Kumar: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing 20(1): 359-392 (1998)
•[kyrola12] Aapo Kyrola, Guy E. Blelloch, Carlos Guestrin: GraphChi: Large-Scale Graph Computation on Just a PC. OSDI 2012
•[cheng12] James Cheng, Zechao Shang, Hong Cheng, Haixun Wang, Jeffrey Xu Yu: K-Reach: Who is in Your Small World. PVLDB 5(11): 1292-1303 (2012)
•[hu13] Xiaocheng Hu, Yufei Tao, Chin-Wan Chung: Massive graph triangulation. SIGMOD Conference 2013: 325-336
•[khan13] Arijit Khan, Yinghui Wu, Charu C. Aggarwal, Xifeng Yan: NeMa: Fast Graph Search with Label Similarity. PVLDB 6(3): 181-192 (2013)
•[olsen14] Paul Olsen Jr., Alan Labouseur, Jeong-Hyon Hwang: Efficient Top-k Closeness Centrality Search. ICDE 2014

Semantic Web

•[zeng13] Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang: A Distributed Graph Engine for Web Scale RDF Data. PVLDB 6(4): 265-276 (2013)

Spatial Data Management

•[cudre-mauroux10] Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden: TrajStore: An adaptive storage system for very large trajectory data sets. ICDE 2010: 109-120
•[pham13] Huy Pham, Cyrus Shahabi, Yan Liu: EBM: an entropy-based model to infer social strength from spatiotemporal data. SIGMOD Conference 2013: 265-276
•[luo13] Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni: Finding time period-based most frequent path in big trajectory data. SIGMOD Conference 2013: 713-724
•[armenatzoglou13] Nikos Armenatzoglou, Stavros Papadopoulos, Dimitris Papadias: A General Framework for Geo-Social Query Processing. PVLDB 6(10): 913-924 (2013)

Query Processing/Optimization

•[vernica10] Rares Vernica, Michael J. Carey, Chen Li: Efficient parallel set-similarity joins using MapReduce. SIGMOD Conference 2010: 495-506
•[armbrust13] Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson: Generalized scale independence through incremental precomputation. SIGMOD Conference 2013: 625-636
•[bruno13] Nicolas Bruno, Sapna Jain, Jingren Zhou: Continuous Cloud-Scale Query Optimization and Processing. PVLDB 6(11): 961-972 (2013)
•[koutris13] Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu: Toward practical query pricing with QueryMarket. SIGMOD Conference 2013: 613-624

Main Memory Database Systems

•[ren13] Kun Ren, Alexander Thomson, Daniel J. Abadi: Lightweight Locking for Main Memory Database Systems. PVLDB 6(2): 145-156 (2012)
•[li13] Yinan Li, Jignesh M. Patel: BitWeaving: fast scans for main memory data processing. SIGMOD Conference 2013: 289-300

High Availability

•[yang10] Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden: Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE 2010: 657-668
•[cao11] Tuan Cao, Marcos Antonio Vaz Salles, Benjamin Sowell, Yao Yue, Alan J. Demers, Johannes Gehrke, Walker M. White: Fast checkpoint recovery algorithms for frequently consistent applications. SIGMOD Conference 2011: 265-276
•[ananthanarayanan13] Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman: Photon: fault-tolerant and scalable joining of continuous data streams. SIGMOD Conference 2013: 577-588

Crowdsourcing

•[trushkowsky13] Beth Trushkowsky, Tim Kraska, Michael J. Franklin, Purnamrita Sarkar: Crowdsourced enumeration queries. ICDE 2013: 673-684
•[marcus13] Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, Sewoong Oh: Counting with the Crowd. PVLDB 6(2): 109-120 (2012)
•[park13] Hyunjung Park, Jennifer Widom: Query Optimization over Crowdsourced Data. PVLDB 6(10): 781-792 (2013)

Security

•[tu13] Stephen Tu, M. Frans Kaashoek, Samuel Madden, Nickolai Zeldovich: Processing Analytical Queries over Encrypted Data. PVLDB 6(5): 289-300 (2013)