CSI 445/660. Topics in Data Management Systems

Tips on Writing, Presentation, and Reviewing

  1. Elements of Style

  2. Advice on Research and Writing

  3. How to Write a Good Research Paper

  4. How to Give a Good Research Talk

  5. The Task of the Referee

  6. How NOT to review a paper

Parallel Data Processing Systems

  1. [dean04] Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004: 137-150

Parallel/Distributed Database Systems

  1. [chang06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, Robert Gruber: Bigtable: A Distributed Storage System for Structured Data (Awarded Best Paper!). OSDI 2006: 205-218

  2. [thomson12] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, Daniel J. Abadi: Calvin: fast distributed transactions for partitioned database systems. SIGMOD Conference 2012: 1-12

  3. [xin13] Reynold S. Xin, Josh Rosen, Matei Zaharia, Michael J. Franklin, Scott Shenker, Ion Stoica: Shark: SQL and rich analytics at scale. SIGMOD Conference 2013: 13-24

  4. [zou13] Tao Zou, Ronan Le Bras, Marcos Antonio Vaz Salles, Alan J. Demers, Johannes Gehrke: ClouDiA: A Deployment Advisor for Public Clouds. PVLDB 6(2): 121-132 (2012)

  5. [mahmoud13] Hatem A. Mahmoud, Faisal Nawab, Alexander Pucher, Divyakant Agrawal, Amr El Abbadi: Low-Latency Multi-Datacenter Databases using Replicated Commit. PVLDB 6(9): 661-672 (2013)

  6. [shute13] Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte: F1: A Distributed SQL Database That Scales. PVLDB 6(11): 1068-1079 (2013)

Graph Data Management

  1. [malewicz10] Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski: Pregel: a system for large-scale graph processing. SIGMOD Conference 2010: 135-146

  2. [karypis98] George Karypis, Vipin Kumar: A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM Journal on Scientific Computing 20(1): 359-392 (1998)

  3. [kyrola12] Aapo Kyrola, Guy E. Blelloch, Carlos Guestrin: GraphChi: Large-Scale Graph Computation on Just a PC. OSDI 2012: 31-46

  4. [cheng12] James Cheng, Zechao Shang, Hong Cheng, Haixun Wang, Jeffrey Xu Yu: K-Reach: Who is in Your Small World. PVLDB 5(11): 1292-1303 (2012)

  5. [hu13] Xiaocheng Hu, Yufei Tao, Chin-Wan Chung: Massive graph triangulation. SIGMOD Conference 2013: 325-336

  6. [khan13] Arijit Khan, Yinghui Wu, Charu C. Aggarwal, Xifeng Yan: NeMa: Fast Graph Search with Label Similarity. PVLDB 6(3): 181-192 (2013)

  7. [olsen14] Paul Olsen Jr., Alan Labouseur, Jeong-Hyon Hwang: Efficient Top-k Closeness Centrality Search. ICDE 2014

Semantic Web

  1. [zeng13] Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, Zhongyuan Wang: A Distributed Graph Engine for Web Scale RDF Data. PVLDB 6(4): 265-276 (2013)

Spatial Data Management

  1. [cudre-mauroux10] Philippe Cudré-Mauroux, Eugene Wu, Samuel Madden: TrajStore: An adaptive storage system for very large trajectory data sets. ICDE 2010: 109-120

  2. [pham13] Huy Pham, Cyrus Shahabi, Yan Liu: EBM: an entropy-based model to infer social strength from spatiotemporal data. SIGMOD Conference 2013: 265-276

  3. [luo13] Wuman Luo, Haoyu Tan, Lei Chen, Lionel M. Ni: Finding time period-based most frequent path in big trajectory data. SIGMOD Conference 2013: 713-724

  4. [armenatzoglou13] Nikos Armenatzoglou, Stavros Papadopoulos, Dimitris Papadias: A General Framework for Geo-Social Query Processing. PVLDB 6(10): 913-924 (2013)

Query Processing/Optimization

  1. [vernica10] Rares Vernica, Michael J. Carey, Chen Li: Efficient parallel set-similarity joins using MapReduce. SIGMOD Conference 2010: 495-506

  2. [armbrust13] Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael J. Franklin, David A. Patterson: Generalized scale independence through incremental precomputation. SIGMOD Conference 2013: 625-636

  3. [bruno13] Nicolas Bruno, Sapna Jain, Jingren Zhou: Continuous Cloud-Scale Query Optimization and Processing. PVLDB 6(11): 961-972 (2013)

  4. [koutris13] Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, Bill Howe, Dan Suciu: Toward practical query pricing with QueryMarket. SIGMOD Conference 2013: 613-624

Main Memory Database Systems

  1. [ren13] Kun Ren, Alexander Thomson, Daniel J. Abadi: Lightweight Locking for Main Memory Database Systems. PVLDB 6(2): 145-156 (2012)

  2. [li13] Yinan Li, Jignesh M. Patel: BitWeaving: fast scans for main memory data processing. SIGMOD Conference 2013: 289-300

High Availability

  1. [yang10] Christopher Yang, Christine Yen, Ceryen Tan, Samuel Madden: Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database. ICDE 2010: 657-668

  2. [cao11] Tuan Cao, Marcos Antonio Vaz Salles, Benjamin Sowell, Yao Yue, Alan J. Demers, Johannes Gehrke, Walker M. White: Fast checkpoint recovery algorithms for frequently consistent applications. SIGMOD Conference 2011: 265-276

  3. [ananthanarayanan13] Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman: Photon: fault-tolerant and scalable joining of continuous data streams. SIGMOD Conference 2013: 577-588

Crowdsourcing

  1. [trushkowsky13] Beth Trushkowsky, Tim Kraska, Michael J. Franklin, Purnamrita Sarkar: Crowdsourced enumeration queries. ICDE 2013: 673-684

  2. [marcus13] Adam Marcus, David R. Karger, Samuel Madden, Rob Miller, Sewoong Oh: Counting with the Crowd. PVLDB 6(2): 109-120 (2012)

  3. [park13] Hyunjung Park, Jennifer Widom: Query Optimization over Crowdsourced Data. PVLDB 6(10): 781-792 (2013)

Security

  1. [tu13] Stephen Tu, M. Frans Kaashoek, Samuel Madden, Nickolai Zeldovich: Processing Analytical Queries over Encrypted Data. PVLDB 6(5): 289-300 (2013)