Data Details

The test and development data sets have been generated by the social network benchmark dataset generator produced by LDBC.

Query Details

The goal is to execute a set of queries as quickly as possible. There are four types of queries. For convenience, we have broken the sample query workload (below) into individual text files, one for each of the four query types.

  • Query Type 1 (Shortest Distance Over Frequent Communication Paths)

    Given two integer person ids p1 and p2, and another integer x, find the minimum number of hops between p1 and p2 in the graph induced by persons who
    1. have made more than x comments in reply to each others' comments (see comment_hasCreator_person and comment_replyOf_comment), and
    2. know each other (see person_knows_person, which presents undirected friendships between persons; a friendship relationship between persons x and y is represented by pairs x|y and y|x).
    API query1(p1, p2, x)
    Output  One integer (hop count) per line.
    Samples  1k-sample-queries1.txt and 1k-sample-answers1.txt
    Notes: There is a one-to-one mapping between queries and answers. The answers file contains comments after the % character. These are for your own debugging purposes. Submissions must not contain such comments.
  • Query Type 2 (Interests with Large Communities)

    Given an integer k and a birthday d, find the k interest tags with the largest range, where the range of an interest tag is defined as the size of the largest connected component in the graph induced by persons who
    1. have that interest (see tag, person_hasInterest_tag),
    2. were born on d or later, and
    3. know each other (see person_knows_person, which presents undirected friendships between persons; a friendship relationship between persons x and y is represented by pairs x|y and y|x).
    API query2(k, d)
    Output  Exactly k strings (separated by a space) per line. These k strings represent interest tag names, ordered by range from largest to smallest, with ties broken by lexicographical ordering.
    Samples  1k-sample-queries2.txt and 1k-sample-answers2.txt
    Notes: There is a one-to-one mapping between queries and answers. The answers file contains comments after the % character. These are for your own debugging purposes. Submissions must not contain such comments.
  • Query Type 3 (Socialization Suggestion)

    Given an integer k, an integer maximum hop count h, and a string place name p, find the top-k similar pairs of persons based on the number of common interest tags (see person_hasInterest_tag). For each of the k pairs mentioned above, the two persons must be located in p (see person_isLocatedIn_place, place, and place_isPartOf_place) or study or work at organisations in p (see person_studyAt_organisation, person_workAt_organisation, organisation_isLocatedIn_place, place, and place_isPartOf_place). Furthermore, these two persons must be no more than h hops away from each other in the graph induced by persons and person_knows_person.
    API query3(k, h, p)
    Output  Exactly k pairs of person ids per line. These pairs are separated by a space and person ids are separated by the pipe character ( | ). For any person id p, p | p must be excluded. For any pairs p1 | p2 and p2 | p1, the second pair in lexicographical order must be excluded. These k pairs must be ordered by similarity from highest to lowest, with ties broken by lexicographical ordering.
    Samples  1k-sample-queries3.txt and 1k-sample-answers3.txt
    Notes: There is a one-to-one mapping between queries and answers. The answers file contains comments after the % character. These are for your own debugging purposes. Submissions must not contain such comments.
  • Query Type 4 (Most Central People)

    Given an integer k and a string tag name t, find the k persons who have the highest closeness centrality values in the graph induced by persons who
    1. are members of forums that have tag name t (see tag, forum_hasTag_tag, and forum_hasMember_person), and
    2. know each other (see person_knows_person, which presents undirected friendships between persons; a friendship relationship between persons x and y is represented by pairs x|y and y|x).
    Here, the closeness centrality of a person p is
       (r(p)-1) * (r(p)-1)
       -------------------
          (n-1) * s(p)
    where r(p) is the number of vertices reachable from p (inclusive), s(p) is the sum of geodesic distances to all other reachable persons from p, and n is the number of vertices in the induced graph. When either multiplicand of the divisor is 0, the centrality is 0.
    API query4(k, t)
    Output  Exactly k person ids (separated by a space) per line. These person ids are ordered by centrality from highest to lowest, with ties broken by person id (in ascending order).
    Samples  1k-sample-queries4.txt and 1k-sample-answers4.txt
    Notes: There is a one-to-one mapping between queries and answers. The answers file contains comments after the % character. These are for your own debugging purposes. Submissions must not contain such comments.

Evaluation Environment

Processor 2.67 GHz Intel Xeon E5430
Configuration 2 processors (8 cores total)
L2 Cache Size 12 MB
Main Memory 15 GB
Operating System Red Hat Enterprise Linux Server release 6.5 (Santiago)
Compilers GCC 4.4.7 and JDK 1.7.0

Evaluation Process

The winner will be the submission that completes all of the given queries with the smallest total execution time (including the time to read/load the provided data and construct any indices). Submitted solutions are currently being run against workloads on small (1k people), medium (10k people), and large (100k people) data sets. Processing time is limited to 3 minutes for the small data set workloads, 5 minutes for the medium data set workloads, and 10 minutes for the large data set workloads. Submissions that produce correct results within the time limits will appear on the leaderboard ranked by their combined processing time. Additional data sets may be added in the coming weeks. Time limits may decrease as the contest continues.

Submitting

The SIGMOD 2014 programming contest management system accepts submissions of the form [all.tar.gz]. (Here is a sample you can modify.) Each submission must contain the following files/directories:
  • run.sh
    This shell script must take two arguments: [path_to_data_directory] and [path_to_query_file]. This script is responsible for running your executable file(s) that evaluate all of the queries found in [path_to_query_file] on the data found in [path_to_data_directory], and printing the results to standard output (stdout).
  • README
    This file must contain information about the submission, including:
    1. the team name
    2. for each team member: full name, e-mail, institution, department, and degree program
    3. advisor/supervisor name (if any)
    4. a brief description of techniques used for executing queries
    5. all the third party code used
  • src/
    This directory must contain all of the source code. The executable file(s) called by run.sh must be located outside this directory.
  • src/compile.sh
    This shell script must be able to compile the source contained in the src directory. The produced executable file(s) must be located within the src directory. These files must equivalent to the ready-to-execute binary/executable files you provided.
Notes:
  • Please ensure that the following command places run.sh and src/ in the current directory:
    tar -zxvf all.tar.gz
  • Everything submitted must be open source under some license such that we can publish and make freely available all the code from the finalists’ implementations. Submissions which are not fully open source or which cannot be made freely available will be disqualified.
  • We request participants submit executable files so that their implementations can be evaluated even if their submitted source code doesn't compile on our machine (due to compatibility issues, access right problems, etc.).
  • Before submission, please ensure that your output for our sample graph data and queries is the same as the answers we provided. If you have a question regarding the correctness of the provided answers, please contact us at SIGMOD14contest@cs.albany.edu
Submissions will be evaluated and results will appear on the leaderboard. Teams are allowed to make several test submissions before their final submission. Final submissions must be made by 11:59pm EDT, April 15, 2014.

Rules

  • The 2014 SIGMOD Programming Contest is open to undergraduate and graduate students all over the world. Students associated with the organizers, however, are not eligible to participate.
  • Teams must consist of individuals currently registered as graduate or undergraduate students in an accredited academic institution. A team may be formed by one or more students who may or may not be enrolled at the same institution. Several teams from the same institution can compete independently, but one person can be a member of only one team. There is no limit on team size. Teams can register on the contest site after March 1, 2014.
  • All submissions must consist of only code written by the team or open source licensed software, all compatible with the MIT License. For source code from books or public articles, clear reference and attribution must be made. Final submissions must be made by 11:59pm EDT, April 15, 2014.
  • All teams must agree to license their code under the MIT License. By participating in this contest, each team agrees to publish its source code. The finalists' implementations will be made public on the contest website.
  • A team is eligible for the prize if at least one of its members will come to present the work at the SIGMOD 2014 conference.