Project #1: dbEvents (Due: Mar. 3)


Project designed by Dongwon Lee
Last modified: Fri Feb 11 03:07:39 EST 2005

Initial Drafts of Students: look at others' writings to get ideas.

dbEvents: Web Database (Individual Project)

The ACM dbEvents is an XML-based web interface to a database containing a list of academic events (e.g., workshops, symposiums, conferences, journals) information, and in particular their CFP (Call For Paper) solicitation information. The project #1 is for you to implement similar web-based database and improve it further. Your implementation must be runnable in the following environment: Your web interface to the database should support the following features to users (i.e., By clicking some of the links on the interface, users should be able to achieve all of these). The more features you implement (correctly), the higher score you will get.

FEATURE LIST (10 Features + 2 Extra)

  1. Insert a new event into a database. If users try to insert an event that's already in the database with the same (1) acronym, (2) due-year, and (3) due-month (CHANGED:2005/2/10), then the insertion must be aborted. That is, we assume that the acronym, due-year, and due-month together functionally determines all the rest of fields:
        {acronym, due-year, due-month} --> {type, event-date, location, society, ...} 
    

  2. Retrieve current event list, sorted by:
    1. Post time (by the attribute "post_time")
    2. Due date (the due date of CFP) -- Use this option by default
    3. Acronym (alphabetical order)
    4. Event date (the date when the event begins)

  3. Retrieve current event list, grouped by:
    1. Type (workshop, conference, ...)
    2. Year (2003, 2004, ...)
    3. Country
    4. Society (e.g., IEEE, ACM, ...)
    If there are multiple events within a group, sort them by "due date".

  4. Retrieve events matching a combination of conditions (e.g., view only the "conferences" of "2004" located in "USA")

  5. (Extra Bonus) Subscribe to certain events and get a notification email when such an event is inserted into a database. For instance, once John registers himself by providing initial ID/password and email address, he specifies that he is interested in the event called "SIGMOD" each year. Then, later, when the event is inserted, he should get an email. This feature can be implemented by the database features like Stored Procedures, Functions, or Triggers (not all are well supported in MySQL).

  6. (Extra Bonus) Retrieve top-10 events matching a combination of conditions, and sort them according to their "similarities" like search engines do (e.g., retrieve top-10 events that are located in "State College, PA, USA"). This feature can be implemented by the database feature "approximate query processing". In MySQL, for instance, it is like:
      SELECT DISTINCT MATCH(E.location) AGAINST ('State College, PA, USA') As similarity, E.* 
      FROM   events AS E
      WHERE  MATCH(E.location) AGAINST ('State College, PA, USA')
    

Of course, In addition to these specifics, this project is open-ended. You are encouraged to take more complex and challenging features for extra bonus if you are capable of. The project #1 consists of four parts.


Part 1: Database Schema and Data Preparation

The raw data are stored in a data.xml file as XML format. In part 1, you need to:
  1. Study the raw data in the data.xml file, infer a reasonable schema in both DTD and XML Schema notations and save them as "schema.dtd" and "schema.xsm", respectively. Note that several correct schema can exist for the given XML data.
  2. From the data.xml, you need to find answers to the following questions using an XML query language such as XPath or XQuery:
    1. How many events are "posted" in 2002?
    2. Retrieve all events located at USA.
    3. Retrieve all workshops organized by VLDB society in 2003.
    4. For each month of 2004, show events in reverse alphabetical order using the acronym.
    Put all your XML queries and corresponding answers into a file "answer.txt".
  3. Based on your understanding of the XML data and schema, create your relational schema (i.e., tables and attributes) using SQL. Save the SQL statements in a file named "schema.sql".
  4. Extract data from data.xml file (how can you extract this?), and import them into the relational schema you just created (how can you import this?). Let's call the extraction part as "extract.exe" and the import part as "import.exe".
As an example, the following are schema creation and data population codes for MySQL and IBM DB2 databases.
  1. IBM DB2: company_schema.sql, company_data.sql
  2. MySQL: prep-db.sql (both combined)
  3. Data extraction part: N/A
Turn-In: schema.dtd, schema.xsm, answer.txt schema.sql, extract.exe, import.exe

Part 2: Web Interface Design

The example web interface for "inserting new event into a database" is given in input.html file as a starting point. For "retrieve" commands, you need to design the interface on your own. It can be from as simple as a simple link to as complex as java script-based menu. You are free to use any tools (e.g., Frontpage) for web interface design. At the end, your interfaces must be stored as "interface.html" (or several automatically-generated files). Also, turn in a screen capture of your intereface as a file "screen.jpg".

Here are a few example web database interfaces from ther web:

PSUAmazon.comWine.com

Turn-In: interface.html, screen.jpg


Part 3: Database Connection

In the part 3, you need to implement the aforementioned features in terms of SQL commands, and link the commands to your interface. For instance, to help you out, below are a set of codes in PHP that connect the FORM interface to the underlying database. Instead of PHP, you are free to use any alternatives (e.g., CGI, JavaScript, JSP, Servlet, etc) that can be runnable on the MySQL/Linux environment.
  1. form.php
  2. form-db.php (in MySQL)
  3. form-db-odbc.php (in ODBC)
Turn-In: PHP files and all SQL codes

Part 4: Online Presentation

At the end, your interface must be accessible at a URL of the designated machine TBD). Your final report (hard-copy) must include:
  1. Simple overview of your design and architecture (upto 5 pages)
  2. Something that you want to show off (if you have)
  3. All turn-Ins above
  4. URL to your web interface (TA will actually run it to grade)
Furthermore, all of the turn-in files mentioned above must be put into a directory called "submission" under the URL.