IST 210: Organization of Data Spring 2009 Proj #1 (DUE: Feb. 26, 1PM) Last Updated: Mon Feb 3 15:35:02 EDT 2009 ______________________________________________________________________ NOTE: Team Assignment (also available in ANGEL) Your IST 210 team project will be to build a substantial database application for a real-world domain. You will design a schema on paper, and then you will create an actual database using Microsoft SQL Server provided by IST Infrastructure. You will populate the database on your own, write interactive queries and modifications on the database, create programs that manipulate the database. In a series of 3 projects, you are asked to design a database-backed application for the on-line retailer, called "X", that hopes to outweigh the Amazon.com someday. Your first task is to: - Come up with a name for "X" - Decide the products to sell (e.g., Books, Electronics, Clothes, etc). Note a few assumptions as follows: - X is a purely on-line based business so that all transactions (e.g., purchase, review, return) are designed for on-line activities. - Nation wide, X has three warehouses (W1 to W3) and three shipping sites (S1 three S3). Products are shipped from a supplier to a warehouse to a shipping center, and then to a customer. - The database for X should handle two types of queries -- one for regular customers and the other for employees. In particular, your E/R design should be able to handle the following scenarios. Consider two users -- a regular customer "Kim" and an employee "Joe" for each scenario. 1. After Kim registers her information, she can log in back using her chosen ID and PWD later. X keeps track of the log-in information of each account so that if one does not log in for more than a year, the account is suspended and the ID/PWD is overwritten. 2. 10 copies of a product "Foo" arrive from the supplier "IGotAll" to the warehouse "W2" located at 220 S. Atherton on 1/1/2009. In addition, 20 copies of a product "Foo2" arrive from another supplier "IGotSome" to "W2" on 1/2/2009. 3. A product "Foo" can have a new or old edition (e.g., 2nd edition of a book or successor model of a particular HDTV TV). If "Foo" has a new edition "NewFoo" or an old edition "OldFoo", Kim should be able to follow the link to see the detailed information of "NewFoo" or "OldFoo". 4. Kim purchases 2 copies of a product "Foo" for $54.50 each (with 20% discount each) and a product "Bar" for $23.00 (with no discount). In her account, Kim can see both purchases in a single transaction that is uniquely identified by 4 digit number (e.g., 1110). Later, Kim can make a comment (i.e., 100 words) and assign a rating (i.e., 1 to 5 with 5 being the best) to each product that she purchased. 5. Effective from 1/1/2009 for a week, all products in some category (e.g., all TVs by Samsung or all books from Oprah's book club) go on sale with 50% discount. 6. Kim purchases a product "Foo" for $100 on 1/1/2009. Once she receives the product, she changes her mind and returns it for credit. If the return is within 30 days of the purchase, she can get her money back (e.g., credited to her credit card account). Otherwise, she needs to pick another product, say "Foo2", of the same (or lesser) value than Foo is. The fact that Kim returns Foo and gets Foo2 instead must be recorded somehow. Whether the return is treated as a new transaction or not is upto your design. In theory, Kim can repeat this purchase and return cycle for arbitrary many times. 7. For any sold products, Joe can retrieve the flow as to how products were shipped from the beginning to the end (i.e., supplier -- warehouse -- shipping site -- customer). 8. Joe likes to know the number of product items (not the number of transactions) and the total amount of those product items that "Kim" has purchased so far for each year. If product items were purchased and returned, then those items must NOT be included in the number of items and total amount. 9. Suppose a product has a category assigned (e.g., SciFi genre for a book or a Game category for Wii). Now, Joe wants to generate a quarterly report showing the total sales of each category per year for 2007 and 2008. Again, total sales should NOT include those products returned. Furthermore, total sales should use only the actual sale price, not the original price of products. For instance, if a product "Foo" originally costed $100 but was sold for $50 with a 50% discount, then the TotalSales should include $50 (actual sale price), not $100 (original price). The final answers would look something like (assuming Electronics to sell): Year Quarter Category TotalSales ------------------------------------- 2007 1 TV-Video $1.5M 2007 1 Audio $1.2M 2007 1 ... 2007 2 TV-Video $2.5M 2007 2 Audio $0.2M 2007 2 ... ... 2008 4 TV-Video $1.4M 2008 4 Audio $1.12M 2008 4 Music $3.2M 2008 4 Game $2.1M 10. One scenario of your own. Note that in the above descriptions, any quoted names such as "Kim" or "Foo" can be replaced by any other arbitrary names. ______________________________________________________________________ Part 1: E/R Design With the above scenarios and reasonable assumptions, in proj #1, you need to design an E/R diagram. Draw an E/R diagram using MS Word or Visio or any drawling tool (do NOT hand-write). List all your assumptions for things that are NOT explicitly specified or ambiguous (e.g., all items can be associated with some discounts). Otherwise, we will grade your design based on the common sense, and you may lose points if ours and yours are different. Don't forget to specify key attributes by * symbol for each entities. Also, specify cardinalities and modalities for all relationships. Consult the PBL 3.1 AFV in class, where various queries that the given design is trying to to answer. In your scenario, you should go through the similar thinking process, and make sure that your design is *expressive* enough to handle all of the above 10 scenarios. You goal is to design a good E/R that supports all 10 scenarios. If your design cannot support some scenarios since you don't have proper entities/relationships or your cardinality/modality is not sufficient enough, then you will lose points. Some E/R design, even though all 10 scenarios can be supported, may have a lot of unnecessary or redundant components in it or the support of a scenario is not very efficient (e.g., if one has to use 7-8 entities to support a single scenario, then in practice, this will result in a lot of JOIN operations and may take a while to process it). Therefore, I'll also consider how "good" your E/R design is as well. In short, the main grading rubrics are: Max points: 6 - Correct support of 10 scenarios: 5 (0.5 point each) - Goodness of the overall E/R design : 0.5 - Misc (notation errors, report, assumption) : 0.5 ______________________________________________________________________ Part 2: Turn-In of Proj #1 Report Your report should consist of at least the following: - On the cover page, list project team name (e.g., Team-01), and all member names - List 10 major assumptions that you made - E/R diagram - For each of 10 scenario, briefly explain why the scenario can be supported by your E/R design (e.g., "Scenario 3 can be supported by the binary relationship between the entities X and Y and their attributes Z...") - Meeting log: a detailed log showing each meeting date and activity: Date Absent Members Activity ------------------------------------------------ 2/3 John Doe brainstorming 2/4 None initial ER sketch ... Do NOT make your report lengthy unnecessarily -- be succinct and get to the point. The length of your report has nothing to do with the score that you are getting. If your project team has some problems (eg, some members don't come to project meetings), you need to inform me early so that I can intervene. Otherwise, all team members share the * IDENTICAL * scores for the projects. TURN-IN: Only *ONE* person from each team drops the report to ANGEL once AND give me a hard-copy in class.