IST 511: Information and Technology
Fall 2008
Proj #1 (DUE: Sep. 24)
____________________________________________________________

Overview
--------
IST consists of 30-40 faculty who collaborate with each other on many tasks. Two common collaboration tasks among faculty members are research "publication" and "funding." This kind of interaction among faculty may form a complex collaboration network in which a vertex is a faculty member and an edge is co-authorship of research papers and/or funding proposals. In Proj #1, therefore, we will use this network to practice various social network analysis techniques.

Data Sets
---------
Publication list: From the faculty list, home pages of all faculty members can be found. Typically, each faculty member keeps an up-to-date list of publications on their home page. You can ignore those faculty whose home pages or publication pages are not available from the above list.

Funding list: Use the funding lists for 2002 -- 2006 and 2007 -- 2008. (NOTE: this data is provided only for IST 511 by the IST research office; please do not distribute it outside of the class.)

Task
----
Step 0: Teams are randomly assigned by Sep. 3.

Step 1: Gather data sets from the two lists and modify/clean them to suit your purpose. Assume that we use the names of faculty as their unique ID. That is, if two publication lists use the name spelling "Dongwon Lee", then we assume that both lists refer to the same person. The difficulty is that the way names are written is not uniform. Different pages may use different forms of the same name, such as "Dongwon Lee", "Lee, D.", or "D. Lee". You don't have to figure out how to deal with this automatically. For now, just decide on a policy for how you will manually determine whether two names refer to the same person (e.g., if the first initial and full last name are the same, as in "D. Lee" and "Dongwon Lee", the two names are assumed to be the same person).
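For teams that prefer to write their name-matching policy down as code, the example policy above (same first initial and same full last name) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a required method: the function names `name_key` and `same_person` are made up for this sketch, and it assumes every name is written either as "First Last" or as "Last, First".

```python
def name_key(raw):
    """Reduce a name string to (first initial, last name), both lowercased.

    Assumption: names look like "Dongwon Lee", "D. Lee", or "Lee, D.".
    Other layouts (suffixes, multi-word last names) are not handled.
    """
    name = raw.strip()
    if not name:
        raise ValueError("empty name")
    if "," in name:
        # "Lee, D." style: the last name comes before the comma.
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        # "Dongwon Lee" style: the final token is taken as the last name.
        tokens = name.split()
        first, last = " ".join(tokens[:-1]), tokens[-1]
    # First alphabetic character of the given name(s), or "" if none is present.
    initial = next((ch for ch in first if ch.isalpha()), "")
    return (initial.lower(), last.strip(" .").lower())


def same_person(a, b):
    """Apply the policy: same first initial and same full last name."""
    return name_key(a) == name_key(b)
```

With this sketch, `same_person("Dongwon Lee", "Lee, D.")` is true and `same_person("Dongwon Lee", "Lee Giles")` is false. The policy is deliberately crude: two different people who share an initial and a last name would be merged, and one person whose name appears with an extra leading initial on some pages would be split in two, so a manual check of the matches is still advisable.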
If you are more ambitious and aim for a higher score, you may decide to write a small piece of code that fetches publication pages from the Web, parses them, and extracts names and their collaboration data automatically. However, it is just fine to do this step manually (e.g., if you see the name "Lee Giles" on the publication page of "Dongwon Lee", then you record in your data that the two faculty members have collaborated). Note also that the input data format varies with the analysis software you use (e.g., matrix format, Excel format).

Step 2: Form your research questions or hypotheses and design your analysis. In essence, in this step you need to decide what you want to find out from this study. For instance,

- If your hypothesis is that "faculty collaborate differently, depending on their gender/rank/major/etc", then you need to think about how to prove/disprove your hypothesis.
- Or, if your research question is "Who is the research hub/center/boundary/strategic-location/etc of IST?", then you need to think about which social network measures (e.g., degree, closeness, betweenness) to use to answer your question.
- Yet another question could involve some kind of dimensional analysis (e.g., temporal), like "The degree of collaboration has increased over time", "Faculty working on research area X tend to collaborate more often than those in area Y because ...", or "The average # of co-authors in research area X is larger than in Y because ..."
- This is really open to any kind of question. The more interesting/challenging/non-trivial-to-answer your questions are, the higher the score you will get. More ideas for research questions can be found here:
  - On Six Degrees of Separation in DBLP-DB and More
  - Who is the best connected scientist?
    A study of scientific coauthorship networks
  - Modeling Idiosyncratic Properties of Collaboration Networks Revisited

Step 3: Analyze the data sets using software programs to prove/disprove your research questions or hypotheses. There are many software packages for social network analysis. You are encouraged to explore them and choose your own (I have used Pajek and Graphviz with good results before). At the end, each team should show:

- An overall collaboration network diagram w.r.t. publication and funding
- At least ** THREE ** interesting, non-trivial research questions, with a description of how you proved/disproved each

Turn-In @ ANGEL Drop Box
------------------------
Turn in your final **ANONYMIZED** report (no author or team names) to ANGEL by the due date/time (HARD DEADLINE). As the title of your drop in the drop box, however, put your name or team name so that I can recognize it. Your report should describe in detail how you carried out the three steps above, written like an academic paper (not a dull project report). Unless there is a problem between team members, all members will share the same project score.
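
A note on the measures mentioned in Step 2: if you want to sanity-check what your analysis software reports, degree and closeness can be computed by hand on a small network. The sketch below is only an illustration, not the required approach. The function names are made up for this example, the author labels "A" to "E" are placeholders rather than real data, and closeness is defined here as (number of reachable co-authors) / (sum of shortest-path lengths to them), which is one of several conventions in use.

```python
from collections import deque
from itertools import combinations


def build_network(author_lists):
    """Build an undirected co-authorship graph: author -> set of co-authors.

    Each item in author_lists is the author list of one paper or proposal.
    """
    graph = {}
    for authors in author_lists:
        unique = sorted(set(authors))
        for a in unique:
            graph.setdefault(a, set())  # keep solo authors as isolated vertices
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree(graph):
    """Degree of each vertex: the number of distinct co-authors."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def closeness(graph, source):
    """Closeness of source within its connected component (0.0 if isolated)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search yields shortest path lengths
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Placeholder records: one author list per paper.
papers = [["A", "B"], ["B", "C"], ["B", "C", "D"], ["E"]]
network = build_network(papers)
```

Here "B" has co-authored with everyone except "E", so it has the highest degree and a closeness of 1.0 within its component, while "E" is an isolated vertex with degree 0.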