IST 511: Information and Technology

   		      Fall 2008

                Proj #1 (DUE: Sep. 24)
____________________________________________________________

Overview
--------

IST consists of 30-40 faculty who collaborate with each other on many
tasks. Two common collaboration tasks among faculty members are
research "publication" and "funding." This kind of interaction among
faculty may form a complex collaboration network where a vertex is a
faculty member and an edge is co-authorship of research papers and/or
funding proposals. In Proj #1, therefore, we will use this network to
practice various social network analysis techniques.

Data Sets
---------

Publication list: From the faculty list, home pages of all faculty
members can be found. Typically, each faculty member has an up-to-date
list of publications on his or her home page. You can ignore those
faculty whose home pages or publication pages are not available from
the above list.

Funding list: Use the funding lists for 2002 -- 2006 and 2007 -- 2008.
(NOTE: this data is provided only for IST 511 by the IST research
office. Please do not distribute it outside of the class.)

Task
----

Step 0: Teams are randomly assigned by Sep. 3.

Step 1: Gather data sets from the two lists and modify/clean them to
suit your purpose. Assume that we use the names of faculty as their
unique ID. That is, if two publication lists use the name spelling
"Dongwon Lee", then we assume that both lists refer to the same
person. The difficulty is that the way names are written is not
uniform. Different pages may use different forms of the same name, such
as "Dongwon Lee", "Lee, D.", or "D. Lee". You don't have to resolve
these variations automatically. For now, just decide on a policy for
manually determining whether two names refer to the same person
(e.g., if the first initial and full last name are the same, as in
"D. Lee" and "Dongwon Lee", the two names are assumed to be the same
person).
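
The example policy above (same first initial and same full last name)
can be sketched in Python as follows. This is only an illustration,
not a required approach: the function names name_key and same_person
are hypothetical, and the code assumes names appear in one of the
three forms mentioned above. Middle names, suffixes, and multi-word
last names would need more care.

```python
def name_key(name):
    """Reduce a name to (first initial, last name), both lowercase.

    Assumes either "Last, First" (comma form) or "First ... Last" order.
    """
    name = name.strip()
    if "," in name:
        # Comma form, e.g. "Lee, D."
        last, first = [part.strip() for part in name.split(",", 1)]
    else:
        # Space-separated form, e.g. "Dongwon Lee" or "D. Lee"
        parts = name.split()
        first, last = parts[0], parts[-1]
    first = first.replace(".", "")
    return (first[0].lower(), last.lower())


def same_person(a, b):
    """Policy: same first initial and same full last name."""
    return name_key(a) == name_key(b)
```

Note that this policy also merges two different people who share a
first initial and a last name, so a manual check of the merged names
is still advisable.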

If you are more ambitious and want a higher score, you may
decide to write a small piece of code that fetches publication
pages from the Web, parses them, and extracts names and
collaboration data automatically. However, it is just fine to
do this step manually (e.g., if you see the name "Lee Giles" on
the publication page of "Dongwon Lee", then you mark in your
data that the two faculty members have collaborated). Note also
that the input data format varies depending on the analysis
software that you use (e.g., matrix format, Excel format).
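
As one possible way to go from cleaned records to a matrix format,
the sketch below counts shared papers per pair of faculty and builds
a symmetric adjacency matrix. The function names and the sample
records ("A", "B", "C", "X") are made up for illustration. The code
assumes each paper is already a list of normalized author names.

```python
from collections import Counter
from itertools import combinations


def coauthor_counts(papers, faculty):
    """Count, for each unordered pair of faculty, the papers they share."""
    counts = Counter()
    for authors in papers:
        # Keep only listed faculty, de-duplicated and in a stable order.
        present = sorted(set(authors) & set(faculty))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts


def adjacency_matrix(counts, faculty):
    """Symmetric matrix; rows and columns follow the order of `faculty`."""
    index = {name: i for i, name in enumerate(faculty)}
    n = len(faculty)
    matrix = [[0] * n for _ in range(n)]
    for (a, b), weight in counts.items():
        matrix[index[a]][index[b]] = weight
        matrix[index[b]][index[a]] = weight
    return matrix


# Made-up records for illustration only (not real publication data).
# "X" stands for a co-author who is not on the faculty list.
faculty = ["A", "B", "C"]
papers = [["A", "B"], ["A", "B", "X"], ["B", "C"]]
counts = coauthor_counts(papers, faculty)
matrix = adjacency_matrix(counts, faculty)
```

The same counts can be written out as an edge list instead if your
analysis software prefers that format.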

Step 2: Form your research questions or hypotheses and design your
analysis. In essence, in this step, you need to decide what you want
to find out from this study. For instance,

- If your hypothesis is that "faculty collaborate differently,
depending on their gender/rank/major/etc.", then you need to think
about how to prove/disprove your hypothesis.

- Or, if your research question is "Who is the research
hub/center/boundary/strategic-location/etc. of IST?", then you need to
think about which social network measures (e.g., degree, closeness,
betweenness) to use to answer your question.

- Yet another question could involve some kind of dimensional analysis
(e.g., temporal) like "The degree of collaboration has increased over
time", or "Faculty working on research area X tend to collaborate
more often than those in area Y because ...", or "The average # of
co-authors in research area X is larger than in Y because ..."

- It's really open to any kind of question. The more
interesting/challenging/non-trivial-to-answer your questions are, the
higher the score you will get.
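
To make the measures mentioned above (degree, closeness, betweenness)
concrete, here is a minimal pure-Python sketch for an unweighted,
undirected network. The function names and the five-vertex sample
network are hypothetical. Closeness here is (number of reachable
vertices) / (sum of distances to them), which is one common
normalization, and betweenness follows Brandes' algorithm. Dedicated
software will compute the same measures for you.

```python
from collections import deque


def build_graph(edges):
    """Adjacency sets for an undirected graph given as (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree(graph):
    return {v: len(neighbors) for v, neighbors in graph.items()}


def closeness(graph):
    result = {}
    for source in graph:
        # Breadth-first search gives shortest-path distances from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def betweenness(graph):
    """Unnormalized betweenness (Brandes' algorithm, unweighted graph)."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        order = []                       # vertices in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is counted from both endpoints, so halve.
    return {v: value / 2 for v, value in score.items()}


# Made-up network for illustration only.
graph = build_graph([("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")])
deg = degree(graph)
clo = closeness(graph)
btw = betweenness(graph)
```

In this sample network, vertex "B" scores highest on all three
measures, which is the kind of evidence a "who is the hub?" question
would rest on.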

More ideas for research questions can be found here:

- On Six Degrees of Separation in DBLP-DB and More 

- Who is the best connected scientist? A study of scientific
coauthorship networks

- Modeling Idiosyncratic Properties of Collaboration Networks Revisited

Step 3: Analyze the data sets using software programs to
prove/disprove your research questions or hypotheses. There are many
software packages for social network analysis. You are encouraged to
explore those and choose your own software (I have used Pajek and
Graphviz with good results before). At the end, each team should show:

- An overall collaboration network diagram w.r.t. publication and funding

- At least ** THREE ** interesting non-trivial research questions,
with a description of how you prove/disprove each of them
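
If you choose Graphviz or Pajek for the network diagram, the sketch
below shows one way to write weighted collaboration pairs in the
Graphviz DOT format and in the Pajek .net format. The function names
and sample data are hypothetical, and the code assumes that names
contain no double-quote characters and that weights are integers.

```python
def to_dot(weighted_edges, name="collaboration"):
    """Undirected Graphviz DOT text; edge labels show the weight."""
    lines = ["graph %s {" % name]
    for (a, b), weight in sorted(weighted_edges.items()):
        lines.append('  "%s" -- "%s" [label="%d"];' % (a, b, weight))
    lines.append("}")
    return "\n".join(lines)


def to_pajek(weighted_edges):
    """Pajek .net text with a *Vertices section and an *Edges section."""
    vertices = sorted({v for pair in weighted_edges for v in pair})
    index = {v: i + 1 for i, v in enumerate(vertices)}  # ids start at 1
    lines = ["*Vertices %d" % len(vertices)]
    lines += ['%d "%s"' % (index[v], v) for v in vertices]
    lines.append("*Edges")
    for (a, b), weight in sorted(weighted_edges.items()):
        lines.append("%d %d %d" % (index[a], index[b], weight))
    return "\n".join(lines)


# Made-up pairs for illustration only.
sample = {("A", "B"): 2, ("B", "C"): 1}
dot_text = to_dot(sample)
pajek_text = to_pajek(sample)
```

After saving dot_text to a file such as network.dot, a command like
"dot -Tpng network.dot -o network.png" renders the diagram.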

Turn-In @ ANGEL Drop Box
------------------------

Turn in your final **ANONYMIZED** report (no author or team names) to
ANGEL by the due date/time (HARD DEADLINE). As the title of your
submission in the drop box, however, put your name or team name so
that I can recognize it.

Your report should describe in detail how you carried out the three
steps above, in the style of an academic paper (not a dull project
report).

Unless there is a problem between team members, all members will share
the same project score.