Last generated: Fri Jul 6 00:43:25 2007 EST.
The Quagga project has moved into phase 2 as
the new Data Linkage
project.
Real data are often dirty. Despite active research on
integrity constraints enforcement and data cleaning, real data in real
database applications are still dirty. To make matters worse, both the
diverse formats and usages of modern data and the demand for large-scale
data handling make the problem even harder. The research goal of the
QUAGGA project is, therefore, to contribute to the effective
improvement of data quality by investigating foundational theories,
developing efficient, effective and scalable algorithms, and building
tools and systems. In particular, to surmount the challenges for
which conventional solutions no longer work, we identify
four tasks: (1) context-aware error detection; (2) scalable error
detection; (3) fixing errors with tools or in systems; and (4)
prototyping and evaluation.
Specifically, we revisit the following well-defined and
well-studied (but greatly overlapping and related) problems from
the database perspective and with scalability in mind. Eventually, we
aim at building an integrated solution and toolkit in SQL/RDBMS that
is general enough to be used for various problems across diverse
domains.
Someone needs to unify all this confusing nomenclature...
Members:
Collaborators:
- Min-Yen Kan, National University of Singapore, Singapore
- Jaewoo Kang, Korea University, Korea
- Nick Koudas, University of Toronto, Canada
- Prasenjit Mitra, Penn State University, USA
- Jian Pei, Simon Fraser University, Canada
- Divesh Srivastava, AT&T Labs - Research, USA
2007
- Are Your Citations Clean? New Challenges and Scenarios in Maintaining Digital Libraries, Dongwon Lee, Jaewoo Kang, Prasenjit Mitra, C. Lee Giles, Byung-Won On, In ACM Comm. of the ACM (CACM), 2006 (Forthcoming)
- Group Linkage, Byung-Won On, Nick Koudas, Dongwon Lee, Divesh Srivastava, In 23rd Int'l Conf. on Data Engineering (ICDE), page x-x, Istanbul, Turkey, April 2007
2006
- Improving Grouped-Entity Resolution using Quasi-Cliques, Byung-Won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei, In IEEE Int'l Conf. on Data Mining (ICDM), Hong Kong, China, December 2006
- An Effective Approach to Entity Resolution Problem Using Quasi-Clique and its Application to Digital Libraries, Byung-Won On, Dongwon Lee, Ergin Elmacioglu, Jaewoo Kang, Jian Pei, In ACM/IEEE Joint Conf. on Digital Libraries (JCDL), Chapel Hill, NC, USA, June 2006 (Short)
- Search Engine Driven Author Disambiguation, Yee Fan Tan, Min-Yen Kan, Dongwon Lee, In ACM/IEEE Joint Conf. on Digital Libraries (JCDL), Chapel Hill, NC, USA, June 2006 (Short)
2005
- Establishing Value Mappings Using Statistical Models and User Feedback, Jaewoo Kang, Tae Sik Han, Dongwon Lee, Prasenjit Mitra, In 14th ACM Int'l Conf. on Information and Knowledge Management (CIKM), Bremen, Germany, November 2005 Acceptance Rate: 18% (76/425)
- Identifying Value Mappings for Data Integration: an Unsupervised Approach, Jaewoo Kang, Dongwon Lee, Prasenjit Mitra, In 6th International Conf. on Web Information Systems Engineering (WISE), New York, NY, USA, November 2005 Acceptance Rate: 21%
- Blocking-Aware Private Record Linkage, Ali Al-Lawati, Dongwon Lee, Patrick McDaniel, In ACM SIGMOD Workshop on Information Quality in Information Systems (IQIS), page 59-68, Baltimore, MD, USA, June 2005 Acceptance Rate: 42% (11/26)
- Effective and Scalable Solutions for Mixed and Split Citation Problems in Digital Libraries, Dongwon Lee, Byung-Won On, Jaewoo Kang, Sanghyun Park, In ACM SIGMOD Workshop on Information Quality in Information Systems (IQIS), page 69-76, Baltimore, MD, USA, June 2005 Acceptance Rate: 42% (11/26)
- Comparative Study of Name Disambiguation Problem using a Scalable Blocking-based Framework, Byung-Won On, Dongwon Lee, Jaewoo Kang, Prasenjit Mitra, In ACM/IEEE Joint Conf. on Digital Libraries (JCDL), Denver, CO, USA, June 2005 Acceptance Rate: 21% (59/270)
- On Six Degrees of Separation in DBLP-DB and More, Ergin Elmacioglu, Dongwon Lee, In ACM SIGMOD Record, Vol. 34, No. 2, page 33-40, June 2005
2004
The quagga (Equus quagga) is a recently extinct mammal, closely
related to horses and zebras. It was a yellowish-brown zebra with
stripes only on its head, neck and forebody. The quagga was native to
desert areas of the African continent until it was exterminated in the
wild in the 1870s. The last captive quaggas died in Europe in the
1880s.
The project logo was adapted from a figure from The Quagga
Project at the South African Museum. That project is an attempt by a
group of dedicated people in South Africa to bring an animal back from
extinction and reintroduce it into reserves in its former habitat.
The figure on the right shows the only quagga ever
photographed alive, a mare at the London Zoo.
Five photographs of her are known, taken by Frederick York
and Frank Haes circa 1870.