First Int'l VLDB Workshop on Clean Databases (2006)

September 11, 2006 — Seoul, Korea
(Co-located with VLDB 2006)


  • + 2006.08.10: Final Program is ready.
  • + 2006.07.31: Dr. Divesh Srivastava of AT&T Labs will give an invited talk titled "The Bellman Data Quality Browser".
  • + 2006.07.21: 9 papers are accepted.
  • + 2006.07.14: Review is being delayed. Notification of accepted papers will be announced around July 21 instead of July 18 (sorry for the delay).
  • + 2006.06.16: Deadline for paper submission has passed.
  • + 2006.05.31: Deadline is extended by 2 weeks to June 16.
  • + 2006.05.01: Submission Site is open!
  • + 2006.02.26: CleanDB is officially accepted as a VLDB 2006 workshop.

Welcome to CleanDB 2006

The First International VLDB Workshop on Clean Databases (CleanDB) is a forum focusing on the issues of maintaining and improving the "Quality of Data" (QoD) toward clean databases. The existence of poor or erroneous data in databases causes the so-called Garbage-in, Garbage-out problem. For any mission-critical analysis or application, the first and foremost task is to improve the quality of data. However, as the sources of data become diverse, their formats become heterogeneous, and the volume of data grows rapidly, maintaining and improving the quality of such data gets harder. The CleanDB workshop solicits papers on database-centric data quality problems and solutions.

Topics suitable for this workshop include (but are not limited to):

  • + Data quality metrics
  • + Algorithms for data cleaning and reconciliation
  • + Document quality
  • + Data integration and fusion
  • + Data quality and cleaning in information extraction
  • + Data quality and cleaning in semi-structured data
  • + Data quality and cleaning in multimedia data
  • + Data quality and cleaning in graphs (e.g., web, network)
  • + Data quality and cleaning in sensor data


We welcome original, unpublished manuscripts for both short (1-4 pages) and long (5-8 pages) papers, inclusive of all references and figures. Long papers should report completed results, while vision papers and descriptions of work in progress are welcome as short paper submissions. Papers must be written in English and formatted according to either the VLDB or SIGMOD proceedings format (if accepted, the camera-ready version must be in VLDB format). Templates are located at: Open: Paper Submission Site

Organizing Committee

Program Co-Chairs

  • + Dongwon Lee, Penn State Univ., USA (dongwon @
  • + Chen Li, UC Irvine, USA (chenli @

Steering Committee

+ Surajit Chaudhuri, Microsoft Research, USA
+ Chin-Wan Chung, KAIST, Korea
+ Johann-Christoph Freytag, Humboldt-Universitat zu Berlin, Germany
+ Divesh Srivastava, AT&T Labs - Research, USA
+ William E. Winkler, US Census, USA

Program Committee

+ Laure Berti-Equille, IRISA (France)
+ Angela Bonifati, ICAR-CNR (Italy)
+ K. Selcuk Candan, Arizona State Univ. (USA)
+ Maria R. Ebling, IBM (USA)
+ Ahmed K. Elmagarmid, Purdue Univ. (USA)
+ Helena Galhardas, INESC-ID (Portugal)
+ Venky Ganti, Microsoft Research (USA)
+ Michael Gertz, UC Davis (USA)
+ Mauricio A. Hernandez, IBM Almaden (USA)
+ Matthias Jarke, RWTH Aachen (Germany)
+ Dmitri Kalashnikov, UC Irvine (USA)
+ Jaewoo Kang, North Carolina State Univ. (USA)
+ Nick Koudas, Univ. of Toronto (Canada)
+ Mong Li Lee, National Univ. of Singapore (Singapore)
+ Sharad Mehrotra, UC Irvine (USA)
+ Prasenjit Mitra, Penn State Univ. (USA)
+ Amihai Motro, George Mason Univ. (USA)
+ Vivek Narasayya, Microsoft Research (USA)
+ Felix Naumann, Humboldt-Universitat zu Berlin (Germany)
+ Vincent Oria, NJIT (USA)
+ Sanghyun Park, Yonsei Univ. (Korea)
+ Kai-Uwe Sattler, Technische Universitat Ilmenau (Germany)
+ Monica Scannapieco, Universita di Roma (Italy)
+ Anthony Tung, National Univ. of Singapore (Singapore)
+ Can Turker, FGCZ Zurich (Switzerland)
+ Panos Vassiliadis, Univ. of Ioannina (Greece)
+ Vassilios Verykios, Univ. of Thessaly, Volos (Greece)
+ Ji-Rong Wen, Microsoft Research Asia (China)
+ Qinghua Zou, Microsoft (USA)

Invited Talk

The Bellman Data Quality Browser
(Divesh Srivastava at AT&T Labs -- Research)

Data quality is a serious concern in complex industrial-scale databases, which often have thousands of tables and tens of thousands of columns. Commonly encountered problems include missing data (null values), duplicates and default values in columns that are supposed to be treated as keys, data inconsistencies (violations of functional dependencies), and poor-quality join paths (lack of referential integrity). Compounding the data quality problems are incomplete and out-of-date metadata about the database and the processes used to populate it. These problems make the task of analyzing data particularly challenging. To address such problems effectively, we have built the Bellman data quality browser at AT&T. Bellman profiles the database and computes concise statistical summaries of its contents to identify approximate keys, frequent values of a field (often default values), joinable fields with estimates of join sizes, and join paths, and to understand database dynamics (changes in a database over time). In this talk, I'll describe the technology underlying Bellman and how it is used to help make sense of complex databases.

Important Dates
(9pm in PST)

June 16, 2006 (extended from June 02)
      Paper deadline
July 21, 2006 (moved from July 03)
      Notification of accepted papers
July 31, 2006 (moved from July 12)
      Camera ready