IST 511
Project #2: Vertical Search Engine
(DUE: Oct. 30, Nov. 6, 11AM)

Team Assignment (4 or 5 members)
A vertical search engine is a niche search engine that focuses on a specific domain or business. Unlike MSN, Google, or Yahoo, which aim to cover the entire Web as completely as possible, a vertical search engine drills down into a focused area for deeper coverage (thus "vertical"). In computer science jargon, your search engine will, in effect, go depth-first instead of breadth-first. In Project #2, your task is to build a small (less than 3 GB) vertical search engine using the Apache Nutch toolkit. In a nutshell, your team needs to pick a domain to cover (e.g., Penn State Sports, PA Used Cars, PA State Parks), crawl only the relevant web pages, build an index and database, and provide keyword search through a web interface.
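To make the pipeline concrete, here is a minimal Python sketch of the three steps described above: keep only relevant pages, build an index, and answer keyword queries. It is a toy illustration only and is not Nutch's API; in the project itself, Nutch handles crawling and indexing. The host name, the topic pattern, and the function names (is_relevant, build_index, search) are all assumptions made up for this example.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Assumed placeholder values for a "PA State Parks" style domain.
ALLOWED_HOSTS = {"www.example-parks.org"}
TOPIC_PATTERN = re.compile(r"park|trail|camp", re.IGNORECASE)


def is_relevant(url):
    """Accept a URL only if it is on an allowed host and its path looks on-topic."""
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.hostname in ALLOWED_HOSTS
        and bool(TOPIC_PATTERN.search(parsed.path))
    )


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index (term -> set of URLs) from relevant pages only.

    `pages` maps each URL to its already-fetched page text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        if not is_relevant(url):
            continue  # skip off-topic or off-site pages
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)
```

For example, given a dictionary of fetched pages, build_index(pages) followed by search(index, "waterfall trail") returns only the on-topic pages from the allowed host that contain both words. A real engine would also rank the results, which this sketch leaves out.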

Turn-In @ ANGEL Drop Box

