The World Wide Web is arguably the greatest technological success in history. Starting from zero in 1990, it grew to 16 million pages by the end of 1995. As of 2008, more than 1.4 billion Web pages are accessible to anybody with an Internet connection and a computer. With the explosion of the Web, indexing and searching the content of all Web documents has become a perpetual challenge, in spite of the continuous technological advancement of Web search engines. The challenge is even greater for Web data that search engines cannot reach.
To address the problems associated with accessing rich, structured back-end data as well as ontology construction and use, we propose the Semantic Deep Web. As the name suggests, the Semantic Deep Web consists of elements from both the Deep Web and the Semantic Web, especially the hidden back-end data sources, the interface or Deep Web services that access these data sources, and programs to manipulate ontologies.
It’s important not to confuse the Semantic Deep Web with the Deep Semantic Web, which is part of the Semantic Web’s original vision. Whereas the Deep Semantic Web refers to the more complex and AI-oriented levels in the so-called Semantic Web layer cake, the Semantic Deep Web fuses aspects of the Semantic Web with the use of ontology-aware browsers to extract information from the Deep Web.
The primary goals of the Semantic Deep Web are to access Deep Web data through various Web technologies and to realize the Semantic Web’s vision by enriching ontologies using this data. Its research areas include
- Information extraction from the Deep Web, especially e-commerce sites;
- Semantic annotation and indexing of the Deep Web;
- Deep Web schema understanding based on data semantics;
- Semantic Deep Web search engines;
- Semantic Deep Web data fusion and interoperation;
- Semantic browsing and visualization of the Deep Web;
- Semiautomatic ontology generation from the Deep Web;
- Ontology quality measurements; and
- Quality measurements for Semantic Deep Web search and ranking.
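
To make the goal of enriching ontologies with Deep Web data more concrete, the following is a minimal sketch in Python, not a description of any particular system. Everything in it is an assumption made for illustration: the ontology is modeled as a plain dictionary from concept names to attributes and their observed values, the records stand in for rows that a hypothetical Deep Web query form on an e-commerce site might return, and `enrich_ontology` is a made-up helper name. Attributes that were previously unknown are returned rather than silently accepted, so that a human curator can review them, which is one way to read "semiautomatic".

```python
def enrich_ontology(ontology, concept, records):
    """Merge attribute/value pairs from Deep Web records into one concept.

    Returns the attribute names that were new to the concept, in the order
    they were first seen, so that a curator can review them.
    """
    attributes = ontology.setdefault(concept, {})
    new_attributes = []
    for record in records:
        for name, value in record.items():
            if name not in attributes:
                # Attribute not yet in the ontology: add it and flag it.
                attributes[name] = set()
                new_attributes.append(name)
            attributes[name].add(value)
    return new_attributes


# Hypothetical rows returned by a Deep Web query form (illustrative data).
records = [
    {"brand": "Acme", "color": "red"},
    {"brand": "Acme", "color": "blue", "size": "M"},
]

# Toy ontology: concept -> attribute -> set of observed values.
ontology = {"Shirt": {"brand": {"Acme"}}}

proposed = enrich_ontology(ontology, "Shirt", records)
print(proposed)  # attribute names proposed for curator review
```

A real system would also need schema matching (for example, recognizing that "colour" and "color" denote the same attribute) and a proper ontology language rather than a dictionary; the sketch leaves both out.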