Abstract:
Don’t worry if you don’t understand what the term “Deep Web” means. “Deep Web” is a loose term for the part of the internet that search engines cannot necessarily reach. It is often confused with the “Dark Web”. When you browse the internet, Deep Web content is usually right in front of you; you may just not know it yet. Whether you are searching for unstructured Big Data or trying to answer narrowly targeted questions, the content you need can typically be found somewhere within the millions of Deep Web sources.
Both public and private sector organizations are intrigued by the vast potential of harvesting unstructured content at scale from the internet, tagging entities in the metadata, and curating that semi-structured content into actionable intelligence. Questions about the process and possibilities of Deep Web harvesting, analytics, and data output come up frequently.
The Deep Web is the part of the internet that link-crawling search engines like Google cannot access. A user can reach this portion of the internet only by typing a directed query into a web search form, which retrieves database content that is not linked. In layman’s terms, the only way to access the Deep Web is to run a search within a particular website.
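To make the idea of a directed query concrete, here is a minimal sketch in Python. It assumes a hypothetical search form at https://example.com/search with a single field named q, and it simulates the site's database in memory instead of contacting a real server. The record texts and function names are illustrative only.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical database behind a site's search form.
# No page on the site links to these records.
RECORDS = [
    "2019 annual filing for Acme Corp",
    "2020 annual filing for Acme Corp",
    "2020 patent grant for Globex",
]

def build_query_url(form_action, fields):
    """Build the URL a browser requests when a GET search form is submitted."""
    return f"{form_action}?{urlencode(fields)}"

def handle_search(url):
    """Simulate the server side: return records that contain the 'q' term."""
    params = parse_qs(urlsplit(url).query)
    term = params.get("q", [""])[0].lower()
    return [record for record in RECORDS if term in record.lower()]

# The directed query: the form field value is what unlocks the content.
url = build_query_url("https://example.com/search", {"q": "acme corp"})
results = handle_search(url)

print(url)
for record in results:
    print(record)
```

The records are returned only in response to the query. Nothing in this toy site exposes them through a hyperlink, which is the property that keeps such content out of a link-crawling index.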
The Surface Web is the part of the internet that can be found via link-crawling techniques. Link crawling means that content is discovered by following hyperlinks, starting from the homepage of a domain. Google can find this Surface Web data.
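Link crawling can be sketched as a breadth-first traversal that starts at the homepage and follows every hyperlink it finds. The example below is a simplified illustration, not a description of how any real search engine works. It assumes a hypothetical site held in memory as a mapping from path to HTML, in which the page /record/42 exists but is not linked from anywhere.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical site: path -> HTML. "/record/42" exists but nothing links to it.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/products">Products</a>',
    "/about": '<a href="/">Home</a>',
    "/products": '<a href="/products/widget">Widget</a>',
    "/products/widget": "<p>Widget details</p>",
    "/record/42": "<p>Returned only by the site's search form</p>",
}

class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start="/"):
    """Return every path reachable from `start` by following hyperlinks."""
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        parser = LinkExtractor()
        parser.feed(PAGES.get(path, ""))
        for link in parser.links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

reachable = crawl()                  # the toy site's Surface Web
unreachable = set(PAGES) - reachable # content a link crawler never sees

print(sorted(reachable))
print(sorted(unreachable))
```

In this toy site the crawler finds the four linked pages and never discovers /record/42, because no hyperlink leads to it.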