What is the Deep Web and why should you care? Whether you are searching for unstructured Big Data or trying to answer narrowly targeted questions, what you need can typically be found somewhere within the millions of Deep Web sources.
The Deep Web consists of the parts of the World Wide Web that standard search engines do not index, so they stay hidden from anyone who relies on search alone. This may surprise you, but Google can't find everything. Google is only a Surface Web search engine, and the Deep Web is the part of the internet not accessible to link-crawling search engines like it. By a purist's definition, the Surface Web is anything that a search engine can find, while the Deep Web is anything that a search engine can't find.
This means anything stored in private cloud storage such as Google Drive or OneDrive, anything behind a paywall, anything that is password protected, and anything that is dynamically generated on the fly and doesn't have a permanent URL. All of these things are said to make up the Deep Web because they don't exist at the surface of the web.
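To make the crawling limitation concrete, here is a minimal Python sketch of how a link-following crawler discovers pages. The site map, the page names, and the `crawl` function are all hypothetical and stand in for real HTTP fetching. The only point is that a page no public page links to is never discovered.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
# "/inbox" and "/search?q=tor" exist on the server, but no public page
# links to them (one sits behind a login, the other is generated per query).
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/blog"],
    "/inbox": [],
    "/search?q=tor": [],
}


def crawl(start):
    """Breadth-first traversal that follows hyperlinks only."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in LINKS.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


surface = crawl("/")          # what the crawler can reach from the home page
deep = set(LINKS) - surface   # pages that exist but are never discovered

print(sorted(surface))  # ['/', '/about', '/blog', '/blog/post-1']
print(sorted(deep))     # ['/inbox', '/search?q=tor']
```

In this toy model the "deep" pages are perfectly reachable by anyone who knows the address or has the login. They are simply invisible to a crawler that only follows links.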
Everyone accesses the deep web routinely, every day: the emails in your Gmail account, your online bank statements, your office intranet, direct messages in Twitter, and photos you've uploaded to Facebook and marked as private. These are all part of the deep web.
It isn't known how large the deep web is, but estimates from researchers suggest it's likely to constitute the overwhelming majority of all online content.
One part of the Deep Web is the Dark Web. It is a subset of the Deep Web that is not only not indexed, but that also requires special software to access. The Dark Web is a term that refers specifically to a collection of websites that are publicly visible, but hide the IP addresses of the servers that run them. The Dark Web often sits on top of additional sub-networks, such as Tor, I2P, and Freenet, and is often associated with criminal activity of various degrees, including buying and selling drugs, pornography, gambling, etc.
Almost all sites on the so-called Dark Web hide their identity using Tor, an anonymity network built on layered encryption. You may know Tor for its end-user-hiding properties: you can use it to hide your identity and spoof your location. When a website is run through Tor, it has much the same effect.
Indeed, it multiplies the effect. To visit a site on the Dark Web that is using Tor, the web user also needs to be using Tor. Just as the end user's traffic is relayed through several layers of encryption so that it appears to come from another IP address on the Tor network, so is that of the website. The result is considerably more secrecy than the already private act of using Tor to visit a website on the open internet.
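The idea of layered encryption can be illustrated with a small Python sketch. It is a simplified model of onion-style routing, not Tor's actual protocol or cryptography: the cipher is a deliberately insecure toy, and the relay names, keys, and helper functions are hypothetical. It shows the sender wrapping a message in one layer per relay, and each relay peeling a single layer to learn only the next hop.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.

    Applying it twice with the same key restores the input.
    For illustration only; it is not secure and not what Tor uses.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Hypothetical three-relay route; each relay holds its own key.
ROUTE = [("entry", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]


def build_onion(message: bytes, route, destination: str) -> bytes:
    """Wrap the message in one layer per relay, innermost layer first."""
    payload, next_hop = message, destination
    for name, key in reversed(route):
        payload = toy_cipher(key, next_hop.encode() + b"|" + payload)
        next_hop = name
    return payload


def peel(packet: bytes, key: bytes):
    """Remove one layer, revealing the next hop and the still-wrapped rest."""
    plain = toy_cipher(key, packet)
    next_hop, _, inner = plain.partition(b"|")
    return next_hop.decode(), inner


packet = build_onion(b"GET /index.html", ROUTE, "destination")
hops = []
for name, key in ROUTE:
    next_hop, packet = peel(packet, key)
    hops.append((name, next_hop))

print(hops)    # [('entry', 'middle'), ('middle', 'exit'), ('exit', 'destination')]
print(packet)  # b'GET /index.html'
```

In this model no single relay sees both where the message came from and where it ends up, which is the property the onion layering is meant to provide.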
Not all Dark Web sites use Tor. Some use similar services such as I2P; indeed, the all-new Silk Road Reloaded uses this service. But the principle remains the same. The visitor has to use the same tool as the site and, crucially, know where to find the site in order to type in the URL and visit.
Common Dark Web resource types are media distribution sites, with an emphasis on specialized and particular interests, and exchanges where you can purchase illegal goods or services. These sites frequently require that visitors contribute before using them. This keeps the resource alive with new content and, for illegal content sites, helps ensure that everyone there shares a bond of mutual guilt, which reduces the chances that anyone will report the site to the authorities.
The Tor Browser is the main application for accessing the dark web. Tor stands for The Onion Router. The onion metaphor indicates the layers of security that work to conceal a user's location, and the browser enables you to access hidden websites with the .onion domain suffix. It can also be used to browse the surface web anonymously. Moore & Rid's research suggests that the dark web accounts for only 3 to 6 percent of Tor traffic, with the vast majority of users choosing to use Tor for privacy reasons.
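As a small illustration of how the domain suffix signals which network a site lives on, the following Python sketch classifies a URL by its hostname. The function name `network_for` and the example addresses are hypothetical, and the `.i2p` suffix for I2P sites is an addition not mentioned above. The function only inspects the name; it does not connect to anything.

```python
from urllib.parse import urlsplit


def network_for(url: str) -> str:
    """Guess the network a URL belongs to from its hostname suffix."""
    # urlsplit needs a scheme ("http://...") to recognize the hostname;
    # without one, hostname is None and the URL falls through to "clearnet".
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    return "clearnet"


# Hypothetical example addresses.
print(network_for("http://exampleaddress.onion/market"))  # tor
print(network_for("http://example.i2p/"))                 # i2p
print(network_for("https://www.example.com/"))            # clearnet
```

A regular browser cannot resolve the first two names at all, which is why the matching client software is needed before such a site can be visited.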