Starting with the Surface

To start our journey through the different aspects of the web, we’ll begin with the surface: the parts you’re most familiar with. The Surface Web is anything that can be indexed by a typical search engine like Google, Bing or Yahoo. Google has a great interactive story explaining in depth how it indexes and searches the web.

To help you understand how search engines work, open a traditional news or blog site (CNN, BBC, etc.) and begin clicking links to different article pages. When you’re finished, come back to this post.

If you’re done clicking links, you’ve just mimicked how search engines’ crawlers find and identify websites. Search engines rely on links between pages to discover content. This is a great way to find the new content on the web that most people care about (blogs, news, etc.). But navigating by links alone also misses a lot of content. Let’s go a little deeper to find out exactly what type of content is missed.
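To make the link-following idea concrete, here is a minimal sketch of a breadth-first crawler in Python. It is not how Google or any other engine actually works: the site, its URLs (under the reserved `.example` domain) and the `crawl` helper are all hypothetical, and pages come from an in-memory dictionary rather than real HTTP requests. The crawler starts at one page, extracts every `<a href>` link, and queues any page it hasn’t seen yet.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "website": URL -> HTML. A real crawler would fetch
# these over HTTP; a dict keeps the sketch self-contained.
SITE = {
    "https://news.example/": '<a href="/world">World</a> <a href="/tech">Tech</a>',
    "https://news.example/world": '<a href="/world/story-1">Story 1</a>',
    "https://news.example/tech": '<a href="/">Home</a> <a href="/tech/story-2">Story 2</a>',
    "https://news.example/world/story-1": "<p>No outgoing links.</p>",
    "https://news.example/tech/story-2": '<a href="/world/story-1">Related</a>',
    # This page exists on the server, but no other page links to it.
    "https://news.example/archive/story-0": "<p>Orphaned page.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag: the only thing this crawler follows."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed):
    """Breadth-first crawl: visit a page, queue every unseen link on it, repeat."""
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue:
        url = queue.popleft()
        html = SITE.get(url)
        if html is None:  # broken link or a page outside this toy site
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve "/tech" against the current page
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


if __name__ == "__main__":
    for page in crawl("https://news.example/"):
        print(page)
```

Running it prints the five linked pages in the order they were discovered. The orphaned archive page exists on the "server" but is never reached, because no link points to it: the same kind of blind spot the rest of this post explores.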

Moving a Little Deeper

From a purist’s standpoint, the Surface Web is anything that a search engine can find, while the Deep Web is anything that a search engine can’t find. The Forbes article we mentioned previously used a company called BrightPlanet’s definition of the Deep Web as its definition of the Dark Web (see the whitepaper below). There are a number of reasons a search engine can’t find data on the web; today I’ll cover the most common one.

Remember how I had you open a web page and crawl its links? Now I want you to open a different web page; let’s use the travel site Hotwire this time. Here’s the challenge: find the price of a hotel in Minneapolis, MN (my hometown) from April 10 to 12 (Minneapolis is still cold in April). But there’s a catch: you can only interact with the site the way a standard search engine would, meaning you can only click links to get there.

Hotwire provides a nice search box for users to fill out, but you can’t use it. Search engines don’t use search boxes; they just follow links. You’ll quickly find that without the search box you can’t reach the results you’re looking for. The results of a Hotwire search are perfect examples of Deep Web content.
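The search-box problem can be sketched the same way. Assuming a hypothetical travel site (not Hotwire’s real markup or URLs) whose home page has two ordinary links and a GET search form with made-up field names `city`, `checkin` and `checkout`, the snippet below separates what a link-only crawler can follow from what requires typed input. The helpers `scan` and `submit_get_form` are illustrative names, not a real library’s API.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin

# Hypothetical travel-site home page: two ordinary links plus a search form.
# The form's action and field names are illustrative, not a real site's.
HOME_URL = "https://travel.example/"
HOME_HTML = """
<a href="/deals">Deals</a> <a href="/about">About</a>
<form action="/hotels/search" method="get">
  <input name="city"> <input name="checkin"> <input name="checkout">
</form>
"""


class PageScanner(HTMLParser):
    """Separates <a href> links (followable by a link-only crawler) from
    <form> actions (which need user-supplied input to produce a page)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append({"action": attrs.get("action") or "", "fields": []})
            self._in_form = True
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.forms[-1]["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def scan(base_url, html):
    """Return (absolute link URLs, forms) found on one page."""
    scanner = PageScanner()
    scanner.feed(html)
    links = [urljoin(base_url, href) for href in scanner.links]
    return links, scanner.forms


def submit_get_form(base_url, form, values):
    """Build the URL a browser would request when a person submits a GET form."""
    return urljoin(base_url, form["action"]) + "?" + urlencode(values)


if __name__ == "__main__":
    crawlable, forms = scan(HOME_URL, HOME_HTML)
    print("A link-only crawler can follow:", crawlable)

    search_form = forms[0]
    print("The search form needs input for:", search_form["fields"])

    # Hypothetical query a person might type; a link-only crawler never fills this in.
    query = {"city": "Minneapolis", "checkin": "04-10", "checkout": "04-12"}
    results_url = submit_get_form(HOME_URL, search_form, query)
    print("Results page:", results_url)
    print("Discoverable by following links?", results_url in crawlable)
```

The results URL only comes into existence once someone supplies values, so no link on the site points to it. Every combination of city and dates yields a different results URL, which is why the pages behind a single form can far outnumber the linked pages a crawler can discover.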

Other examples of Deep Web content turn up almost any time you navigate away from Google and search directly within a website; government databases and libraries contain huge amounts of Deep Web data. Here are a few other examples:

1) North Dakota Court Record Search

2) Florida Medical License Database

Google search can’t find the pages behind these websites’ search boxes. Most Deep Web content lives on sites like these that require a search, and it isn’t the illicit, scary material the media portrays. However, if you go a little deeper into the Internet, you’ll find the Dark Web.

Getting a Little Darker

Continuing with our definitions, we’ve learned that the Surface Web is anything that a search engine can access and the Deep Web is anything that a search engine can’t access. The Dark Web, then, is a small portion of the Deep Web that has been intentionally hidden and is inaccessible through standard web browsers.

The most famous content on the Dark Web is found in the Tor network. In very simple terms, the Tor network is an anonymous network that can only be accessed with special software, most commonly the Tor Browser. This sounds simple; however, the network architecture and software that make up the Tor network are fairly complex and well beyond the scope of this article. Most importantly, this is the portion of the Internet most widely known for illicit activities, because of the anonymity the Tor network provides. If you want to delve deeper into this dark underbelly of the Internet, you can start by downloading and installing the Tor Browser from The Tor Project.

I’ll detail my walk on the dark side of the Internet in an upcoming post. For now, let’s just say that if you want something illegal, want something illegal done, or want to sell illegal wares or dark-side services, that dream can become a reality with a few simple keystrokes in the Tor Browser and a look through the various online marketplaces of the Dark Web.

For the chart and infographic types, this graphic might clear things up:

Inaccurate Definitions

The key thing to keep in mind is that the Dark Web is a small portion of the Deep Web. Some media outlets are defining both inaccurately, and we want to do our best to clear up the confusion.

Want to learn more about the Deep Web? Here is a whitepaper that does a fantastic job of explaining it; you can download it by clicking here: Download