by Jamie Holmes
While the Dark Web is widely perceived as a paradise for terrorists, drug cartels, and other shady characters, few scientific studies examine what is actually out there. To fill the void, Terbium Labs has recently released a study that asks, “What is actually out there on the Dark Web?”
Using its patented technologies, including ‘data fingerprinting’ (a hashing protocol that generates one-way cryptographic hashes) and a large-scale crawler that constantly indexes the Dark Web, the team at Terbium Labs generated billions of new fingerprints of Dark Web data. The findings are interesting and, in places, counter-intuitive.
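To make the idea of a one-way fingerprint concrete, here is a minimal sketch in Python. Terbium's patented protocol is not described in this article, so SHA-256 from the standard library stands in for it, and the function names and the whitespace/case normalization are assumptions made purely for illustration.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a one-way SHA-256 digest of the text (illustrative stand-in)."""
    # Assumption: collapse whitespace and lowercase so trivial formatting
    # differences do not change the fingerprint.
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches(candidate: str, known_fingerprints: set[str]) -> bool:
    """Check crawled text against known fingerprints without storing the originals."""
    return fingerprint(candidate) in known_fingerprints


# Hypothetical record; only its digest is kept for matching.
known = {fingerprint("Example Customer Record 12345")}
found = matches("example   customer record 12345", known)
```

The point of the one-way hash is that crawled data can be compared against the digests alone, and the digest cannot feasibly be turned back into the original text.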
Past reports have offered answers to the same question, but none have been conducted with the rigor demonstrated by Terbium Labs, whose research piece clearly explains its methodology, limitations, and data sources.
By drawing a random sample from the population of Tor Hidden Services found by Terbium's automated big data crawler, the study aims to minimize selection bias, which distorts the results when humans choose the sample instead:
“We reviewed a sample of 400 URLs from a single day in our automated crawler’s history. URLs (as opposed to domains) were used as the independent unit within the sample. The sample was selected at random from the population of URLs known to our unrivaled big-data infrastructure that crawls the dark web continuously.”
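The sampling step described in the quote can be sketched as a simple random draw without replacement. This is only an illustration: the URL list is made up, and `draw_sample` and its parameters are hypothetical names, not part of Terbium's system.

```python
import random


def draw_sample(urls, k=400, seed=None):
    """Draw k URLs uniformly at random, without replacement.

    Every URL has the same chance of selection, so no human judgment
    (and therefore no selection bias) enters the choice.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(urls), k)


# Hypothetical population standing in for one day of crawler history.
population = [f"http://example{i}.onion/page" for i in range(10_000)]
sample = draw_sample(population, k=400, seed=2016)
```

Fixing the seed is a design choice for reproducibility; it lets someone else re-draw exactly the same 400 URLs from the same population.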
While the Dark Web has a reputation for illicit activity based on its structural anonymity and a strong focus on ensuring privacy, the sites contained a large proportion of 'Legal' activity according to Terbium’s results, as illustrated below.
Source: Terbium Labs
What does 'Legal' content on the Dark Web look like? According to Terbium Labs, it ranges from Facebook profiles, Scandinavian political parties, and personal blogs about security to forums for discussing privacy and forums for discussing personal problems that perhaps deserve anonymity, such as erectile dysfunction. However, the study also notes that content deemed 'Legal' is not necessarily safe and can still be dangerous.
Scant Evidence for the Arms Trade/Extremism
One of the biggest myths dispelled by the report is the prevalence of weapons on the Dark Net. Although the study recognizes that the arms trade is active on sites reached through Tor, it is nowhere near as rampant as commonly assumed; it exists only in isolated pockets, so much so that the sample did not contain a single weapons site. Moreover, the study highlights that most of these 'dark' markets responded to terror attacks in 2015 and 2016 by delisting weapons.
The prevalence of extremism is another myth associated with the Dark Web. According to the study, it made up less than 1 percent of the sample: only one instance of extremism was observed among the 400 URLs reviewed. The study does mention that an official ISIS '.onion' site surfaced in 2015 but was swiftly taken down by Anonymous. The results regarding weapons and extremism could suggest that the Dark Web, much like Bitcoin, has some sort of self-regulating property.
Following legal content, the next largest observed categories were 'Drugs' and 'Explicit.' Unsurprisingly, when legal content is excluded, almost half of what remains is related to drugs: 44.95 percent of the non-legal URLs in the 400-URL sample.
Source: Terbium Labs
'Explicit' content refers to non-exploitative content, namely legal pornography, which goes to show that the desire for pornography is not limited solely to the 'clear' web. Approximately 6 percent of the sampled URLs contained this type of content.
The chart below shows the confidence intervals for each category based on the results. For categories such as 'Weapons', 'WMDs', and 'Extremism', zero lies within the confidence interval, suggesting that if Terbium Labs were to draw a new random sample, it might fail to observe any of these categories some of the time. However, there are limits to statistical inference: the researchers can never know the true underlying composition of the population, which would otherwise inform the design of such a study.
Source: Terbium Labs
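To see how zero can fall inside a confidence interval, consider the one extremism URL out of 400. The sketch below uses the textbook normal-approximation (Wald) interval for a proportion; this article does not say which interval method the study used, so treat the choice of formula as an assumption.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to roughly 95 percent confidence.
    """
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


# One observed instance in a sample of 400 URLs.
low, high = wald_interval(1, 400)
print(f"{low:.4f} to {high:.4f}")  # lower bound is below zero
```

With so few observations the interval's lower bound dips below zero, which is why a fresh sample of the same size could plausibly contain no such URLs at all.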
Limitations of the Study
While the results offer interesting insights into the nature of the Dark Web, the study has limitations. It provides only a snapshot of the Dark Web on a single day. It is also difficult to define the boundaries of the Dark Net: the study defines it as any URL with a .onion domain. The results could differ substantially if, for example, 'carding' markets that do not operate on Tor were included, along with other avenues into unfrequented parts of the web.
Some of the surprising findings could be due to sampling error, or to the choice to measure the Dark Web by counting URLs. A different approach might change the results dramatically, as the closing words of the study emphasize:
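The .onion definition is easy to state in code, which also shows what it leaves out. The following is a rough sketch using Python's standard URL parser; the function name and example URLs are hypothetical, and the study's actual filtering logic is not described here.

```python
from urllib.parse import urlparse


def is_onion_url(url: str) -> bool:
    """Return True if the URL's host ends in .onion (a Tor Hidden Service)."""
    host = urlparse(url).hostname  # None when the URL has no scheme or host
    return host is not None and host.endswith(".onion")


in_scope = is_onion_url("http://examplehiddenservice.onion/market")
out_of_scope = is_onion_url("https://carding-forum.example.com/listing")
```

Anything hosted outside Tor, such as the second URL, never enters the population under this definition, regardless of what it sells.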
"Our team of analysts know for certain that weapons and counterfeits do exist on the dark web. Our team is not aware of clear examples of weapons of mass destruction or violent extremism. If we were to take the Bayesian approach to this inference, we could have used this existing knowledge to update and strengthen our inferences."
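As a rough illustration of what such a Bayesian update could look like, the sketch below uses a Beta prior with a binomial observation, a standard textbook model for a proportion. The prior values are invented for the example and do not come from the study, which does not specify how it would have encoded its analysts' knowledge.

```python
def beta_binomial_update(prior_alpha: float, prior_beta: float, hits: int, n: int):
    """Update a Beta(alpha, beta) prior after seeing `hits` successes in n trials."""
    return prior_alpha + hits, prior_beta + (n - hits)


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: analysts believe about 1 percent of URLs involve weapons.
prior = (1.0, 99.0)
# Observed in the sample: zero weapons URLs out of 400.
posterior = beta_binomial_update(*prior, hits=0, n=400)
estimate = beta_mean(*posterior)  # 1 / 500 = 0.002
```

Unlike the raw sample proportion of zero, the posterior estimate stays above zero, reflecting the analysts' prior knowledge that weapons listings do exist.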
Terbium Labs' study can be found here.