Tech'ed Up

The Deep & Dark Web • Matteo Tomasini (District 4 Labs)

June 15, 2023 bWitched Media

Cyber defense expert Matteo Tomasini joins Niki on the podcast to shine a light into the deepest, darkest parts of the web. He explains the differences between the “clearnet” (aka the surface web), the deep web, and the dark web. While they dive into the dangerous - and legitimate - motivations for anonymous web browsing, Matteo also provides helpful tips for protecting your own privacy online.

"…the hard part is a lot of people just don't know. So, they just kind of will blindly visit websites, click on things, allow location, do this without understanding what the repercussions of that are." -Matteo Tomasini

Intro: 

[music plays]


Niki:  I’m Niki Christoff and welcome to Tech’ed Up. 

On today’s episode, cyber defense expert, Matteo Tomasini, joins remotely from Brooklyn to help demystify the deep and the dark web. We talk browsers, VPNs, and mirror sites, and it wouldn’t be summer 2023 if we didn’t touch on AI. Matteo has an interesting take on both the good and the bad of the 5% of the internet that most of us don’t ever visit. 

Today's guest, joining me remotely, is Matteo Tomasini. Welcome! 

Matteo: Hi. Thank you for having me. 

Niki: You and I were sitting next to each other at a dinner. And I was like, “What are you doing for a living?” And you were talking about the dark web and then you were talking about the deep web, and I was thinking, “I don't totally understand how this all works. And if I don't understand it, nobody understands!” And, as I am wont to do, I dragged you onto this podcast. 

So, I appreciate you taking the time to help me and help other people understand how this is all structured and what it means.

Matteo: Yeah, no, absolutely! I'm happy to be here. 

Niki: So, maybe we just start with the concept of, like, the surface web, what we all use, what the deep web is, and what the dark web is. Just kind of in layman's terms and how you would describe it to someone. 

Matteo: Sure. Absolutely. So, yeah, like, as you kind of already mentioned, the internet can be kind of divided into three strata or layers.

There's the, the surface web, the deep web, and the dark web. The, the surface web would be basically everything that has been indexed by major search engines. So, Google, Bing, Yahoo, whatever it is. So, when you go to google.com, you type in a query, it returns results. And because those results have an index because those results are on the surface web, you can see them. You can view them.

The deep web is kind of one level below that. So, that's really where you need to  make a jump to get to that data. So, it might be data that's in a database like vehicle records online or even political contributions, password-protected databases, like Westlaw, which helps you look up litigation, or really anything like that.

Then, kind of one level below that is, kind of, the dark web. And the dark web is a relatively small percentage of the internet, maybe 5 to 7% at any one time. And the dark web is basically a part of the internet that requires specialized software to access. There are multiple darknets within the dark web.

So, there's different services that run on the, the dark web that you can access. The most popular one is Tor, but I can kind of stop there and see if you have any questions so far. 

Niki: Okay! Yeah. I do have questions. [chuckling] [Matteo: Yeah.] Okay, so backing up, I, when I started, said the surface web, but then I've heard, heard people call it clearnet or, like, am I saying that right?

Matteo: Yeah. 

Niki: What would you say?

Matteo: It's, it's interchangeable.  [Niki: Okay]  It, it really is. And a, a lot of these definitions are, are pretty loose. Like, even darknet versus dark web. Like, people say, “Oh, there's dark, there's the dark web, which includes different darknets.” 

Niki: So, the clearnet, the surface web is anything basically that you can Google and you see the results. So, [Matteo: Yeah] Like, the Financial Times landing page is on the surface web, but you have to go behind a paywall to get to their article. So, you can't just search that and find it.

Is that the, sort of, the difference between surface and deep web?

Matteo: Yeah, basically. Basically, yeah. So you would need to have, like, kind of something in between you getting to that result. Google has been proactively indexing more and more and more of the internet. So, like, the deep web is, technically, kind of shrinking just because a lot has been indexed now.

Niki: Okay. Oh, I remember, actually, when I worked at Google, that was not the case. You had to literally go to Twitter and then search within Twitter to find tweets. They weren't indexed by Google. 

Matteo: Yeah. And that's still gonna be the case most of the time. But, like, Google and other search engines have been indexing more and more of them.

So, if you type in, search for, for Twitter, and then you say, “I wanna search for certain tweets.” You're not gonna get all of 'em because obviously, they don't have the entire Twitter index. Especially now, Musk isn't gonna give it to them, but you could find some tweets that mention maybe news articles or posts or certain things that have been shared a million times.

And to the extent that they've risen to the level of being indexed by Google. But for the most part, if you're gonna wanna search a kind of unique database, whether it's Twitter, Facebook, court records, vehicle numbers, vehicle identification numbers, political contributions - you're gonna wanna go to that database to do that search. And because you have to go to that database, it's technically the deep web. 

Niki: Okay. Got it! 

One thing I've read before we started chatting today is that the percentage of things that are indexed on the surface web is, like, actually pretty small of the whole internet. Is that true? 

Matteo: Yeah. Yeah. Most is gonna be on that deep web, like there's just sooo much out there. ‘Cause you have to consider A, there are these kind of databases that are behind paywalls, or you need credentials to log in to get to them.

There's also gonna be kind of ephemeral websites, things that are only up for like a day or two. So, those aren't sites that Google or Bing or whoever is gonna be able to index in the first place. 

Niki: Oh, I never thought about ephemeral sites! I like that. Okay, so then, the dark web is part, kind of, part of the deep web. It's, like, because it's not easily accessible, or is that inaccurate? 

Matteo: Technically, it's like a different level. Like, technically, it's deep web insomuch as it's unindexed, it's not properly indexed by these major search engines. There are small little dark web Googles, kind of, that try to index a lot of the deep web.

There's a search engine called Torch, for example, that tries to collect as much data from the dark web so that they can then, so, it’s then searchable by their users. But no, for the most part, like, the, really, the key difference is that you can get to the deep web from Firefox, Chrome, Brave, whatever browser you use.

You just type in a query. You do one jump, you get to that database, and then you do your search. Boom! You're in the deep web. 

The dark web, you can't do that. The dark web, depending on which darknet inside the dark web you wanna access, you have to use a specialized, like, browser, a specialized tool, or a different program altogether. So you can't access it with a normal browser.

Niki: Okay. So, what you just said is you can't use Chrome, or Firefox, or Brave, which I've never heard of till just this minute [chuckling]!  [Matteo: Yeah] Should we be using that? Should we be using that? What should we be using? You're using Firefox, I think, and you’re in security. Should I be using Firefox? 

Matteo: Oh yeah. I think Firefox is, is, is a great browser to use.

The Mozilla Foundation is kind of a privacy-first organization. Brave is a modified version of Chrome that's more privacy-centric. And, and that's honestly, one of the reasons a lot of people use the dark web altogether is because of privacy concerns. And their privacy concerns are incredibly valid, especially when you consider the information that browsers are giving up about you.

So, that's why I like to, and this could be a whole different podcast [chuckling], but I like to use kind of more privacy-centric browsers like Firefox, like Brave, some other ones out there, that don't give up as much information about you. 

Niki: So, okay. I actually love the privacy tips while we're doing this because when we get into the dark web, we're gonna talk about what's living there. But I, I've gotten much more concerned about privacy, not just from what's collected by search engines, but then whatever's collected or indexed is making me more vulnerable, my privacy, my data. 

So, okay! You like Firefox, you like Brave, which is a subset of Chrome. Sounds like?  And the version of that that you need to get to the dark web is Tor? T.O.R. 

Matteo: Yes. So that's, so that's to get to the kind of the most popular dark web, right? So, there's, I guess, four or five different darknets at this point. Right? And collectively, they form the dark web. We have, there's one called Freenet, I2P, Lokinet, and then there's Tor, which is, like, the most popular one.

So, most people, when they say, “I'm searching the dark web,” they mean to say they're searching Tor, but there are all these other platforms, and Tor is the most popular one. It has the most number of users, most number of content. I mean, there's a reason most people use it and it’s, it's the easiest to use, easiest to configure, the easiest access.  And, and yeah, people want to use it because it has the, the most users.

 So that, that is, yeah, that is the most popular one.

Niki: Okay. So that's the most popular darknet of the dark web? 

Matteo: Correct

Niki: Okay. So, I, I think the dark web has a PR problem. [chuckling] I mean, it's not helpful that it's called the dark web and also it's perceived to be sort of rife with illicit activities. 

But I think it's worth talking through, cause you spend a lot of time in that space, the sort of positive things that it can be used for and why people might find themselves there. And also maybe just, technically, like, my understanding is essentially Tor, like, jumbles your searches and sends them all around and they're just hard to trace what you're spelunking around for. 

Matteo: Kind of- Yeah!

So, I mean, the, the main reason that people are attracted to the dark web, in general, is the promise of privacy and anonymity. And I mean, that's why it was created in the first place. 

I think Tor was created in the nineties by the Naval Research Intelligence to basically mask Naval Intelligence Officers who wanted to do things privately. As, as one wants  to do if you're an intelligence officer. So, a lot of people were drawn to that promise in the beginning and that, that, that continues to this day. 

And it doesn't have to be for bad it can be for good as well, right? The idea is when you make a request to a website, and this is part of the reason why Tor is slow, that's one of the big problems and why a lot of people don't use it is because it's, it's pretty slow. 

And the reason it's slow is because, essentially, you have to make three or four different hops before you can get your content. So, what happens is when you make a request, you're basically making that request to what's called an entry node. So, this is, kind of, a machine, and this is all decentralized. It's not one central repository of machines that have this data. There's, anyone can volunteer to set up a, a node within the Tor network.  

So, you make a request to this entry node. And this entry node will know who's making the request. So, they'll know your IP address and they'll know where eventually you want to end up. Right? If you're trying to go to, let's say, the dark web version of the BBC.  ‘Cause the BBC news has a dark web mirror. You're saying, “I want to search, go to the BBC News.” 

So, you first, kind of, you ask that first entry node, “Hey! I want to go there.” So, that entry node then talks to another node, kind of, like, in the middle. And that middle node does not know your original IP address. All they know is where you wanna end up, right? 

And then, that goes to an exit node, same idea: that exit node knows nothing about you whatsoever. They don't know the, the second jump, they don't know the first jump, all they know is where you wanna end up.

And then, eventually, that exit node will go to that BBC News website. And then it'll return the content. 
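
The hop-by-hop idea described here can be roughly illustrated with a minimal Python sketch. It assumes the layered ("onion") design as it is usually described, in which the request is wrapped once per relay and each relay can open only its own layer, so it learns just the next hop. The relay names, keys, the destination `news-mirror.example`, and the `wrap`/`peel` helpers are all hypothetical, and the XOR routine is a toy stand-in for real encryption. This is not how Tor itself is implemented.

```python
import hashlib
import json

def toy_xor(key: bytes, data: bytes) -> bytes:
    """Toy XOR 'cipher' (NOT secure); a stand-in for real per-hop encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(route, destination, request):
    """Build the onion from the inside out: exit layer first, entry layer last."""
    names = [name for name, _ in route]
    next_hops = names[1:] + [destination]  # where each relay forwards to
    payload = request.encode("utf-8")
    for (_, key), next_hop in reversed(list(zip(route, next_hops))):
        layer = json.dumps({"next": next_hop, "data": payload.hex()})
        payload = toy_xor(key, layer.encode("utf-8"))
    return payload

def peel(key, blob):
    """What one relay does: open its own layer, learn only the next hop."""
    layer = json.loads(toy_xor(key, blob))
    return layer["next"], bytes.fromhex(layer["data"])

# Hypothetical three-relay route with made-up keys.
route = [("entry", b"key-1"), ("middle", b"key-2"), ("exit", b"key-3")]
blob = wrap(route, "news-mirror.example", "GET /")

for name, key in route:
    next_hop, blob = peel(key, blob)
    print(f"{name} relay forwards to: {next_hop}")
```

In this toy model, only the last relay ever sees the destination and the plain request, and only the first relay is contacted directly by the user.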

I hear people ask like, “Oh, well. So, it's basically kind of hiding your IP address, like a VPN.”  It's, like, “Yes and no.” Right? Because with this Tor network, you're going through a decentralized network. 

So, the middle node doesn't have any idea what's going on. Only the first node does. Whereas in a VPN, you have this centralized system where all it's doing is obscuring your IP address, but that's it. Like, you're talking to one server. Right? And it's, like, the big ones are NordVPN. That's, it's a really big one. Nord will see your request. They know who's making the request. They know exactly what you want cuz then they're gonna return that content to you. 

It's very, very, very centralized and they know everything about you. They see everything about you. So, that's not something you would want to do if you're very interested in being super private. The VPN's not gonna cut it. [Niki: Mm-hmm] You would need to use something like Tor or these other darknets.

Niki: So, the VPN is basically just obscuring your IP address, sort of, superficially or for the end website, but they have everything. So, if Nord gets like, I don't know, say a subpoena, they could easily just be, like, “Oh, this is what this person was doing.”

Matteo: Yeah. They, they will know everything. And a lot of 'em will say, “Oh, we don't do logs. We don't do anything.” But like how well do you really know these VPN providers? 

Like, a lot of these are in odd jurisdictions if they get requests from, from law enforcement or from government. Right? A lot of people want to be private, not to do illegal things, but to maybe get around government censorship or to contact a journalist as part of a whistleblower thing. And that VPN provider could just as well tell that government, tell that company, “Hey, this IP address, which is inside your company, or is associated with one of your employees, or whatever is trying to do this. Let's block it. Let's stop it. Let me share that information with you.”

Whereas with Tor, you don't have that issue because, like I said, you're going through these three different nodes with varying levels of anonymity where one doesn't know what the other is doing. And none of 'em know who you are except for that entry node, which doesn't become an issue.

Niki: I remember during the Arab Spring, that's when I first started hearing about Tor and sort of these autocratic states, and maybe you were organizing protests, or say you live in China and you want to freely search, you would use something like Tor. [Matteo: Yeah] 

Which is, like, a positive use case. It's to avoid government censorship. 

Matteo: Yeah, exactly! Exactly. So, it's, it's, it's gonna let, it's not gonna let governments know what you're searching for. It’s like, so if you go onto a Tor website and it's whatever kind of search, you don't want the government to know what you're doing, like, you'd wanna search it. 

Or if you want free and uncensored access to news, you can go to Tor, like, a lot of repressive governments will censor BBC, they'll censor the New York Times, they'll censor, like, news organizations like ProPublica.  

These organizations realize that they all have a dark web mirror. They all have that ability for people to learn, to understand what's going on in the world without having the government censor that.

Social media's notorious for, for tracking, et cetera. There are dark web social media platforms where you can go on and communicate freely with other people.

Twitter and Facebook had, I don't know if they're still up, but they had, y’know, dark web mirrors for the longest time for that same reason. Cuz you can share more freely there.  So, there's, there's a lot of good, I mean, there is, I mean, what, what draws these kind of peoples into the dark web is the same thing that draws bad people to the dark web as well. 

So there's, there's both, but I mean, the, the surface web too, surface web's not all nice, right? The deep web's not all nice. There's a lot of bad stuff that goes on there. In fact, like some of the, the biggest hacker sites are not on the dark web. They're on the deep web. You can access 'em today with your browser. 

Niki: Interesting. Okay! So I wanna talk about the bad guys, but just to go back really quickly. So, when you first said that the BBC had a dark web mirror, I've never heard that in my life!  But it makes sense because - and you said ProPublica, which I love ProPublica!  

So, basically these news sites are trying to make it available for people in potentially autocratic regimes to freely read the news. So, they are taking the time to create a mirror in this space. 

I never heard that before just now!

Matteo: These mirrors are essentially replicas of their surface web clearnet sites that are just available on the dark web because they want people to access it. Same way they'll have, like, the CIA has a dark website cuz, y’know, they want anonymous tips. 

Different regulatory organizations will have kind of like little dark web drops so that you can anonymously make a complaint or let people know about corruption or whatever it is.

There's, yeah, there's a lot that, a lot of good that comes from anonymity. 

Niki: Okay. So, then let's talk about the bad guys and what you do for a living. 

[Matteo: Sure] 

Niki: Okay. What do you do for a living? 

Matteo: [chuckles] So yeah, I, I, I am a managing director and Head of the, the Cyber practice at Prescient, which is a consulting firm in, in the US.

So, within the cyber practice, we do a lot of deep and dark web intelligence. So, that includes doing assessments on individuals and companies and helping them know what their dark web exposure is. ‘Cuz, as we mentioned earlier, a lot of it is unindexed, so it's, it's kind of like flying blind.

Like, you don't know what is being said out there unless you have something or someone collecting information from there. So, we do a lot of that kind of work. We do a lot of the, kind of, typical incident response, digital forensics, do a lot of online footprinting. 

And then actually, recently, co-founded a company called District 4 Labs that is only about dark web data. So, there it's just identifying compromised PII (personally identifiable information) on the dark web from these hacked databases, from these big scrapes that happen, from weird little dark web databases that hackers trade back and forth and, and kind of using that data for good.

To help investigators, to help law enforcement, corporate security, to track down bad guys. To identify who's behind certain threats, to help identify someone's online footprint.

Niki: Yeah. So, I think this is really interesting. You said investigations and, and sometimes you'll have famous people or high net worth people who have no idea what's going on, but obviously, it's, there's money to be made on the dark web [Matteo: Mm-hmm], sort of, trading their information or trying to access things, or I guess hack them. I don't know? 

Matteo: Yeah!  Yeah, no, there, there's, there's a crazy amount of data [chuckles], like, in, in the, in the deep and dark web in these databases. And it's, it's hard; you just don't know what's out there. Like, everybody has probably gotten notifications in the mail or email saying, “Hey, you were part of this breach, you're part of this, part of that, et cetera,” but it's hard to know: A, whether that data ever actually surfaced on the dark web.  So, you don't know. And then B, there's, there's thousands of other little websites, some companies that you never get these notifications for, that you just, you're part of this data. And a threat actor can get a really good understanding of who you are by looking at this data, right? 

So, like, one breach might have an email address and name. Another breach might have that same email address and your license plate number and your phone number.  Another breach might have a list of all the financial transactions you've done over the last year, have bank account numbers. And without knowing that you, you can't properly defend yourself against that.

Or we do a lot of work for high net worth individuals and they do everything possible to disassociate, disassociate themselves from an address, right? They'll put it in the name of an LLC when they buy the house, they'll, they'll send packages to a PO Box. They do all those things but one time, they accidentally sent their Drizly order to their house [Niki: laughs], and Drizly got breached, and now the threat actor knows where they live, and that's where they get deliveries. 

So, it's, there's just so much, so much, so much out there. And a lot of it is on the surface web. A lot of it's on the deep web. A lot of it's on the dark web, but until you kind of get that full picture - you need to be able to explore all three to get that full picture. Otherwise, you're only getting a part of it. 
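
The point about separate breaches adding up to a full picture can be sketched in a few lines of Python. This is a minimal illustration under made-up assumptions: the breach names, the records, and the `build_profiles` helper are all hypothetical, and the records are joined on nothing more than a shared email address.

```python
def build_profiles(breaches):
    """Merge records from several breaches into one profile per email address."""
    profiles = {}
    for source, records in breaches.items():
        for record in records:
            # Normalize the email so "Pat@..." and "pat@..." match.
            key = record["email"].strip().lower()
            profile = profiles.setdefault(key, {"sources": []})
            profile["sources"].append(source)
            for field, value in record.items():
                if field != "email":
                    # Keep the first value seen for each field.
                    profile.setdefault(field, value)
    return profiles

# Hypothetical data: each breach alone reveals little.
breaches = {
    "shop_breach": [{"email": "Pat@example.com", "name": "Pat Doe"}],
    "plate_scrape": [{"email": "pat@example.com", "plate": "ABC1234", "phone": "555-0100"}],
    "delivery_breach": [{"email": "pat@example.com", "address": "12 Hypothetical Lane"}],
}

profiles = build_profiles(breaches)
for email, profile in profiles.items():
    print(email, profile)
```

Run against these three small records, the merged profile ends up holding a name, a plate, a phone number, and a delivery address, which is the kind of combined exposure a defender would want to know about first.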

Niki: Yeah. And the average person, I mean, you can search on Google to see what's on the surface web, but even then it's hard to take it down, but knowing everything that's out there. 

Also, by the way, for those who don't use it, Drizly is booze that’s delivered, which is why I laughed, which is why I think actually that's probably how they make mistakes on [Matteo: chuckles] their address for their on-demand booze. [chuckling]

So, let's back up just a little bit cuz I'd love to hear how you're thinking about trends across the dark web. So, I know people probably have heard of the Silk Road and have, like, the gist of it, and I know it's ten years old, but I think it's worth talking about what that was like, and then sort of what the dark web looks like now when we're talking about illicit activity and people trying to intentionally obfuscate criminal actions on the dark web.

Matteo: There is some nice parts of the dark web, right? Where people are reading the news, they're sharing cat pictures, et cetera. But I'd say probably, I don't know, like, 40, 50% has bad stuff on there, right? Illicit material.

And that's everything from hackers discussing how to hack, sharing exploits, sharing data from different breaches, maybe selling illegal materials whether it's drugs or selling exploits, selling data. And they also offer services. Right? Bad services. And that's out there. 

And the reason why people are on the darknet as opposed to selling these services on the surface web or even the deep web is because of that anonymity that they can kind of get.

A lot of these threat actors will sell things on, on the dark web cuz they feel comfortable selling dark items there and they want their customers to quote-unquote to feel comfortable ‘cause it's shrouded in this privacy.

And one of the biggest ones, one of the first ones really, like, kind of, modern dark web markets, was the Silk Road. And that was starting about 2011. Yeah, mid-2011. Lasted for a couple years and it was, it was kind of an interesting concept. Like, there they were selling all these illegal drugs. They were selling- had, sort of different services like murder for hire, even, like, all, like, the, the really crazy stuff you hear about the dark web when you think about dark web, that's the Silk Road that was happening there. 

Niki: [interrupts incredulously] And they were really doing that?!  Sorry, I mean, I guess I think of it a little bit, like, an urban legend, but really? Hitmen were on the Silk Road?

Matteo: Yeah! Yeah! I, I, in fact, I think the, the founder, Ross Ulbricht, the founder of Silk Road, he actually, I think, spent, like, I dunno, half a million dollars hiring hitmen to kill people. They never actually went through with it, but that was, that was a thing cuz [chuckling], it's also criminals, right? Like, you can say you're gonna do something and, you don't always have to do it. Right?

It was not the most honorable place. I mean, that was one of the things he was trying to do later on was establish more of an escrow kind of system where you can kind of have more fidelity in the process. But at the end of the day, it's, it's criminals selling to criminals a lot of the time.

In 2013, FBI finally caught up to him, I think. Actually, mostly the IRS helped a lot with the investigation and was able to, to put him away. But y’know, as soon as that was put away, I think, different administrators of the Silk Road started their own market. Right. 

And since then, dozens of other markets have sprung up and, and closed down, usually by law enforcement. 

Niki: So, one thing that when I was sort of prepping for this episode, I ended up listening to this really well-done podcast called Hunting Warhead, which is about child sexual abuse imagery on the dark web. We'll actually put a link to it in the notes cuz I think it's worth a listen. They talked about how it's exploding on the dark web access to these materials because it's, like, the only place that they feel that they can be. And then, it's sort of this cat-and-mouse game with law enforcement. 

I was surprised when you just said, 40 to 50% is sort of illicit activity. That's lower than I thought! But I also didn't realize like all these news sites and everything are on the dark web. And I, I guess if you go through the population of countries where people might be in regimes where they need to search- that's a big, that's actually a decent amount of it.

But it, is that a trend that people are seeing that it's, like, child sexual abuse imagery versus drugs? Or are there trends? Or does anybody know? Can you really even know? 

Matteo: Yeah, you, you can definitely know because, while there's no Google indexing the dark web, there are other companies kind of indexing and collecting data from the dark web.

We personally do a lot of it as well, just kind of scraping different forums, extracting information from forums, from different messaging platforms, from different marketplaces to kind of see what is trending, right? Like, like, especially marketplaces are, are pretty easy, right? ‘Cause they have different categories. They'll have data, drugs, images, et cetera.  So, you can kind of get a sense of what's more, I guess, in demand. 

But there's also messaging platforms and that's, I guess, technically, would be kind of more of the deep web. That’s using messaging platforms like Telegram, which you might have heard of, or Signal, or Skype, to kinda go back in time, or WhatsApp. These are platforms where you can create groups around similar or shared interest.

So, there are a lot, a lot, a lot of hacking groups that have Telegram channels. There's also child abuse, where they share illicit images of children on Telegram. So, it's not only happening on the dark web. In fact, I think sometimes it even happens on, on Facebook, like, in, like, closed groups cuz they think it's closed and no one can see. Like, it's, it's, it's insane.

And the, the problem, or one of the things with, with the darknet is there is a barrier to entry, right? You have to know how to access it.

Niki: Yeah! That's, so that's actually one of the things that was discussed in this podcast was that people running these horrific sites would do the same thing the average person does, which is reuse their usernames. And so, once they could identify 'em on the dark web, they just found them on Facebook! 

Matteo: Yeah! Exactly! And, and that's one of the interesting things, like, and that's why a lot of people, for example, use VPNs, right?

Because they think, “If I'm not sharing my IP address, I can't be tracked,” but the interesting thing is, like, over the last, I dunno, 10, 15 years, like, IP addresses have become less of a relevant issue. There are much, much, much better ways to track you: online cookies, browser fingerprinting. These, these give you a much better sense of who you are, what you're doing than an IP address ever can.

But because people have this sense, like, “Oh, they don't see my IP address ‘cause that's, then I'm good.” That's, that's not how it works!  

I mean, every, every time you visit a website, your browser is giving a lot, a lot, a lot of information about you. It's letting them know, like, your screen resolution, maybe plugins that you have installed, fonts that you have installed, different, different services you might have used. And individually, it doesn't mean a lot, right?

Like, “Oh, Matteo has his monitor set to this. The plugins he has on his browser are this. The, the fonts he has installed are this.” Like, doesn't mean anything, but everything combined together is incredibly unique. 

And you can basically fingerprint someone, like, “Oh, that is Matteo Tomasini. And here he is visiting this website again. He's using a completely different IP address cuz he's using a VPN. And he thinks he's smart, but I know it's him because he has the same exact configuration and everything.” 

And that's why one of the other things that these darknet browsers do is they eliminate that. They don't let, y’know, JavaScript run. They don't allow requests from cookies, things like that. 
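
The fingerprinting idea above can be sketched in Python as follows. This is a minimal illustration, not how any real tracker works: the attribute names and values are made up, and the `fingerprint` helper is hypothetical. It simply hashes the combined browser settings into one identifier and shows that the identifier stays the same when only the IP address changes.

```python
import hashlib
import json

def fingerprint(browser_attrs: dict) -> str:
    """Hash a set of browser attributes into one short, stable identifier."""
    # Sort keys so the same attributes always serialize the same way.
    canonical = json.dumps(browser_attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute values, made up for illustration.
browser = {
    "screen": "2560x1440",
    "plugins": ["pdf-viewer", "password-manager"],
    "fonts": ["Fira Code", "Helvetica Neue", "Comic Sans MS"],
    "timezone": "America/New_York",
}

# Same browser, two visits: one from a home IP, one through a VPN.
visit_home = {"ip": "203.0.113.7", "browser": browser}
visit_vpn = {"ip": "198.51.100.42", "browser": dict(browser)}

same_ip = visit_home["ip"] == visit_vpn["ip"]
same_fingerprint = (
    fingerprint(visit_home["browser"]) == fingerprint(visit_vpn["browser"])
)
print(f"same IP: {same_ip}, same fingerprint: {same_fingerprint}")
```

Each attribute on its own is shared by many people; it is the combination that tends to be distinctive, which is why changing the IP address alone does not help in this sketch.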

Niki: That creates a lot of friction, which is why a lot of people are still doing things on surface web and the deep web. 

Matteo: These cookies that, that they're not just tracking you on one website. They're tracking you from website to website to website. So, they can see where you have an account, right? Because they see you logged into Facebook. So, now that company that you're buying a sink from knows you have a Facebook account.

So, now they're gonna target you on Facebook. Browser fingerprint is, is, is really, really, really unique. So, yeah, helps you kind of track someone. 

Niki: Okay, so that's how we're being tracked!

I do wanna talk just quickly at the end about any tips you have for privacy, but before we do that, I recently read an article that AI is being used, I think, by folks in South Korea to, sort of, scan the dark web? Or try to use AI to come up, kind of what you were talking about, you guys can go through and look at, like, drugs are for sale, and this is for sale, but they're trying to figure out what's on the dark web.

Do you think that's a thing? What is, sort of, your assessment of it? 

Matteo: Yeah, so it's, it's, it's kind of interesting. I mean, so, with the whole ChatGPT thing, like, in creating these large language models to help create conversational bots and, and to help answer questions, et cetera. So, the whole idea is we need; you need to first train it on a data set, right?

So, a ChatGPT, Bard (Google's thing) like, they're trained on- so Google would be on their websites, maybe on like their Quora on Ask Questions, like, whatever it is to try and get a sense of how people talk, how people should respond, and, what information is out there, right?

Based on all these billions of pages that they've indexed that have all this information but on the, on the dark web, it's, it's a little different, right? Especially, like, kind of, on the illicit side, right? 

People talk in slang, right? And kind of they're referring to different exploits in not natural English. They're using kind of a different vernacular.  So, these models have not been trained on the vernacular. So, like, they wouldn't be able to go on the dark web and understand this is what people are talking about because it, for all they know, it looks like they're talking different language and they're not trained on them. 

So, what these, I think, South Korean researchers did was basically train a large language model, kinda like another ChatGPT only on the dark web. They basically crawled Tor, I think is, is the document that they used instead of any of the other ones, and basically looked, all right, “This is what people are saying, this is how they're saying it,” so that they could better classify content on the dark web. So, that they could look at something like, “Oh, this is a new ransomware shame site.” Right? “Where they wanna out someone from being hacked and pressure them to pay the money. Or this is, talking about some exploit for a company.”

So, it's, it's interesting in, in that it'll help potentially even automate ransomware negotiations with, with threat actors. Maybe, who knows? But just the fact that it can kind of help classify things better is interesting, but at the end of the day, there's already a lot of tools that are already doing this.

But it's, it was just kind of like an interesting exercise, kind of, like, a, a thought exercise. 

Niki: It is an interesting thought exercise! I hadn't thought about it, but if there are sort of code words that people use or vernacular that they're using that isn't gonna be in - [chuckling] I can't; I actually hadn't thought about ChatGPT using Quora, which is sort of maybe an indictment of why you get weird relationship advice on [both chuckle] from ChatGPT.

But I think that that's, it's interesting and maybe we'll learn something from it, but to your point, there are people, including you, who are doing like really targeted investigations of things, which is different than having like a general assessment of what's happening. 

Matteo: Right. Yeah. I mean, with all these things, it's kind of “Garbage in, garbage out.”

Like, if you trained it on a good data set and you trained it well, you're gonna get good results. Prior models weren't based on the dark web and, and, unfortunately, fortunately, they just, they, they used weird language there. They don't, it's, there's a lot of slang, a lot of kind of slang that they use that is just not obvious unless you've kind of been in the dark web for a while or unless you train classification or a large language model on that.
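
One way to picture the vocabulary gap Matteo mentions is to count how many words in a post a model has never seen. The sketch below is a toy illustration, not the researchers' method. The "general" corpus and both sample sentences are made up, and real language models work on subword tokens rather than whole words.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_rate(text: str, vocab: set[str]) -> float:
    """Fraction of words in `text` that are out of vocabulary."""
    words = tokens(text)
    if not words:
        return 0.0
    unknown = [w for w in words if w not in vocab]
    return len(unknown) / len(words)


# Hypothetical stand-in for the everyday text a general model is trained on.
general_corpus = [
    "how do i reset my password",
    "the store sells fresh bread and coffee",
    "we need to buy a new sink for the kitchen",
]
vocab = {w for line in general_corpus for w in tokens(line)}

# Made-up examples: one everyday sentence, one using marketplace slang.
plain_post = "we need to buy fresh coffee"
slang_post = "selling fresh fullz and cvv dumps"

plain_rate = oov_rate(plain_post, vocab)
slang_rate = oov_rate(slang_post, vocab)
```

A high out-of-vocabulary rate is a crude signal that a model needs training data from that community before it can classify what is being discussed.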

Niki: Yeah, it's super interesting!

Okay, so, last thing, which is not exactly the topic of this, but you have some thoughts on, is just basic cyber hygiene. So, I use Signal, which a lot of people in Washington DC use, because that's how you talk to reporters, because it's encrypted. In DC, people don't tend to use Telegram [Matteo: mm-hmm] because I think it's owned by Russian nationals? I'm not sure if that's even true, but nobody here uses it. They use Signal. 

Matteo: Yeah. No, that's, that's the preferred messaging platform. Yeah, Telegram is, it’s a private company. The source code isn't laid bare.

Whereas Signal is, is, is something that's vetted by a large, large, large community. So, that's what I use more than Telegram, more than WhatsApp. If I wanna do secure communications or even just talking with friends and I just don't want to, to deal with iMessage or whatever, I'll, I'll use Signal.

Niki: Okay. And then I do use NordVPN, although you've just educated me on how that's sort of a false, also a false sense of privacy when you get down to it.

Is there anything else you use or any other tips you have for people? 

Matteo: Just understanding that by hiding your IP address, you're not hiding who you are because there, there's browser fingerprinting, there's cookies, et cetera, and understanding that just because you're hiding your IP address doesn't mean you can do all sorts of illegal stuff.

Not that you are!

But some people think, “Oh, they don't know who my IP address is, so I'm gonna go now post all this, all these threats on Twitter, or I'm gonna I'll do all this crazy stuff.”  When law enforcement could just as easily get a subpoena and go to Nord and, and get your logs, or this has happened with six or seven really, really large VPN providers, they will get hacked.  
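
To make the point about IP addresses concrete, here is a minimal sketch of how a tracker could still link visits after the IP address changes. Everything in it is assumed for illustration: the log format, the cookie ID, and the site names are hypothetical, and the IP addresses come from ranges reserved for documentation.

```python
from collections import defaultdict


def link_visits(visits: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Group logged visits by tracking-cookie ID, ignoring the IP address."""
    profiles = defaultdict(list)
    for visit in visits:
        profiles[visit["cookie_id"]].append((visit["ip"], visit["site"]))
    return dict(profiles)


# Hypothetical tracker log: the same browser, first without and then with a VPN.
visits = [
    {"ip": "203.0.113.7", "cookie_id": "abc123", "site": "sinks.example"},
    {"ip": "198.51.100.42", "cookie_id": "abc123", "site": "news.example"},
    {"ip": "198.51.100.42", "cookie_id": "abc123", "site": "social.example"},
]

profiles = link_visits(visits)
sites_seen = [site for _, site in profiles["abc123"]]
distinct_ips = {ip for ip, _ in profiles["abc123"]}
```

The visits arrive from two different IP addresses, yet they all land in one profile because the cookie ID never changed.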

Niki: Right! This is what I always think with the VPN because it's got access to everything. First of all, I don't even know why I have a VPN. To your point,  I'm not really doing anything interesting. I'm definitely not doing anything interesting or illegal, but I do have a VPN.

Matteo: Yeah, it's, it's, even if you're not doing anything interesting or illegal, like, I, we still, I still use a VPN for home browsing ‘cause I think it's just it's one step more of obfuscation. 

I understand that it's not everything. I understand that it's not concealing my identity a hundred percent. I understand that this VPN is probably keeping logs on what my actual IP address is and maybe what the websites I'm visiting are, but it's one step better. It's better than not having the VPN. 

Niki: Right! So it's another step. And it's a step. I mean, at least in my experience, it doesn't slow anything down, really. I think there is definitely, in general, just the average person is more privacy conscious than we used to be. I certainly am! [Matteo: Yeah]  Even if it's just sort of on principle that I wanna be more privacy-conscious.

Matteo: Yeah, I, I think the, the most important thing is, is to be able to make that decision yourself. It’s to understand if I use this VPN or if I don't use this VPN this is what people will have access to. Or if I accept this cookie, or I turn on JavaScript while I'm browsing, this is what will happen and then kind of make an informed choice from there, right? 

A lot of people don't know that when you accept certain cookies or when you, you don't even have a choice. They just, they're on there, like they're loaded by default that that cookie is tracking you, not just on that website, but across 30 different sites. They don't realize that they're, when they're, basically, loading a website, that that website will know everything about their device and that they can then track them even without a cookie across the internet.

So, it's just kind of, like, “Okay, I understand that this is the case. Do I care? No. Do I care? Yes.” Okay. Then, in that case, “You know what? Let me download this other browser that doesn't give my information away or let me disable JavaScript across every website, so I have to, by default, not be tracked.”

So, as long as you're just educated on what the, what the repercussions of certain actions are, that's fine. And if you accept them, you accept them. But I think the hard part is a lot of people just don't know. So, they just kind of will blindly visit websites, click on things, allow location, do this without understanding what the repercussions of that are.

Niki: So, that's a perfect place to end because you've just spent some time explaining to all of us what the [chuckling] repercussions of some of that are. 

And I really appreciate you explaining the deep web versus the dark web and what a darknet is because I sort of had the gist of it, but I didn't totally understand.

So, thank you so much, Matteo, for taking the time! 

Matteo: Of course!  It was my pleasure. 

Thank you so much.

Outro: 

Niki: Next week I’m stoked to talk about the space race with reporter and writer Ashlee Vance. He’s the author of ‘When the Heavens Went on Sale’ which is his latest book and it’s all about the capitalists who are monetizing low Earth orbit. His book is written in a fun and storytelling style and we talk about the misfits and geniuses who are staking a claim in this final frontier.