Over the past year I’ve been asked by investors and startups what to do about the dark web. Crime is rampant. The growing variety of dark web technologies is confusing. And there is much hyperbole about the size, secrecy, and strength of these technologies. I’ve been advising businesses, some small, some large, on how to tackle the various dark webs.
The same basic questions come up again and again. Answers remain elusive so far, but some teams are making progress. The basic business model is a crawler and alerting system for targeted content appearing on various dark web sites or locations. The core technology is indexing the dark web. Once you have an index, you sell access to it for keyword searches defined by the customers. When a keyword returns results, you send the customer an alert. Think of it as “Google Alerts for the dark web”. Pretty straightforward so far. The challenge lies in the technology, the vastness of the address spaces, and what’s happening in these dark spaces today.
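To make the “Google Alerts for the dark web” model concrete, here is a minimal sketch of the index-plus-alert core, assuming pages have already been crawled and arrive as plain text. The class name `AlertIndex`, the naive word tokenizer, and single-word keywords are illustrative assumptions, not any vendor’s actual design; a real system would also need phrase matching, per-language tokenization, deduplication, and persistent storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately naive tokenizer (assumption)."""
    return set(re.findall(r"\w+", text.lower()))


class AlertIndex:
    """Hypothetical inverted index with per-customer keyword alerts."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)       # term -> page addresses
        self.subscriptions: dict[str, set[str]] = defaultdict(set)  # customer -> keywords

    def subscribe(self, customer: str, keyword: str) -> None:
        # Single-word keywords only in this sketch.
        self.subscriptions[customer].add(keyword.lower())

    def add_page(self, address: str, text: str) -> list[tuple[str, str, str]]:
        """Index one crawled page and return (customer, keyword, address) alerts."""
        terms = tokenize(text)
        for term in terms:
            self.postings[term].add(address)
        alerts = []
        for customer, keywords in self.subscriptions.items():
            for keyword in sorted(keywords & terms):
                alerts.append((customer, keyword, address))
        return alerts

    def search(self, keyword: str) -> set[str]:
        """Retroactive search: every indexed page that contained the keyword."""
        return set(self.postings.get(keyword.lower(), ()))


if __name__ == "__main__":
    index = AlertIndex()
    index.subscribe("example-bank", "ExampleBank")
    page_text = "Fresh ExampleBank card dump for sale"  # placeholder page content
    for alert in index.add_page("placeholder.onion/post/1", page_text):
        print("ALERT:", alert)
```

Keeping the index separate from the subscriptions is what lets the same crawl serve both live alerts and searches into the past.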
The first questions asked are:
How do you define the dark net or dark web? Do you mean unlit fiber? Unassigned IPv4/IPv6 addresses? Overlay networks like I2P or Tor? Known blackhat hacker hangouts? Distributed networks like ZeroNet or Tribler?
How many darknet sites have you crawled? What scope of the darknet do you think you have in a content index? 25%? 50%? 77.4%? How do you know?
Have you crawled behind authentication walls? How do you get into “pay to play” forums/sites, meaning those which require you to contribute new content to get access?
Do you have archives of sites/content for searching in the past? The darknet is fluid, so as data dumps occur, they tend not to stay online for long. Do you have legal authority to handle certain content? To store it?
How many languages do you crawl? Do you translate them or just serve them up raw as indexed content? Right to left? Left to right? How do you handle accompanying images?
Are you able to comply with evidence requirements? Snapshots of the sites? Has your data/content been through a court case? Has your methodology been through a court case?
Do you analyze the data for connections? Do you correlate the edges where the surface web crosses into the dark web and vice versa?
How do you discover the sites for crawling? Are you dependent on exploits in current code for discovery? How fast have you adapted to bug fixes or technology changes?
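On the “How do you know?” coverage question, one statistical approach, borrowed from ecology and not necessarily what any of these teams use, is capture-recapture: run two independent discovery crawls and use their overlap to estimate the total number of sites. The sketch below uses Chapman’s variant of the Lincoln-Petersen estimator. It assumes the two crawls are independent and that every site is equally likely to be found, which rarely holds on the dark web. Hard-to-find sites tend to be missed by both crawls, which shrinks the estimated total, so treat the resulting coverage figure as optimistic.

```python
def chapman_estimate(crawl_a: set[str], crawl_b: set[str]) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of the total site count."""
    n1, n2 = len(crawl_a), len(crawl_b)
    overlap = len(crawl_a & crawl_b)  # sites "recaptured" by both crawls
    return (n1 + 1) * (n2 + 1) / (overlap + 1) - 1


def estimated_coverage(crawl_a: set[str], crawl_b: set[str]) -> float:
    """Fraction of the estimated population seen by either crawl (capped at 1)."""
    total = chapman_estimate(crawl_a, crawl_b)
    if total <= 0:
        return 0.0
    return min(1.0, len(crawl_a | crawl_b) / total)


if __name__ == "__main__":
    # Hypothetical crawls: 100 and 80 addresses, 40 found by both.
    a = {f"site{i}" for i in range(100)}
    b = {f"site{i}" for i in range(60, 140)}
    print(f"estimated total sites: {chapman_estimate(a, b):.0f}")
    print(f"estimated coverage: {estimated_coverage(a, b):.0%}")
```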
The more successful teams can answer these questions, or at least have answers for their planned approach. This is vastly more than just “Google search for the dark web”. As mentioned in prior posts, addressing in these technologies works unlike IP addressing (whether v4 or v6), and the address spaces are huge and algorithmically generated. Learning what information you can glean from the interconnections between addresses, and from where they map into other address spaces, can make the difference between success and failure.
The other side of this coin is the businesses looking to host or fully embrace the same technologies for good. How they learn, use, and leverage the technology to gain an edge over others fascinates me. I’ll put some of these thoughts into a future post.
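As an illustration of how algorithmic these address spaces are, Tor onion addresses are derived from a service’s public key rather than assigned by anyone. The sketch below follows Tor’s rendezvous specifications: a v2 address is the base32 encoding of the first 80 bits of the SHA-1 hash of the DER-encoded RSA public key, and the newer v3 address is the base32 encoding of the 32-byte ed25519 public key, a 2-byte SHA3-256 checksum, and a version byte. The inputs are placeholder bytes, not real keys, so the outputs are well-formed but point nowhere. The v2 space alone holds 32^16 = 2^80 possible names, far too many to enumerate, which is why discovery matters so much.

```python
import base64
import hashlib


def onion_v2(der_public_key: bytes) -> str:
    """v2: base32 of the first 10 bytes (80 bits) of SHA-1 over the DER-encoded RSA key."""
    digest = hashlib.sha1(der_public_key).digest()[:10]
    return base64.b32encode(digest).decode("ascii").lower() + ".onion"


def onion_v3(ed25519_public_key: bytes) -> str:
    """v3: base32 of PUBKEY || CHECKSUM || VERSION, with a SHA3-256 checksum."""
    if len(ed25519_public_key) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    version = b"\x03"
    checksum = hashlib.sha3_256(b".onion checksum" + ed25519_public_key + version).digest()[:2]
    raw = ed25519_public_key + checksum + version  # 35 bytes -> 56 base32 characters
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


if __name__ == "__main__":
    # Placeholder inputs, NOT real keys: the results are well-formed but unreachable.
    print(onion_v2(b"placeholder DER-encoded RSA key"))
    print(onion_v3(bytes(range(32))))
    print("possible v2 names:", 32 ** 16)
```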
I hope these questions help you as you build your business plan or pivot your current operations. I look forward to helping more dark web focused startups succeed and grow the market for their services. It’s an exciting time for all.
Over the past few months, Sarah and I have been visiting cemeteries, some to look for relatives, others simply to see how the grounds are composed. A few questions have come to mind as we wander around. What’s the most efficient way to search a cemetery for a specific name? Are there patterns to discover in the layout of the cemetery? What materials last the longest? What stays most legible over time? Could you make a gravestone that is still readable after 5,000 years? 10,000 years? How much does grave care affect the longevity of a gravestone?
America’s oldest cemetery is in Duxbury, MA. It’s about 360 years old. Many of the original Mayflower descendants are buried here. Most of the gravestones appear to be slate made from local stone. The New England weather hasn’t helped these gravestones’ longevity. In most cases, they’re exposed to the weather rather than sheltered by trees, brush, or other flora. Of course, the flora itself can add to the wear and tear on the stone. One thing that caught my attention when reading up on the history of the cemetery is that Myles Standish was reburied three times, and “he was finally placed into a copper box and sealed in a cement chamber.” Why copper? Why cement? How has it aged since then?
Taking the opposite approach, let’s ask how to build a 10,000-year monument. Naturally, one comes across the Long Now Foundation’s 10,000-year clock project. It’s a combination of stone and metal, apparently a mix of titanium, marine-grade 316 stainless steel, stone, and ceramics. The Great Pyramids of Giza are about 4,600 years old. They are composed of limestone, granite from Aswan, and mortar. What the two have in common is their environment: both the clock and the pyramids are built in dry climates in relatively geologically stable locations.
I vaguely remembered an old news story about how modern biodegradable plastics will take 400 to 500 years to degrade. A plastic gravestone won’t last long enough to meet our 5,000-year minimum lifetime. Perhaps a ceramic one would, though it would probably be prohibitively expensive and fragile if hit by debris in a storm.
After talking to a few stonemasons and gravestone makers, it seems we’re sticking with high-quality polished granite with some bronze extras. This will generally last 1,000 years, and longer with proper care. However, “with proper care” is highly unreliable. Something more or less impervious to weather, debris, falling objects (like trees), and natural disasters like hurricanes, earthquakes, and tornados probably means something not so tall, grand, and monumental: something low, not so noticeable to the grave robbers, vandals, and plundering hordes of the future.
I’m sure there are mathematical formulas and research on how to search an unknown area with maximal efficiency. I didn’t study them before searching for a given gravestone in an unfamiliar cemetery. Cemeteries have a certain structure and organization which, once you figure it out, makes planning a search somewhat easier. The trick has been to maximize what you can see as you walk around. The real limit is your own field of vision and the readability of the gravestones as you pass. Driving, riding, or walking along the designated paths is one approach. It generally gives you a view of just the edges of plots, and you can only read the most obvious names. Treating each section as a separate search area lets you walk an X pattern and see the vast majority of the headstones. Of course, this involves a lot of overlap near the center of the X, where your fields of view overlap. Walking a circle or oval roughly midway between the center of the section and the edge you already covered also works well. In practice, both patterns seem to take about the same time. The X pattern feels faster because you can walk faster and scan. The circle/oval pattern feels slower because you end up looking around more, trying to cover everything.
All of this searching, researching, and thinking is secondary to the fact that you’re surrounded by the dead. Some trees live 5,000 years; some single-celled organisms live minutes. It can be sobering at times, but also endlessly fascinating to research and see what you can learn about the lives of the people around you. In the end, everything dies; it’s the best reason you have to live.
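The trade-off between the perimeter, X, and circle/oval patterns can be explored with a small geometric sketch. It models a section as a W × H rectangle with headstones on a regular grid and counts a stone as “seen” if it lies within a fixed reading distance of the walked path. The grid spacing, the 6 m reading distance, and the 40 × 30 m section are hypothetical values for illustration, and the model ignores walking and scanning speed, which the observations above suggest matter as much as path length.

```python
import math

Point = tuple[float, float]
Segment = tuple[Point, Point]


def dist_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def perimeter_walk(w: float, h: float) -> list[Segment]:
    """Walking the designated paths around the section's edge."""
    c = [(0.0, 0.0), (w, 0.0), (w, h), (0.0, h)]
    return [(c[i], c[(i + 1) % 4]) for i in range(4)]


def x_walk(w: float, h: float) -> list[Segment]:
    """Both diagonals of the section (the transit between them is ignored)."""
    return [((0.0, 0.0), (w, h)), ((0.0, h), (w, 0.0))]


def oval_walk(w: float, h: float, n: int = 64) -> list[Segment]:
    """Polygonal ellipse midway between the section's center and its edge."""
    cx, cy, rx, ry = w / 2, h / 2, w / 4, h / 4
    pts = [(cx + rx * math.cos(2 * math.pi * k / n), cy + ry * math.sin(2 * math.pi * k / n))
           for k in range(n)]
    return [(pts[k], pts[(k + 1) % n]) for k in range(n)]


def walk_length(path: list[Segment]) -> float:
    return sum(math.dist(a, b) for a, b in path)


def coverage(path: list[Segment], w: float, h: float, sight: float, spacing: float = 1.0) -> float:
    """Fraction of grid-placed stones within `sight` metres of some part of the path."""
    seen = total = 0
    for i in range(int(w / spacing) + 1):
        for j in range(int(h / spacing) + 1):
            p = (i * spacing, j * spacing)
            total += 1
            if any(dist_to_segment(p, a, b) <= sight for a, b in path):
                seen += 1
    return seen / total


if __name__ == "__main__":
    W, H, SIGHT = 40.0, 30.0, 6.0  # hypothetical section size and reading distance (m)
    walks = [
        ("perimeter", perimeter_walk(W, H)),
        ("X", x_walk(W, H)),
        ("perimeter + oval", perimeter_walk(W, H) + oval_walk(W, H)),
    ]
    for name, path in walks:
        print(f"{name:16s} length {walk_length(path):6.1f} m, "
              f"coverage {coverage(path, W, H, SIGHT):.0%}")
```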
Sarah and I planned a day of outdoor fun at Quabbin this July 4th. Rather than sitting around at a BBQ, watching TV, or sitting behind a computer, we decided to start a new national pastime of getting out into the wilderness. We’ve been watching a number of movies about getting outdoors and spending a lot less time behind computers. Between 180 South, Wild, A Walk in the Woods, and The Art of Walking, we’re well primed to be outdoors and get some exercise.
Quabbin Reservoir is a massive place in the center of Massachusetts. Well, massive by state standards. At the southern end, there are a number of nice hiking trails for enjoying the area. You can also drive between most of the points, but walking lets you see the area in far greater detail, and some of the views are far better than what you can see from a car or parking lot. Quabbin also has personal history for me from when I interned at Mass Fish & Wildlife in college. I carried out loon research and helped the Eagle Restoration project for a summer, in between trips to central Maine to do more research. I know the area fairly well and knew these trails would make a nice day hike.
On July 3rd, we went to research backpacks, as neither of us really has an appropriate hiking backpack. Naturally, backpacks are designed by men for men. This becomes obvious when a woman tries on a few backpacks and the sizing is off, the front straps and the sternum strap are not designed for any sort of female chest, and the whole geometry of the bag is clearly designed for male proportions. Sure, there are some “women-specific” packs, but they are mostly the same as the unisex (read male) packs but come in more hideous color combinations (how many bad shades of pink can one cram into a pack?). The whole experience will be another blog post. The people at Eastern Mountain Sports were very nice and did provide lots of help when asked. We ended up not buying a backpack and just planning to use what we already owned. I did get a new pair of Scarpa R-Evolution GTX backpacking boots. I figured the Quabbin hike with a decent weight in the pack would be a great way to break them in.
We woke early in the morning and headed out to Harry’s Restaurant for breakfast. After a filling meal, we arrived at the Quabbin Visitor Center. We packed up and headed out. We brought minimal supplies, because it was just a day hike, and we’re always within a quick drive of food if we get desperate. Some cashews, fresh peaches, a bottle of water, and a bottle of goat’s milk yoggi were it for the day. I purposely packed a bit heavier with my full complement of camera gear, zoom lens, portable solar panel, and a tripod. We crossed Windsor Dam and the spillway, then left the pavement for the trails. Here’s a full set of pictures from the day.
After about 5 hours of hiking up hills, down hills, and around the edge of the water, my heavy pack became dead weight. According to my Google Fit stats, my pace slowed dramatically in the last hour. Packing a heavy bag and walking around for a few hours one week after a major bike accident was probably not the smartest plan. So we changed the last waypoint and headed back to the Visitor Center.
We then headed off to Petersham to visit the Country Store, refresh ourselves, and explore the local cemeteries for ancestors. The Country Store is where I used to stop for lunch every day I was at Quabbin during college. It’s changed hands a few times, but hasn’t changed much over the decades.
Cemeteries deserve a blog post in their own right. I’ve been to a handful, and I’ve started to have thoughts like, “What’s the most efficient path to cover the maximum ground to find gravestone X?” “How long do gravestones last?” “Is granite or bronze better for readability a very long time in the future?” We found what we were looking for and headed off to find dinner.
The few places we stopped at were all closed. Of course, this gave us the perfect excuse to return to Harry’s to finish the day. One more satisfying meal later, we headed home to clean up and pass out.
Today while riding home from a meeting, I flipped over the handlebars on the bike. Riding down Oxford Street, towards Harvard, a van stopped short to avoid a pedestrian. I grabbed the brakes on the bike, started to go over, and put my right hand out to brace myself against the back of the van. The van pulled away, and there I went, flipping over the handlebars, left hand firmly on the brakes (front brakes), right hand trying to find the handlebar. I landed on my left side, head first into the pavement. The phone was recording the trip.
The recording of the crash just past Everett Street
I was just past Everett Street on Oxford Street. The little zigzag there is how the crash looked to the accelerometer and GPS in the phone. I took the screenshot just now, when I realized the phone had probably recorded the entire event. My gloves kept my hands from getting too ripped up. If anyone suggests that caliper brakes and pads are too weak, I have real-world experience that they can lock up a tire at speed just fine.
I almost didn’t bring the helmet when I left for the meetings today. Besides some scrapes, bruises, and sore muscles, I’m fine. My head didn’t split open and instead, the helmet took the brunt of the force.
The search for a folding bike took many turns through a tree of decision points, leading to a final bike choice and purchase. This is part two of the journey.
We left Part One with the following list:
Less than $2,000 with all parts installed.
Not full-size wheels (700c).
Can fit into a large suitcase for airline travel.
Folds and locks relatively fast, say in under 30 seconds.
Can accommodate a dynamo hub for power generation (lights and smartphone).
Has a local bike shop which sells and services the bike.
I’ve since added a new one:
Can accommodate a cargo rack to hold full-size panniers and possibly grocery bags for shopping; a stretch goal being a front rack/basket for a backpack as well.
Let’s walk through the desired attributes of my fantasy bike:
Less than $2,000 with all parts installed. This budgetary reality cuts out many of the fancy carbon fiber folding bikes. They are certainly light, but they carry a price premium. In reality, $2k buys plenty of light bikes, just not ultra-light ones. Anything under 12 kg is light enough for daily use and hauling.
Not full-size wheels (aka 700c). I kept thinking I needed the smallest wheel possible. In reality, the variety of folding designs means you don’t necessarily need the smallest possible wheels. Some Citizen Bike models, most Stridas, and the Dahon EEZZ use 16″/40.6 cm wheels, and they do fold up pretty small. However, some bikes with the next two larger sizes (20″ and 24″) fold up quite small as well. My informal measurements showed the following common wheel sizes on the various folding bikes I researched and tested:
16″/40.6 cm wheels
20″/50.8 cm wheels
24″/61 cm wheels
26″/66 cm wheels
Can fit into a large suitcase for airline travel. It turns out that the folding design matters more than the size of the wheels (up to a point). The largest folding bikes, generally Montagues with 26″/66 cm wheels, require the largest suitcases to accommodate the wheel diameter. However, bikes with the smaller wheels (16 to 24″) all fit in a large suitcase just fine.
Folds and locks relatively fast, say in under 30 seconds. I’ve reduced this to under 15 seconds. Thinking about when I’d actually have to fold the bike, it would likely be to get on a subway or bus, or into a taxi or ride-share car. While I often have the luxury of folding the bike in advance, the ability to fold it very fast will be key when I really need to. It’s the difference between making a flight or meeting and missing it.
Can accommodate a dynamo hub for power generation (lights and smartphone). Like much of the modern world, I use my phone for navigation. I’d also like to be able to hold conversations while riding (via a Bluetooth headset). That takes a dynamo hub to power the phone, or at least recharge it. I have a Voltaic Fuse 10W portable charger. It’s great for longer hauls, but strapping it to a backpack for short and medium trips (like errands) is impractical.
There are many dynamo solutions for smaller wheel diameters and smaller fork sizes. In fact, because a smaller wheel turns at a higher RPM for the same road speed, the hub can produce more power than it would in a 26″ or larger wheel. Dynamo hub manufacturers take wheel diameter into account and try to boost output on larger, lower-RPM wheels. The same innards in a smaller-diameter wheel rotate faster and therefore produce more power.
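To put numbers on the RPM argument, here is a quick sketch that computes how fast each wheel size listed above turns at the same road speed. It treats the nominal size (16″, 20″, 24″, 26″) as the wheel’s outer diameter, which is an approximation since real tire diameters vary, and the 20 km/h cruising speed is an assumption. It stops at RPM: how many watts a particular hub produces at that RPM depends on the hub’s design.

```python
import math

INCH_TO_M = 0.0254


def wheel_rpm(speed_kmh: float, diameter_in: float) -> float:
    """Wheel (and hub) revolutions per minute at a given road speed."""
    circumference_m = math.pi * diameter_in * INCH_TO_M
    metres_per_minute = speed_kmh * 1000 / 60
    return metres_per_minute / circumference_m


if __name__ == "__main__":
    speed = 20.0  # km/h, an assumed cruising speed
    for size in (16, 20, 24, 26):  # nominal sizes treated as outer diameter (approximation)
        print(f'{size}" wheel: {wheel_rpm(speed, size):.0f} rpm at {speed:.0f} km/h')
```

Because RPM is inversely proportional to diameter, a 16″ wheel turns 1.5 times as fast as a 24″ wheel at any given speed.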
Has a local bike shop which sells and services the bike. The theory here is to “support your local bike shop”. I want the convenience of walking to my favorite shop and having them handle the service, parts, and such with the bike. This may limit my choices, but in reality, my local bike shops are happy to work on any bike I bring in, even if they custom order it as a one-off just for me.
So where does all this end up?
After all this research, countless test rides, understanding the trade-offs, salivating over accessories, I finally decided on the Tern Eclipse p18L. Spoiler alert, I already told the careful observer about this decision.
Tern Eclipse p18L
The 24″ wheels provide stability, the 18 speeds provide a much wider riding range, and the integrated hub provides the power. Adding a rear cargo rack and a front kanga rack turns this into a real adventure touring bike. It’s 11.8 kg, so light enough to pick up and haul around if needed. When I get tired of riding, I can fold it up, take it on a bus, train, plane, or car, and finish the journey. I look forward to my first century ride on the Night Rider.
Over a year ago, I started looking into folding bikes. Originally, I wanted something super small and easy to transport on buses, subways, airplanes, and in trunks of cars. The nature of a folding bike meant it wouldn’t take up too much space in my house either.
I decided to start at the top and see which folding bikes were the lightest. I found the Allen Ultra 1: pure carbon fiber, 20 speeds, and it folds up quite well. At 10.5 kg, it’s ultra-light. At roughly $5,000, it’s the definition of the top end.
Allen Ultra 1
One day when riding through Newton Center, my partner and I stopped at Harris Cyclery. They have a ton of Bromptons sitting in their window. Bromptons are very nice bikes: a very intelligent folding mechanism, robust materials which last forever, and a community which loves to help others. They’re highly customizable: colors, accessories, gears, features, etc. In fact, when ordering one, the first thing you do is “Build a Brompton”. They are rather expensive, but they are complete bikes ready for the long haul. However, after a few demo rides, I was less and less impressed. Still, the staff at Harris Cyclery were Brompton experts and talked about the power needs of the modern rider. This started a digression into power-generating hubs and what I could fit into a folding bike, given that the sizes were smaller across the board.
These ideas percolated for a while until I found Montagues at Warm Planet Bikes in San Francisco. Montague bikes have full-size 700c wheels. They ride like full-size bikes, can take all the common racks, bags, and other accessories, and there are more dynamo hub options because it’s a full-size 700c wheel. After seriously considering buying one, I realized that the Montague is a great bike, but it’s a full-size bike. While it can fold, it still takes up more space than I planned for. While at Warm Planet, I saw my first Strida.
I loved the contrarian look, the simplicity of the bike, and the way it folded into a stalk which could be pushed along like a rolling cane. And they made a carbon fiber model for ultimate portability: the C1 weighs 8.5 kg, the lightest folding bike I’ve found yet. I did manage to find a place which sells them, and they quoted around $3,500 for the C1. That’s more than I wanted to spend, but riding the lower-end models was fun, and the Strida became my top choice for a long time.
By this time, my requirements list for a folding bike was:
Less than $2,000 with all parts installed.
Not full-size wheels (700c).
Can fit into a large suitcase for airline travel.
Folds and locks relatively fast, say in under 30 seconds.
Can accommodate a dynamo hub for power generation (lights and smartphone).
Has a local bike shop which sells and services the bike.
Recently, my riding partner and I headed out to ride from Plymouth to Provincetown as part of an organized “P2P” ride. We left at 6:01 AM and arrived at 17:45, after some road adventures and 152 km of riding.
I ride a 2010 Felt 95X Cyclocross bike. Over the years, I’ve replaced the seat, the pedals, the brake pads and added a Jandd rack. I’ve stayed with the original Vittoria Randonneur Cross PRO 700c x 35c Kevlar Reinforced tires, as they handle mixed surfaces well; even when the bike is fully loaded. Also, the lower pressure adds to a smoother ride. I’m a huge fan of Jandd bags. They last forever, are easy to clean, and never fail me.
The ride is pretty straightforward. You can make it even more so by simply taking Route 3A to 6A and just following that until the Cape Cod Rail Trail and then back on 6 to 6A through Truro into Provincetown. Shortly after the start, you encounter some hills, some with 9% grades up and down.
The toughest part for me was the first 30 minutes. I’m not awake, my muscles are still warming up, and nothing feels good. As you continue, the ride gets easier and easier. The last hill before the Bourne Bridge is a killer, especially when loaded down. However, once you’ve climbed it, you’re awake and ready for the next few hours of riding. A number of riders showed up on high-performance racing bikes. They whipped by in packs or singly, and a few asked why I was carrying the bags. There was a sag wagon, but the point was to train for multi-day touring, not to sprint through this one event. Many people were trying to catch the 15:00 Provincetown to Plymouth ferry. I was carrying lots of gear for my partner and me, partly to slow me down, partly because I had the bags and setup for it all.
We stopped twice for water along the way, but otherwise decided to eat lunch at the end of the Rail Trail. We made a bathroom stop around mile 17 of the Rail Trail and let our muscles rest after about 7 hours of constant riding. At about mile marker 20, the heat and the lack of breakfast or any food got to me. I hit the wall. I powered through it because I knew PB Boulangerie Bistro was at the other end as our first meal and designated break point. And what a welcome meal it was.
We finished off sandwiches, 2L of mineral water, and some quick desserts in about 45 minutes. They have a nice outdoor fire-pit going all day and night. It’s a great place to sit, relax, and enjoy the food. The taste of the food and the preparation remind me of trips to France, but the environment is all Cape Cod. After mapping out the plan to the end, we re-filled our water bottles and off we went to finish the ride. We took Ocean View Drive to avoid a death-defying ride on Route 6. The great views and cooling winds from the ocean do help a lot as well.
Somewhere in the midst of Truro, my knee stopped working. It was a bit sore, likely tendinitis, along the rail trail, but after lunch, it felt pretty good. Not anymore. I ended up taking a water/snack break for 15 minutes, and then walking for 30 minutes until it seemed well enough to keep going. We rode to Jams Gourmet and stopped for some water and to let my knee recover again.
After a fruity popsicle and some fresh, cool water, we rode off to Provincetown. The rolling hills seemed quaint at this point. The flat, straight stretch of 6A to the Ptown sign was a welcome run. It was this flat run which highlighted the limits of the two front chainrings. I was in top gear and could have gone faster with a third, larger ring up front. This may be an excuse for either a new front gear set or a more adventure-oriented touring bike.
And here we are, at the Ptown sign, after 11 hours on the bike:
We then headed into town to our room, cleaned up, caught up on some work, and headed out to dinner with friends.
A few lessons learned:
Pay attention to my knee earlier.
Eat smaller bits along the way, rather than starve until lunch break.
Bring my solar panel to charge my phone, which was at 1% battery by the time I arrived at the Ptown sign. It was 100% when I left at 6 AM.
Get a new water bottle, as the one I have isn’t so great (even if it is recycled metal and BPA-free).
Adjust my handlebars higher to avoid the temporary palsy in my hands after roughly 11 hours in the same position.
All in all, it was a great time. Next up is a full ride from Boston to Ptown. And then some rides around Vermont and Maine. Maybe a ride from San Francisco to Carmel or Monterey.
Sugar. It’s sweet. It’s tasty. A pinch of it is better than a pound of salt. Or so the saying implies. After years of chocolate chip cookies, I was becoming an expert in the flavors, textures, and choices available on the market. I seriously considered starting a blog about chocolate chip cookies. At some point, others and I realized cookies were my sugar delivery mechanism of choice. Whether after lunch or after dinner, desserts of cookies, ice cream, popsicles, or cake were becoming the norm. Without my realizing it, pasta sauces, sushi, anything chocolate, maple syrup, and the like were all feeding me toxic amounts of sugar, all the time. When Honeycrisp and Red Delicious apples, and even clementines, started to taste sour, it finally hit home that something was way off.
My partner and I started doing more research; well, she did most of it up front. We read about the sugar conspiracy from the 1970s. We started reading nutrition labels and ingredients. Sugar or its substitutes are in just about everything. I’ve re-labeled Whole Foods to be Whole Sugar. If this is “America’s Healthiest Grocery Store”, we’re in some deep trouble as a nation.
I fell back on my years of biochemistry and started wondering how sugar and its substitutes are processed by the body. I’m not predisposed to diabetes of any type, but I was still curious. After hours of wading through thousands of Google search results about sugar, substitutes, and other products, I found it really hard to tell what’s posted by industry, by product companies, by people like me, or by actual biochemists and nutritionists. I finally found Dr. Lustig’s video, Sugar: The Bitter Truth. It’s a bit tough to watch, but fascinating and thorough nonetheless. Watch that video; it’s worth it.
45 Days so far
45 days into a sugar-free diet, and everything tastes sweet. I mean everything. In many cases, we compromise with “no added sugar” because there is just sugar in everything. I look forward to the forthcoming FDA Nutrition Label changes. Among the many changes, they require “Added Sugars” to be clearly labeled. Here’s what they require:
“Added sugars,” in grams and as percent Daily Value, will be included on the label. Scientific data shows that it is difficult to meet nutrient needs while staying within calorie limits if you consume more than 10 percent of your total daily calories from added sugar, and this is consistent with the 2015-2020 Dietary Guidelines for Americans.
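The 10 percent guideline translates directly into grams. Assuming a 2,000-calorie reference diet (the basis for percent Daily Value on the label) and roughly 4 kcal per gram of sugar, 10 percent works out to 200 kcal, or 50 g of added sugar per day. The small sketch below does the same arithmetic for any calorie level; the function names are just illustrative.

```python
KCAL_PER_GRAM_SUGAR = 4.0   # approximate energy in one gram of sugar
REFERENCE_KCAL = 2000.0     # reference diet behind percent Daily Value
ADDED_SUGAR_DV_G = 50.0     # 10% of 2,000 kcal at 4 kcal per gram


def added_sugar_limit_g(daily_kcal: float, share: float = 0.10) -> float:
    """Grams of added sugar that supply `share` of daily calories."""
    return daily_kcal * share / KCAL_PER_GRAM_SUGAR


def percent_daily_value(added_sugar_g: float) -> float:
    """Percent DV as printed on the label, using the 50 g reference."""
    return 100.0 * added_sugar_g / ADDED_SUGAR_DV_G


if __name__ == "__main__":
    print(f"limit at 2,000 kcal: {added_sugar_limit_g(REFERENCE_KCAL):.1f} g")
    print(f"limit at 2,500 kcal: {added_sugar_limit_g(2500):.1f} g")
    print(f"12 g of added sugar: {percent_daily_value(12):.0f}% DV")
```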
With careful research, we’ve been able to replicate most of the foods we enjoy with no-sugar or no-added-sugar substitutes.
The impact has been somewhat dramatic, without even trying. I’ve lost about 6 kg according to my scale. I’ve dropped 3 waist sizes in pants and shorts. I’ve gone from a Large slim-fit shirt to a Medium slim-fit shirt. My body is clearly reshaping. My exercise, general caloric intake, and lifestyle have remained roughly constant. The biggest change has been the sheer amount of energy available for longer periods of time. I quit caffeine about a decade ago, and I remember being shocked at how much less tired I was within weeks of quitting.
That’s a snapshot of my activity for the past 5 weeks.
I started blogging in roughly 1997, mostly as a way to randomly play with technology, since I had to write my own blog software. Really, it started off as a “dot plan file” back when finger was all the rage (1980s). Then it grew and grew until telnet and finger didn’t like swapping large text files around. I’ve lost a few of the original posts, not even the Wayback Machine has them at this point.
As I’m moving between blogging platforms, migrating all of the content has been challenging. I’d like to keep everything together in one place to provide continuity. As I’m migrating, the older content is still hosted at my own blog repo. Soon, it will be converted to this platform and become much more accessible.
On May 12, 2016, I was the closing keynote speaker at the first Inside the Dark Web conference. It was held at a beautiful location in Battery Park at the southern tip of Manhattan. The talk covers how I’ve spent 2 years crawling .onion and .i2p sites out of curiosity about what could be in these spaces.
My presentation is available here: Exploring the dark web – Lewman. I was a last minute addition due to my return trip to Egypt being changed. I didn’t appear in any of the official conference materials, but there I was giving the talk. The video and transcript are below.
1-Andrew Lewman, Inside the Dark Web
My hobby has been exploring the dark web. It started off a few years ago, 2009… Well 2003 I started volunteering for Tor. It was a very early post on slashdot where Roger Dingledine posted, “Hey, there’s a tarball.” I was like, “Well, that’s useless. Great, written source code. 99.99% of the world can’t do anything with it.” So I compiled it and sent it back to him and he ran it. And I was like, “Hmm. Talk about blind trust.”
So then Roger’s response to that was, “Hey, let’s have dinner.” So we had dinner and he’s like, “Hey, how would you like to compile all the software for all of Tor?” And I said, “Well, I have a day job, but sure.” So I had a BeBox. Any of you remember BeOS? So I had one of those, I compiled Tor for it, and that was also useless because 99.9% of the world doesn’t run BeOS. And so I got a Mac and I got a Windows machine and I had a bunch of weird UNIX machines and I started compiling Tor for all these platforms. If you downloaded tor.exe or Tor Browser from about 2004 on, you used my artisan, cuddled, handcrafted, hugged binaries that I released about every couple of weeks, given Tor’s release cycle.
Anyway, so the title of the talk today, Exploring the Dark Web: Mostly for fun, sometimes for profit. As Paul mentioned, back in 2006 when Tor was trying to figure out should it go for-profit or non-profit, I sat down with Roger and talked to him about, well, you know, the goal is to release source code, the goal is to do good in the world, then you should be a non-profit because that’s how people in the world think. If you’re for-profit everyone will wonder, what are you doing, why are you giving this code, who is the target?
In 2009, the State Department through a partner gave Tor a million and a half dollars – and when you get a million and a half dollars to two MIT masters students, guess what they do with it? They do research and they produce white papers. And the State Department said, “No, no, no. We paid you a lot of money to actually produce stuff that people in the world want to use this. And they said, “Great. There’s a tarball.”
I was the secretary and treasurer of the Tor project, which is a 501c3 non-profit. I had a conversation with the program manager at the time whose name is Sarah, she’s now gone off to counterterrorism stuff, and she said, ‘Please, please, please quit your job.” I said, “Well, I work for a public company. They make a lot of money. You can’t afford me.” She said, “Okay. Well we’ll try and find you some more funding.” So they did and I did.
The first thing you ran into when you ran Tor at that time was that it was slow. It’s still slow compared to the native web, but from 2004 to 2009 it was painfully slow. You would wait 5 to 10 minutes for a page to come back, and then you’d want to click on something else, and you didn’t have an hour to kill to browse two pages. So the first thing I did was abuse my employer, who had a bunch of build machines, and we performance-profiled Tor down to the code level to find out why it was slow and what it was doing.
Since then, Tor has gotten much faster, mostly because the number of relays has increased, and the number of exit relays in particular. The total number of relays doesn’t matter much. For 99% of browsing over Tor, what matters is the exit relays. If you can’t get out of the network, it doesn’t matter how fast you get to the end; you’re still bottlenecked at the end. Even with 7,000 or so relays a day, there are still only about 1,000 exit relays, and they carry a lot of traffic out.
I run some of those. A number of organizations run those. Some of the organizations, well, most of the organizations or most people, record the Tor exit traffic. They may not tell you that; they don’t want to publish it. People have published their exit traffic in the past and they’ve been bullied into never publishing again. And now ethics boards at universities don’t allow publishing of the data because the users could not opt in. How a user on an anonymous network opts in to be part of a study is a whole other question.
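The point that exit relays, not the total relay count, set the ceiling can be put as a one-line bound. This is a deliberately simplified model, assumed for illustration: it treats guard, middle and exit relays as separate pools and ignores that relays can fill several positions and that Tor weights path selection by bandwidth. The function name and any bandwidth figures are hypothetical.

```python
from typing import Sequence


def exit_capacity_bound(
    guards: Sequence[float], middles: Sequence[float], exits: Sequence[float]
) -> float:
    """Upper bound (Mbit/s) on aggregate traffic that can leave the network.

    Simplified model: every exiting stream crosses one guard, one middle and
    one exit relay, so total exit traffic can never exceed the smallest of
    the three pools' combined bandwidth.
    """
    return min(sum(guards), sum(middles), sum(exits))
```

For example, with ten 100 Mbit/s guards, ten 100 Mbit/s middles and five 10 Mbit/s exits, the bound is 50 Mbit/s; tripling the middle relays leaves it at 50, while quadrupling the exits raises it to 200.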
Roger gave me a commit bit on December 25th, 2004, and according to the git commit logs my first commit went through at six in the morning. So guess what I was doing on Christmas Day 2004? I was trying to fix bugs in Tor. Since then I’ve done a bunch of press interviews and been all over the place.
I’ve moved on beyond Tor. There are a number of dark webs out there, and which one you look at matters for what you are trying to find. Tor has just been in the news a lot, mostly because of Silk Road. If you look at the graph of Tor searches and onion browsing searches on Google, as soon as Silk Road became popular all these people started going to find out: what is this Tor thing? How do I get to Silk Road? Holy crap, I can buy a joint on the internet and not from a sketchy teenager down the street!
There are a ton of other dark webs. There’s Perfect Dark, which is a big Japanese-language, mostly anime-sharing network. It’s not used much beyond that, and it hasn’t had enough research to show how secure it is or isn’t. There’s Freenet, which was actually vastly more popular than Tor in the early 2000s. Freenet is file sharing: it takes your bits and scatters them across a million computers, so you’ve got plausible deniability about what’s on your devices. At the same time it’s all encrypted, and you can recover the pieces to get your file back. There’s of course Tor. There’s I2P, which uses garlic routing. And it goes on.
Retroshare, GNUnet, ZeroNet, Syndie, OneSwarm and Tribler are the other big overlay networks used in the world. Each one has its own community and its own ways of using it. If you need to investigate crime or whatever, then you need to go figure out which network they’re using. A lot of times the problem law enforcement has is that there are so many technologies out there to use. The criminals only have to master one; law enforcement has to master all of them. And it’s clear that law enforcement is failing to master all of them, because new ones keep coming up, just like botnets, just like ransomware. Someone creates a library, sells the library, and then 20 more versions come out, all slightly different from the original.
So you have to go try and figure out what’s going on and who’s doing what, and then look into it and try to understand it. The ‘try to understand’ part takes time, which, having worked on investigations, I find really frustrating when something is going to happen at a specific day and time. I’m thinking of the person who emailed a bunch of universities saying he planned to bomb Carnegie Mellon University and was going to detonate the bombs at the following times. You’re not going to have the time to find the person and figure out who it is. The emails all came through Tor, via Guerrilla Mail and other services reached over Tor. So by the time law enforcement goes, “Hey, there’s Guerrilla Mail. Here’s Tor. Oh, crap. It’s a Tor address,” and then goes to figure out who has the motive, who has the means, and what the speech patterns are, the clock is ticking toward when these bombs may or may not go off. Luckily there were no bombs. It was just a disgruntled person. But you don’t actually know that at the start.
There’s also the case of the Harvard student named Kim, who used Tor to get to Guerrilla Mail and send a bomb threat to force an evacuation just before his final exam, because he hadn’t studied. There was so little Tor usage on Harvard University’s network that they just went and interviewed everyone who was on Tor at the time the threat was sent. It was a handful of people, and he confessed. When two FBI agents come right at you, along with Harvard University police and a bunch of other people, a real criminal would pass it off and say, “Oh, whatever. I use Tor to browse porn.” He said, “Yes, I did.” And that made it pretty clear who did it.
What I have here, which you can’t see, is a cool graph based on Google’s Zeitgeist of who’s querying for what over time. Freenet queries run at about three times Tor’s until around 2010, or Silk Road. Then Silk Road flipped it so that Tor became more popular than Freenet. All the other ones I mentioned, Retroshare, GNUnet, ZeroNet and all that stuff, barely ever get mentioned. They’re a little blip at the bottom.
What’s also interesting is which regions use which dark webs. Who do you think is the top country for Tor usage by Google queries?
PARTICIPANT: US. China.
LEWMAN: Bangladesh. Why Bangladesh? I have no idea, but that’s what the query data shows. Norway is number two, Italy is number three, Germany is number four, then comes Syria at number five, then Pakistan, and then Poland. Why, I don’t know.
PARTICIPANT: Because they mine databases.
LEWMAN: Yes. They also mine databases much more. The top countries for I2P are Russia and Ukraine, and most of the rest is largely Arab countries. Freenet is almost all Germany; there’s almost no usage or queries or anything about Freenet outside Germany. Retroshare is heavily French and German, but when you look at it in more detail it’s really the Alsace-Lorraine region, which sits on the border of France and Germany. And GNUnet is mostly France and Germany, because the people who developed it are German.
So in 2014 I started crawling .onion sites. I just fired up Apache Solr, wrote a few different scripts, and sucked everything I could find into a database. Since 2014 I’ve crawled roughly 114,000 unique .onion domains.
Why did I start doing this in 2014? Because I went to a symposium on drugs on the internet at the European Monitoring Centre for Drugs and Drug Addiction, and there were a number of researchers there. The number one complaint they all had was that they couldn’t reliably crawl Tor or I2P websites. Partly that’s because hidden services are just unreliable: they go up, they go down, the technology is slow, and crawlers are not that patient. Second, they didn’t fully understand what they were doing. They were trying to take a standard crawler, point it at a .onion and hope it worked, and for the most part it doesn’t, as anyone who crawls dark web forums now knows. And trying to crawl things like Tribler and OneSwarm, which are BitTorrent based, was even more of a challenge. So in those couple of days I helped them set up a crawler, pointed it at the forums, and just let it go.
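The complaint that standard crawlers are not patient enough for hidden services can be illustrated with a small fetch helper. A minimal sketch, assuming a local Tor client on its default SOCKS port 9050 and the `requests` library installed with SOCKS support; the function names, retry count, timeout and delays are hypothetical choices, not the setup described in the talk.

```python
import random
import re
import time
from typing import Callable, Optional
from urllib.parse import urlsplit

# Assumption: a local Tor client on its default SOCKS port, 9050 (Tor Browser
# uses 9150). "socks5h" makes the proxy resolve the hostname, which is needed
# because .onion names are not in public DNS.
TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}

# Version 2 onion names are 16 base32 characters; version 3 names are 56.
ONION_HOST = re.compile(r"(?:[a-z0-9-]+\.)*(?:[a-z2-7]{56}|[a-z2-7]{16})\.onion")


def is_onion_url(url: str) -> bool:
    """True if the URL's host looks like a v2 or v3 hidden-service name."""
    host = urlsplit(url).hostname or ""
    return ONION_HOST.fullmatch(host) is not None


def requests_get(url: str, timeout: float) -> str:
    """Default fetcher; needs `pip install requests[socks]` and a running Tor."""
    import requests  # imported lazily so the retry logic runs without it

    resp = requests.get(url, proxies=TOR_PROXIES, timeout=timeout)
    resp.raise_for_status()
    return resp.text


def patient_fetch(
    url: str,
    get: Callable[[str, float], str] = requests_get,
    attempts: int = 5,
    timeout: float = 90.0,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Fetch a hidden-service page, retrying with exponential backoff.

    Returns the page text, or None if every attempt failed; a real crawler
    would requeue the URL for a later pass rather than give up on it.
    """
    if not is_onion_url(url):
        raise ValueError(f"not a .onion URL: {url}")
    for attempt in range(attempts):
        try:
            return get(url, timeout)
        except OSError:  # requests' exceptions subclass OSError (IOError)
            if attempt == attempts - 1:
                break
            # Wait about 30 s, 60 s, 120 s, ... with +/-50% jitter so many
            # workers do not retry the same flaky service in lockstep.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    return None
```

Injecting `get` and `sleep` keeps the backoff logic testable without a live Tor client. The long timeout and delays reflect the point that hidden services come and go, so a crawler has to be far more patient than one built for the clearweb.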
The other thing we did was take the hidden wiki. There are about 27 hidden wikis out there, and we took “The Hidden Wiki”, because that’s what everyone said it was, used it to seed the list of domain names, and just let the crawler go. In about six months we found about 15,000 new hidden services that were linked in forums or through affiliate networks, where they show up for a day or two or whatever. It’s one .onion linking to another, so you get a really cool graph, which you won’t see, that says, “Here’s the source site, here are the other sites it talks to, and here are all the sites they talk to.” It’s just a standard link diagram of who talks to whom, how often they talk, and where it went from there.
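The seeding approach described above, starting from one directory and following every .onion link outward, amounts to a breadth-first crawl that records who links to whom. A minimal sketch, not the crawler used in the talk: `fetch` is a hypothetical callable (for example, a wrapper around a Tor-proxied HTTP client) that returns a page's HTML or `None` when the service is down, and link extraction is a simple regex over 16-character (v2) and 56-character (v3) onion names.

```python
import re
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set

# 16-character (v2) or 56-character (v3) base32 names followed by ".onion";
# the \b anchors stop a match from starting or ending mid-word.
ONION_LINK = re.compile(r"\b([a-z2-7]{56}|[a-z2-7]{16})\.onion\b", re.IGNORECASE)


def extract_onions(html: str) -> Set[str]:
    """Return every .onion host mentioned in a page, lower-cased."""
    return {m.group(1).lower() + ".onion" for m in ONION_LINK.finditer(html)}


def crawl_link_graph(
    seeds: Iterable[str],
    fetch: Callable[[str], Optional[str]],
    max_sites: int = 10_000,
) -> Dict[str, Set[str]]:
    """Breadth-first crawl from seed hosts, recording host -> linked hosts."""
    queue = deque(seeds)
    seen = set(queue)
    graph: Dict[str, Set[str]] = {}
    while queue and len(graph) < max_sites:
        host = queue.popleft()
        html = fetch(host)
        if html is None:
            # Unreachable this pass: keep it as a node with no known links.
            graph[host] = set()
            continue
        links = extract_onions(html) - {host}  # ignore self-links
        graph[host] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return graph


def in_degree(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """How many crawled sites link to each host; high counts suggest hubs."""
    counts = {host: 0 for host in graph}
    for links in graph.values():
        for link in links:
            counts[link] = counts.get(link, 0) + 1
    return counts
```

Counting in-links is a crude way to spot hubs such as directories and forums. A real crawler would also revisit unreachable hosts on later passes, since services often reappear.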
So what do you think is the first problem you run into when you start crawling a bunch of .onion sites – besides the fact that they’re wholly unreliable? You find a lot of child abuse material. Shockingly fast. Shockingly unencrypted. Shockingly unprotected by credentials. There’s no login stuff for this. You start running into it – or you will.
I’m not going to show you child porn, but the slide is a picture of a girl holding a My Little Pony. It’s one of the ways law enforcement goes after child abusers. There’s the image of the kid, but you can block out the kid, because everything else in the image matters more. What are they wearing? What is the background? What are they holding? What are they doing? And we actually collected so much of this stuff that I went to talk to DHS and said, “Oh, so how do you anonymously…” Well, how do you tell DHS, “By the way, I’ve collected a bunch of child porn by accident, but I don’t actually collect any child porn, I don’t hold it whatsoever, because that would be illegal”?
Thankfully, the DHS agents we were working with, together with the Child Exploitation and Obscenity Section of the DOJ, had some lawyers work with me. I learned that the definitions of child abuse material and child porn are different from what one would think, and the laws regarding them are incredibly strict. Even most law enforcement agents (I think it’s the I3 team, whatever section it is) don’t have the ability to view that. And they were happy to take it off my hands. It’s on their servers: I run the child porn crawls on their systems, not mine, under their jurisdiction.
2-Andrew Lewman, Inside the Dark Web
So I started going back to the European Monitoring Centre and helping them crawl drug markets and figure out what all these drugs on the internet are. The EMCDDA just published a book about the internet and drug markets, in which I have a chapter and which I helped advise on as a whole, because crypto markets, as they call them, are becoming the new way to get drugs out. We also worked with a number of national police forces to figure out where these crypto markets fit in the stack of organized crime. Transnational organized crime isn’t organized the way people imagine, but it is organized in the sense that people get together: someone has to be the supplier, someone has to be the distributor, someone has to do marketing, someone has to do the advertising, someone has to handle affiliates. Think of any kind of business; the drug gangs work in exactly the same way.
The crypto markets mostly fall into the advertising layer, run by lower-ranking individuals who are technical enough to set up a .onion or .i2p site or whatever. A lot of stuff also goes on inside gaming (someone mentioned virtual gaming and currency before), and that’s usually where you find the lower-ranking people, who get stuck doing things like street sales, which should be news to nobody.
What’s been happening, though, is this. There’s a screenshot of Agora, which is one of the markets that stayed online. They’ve moved from just drugs to identity credentials, counterfeits, electronics, forgeries, jewelry, services for hire, and a whole bunch of other stuff. Wherever something is illegal, one of these markets will sell it into your jurisdiction. A surprising number of them are now going language-specific, which sort of gives away who runs them and who their target market is. If a market is only in Cantonese, say, your target market is Hong Kong and the surrounding area, so your supply and your organization are likely Hong Kong and the surrounding area too. That gives away a lot of information.
Take the Swiss. Switzerland has three main languages, and whether a drug market targets Swiss French, Swiss German or Swiss Italian speakers gives away who’s likely running it and where they’re located, which helps you hunt them down.
Passports are becoming the big thing, since many passport agencies don’t have the security you’d think they would. And at all those terminals in the airport where they have you put your passport in for international flights, skimming is not just for credit cards: your passport can be skimmed too, and the information used to clone it.
Since then I’ve worked with a number of police organizations to help take down some hidden sites that were serving up some horribly bad stuff. I won’t traumatize you with the content, but it takes child abuse and trafficking to a whole new level. It’s not so much the quantity, it’s the quality of what’s on there. It’s that this is new material no one’s seen before. New kids. New people. Kids who have been brought up solely to be abused online, is what’s showing up. Most of the money right now is going to counterterrorism, but there’s plenty of money going toward child abuse because it’s become such a growing problem. So there’s the whole law enforcement side to all this.
What’s also happening is that there are entrepreneurs, we’ll call them, moving off the open web, where they can easily be tracked from their IP address to their financial information to their credit card information. The financial chain is the easiest way to hunt down any criminal, especially the unsophisticated ones who think they’re internet gods but don’t actually understand how the financial world works.
And revenge porn has become a thing. One site that got a lot of press was Pink Meth, which has since been taken down, because the Pink Meth .onion site, much like the Facebook and DuckDuckGo .onion sites, was just another address pointing at the public pinkmeth.com. So it was pretty easy to tell who hosted it and who was doing it.
The way I got involved is that one of the victims called me up and blamed me for creating the technology that let all the pictures she’d shared with her boyfriend go up online and then spread all over the world. Her first response, when people started contacting her, was anger; she started insulting people, which just spread it further, because the people trolling and attacking her now had a viable and angry target, and it just made them feel entitled to keep going.
The real-world effect is that a number of people affected by revenge porn are now committing suicide. These are mothers, fathers, teenagers. Predominantly women. That is the unfortunate tie between the dark web and the real world.
With the dark web, what most people felt about Silk Road was: you know, what’s the matter? They’re just selling drugs to people who would go buy drugs anyway. It’s a bunch of drug addicts. It’s totally harmless. It’s some libertarian dream. But as it moves beyond the dark web to actually affecting people’s lives, that’s where law enforcement gets involved. That’s where people really start to see the main use of the technology as negative, and to ask whether you shut down the technology or let it lead you to something that stops all the badness.
Revenge porn was just the start. The other slides are hard to describe properly, but the effects are mostly psychological. It ties into violence and into victims of trafficking who have been horribly psychologically abused, even though they know nothing about what the dark web is. The real-world challenge is helping victims who have no control over what’s happening beyond, at most, not sharing pictures. And many times they didn’t share the pictures: they broke up with a boyfriend, or something happened with a boss, and that person put the pictures online. The quick answer most victims get is: well, don’t share the pictures at all, don’t take the pictures. But that sort of defeats the whole point of building a relationship with someone. Especially if you’re in a long-term intimate relationship, you’re going to have pictures of all sorts of stuff that you don’t think matter, until maybe after you break up, when the jerk you broke up with posts everything online to get back at you.
The most effective way to combat this has been legal. The Cyber Civil Rights Initiative is going after people hosting this stuff, including anybody on the dark web hosting it. But actually, when you eventually find a way to contact them (and it’s obviously never directly through the forum or whatever), many of the dark web people will take down revenge porn and child abuse. With the Ahmia search engine that Paul mentioned earlier, and Tor2web, if you contact them and say, “The following sites are only serving up this, this or this,” they’ll strip those sites out of the directory so you can’t search for them or find them anymore. Even the actual criminal hosting places (not to say that everyone on Tor is a criminal) seem to have a moral code: if something involves absolute harm or gets in the way of their business, they will stop hosting it.
So, I’d worked with victims of domestic violence, stalking and revenge porn since about 2011, but I didn’t realize what the effects were until it happened to me. In 2014 there was a blog post, which you can look up on the Tor website: my May 2014 trip report. I posted a bunch of stuff. Sweden is very pro-feminist, and I put up a basic account of what happened at the conference, the Stockholm Internet Forum. And a very weird comment showed up on the Tor blog. It was the first comment, and it’s here, though I guess you can’t see it. I’ll read it to you: “Ironic it takes an Arabic man, good looks not withstanding ‘wink wink’, to stand up for women on the internet at a conference about white men in the news.”
The ‘white men in the news’ part, I assume, is because most of the comments at the time were about Julian Assange, WikiLeaks and the Collateral Murder video, and I was the one who wrote the post. I’m not Arabic; I guess I’m darker-skinned, so I look Arabic. Then there are more comments, and someone who was really pissed off at WikiLeaks (I won’t read this) wrote a comment, and it went from there. I became the target of both the anti-WikiLeaks crowd and the pro-WikiLeaks crowd, and the pro-feminist crowd, and the anti-feminist crowd. All because of a trip report about the Stockholm Internet Forum in 2014.
My first instinct was to talk to my friends in law enforcement and say, “Hey, check this out.” And they were like, “Yeah, dude. You know what to do about it. You created your own problem. You created Tor. We can try to figure out who it is, but good luck.”
The other thing you can’t see is my Twitter direct messages. I joined Twitter, and I got a bunch of direct messages, apparently from the person who posted the comment, the ‘Arabic man’ thing, and then the anti-WikiLeaks and pro-WikiLeaks people flooded me with DMs about all sorts of stuff. Then I started getting phone calls, then more threatening emails. I started getting mail at my house. Someone attempted to dox me but doxxed the wrong Andrew Lewman, which is amazing because there are only three of us on the internet: one is in high school, one is an insurance salesman, and then there’s me. I know this because I met them; they all came to me and said, “Dude, what the hell did you do to us?”
And those are fun conversations you don’t want to have on the phone. “Hi. Great to meet you. I’m sorry I fucked up your life.”
Then, when I started looking into the IP addresses, working with Twitter’s security team and with law enforcement to figure out where this stuff was coming from, some of it came from Tor and some of it came from these other networks. So I started trying to figure out how secure I2P actually is. There’s the web proxy idea; how secure is that? How secure is… You know, there’s DeepSight. Same with Tribler. Somehow someone figured out how to get to Twitter through Tribler, so I started trying to figure out how all this stuff works.
It turns out they’re not very secure. I2P is based on Java, a choice they made a long time ago, and Java is riddled with holes. Most JVMs, and most machines, are riddled with holes. The Jetty server they use inside I2P is old and outdated and has some well-known bugs. So there are easy ways, if you’re willing to be in the gray area, to go after these sites.
You probably won’t see this, but DARPA put up a slide with the big iceberg with DARPA all over it. They created a project called Memex to be the deep web search engine. The number one rule of Memex is that you can talk about Memex. The number two rule of Memex is that you cannot operationalize any of the tools they create. So as someone no longer on the Memex project, I can operationalize their tools. They have hybrid spiders, forum spiders, and a whole lot of cool things people have written to do link analysis and all this. And I use them on my own case to try to figure out who’s posting what where, and why these idiots are harassing me.
And I ran into a bunch of interesting people, mainly the Thorn foundation. Thorn does a lot of work against sex trafficking and child sexual exploitation. They were looking for help figuring out how to use big data to actually find and identify the kids showing up. The vast majority of them show up on the clearweb. There is so much stuff out there. I think, to your point, with the billion or so websites on the clear web, law enforcement can’t keep up. Google can’t keep up. You can still search for child abuse images in Google Images and they’ll show up. Google will only take them down if you flag them, or if an AI detects that something looks like a naked kid. So the vast majority of it is all out there. It’s just a matter of harvesting it, putting it together and looking at where these people are.
I ran into them by doing what I’ve done: I came across a forum where they were talking about harvesting this stuff, and I said, “Hey, this is what I have just from my own experience.” The good that came of it is that we’re now working together to try to identify that material and make the tools a lot better, because right now you get a lot of noise. You get pictures of people’s elbows that look funny. Anything that looks skin-colored shows up as potential child abuse. And you can’t filter through that by hand fast enough.
Most of this comes out of just curiosity. I just start exploring the dark web, I start looking at these technologies. I load them up. I run some I2P servers. I run some Tor relays, I run a Tribler node. I run a bunch of these things. And a lot of it is just to learn how do they work, what do they do, who is using it, and what can you do with it.
3-Andrew Lewman, Inside the Dark Web
Crime is like a water balloon: squeeze it and the water goes somewhere else. That’s what’s happened to crime. Crime is slowly being squeezed off the clearweb, so where does it go? It’s not going to go away; it’s going to go to the dark web. And you need the tools and technologies to understand the dark web and stop its bad uses, so that the technology can succeed going forward.
PARTICIPANT: So, I don’t pretend to speak for DARPA, but here’s my best understanding of the policies set up for the Memex program. They wanted to do work that involved gathering data about the live internet, and my understanding is that they were allowed to do this on the condition that it was only for research purposes. It’s not true that you can’t operationalize the tools; you just can’t use the data sets. They won’t pass those data sets over to people. But the tools are, you know, meant for transition, and my understanding is that they do work with various agencies, and those agencies are actually using the tools developed as part of Memex.
LEWMAN: Right. And the data sets that come out of that, which the agencies use, then go back to Memex.
PARTICIPANT: They have to then go develop their own data sets, but they build them using the tools that Memex produced. They just can’t have access to the research data sets. So it’s…
PARTICIPANT: Maybe six months to a year ago I got the opportunity to meet Brewster, who created the Internet Archive. Do you think there’s any place for a tool like that, or is it a matter of time before there’s an Internet Archive for the dark web? Is it useful to have some place like that on the dark web?
LEWMAN: It’s useful in that… Well, immediately I jump to evidence collection. When you go to try to explain to a judge (I was an actual witness in a case), they want to see what it looked like before you modified it. What the stuff looked like, as it existed at the time, is really important as hard evidence. I also think that if one of the dark webs takes off, whether it’s Tor or whatever it is, some historian is going to want to see the early parts: what were the original sites created? Just like we look at 1995 versions of websites. They look hilarious now, 20 years later, but the same thing will happen if one of these dark webs takes off. You’ll want to see who used it and what they used it for. It’s important from a historical perspective: where you came from and where you are now.
PARTICIPANT: So is there something like that already existing?
LEWMAN: I don’t think anything like that exists. I have some of it, just because that’s what I’ve done with crawling, but it’s not intended to be a working archive.
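Capturing a page “as it existed at the time” mostly comes down to keeping the raw bytes unmodified together with a cryptographic fingerprint and a capture time, so any later change is detectable. A minimal sketch under that assumption; the record layout and function names are hypothetical, and this alone does not establish chain of custody or meet any particular court’s evidence rules.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    url: str
    fetched_at: str  # ISO 8601 timestamp in UTC
    sha256: str      # hex digest of the exact bytes received
    size: int        # length of the body in bytes


def take_snapshot(url: str, body: bytes, now: Optional[datetime] = None) -> Snapshot:
    """Fingerprint a page body at capture time; keep `body` itself unchanged."""
    captured = now if now is not None else datetime.now(timezone.utc)
    return Snapshot(
        url=url,
        fetched_at=captured.isoformat(),
        sha256=hashlib.sha256(body).hexdigest(),
        size=len(body),
    )


def verify(snapshot: Snapshot, body: bytes) -> bool:
    """True only if `body` is byte-for-byte what was captured."""
    return len(body) == snapshot.size and hashlib.sha256(body).hexdigest() == snapshot.sha256


def manifest_line(snapshot: Snapshot) -> str:
    """One JSON line per capture, suitable for an append-only log."""
    return json.dumps(asdict(snapshot), sort_keys=True)
```

Storing the digest separately from the page (for example in an append-only manifest) is what makes a later edit to the stored page detectable.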
PARTICIPANT: So how do you protect yourself as you’re crawling to not have things that cam…
PARTICIPANT: How do you not get arrested?
LEWMAN: Well, I made friends. I made some friends in law enforcement. What they’re mostly interested in is a list of sites. If a page comes back with a keyword that some of the Memex tools classify as, say, child abuse, I stop crawling it and shunt it over to DHS servers, and DHS will crawl it, because they can. I can’t.
PARTICIPANT: So you’ll stop at a point.
LEWMAN: Yup. Whether even an automated tool still opens me up to legal exposure is an open question, but yeah, an automated tool does the classification. It’s not actually downloading and storing anything; it’s just going through the content to see whether it looks like child abuse, versus drugs, versus phishing, versus a hit man or something. For anything child abuse, I don’t want to take the risk of looking at it. I send it to DHS, and DHS feeds it into their list and crawls it.
PARTICIPANT: So there is still a… You’re not validating whether it is something. You’re pushing it into another…
LEWMAN: Right. Somebody else who can legally touch it just because it’s so toxic.
PARTICIPANT: That’s the gray area I’m trying to understand. That’s the touching part.
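The triage described in this exchange, classify automatically and, for the most toxic category, keep nothing but the address and pass it to someone who can legally handle it, can be sketched as a small router. A minimal sketch under loud assumptions: the categories, the keyword lists and the agency-supplied `restricted_terms` set are all hypothetical, and substring matching stands in for the far more capable Memex classifiers mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical keyword lists for illustration only; real classifiers are far
# more capable than substring matching.
CATEGORY_TERMS: Dict[str, Set[str]] = {
    "drugs": {"mdma", "cocaine", "heroin"},
    "phishing": {"verify your account", "password reset"},
    "hitman": {"hit man", "murder for hire"},
}


@dataclass
class TriageRouter:
    # Assumed to be supplied by the partner agency, not curated by the crawler.
    restricted_terms: Set[str]
    stored: Dict[str, str] = field(default_factory=dict)  # url -> page text kept locally
    handed_off: List[str] = field(default_factory=list)   # urls passed on, text discarded
    labels: Dict[str, str] = field(default_factory=dict)  # url -> assigned category

    def route(self, url: str, text: str) -> str:
        """Classify one page in memory and decide whether it may be kept."""
        lowered = text.lower()
        if any(term.lower() in lowered for term in self.restricted_terms):
            # Never written to local storage: only the address is recorded,
            # for someone who can legally handle the content.
            self.handed_off.append(url)
            self.labels[url] = "restricted"
            return "restricted"
        label = next(
            (name for name, terms in CATEGORY_TERMS.items()
             if any(term in lowered for term in terms)),
            "other",
        )
        self.stored[url] = text
        self.labels[url] = label
        return label
```

The design choice worth noting is that the restricted check runs first and returns before anything is stored, so flagged text exists only transiently in memory during classification.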
MODERATOR: I think… Last question.
PARTICIPANT: So do you think dark web usage will grow beyond half a percent, and will ever be 15, 20, or 30% of how people access the internet when they go online? And if so, what’s going to lead to that?
LEWMAN: I think that, yes, the darknets will blow up. I don’t think any of the tools right now are the ones that will do it. Think back to 1995: you had to install your own TCP/IP stack to be able to do this stuff, and it was just too painful for 99% of people. That’s why AOL and CompuServe existed. It was just way too complex for people to use, and way too complex to set up and host. I2P has made it easy to host your own content with the click of a button, but you still have to install the thing, and then you get to the text page, and 99% of the world looks at it and goes, “I’ll just publish whatever I want.”
I think someone will come along and make it vastly easier to use, and that’s when it becomes much more popular. We started seeing that with Ricochet and OnionShare, which are really easy: create the .onion and host your content on it.
When somebody figures out an enterprise way to do that, or a really easy point-and-click way, that’s when I think it will start to become much more popular. Right now it’s sort of a toy that the most technologically advanced people play with. And I’m not going to place bets on which of the overlay networks is the winner. Maybe a new one will come out that’s vastly easier to use, and everyone just uses it.
Tribler has taken BitTorrent to a whole new level: it’s completely peer-to-peer, and you can stream videos. In case you’ve never heard of Popcorn Time, it was a BitTorrent app that was also completely peer-to-peer, and it literally copied the interface, the whole Netflix user experience, from Netflix, so you could stream from something like a thousand computers onto yours. You never had to download anything. It was dead easy, and it had a hockey-stick curve of user adoption because it was so easy to use. Except they hosted it on popcorntime.com, and obviously that’s the target: you shut that down, people can’t find it, and usage goes away.
MODERATOR: Interesting. Andrew, you’ve done some great work in some dark places to get this together. Thank you very much.