Introduction
Part 1: Sources and Datasets
Chapter 1: Protecting Sources and Yourself
Chapter 2: Acquiring Datasets
Part 2: Tools of the Trade
Chapter 3: The Command Line Interface
Chapter 4: Exploring Datasets in the Terminal
Chapter 5: Docker, Aleph, and Making Datasets Searchable
Chapter 6: Reading Other People's Emails
Part 3: Writing Code
Chapter 7: An Introduction to Python
Chapter 8: Working with Data in Python
Part 4: Structured Data
Chapter 9: BlueLeaks, Black Lives Matter, and the CSV File Format
Chapter 10: BlueLeaks Explorer
Chapter 11: Parler, the Insurrection of January 6, and the JSON File Format
Chapter 12: Epik Fail, Extremism Research, and SQL Databases
Part 5: Case Studies
Chapter 13: Pandemic Profiteers and COVID-19 Disinformation
Chapter 14: Neo-Nazis and Their Chat Rooms
Afterword
Appendixes
Appendix A: Using the Windows Subsystem for Linux
Appendix B: Scraping the Web
Micah Lee is a renowned investigative journalist and computer security engineer celebrated for securing Edward Snowden's NSA leak. He is the director of information security at The Intercept and an advisor to the transparency collective Distributed Denial of Secrets. A former EFF staff technologist and Freedom of the Press Foundation co-founder, Lee is also a Tor Project contributor and the developer of open source security tools like OnionShare and Dangerzone.
“Micah’s book is a fantastic and friendly introduction for
journalists, activists, and anyone else who is interested in
learning to analyze large data sets but has been too intimidated by
the technical details. I hope this book will inspire more people to
find the stories inside the data.”
—Eva Galperin, Director of Cybersecurity at the Electronic Frontier
Foundation
“Masterfully breaks down how to handle a data leak and provides the
reader with hands-on examples to hone their skills. If only I had
this book when I broke the news of the Epik data breach!”
—Steven Monacelli, Special Investigative Correspondent at the Texas
Observer
“For more than a decade, Micah Lee has been on the cutting edge of
protecting journalists and their sources from surveillance. It's a
gift to all of us that he has downloaded his wisdom into this
highly readable and vitally important guide.”
—Julia Angwin, Investigative Journalist at The New York Times
“Thanks to whistleblowing leaks, gold mines of valuable digital
data now exist. There is no better account than Micah Lee’s lively
and readable how-to guide for arming journalists and researchers
with the tools necessary to find, excavate, and make sense of this
rich data. Sourced from Lee’s experiences mining data for his
hard-hitting journalistic exposés, readers will come away inspired
and equipped to follow in his footsteps.”
—Gabriella Coleman, Harvard Professor, Founder of Hack_Curio, and
Tor Project Board Member
“As a journalist who has been working with data breaches for close
to ten years, actually getting to grips with that data is often the
hardest part of any reporting project. Lee's clear and concise book
will be an invaluable resource for reporters or researchers just
dipping into this sort of data, or those looking for new
techniques. I will certainly be using some of the tools myself.
Hacked and dumped datasets are rich sources of information that are
in the public interest, and Lee's book will only increase the
number of important stories others are able to extract from
them.”
—Joseph Cox, Senior Staff Writer at Motherboard/Vice Media
"[A] fascinating guide on how to analyze the data from several
significant data breaches over the last decade. . . Lee is a
unique author with extremely deep technical knowledge who conveys
information in a readable manner. Hacks, Leaks, and Revelations is
one of the more unique information security books of recent memory
and a fascinating read."
—Ben Rothke, RSA Conference blog
“Seamlessly blends real-world stories of whistleblowers and data
dumps with a top to bottom guide on how to approach those very
scenarios yourself. From protecting sources to accessing leaked
data, no page is wasted. A must-read for any researcher or
journalist regardless of experience.”
—Mikael Thalen, Tech and Security Reporter at The Daily Dot
“The world is awash in hacked and leaked data, and any investigator
or journalist hoping to handle it safely and find the newsworthy
threads needs to buy this book. Micah's step-by-step approach to
the ethics, safety and tooling is both approachable for the average
person with even basic data skills and will also be useful for
those with an advanced background. A guide like this was waiting to
be written.”
—AJ Vicens, Reporter at CyberScoop
"A comprehensive yet highly digestible resource that I would
wholeheartedly recommend to anyone remotely interested by modern
journalism [practices]."
—Julien Voisin, Artificial Truth
“Of special interest for anyone concerned with the increasing
issues around cyberspace and internet database security, Hacks,
Leaks, and Revelations must be considered basic, fundamental
reading.”
—Midwest Book Review