Images in this post generated by Microsoft Copilot and ChatGPT
This is a personal blog and all content therein is my personal opinion and not that of my employer.
Introduction
In this post, I’m going to talk about some topics that may or may not be new to you and how they led me to findings that I responsibly disclosed to several organisations. That in turn led to a Teams meeting with members of several divisions within the Intelligence and Cybercrime communities of the Canadian Government one dark Monday night in January 2023.
What is Google Dorking?
First of all, we need to set the scene a little bit.
Since Google first hit the scene in 1998, internet search has been a core part of their business - long before they became the global internet and cloud behemoth they are today.
Search engines work based on a few fundamental principles:
- Crawling - Scanning websites on the internet, ingesting the contents.
- Indexing - Essentially, making sense of those contents and allowing them to turn up in search results based on keyword searches and the existence of those keywords in the pages.
- Ranking - Google ranks sites using several algorithms to try to return to you the most relevant results/websites for your search.
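To make those three steps a little more concrete, here is a toy sketch in Python. It is nothing like how Google actually works: the "crawled" pages are a hypothetical hard-coded dictionary, the index is a plain keyword-to-page map, and the ranking is a naive keyword count that I chose purely for illustration.

```python
from collections import defaultdict

# Hypothetical pages standing in for content a crawler has already fetched.
PAGES = {
    "https://example.com/a": "public cloud storage bucket",
    "https://example.com/b": "private storage account",
    "https://example.com/c": "cloud search engine",
}


def build_index(pages):
    """Indexing: map each keyword to the pages containing it, with a count."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Return pages containing every query word, best naive score first."""
    words = query.lower().split()
    if not words:
        return []
    # Keep only pages that contain all of the query words.
    candidates = set(index.get(words[0], {}))
    for word in words[1:]:
        candidates &= set(index.get(word, {}))
    # Ranking: total keyword occurrences, ties broken by URL.
    scored = [(sum(index[w][url] for w in words), url) for url in candidates]
    return [url for _, url in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]
```

Calling search(build_index(PAGES), "cloud storage") returns only the first page, because it is the only one containing both keywords.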
So where does Dorking come in?
Dorking is the practice of using Advanced Search within Google to surface information that is unlikely to be returned in a standard/basic search.
While I don’t know the exact reason why it is called Google Dorking, it is also known as Google Hacking and you can probably make an assumption based on the definition of the word dork:
- informal
- an odd, socially awkward, unstylish person
It’s entirely possible these hacks were called dorks because you’re probably particularly geeky or dorky if you’re using them in the first place?
Sounds useful, what’s the issue?
This is where it gets interesting.
While many people originally used Advanced Search to refine and improve their searches, hackers and security researchers realised that these could also be used to mine the huge amount of data that Google crawls and indexes to surface information that they can use to perform reconnaissance on internet facing infrastructure.
For example, knowing which web server software and version a site runs, and how it is configured, can reveal which known vulnerabilities it may be exposed to.
It also has significant advantages in surfacing data you might not expect to be published publicly and indeed may not have been intended to be public.
In years gone by, this kind of issue where data was mistakenly public and therefore indexed by Google (and other search engines) was most likely caused by misconfigured web servers.
In the era of Cloud Computing, there is another more likely reason - cloud storage.
Why is cloud storage so likely to be the source of a data breach?
Cloud storage is commonly referred to as “storage buckets”, as this is the naming used by Amazon Web Services (AWS - the first major public cloud platform) and Google Cloud Platform (GCP). Other cloud platforms use different naming; Microsoft Azure, for example, calls them Storage Accounts.
While most storage buckets on most platforms are private by default, it is incredibly easy to make them public and often easier than configuring selective controls that allow access to only some clients.
As a result, the sprawl of so-called “open buckets” has been huge and well publicised, having been the source of several notable data breaches.
Okay, but they’re not crawled and indexed right?
As these are cloud native storage technologies, they are primarily accessed via HTTP (usually over TLS - HTTPS). Though some platforms have non-HTTPS endpoints (such as gRPC), HTTPS is by far the most common transport protocol.
This in essence means that a publicly accessible/unprotected storage bucket looks no different from web server content.
And since these buckets aren’t sitting inside private datacenters but instead behind public domain names, they are indeed crawlable and indexed by search engines such as Google.
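If you want to check whether an object in storage you own is readable without credentials, a minimal sketch in Python could look like the one below. The function names are my own invention for this illustration, and the status mapping is a simplification: I am assuming a 2xx response means anonymous access works, 401/403 means access is denied, and 404 means nothing was found at that URL. Network errors such as DNS failures are not handled.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Interpret the HTTP status of an unauthenticated request (simplified)."""
    if 200 <= status < 300:
        return "public"
    if status in (401, 403):
        return "access denied"
    if status == 404:
        return "not found"
    return "unknown"


def check_public_access(url, timeout=10):
    """Send an anonymous HEAD request to url and classify the response.

    Only point this at storage you own or are authorised to test.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the error.
        return classify_status(err.code)
```

A result of "public" for something you expected to be private is the cue to review the access settings on that bucket or container.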
Okay so how did that lead to you talking with the Royal Canadian Mounted Police?
On a cold Sunday morning - 22nd January 2023 to be precise - I was being my usual geeky self and reading up on security related stuff.
I stumbled upon this Twitter post:
Google Dorks - Cloud Storage:
site:https://t.co/ci8trAUge5 "target[.]com"
site:https://t.co/HlodV2F9hc "target[.]com"
site:https://t.co/F21QzXaaZU "target[.]com"
site:https://t.co/JxOCT9Cewc "target[.]com"
Find buckets and sensitive data
#recon #bugbountytips #infosec #seo pic.twitter.com/9NQnPpzLrf
- Mike Takahashi (@TakSec) January 21, 2023
It got me thinking “it can’t be that easy right?!?!” and I started to play around (all from my smartphone, nothing special needed, just a web browser).
I initially started by trying to see if there was anything exposed publicly that shouldn’t be that belonged to my then employer.
To my relief, there wasn’t, but I kept playing.
It’s not particularly technically difficult. You can try search terms such as:
site:http://s3.amazonaws.com "target[.]com"
site:http://blob.core.windows.net "target[.]com"
site:http://googleapis.com "target[.]com"
site:http://drive.google.com "target[.]com"
Replace target[.]com with the apex/root domain you are interested in or company name and replace the text after site: with domains of known hosting providers or cloud provider storage - the examples above are for files hosted on AWS, Azure, GCP and Google Drive.
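The substitution described above can be sketched as a small Python helper that simply builds the search strings for you to paste into Google. The names build_dorks and STORAGE_HOSTS are hypothetical, as is the optional extra_terms parameter for adding further quoted phrases. I have also written the hostnames without the http:// prefix, on the assumption that the site: operator is normally given a bare domain.

```python
# Storage hostnames taken from the example searches above; extend as needed.
STORAGE_HOSTS = [
    "s3.amazonaws.com",
    "blob.core.windows.net",
    "googleapis.com",
    "drive.google.com",
]


def build_dorks(target, hosts=STORAGE_HOSTS, extra_terms=()):
    """Build one search string per storage host for the given target.

    target is the apex/root domain or company name to look for.
    extra_terms is an optional, hypothetical list of extra quoted phrases.
    """
    queries = []
    for host in hosts:
        parts = [f"site:{host}", f'"{target}"']
        parts.extend(f'"{term}"' for term in extra_terms)
        queries.append(" ".join(parts))
    return queries


if __name__ == "__main__":
    # Print the searches for a placeholder domain.
    for query in build_dorks("example.com"):
        print(query)
```

It only formats text and makes no requests itself, which keeps it in line with doing all of this from nothing more than a web browser.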
What did you find?
I found data that should not have been accessible publicly from several organisations and responsibly disclosed it to them using contacts I sourced from friends and colleagues.
Where it got really interesting (and a little scary if I’m honest) is when I decided to also add into the search the terms Top Secret and/or Confidential …
No way, that sort of information wouldn’t be public!
In truth I didn’t expect to find anything sensitive.
Very quickly, though, I found documents that I would not expect to be able to find without working for the intelligence services of major governments around the world.
The single example I will give you, because it relates to this post and I have their blessing to write about it, is https://s3.amazonaws.com/s3.documentcloud.org/documents/1690224/doc-6-cyber-threat-capabilities.pdf
If you open it (it is safe to do so as you’ll read further on), you’ll see this document is the property of the Communications Security Establishment Canada (CSEC).
You’ll also notice that in the top right of every slide/page is the classification TOP SECRET//COMINT//REL TO FVEY.
What’s it all about then?
Page one says:
CSEC Cyber Threat Capabilities SIGINT and ITS: an end-to-end approach
So this is talking about the SIGnals INTelligence capabilities of the Canadian government…
At this point I did wonder if I’d gotten myself into something serious and had visions of black helicopters landing and me being taken away by armed men never to be seen again!
In truth, had this been 5-10 years prior, that would definitely have been a likely enough scenario to be suitably scared!
What did you do next?
Just like with the less sensitive findings (and another one very similar to the CSEC discovery but for another government), I knew the only ethical and responsible thing to do was to somehow make contact and make a responsible disclosure.
Just as with those other examples, I reached out to my then boss and asked if he had any contacts in Canadian cyber/intelligence (my second such query that morning - he was probably wondering what the hell I was doing!). He recommended I reach out to Ian Thornton-Trump, also known on Twitter (at the time) by the handle @phat_hobbit.
Ian responded quickly and gave me the email address of a former colleague at the Royal Canadian Mounted Police.
That colleague also responded quickly and escalated to his boss, who at that time had the title:
Director General, Cybercrime
Federal Policing Criminal Operations
Royal Canadian Mounted Police
“Graham, what the hell are you doing?!?!”
We exchanged a few emails, but by this time it was late evening on Sunday, so we set up a Teams meeting for the following night.
At this point I figured I should tell my wife what I’d been up to and therefore what I would be doing the following evening - definitely not a conversation either of us expected to be having!
The Meeting
On the evening of 23rd January 2023, I nervously joined a Teams meeting with the aforementioned Director General, Cybercrime and 3 members of his team (some with their cameras off because they were in a secure facility - “I don’t want to end up in a Secure Facility”) and also some members of the Canadian Security Intelligence Service (CSIS).
In truth I needn’t have worried - they were all very warm and friendly professionals who simply wanted to talk through what I found, what led me to that finding, what techniques were used etc.
The Followup
I also reported to them another document I had found, which belonged to a different government.
They advised in that case that the document in question was part of a leak from around a decade prior.
They also advised:
“The S3 bucket linked you provided is attributed to documentcloud.org, which is a Open-Source cloud storage provider that is attributed to the original publisher (redacted) of an article that reported on the leak and this document in question.”
As the CSEC document was also in an S3 bucket belonging to documentcloud, it was possible the same was true of the document I had reported to the RCMP, and after a few days this was confirmed.
I believe this may have been the Edward Snowden leaks as the timeline fits with that.
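The documentcloud attribution can be read straight from the link itself. With URLs on s3.amazonaws.com of the form shared earlier in this post, the first path segment is the bucket name and the rest is the object key. Here is a small sketch in Python that pulls them apart. The function name is hypothetical, it only handles that one path-style form, and a bucket name is only a hint at who owns the bucket, not proof.

```python
from urllib.parse import urlparse


def split_path_style_s3_url(url):
    """Return (bucket, key) for a path-style URL on s3.amazonaws.com."""
    parsed = urlparse(url)
    if parsed.hostname != "s3.amazonaws.com":
        raise ValueError("not a path-style s3.amazonaws.com URL")
    # The first path segment is the bucket; everything after it is the key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    if not bucket:
        raise ValueError("URL has no bucket segment")
    return bucket, key


if __name__ == "__main__":
    url = (
        "https://s3.amazonaws.com/s3.documentcloud.org/"
        "documents/1690224/doc-6-cyber-threat-capabilities.pdf"
    )
    bucket, key = split_path_style_s3_url(url)
    print(bucket)  # the bucket name segment
    print(key)     # the object key within that bucket
```

For the CSEC link, the bucket segment is s3.documentcloud.org, which matches what the RCMP told me about where the file was hosted.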
What do the RCMP say?
“Graham was a great person to work with! He provided a detailed and factual explanation of what he had discovered and it was clear he was approaching the situation in the spirit of helping protect people and organisations from the threat of cybercrime.
We commend Graham for contacting us as soon as he came across these documents. These types of disclosures are extremely helpful for police in identifying online threats and sensitivities.
Anyone who has information on potential cybercrimes – whether you are a victim or not – should report it to their local police as well as the Canadian Anti-Fraud Centre (CAFC) using their Online Reporting System or by phone at 1-888-495-8501”
Jason Greeley, Director General, IMIT Transformation and former Director General, Federal Policing Cybercrime
Summary
I hope that you found this post educational and insightful - and remember please, just because something is publicly accessible doesn’t mean it is supposed to be.
If you do decide to mess around with Google Dorking and you do find data that you think should not be publicly accessible, it is incumbent on you to responsibly disclose it to the entity that hosts the data or failing that either the entity that the data concerns or the relevant information commissioner.
As ever, thanks for reading and feel free to leave comments down below!