Metadata is information that is embedded in a file or document. Most people don’t even realize it exists, or don’t realize just how much information can actually be extracted from it. Metadata exists in images, Word documents, and even videos. Not all metadata is bad, but some of it can be very personal and reveal information you may not want people to see.
Did you know that the BTK serial killer was caught after he sent a Word document on a floppy disk, and from its metadata the police were able to find out who he was?
Microsoft Office documents are notorious for the amount of personal data that can be embedded in a document.
To provide a real, random example of this, I searched Google for people who posted Word documents on their websites. (OK, maybe not totally random. I decided to pick on HOAs for a variety of reasons…)
I did a keyword search in Google to locate files of type “doc” with the keywords “homeowner association rules”…
Here is the one I’m using for this example:
Next, I used an online metadata viewer (http://serversniff.net/file-info.php) and entered the direct URL of the Word document the HOA posted online. (There are also metadata viewers you can run on your own computer, and they can give you a LOT more information.)
Here are the results. I highlighted some of the interesting parts…
So from the metadata saved within the Word Document that the HOA posted online, we can assume that someone named Christine Gibson created and last edited this document. We can also see the date the file was created, the date it was last modified, and that she spent 28 minutes revising it.
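If you would rather check a file on your own computer than paste its URL into an online viewer, here is a minimal Python sketch of the idea. It is not what the online viewer does internally, and it makes one big assumption: the file is in the newer .docx format, which is a ZIP archive that keeps its properties in docProps/core.xml and docProps/app.xml. The HOA file in this example is an older binary .doc, which this sketch cannot read. The function names are my own, made up for illustration.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# Parts of a .docx package that hold the document properties.
PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def read_docx_metadata(path):
    """Return a dict of the non-empty property fields found in a .docx file."""
    found = {}
    with zipfile.ZipFile(path) as package:
        present = set(package.namelist())
        for part in PROPERTY_PARTS:
            if part not in present:
                continue  # some files simply lack one of the parts
            root = ET.fromstring(package.read(part))
            for element in root:
                if element.text and element.text.strip():
                    found[local_name(element.tag)] = element.text.strip()
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_metadata.py some_document.docx
    for key, value in sorted(read_docx_metadata(sys.argv[1]).items()):
        print(f"{key}: {value}")
```

On a typical .docx, the fields that come back include creator, lastModifiedBy, created, modified, Company, and TotalTime (the editing time in minutes), which are the same kinds of details shown in the results above.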
We can even see her company name… and in case you didn’t catch the company name up there, it looks like Christine should have known better if her company name is really:
By the way, if Christine is using a recent version of Microsoft Word, all she would have had to do to prevent all this information from showing up is modify the properties of her Word Document:
She could also have used the “Prepare -> Inspect Document” feature in Word to review the document and prepare it for distribution. Or for older versions of Microsoft Office, Microsoft itself offers this tool.
Or, better yet, she could have used a third-party professional tool to strip all the metadata to be completely safe (such as iScrub or Metadata Assistant).
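To give a rough idea of what stripping metadata involves, here is a small Python sketch that blanks a few identifying fields. It is my own illustration, not how Word’s Inspect Document, iScrub, or Metadata Assistant work. It assumes a .docx file (a ZIP archive with its properties in docProps/core.xml and docProps/app.xml) and Python 3.8 or later, and the list of fields to blank is just my guess at the most personal ones. It does not touch tracked changes, comments, or custom properties, and I have not checked its output against every version of Word, so treat it as a demonstration rather than a replacement for a proper tool.

```python
import zipfile
import xml.etree.ElementTree as ET

PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")

# Fields blanked by this sketch (an assumption; extend as needed).
PERSONAL_FIELDS = {"creator", "lastModifiedBy", "Company", "Manager"}

# Register the usual prefixes so the rewritten XML keeps readable names
# instead of ElementTree's generated ns0, ns1, ...
ET.register_namespace("cp", "http://schemas.openxmlformats.org/package/2006/metadata/core-properties")
ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
ET.register_namespace("dcterms", "http://purl.org/dc/terms/")
ET.register_namespace("dcmitype", "http://purl.org/dc/dcmitype/")
ET.register_namespace("xsi", "http://www.w3.org/2001/XMLSchema-instance")
ET.register_namespace("vt", "http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes")
ET.register_namespace("", "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties")


def scrub_part(xml_bytes):
    """Blank the text of the personal fields in one properties part."""
    root = ET.fromstring(xml_bytes)
    for element in root:
        if element.tag.rsplit("}", 1)[-1] in PERSONAL_FIELDS:
            element.text = ""
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def strip_docx_metadata(src_path, dst_path):
    """Copy a .docx to dst_path, blanking personal property fields on the way."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename in PROPERTY_PARTS:
                data = scrub_part(data)
            dst.writestr(item.filename, data)
```

Calling strip_docx_metadata("rules.docx", "rules_clean.docx") (both file names are hypothetical) writes a copy with those fields emptied and leaves the original file untouched, so you can compare the two before posting anything.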
As you can see in the screenshot of the actual Word Document below, I did a keyword search for “Christine” to show that her name is not in the contents of that Word Document at all:
Moral of the story? Sanitize your Word or Office files before giving them to someone or posting them on the Internet. And Christine, if you work for a company named Hackhouse Inc, shame on you…
For further reading, here’s a great write-up demonstrating potentially embarrassing revision changes found in documents posted by Microsoft.