While doing some repair work here at milMedia Group on a client site, we found that an area believed to be password-protected was not, and sensitive data had been crawled by Google and was showing up in search results. Taking the document offline is the simple part, but Google has a long memory: its cache and quick view features can leave the document viewable for days or weeks until Google recrawls your site and cleans up the links. Here are some tips that may help you prevent this type of problem, along with what you can do to pull content back in a timely manner when it has already happened.
First off, one of the challenges of establishing a new website is getting the word out as fast as possible. I usually find myself submitting new sites to a variety of crawlers, spiders and search engines to improve the likelihood of their being found, and of course, none is quite as big as Google. So earning a good position through proper search engine optimization (SEO) tactics is usually high on my list, while restricting crawlers and bots is not always high on the new site owner's list. And if you start off small with a few pages and then grow, you have to keep reassessing what you need to protect, and how. In this case, a misconfigured module restricted page access with a login/password combination but allowed a directory to be crawled without requiring credentials. That is where our triage and cleanup began.
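To illustrate the kind of gap described above, here is a minimal sketch of the two layers of protection a sensitive directory should have had. This assumes an Apache server and a hypothetical `/private/` directory; the paths and filenames are placeholders, not the actual configuration from the site in question:

```
# robots.txt (at the site root) — asks well-behaved crawlers to skip
# the directory. This is advisory only; it does not block access,
# so it must never be the only line of defense.
User-agent: *
Disallow: /private/

# .htaccess (inside /private/) — requires real credentials, so the
# directory contents cannot be fetched anonymously by anyone,
# crawler or not.
AuthType Basic
AuthName "Restricted Area"
AuthUserFile /path/to/.htpasswd
Require valid-user
```

A quick sanity check is to request the directory without credentials, e.g. `curl -I https://example.com/private/`, and confirm the server answers with a `401 Unauthorized` rather than a directory listing.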