Who is stealing GruntDoc's blog posts?
Another medblogging milestone...but this time, it's not good.
GruntDoc reports "Content Theft":
Scanning my Technorati watchlist (vanity: it tells bloggers who is linking to them) today I noticed quite a lot of links from a site called "Physician-Desk-Reference", which is apparently not associated with the actual PDR that's used as a source of last resort when looking up medications.
What is going on? Marketing Sherpa describes two types of content thieves: (a) admiring fans of your blog, who lift entire posts because they like them, and (b) admiring fans of Google AdSense revenue.
Looking at the site, it occurred to me that I'd seen these posts before, ALL of them, as I'd written them. This site is reposting my posts with about a 5-day delay, then linking to me as "more" at the end of the entry. I have no idea why anyone would do this. The contact info on the front page is blank, so I cannot ask whoever set this up. (I didn't, and this isn't an inside job, if you're wondering.)
The second group of thieves are profit-driven... They publish as many blogs as possible populated with lifted content, and sit back to collect commission checks from Google on ad clicks. Some have created automated programs that suck up content from around the Web and post it without need for a human editor.
How to avoid content theft? Ann's Sherpablog has suggestions: add a formal copyright line and "Terms & Conditions" to your blog; shorten your RSS feeds, releasing excerpts instead of full-text; embed an "invisible" copyright line in your posts. Furthermore,
Worried publishers are forming task forces now to begin to address this threat. Ideas include limiting bots' site access and requiring registration. In the end, more walls go up around the Web and an atmosphere of distrust reigns. Too bad....
(T)ell Google in writing if someone steals your copyrighted materials.
Update: an article about spam blogs.
As I noted last week, one reason some people steal others' content is because they want to get Google AdSense revenue with content-rich pages without the effort of actually creating content.
To that end, many sites I've seen appear to be using automated bots to scrape content from other sites, and then post hundreds, even thousands of pages online with AdSense listings. I'm not going to accuse any sites in particular here; suffice to say it's a quickly increasing problem and loads of folks in the online publishing community have been noticing it.
Here's what Barry Schnitt in Google's PR department said in response to my query about this problem:
"Copyright violations are against our policies. We ask that the owner of the copyrighted material comply with the Digital Millennium Copyright Act (the text of which can be found at the U.S. Copyright Office website: http://lcWeb.loc.gov/copyright/) and other applicable intellectual property laws. In this case, this means that if we receive proper notice of infringement, we will forward that notice to the responsible web site publisher. To file a notice of infringement with us, you must provide a written communication."
My take on this? It's not awfully reassuring. Google seems to want to put the policing ball in the copyright owner's corner despite the fact that few of these stolen content sites would exist if it were not for AdSense revenues.
Plus, he didn't comment at all on my second question, which was, in essence: what about policing those sites -- known in the industry as "Google Spam" -- that post such short snippets of scraped content that they don't actually break copyright law? They dance around the law and usually present no real value to the visitor.
Again, these sites are a burgeoning cottage industry that appears to be wholly funded by AdSense revenue potential...