Content scrapers have become a huge nuisance online. They have always been a problem, but now that anyone can run a robot that crawls the Web, anyone can effectively run a content scraper. Note that I don't believe simply owning a robot makes you a content scraper. There are legitimate, ethical reasons for scraping content, but there are also many unethical content scrapers who rank higher in search results for content they scraped from someone else, even from big brand names.
So how do you protect your content from the content scrapers?
First, you should put a copyright notice on your blog or website. Let the content scrapers know that you are aware of your rights and that you intend to pursue legal action if necessary. This, of course, won’t stop many of them. A lot of content scrapers are in third world countries where U.S. law won’t reach them.
You should also use the Disallow directive in your robots.txt file to stop any robots that you know are content scrapers. One use recommended by Gab Goldenberg is to block Yahoo! Pipes, which is used by many content scrapers to program online aggregators.
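As a minimal sketch, a robots.txt block for a known scraper bot looks like the following. The user-agent name here is a hypothetical placeholder; you would substitute the actual user-agent string of the bot you want to block, which you can usually find in your server's access logs.

```
# robots.txt — place in your site's root directory
# "BadScraperBot" is a hypothetical name; replace it with the
# real user-agent string of the scraper you want to block.
User-agent: BadScraperBot
Disallow: /
```

Keep in mind that robots.txt is only honored by well-behaved robots; a scraper that ignores the file will crawl you anyway, which is why the additional tricks below are worth knowing.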
Gab also shares other neat tricks you can use to catch content scrapers. His blog post is worth a read, and the tricks are worth trying. I really like his "baiting the trap" suggestion.