If you have been affected by a content scraper or are scared of having your content stolen by a content scraper – this article is for you!
You know those websites – the ones that take your content without your permission and post it on their own websites. This can happen manually – or via a scraper which utilizes code to pull in content – but either way there is potential for this content to be harmful to your ranking.
Now before you get too paranoid – it is actually rare that a content scraper will negatively affect your rank, but we are going to review everything you need to know to alleviate any issues before they happen!
Content Scraping versus Content Syndication
Content Scraping: This is when someone takes your content and uses it as their own on a website without giving the original author credit. This can be done manually or with code and is against copyright laws. It is often done to improve SEO or gain revenue from advertising.
Content Syndication: Alternatively, syndication is when you purposely republish your content on websites other than your own. For example, you might post your content to LinkedIn or Medium as an alternate traffic source and to build credibility.
How do I know if my content is showing on another website?
If your website is built on WordPress you will see notifications in your comments area called “trackbacks”, which are essentially notifications that other websites have linked to you.
This could be a general link – a valid link to your post because it is useful to their readers – these are OK and offer no potential SEO issues.
The other type is where the linking website takes all of your content exactly as it appears on your website and posts it on their own website – this is what we do not want.
The example below shows a link back in a client's comments area. It shows the website that the link is coming from, and when we click on the link we see the client's post copied word for word.
Here is a link to the scraped post – and because we DO NOT want that website to gain any link juice benefit from us – you will have to copy and paste the URL to see it: https://jodivey.wordpress.com/2018/02/26/cocker-spaniel-road-to-agility-at-westminster/
What is really funny: if you go to the bottom of the page you will see they are crediting the content to a completely different blog, one that was probably also scraping content since it seems to be shut down.
But right now this scraped content post is not an issue since their post is not yet even indexed as shown in the image below.
Copyscape is a great free tool where you can input your URL and it will let you know if any of your content is being duplicated elsewhere. The free version is limited – but if you upgrade you can run your whole website through the tool to find any duplicate content issues.
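If you already have the text of a suspect page in hand, you can get a rough idea of how much of your post it copies yourself. The sketch below is only an illustration and is not how Copyscape works internally (we do not know their method). It assumes you have pasted both texts in as plain strings, and the function names `shingles` and `overlap_ratio` are our own. It breaks each text into overlapping five-word sequences and reports what fraction of your sequences also show up in the suspect text.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(original, suspect, size=5):
    """Fraction of the original's shingles that also appear in the suspect text."""
    original_shingles = shingles(original, size)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(suspect, size)
    return len(shared) / len(original_shingles)


# Hypothetical example texts, used only to show how the functions are called.
original = "Our cocker spaniel trained for months before the agility trial at the big show"
suspect = "Posted by admin. " + original
print(overlap_ratio(original, suspect))
```

A ratio close to 1 suggests a word for word copy, while a short quote with a link back will score much lower. Where you draw the line is a judgment call.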
Google Alerts is a great way to get a quick view of all the pages that are mentioning your post – whether via a link or content. Below you can see an example we did for our friends over at Fidose of Reality – we are using them because we know they had their most recent post completely scraped as shown in the screenshot above from their trackbacks.
So with Google Alerts you can jump in and enter the TITLE of your post with the quotation marks to see any exact copies of your page.
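If you have a lot of posts to monitor, it can help to generate the exact-match queries rather than typing each one. This is a minimal sketch, assuming you keep a list of your post titles somewhere; the helper names `exact_title_query` and `search_url` are ours, and `example.com` stands in for your own domain. It wraps the title in quotation marks and, optionally, adds the `-site:` search operator so your own pages are left out of a regular Google search.

```python
from urllib.parse import quote_plus


def exact_title_query(title, own_domain=None):
    """Wrap a post title in quotation marks for an exact-phrase search."""
    # Strip any quotation marks inside the title so the phrase stays intact.
    query = '"' + title.replace('"', "") + '"'
    if own_domain:
        # Optional: exclude results from your own website.
        query += " -site:" + own_domain
    return query


def search_url(query):
    """Build a Google search URL for the query."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical title and domain, for illustration only.
query = exact_title_query("My Latest Blog Post Title", own_domain="example.com")
print(query)
print(search_url(query))
```

You can paste the resulting query text into Google Alerts or open the URL in a browser to see current results.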
Once you enter the information, it will bring up any mentions of this post – and as you can see below there are a few. DZ’s Adventure is a fellow blogger who just shows the title of the post as a “recommended blogger” link – this is A-OK!
The iLead page takes us to a page with a few words and a read more button – when you click that you go to yet another page where you then see a few sentences and yet another read more, but that last read more is a link to the Fidose website. This too is OK — annoying, but OK.
You can use this tool without actually creating an alert like we have done below, just entering the title will bring up current results – but if you are really serious about monitoring your content, then go ahead and set up an alert so you can be notified of any mentions of each and every post.
In this specific test the jodivey.wordpress.com website result was not returned – because it has not been indexed yet, but if you actually go ahead and set up this alert – you will be notified if and when it is indexed.
Will this stolen content affect your ranking?
There have been cases of the stolen content outranking the original, but rarely is it for the keywords you are trying to optimize for. Typically when we see these scraped content pages rank it is for keywords that are not really being searched.
If the page that scraped your content does not use a rel=canonical tag pointing back to your post, give you credit or noindex the post – then Google may think it is their original content.
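If you are comfortable viewing the source of the scraped page, you can check for these signals yourself. The sketch below is one way to do it, assuming you have already saved the page's HTML as a string; the class name `HeadSignals` and the function `inspect_page` are our own, and the sample HTML is made up. It looks for a `<link rel="canonical">` tag and a robots meta tag containing `noindex`. Note that a site can also set noindex through an HTTP header, which this check will not see.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the canonical URL and noindex flag from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. bare attributes), so default to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True


def inspect_page(html):
    """Return the canonical URL (or None) and whether the page is noindexed."""
    parser = HeadSignals()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": parser.noindex}


# Made-up HTML for illustration; example.com stands in for your own site.
sample_html = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/original-post/">'
    '<meta name="robots" content="noindex, follow">'
    '</head><body>Copied post text</body></html>'
)
print(inspect_page(sample_html))
```

If the canonical URL points to your original post, or the page is noindexed, the copy is much less likely to compete with you in search results.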
Yes, this is INCREDIBLY frustrating.
But, in most cases – there is not a lot of concern. Let’s take the sample above – we ran that website through Ahrefs.com and as you can see they have no authority, no value and almost no keywords ranking. So although they have been posting to this blog for at least a year, from what we can tell, they are not gaining any traffic from it.
So in this specific case we would not even worry about the content being on the website – from an SEO perspective you would be just fine. Of course there is the issue of controlling your content – and if you have a brand you value, you may want to look into a takedown request.
How does Google know if my content is MY content?
Google is pretty smart – they have a lot going on behind the scenes that will help them understand who is the original author. Here are a few ways they can establish who the real author of the content is:
- Publish date
- Social signals
When should I be concerned about content scraping?
If a website with a decent amount of traffic has scraped your content, that is a whole other conversation!
While most content scraping is done by low quality websites that should not be a concern, getting scraped by a quality site is a very different case and you should take action to have the content removed.
What can I do if my content has been stolen?
There are several options for handling scraped content – sadly none of them are quick or easy. But if you feel there is a real threat or that your website's authority or brand could be affected, then we highly recommend the following:
- First see if you can find who owns the website. Check the about page and footer area. If neither shows you an email address, do a domain search to see if there is a public record available.
- If there IS an email available – contact them and ask them to remove the content. Send a couple of emails if you get no response.
- If there is NO email available or if there is no response to your emails you can file a DMCA takedown complaint. Using a takedown service may incur a fee, and you will have to find the website's information to include in the complaint.
- You can submit a removal request to Google.
- If it is a website on WordPress or Blogger – you can reach out to those entities to see if they can help you take some kind of action.
- Although they rarely get involved – you can try the hosting company that hosts the website to see if they will assist you as a last resort.
Recommendation For A Disavow
If the above options do not work out in your favor – a final option to at least tell Google you want nothing to do with these websites is to do what is called a disavow.
Google has a tool that will allow you to list all the URLs or domains linking to you that you think are harming your website – Google can then ignore those links when they evaluate your website.
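The disavow tool takes a plain text file with one entry per line: either a full URL, or a whole domain written with the `domain:` prefix. Lines starting with `#` are treated as comments. Here is a small sketch that builds such a file, assuming you have already collected the scraper domains and URLs; the function name `build_disavow_file` and the `.example` addresses are placeholders of ours, not real sites.

```python
def build_disavow_file(domains=(), urls=(), note=None):
    """Return the text of a disavow file: one domain or URL per line."""
    lines = []
    if note:
        # Comment lines start with "#" and are for your own reference.
        lines.append("# " + note)
    for domain in sorted(set(domains)):
        # Disavow every link from this domain.
        lines.append("domain:" + domain)
    for url in sorted(set(urls)):
        # Disavow links from this single page only.
        lines.append(url)
    return "\n".join(lines) + "\n"


# Placeholder entries for illustration only.
text = build_disavow_file(
    domains=["scraper-site.example"],
    urls=["https://other-site.example/copied-post/"],
    note="Sites that scraped our posts",
)
print(text)
```

Save the output as a .txt file and upload it through the tool. Use the `domain:` form with care, since it covers every link from that website.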
Having your content stolen, no matter how it is done, is frustrating. We have had it done to us, and in some cases we did not bother to try to get it removed because we had no concerns about it hurting us. We have had some success with a simple email, and in one case we filed a complaint with their host, who helped us get the content removed.
Have you had this happen to you?
What actions did you take and did it work?