by Sunil Setty
It seems that every month or so more and more webmasters are getting hit by Google's attempt to sort out the issue of duplicate content. Duplicate content, of course, is content or blocks of content that appear across multiple sites.

And in Google's mind, and rightfully so in most cases, why should they waste their resources (i.e., their money!) indexing stuff that's already out there? "If it's already on the web in one place, why do we need to index it again?" Makes sense, right?
But what’s the definition of duplicate content? Well, that’s tough to answer and only Google knows for sure, assuming they do at all.
But be forewarned that if you run chunks of text, and especially full content, that's already on the web, then you may get hit. If you do run some duplication – and many of us do, especially news sites like sinlung.com – make sure you have plenty of original content around it. Don't expect to pull a Wikipedia page and immediately have it rank on the first page. It's just not going to work in the long run.
And also be wary of copying the HTML code. My guess is Google can also spot similarities in coding across sites and make unfavorable judgments based on that.
For most webmasters, though, the problem comes when sites have varying degrees of duplicate text. And when you add in RSS feeds, syndication, social bookmarking and scraping, it's hard to keep all your content 100% unique to your site.
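To picture how "varying degrees" of duplication might be measured, here's a minimal sketch of shingle-based text similarity – a common technique for spotting overlapping text, not a claim about how Google actually does it:

```python
def shingles(text, k=5):
    """Break a text into its set of overlapping k-word 'shingles'."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two identical pages score 1.0, totally different pages score 0.0, and a page that lifts a few paragraphs lands somewhere in between – which is exactly the gray zone that makes duplicate-content judgments tricky.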
If you do find your content stolen or used without permission, you can try to get it removed by contacting the webmaster, or take it a step further by filing a DMCA takedown notice.
But for Google it's still tough to decide. For example, what if this post, or pieces of this post, shows up on multiple websites? That's when things get tricky.
Google will try to give credit to the original poster, but if your site is slow to get indexed and large portions of your content show up on, say, an authoritative site, then you are probably going to lose out.
Google says that it will simply ignore duplicate content, but do they always follow that policy? If your site contains a high amount of duplicate content, even your original stuff may get clipped and taken behind the woodshed (i.e., -950 land!), and that's not where you want to be.
So if you want to avoid the wrath of Google when it comes to duplicate content, make sure your site is as unique as possible and cut down on your syndication. If you do use RSS feeds, like in WordPress, make sure you are not syndicating the full posts but rather an excerpt or summary. And if you can make that summary unique, even better.
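If your platform doesn't offer a feed-excerpt setting (WordPress does, under Settings → Reading), trimming posts yourself is simple. A rough sketch – the 55-word cap here just mirrors WordPress's default excerpt length, an assumption for illustration:

```python
def make_excerpt(post_text, max_words=55):
    """Return the first max_words words of a post, marking any trimmed
    text with an ellipsis. 55 mirrors WordPress's default excerpt length."""
    words = post_text.split()
    if len(words) <= max_words:
        return post_text
    return " ".join(words[:max_words]) + " [...]"
```

Publishing only the trimmed version in your feed means scrapers who republish it verbatim are duplicating a snippet, not your whole post.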
Also, make sure your site doesn't repeat large chunks of your original content across multiple pages.
There’s a lot more to it of course, but those are just some quick guidelines to help you with the issue of duplicate content.