How Do You Define Duplicate Content And Should We Be Worried About It?


The argument about exactly how duplicate content is defined and whether duplicate content is a problem has been underway for a long time now and there is no sign that it is going away. So exactly what constitutes duplicate content and is it a problem?

It is widely accepted that duplicate content is important and, although one high profile search engine optimization expert recently expressed the opposite view, even a cursory peek at the mountain of material that has been written on this topic recently will clearly demonstrate that this is a minority view.

If we accept that duplicate content is in fact important, then just how ought we to go about defining duplicate content? If I write an original article for an article directory and then rework that same article for submission to a second article directory, how will the search engines check these two articles and decide whether they contain duplicate content? The fact of the matter is that we do not know. However, here is this writer's view.

When duplicate content checking was first undertaken by the major search engines, it was very much a case of viewing one web page as a whole against another, and there was no attempt to cut up the pages and compare their individual elements. At that time it was possible to take identical content, add an introduction and conclusion to one of the pages, and that would be enough to escape any duplicate content penalty. Sadly for many publishers, those days have long since disappeared.

Nowadays, the major search engines cut up the two pages so that they can examine individual elements, and it is this which is the core of the present disagreement. Most webmasters agree that attention is focused upon the central content of a page rather than the structure of the page. A great many website owners make use of templates for their pages which define the structure of each page, including such things as menus, headers and footers. This practice is widely thought to be acceptable, and the major search engines do not treat it as duplicate content. What the major search engines are looking at is the actual content contained within the body of the page. But exactly how do they go about examining this page content?

Some people think that this examination is carried out at 'block' level (looking at individual sentences or paragraphs), while others contend that the filters search for phrases or even individual words. Nobody really knows the answer, but it seems reasonable to conclude that the most likely basis of comparison would be either phrase or sentence matching.

Sentence matching is reasonably straightforward and merely means cutting both pages up into chunks based upon the page's punctuation. For example, take a look at this sentence:

It is reasonably simple to get a good deal on a package holiday, providing you know how to negotiate.

This would be viewed either as one single sentence or as two sentences, depending upon whether you use the time-honored definition of a full stop as indicating the end of a sentence or adopt a more flexible approach which also makes use of other punctuation marks, such as commas.
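The two ways of cutting up the example sentence can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how any search engine actually works; the function name `split_sentences` and its `strict` flag are invented for this sketch.

```python
import re

def split_sentences(text, strict=True):
    """Cut text into chunks based on its punctuation.

    strict=True  : only full stops, question marks and exclamation marks
                   end a chunk (the traditional definition of a sentence).
    strict=False : commas, semicolons and colons also end a chunk
                   (the more flexible approach).
    Both rules are assumptions made for illustration.
    """
    pattern = r"[.!?]+" if strict else r"[.!?,;:]+"
    # Drop empty chunks left over after the final punctuation mark.
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]

example = ("It is reasonably simple to get a good deal on a package holiday, "
           "providing you know how to negotiate.")

print(len(split_sentences(example, strict=True)))   # 1 chunk
print(len(split_sentences(example, strict=False)))  # 2 chunks
```

Under the strict rule the whole sentence is one unit to be compared; under the flexible rule each half could be matched against another page separately.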

Matching based upon phrases is a little more difficult. What is the definition of a phrase? Should a phrase be made up of 2 words or 3 words or 4 words or ...?

For now let us assume that a phrase is defined as 3 words. If this is the case the following phrases would all be viewed as duplicate content if they were to appear on two pages which were being compared:

Did you know
In the end
Take a look
One way to
In those days

These five phrases are all ordinary everyday phrases that could appear on pages about tropical fish, cycling, making money online or any other topic you care to mention. Now, some people contend that the major search engines do check pages down to this level. Indeed, when I questioned the support staff for one popular content checker (DupeCop) about the basis on which they examined duplicate content, they said:

"DupeCop compares both individual words and 3-word phrases. It also ignores all punctuation and scans across sentences"

It was no surprise therefore that when I ran a number of articles through this software (comparing articles on the topic of dogs against articles about Christmas decorations) I found that they showed an average of 25% of duplicate content!
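To see why unrelated articles can score so highly, here is a minimal sketch of 3-word phrase matching that ignores punctuation and scans across sentences. It is an assumption about how such a comparison might work, not DupeCop's or any search engine's actual method; the function names and the two sample texts are invented for the illustration.

```python
import re

def word_ngrams(text, n=3):
    """Return the set of n-word phrases in text.

    Punctuation is ignored and phrases run across sentence
    boundaries, as in the comparison described above.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_phrase_ratio(page_a, page_b, n=3):
    """Fraction of page_a's n-word phrases that also appear in page_b."""
    a = word_ngrams(page_a, n)
    b = word_ngrams(page_b, n)
    if not a:
        return 0.0
    return len(a & b) / len(a)

# Two invented snippets on unrelated topics.
dogs = "Did you know that dogs need daily walks? Take a look at these tips."
decor = "Did you know that tinsel is cheap? Take a look at our Christmas range."

print(sorted(word_ngrams(dogs) & word_ngrams(decor)))
print(shared_phrase_ratio(dogs, decor, n=3))
print(shared_phrase_ratio(dogs, decor, n=6))
```

On these two snippets, a third of the 3-word phrases in the first text also appear in the second, purely because of everyday wording such as "did you know" and "take a look". Raising `n` to 6 removes every match, which is the effect a longer phrase length would have on this kind of false positive.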

On this basis, I believe it would be absurd for the major search engines to filter down to this level. So just how low would the filters be set? Should they be at 4, 5 or 6 words? To be honest, your guess is as good as mine.

Over the years I have published hundreds of articles and have watched the results for signs of duplicate content penalties, as far as it is possible for anybody to do so. Based upon my own experience I am content that filtering is not conducted down to the level of short phrases but is much more likely to stop at the sentence level. Consequently, as long as you are changing articles down to sentence level, you should have no problem in escaping the filters. As a matter of fact, even if a couple of sentences are duplicated you should be fine.


About the Author:
WebMarketingCentre.com provides information on article writing and article submission and is also an article directory where you can pick up free articles for your website or ezine and to which you can submit articles on a wide variety of topics including article marketing and much more.



Article Originally Published On: http://www.articlesnatch.com



Copyright 2005-2011 ArticleSnatch, LLC - All Rights Reserved.