Duplicate Content has become a huge topic of discussion lately, thanks to the new filters that search engines have implemented. This article will help you understand why you might be caught in the filter, and ways to avoid it. We'll also show you how you can determine if your pages have duplicate content, and what to do to fix it.
Search engine spam is any deceitful attempt to deliberately trick the search engine into returning inappropriate, redundant, or poor-quality search results. This behavior often takes the form of pages that are exact replicas of other pages, created to obtain better results in the search engine. Many people assume that creating multiple or similar copies of the same page will either increase their chances of getting listed in search engines or help them get multiple listings, due to the presence of more keywords.
In order to make a search more relevant to a user, search engines use a filter that removes the duplicate content pages from the search results, and the spam along with it. Unfortunately, good, hardworking webmasters have fallen prey to the filters imposed by the search engines that remove duplicate content. These webmasters unknowingly spam the search engines, yet there are things they can do to avoid being filtered out. In order for you to truly understand the concepts you can implement to avoid the duplicate content filter, you need to know how this filter works.
First, we must understand that the term "duplicate content penalty" is actually a misnomer. When we refer to penalties in search engine rankings, we are actually talking about points that are deducted from a page in order to come to an overall relevancy score. But in reality, duplicate content pages are not penalized. Rather they are simply filtered, the way you would use a sieve to remove unwanted particles. Sometimes, "good particles" are accidentally filtered out.
Knowing the difference between the filter and the penalty, you can now understand how a search engine determines what duplicate content is. There are basically four types of duplicate content that are filtered out:
Websites with Identical Pages - Pages that are identical to other pages are considered duplicates, and websites that are identical to another website on the Internet are also considered to be spam. Affiliate sites with the same look and feel which contain identical content, for example, are especially vulnerable to a duplicate content filter. Another example would be a website with doorway pages. Many times, these doorways are skewed versions of landing pages. However, these landing pages are identical to other landing pages. Generally, doorway pages are intended to be used to spam the search engines in order to manipulate search engine results.
Scraped Content - Scraping means taking content from a web site and repackaging it to make it look different, but in essence the result is nothing more than a duplicate page. With the popularity of blogs on the internet and the syndication of those blogs, scraping is becoming more of a problem for search engines.
E-Commerce Product Descriptions - Many eCommerce sites out there use the manufacturer's descriptions for the products, which hundreds or thousands of other eCommerce stores in the same competitive markets are using too. This duplicate content, while harder to spot, is still considered spam.
Distribution of Articles - If you publish an article, and it gets copied and put all over the Internet, this is good, right? Not necessarily for all the sites that feature the same article. This type of duplicate content can be tricky, because even though Yahoo and MSN determine the source of the original article and deem it most relevant in search results, other search engines like Google may not, according to some experts.
So, how does a search engine's duplicate content filter work? Essentially, when a search engine robot crawls a website, it reads the pages, and stores the information in its database. Then, it compares its findings to other information it has in its database. Depending upon a few factors, such as the overall relevancy score of a website, it then determines which are duplicate content, and then filters out the pages or the websites that qualify as spam. Unfortunately, if your pages are not spam, but have enough similar content, they may still be regarded as spam.
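The compare-and-filter step described above can be sketched in a few lines of Python. This is only an illustration: search engines do not publish their filters, so the word "shingle" comparison, the Jaccard score, the 0.8 threshold, and the function names are all assumptions chosen for the example, not any engine's actual method.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def filter_duplicates(pages, threshold=0.8):
    """Keep the first page of each near-duplicate group.

    pages is a list of (url, text) pairs, assumed to be ordered from most
    to least relevant. The threshold is a hypothetical cut-off.
    """
    kept = []
    for url, text in pages:
        if all(jaccard_similarity(text, kept_text) < threshold
               for _, kept_text in kept):
            kept.append((url, text))
    return [url for url, _ in kept]
```

In this sketch the page that appears first in the list survives and later near-copies are dropped, which mirrors the point that a legitimate page can be filtered out simply because a similar page ranked ahead of it.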
There are several things you can do to avoid the duplicate content filter. First, you must be able to check your pages for duplicate content. Using our Similar Page Checker, you will be able to determine similarity between two pages and make them as unique as possible. By entering the URLs of two pages, this tool will compare those pages, and point out how they are similar so that you can make them unique.
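If you want a rough idea of what such a comparison involves, here is a minimal sketch using Python's standard difflib module. It is not how the Similar Page Checker itself works (that is not documented here); the function name and the four-word minimum are assumptions, and it expects the visible text of each page rather than a URL.

```python
from difflib import SequenceMatcher

def compare_pages(text_a, text_b, min_words=4):
    """Return a similarity ratio and the word runs both texts share.

    min_words is a hypothetical cut-off so that short, common phrases
    are not reported as shared passages.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    matcher = SequenceMatcher(None, words_a, words_b, autojunk=False)
    shared = [
        " ".join(words_a[block.a:block.a + block.size])
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    ]
    return matcher.ratio(), shared
```

The shared passages it returns are the parts you would rewrite first to make the two pages more distinct.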
Since you need to know which sites might have copied your site or pages, you will need some help. We recommend using a tool that searches for copies of your page on the Internet: www.copyscape.com. Here, you can put in your web page URL to find replicas of your page on the Internet. This can help you create unique content, or even address the issue of someone "borrowing" your content without your permission.
Let's look at the issue regarding some search engines possibly not considering the source of the original content from distributed articles. Remember, some search engines, like Google, use link popularity to determine the most relevant results. Continue to build your link popularity, while using tools like www.copyscape.com to find how many other sites have the same article, and if allowed by the author, you may be able to alter the article so as to make the content unique.
If you use distributed articles for your content, consider how relevant the article is to your overall web page and then to the site as a whole. Sometimes, simply adding your own commentary to the articles can be enough to avoid the duplicate content filter; the Similar Page Checker could help you make your content unique. Further, the more relevant articles you can add to complement the first article, the better. Search engines look at the entire web page and its relationship to the whole site, so as long as you aren't exactly copying someone's pages, you should be fine.
If you have an eCommerce site, you should write original descriptions for your products. This can be hard to do if you have many products, but it really is necessary if you wish to avoid the duplicate content filter. Here's another example of why using the Similar Page Checker is a great idea. It can tell you how you can change your descriptions so as to have unique and original content for your site. This works well for scraped content also. Many scraped content sites offer news. With the Similar Page Checker, you can easily determine where the news content is similar, and then change it to make it unique.
Do not rely on an affiliate site which is identical to other sites, and do not create identical doorway pages. These types of behaviors are not only filtered out immediately as spam, but if another site or page is found to be a duplicate, there is generally no comparison of the page to the site as a whole, which can get your entire site in trouble.
The duplicate content filter is sometimes hard on sites that don't intend to spam the search engines. But it is ultimately up to you to help the search engines determine that your site is as unique as possible. By using the tools in this article to eliminate as much duplicate content as you can, you'll help keep your site original and fresh.
Posted by alharits at Friday, September 26, 2008
The Issue at Hand
Websites that use databases to insert content into a webpage by way of a dynamic script, such as PHP or JavaScript, are increasingly popular. This type of site is considered dynamic. Many websites choose dynamic content over static content because, if a website has thousands of products or pages, writing or updating each static page by hand is a monumental task.
There are two types of URLs: dynamic and static. A dynamic URL is a page address that results from the search of a database-driven web site or the URL of a web site that runs a script. In contrast to static URLs, in which the contents of the web page stay the same unless the changes are hard-coded into the HTML, dynamic URLs are generated from specific queries to a site's database. The dynamic page is basically only a template in which to display the results of the database query. Instead of changing information in the HTML code, the data is changed in the database.
But there is a risk when using dynamic URLs: search engines don't like them. Those at most risk of losing search engine positioning due to dynamic URLs are e-commerce stores, forums, sites utilizing content management systems and blog software like Mambo or WordPress, or any other database-driven website. Many times the URL that is generated for the content in a dynamic site looks something like this:
http://www.somesites.com/forums/thread.php?threadid=12345&sort=date
A static URL on the other hand, is a URL that doesn't change, and doesn't have variable strings. It looks like this:
http://www.somesites.com/forums/the-challenges-of-dynamic-urls.htm
Static URLs are typically ranked better in search engine results pages, and they are indexed more quickly than dynamic URLs, if dynamic URLs get indexed at all. Static URLs are also easier for the end-user to view and understand what the page is about. If a user sees a URL in a search engine query that matches the title and description, they are more likely to click on that URL than one that doesn't make sense to them.
A search engine wants to list only pages in its index that are unique. Search engines combat duplicate listings by cutting off URLs after a specific number of variable strings (e.g.: ? & =).
For example, let's look at three URLs:
http://www.somesites.com/forums/thread.php?threadid=12345&sort=date
http://www.somesites.com/forums/thread.php?threadid=67890&sort=date
http://www.somesites.com/forums/thread.php?threadid=13579&sort=date
All three of these URLs point to three different pages. But if the search engine purges the information after the first offending character, the question mark (?), now all three pages look the same:
http://www.somesites.com/forums/thread.php
http://www.somesites.com/forums/thread.php
http://www.somesites.com/forums/thread.php
Now, you don't have unique pages, and consequently, the duplicate URLs won't be indexed.
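The effect is easy to reproduce. The short Python sketch below drops everything from the question mark onward, as a crawler that ignores query strings might, and shows the three thread URLs collapsing into one. The function name is hypothetical, and real crawlers may truncate differently.

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

urls = [
    "http://www.somesites.com/forums/thread.php?threadid=12345&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=67890&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=13579&sort=date",
]

# Three distinct addresses go in, a single address comes out.
unique_after = {strip_query(url) for url in urls}
```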
Another issue is that dynamic pages generally do not have any keywords in the URL. It is very important to have keyword rich URLs. Highly relevant keywords should appear in the domain name or the page URL. This became clear in a recent study on how the top three search engines, Google, Yahoo, and MSN, rank websites.
The study involved taking hundreds of highly competitive keyword queries, like travel, cars, and computer software, and comparing factors involving the top ten results. The statistics show that of those top ten, Google has 40-50% of those with the keyword either in the URL or the domain; Yahoo shows 60%; and MSN has an astonishing 85%! What that means is that to these search engines, having your keywords in your URL or domain name could mean the difference between a top ten ranking, and a ranking far down in the results pages.
The Solution
So what can you do about this difficult problem? You certainly don't want to have to go back and recode every single dynamic URL into a static URL. This would be too much work for any website owner.
If you are hosted on a Linux server, then you will want to make the most of the Apache Mod Rewrite Rule (the mod_rewrite module), which gives you the ability to inconspicuously redirect one URL to another, without the user's (or a search engine's) knowledge. You will need to have this module installed in Apache; for more information, see the Apache documentation for this module. This module saves you from having to recode your dynamic URLs as static ones manually.
How does this module work? When a request comes in to a server for the new static URL, the Apache module redirects the URL internally to the old, dynamic URL, while still looking like the new static URL. The web server compares the URL requested by the client with the search pattern in the individual rules.
For example, when someone requests this URL:
http://www.somesites.com/forums/thread-threadid-12345.htm
The server looks for and compares this static-looking URL to what information is listed in the .htaccess file, such as:
RewriteEngine on
RewriteRule thread-threadid-(.*)\.htm$ thread.php?threadid=$1
It then converts the static URL to the old dynamic URL that looks like this, with no one the wiser:
http://www.somesites.com/forums/thread.php?threadid=12345
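To see what the rule's pattern captures, here is a rough Python analogy of that matching step. It is not how Apache is implemented; it only reuses the same regular expression, and it assumes the .htaccess file sits in the /forums/ directory so that the path is relative to it. The function name is made up for the example.

```python
import re

# Same pattern as the RewriteRule above; the parentheses capture the thread id.
RULE_PATTERN = re.compile(r"thread-threadid-(.*)\.htm$")

def rewrite(path):
    """Return the dynamic target for a matching path, else the path unchanged."""
    match = RULE_PATTERN.search(path)
    if match is None:
        return path
    # match.group(1) plays the role of $1 in the rule's substitution.
    return "thread.php?threadid=" + match.group(1)
```

A request for thread-threadid-12345.htm comes back as thread.php?threadid=12345, while a path that does not fit the pattern is left alone.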
You now have a URL that not only will rank better in the search engines, but that your end-users can understand at a glance, while Apache's Mod Rewrite Rule handles the conversion for you and the dynamic URL stays in place.
If you are not particularly technical, you may not wish to attempt to figure out the complex Mod Rewrite code and how to use it, or you simply may not have the time to embark upon a new learning curve. Therefore, it would be extremely beneficial to have something to do it for you. This URL Rewriting Tool can definitely help you. What this tool does is implement the Mod Rewrite Rule in your .htaccess file to secretly convert a URL to another, such as with dynamic and static ones.
With the URL Rewriting Tool, you can opt to rewrite single pages or entire directories. Simply enter the URL into the box, press submit, and copy and paste the generated code into your .htaccess file on the root of your website. You must remember to place any additional rewrite commands in your .htaccess file for each dynamic URL you want Apache to rewrite. Now, you can give out the static URL links on your website without having to alter all of your dynamic URLs manually because you are letting the Mod Rewrite Rule do the conversion for you, without JavaScript, cloaking, or any sneaky tactics.
Another thing you must remember to do is to change all of your links in your website to the static URLs in order to avoid penalties by search engines due to having duplicate URLs. You could even add your dynamic URLs to your Robots Exclusion Standard File (robots.txt) to keep the search engines from spidering the duplicate URLs. Regardless of your methods, after using the URL Rewrite Tool, you should ideally have no links pointing to any of your old dynamic URLs.
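Before relying on a robots.txt entry, you can check that it blocks the dynamic URLs without blocking the static ones. The sketch below uses Python's standard urllib.robotparser module; the Disallow line and the two sample URLs are assumptions based on the forum example used earlier, so adjust them to your own paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that excludes the dynamic forum script.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forums/thread.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

dynamic_url = "http://www.somesites.com/forums/thread.php?threadid=12345&sort=date"
static_url = "http://www.somesites.com/forums/thread-threadid-12345.htm"

# The dynamic address should be off limits to spiders, the static one open.
dynamic_allowed = parser.can_fetch("*", dynamic_url)
static_allowed = parser.can_fetch("*", static_url)
```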
You have multiple reasons to utilize static URLs in your website whenever possible. When it's not possible, and you need to keep your database-driven content as those old dynamic URLs, you can still give end-users and search engines a static URL to navigate, and all the while, they are still your dynamic URLs in disguise. When a search engine engineer was asked if this method was considered "cloaking", he responded that it indeed was not, and that in fact, search engines prefer you do it this way. The URL Rewrite Tool not only saves you time and energy by helping you use static URLs by converting them transparently to your dynamic URLs, but it will also save your rankings in the search engines.
Posted by alharits at Thursday, September 25, 2008
If you've read anything about or studied Search Engine Optimization, you've come across the term "backlink" at least once. For those of you new to SEO, you may be wondering what a backlink is, and why they are important. Backlinks have become so important to the scope of Search Engine Optimization, that they have become some of the main building blocks to good SEO. In this article, we will explain to you what a backlink is, why they are important, and what you can do to help gain them while avoiding getting into trouble with the Search Engines.
What are "backlinks"? Backlinks are links that are directed towards your website, also known as inbound links (IBLs). The number of backlinks is an indication of the popularity or importance of that website. Backlinks are important for SEO because some search engines, especially Google, will give more credit to websites that have a good number of quality backlinks, and consider those websites more relevant than others in their results pages for a search query.
When search engines calculate the relevance of a site to a keyword, they consider the number of QUALITY inbound links to that site. So we should not be satisfied with merely getting inbound links; it is the quality of the inbound link that matters.
A search engine considers the content of the sites to determine the QUALITY of a link. When inbound links to your site come from other sites, and those sites have content related to your site, these inbound links are considered more relevant to your site. If inbound links are found on sites with unrelated content, they are considered less relevant. The higher the relevance of inbound links, the greater their quality.
For example, if a webmaster has a website about how to rescue orphaned kittens, and received a backlink from another website about kittens, then that would be more relevant in a search engine's assessment than say a link from a site about car racing. The more relevant the site is that is linking back to your website, the better the quality of the backlink.
Search engines want websites to have a level playing field, and look for natural links built slowly over time. While it is fairly easy to manipulate links on a web page to try to achieve a higher ranking, it is a lot harder to influence a search engine with external backlinks from other websites. This is also a reason why backlinks factor so highly into a search engine's algorithm. Lately, however, search engines' criteria for quality inbound links have gotten even tougher, thanks to unscrupulous webmasters trying to achieve these inbound links by deceptive or sneaky techniques, such as hidden links, or automatically generated pages whose sole purpose is to provide inbound links to websites. These pages are called link farms, and they are not only disregarded by search engines, but linking to a link farm could get your site banned entirely.
Another reason to achieve quality backlinks is to entice visitors to come to your website. You can't build a website, and then expect that people will find your website without pointing the way. You will probably have to get the word out there about your site. One way webmasters got the word out used to be through reciprocal linking. Let's talk about reciprocal linking for a moment.
There has been much discussion in these last few months about reciprocal linking. In the last Google update, reciprocal links were one of the targets of the search engine's latest filter. Many webmasters had agreed upon reciprocal link exchanges, in order to boost their site's rankings with the sheer number of inbound links. In a link exchange, one webmaster places a link on his website that points to another webmaster's website, and vice versa. Many of these links were simply not relevant, and were just discounted. So while the irrelevant inbound link was ignored, the outbound links still got counted, diluting the relevancy score of many websites. This caused a great many websites to drop off the Google map.
We must be careful with our reciprocal links. There is a Google patent in the works that will deal with not only the popularity of the sites being linked to, but also how trustworthy a site is that you link to from your own website. This will mean that you could get into trouble with the search engine just for linking to a bad apple. We could begin preparing for this future change in the search engine algorithm by being choosier right now about whom we exchange links with. By choosing only relevant sites to link with, and sites that don't have tons of outbound links on a page, or sites that don't practice black-hat SEO techniques, we will have a better chance that our reciprocal links won't be discounted.
Many webmasters have more than one website. Sometimes these websites are related, sometimes they are not. You also have to be careful about interlinking multiple websites on the same IP. If you own seven related websites, then a link to each of those websites on a page could hurt you, as it may look to a search engine as though you are trying to do something fishy. Many webmasters have tried to manipulate backlinks in this way; having too many links to sites with the same IP address is referred to as backlink bombing.
One thing is certain: interlinking sites doesn't help you from a search engine standpoint. The only reason you may want to interlink your sites in the first place might be to provide your visitors with extra resources to visit. In this case, it would probably be okay to provide visitors with a link to another of your websites, but try to keep many instances of linking to the same IP address to a bare minimum. One or two links on a page here and there probably won't hurt you.
There are a few things to consider when beginning your backlink building campaign. It is helpful to keep track of your backlinks, to know which sites are linking back to you, and how the anchor text of the backlink incorporates keywords relating to your site. A tool to help you keep track of your backlinks is the Domain Stats Tool. This tool displays the backlinks of a domain in Google, Yahoo, and MSN. It will also tell you a few other details about your website, like your listings in the Open Directory, or DMOZ, backlinks from which Google regards as highly important; your Alexa traffic rank; and how many pages from your site have been indexed, to name just a few.
Another tool to help you with your link building campaign is the Backlink Builder Tool. It is not enough just to have a large number of inbound links pointing to your site. Rather, you need to have a large number of QUALITY inbound links. This tool searches for websites that have a related theme to your website which are likely to add your link to their website. You specify a particular keyword or keyword phrase, and then the tool seeks out related sites for you. This helps to simplify your backlink building efforts by helping you create quality, relevant backlinks to your site, and making the job easier in the process.
There is another way to gain quality backlinks to your site, in addition to related site themes: anchor text. When a link incorporates a keyword into the text of the hyperlink, we call this quality anchor text. A link's anchor text may be one of the under-estimated resources a webmaster has. Instead of using words like "click here" which probably won't relate in any way to your website, using the words "Please visit our tips page for how to nurse an orphaned kitten" is a far better way to utilize a hyperlink. A good tool for helping you find your backlinks and what text is being used to link to your site is the Backlink Anchor Text Analysis Tool. If you find that your site is being linked to from another website, but the anchor text is not being utilized properly, you should request that the website change the anchor text to something incorporating relevant keywords. This will also help boost your quality backlinks score.
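If you would like to check a linking page by hand, the following Python sketch pulls out the anchor text of every link that points to your domain and flags the ones that use a generic phrase. It is a simplified illustration, not the Backlink Anchor Text Analysis Tool; the class and function names and the list of generic phrases are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical list of phrases that say nothing about the target page.
GENERIC_PHRASES = {"click here", "here", "read more", "this link"}

class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs for links to one domain."""

    def __init__(self, target_domain):
        super().__init__()
        self.target_domain = target_domain.lower()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if urlsplit(href).netloc.lower() == self.target_domain:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so line breaks inside the link do not matter.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

def weak_anchor_links(html, target_domain):
    """Return the links to target_domain whose anchor text is generic."""
    collector = AnchorTextCollector(target_domain)
    collector.feed(html)
    collector.close()
    return [(href, text) for href, text in collector.links
            if text.lower() in GENERIC_PHRASES]
```

Any link it returns is a candidate for a polite request to the other webmaster to switch to keyword-bearing anchor text.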
Building quality backlinks is extremely important to Search Engine Optimization, and because of their importance, it should be very high on your priority list in your SEO efforts. We hope you have a better understanding of why you need good quality inbound links to your site, and have a handle on a few helpful tools to gain those links.
Posted by alharits at Wednesday, September 24, 2008


