I’ve never heard of folks having trouble getting EE urls indexed- though removing the query string is of course a good idea. I’d also suggest dropping the .php bit. But if that’s how your urls are structured, EE should be a spider magnet. At least- it can be, but that will pretty much depend on SEO practices. You don’t want to construct your site based on frames or all js navigation or all flash navigation- and give the spiders no way to do their thing.
About the only concern I’ve heard is folks worried about duplicate content penalties- given that EE may default to the site index template in the case of bad urls. However, with the ability to show a custom 404 through ee (and deliver a 404 response) and the ability to show a 404 header in the case of ‘no results’ on single entry pages, good design and site practices should be able to eliminate that potential issue.
In other words- there’s no reason an EE site shouldn’t be conducive to and benefit from good SEO practices, and if such practices are employed, it won’t look any different than a static site as far as the spiders are concerned.
SERPs?
EE composes standard html pages and serves them. To the outside world (browsers, SEs) they are standard pages. There shouldn’t be any reason a search engine doesn’t index them.
If you search the forums, there have been several discussions on this.
EE definitely plays well in the search engine realm. It’s one of our specialities and I have never had problems getting an EE site to rank well.
If an EE site isn’t fully indexed in the SERPs, it’s probably because of duplicate content (either real or imagined) or something else that has nothing to do with EE is going on.
EE is just the engine so if you are using your own templates you will need to make sure you use valid and semantic (X)HTML. There is no reason why EE should not be extremely SEF with a bit of care.
The mod_rewrite rule for removing index.php that's posted on the wiki is not very SEF, because it serves pages with a 200 status code even when they don't exist. That rule wasn't written by the EE team, but be careful: Google will penalise duplicate content.
I played around with my mod_rewrite rules and I’ve got it working really well.
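A quick way to check whether your rewrites have this problem is to request a URL that can't possibly exist and look at the status code that comes back. Here's a minimal Python sketch. It assumes the server answers HEAD requests, and the function names are my own, not part of EE. Note that urlopen follows redirects, so a site that redirects junk URLs to the home page will also show up as a 200 here.

```python
import urllib.error
import urllib.request
import uuid


def status_of(url, timeout=10):
    """Return the HTTP status code for a HEAD request to url."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is what we want.
        return err.code


def looks_like_soft_404(base_url, fetch_status=status_of):
    """True if a random, certainly-missing path comes back as 200 OK."""
    probe = base_url.rstrip("/") + "/" + uuid.uuid4().hex
    return fetch_status(probe) == 200
```

Something like `looks_like_soft_404("http://example.com")` returning True means missing pages are being served as 200, which is exactly what Google's verification complains about.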
A good tip is to create a Google sitemap template and submit it via Google Sitemaps.
You can find a tutorial here: http://expressionengine.com/wiki/Google_Sitemaps
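For anyone curious what the template actually has to output, here's a rough Python sketch of a sitemap file. This is not the wiki's method, which builds the file with an EE template. It assumes you already have a list of entry URLs and their last-modified dates, and it uses the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """entries: iterable of (absolute_url, last_modified_date) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes & and < for us
        ET.SubElement(url, "lastmod").text = lastmod.isoformat()  # W3C date, e.g. 2006-09-25
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```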
Yea- I just switched back to an alternative .htaccess- the exclude method, I do believe. It has a few unwanted 404s showing up now, but I wanted to be serving true 404s when I should be.
Also wanted to keep up with you guys when you ask about it!
Hmm, that’s interesting. Just today I checked out my sites through Google’s webmaster tools, and when I did the verification to add my sites to my account, it wouldn’t work at first because it said my 404s came up with status code 200. However, my 404s are actual files, so I am not sure why the rewrites I use to remove index.php would affect that.
I would have thought, though, that the exclude method would be worse than the include. Isn’t the exclude where you rewrite everything except some specific urls? Or is it the other way around?
The include method just didn’t work for me, and the exclude was easier. I don’t add static files/folders as often as I add template groups. But- I’m pretty sure either would be fine. It’s the other option that rewrites everything that doesn’t exist- thus sending the 200 headers.
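To make the include/exclude difference concrete, here's a rough Python model of how each method routes a request path. It's a simplification, not real mod_rewrite: the template group and folder names are hypothetical, and it only looks at the first URL segment.

```python
# Hypothetical names: swap in your own template groups and real folders/files.
TEMPLATE_GROUPS = {"site", "search", "member"}
REAL_FOLDERS = {"images", "themes", "system", "robots.txt"}


def first_segment(path):
    return path.lstrip("/").split("/", 1)[0]


def include_method(path):
    """Only URLs starting with a known template group are handed to EE."""
    if first_segment(path) in TEMPLATE_GROUPS:
        return "/index.php" + path
    return path  # Apache serves it from disk, or answers with a genuine 404


def exclude_method(path):
    """Every URL is handed to EE unless it starts with a known real folder/file."""
    if first_segment(path) in REAL_FOLDERS:
        return path
    return "/index.php" + path
```

With the include method, a junk path like /boo never reaches EE, so Apache returns a real 404. With the exclude method it does reach EE, so the status code depends on how EE's own 404 handling is set up. A real folder left off the list also gets sent to EE, which would produce 404s like the ones described below.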
I’ve checked it out- http://media-cow.net/boo - there is no ‘boo’. The link should show my 404 template, which is just a link to the search page. And FF shows the headers:
Response Headers - http://media-cow.net/boo
Date: Mon, 25 Sep 2006 17:05:39 GMT
(edited for space)
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
404 Not Found
I haven’t dinked with it much, just switched back over recently. But looked right to me in the initial run down. Only problem I seem to have is a few real folders that have similar names to templates not getting picked up by the exclude and thus throwing 404’s where they do exist. I haven’t tried real hard to fix that yet.
www.yourseoplan.com has some tips on checking links to your site and your site’s popularity. And the book is a good start for SEO wannabes. I’m about halfway through and have learned a lot from the book and various forums. Hell, I’m even blogging on SEO and click fraud these days.
One way to see whether you’re being indexed is to put this in Google search:
link:yourdomainname.com. Do this for each of your key landing pages. Check MSN, Yahoo and Ask as well.
Hmm, curious. I rewrite only specific URLs, and it looks like a non-existing page does generate a 404, as it should. However, the actual 404.html generates a 200 ... which it seems it shouldn’t, at least not according to Google. I wonder if that’s the rewrites, or something else.
I just ran into problems with Google. We were listed very well from the beginning.
My pages now produce duplicate content and a lot of Supplemental Results.
There are:
index.php/site/ pages
index.php/site/comments/ pages (I can use a different title here through a diff. template)
index.php/site/Px/ pages from pagination! Big problem
index.php/site/Cx/ pages from categories
index.php/site/2006/12/ pages from archives
Did I forget some?
Almost all the pages are now supplemental results, because the title and description meta tags are basically the same.
Is there a way to distinguish between all those different types of pages to give them different titles and descriptions?
for the index template title tag, so all the other pages use the same (newest in my case) title. When I add a new entry, all those pages get the new title also. Google does not like it!
Any ideas?
Just found:
{if category_request}show something else{/if}
That should help for the index.php/site/Cx/ pages from categories
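In EE itself you'd do this with template conditionals like the one above. Here's the same idea written out in Python, just to show the logic: sort each URL into a page type, then build a distinct title for each. It assumes EE's default URL segments, where the number after P is an entry offset and the number after C is a category ID. The function names and title wording are made up for illustration.

```python
import calendar
import re


def classify(path):
    """Return (kind, detail) for an EE-style path such as /index.php/site/P10/."""
    segments = [s for s in path.split("/") if s and s != "index.php"]
    rest = segments[1:]  # skip the template group, e.g. "site"
    if rest and rest[0] == "comments":
        return "entry", rest[1] if len(rest) > 1 else ""
    if len(rest) >= 2 and re.fullmatch(r"\d{4}", rest[0]) and re.fullmatch(r"0[1-9]|1[0-2]", rest[1]):
        return "archive", (int(rest[0]), int(rest[1]))
    # Check categories before pagination: a category listing can itself be paginated.
    for seg in rest:
        if re.fullmatch(r"C\d+", seg):
            return "category", int(seg[1:])
    for seg in rest:
        if re.fullmatch(r"P\d+", seg):
            return "pagination", int(seg[1:])
    return "index", None


def page_title(path, site_name, per_page=10):
    """Build a distinct <title>; per_page must match the weblog tag's limit."""
    kind, detail = classify(path)
    if kind == "entry" and detail:
        return f"{detail.replace('_', ' ')} | {site_name}"
    if kind == "archive":
        year, month = detail
        return f"{site_name}: archive for {calendar.month_name[month]} {year}"
    if kind == "category":
        return f"{site_name}: category {detail}"  # a real template would show the category name
    if kind == "pagination":
        return f"{site_name}: page {detail // per_page + 1}"  # P10 is an offset, not a page number
    return site_name
```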
It seems that all your templates show your entire posts. You may want to consider having the index (default) template just show the summary part of your text. This template is used for all but one of your examples above.
Then show the entire post only through your “comments” template. This will then have unique content and a unique URL, and it is very likely that it will be indexed.
You may even want to avoid indexing the other variants. One way is to add the following line to the <head> section of your index template:
<meta name="robots" content="noindex, follow">
Not all search engines respect this tag, but at least you’ve given it a try.
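One catch: the index template also serves your front page, and you probably don't want that noindexed. Here's a small Python sketch of one possible policy. It's an assumption on my part, not something EE does for you. The front page and single-entry comments pages stay indexable, while pagination, category and archive listings get noindex, follow.

```python
def robots_meta(path):
    """Return a robots <meta> tag for an EE-style path (hypothetical policy)."""
    segments = [s for s in path.split("/") if s and s != "index.php"]
    is_front_page = len(segments) <= 1  # "/" or "/index.php/site/"
    is_entry = len(segments) >= 3 and segments[1] == "comments"
    content = "index, follow" if (is_front_page or is_entry) else "noindex, follow"
    return f'<meta name="robots" content="{content}">'
```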
I like to have about 5 to 10 complete articles on the index.php. If an entry is really long then I use the {extended} tag.
When someone clicks the title of the article the index.php/site/comments/ page will be shown. So the index and the comments pages can have different headers/titles/descriptions through templates and with
Ditto. EE’s known for being more than SEF, from all I can find. I’ve never had a problem getting search engines to index EE sites. Google does a superb job with some of my EE sites, creating a monopoly position for itself vs. MSN and Yahoo, who do a crummy job in comparison.
EE’s wonderful Template Group and Template naming scheme are absolutely perfect for gaining points in Google’s ranking system (better, of course, is a high number of high ranking sites which link back to yours—nothing is better than that).
Agreed. EE doesn’t have much to do with poor SE results, but could help results dramatically. Removing “index.php” has no effect on SE results. Ditto for hacking EE to add “.html” to each file name in the URL string.
I had worries a few weeks ago concerning this because I optimize my sites very well. My relatively new site runs on EE, and I couldn't figure out why it wasn't getting any PageRank. But, alas, it has now jumped from 0 to 5, even though I don't update the site more than once or twice a month. I know PR doesn't have all that much to do with SEO, but overall my site ranks very well for my keywords, and I do have index.php removed via .htaccess, so EE in and of itself does not negatively influence SEO or PR. You can name your entries, weblogs, categories etc. as SE-friendly as you like. That plus well-written content will return optimal results.
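On the naming point: EE builds an entry's URL title for you, and its exact rules differ from this. Still, here's a generic slug function of my own in Python, shown to illustrate how a keyword-bearing title turns into a clean URL segment. The underscore default is an assumption; pass "-" if you prefer hyphens.

```python
import re
import unicodedata


def url_title(title, separator="_"):
    """Turn an entry title into a lowercase, URL-safe slug (hypothetical helper)."""
    # Decompose accented letters and drop the accents, so "Café" becomes "Cafe".
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Keep only runs of letters and digits; punctuation and spaces become separators.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return separator.join(words)
```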
SEO is not something “built-in” but rather a process. The CMS needs to be flexible enough to accommodate the configuration adjustments desired for SEO purposes.
So while it might initially seem sensible to call something “SEO friendly”, after the first 2 minutes that is actually not a good way to discuss the issue.
Looking above at Shape Shed, look at how these three URLs produce the same view:
Any SEO or Google engineer will tell you that is a sub-optimal configuration. Now how does one fix it? Can it be easily fixed through adjustments to EE’s configuration parameters, taking effect site-wide, or do we have to go to the server (rewrite rules), and are we even able to do that if we want to?
Those types of questions define whether or not a system is “seo friendly”. How optimal can you make the site, via how much work…and how does publishing on one platform compare to publishing the same content on another platform, when the optimization requirements are included in the evaluation?
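The usual fix for several URLs showing the same view is "strict URLs": pick one canonical form and 301-redirect every variant to it. Here's a generic Python sketch of that normalization step. The rules are assumptions for illustration: prefer the bare domain over www, drop index.php, and add a trailing slash to page URLs but not to file names. On a real site this would live in rewrite rules or in the CMS.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Collapse common duplicate forms of the same page into one URL (assumed rules)."""
    scheme, netloc, path, query, _fragment = urlsplit(url)
    netloc = netloc.lower()
    if netloc.startswith("www."):
        netloc = netloc[4:]  # assumption: the bare domain is the preferred host
    if path.startswith("/index.php"):
        path = path[len("/index.php"):] or "/"
    last = path.rsplit("/", 1)[-1]
    if not path.endswith("/") and "." not in last:
        path += "/"  # pages get a trailing slash; files like robots.txt do not
    return urlunsplit((scheme.lower(), netloc, path, query, ""))


def redirect_for(url):
    """Return (301, target) if url is a non-canonical duplicate, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)
```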
I have not worked with EE as an SEO project yet because I am happy with ModXcms. I could, though, if there were a good reason to put in the work, and I will be evaluating it since ModX's future looks uncertain. EE is obviously a good system for many other reasons, and well supported.
but keep in mind they are posted by an admin of the EE site. I doubt they will address the “strict URLs” issue I noted above, although I would not be surprised if they did, or if someone else did with a plugin.
People talk about “SEO Friendly” as if it’s some sort of absolute, like you will or won’t be found in Google without it. That’s just not the case ( unless you decide to build your sites entirely out of flash, and even that isn’t totally opaque any more).
There is an almost endless list of tweaks you can make to ensure that the search engines can find and understand your content. For most of them, you’d be better off just writing something useful for your site and going out to get some links to it.
Assuming you’re using reasonably well coded templates, and the links from page to page are text links rather than using flash or javascript menus you’re going to be ok on an out of the box EE installation unless your site is huge.
If your site IS huge, invest some time in building a Google sitemap and some RSS feeds for Feedburner, then get on with your content again.
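For the feed half of that advice, here's a bare-bones RSS 2.0 feed in Python, just to show how little is required. EE would normally generate this from an RSS template. The item fields here are assumptions, and `published` must be a UTC-aware datetime so the date comes out in the RFC 822 "GMT" form RSS expects.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title, link, description, items):
    """items: iterable of dicts with 'title', 'link' and 'published' (UTC datetime)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "guid").text = item["link"]  # permalink doubles as a stable ID
        # usegmt=True requires tzinfo == timezone.utc and yields e.g. "... 17:05:39 GMT"
        ET.SubElement(node, "pubDate").text = format_datetime(item["published"], usegmt=True)
    return ET.tostring(rss, encoding="unicode")
```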
As an idea of whether EE is ‘Google friendly’: I built SEO Training UK as my first practice site (you’ll recognise the template if you’ve done the Boyink tutorials). All I did that’s non-standard is remove the index.php, set up a Google sitemap, and add an RSS feed for the blog, and it now ranks #1 for ‘SEO Training’ in Google UK.
Apart from a site-wide link from my own blog it only has a couple of other links to it, the rest is all down to writing relevant content.