
AaronP

URL Enhancements


I can't stress enough how important it is for IPB to update their URL structure to today's standards. The current URLs are redundant, confusing and bordering on anti-SEO.

According to Google's own SEO guide written 4 years ago, webmasters are urged to maintain a logical hierarchy and avoid using unnecessary characters in URLs:

  • "Create a naturally flowing hierarchy"
  • "[...] effectively work these into your internal link structure"
  • "Google is good at crawling all types of URL structures, even if they're quite complex, but spending the time to make your URLs as simple as possible for both users and search engines can help"
  • "A site's URL structure should be as simple as possible."
  • "Avoid: using lengthy URLs with unnecessary parameters and session IDs"

This is even truer today than it was 4 years ago, for the same reasons Google gives in the guide.

Here is why the current structure is bad for SEO:

forum.com/topic/#-topic-name/

Makes no sense, as forum.com/topic/ does not exist. It leads to a "Sorry, we couldn't find that!" page. That's bad for SEO, bad for visitors, and doesn't make sense as a hierarchy. Even if /topic/ did exist, it's not a logical parent for the topic itself, which lives inside its forum.

What it should look like:

forum.com/forum-name/topic-name/

This makes sense because:

The topic is inside of the forum, which is inside of the root of the website (unless you have your forum inside of a sub-folder). Leaving out the unnecessary characters improves crawling, gives more weight to the actual keywords inside of the URL (extra characters dilute the weight search engines put on the other keywords) and improves usability -- all on top of being more aesthetically pleasing.

Caveats:

If you have long forum names with long sub-forums, your URL may look like this:

forum.com/long-forum-name/long-sub-forum-name/topic-name/

The solution to all of this:

Implement the same functionality WordPress provides with its permalink settings, which would allow users either to keep their existing structure or to switch to a different one (in this case, today's standard).
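
As a rough illustration of what such a setting could look like, here is a minimal Python sketch, assuming one URL template per board. The preset names, the `Topic` fields and the `slugify` rules are all hypothetical; this is neither IPB's nor WordPress's code, and mapping incoming requests back to topics is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Topic:
    topic_id: int      # internal ID, always stored even if not shown in the URL
    title: str
    forum_slug: str    # hypothetical: slug of the containing forum


def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics into one hyphen, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


# Hypothetical per-board presets. "legacy" keeps today's /topic/ID-name/ shape,
# "hierarchical" is the /forum-name/topic-name/ shape proposed in this post.
PERMALINK_PRESETS = {
    "legacy": "/topic/{id}-{slug}/",
    "hierarchical": "/{forum}/{slug}/",
    "flat": "/{slug}/",
}


def build_url(topic: Topic, preset: str) -> str:
    """Render a topic URL from the board's chosen preset."""
    template = PERMALINK_PRESETS[preset]
    return template.format(
        id=topic.topic_id, slug=slugify(topic.title), forum=topic.forum_slug
    )


t = Topic(topic_id=123, title="URL Enhancements", forum_slug="feedback")
print(build_url(t, "legacy"))        # /topic/123-url-enhancements/
print(build_url(t, "hierarchical"))  # /feedback/url-enhancements/
```

Because the internal ID is still stored, a board could in principle switch presets and 301 the old shape to the new one; the ID-less presets additionally need a rule for duplicate slugs.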

I would be more than happy to pay you guys out of pocket for implementing this. I see no valid reason to keep the current URL structure other than continuity for its own sake. In the long run IPB would benefit greatly from this.



Also note:

The addition of the IDs in the current structure seems to serve two purposes (I presume):

1) Locate the thread

2) Avoid duplicate URLs.

WordPress's structure handles #2 in the most efficient way. As soon as the title is entered, it creates the permalink. So if a post blog.com/happy-new-year/ already exists, it would create blog.com/happy-new-year-2/
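
The numbered-suffix scheme described above can be sketched in a few lines of Python. This illustrates the idea only, not WordPress's actual code; the `existing` set is a hypothetical stand-in for whatever database lookup a real implementation would use.

```python
from __future__ import annotations


def unique_slug(base: str, existing: set[str]) -> str:
    """Return `base` if unused, otherwise the first free `base-2`, `base-3`, ..."""
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"


taken = {"happy-new-year"}
print(unique_slug("happy-new-year", taken))  # happy-new-year-2
```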

There's no point in having potentially tens of thousands of URLs with ever-growing IDs in them just to avoid a handful of duplicate posts. It's redundant!


I can't stress how important it is for IPB to update their URL structure to today's standards.

I confess that this is my biggest concern about IPB, and I really hope it is addressed in IPB 4.0. I understand why we are where we are with IPB 3.x, given how the software has developed over time around the forum as the core, but I will be very disappointed if IPB 4 maintains this illogical URL/directory structure.


It's nice to see others concerned about this as well. Having made a six-figure living from SEO for the past 7 years, I can say with great certainty that this functionality should be mandatory.


I also want this URL structure. I have asked here and here, but it seems no one knows how to do it or no one bothers.

If I remember correctly, I also asked for this structure with another forum software I used. They answered that there would be problems when moving a topic to another forum category. I didn't bother to argue, but in my mind I was thinking "301 redirect?".


I've asked elsewhere as well and even tried to hire a programmer on eLance but they ended up not completing it and giving me a refund. It's easier for IPS to do it -- they know their way around the software.

Just to reiterate: I think it would be awesome for them to implement their own "permalink" structure.

On some forums I would implement something like this:

forum.com/forum-name/topic/

But on others I would implement:

forum.com/topic/


It's a bad idea to leave the ID out of the URL in my opinion. I really hope they DO NOT take it that far.

In your concept: What if I want to change the topic title? What happens to previous urls? How would you make the database queries efficient?

I can see it working in a blog / WordPress environment because the amount of content is much more limited and you're more aware of making a lasting editorial decision. But at least on the boards we host, where we move topics between forums, improve topic titles, and have hundreds of thousands of topics, I can't see this working.


The correct URL for a topic should be forum.com/id-forumname/id-topicname/, and if I go to the URL forum.com/id/id/ I should be redirected to forum.com/id-forumname/id-topicname/.

And if the topic or forum name changes, the same ID-based redirect updates the URL with the new name: forum.com/id-forumname/id-newtopicname/ or forum.com/id-newforumname/id-topicname/.
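
A minimal Python sketch of this ID-driven redirect, assuming in-memory dicts in place of database tables (`FORUMS`, `TOPICS` and the helper names are hypothetical). It looks the topic up by its own ID only and treats the slug text and the forum segment as decoration, so a renamed or moved topic still resolves; any path that isn't the current canonical one gets a 301.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Forum:
    forum_id: int
    name: str


@dataclass
class Topic:
    topic_id: int
    title: str
    forum_id: int


# Hypothetical in-memory stand-ins for the forum and topic tables.
FORUMS = {1: Forum(1, "General"), 2: Forum(2, "Support")}
TOPICS = {10: Topic(10, "Eating Oranges", 1)}

# A path segment is a numeric ID optionally followed by "-some-slug".
SEGMENT = re.compile(r"^(\d+)(?:-[a-z0-9-]*)?$")


def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def canonical_path(topic: Topic) -> str:
    forum = FORUMS[topic.forum_id]
    return (f"/{forum.forum_id}-{slugify(forum.name)}"
            f"/{topic.topic_id}-{slugify(topic.title)}/")


def resolve(path: str) -> tuple[int, str | None]:
    """Return (200, path), (301, canonical path) or (404, None)."""
    parts = [p for p in path.strip("/").split("/") if p]
    if len(parts) != 2:
        return 404, None
    match = SEGMENT.match(parts[1])  # only the topic segment is used for lookup
    topic = TOPICS.get(int(match.group(1))) if match else None
    if topic is None:
        return 404, None
    target = canonical_path(topic)
    return (200, path) if path == target else (301, target)


print(resolve("/1/10/"))  # (301, '/1-general/10-eating-oranges/')
```

Because the forum segment is ignored during lookup, this effectively behaves like the "topic ID alone, 301 everything else" approach raised in the reply below; the forum part only affects what the canonical URL looks like.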


And what happens when you move the topic to a new forum? The forum ID and name both change. You can only base the URL on the topic ID alone and 301 redirect everything else.

Adding more data that is likely to change increases the number of 301s seen by search engines, lengthens the URL, and also adds MORE unique identifiers. You'll find quite a few SEO topics around here from people trying to get rid of the unique identifier altogether, as in forum.com/topicname/.

IMO, IP.Board strikes a great balance in this as there is a content identifier, /topic/, a single unique ID and a name. I don't pretend to be an SEO expert, though. My humble opinion.


While I'd like to have the structure as suggested, I don't want the unique identifiers to be omitted. So I think the best structure would be something like:

Base Forum ------------------------------------> /forums/
Parent Category -------------------------------> /forums/1-a-test-category/
Viewing a Category/Viewing Topics List -> /forums/1-a-test-category/2-a-test-forum/
Viewing a Topic -------------------------------> /forums/1-a-test-category/2-a-test-forum/1-welcome/

The PHPFox forum has a structure almost identical to the above (without the topic identifier): http://www.phpfox.com/forum/general-support-and-questions-75/how-to-safely-post-in-an-open-forum/

The topic already has a fixed ID. So if we move a topic to a new forum, only the category and forum parts change, i.e. /forums/20-new-forum-category/25-new-forum/1-welcome/

I don't know much about SEO either, but I think moving a topic can be solved with 301 redirects. I also don't think the number of 301s will keep growing, because they drop away as time goes on. I tried this on my old WordPress blog when I switched from a date-based structure to just /category_name/post_title/.

Let's say we're moving 10 topics per day. At first you would think you'll reach 300 301 redirects in a month, 600 in two months, 900 in three months and so on. But I don't think that would be the case. As Google learns that an existing URL has moved to a new URL, it starts dropping the old URL and using the new one. So if you keep moving topics you may see a roughly constant number of 301 redirects, but that number will shrink in due time. In the case of my WP blog, there are no more old URLs in Google's index, so search engines no longer hit the 301s; Apache only has to perform them when visitors arrive from old links :)


It's a bad idea to leave the ID out of the URL in my opinion. I really hope they DO NOT take it that far.

In your concept: What if I want to change the topic title? What happens to previous urls? How would you make the database queries efficient?

I can see it working in a blog / WordPress environment because the amount of content is much more limited and you're more aware of making a lasting editorial decision. But at least on the boards we host, where we move topics between forums, improve topic titles, and have hundreds of thousands of topics, I can't see this working.

I respectfully disagree and have to say that none of this makes any sense.

1) My suggestion would include permalink settings, so users who already have the current URL structure, or who for whatever reason want to keep it, can do so. This means you can use the "legacy" links for eternity if you want.

2) My current VB forum queries roughly 3 million IP addresses in a database table each time a user hits our registration page (about 500 people/day join our forum). It takes 0.3 seconds to make that query. I don't know why you would think changing topic titles could possibly be inefficient.

3) So, what does happen when you want to change a title? What happens when you change a title right now? If you have a topic entitled "Eating Oranges", its URL is www.forum.com/topic/ID-eating-oranges/. Changing the title to "Eating Two Oranges" changes the URL to www.forum.com/topic/ID-eating-two-oranges/, and visiting the old URL redirects to the new one. What exactly is the problem with the old URL redirecting to the new one in my proposal? The same exact thing would happen, minus the ID being present. Is this impossible? Not at all. WordPress still associates an ID with each and every post; you just don't see it in the URL. Every post in WordPress and every topic/thread in a forum has an ID associated with it regardless of whether or not that ID appears in the URL. This is pretty elementary...
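
Dropping the ID from the URL does mean each old slug has to be remembered somewhere so it can still redirect. A minimal Python sketch, assuming a slug-to-ID map in place of a real table (`SlugRouter` and its methods are hypothetical): the topic keeps its internal ID, each rename sets a new current slug, and retired slugs stay reserved and answer with a 301.

```python
from __future__ import annotations


class SlugRouter:
    """ID-less URLs: slugs map to internal topic IDs; retired slugs still redirect."""

    def __init__(self) -> None:
        self.current: dict[int, str] = {}  # topic_id -> current slug
        self.owner: dict[str, int] = {}    # every slug ever used -> topic_id

    def set_slug(self, topic_id: int, slug: str) -> None:
        holder = self.owner.get(slug)
        if holder is not None and holder != topic_id:
            # A retired slug stays reserved so its old links keep working.
            raise ValueError(f"slug {slug!r} already belongs to topic {holder}")
        self.current[topic_id] = slug
        self.owner[slug] = topic_id  # the previous slug is kept in `owner`

    def resolve(self, slug: str) -> tuple[int, str | None]:
        """Return (200, slug), (301, current slug) or (404, None)."""
        topic_id = self.owner.get(slug)
        if topic_id is None:
            return 404, None
        target = self.current[topic_id]
        return (200, slug) if slug == target else (301, target)


router = SlugRouter()
router.set_slug(1, "eating-oranges")
router.set_slug(1, "eating-two-oranges")  # title edited
print(router.resolve("eating-oranges"))   # (301, 'eating-two-oranges')
```

The trade-off the other replies point to is visible here: the slug map grows with every rename, and a retired slug can't be handed to a new topic without breaking the old link.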

None of these arguments are valid and even if they were I've provided a more than adequate solution to them.

Having IDs in the URLs is worse than having a bad hierarchy like /topic/topic-name/.


What if two people posted a topic called "Eating oranges" in the same forum?

You cannot compare WordPress to a forum. It's like comparing... well, apples to oranges.

WordPress content is authored by you, or by you and a very small team. You may post a single blog a day, or maybe a dozen blogs a day. You have total control over the blog title and the page slug. You can avoid posting two blogs with the same title, or you can manually change the slug, etc.

A forum can have tens of thousands of new content items added daily by thousands of different authors. You cannot have such tight control in this case.

The debate about IDs in URLs has raged for years. It's quite polemic.

We did consider removing IDs and changing around the URL structure for 4, but after incredibly lengthy talks we decided against it for the reasons mentioned here (and many more).

Our research leads us to believe that having IDs in a URL doesn't do any harm.

I keep going back to the most simple point when discussing SEO: is Google really that dumb that it can't filter out IDs and is Google really that dumb that it believes that most of the web is static pages?

That's not to say that your opinion is without merit. Just keep in mind we have to balance these things with our responsibility to our existing customer base who are really quite happily using the current URL structure and its impact on efficiency within the system. Whilst your IP address querying speeds are very impressive, it's a mistake of causation to apply that logic to other areas of code.


I just want to point out that while we may not have materially changed the URL structure in 4 as noted above, we do have several other very useful and noteworthy SEO-related changes coming in the next version which we will be blogging about soon.


Absolutely. I should have mentioned in my initial reply that we do strongly believe in good "SEO" practises (correct headers, micro formats, hints, clean code, fast loading times, good markup, etc) and this is now baked into the foundation of IPS 4.


"Whilst your IP address querying speeds are very impressive, it's a mistake of causation to apply that logic to other areas of code."
There is literally no difference in query between my URL structure and the current. Both have to interact with the database to create the URL.
"Just keep in mind we have to balance these things with our responsibility to our existing customer base who are really quite happily using the current URL structure and its impact on efficiency within the system."
Once again this argument is not valid as my solution already accounts for users that want to maintain their current URL structure. I'm telling you with 100% certainty that any concerns over "efficiency" are invalid.
"I keep going back to the most simple point when discussing SEO: is Google really that dumb that it can't filter out IDs and is Google really that dumb that it believes that most of the web is static pages?"
You're missing the point entirely, and this was already discussed. It has nothing to do with whether or not Google can filter out IDs or analyze a web page.

"Google is good at crawling all types of URL structures, even if they're quite complex, but spending the time to make your URLs as simple as possible for both users and search engines can help" - from their own SEO guide written 4 years ago.

We're talking about methods to improve indexing and ranking. I already explained that additional characters in URLs dilute the weight of the other keywords within the URL.
"Our research leads us to believe that having IDs in a URL doesn't do any harm."
I've been doing SEO for 7 years and make half a million dollars a year in profit (when I worked longer hours I made upwards of $3,000/day profit). I've done everything from cloaking to working with top-100 Alexa-ranked sites. I wouldn't be "polemic" about this issue for no reason. It's not a question of "if" it's harmful. It is harmful and that's a fact. You guys develop software. You're not SEO experts. Without trying to sound patronizing or condescending -- I am. It astonishes me that the simplest of requirements for proper URL structure is being overlooked, and even more astonishing that the points being argued as to why are completely invalid technically and fundamentally from an SEO standpoint.
"We did consider removing IDs and changing around the URL structure for 4, but after incredibly lengthy talks we decided against it for the reasons mentioned here (and many more)."
I wish I were on this "panel" to discuss this issue, as I've gone point by point through every single argument, provided a more than reasonable solution to the concerns, and given more than valid reasons why the structure should be changed. I've yet to hear a valid argument against any of them.
"A forum can have tens of thousands of new content items added daily by thousands of different authors. You cannot have such tight control in this case."
You can, and I think you know this. Some of the largest media sites in the world use WordPress as their CMS and have millions of posts. I've had numerous WordPress auto-poster sites that had upwards of 750,000 posts using the above URL permalink structure, fed by everything from database feeds to web scrapers that pulled every "question and answer" posted online and made it into a new post. I could have had 1,000 posts entitled "How to remove spyware" with the only difference in URL being www.site.com/how-to-remove-spyware-2/ www.site.com/how-to-remove-spyware-3/ www.site.com/how-to-remove-spyware-4/ and so on. This is exactly how IPB should operate.
"What if two people posted a topic called "Eating oranges" in the same forum? You cannot compare WordPress to a forum. It's like comparing... well, apples to oranges."
I already discussed several times what happens when a duplicate topic is created:

"2) Avoid duplicate URLs.

WordPress's structure handles #2 in the most efficient way. As soon as the title is entered, it creates the permalink. So if a post blog.com/happy-new-year/ already exists, it would create blog.com/happy-new-year-2/"

I also already mentioned that topic IDs are associated with each topic regardless of whether or not that ID appears in the URL. Another simple solution would be to append the topic ID only when there is a duplicate URL. I can come up with several more simple solutions to this problem: all efficient, all workable, all much more beneficial than something as redundant as forcing an ID into every single URL for the 0.5% of topics that might share a title on a very large, active forum.
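
The "append the topic ID only on a collision" variant could look like this minimal Python sketch. `taken` is a hypothetical stand-in for a slug lookup; the extra loop covers the rare case where another topic's title already slugifies to `<base>-<id>`.

```python
from __future__ import annotations


def slug_for_new_topic(base: str, topic_id: int, taken: set[str]) -> str:
    """Plain slug when free; append the topic's own ID only if the slug is taken."""
    if base not in taken:
        return base
    candidate = f"{base}-{topic_id}"
    n = 2
    # Rare edge case: some other topic may already own "<base>-<topic_id>".
    while candidate in taken:
        candidate = f"{base}-{topic_id}-{n}"
        n += 1
    return candidate


print(slug_for_new_topic("url-enhancements", 4242, {"url-enhancements"}))
# url-enhancements-4242
```

Unlike a counting suffix, the ID suffix doesn't depend on how many duplicates already exist, because topic IDs are unique to begin with.
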
Also, I'm not comparing WordPress to IPB. I'm merely using WordPress as an example and a standard for how URLs should look, with customization capabilities.
Long story short: if you are personally opposed to this and think the current, archaic structure is the way to go, so what? Why not give the people who actually care the capability to do things the way we think (or, in my case, "know") is right? Why force your way onto everyone else? Why can't we have options?
It's nice to know IPB cares about SEO but if you guys can't even get the URL right it's not going to do much. URLs are on the top of the list for basic SEO requirements. You guys are way, way behind with this attitude.


I'd also like to point out that when a user types in the title for a WordPress post, WordPress is already querying the database and creating the permalink before the post is actually published. In the extremely unlikely, almost impossible event this would be "inefficient" in a very large IPB forum, it wouldn't matter, because by the time they finish the first sentence of a new thread, the query is already finished. Even if you had developed it to query after the user attempts to publish their new topic, even on a $2/month crap shared hosting environment it wouldn't take more than a few seconds to query hundreds of thousands of URLs within a table. And if you have hundreds of thousands of threads on a shared hosting environment, it's not the query that's inefficient.
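
Whether a duplicate-slug check is cheap depends mainly on whether the slug column is indexed. A small sketch using Python's built-in `sqlite3`, with a hypothetical `topics` table: the `UNIQUE` constraint gives the slug column an index, so the existence check is an index search rather than a scan of every row. It says nothing about load under concurrency or the extra index size raised in the replies below; it only shows the shape of the single lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE makes SQLite build an index on slug; it also rejects duplicate slugs.
conn.execute(
    "CREATE TABLE topics (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE, title TEXT)"
)
conn.executemany(
    "INSERT INTO topics (slug, title) VALUES (?, ?)",
    ((f"topic-{i}", f"Topic {i}") for i in range(100_000)),
)


def slug_taken(conn: sqlite3.Connection, slug: str) -> bool:
    """True if some topic already uses this slug."""
    row = conn.execute("SELECT 1 FROM topics WHERE slug = ?", (slug,)).fetchone()
    return row is not None


# The query plan's detail column should mention an index, not a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT 1 FROM topics WHERE slug = ?", ("topic-5",)
).fetchall()
for row in plan:
    print(row[-1])
```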


Isn't it a moot point to remove the topic ID, when you're wanting to add an ID back on to avoid duplicate urls?

There's more to it behind the scenes. The first thing that comes to mind is the size of the index for the topic table: it's going to be multiple times the size it is now. This would especially affect sites with millions of topics in the database.

I'm sure you're more than welcome to hook into your own installation to order the URLs how you'd like to do it.


Isn't it a moot point to remove the topic ID, when you're wanting to add an ID back on to avoid duplicate urls?

No, you didn't read what I said. We're not adding anything up front to avoid duplicate URLs. Only when a duplicate URL actually occurs do we add an ID or number.

For example, I'm the only person here to ever create a thread entitled "URL Enhancements".

What it looks like:

What it should look like:

http://community.invisionpower.com/topic/url-enhancements/

What a duplicate topic posted would look like when user clicks "Publish Topic":

http://community.invisionpower.com/topic/url-enhancements-2/

"There's more to it behind the scenes, first one that comes to mind is the size of the index for the topic table, it's going to be multiple times the size that it is now. This would especially affect sites with millions of topics in the database."

It would take up less space, not more. There'd be exactly the same number of rows, minus the extra characters currently added to URLs. Having a series of digits before each and every URL is redundant. If you have 1,000,000 topics, that's 1,000,000 URLs each carrying ID digits, most of which don't need to be there to begin with. The ID only serves a purpose when someone creates a duplicate URL, so adding it everywhere is not efficient; appending a number only when necessary is.


I don't want to zero in on one part of your reasoning, but "it wouldn't take more than a few seconds to query hundreds of thousands of urls within a table" sort of proves the point that it could be inefficient. It might only take a few seconds for that one process, but on a busy site you have thousands of processes all fighting to get to the DB at the same time, so any bottleneck (and even a second spent returning results can be a bottleneck) will eventually bring down the server as PHP processes queue up waiting for their turn to get into the table. As a web developer with 14 years of experience, I know this isn't some theoretical situation; it's one of the foundations of writing web applications.
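
The queueing point can be put in numbers. If lookups arrive at λ per second, each holds a database worker for S seconds, and there are c workers, utilization is ρ = λS / c; once ρ ≥ 1, work arrives faster than it can be finished and the backlog grows for as long as the load lasts. A minimal Python sketch with made-up figures (not measurements from any real board), using a simple fluid approximation for the backlog:

```python
def utilization(arrival_rate: float, service_time: float, workers: int) -> float:
    """rho = lambda * S / c: the fraction of worker capacity in use."""
    return arrival_rate * service_time / workers


def backlog_after(arrival_rate: float, service_time: float, workers: int,
                  seconds: float) -> float:
    """Requests left waiting after `seconds` of steady load (fluid approximation)."""
    capacity = workers / service_time  # completions per second when saturated
    return max(0.0, (arrival_rate - capacity) * seconds)


# Hypothetical load: 20 lookups/s shared by 16 DB workers.
print(utilization(20, 1.0, 16))        # 1.25 -> overloaded
print(backlog_after(20, 1.0, 16, 60))  # 240.0 requests queued after a minute
print(utilization(20, 0.005, 16))      # a 5 ms lookup: far below 1
```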

Anything regarding SEO has to be treated very carefully. We have to walk slowly with the scissors. Any change can have dramatic consequences for existing customers. I give you one fringe example: Neowin upgraded to a new version which meant that as far as Google was concerned it needed deep spidering. This resulted in them being virtually offline for a few days as the server couldn't cope with the additional traffic. After much hair pulling they spotted an incorrect throttle in robots.txt.

I'm not saying you're wrong or your ideas don't have merit but there's more to this than just throwing out a URL change.

If you're willing to talk more about it, please email me (mmecham at invisionpower . com). I'll be grateful for any insight you have and a more personal method of communication will no doubt be more productive.
