AaronP

URL Enhancements

I don't want to zero in on one part of your reasoning, but "it wouldn't take more than a few seconds to query hundreds of thousands of urls within a table" sort of proves the point that it could be inefficient. It might only take a few seconds for that one process, but on a busy site you have thousands of processes all fighting to get to the DB at the same time, so any bottleneck (and even a second spent returning results can be a bottleneck) will eventually bring down the server as PHP processes queue up waiting for their turn at the table. As a web developer with 14 years of experience, I know this isn't some theoretical situation; it's one of the foundations of writing web applications.

While I appreciate you addressing the issue and responding to my concern, your direct quote from me is missing my preceding statement of:

"even with $2/month crap shared hosting environment"

followed by:

"And if you have hundreds of thousands of threads on a shared hosting environment it's not the query that's inefficient."

There is literally no difference in database query between the current URL structure and the one that I advocate.

From the beginning I've suggested adopting the way WordPress handles its permalink structure.

If you look at WordPress's wp_posts table, you're not going to find a "wp_URL" column. You'll find a "guid" column.

The URLs in this column are the base URLs, which look like this:

yoursite.com/?p=n (where n is your post ID)

Even when using the permalink structure /%postname%/, none of the URLs actually change within your database table.

If you have the site.com/post-name/ structure, visiting site.com/?p=n will still lead to the post.

The point is the rewrites are taking place via .htaccess and mod_rewrite -- they're not taking place at the database. Thus, the argument that this could be inefficient is false. Like I said before: the database query remains exactly the same between the two because the database entry for the URL is the same regardless of what your "permalink" structure is.

This means that when a duplicate post title/URL appears, there's no duplicate within the database. The database values are all unique, since each post gets its own ID. What changes is the rewrites, which recognize this and merely append a number (or insert the ID) into the URL to avoid duplicates.
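A minimal Python sketch of the "append a number" idea, assuming a hypothetical `slugify` helper and an in-memory set of slugs that are already taken. It shows the general approach of numbering collisions (WordPress, for instance, appends `-2`, `-3`, ... to clashing post slugs); it is not WordPress's or IP.Board's actual code.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Lower-case ASCII slug, e.g. 'URL Enhancements' -> 'url-enhancements'."""
    # Strip accents, then collapse every run of non [a-z0-9] characters into one hyphen.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def unique_slug(title: str, taken: set[str]) -> str:
    """Return a slug not present in `taken`, appending -2, -3, ... on collision."""
    base = slugify(title)
    slug, n = base, 2
    while slug in taken:
        slug = f"{base}-{n}"
        n += 1
    return slug
```

For example, `unique_slug("Forum Rules", {"forum-rules"})` gives `forum-rules-2`. Note that checking `taken` is itself a lookup by slug, so in a real board it would be a query against an indexed slug column.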

You can look at how WordPress does this via their /wp-includes/rewrite.php file as well as their /wp-admin/options-permalink.php files.

I'm not comparing WP to IPB. I'm giving a starting point to see what they're doing so you can implement the rewrites. You're already implementing rewrites. All you need to do is strip the ID and then add the additional info for duplicate URL rewrites.

If you want to talk about this personally, I'd be happy to, but if you've already made up your mind, which it sounds like you have, it would be a waste of both of our time.

My only complaint is that it would suck to have to dish out a bunch of money, find a programmer I can trust, and then worry about what happens when I upgrade or add features. It'd be nice if this was supported. If nothing else, with a permalink option, people who don't want to use this don't have to. The people who do, can. It's win-win.

---

It would be interesting to profile a query based on a numeric index vs a text index. I think that's ultimately what the difference will be here in terms of speed of data fetching.

While IP.Board does rewrite ?showtopic=x to /topic/id-name, you have to account for those that come from Google or type /topic/id-name. You have to be able to take that FURL and convert it back to something you can use to search the database. So now your search is based on a text name rather than a numeric ID. Even though that numeric ID still exists, you can't use it in your initial query if you don't have it.
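To make the point above concrete, here is a minimal Python sketch of what a router has to do with each URL shape. The regular expressions and the `/topic/` paths are assumptions for illustration, not IP.Board's actual FURL templates.

```python
import re

# Hypothetical URL shapes: the current style keeps the ID, the proposed style is slug-only.
WITH_ID = re.compile(r"^/topic/(\d+)-([a-z0-9-]+)/?$")
SLUG_ONLY = re.compile(r"^/topic/([a-z0-9-]+)/?$")


def parse_topic_path(path: str):
    """Return ('id', int) if the ID is in the URL, ('slug', str) if only a slug is, else None."""
    m = WITH_ID.match(path)
    if m:
        # The ID is right there: the next step can be a primary-key query.
        return ("id", int(m.group(1)))
    m = SLUG_ONLY.match(path)
    if m:
        # No ID: the slug has to be looked up in the database to find it.
        return ("slug", m.group(1))
    return None
```

One side effect worth noting: a slug-only topic whose title starts with a number (say "2014 roadmap") would be read as an ID by the first pattern, so a slug-only scheme needs its own rule for that case.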

Edited by Aiwa

---

I just explained in my last response that there's no text index (isn't numeric, text, anyway?).

The base URL in the database will always be the topic ID.

Rewrites take place through htaccess/mod_rewrite.

---

The thing is this, in a nutshell, searching on a primary index is fast - really really fast. By having the ID in the URL, you can intval that and grab the item from the database with such an efficient query.
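A rough Python stand-in for the `intval` step, assuming an in-memory SQLite `topics` table with a hypothetical `tid` primary key. `php_intval` only approximates PHP's `intval()` for strings (optional whitespace and sign, then leading digits, else 0); it ignores PHP's overflow and other edge cases.

```python
import re
import sqlite3


def php_intval(s: str) -> int:
    """Approximate PHP intval() on a string: leading (signed) digits, else 0."""
    m = re.match(r"\s*([+-]?\d+)", s)
    return int(m.group(1)) if m else 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (tid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO topics VALUES (397771, 'URL Enhancements')")


def fetch_title(url_segment: str):
    """Look a topic up by the ID at the front of an '<id>-<slug>' URL segment."""
    tid = php_intval(url_segment)
    # Primary-key lookup; whatever slug follows the ID is simply ignored.
    row = conn.execute("SELECT title FROM topics WHERE tid = ?", (tid,)).fetchone()
    return row[0] if row else None
```

With the ID present, `fetch_title("397771-url-enhancements")` finds the topic; a slug-only segment parses to 0 and finds nothing, which is exactly why a slug-only URL needs a different query.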

The database query does not remain exactly the same. It is a switch from

SELECT * FROM topics WHERE tid=X;

to:

SELECT * FROM topics WHERE title_seo=X;

I'm not weighing in on one way or the other, just stating that there really is a difference in the query, and that indexes themselves, and the difference between a lookup on a primary auto-increment integer and an indexed varchar is indeed something to consider.
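The two queries above can be compared without a large dataset by asking the database how it would run them. This sketch uses SQLite from Python's standard library only because it is self-contained; IP.Board runs on MySQL, where InnoDB behaves differently in detail (a secondary-index lookup there finds the primary key first, then reads the row through the clustered primary index). The table and index names are assumptions modelled on the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# tid is an INTEGER PRIMARY KEY (an alias for SQLite's rowid); title_seo gets a secondary index.
conn.execute("CREATE TABLE topics (tid INTEGER PRIMARY KEY, title TEXT, title_seo TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_title_seo ON topics (title_seo)")


def plan(sql: str, params: tuple) -> str:
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


by_id = plan("SELECT * FROM topics WHERE tid = ?", (397771,))
by_slug = plan("SELECT * FROM topics WHERE title_seo = ?", ("url-enhancements",))
print("by tid:      ", by_id)
print("by title_seo:", by_slug)
```

The first plan should report a search on the integer primary key and the second a search through `idx_title_seo`. Both are index lookups rather than table scans; the plan alone does not say how large the speed gap is, which is what profiling would settle.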

---

Let me try and explain this again.

If you visit this topic from a link someone posts elsewhere you're going to be linked to this


So, let's take your proposal and remove the ID

http://community.invisionpower.com/topic/url-enhancements

The only way to find the content is from the title. So you HAVE to do an initial search of the database using the title in order to get the topic ID. You can't search by topic ID if you don't have it in the URL.

The FURL templates do translate the current URL back to ?showtopic=X on the backend, but if you don't have the ID, you HAVE to search by the topic's SEO name to GET the ID.

As with Marcher, I've not profiled it either way, but I can almost guarantee a query on a numeric index would be light years faster than a query on a text field, even if it is made an INDEX as well.

---

If the text index were small, this wouldn't be a huge issue; textual indexes are often used for things such as product SKU codes. For topic titles, however, you are talking about a potentially very large index, and as others have said, that is not realistic for large sites with millions of rows.

Primary keys should also be immutable, ever increasing and unique. The topic title does not satisfy any of these requirements.

---

I understand what you guys are saying now, so we're on the same page at least with that.

However, I still stand with my disagreement regarding efficiency.

My vB forum has 60,000 threads, so I ran a query by title and a query by thread ID.

Here's the time it took for querying by a title:

[Screenshot of the query time by title; image not preserved]

And now here's the time it took for querying by threadid:

[Screenshot of the query time by threadid; image not preserved]

Interestingly, they are exactly the same.

I don't have a million threads on my forum, I know. According to this wiki source, the top forums in the world only have between 1 and 3 million threads, and none of them use IPB, by the way. Even so, I don't see queries taking much longer with a million thread titles, if at all. It won't even approach the 1-second mark. We're talking about fractions of a second, completely unnoticeable to humans. It isn't inefficient in terms of user experience, nor to the point where it would create a bottleneck or bring the server down.

I'd be happy to take my current IPB install, which I'm not using, and write a script to generate 1,000,000 different topics using a combination of dictionary words and conjunctions to make realistic thread titles. Then we could bombard the database with 500 requests/minute and see how it performs compared to queries by ID.
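A small-scale Python sketch of the proposed test, under heavy assumptions: an in-memory SQLite table instead of MySQL, a hypothetical 15-word vocabulary instead of a dictionary, 50,000 rows instead of 1,000,000, and sequential lookups from one connection instead of concurrent requests. It can compare the raw cost of the two lookups, but it cannot reproduce the 500 requests/minute, many-process scenario that the performance concern is about.

```python
import random
import sqlite3
import time

# Hypothetical vocabulary; a real test would draw from a full dictionary word list.
WORDS = ["forum", "rules", "help", "url", "seo", "board", "upgrade", "theme",
         "question", "bug", "feature", "request", "guide", "review", "news"]
JOINERS = ["and", "for", "with", "about", "in"]


def make_title(rng: random.Random) -> str:
    """Build a title like 'seo help for board' from random words and a conjunction."""
    return " ".join([rng.choice(WORDS), rng.choice(WORDS),
                     rng.choice(JOINERS), rng.choice(WORDS)])


def build(n: int, seed: int = 1) -> sqlite3.Connection:
    """Create an in-memory topics table with n rows and an index on title_seo."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE topics (tid INTEGER PRIMARY KEY, title TEXT, title_seo TEXT)")
    conn.execute("CREATE INDEX idx_title_seo ON topics (title_seo)")
    rows = []
    for tid in range(1, n + 1):
        title = make_title(rng)
        # With a small vocabulary titles repeat; a real slug scheme would de-duplicate them.
        rows.append((tid, title, title.replace(" ", "-")))
    conn.executemany("INSERT INTO topics VALUES (?, ?, ?)", rows)
    return conn


def time_lookups(conn: sqlite3.Connection, sql: str, keys: list) -> float:
    """Run one query per key, one after another, and return total wall-clock seconds."""
    start = time.perf_counter()
    for key in keys:
        conn.execute(sql, (key,)).fetchone()
    return time.perf_counter() - start


N = 50_000
conn = build(N)
rng = random.Random(2)
ids = [rng.randint(1, N) for _ in range(10_000)]
slugs = [conn.execute("SELECT title_seo FROM topics WHERE tid = ?", (i,)).fetchone()[0]
         for i in ids]
t_id = time_lookups(conn, "SELECT * FROM topics WHERE tid = ?", ids)
t_slug = time_lookups(conn, "SELECT * FROM topics WHERE title_seo = ?", slugs)
print(f"by id: {t_id:.4f}s  by slug: {t_slug:.4f}s")
```

Timings will vary by machine; the useful output is the ratio between the two numbers, not their absolute values.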

Thoughts?

My last concern, which I have never seen anyone address:

If you guys don't agree, think it's inefficient, wouldn't use it -- so what? What about the people who disagree and do want to use this? Why can't we have the "option" to? I'm not trying to force you to agree with me. All I want is the opportunity to do what I feel is right.

---

Also another interesting query:

Out of the 60,000 threads my users created (I have over 120k users), there are fewer than 30 duplicate threads.

Of those 30, about 20 are threads that the user published, which went into moderation and are still sitting there, so they created them twice. The others were purposely created in each forum (e.g. "Forum Rules").

Duplicate threads are not very common, let alone common enough to justify putting an ID in every thread's URL for that reason.
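A count like this can be reproduced with a GROUP BY query. This Python/SQLite sketch assumes a vBulletin-style `thread` table with `threadid` and `title` columns and uses made-up sample rows; titles are compared case-insensitively because slugs are usually lower-cased, so "Help with SEO" and "help with seo" would produce the same URL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema and sample data, for illustration only.
conn.execute("CREATE TABLE thread (threadid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO thread (title) VALUES (?)",
    [("Forum Rules",), ("Forum Rules",), ("Welcome",), ("Help with SEO",), ("help with seo",)],
)


def duplicate_titles(conn: sqlite3.Connection) -> list:
    """Return (lower-cased title, count) for every title used by more than one thread."""
    return conn.execute(
        """SELECT LOWER(title) AS t, COUNT(*) AS n
           FROM thread
           GROUP BY LOWER(title)
           HAVING COUNT(*) > 1
           ORDER BY n DESC, t"""
    ).fetchall()
```

Here `duplicate_titles(conn)` returns `[('forum rules', 2), ('help with seo', 2)]`. Counting by generated slug rather than by title would be stricter still, since punctuation differences disappear in the slug.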

---

Removing IDs from URLs is one of the funniest ideas I've heard, and please don't get me wrong, but the requester clearly has no idea what a community is...

The vB forum I just referenced made $32k in profit last month. I don't know what a community is? I hope you'll show me; then maybe I'll start making some real money... I'm totally just out of my mind and have no idea what I'm talking about. What insight.

---

We all want the same thing, but there are many ways to get there.

Text indexes aren't a good idea because they're much slower to search (and an INT is not text), and you can only index so many characters; how many depends on your database character set. UTF8 takes up to 3 bytes per character and UTF8MB4 up to 4 bytes per character. If you're using the InnoDB storage engine with its 767-byte index key limit, you can only ever index 191 utf8mb4 characters, which means you'll need to make sure the slug is 191 characters or fewer; otherwise, slugs that share the same first 191 characters will collide in the index, which means you can no longer retrieve the correct topic.
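The 191 figure is just integer division of the index key limit by the worst-case bytes per character. A short Python sketch of that arithmetic and of the resulting prefix collision, assuming the 767-byte limit (older MySQL defaults; the DYNAMIC row format in newer versions allows 3072 bytes) and a hypothetical 192-character slug:

```python
def max_indexed_chars(key_limit_bytes: int, bytes_per_char: int) -> int:
    """Characters that fit in an index key when each may need bytes_per_char bytes."""
    return key_limit_bytes // bytes_per_char


def same_index_key(slug_a: str, slug_b: str, indexed_chars: int) -> bool:
    """True if two slugs are indistinguishable once cut to the indexed prefix."""
    return slug_a[:indexed_chars] == slug_b[:indexed_chars]


limit = max_indexed_chars(767, 4)  # utf8mb4 under a 767-byte key limit: 191
shared = "why-" * 48               # hypothetical common start, 192 characters long
a = shared + "part-1"
b = shared + "part-2"
```

`max_indexed_chars(767, 3)` is 255 for 3-byte UTF8, and `same_index_key(a, b, limit)` is true even though the full slugs differ, which is the collision described above.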

To recap; removing IDs may or may not be a good idea, but the work involved, the amount of code, the amount of checking and the impact on the database make it much less desirable.

As mentioned earlier, I'll happily discuss this with you over a more personal medium.

---

IPS 4 does have the ability to change the FURLs from within the ACP.

But does this mean that we will have the ability, via the ACP, to create a logical hierarchical structure for our IPB websites?

By this I mean for example:

Top Level

website.com

Second Level

website.com/content/index

website.com/forum/index

website.com/blogs/index

Example Forum Level

website.com/forum/ips-and-company-feedback/

website.com/forum/ips-and-company-feedback/IPB-board/

website.com/forum/ips-and-company-feedback/IPB-board/url-enhancements

This structure enables Google to see the precise relationship between the website, threads, topics, forums, etc., however many levels of sub-topic you have. The existing IPB URL hierarchy doesn't follow this structure.

This is the hierarchy that Google suggests and in my view all the other suggested SEO enhancements will be of limited value if Google has difficulty understanding the basic structure of the website.
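Building URLs like the example above means walking each forum's parent chain. A minimal Python sketch, assuming a hypothetical `Forum` record with a slug and an optional parent; a real implementation would read the parent links from the forums table and would also have to handle topics being moved between forums.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forum:
    """Hypothetical forum node: its URL slug and its parent forum, if any."""
    slug: str
    parent: Optional["Forum"] = None


def forum_path(forum: Forum) -> list[str]:
    """Walk up the parent chain and return slugs from the top-level forum down."""
    parts = []
    node = forum
    while node is not None:
        parts.append(node.slug)
        node = node.parent
    return parts[::-1]


def topic_url(base: str, forum: Forum, topic_slug: str) -> str:
    """Join base, the 'forum' section, every ancestor forum, and the topic slug."""
    return "/".join([base, "forum", *forum_path(forum), topic_slug])


feedback = Forum("ips-and-company-feedback")
board = Forum("IPB-board", feedback)
```

`topic_url("website.com", board, "url-enhancements")` produces `website.com/forum/ips-and-company-feedback/IPB-board/url-enhancements`, matching the forum-level example.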

Thanks, in advance. (and anticipating being shot down :sweat: )

---

I've been hesitant to reply to this topic because everybody everywhere is an SEO expert. I work at a digital agency, and I can tell you members of our SEO team think they are the bee's knees and know more than anybody else on the internet. I feel like a lot of SEO "best practices" are subjective. Sure, there are a number of very good ideas like microdata, unique titles, etc.; however, there are also a lot of ideas that reek of BS. I'm not saying you don't know anything; I'm just always skeptical. With that said, here's my $0.02:

I like the idea that the board owner should have options in the URL; flexibility is good. I'm strongly for moving to a /forum-name/thread-name/ setup instead of the current /topic/thread-name scheme. But removing the ID from the URL doesn't make sense to me as a default. As the others have pointed out, doing this costs the forum some performance, which is a trade-off I'm not willing to make. In addition, let's say the current URL gets truncated:

[truncated topic link; embed not preserved]

IPB will still be able to find that thread because the topic ID is still there. What happens when you move to a slug only and your link gets truncated?

http://community.invisionpower.com/topic/url-enhance

Surprise! Topic not found. Users make mistakes; they are likely to mistype a word or cut off the last character(s) when copying and pasting. The current method allows users to make these mistakes and gently corrects them.
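The "gently correct" behaviour works because routing needs only the ID; the slug is checked afterwards and fixed with a redirect. A minimal Python sketch, assuming a hypothetical `TOPICS` map from ID to current slug and the `/topic/<id>-<slug>/` shape; the status codes mirror a 301 redirect and a 404.

```python
import re

# Hypothetical data: topic ID -> current canonical slug.
TOPICS = {397771: "url-enhancements"}
# The ID is required; whatever slug follows it (even a truncated one) is optional.
WITH_ID = re.compile(r"^/topic/(\d+)(?:-[a-z0-9-]*)?/?$")


def route(path: str) -> tuple[int, str]:
    """Return (status, canonical path or message) for a requested topic path."""
    m = WITH_ID.match(path)
    if not m:
        return (404, "not found")
    tid = int(m.group(1))
    slug = TOPICS.get(tid)
    if slug is None:
        return (404, "not found")
    canonical = f"/topic/{tid}-{slug}/"
    if path != canonical:
        # Truncated or outdated slug: redirect to the real URL instead of failing.
        return (301, canonical)
    return (200, canonical)
```

`route("/topic/397771-url-enhance")` redirects to `/topic/397771-url-enhancements/`, while the slug-only `route("/topic/url-enhance")` has nothing to go on and returns 404.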

Out of my 60,000 threads users created (have over 120k users) - there are less than 30 duplicate threads.


I wish I only had that many duplicate titles. I have a few thousand duplicate SEO titles; I'm more concerned about duplicate title tags than changing the URLs over to /thread-title, /thread-title-1, /thread-title-2. ;)

---

I fully agree on SEO tactics. Most of them are useless these days.

However, I believe correct URL hierarchy is critical. That is a key reason why we developed GreenSEO. It allowed us to improve our traffic remarkably.

Structuring your website correctly is very important.

Let me give you guys some quick stat values to understand the value.

Our initial traffic loss from the vBulletin-to-IPB move was around 7-8%, which is normal when you take 24 hours of downtime for the switch and make a major software change.

Traffic growth in the 7.5 months since moving to IPB is 30%.

Of course, there are many other important variables; URL structure is not the only factor that allowed us to achieve this much growth. But it is great because Google can index the site perfectly.

In time, every forum turns into a duplicate-content graveyard, whatever you do. Having no IDs will only cause more issues. Renaming threads, etc. will always be painful. And performance-wise, it will be terrible for big boards.
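Why renames hurt a slug-only scheme: once the slug is the only key in the URL, every rename must keep the old slug around for redirects, and a later topic can claim that old slug. A minimal Python sketch of that bookkeeping, where `SlugRegistry` is a hypothetical in-memory stand-in for what would be extra tables and queries in a real board.

```python
from typing import Optional


class SlugRegistry:
    """Hypothetical slug-only router that keeps retired slugs so renamed topics still resolve."""

    def __init__(self) -> None:
        self.current: dict[str, int] = {}  # live slug -> topic ID
        self.history: dict[str, int] = {}  # retired slug -> topic ID
        self.slug_of: dict[int, str] = {}  # topic ID -> live slug

    def add(self, tid: int, slug: str) -> None:
        if slug in self.current:
            raise ValueError(f"slug already in use: {slug}")
        self.current[slug] = tid
        self.slug_of[tid] = slug

    def rename(self, tid: int, new_slug: str) -> None:
        """Give a topic a new slug and remember the old one for redirects."""
        old = self.slug_of[tid]
        if new_slug != old and new_slug in self.current:
            raise ValueError(f"slug already in use: {new_slug}")
        del self.current[old]
        self.history[old] = tid
        self.add(tid, new_slug)

    def resolve(self, slug: str) -> tuple[int, Optional[str]]:
        """Return (200, slug), (301, new slug) for a retired slug, or (404, None)."""
        if slug in self.current:
            return (200, slug)
        if slug in self.history:
            return (301, self.slug_of[self.history[slug]])
        return (404, None)
```

Usage note: after `add(1, "url-enhancements")` and `rename(1, "url-structure-options")`, the old slug redirects. But if a new topic is later added as `"url-enhancements"`, the live slug wins and old inbound links silently land on the wrong topic. With the ID in the URL, neither problem arises.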

---

However, I believe correct URL hierarchy is critical. That is a key reason why we developed GreenSEO. It allowed us to improve our traffic remarkably.

Structuring your website correctly is very important.

Exactly, but this does not appear to be possible with IPB 3.x out of the box (hence mods such as GreenSEO), nor does it appear to be on offer with IPB 4.0. :mad: (I REALLY hope to be corrected on this.)

This seems to me to be absolutely fundamental to search engine ranking (and to good website design practice), and it should be built into the core rather than being a mod, with all the complications that entails. I really wish more effort were being put into achieving correct URL structure rather than into a host of other improvements that could be considered little more than cosmetic.

---

The focus of the topic now is "with or without ID"; personally, I prefer not to omit the ID.

I hope we get an answer about the hierarchical URL structure.

---

Also, don't underestimate the value of correct URL structuring. It gives you and your advertisers great additional stats for configuring forum-specific advertisements, and you can easily study detailed stats and use them to improve your forums.
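One way hierarchical URLs make stats easier: page views can be grouped by path prefix without joining against the forums table. A minimal Python sketch, assuming a plain list of requested URLs in the hierarchical shape suggested earlier in the thread and a hypothetical `depth` parameter for how many path segments define a section.

```python
from collections import Counter
from urllib.parse import urlparse


def hits_by_section(urls: list[str], depth: int = 2) -> Counter:
    """Count page views per leading path prefix, e.g. '/forum/ips-and-company-feedback'."""
    counts: Counter = Counter()
    for url in urls:
        # Split the path and drop empty pieces from leading or trailing slashes.
        parts = [p for p in urlparse(url).path.split("/") if p]
        counts["/" + "/".join(parts[:depth])] += 1
    return counts


views = [
    "https://website.com/forum/ips-and-company-feedback/IPB-board/url-enhancements",
    "https://website.com/forum/ips-and-company-feedback/",
    "https://website.com/blogs/index",
]
```

Here `hits_by_section(views)` counts two views for `/forum/ips-and-company-feedback` and one for `/blogs/index`; with `/topic/<id>-<slug>` URLs the same report would need a lookup from topic ID to forum first.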

---

I stand by three things:

1) Suggesting that removing the ID will result in a decrease in performance noticeable to humans (slower load times, etc.) is hearsay until you can actually prove it in testing. It took 0.0003 seconds to query 60,000 titles on my forum.

2) I've been doing SEO for 7 years, and it's a fact that including redundant IDs anywhere in the URL is bad for SEO. This comes from experience and testing. If IDs were so important for performance and had no effect on SEO, others and I would be using them in our links to improve site speed. I already made it clear how successful I am with my practices. You're right: everyone seems to want to be an "SEO expert". If someone else has more fruit on the tree than me, I'd be glad to hear what testing they've done.

3) Even if #1 and #2 were proven wrong, giving users the option to do this doesn't change anything. While you guys argue that it's a performance hit, I'll happily be using this structure with no issue. In the event it affects my forum, you can say "I told you so" when I have a million topics.

---

I stand by three things:

1) Suggesting that removing the ID will result in a decrease in performance noticeable to humans (slower load times, etc.) is hearsay until you can actually prove it in testing. It took 0.0003 seconds to query 60,000 titles on my forum.

Please test that when you have 1 million + rows in table and you have concurrent 5k plus user browsing through forum sections :smile:

2) I've been doing SEO for 7 years, and it's a fact that including redundant IDs anywhere in the URL is bad for SEO. This comes from experience and testing. If IDs were so important for performance and had no effect on SEO, others and I would be using them in our links to improve site speed. I already made it clear how successful I am with my practices. You're right: everyone seems to want to be an "SEO expert". If someone else has more fruit on the tree than me, I'd be glad to hear what testing they've done.

I believe you are comparing apples and oranges, as I am sure none of your tests were actually done on a forum.

---

http://community.invisionpower.com/topic/url-enhance

Surprise! Topic not found. Users make mistakes; they are likely to mistype a word or cut off the last character(s) when copying and pasting. The current method allows users to make these mistakes and gently corrects them.

Not according to Google:

"Simple-to-understand URLs will convey content information easily"

"Also, users may believe that a portion of the URL is unnecessary, especially if the URL shows many unrecognizable parameters. They might leave off a part, breaking the link."
Sorry, but community.invisionpower.com/topic/url-enhancements/ is far less prone to user error than community.invisionpower.com/topic/397771-url-enhancements/.

When I was big into domaining, it was common knowledge that shorter URLs perform better for type-ins: fewer opportunities for keystroke mistakes when typing, and easier to remember.

Using your same argument: what happens when a user types four 7s instead of three in this URL?

Sorry, we couldn't find that!

Surprise!

It's fair to assume users remember a string of words better than a string of digits (can you remember a phone number after hearing it once? No. But you can remember the name of a person or a book quite easily).

---

"Also, users may believe that a portion of the URL is unnecessary, especially if the URL shows many unrecognizable parameters. They might leave off a part, breaking the link."

Sorry, but community.invisionpower.com/topic/url-enhancements/ is far less prone to user error than community.invisionpower.com/topic/397771-url-enhancements/.

Why on earth would a user remove the digits from the middle of the URL? It makes no sense. And what if the topic title was literally "397771 URL Enhancements"?
