Browser vs Bot


Guest Luke

Posted

One of the main reasons why I have a bunch of spiders listed is because there is a difference between what a spider can see and what a guest can see. Spiders can see more than guests because we want the guests to register, and we want the spiders to index our site. That's the biggest reason why I even have this feature on. Listing the spiders on the site is nice, but isn't really necessary.

For this to work you have to list the spiders' user agents. The problem is that there are so many bots out there, each with a variety of user agents. There are literally hundreds of bots, and there is no way to list them all.

Every once in a while I'll spend time looking at the sessions table, and I can see a difference in the user agents: I can tell which ones are bots and which ones are actual people browsing the site. Browser user agents seem to follow a certain pattern, and I'm sure there are fewer types of browsers than there are bots. With a wildcard, it may be fairly easy to match different versions of the same browser.
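
To illustrate the wildcard idea, here is a minimal Python sketch. It is not how the board matches user agents; the pattern list, the sample user-agent strings, and the `matches_browser` helper are all assumptions made up for the example. It uses shell-style wildcards from the standard `fnmatch` module so that one pattern covers several versions of the same browser.

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard patterns, one per browser family (not a real board setting).
BROWSER_PATTERNS = [
    "Mozilla/* (*MSIE *",   # Internet Explorer, any version
    "Mozilla/* Firefox/*",  # Firefox, any version
    "Opera/*",              # Opera, any version
]


def matches_browser(user_agent: str) -> bool:
    """Return True if the user agent fits any browser wildcard pattern."""
    ua = user_agent.lower()
    # Lowercase both sides so the comparison is case-insensitive on every platform.
    return any(fnmatchcase(ua, pattern.lower()) for pattern in BROWSER_PATTERNS)


if __name__ == "__main__":
    samples = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
    ]
    for sample in samples:
        print(matches_browser(sample), sample)
```

Both MSIE versions fall under the same pattern, while the crawler string matches none of them. A real list would need more families than the three assumed here.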

So... instead of listing bots, why not list browsers instead? Have a feature to do the complete opposite. If a browser's user agent is matched to the list the board assumes it's a guest, and if it doesn't match it assumes it's a bot.

Now of course this wouldn't work for everyone... That's why you'd have an option to match one or the other.

As I said, I tend to give the bots more permissions to view things than guests because I want the guests to register and the bots to be able to index the content. If I don't match all the bots, then the ones I miss don't get to index as much as the others... And if they don't index as much, then they don't have as much searchable content for my site. But if you flipped it around, I couldn't care less if a guest with an odd-ball browser could see a little more than an average guest could.
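
The "match one or the other" option described above could look roughly like the following Python sketch. Everything in it is hypothetical: `MatchMode`, `classify`, and the list entries are illustrative names, and plain substring matching is assumed for simplicity. The point is only that the same list lookup gives opposite answers depending on which kind of list the admin chose to maintain.

```python
from enum import Enum


class MatchMode(Enum):
    """Hypothetical admin setting: which kind of list is being maintained."""
    LIST_BOTS = "bots"          # current behaviour: listed agents are bots
    LIST_BROWSERS = "browsers"  # suggested behaviour: listed agents are guests


def classify(user_agent: str, mode: MatchMode, entries: list[str]) -> str:
    """Return "bot" or "guest" for a user agent under the chosen mode."""
    ua = user_agent.lower()
    listed = any(entry.lower() in ua for entry in entries)
    if mode is MatchMode.LIST_BOTS:
        # Anything not on the bot list is treated as a guest.
        return "bot" if listed else "guest"
    # Anything not on the browser list is treated as a bot.
    return "guest" if listed else "bot"


if __name__ == "__main__":
    bot_list = ["googlebot", "slurp"]
    browser_list = ["msie", "firefox", "opera", "safari"]
    unknown_crawler = "ObscureCrawler/0.3"

    print(classify(unknown_crawler, MatchMode.LIST_BOTS, bot_list))
    print(classify(unknown_crawler, MatchMode.LIST_BROWSERS, browser_list))
```

An unlisted crawler is treated as a guest in the first mode and as a bot in the second, which is the trade-off being argued for: the cost of a miss shifts from an unindexed bot to an odd-ball browser seeing a little more.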

Posted

You totally missed my suggestion and the point of it...


You're right, I didn't understand your suggestion correctly. But listing the User-Agents of browsers is not an easy or efficient way to do it either. There are many browsers, some bots use U-A strings that could get them recognized as a web browser (and so as a guest), it's also possible to fake your own U-A, etc., so I don't think this method would work better than the one currently used.
Posted

it's also possible to fake your own U-A



Bingo, I've written software that fakes U-A and acts like a web browser, when it is indeed a bot of some sort. Just as easy to do it the other way around.
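
For anyone wondering why faking is so easy: the user agent is just a request header that the client sets itself. A minimal Python sketch using the standard `urllib.request` module follows. The URL and the user-agent string are placeholders, and nothing is actually sent over the network here.

```python
from urllib.request import Request

# Placeholder browser-style user agent; any string can be supplied by the client.
BROWSER_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) "
    "Gecko/20061010 Firefox/2.0"
)


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a request carrying a chosen User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    # Hypothetical forum URL, used only to construct the request object.
    req = build_request("http://forum.example/index.php", BROWSER_UA)
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Swap in a crawler's string instead and the same script would look like a bot, so neither kind of list can be trusted for anything security-related.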
Posted

That isn't the problem. The point is there are fewer browsers than there are bots. There are hundreds of bots out there, and only a handful of browsers that are actually used. I guess it depends on your site... For me indexing is far more important, but I don't want to allow my guests to see everything before they register. I would, however, rather catch every bot indexing my site than catch every browser out there, because for me a missed browser only means allowing a handful of guests to see a little more than the rest. And for me there's no harm in that... To me it's more serious that I'm missing bots, because how my site ranks depends on it...

Right now, if you want to give bots more information to index, you have to match every bot out there, which is nearly impossible considering how big the internet is and how many bots and variations of bots there are. Bots also don't really follow a rule with their user agents the way browsers do... Some do, but not all.

Does this make sense? Being able to match browsers instead of bots? It's the flipped version of what it currently does, but you'd be able to choose how you want to match user agents.

Posted

I think that while this may be how you'd want it to work, the bulk of our users would prefer to specify which can override rather than which can't.

In all honesty, though, the "bots you are missing" include unknown and unimportant ones from far-off countries that don't even have search engines. Just because a bot is out there doesn't mean it's worth trying to match it. If you add "google" you are already matching "googlebot", "google media" and so on. You don't need to add an entry for each variation, if that's what you were thinking.
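
The partial-match behaviour described above can be sketched in Python as a case-insensitive substring check. This is an illustration only, not the board's actual code; `BOT_ENTRIES`, `matching_entry`, and the sample user-agent strings are assumptions for the example.

```python
from typing import Optional

# Hypothetical short bot list: one entry per search engine, not per variation.
BOT_ENTRIES = ["google", "slurp", "msnbot"]


def matching_entry(user_agent: str, entries: list[str] = BOT_ENTRIES) -> Optional[str]:
    """Return the first list entry found anywhere in the user agent, or None."""
    ua = user_agent.lower()
    for entry in entries:
        if entry.lower() in ua:
            return entry
    return None


if __name__ == "__main__":
    samples = [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Googlebot-Image/1.0",
        "Mediapartners-Google/2.1",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0",
    ]
    for sample in samples:
        print(matching_entry(sample), "<-", sample)
```

The single "google" entry catches all three crawler variants, and the browser string matches nothing. The flip side of short entries is that they can also match strings you did not intend, so they are worth checking against real session data.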

Posted

Well the reason why I started thinking about this is because of the massive bot lists I've seen... It seems a browser list would be far smaller. :rolleyes:

Archived

This topic is now archived and is closed to further replies.
