
Robots.txt - IP.Board 4.x


PANL


Posted

You do know that this file is no longer needed for IPS4, since IPS uses the noindex flag? At least, that's what they state...

Also, what about the fact that no User-agent is specified for the allowed and disallowed URL paths?

Posted

Do you have an example of something I block that is already blocked by IPB? From what I've seen, those pages are indexed by Google.

@Qubabos I will add User-agent: *
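Python's standard urllib.robotparser module is a quick way to see why the User-agent line matters. Here it stands in for a real crawler. It is a simplified parser without wildcard support, so this sketch sticks to a plain path prefix. Rules that appear before any User-agent line belong to no group, and this parser ignores them. The /search/ path and the forum.example.com host are placeholders, not taken from the file discussed here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule with no "User-agent:" line in front of it.
UNGROUPED = "Disallow: /search/\n"

# The same rule placed in a group that applies to every crawler.
GROUPED = "User-agent: *\nDisallow: /search/\n"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://forum.example.com/search/"
    print(allowed(UNGROUPED, "Googlebot", url))  # True: the rule is ignored
    print(allowed(GROUPED, "Googlebot", url))    # False: the rule applies
```

The Robots Exclusion Protocol (RFC 9309) organizes rules into groups headed by user-agent lines. A crawler that follows it will likely treat ungrouped Disallow lines the same way this parser does.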

Posted
5 hours ago, Steph40 said:

I have also added my admin directory to Disallow, shouldn't it be in there too? Just curious.

If it's still the standard "admin" folder, yes. But if you changed the folder name, listing it in robots.txt makes the ACP too easy to find :)

Posted
Quote

A robots.txt isn’t needed for 4.x. There might be specific robots.txt rules which do make sense for your site, but then add those and nothing else. Don’t just add the robots.txt provided here, just because it promises to do something good.

As an example: It currently disallows “/index.php?*”, which, if you don’t have friendly URLs turned on, would remove your entire website from the search index – not really “good SEO”.
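The following minimal Python sketch shows why that rule is risky. It implements Google-style wildcard matching, where "*" matches any run of characters and a trailing "$" anchors the end. The helper names are hypothetical. The non-friendly URL form /index.php?/topic/... is an assumption about how IPS4 builds URLs when rewriting is off.

```python
import re


def rule_to_regex(pattern: str):
    """Translate a robots.txt path pattern (Google-style wildcards) to a regex.

    '*' matches any run of characters, a trailing '$' anchors the end, and
    matching always starts at the beginning of the path plus query string.
    """
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, path: str) -> bool:
    """True if a Disallow rule with this pattern covers the given path."""
    return rule_to_regex(pattern).match(path) is not None


RULE = "/index.php?*"

# Assumed non-friendly IPS4 URL: every page is routed through index.php?...
print(is_blocked(RULE, "/index.php?/topic/123-robotstxt/"))  # True
# The same (hypothetical) topic with friendly URLs enabled.
print(is_blocked(RULE, "/topic/123-robotstxt/"))             # False
```

With friendly URLs off, essentially every page path starts with /index.php?, so this single rule would cover the whole forum.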

@opentype This is true. I expected that everyone who cares about SEO would be using friendly URLs. Maybe you are right that I should add a note or readme file explaining this kind of thing.

Update: I changed the download page and included information about it.

Quote

This robots.txt also makes decisions for you like removing profile pages from the search index. You may or may not want that. 

This isn't true. It only removes the content page of your profile, because otherwise you would publish duplicate content.
See the images; I blurred the usernames.

[Image: profile page]

[Image: profile content page]

As I announced in my topic, you are free to help improve this file. :thumbsup:

Posted
14 minutes ago, PANL said:

you are free to help improve this file. :thumbsup:

Since I don’t think it should exist to begin with, there is nothing for me to help with. 

Frankly, IMHO a better choice would be a plain old article that explains how a robots.txt works in general and what options there are for 4.x, so people could make informed decisions about whether they need one at all and, if so, which specific rules might help on their site. 

Posted
4 hours ago, Qubabos said:

If it's still the standard "admin" folder, yes. But if you changed the folder name, listing it in robots.txt makes the ACP too easy to find :)

I'd agree with this. If it's still plain old "admin", it's OK, but if you've changed it, I personally wouldn't recommend putting the new directory name into it. I suspect that one of the first things someone interested in tampering with your site would look at is the robots file, to see which 'areas' are listed as out of bounds. If the new admin directory name is listed there, there was not a lot of point in renaming it anyway. :)

Posted

Maybe you can put this in the page header:

<meta name="robots" content="noindex, nofollow" />

If you use .htpasswd, this won't have any effect.
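Sometimes you want to check which robots directives a page actually sends, for example whether IPS already marks it noindex. A small sketch using Python's built-in html.parser can pull them out of the HTML. The class and function names are hypothetical. It does not fetch pages or look at the X-Robots-Tag HTTP header. A crawler stopped by a .htpasswd login never receives the HTML at all, which is why the meta tag has no effect there.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for token in attributes.get("content", "").split(","):
                if token.strip():
                    self.directives.add(token.strip().lower())


def robots_directives(html: str) -> set:
    """Return the lowercased robots meta directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


page = '<head><meta name="robots" content="noindex, nofollow" /></head>'
print(sorted(robots_directives(page)))  # ['nofollow', 'noindex']
```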

Posted
15 hours ago, Steph40 said:

@PANL

I have also added my admin directory to Disallow, shouldn't it be in there too? Just curious.

You shouldn't need to add your admin directory. There won't be any links for Google to find and follow to it, and even if it does find one, it's just an authentication page, so it won't/can't go any further.

Posted
7 hours ago, Steph40 said:

OK everyone, it's so hard to do the right thing with so many different opinions

I read that your admin folder is called "admin". In that case, I would recommend following the posts by Qubabos and AndyF.
Personally, I don't add the admin folder to my robots.txt.
My main focus is to optimize the content that Google indexes and to prevent duplicate content.
For SEO, it won't make much of a difference either way.

6 hours ago, superj707 said:

Doesn't ".htaccess" only work on apache web servers? What about those of us running NGINX?

Are you referring to the information page for this file? I'm not familiar with NGINX, but this is a default setting from IPB.
I think you can open a support ticket to get help setting up URL rewriting.

Posted

Security-wise, it's best not to list your out-of-bounds folders in robots.txt or in a META tag. Either way, anyone can simply type in the URL of your robots.txt, or right-click the page, view its source, and see the META tag. I don't use either method; I don't want to give any hints about what the system or staff folders or files might be. 

Posted
On 07/07/2017 at 1:30 AM, Qubabos said:

you do know that this file is no longer needed for IPS4, since IPS uses the noindex flag

Hello! :)

I've just read this topic. I don't have any special needs regarding my board. So, can I safely remove my robots.txt file? Or should I keep it, and with which content?

Thank you! :)

Posted

@SecondSight I'm not quite sure what IPB noindexes and what it doesn't, but from what I am seeing, it isn't enough.

You can use robots.txt for content that IPB itself doesn't already keep out of the index.

My robots.txt is mostly about preventing Google from indexing duplicate content.

Things I recommend preventing from being indexed:

Autogenerated pages, like tags/discover/search
Pages that are not of great value to Google: profiles


In my robots.txt, I block a lot of query strings, for example:

Disallow: /?tab=*
Disallow: /index.php?*
Disallow: /*?app=*
Disallow: /*sortby=*
Disallow: /*/?do=download
Disallow: /profile/*/?do=*
Disallow: /*?do=add
Disallow: /*?do=email
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment
Disallow: /*?do=findComment*
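This rough Python sketch checks sample paths against the Disallow patterns above. It assumes they sit in a User-agent: * group and uses Google-style wildcard semantics. The sample paths are hypothetical IPS4 friendly URLs. The list contains only Disallow rules, with no Allow, so any match blocks the path. Real crawlers also apply longest-match precedence between Allow and Disallow, which this sketch does not need.

```python
import re

# The Disallow patterns quoted above (assumed to sit in a User-agent: * group).
DISALLOW = [
    "/?tab=*",
    "/index.php?*",
    "/*?app=*",
    "/*sortby=*",
    "/*/?do=download",
    "/profile/*/?do=*",
    "/*?do=add",
    "/*?do=email",
    "/*?do=getNewComment",
    "/*?do=getLastComment",
    "/*?do=findComment*",
]


def to_regex(pattern: str):
    """'*' matches any characters; a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    body = ".*".join(re.escape(part) for part in core.split("*"))
    return re.compile(body + ("$" if anchored else ""))


RULES = [(p, to_regex(p)) for p in DISALLOW]


def blocking_rule(path: str):
    """Return the first Disallow pattern matching `path`, or None."""
    for pattern, regex in RULES:
        if regex.match(path):
            return pattern
    return None


for path in [
    "/topic/123-robotstxt/",                  # hypothetical topic URL
    "/topic/123-robotstxt/?do=getNewComment",
    "/forum/2-general/?sortby=title",
    "/profile/5-someone/",
    "/profile/5-someone/?do=content",
]:
    print(path, "->", blocking_rule(path) or "allowed")
```

Note that "/profile/*/?do=*" only catches profile URLs that carry a do= parameter; the bare profile page itself stays crawlable under these rules.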

The goal should be that only quality links that visitors like to read end up in Google. People don't like pages that don't give them the right information.

Good luck! :thumbsup:

  • 3 months later...
Posted
11 minutes ago, media said:

So:

Should I add a robots.txt to my site OR NOT????

There is no simple yes or no answer to that. Having a robots.txt doesn’t make things good or bad just because it is there. The question is: does your site need to give special instructions to all or specific search engines, e.g. not to index certain sections even though they are accessible to guests? If you are unsure about that, my recommendation would be to do nothing, so you don't do more harm than good. If you know what you are doing, you can do some minor fine-tuning with a robots.txt: delisting content that appears in multiple locations, asking certain crawlers not to crawl your pages … things like that. 

Posted
5 minutes ago, opentype said:

There is no simple yes or no answer to that. Having a robots.txt doesn’t make things good or bad just because it is there. The question is: does your site need to give special instructions to all or specific search engines, e.g. not to index certain sections even though they are accessible to guests? If you are unsure about that, my recommendation would be to do nothing, so you don't do more harm than good. If you know what you are doing, you can do some minor fine-tuning with a robots.txt: delisting content that appears in multiple locations, asking certain crawlers not to crawl your pages … things like that. 

Thank you, man... Your answer is greatly appreciated.... :)

Archived

This topic is now archived and is closed to further replies.
