PANL Posted July 6, 2017 If you have any good input to improve this file, please leave a message.
Qubabos Posted July 6, 2017 You do know that this file is no longer needed for IPS 4, as the noindex flag is used by IPS? At least that is what they state... Also, why are no user-agents specified for the allowed and disallowed parts of the URLs?
PANL Author Posted July 6, 2017 Do you have an example of something that I block and that is already blocked by IPB? From what I experience, those pages are indexed by Google. @Qubabos I will add User-agent: *
Steph40 Posted July 6, 2017 @PANL I have also added my admin directory to Disallow, shouldn't it be in there too? Just curious.
Qubabos Posted July 7, 2017 5 hours ago, Steph40 said: I have also added my admin directory to Disallow, shouldn't it be in there too? Just curious. If you have the standard "admin" folder, yes. But if you changed the folder name, listing it makes it too easy to find the ACP through the robots.txt file.
PANL Author Posted July 7, 2017
Quote: A robots.txt isn’t needed for 4.x. There might be specific robots.txt rules which do make sense for your site, but then add those and nothing else. Don’t just add the robots.txt provided here, just because it promises to do something good. As an example: It currently disallows “/index.php?*”, which, if you don’t have friendly URLs turned on, would remove your entire website from the search index, not really “good SEO”.
@opentype This is true. I expected everyone who cares about SEO to be using friendly URLs. Maybe you are right that I should add a note/readme file to explain this kind of thing. Update: I changed the download page and included information about it.
Quote: This robots.txt also makes decisions for you like removing profile pages from the search index. You may or may not want that.
This isn't true. It removes the content pages of your profile, because otherwise you would publish duplicate content. See the images; I blurred the usernames. As I announced in my topic, you are free to help improve this file.
opentype Posted July 7, 2017 14 minutes ago, PANL said: you are free to help improve this file. Since I don’t think it should exist to begin with, there is nothing for me to help with. Frankly, IMHO a better choice would be a plain old article that explains how a robots.txt works in general and what options there are for 4.x, so people can make informed decisions about whether they need it at all and, if so, which specific rules might help them on their site.
AndyF Posted July 7, 2017 4 hours ago, Qubabos said: if you have standard "admin" yes, but if you changed folder name - it's too easy to find ACP by robots.txt file. I'd agree with this. If it's still at plain old "admin" it's OK, but if you've changed it I'd not personally recommend putting the new directory name into the file. I'd suspect that one of the first things someone interested in tampering with your site would look at is the robots file, to see which areas are listed as out of bounds. If the new admin directory name is in there, there was not a lot of point in renaming it anyway.
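To illustrate the point about disclosure: robots.txt is a public plain-text file, so every Disallow path in it can be read by anyone who requests it. A minimal Python sketch follows; the robots.txt body and the folder name `/my-renamed-acp/` are invented for illustration and do not come from any real site.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every non-empty Disallow path found in a robots.txt body."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt; the renamed admin folder below is made up.
EXAMPLE = """\
User-agent: *
Disallow: /my-renamed-acp/
Disallow: /search/
"""

leaked = disallowed_paths(EXAMPLE)
print(leaked)
```

Anything that shows up in this list is effectively announced to every visitor, which is why a renamed admin folder is usually better left out of the file.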
HCICT Posted July 7, 2017 Maybe you can put this in the header: <meta name="robots" content="noindex, nofollow" /> If you use .htpasswd this won't have any effect.
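For anyone who wants to check whether a page really carries that tag, here is a small Python sketch using only the standard library. It parses an HTML string and collects the directives of any robots meta tag. The sample page is invented; in practice you would feed in the HTML of your own page. Note that a crawler can only act on the tag if it is able to fetch the page, which is why it has no effect behind .htpasswd protection.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html: str) -> set[str]:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Invented sample page for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, nofollow" /></head><body></body></html>'
directives = robots_directives(PAGE)
print("noindex" in directives)
```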
Ohio Guns Posted July 7, 2017 15 hours ago, Steph40 said: @PANL I have also added my admin directory to Disallow, shouldn't it be in there too? Just curious. You shouldn't need to add your admin directory. There won't be any links for Google to find and follow to it, and if there are, it's just an authentication page and the crawler won't/can't go any further.
Steph40 Posted July 7, 2017 Ok everyone, so hard to do the right thing, so many different opinions
SJ77 Posted July 7, 2017 Doesn't ".htaccess" only work on Apache web servers? What about those of us running NGINX?
PANL Author Posted July 7, 2017
7 hours ago, Steph40 said: Ok everyone, so hard to do the right thing, so many different opinions
I read that your admin folder is called "admin". In that case I would recommend following the posts by Qubabos and AndyF. Personally, I don't add the admin folder to my robots.txt. My main focus is to optimize the content that is indexed by Google and to prevent duplicate content. For SEO it won't hurt much.
6 hours ago, superj707 said: Doesn't ".htaccess" only work on apache web servers? What about those of us running NGINX?
Are you referring to the information page of this file? I'm not familiar with NGINX, but this is a default setting from IPB. I think you can open a support ticket to get help setting up the Rewrite URLs option.
Mopar1973Man Posted July 8, 2017 Security-wise, it's best not to list your out-of-bounds folders in robots.txt or in a META tag. Either way, anyone can just type in the URL of the robots.txt, or right-click on the page and view its source code, and still see the META tag. I don't use either method. I don't want to give any hints about what the system or staff folders or files might be.
SecondSight Posted July 13, 2017 On 07/07/2017 at 1:30 AM, Qubabos said: you do know that this file is no longer needed for ips4 as noindex flag is used by IPS. Hello! I've just read this topic. I don't have special needs regarding my board. So, can I safely remove my robots.txt file? Or should I keep it, and if so with which content? Thank you!
PANL Author Posted July 13, 2017 @SecondSight I'm not quite sure what is noindexed by IPB and what isn't, but from what I am seeing it isn't enough. You can use robots.txt for content that isn't blocked by IPB itself. My robots.txt is mostly aimed at preventing Google from indexing duplicate content.
Things I recommend keeping out of the index:
Autogenerated pages: tags/discover/search
Pages that are of little value to Google: profile
In my robots.txt I block a lot of query strings. For example:
Disallow: /?tab=*
Disallow: /index.php?*
Disallow: /*?app=*
Disallow: /*sortby=*
Disallow: /*/?do=download
Disallow: /profile/*/?do=*
Disallow: /*?do=add
Disallow: /*?do=email
Disallow: /*?do=getNewComment
Disallow: /*?do=getLastComment
Disallow: /*?do=findComment*
The goal should be that Google only contains quality links that visitors like to read. People don't like pages that don't give the right information. Good luck!
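Before deploying rules like these, it can help to check which URLs they would actually catch. The following Python sketch is a hand-written approximation of the wildcard matching Google documents for robots.txt ('*' matches any run of characters, a trailing '$' anchors the end of the URL). It is not Google's implementation, it ignores Allow rules and rule precedence, and the sample paths are invented for illustration.

```python
import re

# The Disallow patterns quoted in the post above.
RULES = [
    "/?tab=*",
    "/index.php?*",
    "/*?app=*",
    "/*sortby=*",
    "/*/?do=download",
    "/profile/*/?do=*",
    "/*?do=add",
    "/*?do=email",
    "/*?do=getNewComment",
    "/*?do=getLastComment",
    "/*?do=findComment*",
]


def rule_matches(pattern: str, path: str) -> bool:
    """Match a path (including query string) against one Disallow pattern."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text and turn each '*' into "any run of characters".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so the pattern behaves as a prefix rule.
    return re.match(regex, path) is not None


def is_blocked(path: str, rules=RULES) -> bool:
    """True if any Disallow pattern matches the path."""
    return any(rule_matches(rule, path) for rule in rules)


# Invented sample paths: a friendly URL, the same topic without friendly
# URLs, a profile content tab, and a sorted forum listing.
for sample in [
    "/topic/1-hello/",
    "/index.php?/topic/1-hello/",
    "/profile/5-bob/?do=content",
    "/forum/3-news/?sortby=title",
]:
    print(sample, "blocked" if is_blocked(sample) else "allowed")
```

The second sample shows the risk opentype raised earlier in the thread: without friendly URLs, every content page starts with /index.php? and is caught by the second rule.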
AlexWebsites Posted July 16, 2017 Isn't it still good practice to list your sitemap path in a robots.txt file? https://developers.google.com/search/reference/robots_txt
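As a rough illustration of a robots.txt that blocks nothing and only announces a sitemap, the Python sketch below parses such a file with the standard library's urllib.robotparser and reads the sitemap entries back (site_maps() requires Python 3.8 or newer). The domain and sitemap URL are placeholders, not the path of any particular installation.

```python
from urllib.robotparser import RobotFileParser

# Placeholder robots.txt: an empty Disallow allows everything,
# and the Sitemap line points crawlers at the sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of URLs, or None when no Sitemap line exists.
sitemaps = parser.site_maps()
print(sitemaps)
```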
media Posted November 14, 2017 So: should I put a robots.txt on my site OR NOT? I am on 4.2.6, the latest version.
opentype Posted November 14, 2017 11 minutes ago, media said: So: Should I put a robots.txt on my site OR NOT? There is no simple yes or no answer to that. Having a robots.txt doesn’t make things good or bad just because it is there. The question is: does your site have requirements to give special instructions to all or specific search engines, e.g. not to index certain sections even though they are accessible to guests? If you are unsure about that, my recommendation would be not to do anything, to avoid doing more harm than good. If you know what you are doing, you can do some minor fine-tuning with a robots.txt: delisting content that appears in multiple locations, asking certain crawlers not to crawl your pages, things like that.
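The "ask certain crawlers not to crawl" case can be sketched with per-user-agent groups. The Python example below uses the standard library's urllib.robotparser to evaluate a small robots.txt; the crawler name BadBot, the /search/ rule, and the example.com URLs are all made up for illustration, and real crawlers are free to ignore the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is asked to stay away entirely,
# everyone else is only asked to skip the search pages.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

topic_url = "https://example.com/topic/1-hello/"
search_url = "https://example.com/search/?q=robots"

badbot_topic = parser.can_fetch("BadBot", topic_url)
google_topic = parser.can_fetch("Googlebot", topic_url)
google_search = parser.can_fetch("Googlebot", search_url)
print(badbot_topic, google_topic, google_search)
```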
media Posted November 14, 2017 5 minutes ago, opentype said: There is no simple yes or no answer to that. Having a robots.txt doesn’t make things good or bad just because it is there. The question is: does your site have requirements to give special instructions to all or specific search engines, e.g. not to index certain sections even though they are accessible to guests? If you are unsure about that, my recommendation would be not to do anything to avoid doing more harm than good. If you know what you are doing, you can do some minor fine-tuning with a robots.txt. Delisting content that appears in multiple locations. Ask certain crawlers not to crawl your page … things like that. Thank you, man... Your answer is greatly appreciated.
Archived
This topic is now archived and is closed to further replies.