Posted September 2, 2022 Hi, I have a big problem with my forum: for months now Google hasn't crawled or indexed it. In Search Console the sitemap won't work. The sitemap address is correct, but Search Console won't analyze it.
September 2, 2022 Community Expert Do you get any specific errors? Maybe you have a robots.txt rule that is stopping Google from indexing your site.
September 2, 2022 Author Hi, and thanks for the reply. The robots.txt comes directly from the IPB forum software; I did not modify it.
September 2, 2022 Community Expert OK, so robots.txt is not the problem. What do you get if you try to submit the sitemap in Google Webmaster Tools? If there are issues, it should say something there.
September 2, 2022 Community Expert I can see the sitemap just fine when I view the URL: https://www.vespaonline.com/sitemap.php There must be something server-side that is blocking Google's IP or user agent. It should show more details about the error if you click the bottom-right arrow in your screenshot.
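If you want to test the user agent side of that yourself, here is a rough sketch in Python (it only mimics Googlebot's user agent string; a block on Google's actual IP ranges would not show up this way):

    # Fetch the sitemap twice: once with a browser-like user agent and once
    # pretending to be Googlebot. If the Googlebot request fails or returns a
    # different status, the server is likely filtering on user agent.
    import urllib.request

    SITEMAP = "https://www.vespaonline.com/sitemap.php"
    AGENTS = {
        "browser": "Mozilla/5.0 (X11; Linux x86_64)",
        "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    }

    for name, ua in AGENTS.items():
        req = urllib.request.Request(SITEMAP, headers={"User-Agent": ua})
        try:
            with urllib.request.urlopen(req, timeout=30) as resp:
                print(f"{name}: HTTP {resp.status}, {len(resp.read())} bytes")
        except urllib.error.HTTPError as err:  # server answered with an error code
            print(f"{name}: HTTP {err.code}")
        except urllib.error.URLError as err:  # connection-level failure
            print(f"{name}: {err.reason}")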
September 2, 2022 As terabyte noted, that error may be happening because your server IP or URL is blocked by Google, or because your server is blocking Google. You'd need to check with your host to see if either condition above is the cause.
September 2, 2022 There is another indication of that: when doing a "site:" search, Google says "no information available" and links to this: https://support.google.com/webmasters/answer/7489871?hl=en
September 3, 2022 Author This is the actual robots.txt:

    User-agent: *
    Crawl-Delay: 30
    User-agent: AhrefsBot
    Disallow: /
    User-agent: MJ12bot
    Disallow: /
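For what it's worth, Python's standard library can show how one parser reads that file (this uses urllib.robotparser; Google's own parser may group the records differently, which is exactly what the next reply argues):

    import urllib.robotparser

    # The robots.txt exactly as posted above, with no blank lines between records.
    ROBOTS_TXT = "\n".join([
        "User-agent: *",
        "Crawl-Delay: 30",
        "User-agent: AhrefsBot",
        "Disallow: /",
        "User-agent: MJ12bot",
        "Disallow: /",
    ])

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(ROBOTS_TXT.splitlines())

    # Ask whether each crawler may fetch the forum's front page.
    for bot in ("Googlebot", "AhrefsBot", "MJ12bot"):
        print(bot, rp.can_fetch(bot, "https://www.vespaonline.com/"))

With this parser, Googlebot comes back as allowed and the two SEO bots as blocked; parsers differ in how they group consecutive records, which is the point made below.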
September 4, 2022 Author Hi, the sitemap was successfully loaded, but Google found only 5,544 pages. That's impossible! There are hundreds of thousands. What can be wrong?
September 4, 2022 Solution Regarding the robots.txt you posted: this is not the original IPS file, it has been modified, and it does not allow any bot to index your website. In your case the Disallow applies to the User-agent lines above it, like this:

    User-agent: *
    User-agent: AhrefsBot
    Disallow: /

I recommend using the robots.txt generated by IPS. It excludes tons of duplicate URLs and URLs with no or poor content. If you block everything, your site will not be indexed. If you allow everything, that is also bad for SEO, as Google has to crawl many URLs that bring you no benefit. Read more here.
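If the intent was only to throttle crawling and block those two SEO crawlers, a conventional layout keeps each record clearly separated, for example (a sketch of that intent, not the IPS-generated file):

    User-agent: AhrefsBot
    Disallow: /

    User-agent: MJ12bot
    Disallow: /

    User-agent: *
    Crawl-Delay: 30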
September 4, 2022 Keep in mind that a sitemap just tells Google to check out those pages. Indexing can still fail when the pages are blocked from indexing in any way.
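If you want to check what the sitemap actually exposes, you can count the URLs yourself. Invision's sitemap.php normally serves a sitemap index pointing at sub-sitemaps, so the URLs have to be summed across all of them; a rough Python sketch:

    import urllib.request
    import xml.etree.ElementTree as ET

    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    INDEX = "https://www.vespaonline.com/sitemap.php"

    def fetch_xml(url):
        # Download a URL and parse the body as XML.
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return ET.fromstring(resp.read())

    root = fetch_xml(INDEX)
    total = 0
    # A sitemap index lists <sitemap><loc> children; each sub-sitemap lists <url> entries.
    for loc in root.findall("sm:sitemap/sm:loc", NS):
        count = len(fetch_xml(loc.text.strip()).findall("sm:url", NS))
        total += count
        print(f"{count:6d}  {loc.text.strip()}")
    if total == 0:  # not an index after all: count <url> entries directly
        total = len(root.findall("sm:url", NS))
    print("total URLs:", total)

If that total is close to 5,544, the sitemap itself only exposes that many URLs and the sitemap settings in the AdminCP are worth a look; if it is much larger, the gap is on Google's side.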