
Large community? You have a problems with sitemap!



5 hours ago, PPlanet said:

No, they are not. Cheers.

15 hours ago, PPlanet said:

In a way it makes sense, but I suppose that Google warns me because it sees them in the sitemap yet can't access them.

That's strange. If guests cannot see profiles, then the profiles are not included in the sitemap, at least not for me.

Link to comment
Share on other sites

On 16.2.2018 at 8:45 PM, bfarber said:

I wouldn't nofollow internal embeds (those are essentially links to other pages on your site which you absolutely do want to be followed and page rank passed to), however if you visit that link and view the page source, a canonical tag is already set to the real topic URL, so Google should be able to follow that link and index it, but point back to the canonical URL.

But what about these links?

topic/*?do=findComment*
topic/*tab=comments*
*?page=1$
*?page=0$
*?view=getnextunread*
*?do=getNewComment
*?do=getLastComment
*?do=reportComment*

I have now set Disallow directives for these in robots.txt. Does that make sense?
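To see which URLs such rules would catch, here is a minimal sketch in Python. It assumes the wildcard semantics Google documents for robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL, everything else is a prefix match). The helper names `rule_to_regex` and `is_blocked` are made up for this example, and only the rules from the list above that start with `*` are used; it approximates a crawler's matching and does not replace a real robots.txt tester.

```python
import re

# A subset of the rules listed above (only the ones that start with '*').
RULES = [
    "*?page=1$",
    "*?page=0$",
    "*?view=getnextunread*",
    "*?do=getNewComment",
    "*?do=getLastComment",
    "*?do=reportComment*",
]

def rule_to_regex(rule):
    """Translate one rule into a compiled regex (hypothetical helper).

    '*' becomes '.*'; a trailing '$' anchors the end of the URL;
    without it the rule only has to match a prefix of the URL.
    """
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

def is_blocked(path, rules):
    """Return True if any rule matches the path plus query string."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

if __name__ == "__main__":
    for path in (
        "/forums/topic/442742-example/",
        "/forums/topic/442742-example/?page=1",
        "/forums/topic/442742-example/?page=2",
        "/forums/topic/442742-example/?do=getNewComment",
    ):
        print(path, "blocked" if is_blocked(path, RULES) else "allowed")
```

Because of the `$` anchor, `*?page=1$` catches `?page=1` but leaves `?page=12` alone; without the anchor both would be blocked.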

And shouldn't IPS redirect ?page=1 and ?page=0 to the topic URL ending with / ... ? Google tells me that there are 3 different URLs with the same content (duplicate content).

Link to comment
Share on other sites

On 15.3.2018 at 8:52 AM, Upgradeovec said:

But one more interesting thing: I added robots.txt on 2018-03-03 and the 'index count' graph went up about 3x, just because I blocked some links that Google tried to fetch but that returned an error or a content-less page. Those errors may have been hurting the indexing of the correct links. Or maybe not, who knows.

Would you post your robots.txt?

Link to comment
Share on other sites

29 minutes ago, mark007 said:

That's strange. If guests cannot see profiles, then the profiles are not included in the sitemap, at least not for me.

It was explained already. A third-party app is used to block the profile pages, independent from the IPS core settings. 

Link to comment
Share on other sites

On 9-1-2018 at 9:18 AM, Upgradeovec said:

Did it.

Before:

[Screenshot: before the change]

After:

[Screenshot: after the change]

No issues detected by several sitemap online checking tools:

[Screenshot: sitemap checker results]

I did it in a very ugly way, just to try it and check. Feel free to improve it yourself (and share it with us, please):

/applications/core/extensions/core/Sitemap/Content.php

At line 209, after the $data line, add this:


if (get_class($node) === 'IPS\forums\Forum' && isset($node->last_post)) {
    $data['lastmod'] = $node->last_post;
}

And at line 259 (line 262 once the previous block has been added), after the $data line, add this:


if (get_class($item) === 'IPS\forums\Topic' && isset($item->last_post)) {
    $data['lastmod'] = $item->last_post;
}

After that, the sitemap script has to regenerate all sub-sitemaps so that the new data is written to the database.

I haven't yet made the index sitemap use a correct lastmod, which should depend on the newest date inside each sub-sitemap.
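The missing piece, taking the newest date inside a sub-sitemap as that sub-sitemap's lastmod in the index, can be sketched as follows. This is standalone Python, not a patch for Content.php. It assumes each entry carries `lastmod` as a Unix timestamp (as `last_post` appears to be used in the snippets above); the function names `w3c`, `build_urlset` and `index_lastmod` are hypothetical.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def w3c(timestamp):
    """Format a Unix timestamp as a W3C datetime in UTC, as <lastmod> expects."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S+00:00")

def build_urlset(entries):
    """Build a <urlset> element; entries without 'lastmod' get no <lastmod>."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["url"]
        if entry.get("lastmod") is not None:
            ET.SubElement(url, "lastmod").text = w3c(entry["lastmod"])
    return urlset

def index_lastmod(entries):
    """Newest lastmod among the entries of one sub-sitemap, or None."""
    stamps = [e["lastmod"] for e in entries if e.get("lastmod") is not None]
    return w3c(max(stamps)) if stamps else None

if __name__ == "__main__":
    # Made-up entries, for illustration only.
    entries = [
        {"url": "https://example.com/forums/topic/1-first/", "lastmod": 1515481080},
        {"url": "https://example.com/forums/topic/2-second/", "lastmod": 1521100320},
        {"url": "https://example.com/forums/topic/3-third/"},
    ]
    print(ET.tostring(build_urlset(entries), encoding="unicode"))
    print("lastmod for the index entry:", index_lastmod(entries))
```

The index then advertises a new date only for sub-sitemaps that actually contain newer content.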

Thanks.

How did you change this?

My Content.php from line 209:

$data = array( 'url' => $node->url() );
                    
                    $priority = intval( isset( $settings["sitemap_{$nodeClass::$nodeTitle}_priority"] ) ? $settings["sitemap_{$nodeClass::$nodeTitle}_priority"] : self::RECOMMENDED_NODE_PRIORIY );
                    if ( $priority !== -1 )
                    {
                        $data['priority'] = $priority;
                        $entries[] = $data;
                    }
                }
            }
        }

So I need to add:

if (get_class($node) === 'IPS\forums\Forum' && isset($node->last_post)) {
    $data['lastmod'] = $node->last_post;
}

Like this?

209: $data = array( 'url' => $node->url() );

209: $data = if (get_class($node) === 'IPS\forums\Forum' && isset($node->last_post)) {
    $data['lastmod'] = $node->last_post;
}

 

Link to comment
Share on other sites

2 hours ago, mark007 said:

No, you have to add the code after the line:

After:


$data = array( 'url' => $node->url() );

if (get_class($node) === 'IPS\forums\Forum' && isset($node->last_post)) {
    $data['lastmod'] = $node->last_post;
}

@mark007 Has this helped get more of your sitemap indexed? It would have been nice if they had just worked this into 4.2.8.

Link to comment
Share on other sites

  • 2 weeks later...

I implemented all the mods and have been seeing positive results since. At the low point I had 111,000 pages indexed, which makes no sense given that I have nearly 1M posts, well over 100K topics, and an article site with 5K articles. As you can see, the count is rising fast now and is up to 187,000. Since there are built-in canonical links, I would not block anything in the robots.txt file:

[Two screenshots]

Link to comment
Share on other sites

  • 3 weeks later...
On 3/30/2018 at 1:48 AM, sadams101 said:

I implemented all the mods and have been seeing positive results since. At the low point I had 111,000 pages indexed, which makes no sense given that I have nearly 1M posts, well over 100K topics, and an article site with 5K articles. As you can see, the count is rising fast now and is up to 187,000. Since there are built-in canonical links, I would not block anything in the robots.txt file:

[Two screenshots]

Hi, could you tell us how the indexing is going at the moment? If it's going well, I'll implement it too. Thx.

Link to comment
Share on other sites

On 3/17/2018 at 3:21 PM, mark007 said:

There isn't a problem with those pages, as the canonical is set to the correct one on all of them:

<link rel="canonical" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/" />

 

Link to comment
Share on other sites

Will the sitemap generation be changed in 4.3? If it is, I'll wait for the update. :)

-

Google was / is removing links from the index on my site too. Reactions on (older) forum topics are taking forever to be indexed.

[Screenshot from www.google.com, 2018-04-15]

[Screenshot from search.google.com, 2018-04-15]

Edited by Duken
added index
Link to comment
Share on other sites

POST EDITED: I removed my rant...I looked at the incorrect link and posted that the pagination was not included in the canonical links, but I was incorrect, it is included...

PS - There is an issue with the mycustomsiteupdater.php file here that causes too many resources to be used, so I would not use it. The mod that seems to have done the trick is the <lastmod> date that was added early in this thread.

Edited by sadams101
Incorrect info.
Link to comment
Share on other sites

2 hours ago, sadams101 said:

I disagree with the canonical link solving this issue. In fact, the canonical link is simply wrong here. IT SHOULD SAY ?page=1, 2, etc., because the unique information on all of those unique pages that are being indexed is being sent to the wrong place. <snip>

 

Errm, not sure what you're looking at there, as it's all set correctly. As per your original example, if there is a link to page 0 or page 1, it has a canonical of:

<link rel="canonical" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/" />

Which makes sense, since those are all essentially the first page of the thread, so they don't require (and shouldn't have) the pagination in the URL for SEO purposes. So all good so far.

But for page 2, the canonical is:

<link rel="canonical" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?page=2" />

Page 3 has a canonical to ?page=3, and so on. 
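The rule described here (page 0 and page 1 collapse to the bare topic URL, page 2 and up keep their `?page=N`) can be sketched in Python. This is only an illustration of the described behaviour, not Invision's actual code. The function name `canonical_for` is hypothetical, and dropping every other query parameter is a simplifying assumption.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

def canonical_for(url):
    """Return the canonical URL expected for a topic URL (hypothetical helper).

    Pages 0 and 1 (or no page at all) map to the bare topic URL;
    page 2 and above keep '?page=N'. Other query parameters are dropped,
    which is an assumption made for this sketch.
    """
    parts = urlsplit(url)
    raw_page = parse_qs(parts.query).get("page", ["1"])[0]
    try:
        page = int(raw_page)
    except ValueError:
        page = 1  # treat a malformed value like the first page
    query = "page={}".format(page) if page >= 2 else ""
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

if __name__ == "__main__":
    base = "https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/"
    for suffix in ("", "?page=0", "?page=1", "?page=2", "?page=3"):
        print(base + suffix, "->", canonical_for(base + suffix))
```

With this rule the three first-page variants all resolve to one URL, which is why they should not count as duplicate content.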

On top of that, Invision also has the tags that tell Google it's a paginated thread, which again is good SEO: Google then knows to link the pages together as one set, and it may also show the page links in search results. For instance, on page 2 the tags are:

<link rel="first" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/" />
<link rel="prev" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/" />
<link rel="next" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?page=3" />
<link rel="last" href="https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/?page=8" />
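As a rough sketch, those four tags can be derived from the page number and the page count. The Python below reproduces the page 2 of 8 example above. The names `page_url` and `pagination_links` are hypothetical, and what is emitted on the first and last page is an assumption, since the thread only shows page 2.

```python
def page_url(base, number):
    """The first page uses the bare topic URL; later pages add '?page=N'."""
    return base if number <= 1 else "{}?page={}".format(base, number)

def pagination_links(base, page, last):
    """Return the rel -> href pairs for one page of a paginated topic.

    Assumption: 'first'/'prev' are skipped on the first page and
    'next'/'last' on the last page.
    """
    links = {}
    if page > 1:
        links["first"] = page_url(base, 1)
        links["prev"] = page_url(base, page - 1)
    if page < last:
        links["next"] = page_url(base, page + 1)
        links["last"] = page_url(base, last)
    return links

if __name__ == "__main__":
    base = "https://invisioncommunity.com/forums/topic/442742-large-community-you-have-a-problems-with-sitemap/"
    for rel, href in pagination_links(base, page=2, last=8).items():
        print('<link rel="{}" href="{}" />'.format(rel, href))
```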

So no need to rant, and no need to get someone to fix it for you, as it's already been done.

Edited by Dll
Link to comment
Share on other sites

I edited my first reply...I do see the pagination on the later category pages, so my bad! I must have been looking at the wrong link in view page...sorry!

Is anyone else using the sitemap cron job from the start of this thread? It causes some memory issues, even though I have 130 GB.

Edited by sadams101
Link to comment
Share on other sites

  • 2 weeks later...

Does anyone have updates to report after moving to 4.3? I noticed that the number of links in my sitemap is now significantly smaller. I am not sure why that is; I did run the sitemap rebuild function. It's too early to notice any ranking changes.

Link to comment
Share on other sites

I know this was pointed out already, but links like this are not in the sitemap:

https://www.celiac.com/forums/topic/102448-what-brand-of-pasta-do-you-all-buy/?page=2

and Google is throwing warnings about this in its "Indexed, not submitted in sitemap" area. Is there a reason these are not in the sitemap? Is all the content in the sitemap anyway, so that including the pagination doesn't matter?

PS - I am not yet on 4.3...is this why I can't find the settings mentioned that allow the profiles to be searchable/indexable by Google?

Link to comment
Share on other sites

 

12 hours ago, sadams101 said:

I know this was pointed out already, but links like this are not in the sitemap:

https://www.celiac.com/forums/topic/102448-what-brand-of-pasta-do-you-all-buy/?page=2

and Google is throwing warnings about this in its "Indexed, not submitted in sitemap" area. Is there a reason these are not in the sitemap? Is all the content in the sitemap anyway, so that including the pagination doesn't matter?

PS - I am not yet on 4.3...is this why I can't find the settings mentioned that allow the profiles to be searchable/indexable by Google?

We only include the first page in the sitemap. Google is able to index the rest of the pages (as you see there).

Link to comment
Share on other sites

On 1/16/2018 at 5:56 AM, Matt said:

We've added the timestamp into the sitemap and we're looking to add a tool to quickly rebuild the sitemap on demand.

@Matt is there a tool in 4.3 to rebuild sitemaps on demand? I didn't see one. I updated one of my sites to 4.3 and do not see timestamps for forum topics; I assume the sitemap just needs to be rebuilt after the upgrade.

Link to comment
Share on other sites
