Jump to content

Replace content within quote citation header in posts


TSP

Recommended Posts

Posted

I need to have a background task to replace some content in some posts. 

Example post: 

<blockquote class="ipsQuote" data-gramm="false" data-ipsquote="" data-ipsquote-contentapp="forums" data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentcommentid="24552935" data-ipsquote-contentid="1819682" data-ipsquote-contenttype="forums" data-ipsquote-timestamp="1534422313" data-ipsquote-userid="278655" data-ipsquote-username="Slettet-lYlo5C">
	<div class="ipsQuote_citation">
		TSPbot2 skrev (På 16.8.2018 den 14.25):
	</div>
	<div class="ipsQuote_contents ipsClearfix" data-gramm="false">
		<p>
			This is my post <a href="https://www.google.com" rel="external nofollow" target="_blank">https://www.google.com</a>
		</p>
	</div>
</blockquote>
<p>
	Hello, TSPbot2
</p>

 

What I need to replace is "TSPbot2" with "SomethingElse" within ipsQuote_citation, but I don't wish to replace TSPbot within anything else. So for example inside the paragraph further below, it shouldn't be replaced. 

Do you have any suggestions on how I should do this? Utilize some multiline regex or should I do some DOM-parsing in some way?

Posted

Pseudo-code, tested

<?php

// Just loading up our libraries because they're helpful
require 'init.php';
\IPS\Dispatcher\External::i();

// Define the HTML we are working with - in practice this will probably come from the database
$html = <<<EOF
<blockquote class="ipsQuote" data-gramm="false" data-ipsquote="" data-ipsquote-contentapp="forums" data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentcommentid="24552935" data-ipsquote-contentid="1819682" data-ipsquote-contenttype="forums" data-ipsquote-timestamp="1534422313" data-ipsquote-userid="278655" data-ipsquote-username="Slettet-lYlo5C">
	<div class="ipsQuote_citation">
		TSPbot2 skrev (På 16.8.2018 den 14.25):
	</div>
	<div class="ipsQuote_contents ipsClearfix" data-gramm="false">
		<p>
			This is my post <a href="https://www.google.com" rel="external nofollow" target="_blank">https://www.google.com</a>
		</p>
	</div>
</blockquote>
<p>
	Hello, TSPbot2
</p>
EOF;

// Load the HTML into domdocument
$source = new \IPS\Xml\DOMDocument( '1.0', 'UTF-8' );
$source->loadHTML( \IPS\Xml\DOMDocument::wrapHtml( $html ) );

// We are going to look for class ipsQuote_citation ONLY
$finder = new \DomXPath($source);
$nodes = $finder->query("//*[contains(@class, 'ipsQuote_citation')]");

// Loop over the nodes we found
foreach( $nodes as $node )
{
	// Create a new document fragment to host our new HTML
	$fragment = $node->ownerDocument->createDocumentFragment();

	// We do an str_replace of the name and append this into the new fragment
	$fragment->appendXML( str_replace( 'TSPbot2', 'SomethingElse', $source->saveHTML($node) ) );

	// Clone the original node and then append our new fragment into it
	$clone = $node->cloneNode();
	$clone->appendChild($fragment);

	// And then finally, replace the original node with our new one
	$node->parentNode->replaceChild($clone, $node);
}

// Print out the result
var_dump($source->saveHTML());
exit;

image.png

Posted
14 hours ago, bfarber said:

Pseudo-code, tested

This will break attributes with <> symbols

For example attachments 

Before

<a href="/monthly...

After 

<a href="&lt;fileStore.core_Attachment&gt;/monthly...

 

Posted

Thanks @bfarber. Before you posted your solution, I ended up doing this: 

<?php
// other code ...
$search = 'TSPbot2'; // just example, state of variable
$replace = 'SomethingElse'; // just example, state of variable
	foreach ( $iterator as $post )
	{
		echo "Processing {$post->pid}. See: <{$post->url()}>\n";
		$content = str_replace( $searches, $replaces, $post->post );

		if ( $countQuoteHeaders = preg_match_all( '#<div class="ipsQuote_citation">\s*' . preg_quote($search, '#') . '.+?<\/div>#is', $content, $matches ) )
		{
			$_searches = [];
			$_replaces = [];
			foreach ( $matches[0] as $match )
			{
				$_searches[] = $match;
				$_replaces[] = str_replace( $search, $replace, $match );
			}
			$content = str_replace( $_searches, $_replaces, $content );
			echo "Replaced {$countQuoteHeaders} quote headers\n";
		}

 

If you're wondering what I'm doing, I've created a script for replacing the quote headers in posts with a new value. The reason is for better anonymizing members before deleting their account. 

You can see the full script here, if anyone is curious. I take no responsibility for any problems that may occur from running it. Only works from CLI.

I might consider making it into a proper background task later, but for now I just wanted to do it like this.

Posted
8 hours ago, newbie LAC said:

This will break attributes with <> symbols

For example attachments 

Before


<a href="/monthly...

After 


<a href="&lt;fileStore.core_Attachment&gt;/monthly...

 

Touche ... it was pseudo code that I didn't thoroughly test beyond a PoC.

Archived

This topic is now archived and is closed to further replies.

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...