TSP Posted December 10, 2019 Posted December 10, 2019 I need to have a background task to replace some content in some posts. Example post: <blockquote class="ipsQuote" data-gramm="false" data-ipsquote="" data-ipsquote-contentapp="forums" data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentcommentid="24552935" data-ipsquote-contentid="1819682" data-ipsquote-contenttype="forums" data-ipsquote-timestamp="1534422313" data-ipsquote-userid="278655" data-ipsquote-username="Slettet-lYlo5C"> <div class="ipsQuote_citation"> TSPbot2 skrev (På 16.8.2018 den 14.25): </div> <div class="ipsQuote_contents ipsClearfix" data-gramm="false"> <p> This is my post <a href="https://www.google.com" rel="external nofollow" target="_blank">https://www.google.com</a> </p> </div> </blockquote> <p> Hello, TSPbot2 </p> What I need to replace is "TSPbot2" with "SomethingElse" within ipsQuote_citation, but I don't wish to replace TSPbot within anything else. So for example inside the paragraph further below, it shouldn't be replaced. Do you have any suggestions on how I should do this? Utilize some multiline regex or should I do some DOM-parsing in some way?
bfarber Posted December 10, 2019 Posted December 10, 2019 Pseudo-code, tested <?php // Just loading up our libraries because they're helpful require 'init.php'; \IPS\Dispatcher\External::i(); // Define the HTML we are working with - in practice this will probably come from the database $html = <<<EOF <blockquote class="ipsQuote" data-gramm="false" data-ipsquote="" data-ipsquote-contentapp="forums" data-ipsquote-contentclass="forums_Topic" data-ipsquote-contentcommentid="24552935" data-ipsquote-contentid="1819682" data-ipsquote-contenttype="forums" data-ipsquote-timestamp="1534422313" data-ipsquote-userid="278655" data-ipsquote-username="Slettet-lYlo5C"> <div class="ipsQuote_citation"> TSPbot2 skrev (På 16.8.2018 den 14.25): </div> <div class="ipsQuote_contents ipsClearfix" data-gramm="false"> <p> This is my post <a href="https://www.google.com" rel="external nofollow" target="_blank">https://www.google.com</a> </p> </div> </blockquote> <p> Hello, TSPbot2 </p> EOF; // Load the HTML into domdocument $source = new \IPS\Xml\DOMDocument( '1.0', 'UTF-8' ); $source->loadHTML( \IPS\Xml\DOMDocument::wrapHtml( $html ) ); // We are going to look for class ipsQuote_citation ONLY $finder = new \DomXPath($source); $nodes = $finder->query("//*[contains(@class, 'ipsQuote_citation')]"); // Loop over the nodes we found foreach( $nodes as $node ) { // Create a new document fragment to host our new HTML $fragment = $node->ownerDocument->createDocumentFragment(); // We do an str_replace of the name and append this into the new fragment $fragment->appendXML( str_replace( 'TSPbot2', 'SomethingElse', $source->saveHTML($node) ) ); // Clone the original node and then append our new fragment into it $clone = $node->cloneNode(); $clone->appendChild($fragment); // And then finally, replace the original node with our new one $node->parentNode->replaceChild($clone, $node); } // Print out the result var_dump($source->saveHTML()); exit;
newbie LAC Posted December 11, 2019 Posted December 11, 2019 14 hours ago, bfarber said: Pseudo-code, tested This will break attributes with <> symbols For example attachments Before <a href="/monthly... After <a href="<fileStore.core_Attachment>/monthly...
TSP Posted December 11, 2019 Author Posted December 11, 2019 Thanks @bfarber. Before you posted your solution, I ended up doing this: <?php // other code ... $search = 'TSPbot2'; // just example, state of variable $replace = 'SomethingElse'; // just example, state of variable foreach ( $iterator as $post ) { echo "Processing {$post->pid}. See: <{$post->url()}>\n"; $content = str_replace( $searches, $replaces, $post->post ); if ( $countQuoteHeaders = preg_match_all( '#<div class="ipsQuote_citation">\s*' . preg_quote($search, '#') . '.+?<\/div>#is', $content, $matches ) ) { $_searches = []; $_replaces = []; foreach ( $matches[0] as $match ) { $_searches[] = $match; $_replaces[] = str_replace( $search, $replace, $match ); } $content = str_replace( $_searches, $_replaces, $content ); echo "Replaced {$countQuoteHeaders} quote headers\n"; } If you're wondering what I'm doing, I've created a script for replacing the quote headers in posts with a new value. The reason is for better anonymizing members before deleting their account. You can see the full script here, if anyone is curious. I take no responsibility for any problems that may occur from running it. Only works from CLI. I might consider making it into a proper background task later, but for now I just wanted to do it like this.
bfarber Posted December 11, 2019 Posted December 11, 2019 8 hours ago, newbie LAC said: This will break attributes with <> symbols For example attachments Before <a href="/monthly... After <a href="<fileStore.core_Attachment>/monthly... Touche ... it was pseudo code that I didn't thoroughly test beyond a PoC.
Recommended Posts
Archived
This topic is now archived and is closed to further replies.