
Migrating phpBB Geshi to IPB


darkdreamingdan


Posted

Hello, we're in the process of testing a migration from phpBB 3.x.  Everything's looking smooth so far, and we've been quite impressed with how robust the conversion process is.


The only thing we're baffled by is how we can easily migrate our Geshi code over.  I'm looking for some general advice on the best way to proceed.  

 

You'll see that there are some strange artifacts during the conversion.  I actually modified the DB directly to remove these artifacts, using a python script: https://gist.github.com/darkdreamingdan/d9ec9e6ad38dcac661925842e3bea96f

This actually made the code look much better, and the posts cleaner.  However, it didn't work out as planned: although the posts appear cleaner, when you quote or reply to them, CKEditor inserts lots of large line breaks and the formatting becomes really ugly (you can see these gaps in the reply box in the attached HTML page).  This makes the solution impractical, because you can't properly quote old posts.

Since this hasn't worked, I'm now trying to investigate other solutions.  Currently, my thoughts are to alter my python script to parse the DB to remove ALL heritage formatting from the conversion, so you get plaintext code.  Then, I can wrap this code with something compatible with IPB that generates the 'real' prettified code boxes.  This would effectively migrate the code properly, so you don't end up with dual syntax highlighting (migrated, and new code).  However, I have no idea how I can generate IPB prettified HTML from my script.  Ideally, I would do something simple like <pre class="prettified">CODE</pre> and then rebuild posts, so that IPB applies all the correct markup.

Just to be clear, we're looking for a solution that either

  1. Allows phpBB code to be displayed correctly post-migration. We don't need 'forwards compatibility' for legacy code, so long as it displays properly.  It should also interact properly: using Quote, Reply, etc. should not mess up the formatting (this is the current problem).
  2. Migrates our code correctly.  This involves actually converting the old phpBB Geshi code to native IPB code.

I would appreciate any advice on this matter.  Due to the complexity of the issue, I thought it'd be better to get P2P support, but I will probably post in the Client Area if it doesn't work out.

 

 

Post_conversion_page.zip

Posted

Not really sure why you get the broken chars. I would guess it's a charset problem (is phpBB using UTF-8 or something else?), but since it seemingly happens on some invisible character, I'm a bit unsure about it. 

You're converting this directly to IPS 4, right? 

The easiest solution I think (assuming it works acceptably for you) could be to replace [ lua ] with [ code ] and [ /lua ] with [ /code ] (without the spaces here, obviously) in the database before the conversion process is started by IPS. The big disadvantage of the IPS code box, however, is that it doesn't provide line numbers.
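
To make the tag swap concrete, here is a rough sketch of the text replacement itself, written as a plain JavaScript function that operates on one post body at a time. The function name `renameBbcodeTag` is made up for illustration, and the optional `:uid` suffix on stored tags is an assumption about how phpBB keeps BBCode in the database; check a few raw rows before relying on it. How you read and write the rows is left out on purpose.

```javascript
// Hypothetical helper (not part of phpBB or IPS): rename a BBCode tag in raw post text.
// Assumption: stored tags look like [lua] / [/lua], possibly with a ":uid" suffix
// such as [lua:1a2b3c]. Any suffix found is carried over unchanged.
function renameBbcodeTag(text, from, to) {
  // Escape regex metacharacters in the tag name so it is matched literally.
  const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Match opening and closing tags, case-insensitively, with an optional uid.
  const pattern = new RegExp('\\[(/?)' + escaped + '(:[0-9a-z]+)?\\]', 'gi');
  return text.replace(pattern, function (match, slash, uid) {
    return '[' + slash + to + (uid || '') + ']';
  });
}

// Example: turn Lua boxes into generic code boxes before running the converter.
console.log(renameBbcodeTag('[lua]print("hi")[/lua]', 'lua', 'code'));
```

Running this over a copy of the posts table first, and diffing a handful of posts by hand, is probably the safest way to confirm the pattern matches what is really stored.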

Not entirely sure, but you could try to use something like [ code="lua" ] too, and then you may be able to add a syntax highlighter for lua afterwards, but really not sure how that will be converted. It should be possible to add syntax highlighting for other languages, but you'll have to find out where that's added in the code.  

You seem like you know what you're doing, so I would strongly advise you, if you haven't already done so, to create a simple PHP script that lets you test the conversion of a single post in "real time", multiple times. 

  • 3 weeks later...
Posted

Thanks TSP.  You were right, the PHPBB database was 'mojibaked' in UTF8, and we had to fix the issues by hand using Python library FTFY which did a great job.  This got rid of the artifacts, and we ended up with converted code correctly formatted.  However, they weren't migrated to IPB code boxes, which presented a new complexity that I was trying to solve.
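
For readers unfamiliar with the term, the simplest kind of mojibake is UTF-8 bytes that were read back as Latin-1, and it can be reversed by re-encoding the text as Latin-1 bytes and decoding those as UTF-8. The sketch below shows only that one case in Node.js (it relies on Node's `Buffer`); it is not what FTFY does internally, which covers many more encodings and mixed damage. The function name `fixLatin1Mojibake` is made up for illustration.

```javascript
// Hypothetical helper: undo "UTF-8 read as Latin-1" mojibake, and nothing else.
// Assumes Node.js, since it uses the built-in Buffer class.
function fixLatin1Mojibake(text) {
  // Text containing code points above 0xFF cannot be Latin-1-decoded bytes,
  // so leave it untouched.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) return text;
  }
  // Turn each character back into the byte it came from, then decode as UTF-8.
  const repaired = Buffer.from(text, 'latin1').toString('utf8');
  // A replacement character means the bytes were not valid UTF-8,
  // i.e. the text was probably fine already; keep the original.
  return repaired.includes('\uFFFD') ? text : repaired;
}

// "cafÃ©" written with escapes so the example does not depend on file encoding.
console.log(fixLatin1Mojibake('caf\u00c3\u00a9'));
```

Text that went through Windows-1252 instead of Latin-1, or that was mangled more than once, will not be repaired by this, which is a good reason to use a dedicated library as described above.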

For anyone interested, in the end we more or less did what you described - we changed [lua] to [ xml ] (which is actually exactly the same as [ code ] in IPS4's LegacyParser) in our DB.  Then, after migration, we modified any <pre> html tags (generated by IPB) that we knew to be Lua - changing the class from  "ipsCode" to "prettyprint lang-lua ipsCode".  What this did was mark the codebox to be pretty printed with a Lua syntax.
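
The class change can be sketched as a string rewrite over a post's stored HTML. This is only an illustration in JavaScript: `markPreAsLua` is a made-up name, it assumes the converted markup contains exactly `class="ipsCode"` on the `<pre>` tag, and it rewrites every such tag in the string, so it should only be fed posts already known to contain Lua. A regex is a blunt tool for HTML, so treat this as a starting point rather than a finished migration step.

```javascript
// Hypothetical helper: mark converted code boxes as Lua for Google's prettify.
// Assumption: the converter emitted <pre ... class="ipsCode" ...> exactly.
function markPreAsLua(html) {
  return html.replace(
    /<pre([^>]*?)\sclass="ipsCode"([^>]*)>/g,
    '<pre$1 class="prettyprint lang-lua ipsCode"$2>'
  );
}

// Tags that already carry the new class list no longer match, so running the
// function a second time leaves the markup unchanged.
console.log(markPreAsLua('<pre class="ipsCode">print("hi")</pre>'));
```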

The final step was modifying our theme to include Google's lang-lua.js prettyprint library.  This means our legacy code is now prettyprinted in JavaScript on page load (unlike new codeboxes, which CodeMirror prettyprints).  End users can't tell the difference between code boxes that are cached prettyprint and on-load prettyprints.

Posted

Your post ended up a bit broken there due to you not having spaces in the [ code ] -reference. You changed from [lua] to [ code ] or to [ code="lua" ] or something else? 

Either way, happy to hear things worked out and that I could be of some assistance. ^_^

EDIT: how did you handle the line numbers? 

Posted

Fixed the post.

The line numbers are a super-hack that I'll probably release a plugin for in future.  Currently I'm waiting for this bug to be fixed because I'm really not happy with the hack I used to resolve it:

The above issue causes line numbers to get corrupted slightly when quoting a post.  

The line numbers are implemented purely in JS, and are currently added by appending the code in your theme in includeJS.  They work something like this:

  • Hack any quoted codeboxes to remove the first line (to resolve above issue).
  • Modify <pre> elements to add/remove the following classes:
    • Add prettify's "linenums" class to all <pre> elements.  This begins the line number generation of Google's prettyprint library
    • Remove prettify's "prettyprinted" class.  This flags that the code needs to be re-syntax highlighted with line numbers.
    • Add prettify's "nocode" class to <pre> elements.  This might seem confusing, but it prevents syntax highlighting being conducted twice which can cause corruption.  In other words, only line numbers are added during the second parse
  • Then, we execute prettify's prettyprint() function to execute a new highlight on load.  
  • We also hackily execute code to prettyprint just after a new post is posted, so line numbers are added immediately.

This gets line numbers working, but they break completely when you quote the code.  So the resolution to that, also implemented in JS, was:

  • Strip all line numbers from the codebox before quoting that snippet.  This involves hooking a function used to quote, and removing HTML's <li> and <ol> elements from any code boxes

 

The implementation is thoroughly hacky, and I'd like to do it more elegantly using an IPS API of some sort (though I'm not familiar with how, or whether there are JS hooks).  Also, I'd like to see if this implementation is robust enough for general release.  If you're interested in trying it, add this snippet to the end of your global includeJS in theme settings:

<!-- MTA: Add line numbers -->
<script>
function rePrettyPrint() {
  // Flag code boxes that have no line numbers yet (skipping CKEditor widgets)
  $('pre.prettyprint').each(function(index, element) {
    if ( !$(element).hasClass('cke_widget_element') && !$(element).hasClass('linenums') ) {
      $(element).addClass('linenums');
      $(element).addClass('nocode');
      $(element).removeClass('prettyprinted');
    }
  });
  // Re-prettyprint
  prettyPrint();
}

$(document).ready(function() {
  // Fix for IPS bug adding extra new line in quotes (removes the first line break only)
  $('div.ipsQuote_contents').find('pre').each(function(index, element) {
    $(element).html($(element).html().replace(/(\r\n|\n|\r)/m, ""));
  });

  // Add line numbers
  rePrettyPrint();

  // Uber hack to strip line numbers when quoting
  ips.templates._render = ips.templates._render || ips.templates.render;
  var wrappedRender = function(key, obj)
  {
    var r = ips.templates._render(key, obj);
    if ( key == 'core.editor.quote' )
    {
      var rJQ = $(r);
      var pre = rJQ.find('pre.prettyprint');
      pre.removeClass("prettyprinted");
      pre.removeClass("linenums");
      // Unwrap each <li>, keeping its contents plus a line break, then unwrap the <ol>
      pre.find('li').append('<br />').replaceWith(function() { return $(this).contents(); });
      pre.find('ol').replaceWith(function() { return $(this).contents(); });
      return rJQ.prop('outerHTML');
    }
    return r;
  };
  ips.templates.render = wrappedRender;

  // Uberhack to prettyprint immediately after submission
  ips.utils.db._remove = ips.utils.db._remove || ips.utils.db.remove;
  var wrappedRemove = function(type, key)
  {
    var r = ips.utils.db._remove(type, key);
    if ( type == 'editorSave' )
    {
      // Give the new post a second to be inserted before re-running prettify
      window.setTimeout(function()
      {
        rePrettyPrint();
      },
      1000);
    }
    return r;
  };
  ips.utils.db.remove = wrappedRemove;
});
</script>

 

You'll also need to modify the CSS to get line numbers to print on every line - these are standard pretty print configurations that are available with a standard google search.  Here's our snippet:

/* Add line numbers every line */
.prettyprint ol.linenums > li { list-style-type: decimal; }

 

Archived

This topic is now archived and is closed to further replies.
