Jump to content
Matt

IPS UTF-8 Database Converter

Please note that this entry may be a little technical, if you do have any questions, please post them in the comments below.

A little history
IP.Board was first released over ten years ago when the web landscape was very different. A lot of emerging technologies were still trying to define new standards. Very early versions of IP.Board allowed one to specify the document character set and had a default of "ISO-8559-1" which is useful for languages that use latin based characters. This meant, for example, that if you needed Chinese characters you would need to change the character set to something more suitable. This disparity between character sets creates many challenges when working with a single code base.

UTF-8
Over the past handful of years there has been a push towards a single document character set; UTF-8. UTF-8 is a variable-width encoding that is able to show every character in the Unicode character set. This makes it suitable for latin and Chinese characters (and many many more!). Popular Javascript libraries such as jQuery require that data is sent and received with UTF-8 and many native PHP string functions prefer UTF-8. The future is very much UTF-8 and trying to keep our codebase working with any other character set is going to be more and more challenging.

IPS 4.0
Even though IP.Board 3 introduced UTF-8 as the default character set for new installations, we're aware that we still have many clients that are not using UTF-8 currently. IPS 4.0 is going to be strictly UTF-8 only which means we need to convert the data before or as part of the upgrade process.

Converting to UTF-8 isn't as simple as changing the database encoding. Merely doing this will simply corrupt the data you have in your database. We need to be a little smarter and use a script to do this work for us.

The great news is that even if you choose to convert your data to UTF-8 today, your IP.Board 3.x will run just fine and you may even find it more efficient as it doesn't have to convert lots of data on the fly.

The IPS UTF-8 Database Converter
We've written a script that can safely convert your database to UTF-8. The script does not overwrite your data until you manually confirm that the conversion process has been successful. This means that there is no risk of corrupting your existing data.

Of course, it is good practice to perform a full MySQL back-up before making any changes to the database as a precaution and we recommend that you do this.

You can download the converter and its instructions here.

How can I tell if I need to convert my database?
When you first run the converter, it'll check your database and let you know if you need to convert or not. Even if you are running UTF-8, you may not be using the correct collation (utf8_unicode_ci) so you have the option of changing your collation which is a very fast procedure and does not need a full conversion to complete.

If you first used IP.Board with IP.Board 3.0 then you may only need to change your database table collation. This isn't a required step and the IPS 4 upgrade process will perform this task if you'd prefer to wait until IPS 4.0 is released.
Support
Please note that while we're happy to provide some pointers within the client forums, this release is not officially supported by our technical support department.

Beta Release
As this is a beta release, please be aware that there may be bugs. If you do spot one, please post it to the IPS Extras bug tracker.



×
×
  • Create New...