Jump to content
Matt
 Share


IPS UTF-8 Database Converter

Please note that this entry may be a little technical, if you do have any questions, please post them in the comments below.

A little history
IP.Board was first released over ten years ago when the web landscape was very different. A lot of emerging technologies were still trying to define new standards. Very early versions of IP.Board allowed one to specify the document character set and had a default of "ISO-8559-1" which is useful for languages that use latin based characters. This meant, for example, that if you needed Chinese characters you would need to change the character set to something more suitable. This disparity between character sets creates many challenges when working with a single code base.

UTF-8
Over the past handful of years there has been a push towards a single document character set; UTF-8. UTF-8 is a variable-width encoding that is able to show every character in the Unicode character set. This makes it suitable for latin and Chinese characters (and many many more!). Popular Javascript libraries such as jQuery require that data is sent and received with UTF-8 and many native PHP string functions prefer UTF-8. The future is very much UTF-8 and trying to keep our codebase working with any other character set is going to be more and more challenging.

IPS 4.0
Even though IP.Board 3 introduced UTF-8 as the default character set for new installations, we're aware that we still have many clients that are not using UTF-8 currently. IPS 4.0 is going to be strictly UTF-8 only which means we need to convert the data before or as part of the upgrade process.

Converting to UTF-8 isn't as simple as changing the database encoding. Merely doing this will simply corrupt the data you have in your database. We need to be a little smarter and use a script to do this work for us.

The great news is that even if you choose to convert your data to UTF-8 today, your IP.Board 3.x will run just fine and you may even find it more efficient as it doesn't have to convert lots of data on the fly.

The IPS UTF-8 Database Converter
We've written a script that can safely convert your database to UTF-8. The script does not overwrite your data until you manually confirm that the conversion process has been successful. This means that there is no risk of corrupting your existing data.

Of course, it is good practice to perform a full MySQL back-up before making any changes to the database as a precaution and we recommend that you do this.

You can download the converter and its instructions here.

How can I tell if I need to convert my database?
When you first run the converter, it'll check your database and let you know if you need to convert or not. Even if you are running UTF-8, you may not be using the correct collation (utf8_unicode_ci) so you have the option of changing your collation which is a very fast procedure and does not need a full conversion to complete.

If you first used IP.Board with IP.Board 3.0 then you may only need to change your database table collation. This isn't a required step and the IPS 4 upgrade process will perform this task if you'd prefer to wait until IPS 4.0 is released.
Support
Please note that while we're happy to provide some pointers within the client forums, this release is not officially supported by our technical support department.

Beta Release
As this is a beta release, please be aware that there may be bugs. If you do spot one, please post it to the IPS Extras bug tracker.


 Share

Comments

Recommended Comments



After a marathon of about 15 hours, i have managed to convert the tables (though i have not tested them yet).

I already had UTF8, but i wanted to convert to UTF8MB4. I have found the following problems:

- First, i had to check all the Indexed Varchar fields with length > 191 and set them to 191 due to the 4-byte problem which limits indexes to 767 bytes, so 191 is the maximum. Took a lot of work.

- The converter complained about "null" values on Varchar fields (and it also complained about Null values where the Zero value was, which was odd), my only solution was to manually "temporarily update" the null values and zeros to values like -1 and -2, and write which fields suffered that change, to manually revert them at the new database, at the end. I finished the process with 4 sheets of paper of fields, lots of breaks at the process to due to those stops, mostly in IPS applications tables, which clearly were not designed with UTF8MB4 in mind.

- When i reached to the end of the process, checked table collation and to my surprise they were not UTF8MB4... they were just UTF8... is there a problem with the converter? Shall i do something else??

Just to conclude: the converter is not anywhere near to be completed, it is not "plug and play", gives a lot of work and headaches.

Link to comment
Share on other sites

Please report the issues you faced in the tracker.

 

1) It has code in place to automatically adjust the size of indexes already - you should not need to do anything manually. We are already aware of the length limitations for indexes.

 

2) You should also, obviously, not have to update NULL values to random values in varchar fields. Without seeing your before/after schema/data though it's hard to say why you'd have to do this, but nevertheless you should report it to the tracker so it will be investigated.

 

 

Keep in mind, this will be a released and supported script.  We encourage you to report bugs to the appropriate tracker category so that they can be investigated and resolved.  I'm concerned off-the-cuff replies here in the blog comments are simply going to be forgotten about and lost. :)

Link to comment
Share on other sites

Thank you, but the tracker takes ages... i have tried to put things there, but got no answer. As the converter is unsupported, i am not going to open a ticket for it also.

1) In my DB, even despite having indexes length of 191, the converter complains. I am trying to make a "clean" conversion This happens at the members table. The question is, i do not know where this is failing as i have put all varchar fields to 191...

2) I understand, but that is definitely not happening in my case.

 

Link to comment
Share on other sites

I have to say I was a bit nervous about this because of all the issues during and from the board merge. 

I was actually waiting for the crash on the local test board so I was ready for it....

But instead a message I can be happy with...

 

The database is set to UTF-8 and doesn't need converting.

 

Next was the live board....

 

Again I passed...

 

Thanks People!

 

 

 

 

 
Link to comment
Share on other sites

This was fixed on 1.0.5 but i still get the error:

 

 

 

Processing: cache_store                                                                  0%


Error: Incorrect string value: 'xE3o";s:...' for column 'cs_value' at row 14


File: /system/Convert/Convert.php


Line: 299

Link to comment
Share on other sites

on 1.0.4, the conversion worked after 3 hours but I was never told the conversion was complete - I only knew as I noticed all my database tables were now in utf8, I waited 15 minutes but still no message - server load was back to normal so I just closed the window

Link to comment
Share on other sites

I get this error:

Error: include(/uploads/profile/photo-3965.jpg) [function.include]: failed to open stream: No such file or directory File: /var/www/DOMAIN/conf_global.php(184) : runtime-created function Line: 1
Link to comment
Share on other sites

After conversion I get this:
Fatal error: Uncaught exception 'Exception' with message 'output is not an object' in /Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php:1135 Stack trace: #0 /Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php(1233): ipsRegistry::getClass('output') #1 /Applications/MAMP/htdocs/admin/applications/core/modules_admin/applications/applications.php(2759): ipsRegistry->__get('output') #2 /Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php(3546): admin_core_applications_applications->applicationsRecache() #3 /Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php(2279): ips_CacheRegistry::rebuildCache('app_cache', 'global') #4 /Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php(614): ipsRegistry->checkCaches() #5 /Applications/MAMP/htdocs/admin/sources/base/ipsController.php(77): ipsRegistry::init() #6 /Applications/MAMP/htdocs/admin/sources/base/ipsController.php(64): ipsController->init() #7 /Applications/MAMP/htdocs/index.php(25): ipsController::run() #8 {main} thrown in/Applications/MAMP/htdocs/admin/sources/base/ipsRegistry.php on line 1135

Any suggestions?

Link to comment
Share on other sites

Just run the IPS UTF8 converter during the upgrade process to 4.1, the converter finished successfully and now state: "

Error

The database tables are UTF-8, collations are correct and there is nothing to convert. You can proceed with the upgrade."

Problem is that our special characters (mostly umlauts and dashes) are now showing like this:
 

Quote

 

Topic title: Vernetzung 2er Häuser
Topic text: ich habe auf meinem Grundstück 2 Häuser...

Topic title: WLAN und Radius, einwahl nicht möglich
Topic title: One Click Upload Lösung für eigene Firma

Topic text: - An beiden Standorten soll es möglich kein die Dateien zu öffnen/bearbeiten...

 

This is on our testing forums, but how can I circumvent the issue or solve it?

Thanks,
Thomas

Link to comment
Share on other sites




Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Add a comment...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

×
×
  • Create New...