Jump to content

Change UTF8 to UTF8MB4


Recommended Posts

I installed my Forums without utf8mb4, thats pain in the ass. There are several errors resulting in inconstistent Datebase-Tables, empty Threads etc.

The Emojis-Fix from 4.0.8 only fixes the Emojis ...

How can i reconvert Everything to utf8mb4 ???
As i see there is no Converter for IPS4.

Link to comment
Share on other sites

I went through this process prior to my IPS4 upgrade and it was a PITA. Had to change all the ones that were missed (due to default being UTF8 instead of UTF8MB4) manually. You'll run into problems with indexes using varchar(255) which in turn will cause problems when IPS release an update that has database changes.

Support have raised a bug for me so hopefully they will have a permanent fix for it soon but something to be aware off before you go changing the collation.

Link to comment
Share on other sites

I don't think there was any or little notice to utf8mb4 until the problem with Emojis was brought up by IPS.

I'm thinking that this may be a problem as IPB progresses, and it appears that UTF8MB4 is the standard or what everyone should be striving for.

I have little experience with MySQL, so I'm not sure. I see conversion scripts for doing this, but I'm very leery of doing this without the support of IPS.

Is UTF8MB4 something we should be looking into?

Link to comment
Share on other sites

I don't think there was any or little notice to utf8mb4 until the problem with Emojis was brought up by IPS.

I'm thinking that this may be a problem as IPB progresses, and it appears that UTF8MB4 is the standard or what everyone should be striving for.

Which is kind of odd as utf8mb4 is the default if you install a fresh copy of IPS4 - it's clearly the way forward. I've converted my DB to utf8mb4 yesterday and it was a real pain to get everything right.

Link to comment
Share on other sites

Do we have to convert to  UTF8MB4  ?

It was told earlier that we could have either .

During my tests I have never chosen to go for a conversion to utf8mb4 and have not faced any problem ( yet )

Now a bit confused ...

Link to comment
Share on other sites

Do we have to convert to  UTF8MB4  ?

No. The scenarios are:

  • MySQL <= 5.1
    • You cannot use UTF8MB4 at all
  • MySQL >= 5.5
    • UTF8. Certain characters (like certain Emoji) won’t be supported and could cause problems in earlier releases. Therefore the most recent version now warns you about this when you try to include such characters.
    • UTF8MB4: You are free to use any characters. 
Edited by Ralf H.
Link to comment
Share on other sites

Which is kind of odd as utf8mb4 is the default if you install a fresh copy of IPS4 - it's clearly the way forward. I've converted my DB to utf8mb4 yesterday and it was a real pain to get everything right.

Can you give any information about the conversion> Scripts, location of information?

Link to comment
Share on other sites

I just checked my live site which was and an upgrade from 3.x, and all the tables are utf8.

I just checked my development site as it was a fresh install, and there was a mixture of utf8, and utf8mb4.

All tables that start with a ipf_ are utf8, and all the others are utf8mb4. Shouldn't all the tables be utf8mb4 ?

2015-06-17_7-07-25.thumb.jpg.15594f52e2a

Link to comment
Share on other sites

Sorry, there is currently no converter between UTF8's. My best suggestion at your current time is to restore your 3.x backup and start over to ensure everything is properly done.

That's just not an acceptable answer for those of us that were early adopters of IPB 4. You actually want us to loose months of user data. This should have been taken care of when IPB moved from utf8_unicode_ci to utf8mb4_unicode_ci . We trust IPS to do what is right to keep their clients up to date with the latest releases. If the latest releases IPB has had a default character set change there should be a way for IPS's paid clients to benefit from that change, and advancing along with everyone else on the same playing field.

It seems to me that IPS should jump ahead of this and create a conversion script for its clients. This not only fixes future problems, but will also lessen support tickets.

Or better yet can we create a ticket for this. The majority of independent IPS clients are comfortable with running MySQL scripts.

Link to comment
Share on other sites

I was kicking myself for not choosing mb4 when I first created my small family project (I saw "slower" and balked and did not fully consider the emoji implications for my target users) . Later I realized I couldn't have  anyway since Dreamhost runs at  MySQL 5.1.n  So  I can reasonably foresee either moving to a  different host or eventually upgrading MySQL or many reasons for wanting to convert an existing community instance.

Link to comment
Share on other sites

The thing we have to think is: "is that so important to use emojis that they worth converting to utf8mb4?" Even some browsers cannot display them properly!

If your site runs fine and you do not need emojis, do not convert to utf8mb4: there are also some disadvantages to it also, like the increased database space, inherent to the use of 4 byte character encoding. The full UTF8 implementation needs 4 bytes for some characeters, it's true. But if see the table of the 4-byte characters, i doubt you use them.

In a normal UTF8 use, 99.99% of the caracters used are 1, 2 or 3 bytes.

According to my experience, Utf8mb4 is a must for the fellows using Windows Server and IIS, because it will solve many character conversion problems arising from the odd php implementation on windows.

Link to comment
Share on other sites

I don't understand the fuss. if using utf8_unicode_ci, characters using 4 bytes such as emoji will be refused by the database. if that is an issue, simply update the collation? it's not like it's a different character set, utf8mb4 is a superset of utf8 in mysql, you can go from utf8 to utf8mb4 easily, there's just no going back from utf8mb4 to utf8 without losing data once there is 4-byte data in the database.

Link to comment
Share on other sites

 

What Marcher said. It’s really a server-level thing that has nothing to do with the actual IPS software. If you choose to self-host, you should be able to maintain your database and set such things yourself or let it do through someone else. IPS only guarantees that 4.0 works both on UTF8 and UTF8mb4. It’s not their job to maintain your server. 

Link to comment
Share on other sites

Unfortunately it's not that simple. Some of the indexes are created with a length based on 3 bytes per char in mind. So when you try and change the collation on those tables you get an error about indexes being too large. To fix it you have to completely delete the indexes on that table, recreate them with a length suitable for 4 bytes per char. Then you can change the collation.

I did it on my board by deleting all the indexes and then changing the collation type in conf_global.php. Then when you run the support tool it spits out the commands for recreating the indexes - even with the correct index lengths for utf8mb4. Not for the faint hearted, it took about 5 hours to recreate.

Info about the index issue: https://mathiasbynens.be/notes/mysql-utf8mb4 

 

Edited by AutoItScript
Link to comment
Share on other sites

Did you select the table, click Operations, and then change collation that way? If so, you've not changed the collation of all the columns in the table. That's where the fun is :)

If you use the structure tab to show the columns, you should see the collation there too in tables that contain text fields.

Edited by AutoItScript
Link to comment
Share on other sites

I got something for you:

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(XXXXX) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Dont blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)
Link to comment
Share on other sites

I got something for you:

# For each database:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# For each table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# For each column:
ALTER TABLE table_name CHANGE column_name column_name VARCHAR(XXXXX) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# (Dont blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a `VARCHAR` column.)

I saw this during my investigation online. I manually preformed the conversion, but got a index length error of some kind. I possibly faild to create the database correctly?

Link to comment
Share on other sites

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...