Invision Community 4: SEO, prepare for v5 and dormant account notifications By Matt November 11, 2024
jair101 Posted January 30, 2014 Posted January 30, 2014 Want to share with you something that I found onver the internet. As I mentioned in this thread I had huge character issues. The table fields in my db were bulgarian_1251, table collations were sometimes utf8, some database connections were even latin1. I tried looking for converters - the new IPC converter is still buggy, there are others around the internet that didn't work either. Anyway, here is a method to get your database in utf8 order that is extremly clean and at least for me - 100% working. 1. dump your current database, with its current charset. mysqldump --default-character-set=cp1251 --no-tablespaces -h host -u user -pPassword -B database > mybd.sql For me the character-set was cp1251, for you it might be something else. There is no interval between p and Password. 2. Open this sql file with notepad++. Here comes the condition that the database needs to be small. Otherwise your computer might stop responding. I did the method succesfully with up to 300mb database, depends also on the specs of your workstation. I suppose any good text redactor that can convert encodings will do. If your database is bigger it might be done in parts, no problem. In notepad++ the file should be in the dumped encoding and everything should be readable. Select All and switch the encoding to UTF8. This converts all text in the file to UTF8. There should be no losses in characters with special symbols, but double check just in case. 3. Search for your current collations and replace them with UTF8. In my case I had to search for "cp1251_bulgarian_ci" and replace it with "utf8_unicode_ci". 4. Search for "cp1251" and replace it with "utf8". This should update some other non utf8 stuff. 5. Save the file and restore the database. Everything should be in order now. Thats it. Very simple, without using some black box scripts that often give you errors that you need to troubleshoot. I haven't invented this, read it on another forum, so if you think there might be a problem with it, please comment.
Ryan Ashbrook Posted February 12, 2014 Posted February 12, 2014 I do not really recommend doing this. While it may work, it will not likely reliably convert the actual data to UTF-8, and any large database will completely lock up the users computer when attempting to open the .sql file. You also run the risk of potentially corrupting the content. Further, much of the data contained within the database is stored as a serialized array. Doing this would likely corrupt those bits of data, and cause them to not function, as they rely on the length of the data inside to be accurate. When you move from one character set to another, the data lengths may not be correct anymore, though the arrays are still set as such. It is recommended that you use an actual converter.
jair101 Posted February 12, 2014 Author Posted February 12, 2014 Thank you, I was wondering if there are any problems with this method and I couldn't find anything on the net. My forum works without a problem for more then two weeks, knock on wood, no problems yet. I would prefer a working convertor too, tried at least 5 different ones (including the old and new from IPS), none of them worked.
PeterUK Posted February 18, 2014 Posted February 18, 2014 Thank you, I was wondering if there are any problems with this method and I couldn't find anything on the net. My forum works without a problem for more then two weeks, knock on wood, no problems yet. I would prefer a working convertor too, tried at least 5 different ones (including the old and new from IPS), none of them worked. If you have command line access, which it seems you do, and are running Linux, why don't you >give mine a try?
jair101 Posted February 18, 2014 Author Posted February 18, 2014 If you have command line access, which it seems you do, and are running Linux, why don't you >give mine a try? As far as I understood yours doesn't do things differently then bfarber's script, it is just speed optimization to use more then one CPU thread for larger databases, correct? Bfarber's script didn't work for me, so I assumed yours won't work either.
PeterUK Posted February 18, 2014 Posted February 18, 2014 It now uses a different library to do the actual conversion of characters, it's faster, and significantly more accurate (from my experience). It also will not double encode existing UTF-8.
Recommended Posts
Archived
This topic is now archived and is closed to further replies.