ejakabo Posted September 1, 2009 Posted September 1, 2009 Hello friends. I started preperations for upgrading my forum to 3.0.x I tried on my localhost, upgrading was sucessful and unfortunately there occured special character problems(Turkish characters are shown as question marks). I need to convert my database completely to utf-8 but there is no script i found. It is possible to make it manully via MYSQL queries but it is a nightmare to do it on my 1,5 GB database. Is IPS planing to make a simple and effective database charset convertor plugin/hook/tool like vBulletin and Wordpress have?
bfarber Posted September 1, 2009 Posted September 1, 2009 [quote name='ejakabo' date='01 September 2009 - 06:30 AM' timestamp='1251801059' post='1851252'] Is IPS planing to make a simple and effective database charset convertor plugin/hook/tool like vBulletin and Wordpress have? Both of the linked scripts are created by independent third parties, not the respective companies you mention. Any third party is free to make a database charset converter for IPB if they wish, just as others have for other products.
ejakabo Posted September 1, 2009 Author Posted September 1, 2009 I know they are third party solutions. I asked because IPS is innovative company :) and different from these above. I wonder that will IPS make a favour because converting to UTF-8 is a milestone for IPB users and unfortunately lots of non-engilish forums(whose language contain special characters) have similar problems. Have a nice work.
Ritsuka Posted September 2, 2009 Posted September 2, 2009 ejakabo, greetz from russian ipboard community, we have exactly what you need =)1) For average sized forums - database conversion script based on PHP: Thats it. More info and russian support topics: 1, 2.Always make a backup before performing any operations with database!<?php // Database info include("conf_global.php"); $dbhost = $INFO['sql_host']; $dbuser = $INFO['sql_user']; $dbpass = $INFO['sql_pass']; $dbname = $INFO['sql_database']; //--------------- header('Content-type: text/plain'); $dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() ); $db = mysql_select_db($dbname) or die( mysql_error() ); $sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci"; $result = mysql_query($sql) or die( mysql_error() ); print "Database changed to UTF-8.\n"; $sql = 'SHOW TABLES'; $result = mysql_query($sql) or die( mysql_error() ); while ( $row = mysql_fetch_row($result) ) { $table = mysql_real_escape_string($row[0]); $sql = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci"; mysql_query($sql) or die( mysql_error() ); print "$table changed to UTF-8.\n"; } mysql_close($dbconn); ?> Upload this script as "any_name_you_like.php" in forum's root folder and execute. Be careful with large databases - be sure to correctly configure max_execution_time limit. 2) For large forums - 2 step conversion: Step 1: <?php // Database info include("conf_global.php"); $dbhost = $INFO['sql_host']; $dbuser = $INFO['sql_user']; $dbpass = $INFO['sql_pass']; $dbname = $INFO['sql_database']; //--------------- header('Content-type: text/plain'); $dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() ); $db = mysql_select_db($dbname) or die( mysql_error() ); $sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n"; $exec_sql = 'SHOW TABLES'; $result = mysql_query($exec_sql) or die( mysql_error() ); while ( $row = mysql_fetch_row($result) ) { $table = mysql_real_escape_string($row[0]); $sql .= "ALTER TABLE `".$table."` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n"; } mysql_close($dbconn); echo $sql; As in first method, upload this script into forum's root folder and execute. Follow it's URL and you'll get a mysql dump file. Save it, for example, as win2utf.sql and upload to /somewhere/at/server Step 2: Connect to your sever throught SSH and run: #mysql -uusername -p dbname < /somewhere/at/server/win2utf.sql
Developer Posted September 26, 2009 Posted September 26, 2009 ejakabo, greetz from russian ipboard community, we have exactly what you need =) 1) For average sized forums - database conversion script based on PHP: <?php // Database info include("conf_global.php"); $dbhost = $INFO; $dbuser = $INFO; $dbpass = $INFO; $dbname = $INFO; //--------------- header('Content-type: text/plain'); $dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() ); $db = mysql_select_db($dbname) or die( mysql_error() ); $sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci"; $result = mysql_query($sql) or die( mysql_error() ); print "Database changed to UTF-8.\n"; $sql = 'SHOW TABLES'; $result = mysql_query($sql) or die( mysql_error() ); while ( $row = mysql_fetch_row($result) ) { $table = mysql_real_escape_string($row); $sql = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci"; mysql_query($sql) or die( mysql_error() ); print "$table changed to UTF-8.\n"; } mysql_close($dbconn); ?> Upload this script as "any_name_you_like.php" in forum's root folder and execute. Be careful with large databases - be sure to correctly configure max_execution_time limit. 2) For large forums - 2 step conversion: Step 1: <?php // Database info include("conf_global.php"); $dbhost = $INFO; $dbuser = $INFO; $dbpass = $INFO; $dbname = $INFO; //--------------- header('Content-type: text/plain'); $dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() ); $db = mysql_select_db($dbname) or die( mysql_error() ); $sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n"; $exec_sql = 'SHOW TABLES'; $result = mysql_query($exec_sql) or die( mysql_error() ); while ( $row = mysql_fetch_row($result) ) { $table = mysql_real_escape_string($row); $sql .= "ALTER TABLE `".$table."` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n"; } mysql_close($dbconn); echo $sql; As in first method, upload this script into forum's root folder and execute. Follow it's URL and you'll get a mysql dump file. Save it, for example, as win2utf.sql and upload to /somewhere/at/server Step 2: Connect to your sever throught SSH and run: #mysql -uusername -p dbname < /somewhere/at/server/win2utf.sql Thats it. More info and russian support topics: 1, 2. Always make a backup before performing any operations with database! Hi, I have run the first script and it has gone well. I have then added this code to config file $INFO = 'utf8'; and upgraded to IPB 3.x Then I have set UTF-8 encoding from withing ACP. I can confirm that my database collation is now set to utf8_unicode_ci, but posts arent readable from within phpMyAdmin and the forum. Please advice what could have gone wrong?
phpony Posted September 26, 2009 Posted September 26, 2009 Could you provide an example unreadable string from your database? The most common case of that problem is having data in DB already encoded in utf8, but named as other charset. In this case you can just make a copy of your database backup, open it in any editor more powerful than notepad, replace string "YOUR_CURRENT_CHARSET" with "UTF8" and then upload it back. After that just run this script again to fix collations where needed and that's all...
Developer Posted September 26, 2009 Posted September 26, 2009 [quote name='Aoyagi Ritsuka' date='26 September 2009 - 10:42 AM' timestamp='1253943736' post='1861165'] Could you provide an example unreadable string from your database? Here it is Ïðèâåò! ß òóò âïåðâûå The most common case of that problem is having data in DB already encoded in utf8, but named as other charset. In this case you can just make a copy of your database backup, open it in any editor more powerful than notepad, replace string "YOUR_CURRENT_CHARSET" with "UTF8" and then upload it back. After that just run this script again to fix collations where needed and that's all... SHOW VARIABLES LIKE 'character_set_database'; returns this on old database Variable_name Value character_set_database latin1 Database collation is set to latin1_swedish_ci All tables had collation set to latin1_swedish_ci SHOW VARIABLES LIKE 'character_set_database'; The new DB returns Variable_name Value character_set_database utf8 Database collation is set to utf8_unicode_ci All tables have collation set to utf8_general_ci I have also tried to add these instructions to .htaccess as mentioned as per official instructions By the way old database has had 148 tables and the converted one has 180. I guess that some data has been in UTF-8 in DB, but I am not sure about it. How may I check it? The problem with downloading,editing and uploading back is that the DB is 1.5 GB in size and I can only manipulate it on my server via SSH or similar way without downloading it. Also please note that my MySQL default charset is set to UTF-8 Unicode (utf8) and default MySQL connection collation is set to utf_unicode_ciAddDefaultCharset utf-8 AddCharset utf-8 * <IfModule mod_charset.c> CharsetSourceEnc utf-8 CharsetDefault utf-8 </IfModule>
phpony Posted September 29, 2009 Posted September 29, 2009 This is a data encoded in "CP1252". If you read it as "CP1251" it will be: Which means that it's better for as to continue this discussion on our native language =) Похоже, что у вас в БД занесены данные в кодировке CP1252, при том что в свойствах таблиц стоит latin. Самый простой способ привести такую базу в utf-8 - это: 1) снять полный дамп базы в файл на сервере, 2) сконвертировать его там же с помощью iconv (или любым другим инструментом) из CP1252 => UTF-8, 3) заменить в файле дампа информацию о кодировке таблицы с CP1252 на UTF-8, 4) импортировать дамп в новую БД. Если вы не сможете справиться со всем этим через SSH - пишите в icq 7753362, подскажу комманды.Ïðèâåò! ß òóò âïåðâûåПривет! Я тут впервые
skorean Posted October 19, 2009 Posted October 19, 2009 I am using IPB 2.3.6 and I meet a similar problem. Document character set value in ACP->System Settings->Server Environment: UTF-8 MySQL charset: UTF-8 Unicode (utf8) MySQL connection collation: utf8_unicode_ci The collation of the database (all tables): latin1_swedish_ci However, the collation for column title in table ibf_topics is utf8_general_ci Title of an existing topic is: "Wonbin thích chơi Wii của Nintendo" (Vietnamese) It is displayed exactly like that when a member browse the board. However, it is stored in database (column title of table ibf_topics) as "Wonbin thÃch chÆ¡i Wii của Nintendo" (I don't know what the charset code is) I would like to convert all existing data in database from current charset code to UTF-8 Unicode. Is there any solution? I tried to use iconv but I don't know the source charset to put into ????? in this command: but when I open the sql file in Wordpad, it does not show "Wonbin thích chơi Wii của Nintendo" as I want.iconv -f ????? -t UTF-8 inf_topics.sql > ibf_topics_utf8.sql I tried iconv -f ISO-8859-1 -t UTF-8 inf_topics.sql > ibf_topics_utf8.sql
Will Munny Posted October 19, 2009 Posted October 19, 2009 Did you read my tutorial on changing to UTF-8?... It's not a database converter but it details the exhaustive steps required to be truly running in UTF-8... and many people probably won't need to make the conversion if you take these steps. In fact, you may decide that running a converter on your database is not worth the risk, unless your board's primary language is in a complex script... After making the changes detailed in my tutorial, everything posted thereafter will work fully and properly in UTF-8, no matter what the language...Making the switch to UTF-8
MindTooth Posted November 13, 2009 Posted November 13, 2009 I tried your script. Only got "No database is selected" error. Even though I made sure that I had the correct name.Ed1t: It worked, thank you :D Birger :) Ed2t: You should build it work support for converting the symbols too. Since now I only get some nasty question marks.
Recommended Posts
Archived
This topic is now archived and is closed to further replies.