Jump to content

Database Charset Converter for IP.Board 3 ?


ejakabo

Recommended Posts

Hello friends.

I started preperations for upgrading my forum to 3.0.x I tried on my localhost, upgrading was sucessful and unfortunately there occured special character problems(Turkish characters are shown as question marks). I need to convert my database completely to utf-8 but there is no script i found. It is possible to make it manully via MYSQL queries but it is a nightmare to do it on my 1,5 GB database.

Is IPS planing to make a simple and effective database charset convertor plugin/hook/tool like vBulletin and Wordpress have?

Link to comment
Share on other sites

[quote name='ejakabo' date='01 September 2009 - 06:30 AM' timestamp='1251801059' post='1851252']
Is IPS planing to make a simple and effective database charset convertor plugin/hook/tool like vBulletin and Wordpress have?


Both of the linked scripts are created by independent third parties, not the respective companies you mention. Any third party is free to make a database charset converter for IPB if they wish, just as others have for other products.

Link to comment
Share on other sites

I know they are third party solutions. I asked because IPS is innovative company :) and different from these above.

I wonder that will IPS make a favour because converting to UTF-8 is a milestone for IPB users and unfortunately lots of non-engilish forums(whose language contain special characters) have similar problems.

Have a nice work.

Link to comment
Share on other sites

ejakabo, greetz from russian ipboard community, we have exactly what you need =)

1) For average sized forums - database conversion script based on PHP:


Thats it. More info and russian support topics: 1, 2.

Always make a backup before performing any operations with database!

<?php

// Database info


include("conf_global.php");


$dbhost = $INFO['sql_host'];

$dbuser = $INFO['sql_user'];

$dbpass = $INFO['sql_pass'];

$dbname = $INFO['sql_database'];


//---------------


header('Content-type: text/plain');


$dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() );

$db = mysql_select_db($dbname) or die( mysql_error() );


$sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";

$result = mysql_query($sql) or die( mysql_error() );

print "Database changed to UTF-8.\n";


$sql = 'SHOW TABLES';

$result = mysql_query($sql) or die( mysql_error() );


while ( $row = mysql_fetch_row($result) )

{

$table = mysql_real_escape_string($row[0]);

$sql = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";

mysql_query($sql) or die( mysql_error() );

print "$table changed to UTF-8.\n";

}


mysql_close($dbconn);

?>


Upload this script as "any_name_you_like.php" in forum's root folder and execute. Be careful with large databases - be sure to correctly configure max_execution_time limit.


2) For large forums - 2 step conversion:


Step 1:

<?php

// Database info


include("conf_global.php");


$dbhost = $INFO['sql_host'];

$dbuser = $INFO['sql_user'];

$dbpass = $INFO['sql_pass'];

$dbname = $INFO['sql_database'];


//---------------


header('Content-type: text/plain');


$dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() );

$db = mysql_select_db($dbname) or die( mysql_error() );


$sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n";


$exec_sql = 'SHOW TABLES';

$result = mysql_query($exec_sql) or die( mysql_error() );


while ( $row = mysql_fetch_row($result) )

{

$table = mysql_real_escape_string($row[0]);

$sql .= "ALTER TABLE `".$table."` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n";

}

mysql_close($dbconn);


echo $sql;

As in first method, upload this script into forum's root folder and execute. Follow it's URL and you'll get a mysql dump file. Save it,  for example, as win2utf.sql and upload to /somewhere/at/server


Step 2:

Connect to your sever throught SSH and run:

#mysql -uusername -p dbname < /somewhere/at/server/win2utf.sql
Link to comment
Share on other sites

  • 4 weeks later...

ejakabo, greetz from russian ipboard community, we have exactly what you need =)

1) For average sized forums - database conversion script based on PHP:
<?php
// Database info

include("conf_global.php");

$dbhost = $INFO;
$dbuser = $INFO;
$dbpass = $INFO;
$dbname = $INFO;

//---------------

header('Content-type: text/plain');

$dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() );
$db = mysql_select_db($dbname) or die( mysql_error() );

$sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci";
$result = mysql_query($sql) or die( mysql_error() );
print "Database changed to UTF-8.\n";

$sql = 'SHOW TABLES';
$result = mysql_query($sql) or die( mysql_error() );

while ( $row = mysql_fetch_row($result) )
{
$table = mysql_real_escape_string($row);
$sql = "ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci";
mysql_query($sql) or die( mysql_error() );
print "$table changed to UTF-8.\n";
}

mysql_close($dbconn);
?>

Upload this script as "any_name_you_like.php" in forum's root folder and execute. Be careful with large databases - be sure to correctly configure max_execution_time limit.

2) For large forums - 2 step conversion:

Step 1:
<?php
// Database info

include("conf_global.php");

$dbhost = $INFO;
$dbuser = $INFO;
$dbpass = $INFO;
$dbname = $INFO;

//---------------

header('Content-type: text/plain');

$dbconn = mysql_connect($dbhost, $dbuser, $dbpass) or die( mysql_error() );
$db = mysql_select_db($dbname) or die( mysql_error() );

$sql = "ALTER DATABASE `".$dbname."` DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;\n";

$exec_sql = 'SHOW TABLES';
$result = mysql_query($exec_sql) or die( mysql_error() );

while ( $row = mysql_fetch_row($result) )
{
$table = mysql_real_escape_string($row);
$sql .= "ALTER TABLE `".$table."` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci, CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;\n";
}
mysql_close($dbconn);

echo $sql;
As in first method, upload this script into forum's root folder and execute. Follow it's URL and you'll get a mysql dump file. Save it, for example, as win2utf.sql and upload to /somewhere/at/server

Step 2:
Connect to your sever throught SSH and run:
#mysql -uusername -p dbname < /somewhere/at/server/win2utf.sql

Thats it. More info and russian support topics: 1, 2.

Always make a backup before performing any operations with database!

Hi,

I have run the first script and it has gone well. I have then added this code to config file
$INFO = 'utf8'; and upgraded to IPB 3.x
Then I have set UTF-8 encoding from withing ACP.
I can confirm that my database collation is now set to utf8_unicode_ci, but posts arent readable from within phpMyAdmin and the forum.
Please advice what could have gone wrong?
Link to comment
Share on other sites

Could you provide an example unreadable string from your database?

The most common case of that problem is having data in DB already encoded in utf8, but named as other charset. In this case you can just make a copy of your database backup, open it in any editor more powerful than notepad, replace string "YOUR_CURRENT_CHARSET" with "UTF8" and then upload it back. After that just run this script again to fix collations where needed and that's all...

Link to comment
Share on other sites

[quote name='Aoyagi Ritsuka' date='26 September 2009 - 10:42 AM' timestamp='1253943736' post='1861165']
Could you provide an example unreadable string from your database?

Here it is

Ïðèâåò! ß òóò âïåðâûå

The most common case of that problem is having data in DB already encoded in utf8, but named as other charset. In this case you can just make a copy of your database backup, open it in any editor more powerful than notepad, replace string "YOUR_CURRENT_CHARSET" with "UTF8" and then upload it back. After that just run this script again to fix collations where needed and that's all...
SHOW VARIABLES LIKE 'character_set_database';
returns this on old database

Variable_name Value
character_set_database latin1

Database collation is set to latin1_swedish_ci
All tables had collation set to latin1_swedish_ci

SHOW VARIABLES LIKE 'character_set_database';
The new DB returns
Variable_name Value
character_set_database utf8

Database collation is set to utf8_unicode_ci
All tables have collation set to utf8_general_ci

I have also tried to add these instructions to .htaccess as mentioned as per official instructions


By the way old database has had 148 tables and the converted one has 180.

I guess that some data has been in UTF-8 in DB, but I am not sure about it. How may I check it?
The problem with downloading,editing and uploading back is that the DB is 1.5 GB in size and I can only manipulate it on my server via SSH or similar way without downloading it.

Also please note that my MySQL default charset is set to UTF-8 Unicode (utf8) and default MySQL connection collation is set to utf_unicode_ci



AddDefaultCharset utf-8

AddCharset utf-8 *

<IfModule mod_charset.c>

CharsetSourceEnc utf-8

CharsetDefault utf-8

</IfModule>

Link to comment
Share on other sites


This is a data encoded in "CP1252". If you read it as "CP1251" it will be:

Which means that it's better for as to continue this discussion on our native language =)

Похоже, что у вас в БД занесены данные в кодировке CP1252, при том что в свойствах таблиц стоит latin. Самый простой способ привести такую базу в utf-8 - это:

1) снять полный дамп базы в файл на сервере,
2) сконвертировать его там же с помощью iconv (или любым другим инструментом) из CP1252 => UTF-8,
3) заменить в файле дампа информацию о кодировке таблицы с CP1252 на UTF-8,
4) импортировать дамп в новую БД.

Если вы не сможете справиться со всем этим через SSH - пишите в icq 7753362, подскажу комманды.

Ïðèâåò! ß òóò âïåðâûå

Привет! Я тут впервые

Link to comment
Share on other sites

  • 3 weeks later...

I am using IPB 2.3.6 and I meet a similar problem.

Document character set value in ACP->System Settings->Server Environment: UTF-8
MySQL charset: UTF-8 Unicode (utf8)
MySQL connection collation: utf8_unicode_ci
The collation of the database (all tables): latin1_swedish_ci
However, the collation for column title in table ibf_topics is utf8_general_ci

Title of an existing topic is: "Wonbin thích chơi Wii của Nintendo" (Vietnamese)
It is displayed exactly like that when a member browse the board.
However, it is stored in database (column title of table ibf_topics) as "Wonbin thích chơi Wii của Nintendo" (I don't know what the charset code is)

I would like to convert all existing data in database from current charset code to UTF-8 Unicode. Is there any solution?

I tried to use iconv but I don't know the source charset to put into ????? in this command:

but when I open the sql file in Wordpad, it does not show "Wonbin thích chơi Wii của Nintendo" as I want.

iconv -f ????? -t UTF-8 inf_topics.sql > ibf_topics_utf8.sql

I tried 

iconv -f ISO-8859-1 -t UTF-8 inf_topics.sql > ibf_topics_utf8.sql
Link to comment
Share on other sites

Did you read my tutorial on changing to UTF-8?... It's not a database converter but it details the exhaustive steps required to be truly running in UTF-8... and many people probably won't need to make the conversion if you take these steps. In fact, you may decide that running a converter on your database is not worth the risk, unless your board's primary language is in a complex script...

After making the changes detailed in my tutorial, everything posted thereafter will work fully and properly in UTF-8, no matter what the language...
Making the switch to UTF-8

Link to comment
Share on other sites

  • 4 weeks later...

I tried your script. Only got "No database is selected" error. Even though I made sure that I had the correct name.
Ed1t: It worked, thank you :D

Birger :)

Ed2t: You should build it work support for converting the symbols too. Since now I only get some nasty question marks.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...