Jump to content

Charset Conversion Libraries


Ali Majrashi

Recommended Posts

Hi,

I have converted the script to support batch queries, and uploaded it as a Resource (and here).
I still get some "MySQL Server has gone away", that I have to fix probably by retrying the query after reconnecting, but otherwise the conversion seems to be quite successful (of course extensive testing of the new IPB setup is now needed, to actually use this).

I also had problems with Greek non-utf8 encodings occur since "windows-1253" and "iso-8859-7" have specific differences, but seem to co-exist in the database.
Actually the Euro sign (A4) and a few more should probably be added iso-8859-7 in the IPB ConvertTable (I added them in my tests).
Still a serious problem exists with character 0xA2 (Accented A symbol) that has different meaning in the two encodings that seem to co-exist and probably has to be solved manually :(
Also, I had to ignore some fields (e.g. ibf_gallery_images.metadata that has EXIF headers and ibf_gallery_images.file_name that has chars with encoding I cannot recognize, but is probably not important).
One other field already contained utf8 symbols, so I had to write special code in oder NOT to re-convert them!

Problems just seem to be un-ending, and I guess more will occur when the database will be used by IPB, so good luck to anyone that will be using this!

Link to comment
Share on other sites

[quote name='Valtasar' post='1784420' date='Feb 7 2009, 09:38 PM']Hi,

I have converted the script to support batch queries, and uploaded it as a Resource (and here).
I still get some "MySQL Server has gone away", that I have to fix probably by retrying the query after reconnecting, but otherwise the conversion seems to be quite successful (of course extensive testing of the new IPB setup is now needed, to actually use this).

I also had problems with Greek non-utf8 encodings occur since "windows-1253" and "iso-8859-7" have specific differences, but seem to co-exist in the database.
Actually the Euro sign (A4) and a few more should probably be added iso-8859-7 in the IPB ConvertTable (I added them in my tests).
Still a serious problem exists with character 0xA2 (Accented A symbol) that has different meaning in the two encodings that seem to co-exist and probably has to be solved manually :(
Also, I had to ignore some fields (e.g. ibf_gallery_images.metadata that has EXIF headers and ibf_gallery_images.file_name that has chars with encoding I cannot recognize, but is probably not important).
One other field already contained utf8 symbols, so I had to write special code in oder NOT to re-convert them!

Problems just seem to be un-ending, and I guess more will occur when the database will be used by IPB, so good luck to anyone that will be using this!

I did tried and i got as far as this


It stopped 1 minute later and displayed this... :(

Thank you Valtasar for batch addition... :)

Table inv_acp_help: fields to be converted: 4 primary keys: 1 inv_acp_help.page_key is assumed to contain iso-8859-9 characters inv_acp_help.help_title is assumed to contain iso-8859-9 characters inv_acp_help.help_body is assumed to contain iso-8859-9 characters inv_acp_help.help_mouseover is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 150 rows Converted: 190 rows Converted: inv_acp_help Table inv_admin_login_logs: fields to be converted: 3 primary keys: 1 inv_admin_login_logs.admin_ip_address is assumed to contain iso-8859-9 characters inv_admin_login_logs.admin_username is assumed to contain iso-8859-9 characters inv_admin_login_logs.admin_post_details is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 124 rows Converted: inv_admin_login_logs Table inv_admin_logs: fields to be converted: 4 primary keys: 1 inv_admin_logs.act is assumed to contain iso-8859-9 characters inv_admin_logs.code is assumed to contain iso-8859-9 characters inv_admin_logs.note is assumed to contain iso-8859-9 characters inv_admin_logs.ip_address is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 150 rows Converted: 200 rows Converted: 250 rows Converted: 300 rows Converted: 340 rows Converted: inv_admin_logs Table inv_admin_permission_keys: Skipping field perm_key since it is a primary key Table inv_admin_permission_keys: fields to be converted: 3 primary keys: 1 inv_admin_permission_keys.perm_main is assumed to contain iso-8859-9 characters inv_admin_permission_keys.perm_child is assumed to contain iso-8859-9 characters inv_admin_permission_keys.perm_bit is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 144 rows Converted: inv_admin_permission_keys Table inv_admin_permission_rows: fields to be converted: 1 primary keys: 1 Converted: inv_admin_permission_rows Table inv_admin_sessions: Skipping field session_id since it is a primary key Table inv_admin_sessions: fields to be converted: 4 primary keys: 1 inv_admin_sessions.session_ip_address is assumed to contain iso-8859-9 characters inv_admin_sessions.session_member_name is assumed to contain iso-8859-9 characters inv_admin_sessions.session_member_login_key is assumed to contain iso-8859-9 characters inv_admin_sessions.session_location is assumed to contain iso-8859-9 characters Converted: 1 rows Converted: inv_admin_sessions Table inv_announcements: fields to be converted: 3 primary keys: 1 inv_announcements.announce_title is assumed to contain iso-8859-9 characters inv_announcements.announce_post is assumed to contain iso-8859-9 characters inv_announcements.announce_forum is assumed to contain iso-8859-9 characters Converted: 4 rows Converted: inv_announcements Table inv_api_log: fields to be converted: 3 primary keys: 1 Converted: inv_api_log Table inv_api_users: fields to be converted: 4 primary keys: 1 Converted: inv_api_users Table inv_attachments: fields to be converted: 6 primary keys: 1 inv_attachments.attach_ext is assumed to contain iso-8859-9 characters inv_attachments.attach_file is assumed to contain iso-8859-9 characters inv_attachments.attach_location is assumed to contain iso-8859-9 characters inv_attachments.attach_thumb_location is assumed to contain iso-8859-9 characters inv_attachments.attach_post_key is assumed to contain iso-8859-9 characters inv_attachments.attach_rel_module is assumed to contain iso-8859-9 characters Converted: 4 rows Converted: inv_attachments Table inv_attachments_type: fields to be converted: 3 primary keys: 1 inv_attachments_type.atype_extension is assumed to contain iso-8859-9 characters inv_attachments_type.atype_mimetype is assumed to contain iso-8859-9 characters inv_attachments_type.atype_img is assumed to contain iso-8859-9 characters
Link to comment
Share on other sites

Yesterday, after many unsuccessful attempts (reconnect attempts after MySQL server connection fell did not solve the problem), I managed to run to make the script successfully by simply modifying the "max_allowed_packet" server MySQL variable from 1M to 24M!
After this, I even set $limit=100 (100 update queries per cycle) and the script ran without problems.
I guess another way to do it is by running the query mysql_query("SET max_allowed_packet=24M"); at the start.

@media: it is not clear from your output, whether you had the same problem ("MySql server has gone away") or some other timeout. However, as bfarber suggested above, the best it to run the script at home from command line, as it needed about 6hours in my case, for a database with sql dump size of about 250MB.

Link to comment
Share on other sites

[quote name='Valtasar' post='1784593' date='Feb 9 2009, 03:50 AM']Yesterday, after many unsuccessful attempts (reconnect attempts after MySQL server connection fell did not solve the problem), I managed to run to make the script successfully by simply modifying the "max_allowed_packet" server MySQL variable from 1M to 24M!
After this, I even set $limit=100 (100 update queries per cycle) and the script ran without problems.
I guess another way to do it is by running the query mysql_query("SET max_allowed_packet=24M"); at the start.

@media: it is not clear from your output, whether you had the same problem ("MySql server has gone away") or some other timeout. However, as bfarber suggested above, the best it to run the script at home from command line, as it needed about 6hours in my case, for a database with sql dump size of about 250MB.

Valtasar,
I do not have any problem with time out... Script just display the code that i posted previous message and stop runing (1-2 minutes)...
I tried to run again with mysql_query("SET max_allowed_packet=24M"); and same thing same code displayed on the scren.... :)

Link to comment
Share on other sites

[quote name='Kfir' post='1784752' date='Feb 10 2009, 03:48 PM']Sorry for my dumb question, but what is the plus in UTF-8?

The benefit of UTF-8 (which is a concrete representation of the more
abstract "Unicode" set of characters) is that it has vastly more
characters than ISO-8859-1 encoding.

The advantage of ISO-8859-1 encoding is that is enjoys slightly wider support than UTF-8.
If you are not going to use the extra characters offered by Unicode, then use the ISO-8859-1.

Link to comment
Share on other sites

[quote name='Jean' post='1784754' date='Feb 10 2009, 04:01 PM']The benefit of UTF-8 (which is a concrete representation of the more
abstract "Unicode" set of characters) is that it has vastly more
characters than ISO-8859-1 encoding.

The advantage of ISO-8859-1 encoding is that is enjoys slightly wider support than UTF-8.
If you are not going to use the extra characters offered by Unicode, then use the ISO-8859-1.

If I use the Hebrew language for my board, will it be recommended that I use UTF-8?

Link to comment
Share on other sites

  • 2 weeks later...

Why i am getting stuck at this point????



Any help?

Table inv_acp_help: fields to be converted: 4 primary keys: 1 inv_acp_help.page_key is assumed to contain iso-8859-9 characters inv_acp_help.help_title is assumed to contain iso-8859-9 characters inv_acp_help.help_body is assumed to contain iso-8859-9 characters inv_acp_help.help_mouseover is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 150 rows Converted: 190 rows Converted: inv_acp_help Table inv_admin_login_logs: fields to be converted: 3 primary keys: 1 inv_admin_login_logs.admin_ip_address is assumed to contain iso-8859-9 characters inv_admin_login_logs.admin_username is assumed to contain iso-8859-9 characters inv_admin_login_logs.admin_post_details is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 124 rows Converted: inv_admin_login_logs Table inv_admin_logs: fields to be converted: 4 primary keys: 1 inv_admin_logs.act is assumed to contain iso-8859-9 characters inv_admin_logs.code is assumed to contain iso-8859-9 characters inv_admin_logs.note is assumed to contain iso-8859-9 characters inv_admin_logs.ip_address is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 150 rows Converted: 200 rows Converted: 250 rows Converted: 300 rows Converted: 340 rows Converted: inv_admin_logs Table inv_admin_permission_keys: Skipping field perm_key since it is a primary key Table inv_admin_permission_keys: fields to be converted: 3 primary keys: 1 inv_admin_permission_keys.perm_main is assumed to contain iso-8859-9 characters inv_admin_permission_keys.perm_child is assumed to contain iso-8859-9 characters inv_admin_permission_keys.perm_bit is assumed to contain iso-8859-9 characters Converted: 50 rows Converted: 100 rows Converted: 144 rows Converted: inv_admin_permission_keys Table inv_admin_permission_rows: fields to be converted: 1 primary keys: 1 Converted: inv_admin_permission_rows Table inv_admin_sessions: Skipping field session_id since it is a primary key Table inv_admin_sessions: fields to be converted: 4 primary keys: 1 inv_admin_sessions.session_ip_address is assumed to contain iso-8859-9 characters inv_admin_sessions.session_member_name is assumed to contain iso-8859-9 characters inv_admin_sessions.session_member_login_key is assumed to contain iso-8859-9 characters inv_admin_sessions.session_location is assumed to contain iso-8859-9 characters Converted: 1 rows Converted: inv_admin_sessions Table inv_announcements: fields to be converted: 3 primary keys: 1 inv_announcements.announce_title is assumed to contain iso-8859-9 characters inv_announcements.announce_post is assumed to contain iso-8859-9 characters inv_announcements.announce_forum is assumed to contain iso-8859-9 characters Converted: 4 rows Converted: inv_announcements Table inv_api_log: fields to be converted: 3 primary keys: 1 Converted: inv_api_log Table inv_api_users: fields to be converted: 4 primary keys: 1 Converted: inv_api_users Table inv_attachments: fields to be converted: 6 primary keys: 1 inv_attachments.attach_ext is assumed to contain iso-8859-9 characters inv_attachments.attach_file is assumed to contain iso-8859-9 characters inv_attachments.attach_location is assumed to contain iso-8859-9 characters inv_attachments.attach_thumb_location is assumed to contain iso-8859-9 characters inv_attachments.attach_post_key is assumed to contain iso-8859-9 characters inv_attachments.attach_rel_module is assumed to contain iso-8859-9 characters Converted: 4 rows Converted: inv_attachments Table inv_attachments_type: fields to be converted: 3 primary keys: 1 inv_attachments_type.atype_extension is assumed to contain iso-8859-9 characters inv_attachments_type.atype_mimetype is assumed to contain iso-8859-9 characters inv_attachments_type.atype_img is assumed to contain iso-8859-9 characters
Link to comment
Share on other sites

finally after 3 months of searching the web for solution .

finally i found good script.

DataBase Charset Converter v2.2 done by AL3NDALEEB :thumbsup:

i think this will help i did not test it yet becouse

i'm away from my computer soon i'll test it and provide feedback her .


it's only one file that will convert your Db even the large one becouse it's done 500 per cycle .

never try it on live forum

also when you export your DB via phpmyadmin export it with the same Charset you use in your forum .

if you use latin1 in your forum export your DB in latin1 form because this script will mainly depend on that

to 100% convert your DB into utf-8 without problems .

i hope this will help :whistle: .

c22.zip

Link to comment
Share on other sites

[quote name='aljareh4ever' post='1785984' date='Feb 19 2009, 08:12 PM']finally after 3 months of searching the web for solution .

finally i found good script.

DataBase Charset Converter v2.2 done by AL3NDALEEB :thumbsup:

i think this will help i did not test it yet becouse

i'm away from my computer soon i'll test it and provide feedback her .


it's only one file that will convert your Db even the large one becouse it's done 500 per cycle .

never try it on live forum

also when you export your DB via phpmyadmin export it with the same Charset you use in your forum .

if you use latin1 in your forum export your DB in latin1 form because this script will mainly depend on that

to 100% convert your DB into utf-8 without problems .

i hope this will help :whistle: .


I have just tried this...
It says it is completed but content of the database is still same.. Didn't convert anyything to UTF-8 (Content)

But it converted table and field and database collation to UTF-8....

:)

Link to comment
Share on other sites

  • 1 month later...

Hello,

I have finally got the database converted to UTF-8, but my problem is;

In PhpMyAdmin I can read my string right with utf-8, but on the site in IPB view all the strings from database are broken (I have already set charset in CP to utf-8 - Yes i have recreated skins in CP)

What would be the problem....???? (I see the browser is showing that site is using UTF-8 - Also I have cleared browser cache)

Do i need to change anything else in the script or ????? any suggestion or idea....

Thanks

Link to comment
Share on other sites

[quote name='bfarber' date='24 March 2009 - 07:07 AM' timestamp='1237900074' post='1792125']
You have to run a query after the mysql connection is established.

SET NAMES utf8

In IPB3 you can edit conf_global.php to tell it to do this - in 2.3 you need to modify the database driver to do this.


Thank you for your help, but i have no idea how to do those...

I am on 2.3 and would you please tell me how i can do (in 2.3 you need to modify the database driver to do this.> this???


Thanks :)

Link to comment
Share on other sites

You have to modify either class_db_mysql_client or class_db_mysqli_client (depending on which one your server uses) so that right after mysql_connect or mysqli_connect you run either mysql_query( "SET NAMES utf8", $this->connection_id ) OR mysqli_query( $this->connection_id, "SET NAMES utf8" ), depending on the file.

Link to comment
Share on other sites

  • Management

Just a note: I've added back the basic character conversion tables which only take up about 200k space, uncompressed. I will write a function to check your character set and test for local conversion methods before checking if you need to download more tables.

Link to comment
Share on other sites

[quote name='Matt' date='24 March 2009 - 10:45 AM' timestamp='1237913135' post='1792156']
Just a note: I've added back the basic character conversion tables which only take up about 200k space, uncompressed. I will write a function to check your character set and test for local conversion methods before checking if you need to download more tables.


Matt,

Sorry to ask you but, is this explanation is for me?

Are you saying that i do not have to convert my database to utf-8 for SEF links?

Link to comment
Share on other sites

[quote name='bfarber' date='24 March 2009 - 09:51 AM' timestamp='1237909861' post='1792149']
You have to modify either class_db_mysql_client or class_db_mysqli_client (depending on which one your server uses) so that right after mysql_connect or mysqli_connect you run either mysql_query( "SET NAMES utf8", $this->connection_id ) OR mysqli_query( $this->connection_id, "SET NAMES utf8" ), depending on the file.



Thanks,

This is the code that i have found, would you please show me how i can place that into this?



Thanks

   if ( IPS_MAIN_DB_CLASS_LEGACY )

         {

    $this->connection_id = @mysql_connect( $this->obj['sql_host'] ,

               $this->obj['sql_user'] ,

               $this->obj['sql_pass']

             );

   }

   else

   {

    $this->connection_id = @mysql_connect( $this->obj['sql_host'] ,

               $this->obj['sql_user'] ,

               $this->obj['sql_pass'] ,

               $this->obj['force_new_connection']

             );

   }
Link to comment
Share on other sites

  • Management

[quote name='media' date='24 March 2009 - 08:02 PM' timestamp='1237924938' post='1792226']

Matt,

Sorry to ask you but, is this explanation is for me?

Are you saying that i do not have to convert my database to utf-8 for SEF links?




You won't need too. Our DB is not UTF-8 and we're using them just fine.

Link to comment
Share on other sites

[quote name='Matt' date='25 March 2009 - 12:59 AM' timestamp='1237964397' post='1792338']

You won't need too. Our DB is not UTF-8 and we're using them just fine.



Oh that's excellent news... :) very happy to hear that....

Thanks

Link to comment
Share on other sites

[quote name='media' date='24 March 2009 - 02:16 PM' timestamp='1237925785' post='1792228']


Thanks,

This is the code that i have found, would you please show me how i can place that into this?



Thanks




Hi bfarber,

Just testing purposes, Would you please show me how i can edit this function with your recomendations



Thanks

 if ( IPS_MAIN_DB_CLASS_LEGACY )

{

$this->connection_id = @mysql_connect( $this->obj['sql_host'] ,

$this->obj['sql_user'] ,

$this->obj['sql_pass']

);

}

else

{

$this->connection_id = @mysql_connect( $this->obj['sql_host'] ,

$this->obj['sql_user'] ,

$this->obj['sql_pass'] ,

$this->obj['force_new_connection']

);

}

You have to modify either class_db_mysql_client or class_db_mysqli_client (depending on which one your server uses) so that right after mysql_connect or mysqli_connect you run either mysql_query( "SET NAMES utf8", $this->connection_id ) OR mysqli_query( $this->connection_id, "SET NAMES utf8" ), depending on the file.

Link to comment
Share on other sites

[quote name='bfarber' date='25 March 2009 - 02:55 PM' timestamp='1238014539' post='1792478']
Right after the last } in your code you posted you'd add

mysql_query( "SET NAMES utf8", $this->connection_id );


No still broken charecters...
I do not really get this why PHPMYADMIN is showing everything right and IPB is showin broken...

Link to comment
Share on other sites

I'm afraid you're simply missing something. ;) For instance, your site might be using the class_db_mysqli_client.php file - in which case the edit wouldn't do anything.

You really need to post in the peer to peer forums to get help with this sort of thing.

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...