The problem with the Cyrillic


Ph-A

Posted

IP.Board 3.x has problems with the Russian letters "ш" (UTF-8 bytes 0xD1 0x88) and "И" (UTF-8 bytes 0xD0 0x98). These letters come out corrupted after a message is saved to the database.
The problem occurs only on Russian hosting.
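
The two byte codes quoted above can be checked outside the forum. The following is a minimal Python sketch, not IP.Board code; the helper names `utf8_hex` and `read_as_cp1251` are made up for illustration. It prints the UTF-8 bytes of each letter and what those same bytes look like if some layer reads them as windows-1251.

```python
def utf8_hex(letter: str) -> str:
    """Return the UTF-8 encoding of a string as lowercase hex."""
    return letter.encode("utf-8").hex()


def read_as_cp1251(letter: str) -> str:
    """Reinterpret the UTF-8 bytes of `letter` as windows-1251 text.

    Bytes that have no character in Python's cp1251 table are
    replaced with U+FFFD instead of raising an error.
    """
    return letter.encode("utf-8").decode("cp1251", errors="replace")


for letter in ("ш", "И"):
    print(letter, utf8_hex(letter), repr(read_as_cp1251(letter)))
```

In Python's cp1251 table the byte 0x98 has no character assigned, so the second byte of "И" cannot be represented there. Whether the MySQL server on a given host treats that byte the same way is not something this sketch establishes.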

As is well known, the forum needs the database to be encoded in UTF-8 in order to work correctly.
Even if you install the forum, create the database and tables with utf8_unicode_ci, and set the forum locale to ru_RU.UTF-8, the letters still come out broken.

The peculiarity of Russian hosting is that our hosting providers hard-code the windows-1251 encoding in my.cnf.

A typical configuration:

[client]
default-character-set=cp1251

[mysqld]
default-character-set=cp1251
default-collation=cp1251_general_ci
init-connect="SET NAMES cp1251"
skip-character-set-client-handshake

If we run the query

SHOW VARIABLES LIKE 'character_set%'

we obtain the result:

character_set_client      utf8
character_set_connection  utf8
character_set_database    utf8
character_set_filesystem  binary
character_set_results     utf8
character_set_server      cp1251
character_set_system      utf8
character_sets_dir        /usr/share/mysql/charsets/



On shared virtual hosting we have no access to edit my.cnf. But character_set_server = cp1251 does not allow the forum to work correctly with Cyrillic in Unicode.

Is it possible to make the original distribution work correctly regardless of the hosting settings?
I know of more than one case where people gave up on using IP.Board as their forum because of this.
The Russian localization from ibresource fixes these problems. But what about those who buy a license directly from IPS?

Posted

If ibresource has fixed these problems, you might ask them to relay to us what they did to overcome the problems you're referring to. As I'm sure you can appreciate, we're not intimately familiar with the peculiarities of Russian hosting setups, so I'm not really sure what the problem is just from your description.

Posted

I cannot update my forum because of this problem.
To apply ibresource's fix, I would have to buy a license all over again, this time from them.
And then updates from IPS would not be possible in the future.

Posted

Ph-A, I'm a proud owner of the Russian IP.Board script. They made two changes related to this subject in the scripts. In the files /ips_kernel/classDbMysqlClient.php and /ips_kernel/classDbMysqliClient.php they extended this block of code:

	

    //-----------------------------------------
    // If there's a charset set, run it
    //-----------------------------------------

    if( $this->obj['sql_charset'] )
    {
        $this->query( "SET NAMES '{$this->obj['sql_charset']}'" );
    }

With these lines for mysqli:

    //-----------------------------------------
    // If there's a charset set, run it
    //-----------------------------------------

    if( $this->obj['sql_charset'] )
    {
        $this->query( "SET NAMES '{$this->obj['sql_charset']}'" );
        $this->query( "SET CHARACTER SET '{$this->obj['sql_charset']}'" );
        $this->query( "SET character_set_connection = " . $this->obj['sql_charset'] );

        // Look up the default collation for the configured charset
        $res = mysqli_query( $this->connection_id, "SHOW CHARSET LIKE '" . $this->obj['sql_charset'] . "'" );

        $charset = mysqli_fetch_row( $res );

        // Column 2 of SHOW CHARSET is the default collation
        $this->query( "SET collation_connection = " . $charset[2] );
    }

And these lines for mysql:

    //-----------------------------------------
    // If there's a charset set, run it
    //-----------------------------------------

    if( $this->obj['sql_charset'] )
    {
        $this->query( "SET NAMES '{$this->obj['sql_charset']}'" );
        $this->query( "SET CHARACTER SET '{$this->obj['sql_charset']}'" );
        $this->query( "SET character_set_connection = " . $this->obj['sql_charset'] );

        // Note: mysql_query() takes the query first and the link identifier second
        $res = mysql_query( "SHOW CHARSET LIKE '" . $this->obj['sql_charset'] . "'", $this->connection_id );

        $charset = mysql_fetch_row( $res );

        // Column 2 of SHOW CHARSET is the default collation
        $this->query( "SET collation_connection = " . $charset[2] );
    }

Posted

By the way, this code contains a possible error: on some servers the query "SHOW CHARSET LIKE ..." returns nothing, and then accessing $charset[2] produces an error message at the top of every forum page. That's the Russian way of coding :D

A better way is to use this line:

    if( $charset = mysqli_fetch_row($res) ) $this->query( "SET collation_connection = " . $charset[2] );

instead of these two:

    $charset = mysqli_fetch_row($res);
    $this->query( "SET collation_connection = " . $charset[2] );

And the same for classDbMysqlClient.php.
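
The same guard can be written as a small standalone function. This is only a sketch in Python, not part of IP.Board; `collation_statement` is a hypothetical name, and the row is assumed to have the column layout of `SHOW CHARSET` (charset, description, default collation, max length).

```python
from typing import Optional, Sequence


def collation_statement(row: Optional[Sequence[str]]) -> Optional[str]:
    """Build the SET statement only when SHOW CHARSET returned a usable row.

    `row` is None (or empty) when the server returned nothing, which is
    the case that triggered the error message described above.
    """
    if not row or len(row) < 3:
        return None
    # Index 2 is the "Default collation" column of SHOW CHARSET.
    return "SET collation_connection = " + row[2]


print(collation_statement(None))
print(collation_statement(("utf8", "UTF-8 Unicode", "utf8_general_ci", "3")))
```

Returning nothing for an empty result leaves the caller free to skip the query, which is what the one-line `if` above does.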

Posted

If ibresource has fixed these problems, you might ask them to relay to us what they did to overcome the problems you're referring to. As I'm sure you can appreciate, we're not intimately familiar with the peculiarities of Russian hosting setups, so I'm not really sure what the problem is just from your description.



ibresource -- Owners of the Russian version will receive it automatically. If you want to use the original distribution, you have to ask for the fix in the IPS Resources client area.
Posted

Ph-A, I'm [s]a proud[/s] owner of Russian IP.Board script. They made two subject-related changes in scripts.


I am the owner of the Russian version of the license too. And I
Posted

The problem with letters on virtual hosting remains.


There are only two places where data loss can occur: the MySQL connection collation and the option to "remove chr(0xCA) from input" ;)

The second one is described in full here: http://forums.ibresource.ru/index.php?showtopic=51483

This is the best part of the whole topic:

Those dumb Americans remove their invisible spaces, which don't exist in our code table! In their place we have the letters К and р.

How many times does this need to be repeated?

"К" and "р" are for cp-1251; with utf-8 we get combinations of characters, probably including "ш" and "И" too. Just set this option to "off" and check the forum ;)
Posted

I've updated mine without any problems. Please feel free to contact me, especially since we seem to have known each other for a long time ;)

Posted

We are not aware of any specific bugs having to do with Russian letters. If there ARE issues, someone will need to collect the details and submit a bug report, or submit a ticket in the client area so that we can investigate.

We can't fix a problem we don't have the details of I'm afraid. You guys are talking like this is some issue that's long been communicated to us and we should have fixed it by now, but honestly I've never heard of random Russian characters disappearing before this thread was opened this afternoon.

ш
И

As you can see, the characters post fine - it's not an IPB issue here. Thus, we need more information as to the cause of the problem. Hence why I suggested having ibresource contact us (as I would presume they are familiar with the problem and the solution they have implemented).

Posted

This issue was discussed several times over the summer in the tracker, but has not been resolved.
In about a week I'll submit a ticket with access to my hosting. It will be possible to run experiments there.

Thank you for your time.

Posted

There are only two places where data loss can occur: the MySQL connection collation and the option to "remove chr(0xCA) from input" ;)



This doesn't fix the problem completely. There are several hosting providers where the glitch remains, e.g. Zenon and Sweb (the ones I've tested).
Today I deliberately installed a test forum with the corrected classDbMysqlClient.php and classDbMysqliClient.php files, but the problem remains.
I can give access to the forum, FTP and phpMyAdmin.
  • 3 weeks later...
Posted


$this->query( "SET NAMES '{$this->obj['sql_charset']}'" );
$this->query( "SET CHARACTER SET '{$this->obj['sql_charset']}'" );
$this->query( "SET character_set_connection = " . $this->obj['sql_charset'] );

// Note: mysql_query() takes the query first and the link identifier second
$res = mysql_query( "SHOW CHARSET LIKE '" . $this->obj['sql_charset'] . "'", $this->connection_id );

$charset = mysql_fetch_row( $res );

$this->query( "SET collation_connection = " . $charset[2] );

See http://dev.mysql.com/doc/refman/5.1/en/charset-connection.html (or old versions)

A SET NAMES 'x' statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET character_set_connection = x;

Setting each of these character set variables also sets its corresponding collation variable to the default collation for the character set.

And:

A SET CHARACTER SET x statement is equivalent to these three statements:

SET character_set_client = x;
SET character_set_results = x;
SET collation_connection = @@collation_database;

Setting collation_connection also sets character_set_connection to the character set associated with the collation (equivalent to executing SET character_set_connection = @@character_set_database). It is not necessary to set character_set_connection explicitly.

Thus you can reduce this code to:

$this->query( "SET NAMES '{$this->obj['sql_charset']}'" );
$this->query( "SET CHARACTER SET '{$this->obj['sql_charset']}'" );
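
To see what the two quoted rules imply when the statements are run one after the other, here is a small Python model of the session variables. It is only a sketch built from the documentation excerpts above, not a MySQL client: the function names, the `new_session` starting state and the `DEFAULT_COLLATION` table are assumptions made for illustration.

```python
# Default collations assumed for the two charsets discussed in this thread.
DEFAULT_COLLATION = {"utf8": "utf8_general_ci", "cp1251": "cp1251_general_ci"}


def new_session(database_charset):
    """Hypothetical starting state: every variable follows the database charset."""
    return {
        "character_set_client": database_charset,
        "character_set_results": database_charset,
        "character_set_connection": database_charset,
        "collation_connection": DEFAULT_COLLATION[database_charset],
        "character_set_database": database_charset,
        "collation_database": DEFAULT_COLLATION[database_charset],
    }


def set_names(session, x):
    """Model of SET NAMES 'x' as described in the first excerpt."""
    session["character_set_client"] = x
    session["character_set_results"] = x
    session["character_set_connection"] = x
    session["collation_connection"] = DEFAULT_COLLATION[x]


def set_character_set(session, x):
    """Model of SET CHARACTER SET x as described in the second excerpt."""
    session["character_set_client"] = x
    session["character_set_results"] = x
    session["collation_connection"] = session["collation_database"]
    session["character_set_connection"] = session["character_set_database"]


def reduced_sequence(database_charset, x):
    """Run the two-statement version and return the resulting variables."""
    session = new_session(database_charset)
    set_names(session, x)
    set_character_set(session, x)
    return session


for db_charset in ("utf8", "cp1251"):
    result = reduced_sequence(db_charset, "utf8")
    print(db_charset, result["character_set_connection"], result["collation_connection"])
```

Under this model the connection charset after the two statements equals the database charset, so the short version gives the same end state as the longer one only when the database is already in the requested charset, as in the `character_set_database utf8` output shown earlier in the thread.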

  • 6 months later...
  • 2 months later...
Posted

BTW, it took me some time to find this: in IPB 3.0.x and newer you can just add this to conf_global.php


$INFO['sql_charset'] = 'utf8';



Ilya, thank you so much for the solution!
Posted

We are not aware of any specific bugs having to do with Russian letters. If there ARE issues, someone will need to collect the details and submit a bug report, or submit a ticket in the client area so that we can investigate.


Such bugs always appear when there is recoding between UTF-8 and one of the legacy Cyrillic encodings (windows-1251, KOI8 and so on). Different encodings can be in use in the database storage, database collation, database output, PHP output, client input and AJAX processing. If at least one of them uses UTF-8 while another uses a legacy encoding, there is always a problem, because recoding between UTF-8 and legacy encodings is full of bugs. Some people have problems with ш and И, some have a problem with the Russian capital 'К', which just disappears from messages, and some get a mess of question marks and special characters instead of text. Things get much worse if someone uses special characters from the extended part of the cp-1251 table: I have never seen them recoded correctly on the web. This is a nightmare for programmers.

There were problems with UTF-8 when most people used win-1251 and AJAX arrived: AJAX uses only UTF-8, while all the data was stored in win-1251. There were more problems when people migrated from MySQL 3 to 4: the MySQL developers dramatically changed how different encodings are handled. Now there are problems when we migrate from legacy encodings to UTF-8: correctly recoding a MySQL database from windows-1251 to utf-8 is a kind of magic. One more problem: there is a special PHP function, convert_cyr_string (thanks to the Russian PHP core developers), which supports recoding between koi8-r, windows-1251, iso8859-5, x-cp866 and x-mac-cyrillic. But it does not support utf-8!

The only way to solve encoding problems completely is to set utf-8 absolutely everywhere: as the MySQL default encoding, as the encoding of the database, each table and each field, in the MySQL collation, in the PHP sources and in the HTML pages.

And by the way, there is a special instruction from IBR on how to migrate from windows-1251-based IPB 2 to UTF-8-based IPB 3. There are some screenshots at the end:

[Screenshot: wtf.png]

The first question mark, at the top left, is in place of the Russian И. The others are in place of the Russian ш. And there is this text below:

There can be two reasons:

1. You forgot to edit conf_global.php as instructed in step 1. Try editing it, and if that doesn't help, carefully repeat the update.

2. You converted to utf-8 a database that was already in utf-8. Start the update again from step 2 and do not convert the DB, just change the SQL queries.
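
The second reason can be reproduced in a few lines. This is a Python sketch of the general effect, not the IBR conversion script; `convert_cp1251_to_utf8` is a hypothetical stand-in for whatever tool performs the windows-1251 to UTF-8 conversion.

```python
def convert_cp1251_to_utf8(raw: bytes) -> bytes:
    """Treat `raw` as windows-1251 text and re-encode it as UTF-8.

    Bytes with no character in Python's cp1251 table become U+FFFD.
    """
    return raw.decode("cp1251", errors="replace").encode("utf-8")


text = "Иш"
stored_as_cp1251 = text.encode("cp1251")  # data that really is windows-1251
stored_as_utf8 = text.encode("utf-8")     # data that is already UTF-8

# Converting genuine windows-1251 data gives the original text back.
once = convert_cp1251_to_utf8(stored_as_cp1251).decode("utf-8")

# Running the same conversion on data that is already UTF-8 damages it.
twice = convert_cp1251_to_utf8(stored_as_utf8).decode("utf-8")

print(repr(once))
print(repr(twice))
```

The second result no longer contains either letter, and the byte that has no windows-1251 character is lost for good, so the damage cannot be fully undone by converting back.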



This is a very old manual that appeared right when IPB 3 came out. So the only thing people need to do is read the manuals carefully. :)
Posted

Yes, the items you describe are almost entirely out of our control. It is important to use character sets appropriate for your site. For instance, if you use windows-1251 that's fine, but you must use it everywhere. Same with UTF-8.

Posted

if you use windows-1251 that's fine


Not really. You can enter some Western European characters with diacritics and they will be shown fine (they are usually stored as HTML entities) until you try to quick-edit the post using AJAX. You will lose these characters, because they will be converted into Cyrillic ones: the Cyrillic characters in windows-1251 occupy the same places as the Western characters with diacritics (the same extended codes 128-255). UTF-8 can encode every Unicode character (over a million code points) and does not have this problem at all: you can mix Latin, Cyrillic and Chinese in one post without any problems.
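
A short Python sketch of both effects described here. It is an illustration only: ISO-8859-1 is assumed as the "Western" single-byte encoding, and the helper names are hypothetical.

```python
def store_in_cp1251(text: str) -> bytes:
    """Encode text for a windows-1251 page.

    Characters that windows-1251 cannot represent are written as
    numeric HTML entities.
    """
    return text.encode("cp1251", errors="xmlcharrefreplace")


def misread_latin1_as_cp1251(text: str) -> str:
    """Show which Cyrillic letters share byte values with Western letters."""
    return text.encode("latin-1").decode("cp1251")


print(store_in_cp1251("café"))
print(misread_latin1_as_cp1251("é"))
print("café Привет 中文".encode("utf-8").decode("utf-8"))
```

The first call shows the entity form in which such a character can survive on a windows-1251 page, the second shows the Cyrillic letter that sits at the same byte value, and the last line round-trips mixed scripts through UTF-8 unchanged.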
Posted

Not really. You can enter some Western European characters with diacritics and they will be shown fine (they are usually stored as HTML entities) until you try to quick-edit the post using AJAX. You will lose these characters, because they will be converted into Cyrillic ones: the Cyrillic characters in windows-1251 occupy the same places as the Western characters with diacritics (the same extended codes 128-255). UTF-8 can encode every Unicode character (over a million code points) and does not have this problem at all: you can mix Latin, Cyrillic and Chinese in one post without any problems.




We have a setting in the ACP to disable AJAX functions which can be used in this case (the setting is there solely for character sets that cannot easily translate between UTF-8 and the charset in use, or for when sites do not wish for the content to be converted to HTML entities). This should be a suitable workaround for the issue, and is built into the software.

Nevertheless, we cannot do much about irregularities in individual character sets out there. We recommend using UTF-8 if possible, and this is the default character set on IP.Board 3 and above.
Posted

We recommend using UTF-8 if possible,


So do I. :) And IBR said that anyone who wants to upgrade to IP.Board 3 has to convert everything to utf-8, just to avoid any trouble with encodings. I'm happy all these problems with different Cyrillic encodings are finally over!
  • 1 month later...
Posted

I would suggest making "UTF8" the default character set and collation at the table level for Invision Power Board, as it is for WordPress and other CMS engines.
It is a "modern" and stable approach. The issue from this topic appeared on my board when I backed it up and restored it (Win1251 is used for the DB at the moment).

Archived

This topic is now archived and is closed to further replies.
