Invision Community 4: SEO, prepare for v5 and dormant account notifications By Matt November 11, 2024
Luis Manson Posted October 26, 2013 Posted October 26, 2013 Hello, im playing with an utf9 convesion script and have found a bug somewhere in my setup The final problem is that if i set a topic title to "prueba áéíóú" its saved as: "prueba áé" the query executed is: UPDATE gatopics SET title='otro áé� óú vocalesd3',title_seo='otro-vocalesd3' WHERE tid=88970 to thest my server setup i setup a new test board and changed the title of the demo topic to something similar and it worked nicely, the query executed is: UPDATE utf8_topics SET title='Welcome áéíóú¡²³¤€¼½¾‘dsf',title_seo='welcome-%c3%a1%c3%a9%c3%ad%c3%b3%c3%ba%c2%a1%c2%b2%c2%b3%c2%a4%e2%82%ac%c2%bc%c2%bd%c2%be%e2%80%98dsf' WHERE tid=1 I think at somepoint in some migration i got everyringh talking latin1 while the database was specified utf8 or something along that line the schemas of the tables are: original: -- -- Table structure for table `utf8_topics` -- CREATE TABLE IF NOT EXISTS `utf8_topics` ( `tid` int(10) NOT NULL AUTO_INCREMENT, `title` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `posts` int(10) DEFAULT NULL, `starter_id` mediumint(8) NOT NULL DEFAULT '0', `start_date` int(10) DEFAULT NULL, `last_poster_id` mediumint(8) NOT NULL DEFAULT '0', `last_post` int(10) DEFAULT NULL, `starter_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `last_poster_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `poll_state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `last_vote` int(10) DEFAULT NULL, `views` int(10) DEFAULT NULL, `forum_id` smallint(5) NOT NULL DEFAULT '0', `approved` tinyint(1) NOT NULL DEFAULT '0', `author_mode` tinyint(1) DEFAULT NULL, `pinned` tinyint(1) DEFAULT NULL, `moved_to` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL, `topic_hasattach` smallint(5) NOT NULL DEFAULT '0', `topic_firstpost` int(10) NOT NULL DEFAULT '0', `topic_queuedposts` int(10) NOT NULL DEFAULT '0', `topic_open_time` int(10) NOT NULL DEFAULT '0', `topic_close_time` int(10) NOT NULL DEFAULT '0', `topic_rating_total` smallint(5) unsigned NOT NULL DEFAULT '0', `topic_rating_hits` smallint(5) unsigned NOT NULL DEFAULT '0', `title_seo` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_last_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_first_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `topic_deleted_posts` int(10) NOT NULL DEFAULT '0', `tdelete_time` int(11) NOT NULL DEFAULT '0', `moved_on` int(11) NOT NULL DEFAULT '0', `topic_archive_status` int(1) NOT NULL DEFAULT '0', `last_real_post` int(10) NOT NULL DEFAULT '0', `topic_answered_pid` int(10) NOT NULL DEFAULT '0', PRIMARY KEY (`tid`), KEY `topic_firstpost` (`topic_firstpost`), KEY `last_post` (`forum_id`,`pinned`,`last_post`,`state`), KEY `forum_id` (`forum_id`,`pinned`,`approved`), KEY `starter_id` (`starter_id`,`forum_id`,`approved`,`start_date`), KEY `last_post_sorting` (`last_post`,`forum_id`), KEY `start_date` (`start_date`), KEY `last_x_topics` (`forum_id`,`approved`,`start_date`), KEY `approved` (`approved`,`tdelete_time`), KEY `moved_redirects` (`moved_on`,`moved_to`,`pinned`), KEY `topic_archive_status` (`topic_archive_status`,`forum_id`), KEY `last_poster_id` (`last_poster_id`), FULLTEXT KEY `title` (`title`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=2 ; after conversion: -- -- Table structure for table `gatopics` -- CREATE TABLE IF NOT EXISTS `gatopics` ( `tid` int(10) NOT NULL AUTO_INCREMENT, `title` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `description` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL, `state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `posts` int(10) DEFAULT NULL, `starter_id` mediumint(8) NOT NULL DEFAULT '0', `start_date` int(10) DEFAULT NULL, `last_poster_id` mediumint(8) NOT NULL DEFAULT '0', `last_post` int(10) DEFAULT NULL, `icon_id` tinyint(2) DEFAULT NULL, `starter_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `last_poster_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `poll_state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `last_vote` int(10) DEFAULT NULL, `views` int(10) DEFAULT NULL, `forum_id` smallint(5) NOT NULL DEFAULT '0', `approved` tinyint(1) NOT NULL DEFAULT '0', `author_mode` tinyint(1) DEFAULT NULL, `pinned` tinyint(1) DEFAULT NULL, `moved_to` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL, `topic_hasattach` smallint(5) NOT NULL DEFAULT '0', `topic_firstpost` int(10) NOT NULL DEFAULT '0', `topic_queuedposts` int(10) NOT NULL DEFAULT '0', `topic_open_time` int(10) NOT NULL DEFAULT '0', `topic_close_time` int(10) NOT NULL DEFAULT '0', `topic_rating_total` smallint(5) unsigned NOT NULL DEFAULT '0', `topic_rating_hits` smallint(5) unsigned NOT NULL DEFAULT '0', `title_seo` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_last_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_first_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `topic_deleted_posts` int(10) NOT NULL DEFAULT '0', `short_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `tweet_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `tweet` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `template_fields` text COLLATE utf8_unicode_ci, `tdelete_time` int(11) NOT NULL DEFAULT '0', `moved_on` int(11) NOT NULL DEFAULT '0', `last_real_post` int(10) NOT NULL DEFAULT '0', `topic_archive_status` int(1) NOT NULL DEFAULT '0', `topic_answered_pid` int(10) NOT NULL DEFAULT '0', PRIMARY KEY (`tid`), KEY `topic_firstpost` (`topic_firstpost`), KEY `last_post` (`forum_id`,`pinned`,`last_post`,`state`), KEY `forum_id` (`forum_id`,`pinned`,`approved`), KEY `last_post_sorting` (`last_post`,`forum_id`), KEY `start_date` (`start_date`), KEY `last_x_topics` (`forum_id`,`approved`,`start_date`), KEY `approved` (`approved`,`tdelete_time`), KEY `moved_redirects` (`moved_on`,`moved_to`,`pinned`), KEY `starter_id` (`starter_id`,`forum_id`,`approved`,`start_date`), KEY `topic_archive_status` (`topic_archive_status`,`forum_id`), KEY `last_poster_id` (`last_poster_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=88972 ; original: -- -- Table structure for table `gatopics` -- CREATE TABLE IF NOT EXISTS `gatopics` ( `tid` int(10) NOT NULL AUTO_INCREMENT, `title` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `description` varchar(250) COLLATE utf8_unicode_ci DEFAULT NULL, `state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `posts` int(10) DEFAULT NULL, `starter_id` mediumint(8) NOT NULL DEFAULT '0', `start_date` int(10) DEFAULT NULL, `last_poster_id` mediumint(8) NOT NULL DEFAULT '0', `last_post` int(10) DEFAULT NULL, `icon_id` tinyint(2) DEFAULT NULL, `starter_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `last_poster_name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `poll_state` varchar(8) COLLATE utf8_unicode_ci DEFAULT NULL, `last_vote` int(10) DEFAULT NULL, `views` int(10) DEFAULT NULL, `forum_id` smallint(5) NOT NULL DEFAULT '0', `approved` tinyint(1) NOT NULL DEFAULT '0', `author_mode` tinyint(1) DEFAULT NULL, `pinned` tinyint(1) DEFAULT NULL, `moved_to` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL, `topic_hasattach` smallint(5) NOT NULL DEFAULT '0', `topic_firstpost` int(10) NOT NULL DEFAULT '0', `topic_queuedposts` int(10) NOT NULL DEFAULT '0', `topic_open_time` int(10) NOT NULL DEFAULT '0', `topic_close_time` int(10) NOT NULL DEFAULT '0', `topic_rating_total` smallint(5) unsigned NOT NULL DEFAULT '0', `topic_rating_hits` smallint(5) unsigned NOT NULL DEFAULT '0', `title_seo` varchar(250) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_last_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `seo_first_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT '', `topic_deleted_posts` int(10) NOT NULL DEFAULT '0', `short_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `tweet_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `tweet` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL, `template_fields` text COLLATE utf8_unicode_ci, `tdelete_time` int(11) NOT NULL DEFAULT '0', `moved_on` int(11) NOT NULL DEFAULT '0', `last_real_post` int(10) NOT NULL DEFAULT '0', `topic_archive_status` int(1) NOT NULL DEFAULT '0', `topic_answered_pid` int(10) NOT NULL DEFAULT '0', PRIMARY KEY (`tid`), KEY `topic_firstpost` (`topic_firstpost`), KEY `last_post` (`forum_id`,`pinned`,`last_post`,`state`), KEY `forum_id` (`forum_id`,`pinned`,`approved`), KEY `last_post_sorting` (`last_post`,`forum_id`), KEY `start_date` (`start_date`), KEY `last_x_topics` (`forum_id`,`approved`,`start_date`), KEY `approved` (`approved`,`tdelete_time`), KEY `moved_redirects` (`moved_on`,`moved_to`,`pinned`), KEY `starter_id` (`starter_id`,`forum_id`,`approved`,`start_date`), KEY `topic_archive_status` (`topic_archive_status`,`forum_id`), KEY `last_poster_id` (`last_poster_id`) ) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=88970 ;
Luis Manson Posted October 26, 2013 Author Posted October 26, 2013 i think that all data in MySQL is latin1, while all collations are UTF i say this because: all schemas show utf8_general_ci if i set sql_charset in conf_global.php as latin1 everything works if i set it to UTF i have a database error
Luis Manson Posted October 26, 2013 Author Posted October 26, 2013 made a new test, restored original database made a mysqldump with latin1 as codepage=latin1 and imported as utf, now i can use UTF8 in conf_global yet áéíóú in topic titles (i wonder if there is any misbehaving filter, since the cut is always in the accented I
Luis Manson Posted October 30, 2013 Author Posted October 30, 2013 weird, alfter a LOT, and i mean more than 20 dumps, restores, greps and thinkgs like that i always endup with the same problems, the í character in post title makes a mess not sure why this is happening, but im pretty much sure everything is in utf8, but found this in the logs: INSERT INTO gatopics (`title`,`title_seo`,`state`,`posts`,`starter_id`,`starter_name`,`seo_first_name`,`start_date`,`last_poster_id`,`last_poster_name`,`seo_last_name`,`last_post`,`author_mode`,`poll_state`,`last_vote`,`views`,`forum_id`,`approved`,`topic_archive_status`,`pinned`,`topic_open_time`,`topic_close_time`) VALUES('otro áé<C3> óú vocales','otro-vocales','open',0,1,'Luis Manson','luis-manson',1383171922,1,'Luis Manson','luis-manson',1383171922,1,0,0,0,12,1,0,0,0,0) ============================================================================== ============================================================================== Time Taken: 0.00158 INSERT INTO gaposts (`author_id`,`use_sig`,`use_emo`,`ip_address`,`post_date`,`post`,`author_name`,`topic_id`,`queued`,`post_htmlstate`,`post_bwoptions`,`post_key`,`new_topic`) VALUES(1,1,1,'190.229.88.203',1383171922,'<p>otro áéíóú vocales</p>n<p>äåé®þüúíóö</p>n<p>qwertyuiop</p>n','Luis Manson',89176,0,0,0,'977dba96f133077e2fd5f23842871690',1) as you can see IPB inserts into the topic table "otro áé<C3> óú vocales" while it should be "otro áéíóú vocales"
PeterUK Posted November 3, 2013 Posted November 3, 2013 It's probably not trying to actually insert C3, it just happens that the byte structure of the character í is C3AD, so it's almost like it's losing the last byte or something which is pretty crazy. I presume you got that query from a log file, in which case the fact the log file has a character corrupted in that way is a bit worrying. Where did it come from? Is it a IPB log or your MySQL slow query log? That question seems important to me because if it's an IPB log, then that means the query seems to be corrupted before it hits the DB, but if it's the MySQL slow log then perhaps it is being corrupted on the way there. In the other topic, you said that utf8 is your default character set, is it definitely? You could run this MySQL query and you should see something along these lines: mysql> SHOW VARIABLES WHERE Variable_Name LIKE 'character_set%' OR Variable_Name LIKE 'collation_%'; +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | utf8 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | utf8 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | | collation_connection | utf8_unicode_ci | | collation_database | utf8_unicode_ci | | collation_server | utf8_unicode_ci | +--------------------------+----------------------------+ 11 rows in set (0.00 sec)
Luis Manson Posted November 3, 2013 Author Posted November 3, 2013 Hi PeterUK, i have suck a big mess that i think i will end up paying for someone to fix this i posted also another topic, in ipb support forums because i got that query from the IPB log file, and in one of my tests i had disabled all hooks and apps, thats why im confused in what kind of problem i really have in my production server i have this: +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+----------------------------+ and the one where im testing: +--------------------------+----------------------------+ | Variable_name | Value | +--------------------------+----------------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | character_sets_dir | /usr/share/mysql/charsets/ | | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+----------------------------+
PeterUK Posted November 4, 2013 Posted November 4, 2013 So you do have a mix between character sets and collations within your server. I do not know enough in depth about MySQL configuration to know how having a differing character_set_database/server from the character_set_client/connection will cause problems, but I guess it probably isn't great. You can modify your MySQL configuration to make all of those utf8 which may help with some of your issues of queries seemingly being corrupted. If you want to do that, you could add the following to your my.cnf file under [mysqld]: skip-character-set-client-handshake collation-server=utf8_unicode_ci character-set-server=utf8
Luis Manson Posted November 4, 2013 Author Posted November 4, 2013 i had that and did not work, i also took a dump and replaced all latin1 for utf8 and made no difference, but i will give it a try again By any chance do you have an idea how to correctly identify this issue? Im not quite sure yet
Luis Manson Posted November 4, 2013 Author Posted November 4, 2013 this is TOO weird, i came up to one problem (for now) i can NEVER save a thread with an í on its title, it gets cut (and saved) right there, there is no problem on post content. Also, it only works if i use the "latin1" database
Luis Manson Posted November 4, 2013 Author Posted November 4, 2013 me again: Foudnd this query: SELECT title, HEX( title ) , LENGTH( title ) , CHAR_LENGTH( title ) FROM gatopics ORDER BY `tid` DESC LIMIT 0 , 5 the hex shows UTF encoding: title HEX( title ) LENGTH( title ) CHAR_LENGTH( title ) otro áéÝ 6F74726F20C3A1C3A9C39D 11 8 still dont know whats the thing with the í
PeterUK Posted November 4, 2013 Posted November 4, 2013 I would be happy to take a look around your test server for you if you wish to provide me details in PM, and you don't mind me messing with configs or restarting it etc.. Do you have a decent copy of your database, ie. one with no encoding issues with characters but maybe the problem with the mix of character sets? If so maybe you could send me a dump of that and some examples of where characters are wrong and I could have a look.
Luis Manson Posted November 5, 2013 Author Posted November 5, 2013 I would be happy to take a look around your test server for you if you wish to provide me details in PM, and you don't mind me messing with configs or restarting it etc.. Do you have a decent copy of your database, ie. one with no encoding issues with characters but maybe the problem with the mix of character sets? If so maybe you could send me a dump of that and some examples of where characters are wrong and I could have a look. thanks Peter, i sent you a PM
PeterUK Posted November 8, 2013 Posted November 8, 2013 For the record, we fixed this problem, it was related to this bug: http://community.invisionpower.com/resources/bugs.html/_/ip-board/remove-chr0xca-from-input-r43247 We also fixed the issue with mixed character sets causing potential conversion issues.
Recommended Posts
Archived
This topic is now archived and is closed to further replies.