ptprog Posted February 17, 2021 Posted February 17, 2021 I just noticed that I have a bunch of posts where emojis where replaced with a question mark. I already double checked the database setup, and the correct encoding seems to be used everywhere. Moreover, in a few test posts I created, emojis were created fine. Has anybody else experienced the same issue? Any idea of what is the cause?
Stuart Silvester Posted February 17, 2021 Posted February 17, 2021 This can happen when you export a database with the wrong character set, i.e. if moving server etc. You would need to specify --default-character-set=utf8mb4 when running mysqldump. It is irreversible, you cannot restore the now corrupt emoji data. Linux-Is-Best 1
Linux-Is-Best Posted February 17, 2021 Posted February 17, 2021 (edited) Just for future reference. mysqldump --default-character-set=utf8mb4 -h [host] -u [username] -p [database_name] --set-gtid-purged=OFF --port=3306 --protocol=tcp --skip-triggers -r dump.sql It turns out, using > is not always a safe way of exporting. Using -r dump.sql instead of > dump.sql does the trick. Edited February 17, 2021 by Linux-Is-Best format Nathan Explosion 1
ptprog Posted February 17, 2021 Author Posted February 17, 2021 There was no move of database in the past years, and the last failure is on a post from two weeks ago. There was also no recent change on DB configs, so maybe some periodic task is causing this. I need to recover older backups to confirm this.
CoffeeCake Posted February 18, 2021 Posted February 18, 2021 5 hours ago, ptprog said: There was no move of database in the past years, and the last failure is on a post from two weeks ago. Do you fully control your database server? Are you certain your web host did not make a change, such as replace the underlying hardware?
ptprog Posted February 18, 2021 Author Posted February 18, 2021 4 hours ago, Paul E. said: Do you fully control your database server? Are you certain your web host did not make a change, such as replace the underlying hardware? Yes, I fully manage my server.
marklcfc Posted February 18, 2021 Posted February 18, 2021 (edited) 16 hours ago, Stuart Silvester said: This can happen when you export a database with the wrong character set, i.e. if moving server etc. You would need to specify --default-character-set=utf8mb4 when running mysqldump. It is irreversible, you cannot restore the now corrupt emoji data. If its set in the database as default, does it automatically recognise that? I've never used the query in mysqldumps before to include utf8mb4 Edited February 18, 2021 by marklcfc
bfarber Posted February 18, 2021 Posted February 18, 2021 If you run a dump then yes you should specify the charset when running MySQL dump.
Management Matt Posted February 18, 2021 Management Posted February 18, 2021 Thanks for posting! Unfortunately, this issue is beyond the scope of our technical support. 👩💻 Our technical support is happy to help you with the Invision Community platform, but we're unable to help with things like server management, theme questions and modifications. I've moved this to our Community Support area where other Invision Community owners will see it and help where they can.
CoffeeCake Posted February 18, 2021 Posted February 18, 2021 (edited) 6 hours ago, ptprog said: Yes, I fully manage my server. Ok, then we can rule out a backup/restore that didn't factor in utf8mb4, and that this is not something introduced from a backup/restore operation that stripped out emoji characters. The other possibility is that your community uses native emoji characters instead of the Twitter emojis. If you're doing this, and you're browsing on a device that doesn't have newer emojis available, you will see oddness. You can switch to the Twitter emojis and see if those posts begin displaying correctly for you (assuming the new characters have been adopted by Twitter and not in your native platform's supported fonts). See: https://blog.emojipedia.org/first-look-217-new-emojis-in-ios-14-5/ Edited February 18, 2021 by Paul E. ptprog 1
ptprog Posted February 19, 2021 Author Posted February 19, 2021 I'm indeed using the native ones. I never really looked into the difference between native and Twitter, but now I see how the Twitter ones provide more reliability. This is something I need to discuss internally, as it should prevent losses of data. I still don't think this totally explains the issue I'm seeing, though. I would expect that if somebody is able to type an emoji, then his OS should be able to send it to the server (we could have problems with quotes or C&P, but they should be uncommon).
CoffeeCake Posted February 19, 2021 Posted February 19, 2021 1 minute ago, ptprog said: I would expect that if somebody is able to type an emoji, then his OS should be able to send it to the server (we could have problems with quotes or C&P, but they should be uncommon). That's exactly what happens. It's your browser/OS not rendering them correctly. They will appear just fine to whoever typed them. It's usually iOS that has emojis before everyone else.
Nathan Explosion Posted February 19, 2021 Posted February 19, 2021 39 minutes ago, ptprog said: then his OS should be able to send it to the server The browser does send the emoji..an emoji is really just text though, it's not an image. The browser renders the image from the text...but if the OS/Browser doesn't know the emoji (which happens when apple release new ones as per Paul's comments) then you see the question marks. But someone else viewing the same post could see the image.
ptprog Posted February 20, 2021 Author Posted February 20, 2021 My problem is not with some browsers not showing emojis. My problem is with having a "?" char (0x3F) stored in the database instead of an emoji. Meanwhile I think I was able to trace the source of the issue based on the posts affected. I used a REGEXP_REPLACE query to remove some data from posts, and it seems the REGEXP_REPLACE does not handle some UTF-8 characters properly. Time to try to recover the data from a backup 🙂
Nathan Explosion Posted February 20, 2021 Posted February 20, 2021 (edited) 1 hour ago, ptprog said: My problem is not with some browsers not showing emojis. My problem is with having a "?" char (0x3F) stored in the database instead of an emoji. And how are you viewing the database? Does the OS and/or browser through which you are viewing it support that specific emoji? Just saying that sometimes the ? isn't always an indication of a problem with database. Based on your last paragraph, looks like you figured it out yourself but I am still making the above point in case others happen across this at a later point. Edited February 20, 2021 by Nathan Explosion
ptprog Posted February 21, 2021 Author Posted February 21, 2021 (edited) On 2/20/2021 at 10:07 AM, Nathan Explosion said: And how are you viewing the database? Does the OS and/or browser through which you are viewing it support that specific emoji? Just saying that sometimes the ? isn't always an indication of a problem with database. My terminal supports UTF-8 chars, and I could see emojis on the posts that still had the emojis. But just in case this was an issue with a specific emoji, I also used the MySQL HEX function to see the hexadecimal representation of the char stored and confirmed it was a plain "?". It looks like this bug is known for 4 years: https://jira.mariadb.org/browse/MDEV-11777 (but there is no mention to it in the function documentation). Time to consider going back to MySQL... Edited February 21, 2021 by ptprog CoffeeCake 1
Recommended Posts