Sonya* Posted May 12, 2013 I have suggested allowing custom character replacements depending on the language and admin preferences, either as ACP settings where replacements can be added manually, character by character, or at least as a data hook that allows hooking into FURLs. It will take time to figure out whether this feature is globally desired. We are a minority here with our non-English boards ;) English boards do not suffer from special characters.
Axel Wers (Author) Posted May 12, 2013 I am doing some testing (via mobile). It seems 3.4.5 has fortunately resolved something, but it needs further investigation.
Mark Posted May 12, 2013 What problems do you mean? URLs can support UTF-8 characters... even Emoji works if you're on an OS that supports them.
Sonya* Posted May 12, 2013 Mark, it is not about support, it is about how browsers copy these URLs to the clipboard. They encode them, producing long, messy, spam-looking URLs, like these Wikipedia URLs:
http://de.wikipedia.org/wiki/Universit%C3%A4t_Z%C3%BCrich
http://ru.wikipedia.org/wiki/%D0%A6%D1%8E%D1%80%D0%B8%D1%85%D1%81%D0%BA%D0%B8%D0%B9_%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%82%D0%B5%D1%82
http://el.wikipedia.org/wiki/%CE%A0%CE%B1%CE%BD%CE%B5%CF%80%CE%B9%CF%83%CF%84%CE%AE%CE%BC%CE%B9%CE%BF_%CF%84%CE%B7%CF%82_%CE%96%CF%85%CF%81%CE%AF%CF%87%CE%B7%CF%82
http://he.wikipedia.org/wiki/%D7%90%D7%95%D7%A0%D7%99%D7%91%D7%A8%D7%A1%D7%99%D7%98%D7%AA_%D7%A6%D7%99%D7%A8%D7%99%D7%9A
They might work correctly, but if somebody copies a link to my board from the browser address bar and pastes it somewhere else, I do not want these beautiful codes in my backlinks :D That is the reason why I replace any non-ASCII characters in URLs. It is just netiquette and the common way to do it on non-English boards.
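To make the encoding described in the post above concrete, here is a minimal Python sketch (not part of the original thread) that reproduces the path of the German Wikipedia URL. The helper name `percent_encode_path` is an assumption made for illustration; `quote` and `unquote` come from the standard library.

```python
from urllib.parse import quote, unquote


def percent_encode_path(path: str) -> str:
    """Encode a path as UTF-8, then percent-escape every byte that is not
    an unreserved ASCII character or a path separator."""
    # safe="/" keeps the separators readable; letters, digits and "_.-~"
    # are never escaped by quote().
    return quote(path, safe="/")


readable = "/wiki/Universität_Zürich"
encoded = percent_encode_path(readable)

print(encoded)
# Decoding restores the readable form, so no information is lost.
print(unquote(encoded) == readable)
```

Each non-ASCII letter becomes two escaped bytes here (three characters per byte), which is why Cyrillic, Greek or Hebrew titles grow so much longer than their readable form.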
Asch Posted May 12, 2013 I'm all for anything that makes the URLs of links look clean and simple.
Mark Posted May 13, 2013 That shouldn't happen... Do you get the same problem copying links from, say, Wikipedia: http://en.wikipedia.org/wiki/Śūnyatā ? Is your community using the UTF-8 character set?
Sonya* Posted May 13, 2013
Mark wrote: "Do you get the same problem copying links from, say, Wikipedia: http://en.wikipedia.org/wiki/Śūnyatā ?"
Yes. Any link with special characters in it, not only Wikipedia.
Mark wrote: "Is your community using the UTF-8 character set?"
Yes. However, this is not about my specific community but about browsers. If you use a non-English OS and copy and paste the URL from the address bar, you'll see that the URL in your clipboard is percent-escaped. This is nothing that you can change in IPS. It is just the current standard, RFC 3986:
"The generic URI syntax mandates that new URI schemes that provide for the representation of character data in a URI must, in effect, represent characters from the unreserved set without translation, and should convert all other characters to bytes according to UTF-8, and then percent-encode those values."
And browsers are not the only ones that percent-encode UTF-8 URLs. If I used special characters in URLs, my server logs would look like this:
GET /dir1/%D0%A6%D1%8E%D1%80%D0%B8%D1%85%D1%81%D0%BA%D0%B8%D0%B9_%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%82%D0%B5%D1%82.html HTTP/1.1
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.5a) Gecko/20030728 Mozilla Firebird/0.6.1
…
That is simply impossible to analyze visually without an additional tool. If I want to block specific URLs in robots.txt or add a new rule to mod_rewrite, I have to use the same long string. In other words, these URLs are not human-readable.
Another problem is netiquette. Do you have special characters like ü, ö, ä, ß on your keyboard? Do you have Cyrillic letters on your keyboard? Chinese, Arabic? Could you type (not copy and paste, but type!) the URLs below if I told you the address over the phone?
http://www.mysite.com/grüße
http://www.mysite.com/привет
You cannot type them yourself. You can only copy and paste those URLs. That's why I transliterate all URLs to ASCII only on my projects. ;)
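The "additional tool" mentioned above does not have to be large. As a rough sketch (not from the thread), the following Python snippet decodes the request target of a log line for display. The function name `decode_request_line` and the assumption that each request line has exactly the three space-separated parts `METHOD target PROTOCOL` are illustrative only; real log formats vary.

```python
from urllib.parse import unquote


def decode_request_line(line: str) -> str:
    """Return the request line with its percent-escaped target decoded."""
    parts = line.split(" ")
    if len(parts) != 3:
        # Not a plain "METHOD target PROTOCOL" line; leave it untouched.
        return line
    method, target, protocol = parts
    # unquote() interprets the escaped bytes as UTF-8 by default.
    return f"{method} {unquote(target)} {protocol}"


raw = (
    "GET /dir1/%D0%A6%D1%8E%D1%80%D0%B8%D1%85%D1%81%D0%BA%D0%B8%D0%B9_"
    "%D1%83%D0%BD%D0%B8%D0%B2%D0%B5%D1%80%D1%81%D0%B8%D1%82%D0%B5%D1%82.html"
    " HTTP/1.1"
)
print(decode_request_line(raw))
```

This only helps when reading logs. Rules in robots.txt or mod_rewrite would still have to deal with the escaped form, which is the point made in the post.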
Axel Wers (Author) Posted May 13, 2013
Mark wrote: "What problems do you mean? URLs can support UTF-8 characters... even Emoji works if you're on an OS that supports them."
I tested my board (3.4.3) on Android. The mobile browser couldn't handle FURLs with special characters (I got the error message: too many redirects). But I think the new version 3.4.5 fixed this (tested only on a test board).
What is worse is Google Webmaster Tools. Links with special characters look like this: [image: 2vud7cl.jpg]
It's almost impossible to analyze. What the hell kind of link is that? I see the same in the server error logs.
Sonya* Posted May 13, 2013 I have the same issue with Viglink in their stats. They cannot decode links back to their UTF-8 representation. It looks like Google Webmaster Tools. I reported it to Viglink support in November 2012 and there is still no progress. I know that IPS does not have anything to do with Viglink stats. This is just an example of how others try to handle such URLs and fail.
Sonya* Posted May 13, 2013
Mark wrote: "What problems do you mean? URLs can support UTF-8 characters... even Emoji works if you're on an OS that supports them."
The link in your post brings me here: http://xn--ls8h.la/ Is it WAI, or should the link show that something works? :unsure:
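The `xn--` prefix in that address marks a Punycode label, the ASCII form used for non-ASCII host names. As a small sketch (not from the thread), the Python snippet below converts such a host back to its Unicode form. The function name `decode_idn_host` is hypothetical, and the code uses the standard library's raw `punycode` codec, so it skips the validation steps a full IDNA implementation would apply.

```python
def decode_idn_host(host: str) -> str:
    """Decode every "xn--" label of a host name from Punycode to Unicode."""
    labels = []
    for label in host.split("."):
        if label.lower().startswith("xn--"):
            # Strip the "xn--" prefix and decode the rest as Punycode.
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(decode_idn_host("xn--ls8h.la"))
```

The decoded label is a single emoji character, which fits the remark in the quoted post that even emoji work in URLs.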
Mark Posted May 13, 2013 There are 4 main reasons why we do what we do:
1. Transliteration is difficult to do universally. Although, for example, it's quite common for ß in German to be transliterated to ss, for some characters in some languages it's not appropriate to transliterate, and in many cases the same character would be translated to different values depending on the language. For example, we've had bug reports saying æ should be "ae", so we changed it, then another bug report said it should be "e", so we changed it again, then we had a third bug report saying it should be "ae" again.
2. Languages which don't use latin characters at all (any east-Asian language) simply cannot be transliterated. So you end up with all your URLs being "/topic/123-/" with no friendly URL element.
3. It is the proper way to do things. As you yourself point out, the RFC which defines the standard for URIs says that characters may be UTF-8 encoded - it does not suggest transliteration.
4. It's good for SEO (otherwise you sort of defeat the point of friendly URLs).
Sonya* Posted May 13, 2013 Mark, sorry, but what you wrote is entirely false.
Mark wrote: "Transliteration is difficult to do universally."
No. It is simple, just a two-dimensional array, like I have suggested here. I have used it for years without any issues or impact on any IPS functions.
Mark wrote: "Although, for example, it's quite common for ß in German to be transliterated to ss, for some characters in some languages it's not appropriate to transliterate, and in many cases the same character would be translated to different values depending on the language."
This is the reason why I asked for a data hook, so that everybody can do the substitution individually in a hook. In other software it is just a part of localization, where such arrays are different per language.
Mark wrote: "Languages which don't use latin characters at all (any east-Asian language) simply cannot be transliterated."
False. There are ISO standards for transliteration for every language: http://en.wikipedia.org/wiki/List_of_ISO_transliterations
Mark wrote: "It is the proper way to do things."
It is the IPS way to do it, but it is not proper in a global world where English is not the only language people speak ;)
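The "array per language" idea from the post above can be sketched in a few lines. The following Python example is only an illustration and not the patch discussed in the thread: the `TRANSLIT` tables, the language codes and the `slugify` helper are all assumptions, and the mappings are small samples rather than complete ISO tables.

```python
import re

# Hypothetical per-language replacement tables; an admin (or a language
# pack) would supply these, so the same character can map differently
# depending on the language.
TRANSLIT = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
    "da": {"æ": "ae", "ø": "oe", "å": "aa"},
    "ru": {"п": "p", "р": "r", "и": "i", "в": "v", "е": "e", "т": "t"},
}


def slugify(title: str, lang: str) -> str:
    """Build an ASCII-only URL slug using the table for the given language."""
    table = TRANSLIT.get(lang, {})
    # Replace character by character; unknown characters pass through.
    replaced = "".join(table.get(ch, ch) for ch in title.lower())
    # Whatever is still outside a-z and 0-9 collapses into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", replaced).strip("-")


print(slugify("Grüße", "de"))
print(slugify("привет", "ru"))
```

Because the table is selected per language, a disputed character such as æ could map to "ae" on one board and to something else on another, without changing the code.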
Axel Wers (Author) Posted May 27, 2013 There is another problem, a big one. I wondered why traffic on my board has been steadily decreasing lately. It seems Google doesn't like FURLs with special characters; those FURLs are excluded from the main results.
I have about 130 categories on the board. Before the upgrade to 3.4.3 ALL of them were in the results, but AFTER the upgrade only about 30. The others are excluded in this way.
It is similar with topics. I have about 3200 topics on the board, and only about 30 are in the search results. All topics with special characters disappeared from the SERP.
I am still running version 3.4.3. Was anything improved for FURLs with special characters in 3.4.5? This is important for me, because your experiment with special characters ruined my board. So get rid of those damned special characters in FURLs.
media Posted May 27, 2013 I opened a ticket about this a long time ago (#851800) and got this reply from support:
"Unfortunately this isn't possible without changing your site's charset to be something other than UTF-8, and that isn't something which I would advise. Romanization features included within IP.Board are only supported for non-UTF-8 sites, as you are already aware. I'm sorry but it doesn't look like there's anything we can do to 'fix' this as the software is Working As Intended. Please let me know if you have any further questions. Thank you,"
Sonya* Posted May 27, 2013 Axel, this *can* be related to the canonical URL. Your canonical URL contains special characters, so if Google does not like special characters, perhaps it cannot read the canonical URL properly and cannot match it? Just a guess... I cannot reproduce the issue on my boards, as I use a patch to transliterate special characters in URLs.
media Posted May 27, 2013
Sonya* wrote: "as I use patch to eliminate special characters from URL."
How?
Sonya* Posted May 27, 2013 Another thing that can cause it is that Google requires the sitemap to be encoded: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=35653 I am not sure what the default IPS sitemap looks like when special characters are used in URLs.
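For a sitemap, "encoded" generally means two steps: percent-escaping the URL and then escaping XML entities such as `&`. The Python sketch below shows one way to do that (it is not how IP.Board builds its sitemap, which the thread does not describe). The helper name `sitemap_loc` is hypothetical, and the code assumes the input URL is raw (not already percent-escaped) and has an ASCII host name.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape


def sitemap_loc(url: str) -> str:
    """Return a URL that is percent-escaped and XML-entity-escaped."""
    parts = urlsplit(url)
    # Step 1: percent-escape non-ASCII characters in path and query.
    path = quote(parts.path, safe="/")
    query = quote(parts.query, safe="=&")
    escaped_url = urlunsplit(
        (parts.scheme, parts.netloc, path, query, parts.fragment)
    )
    # Step 2: escape "&", "<" and ">" so the value is valid inside XML.
    return escape(escaped_url)


print(sitemap_loc("http://www.mysite.com/grüße?a=1&b=2"))
```

If a sitemap contains the raw UTF-8 form while the canonical URL or the crawled URL uses the escaped form, a mismatch of the kind guessed at in this thread is at least conceivable.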
Sonya* Posted May 27, 2013
media wrote: "How?"
The link to the solution is above. This way I can transliterate and replace any character I do not like. It has no impact on Latin URLs; it only handles special characters in addition. It works for every URL generated in IPS, for all IPS and custom apps that use furlTemplates.php, except for tags (tags are handled differently in IPS: they are saved raw and not translated in the database, which is why it would not work for tags).
Axel Wers (Author) Posted May 27, 2013 The problem is a different one. Basically there is no problem with indexing FURLs; FURLs are indexed very well. This is an example from my board: [image: 2zyhok2.jpg]
The problem is that Google won't show FURLs with special characters in the SERP. When I type site:forum.freespace.sk/tema/ into Google, I get only 3 pages of results, which is about 30 topics. The remaining topics are hidden as very similar.
So this is not working as intended; this issue is ruining my board in the Google SERP.
Sonya* Posted May 27, 2013 If Google thinks it is duplicate content, then look into the HTML improvements section in Google Webmaster Tools: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=80407 Any data there?
Axel Wers (Author) Posted May 27, 2013 Nope. There is no alert about this. And I know another two boards with exactly the same issue. And THIS board has the same issue. Try searching Google for site:community.invisionpower.com/topic/ But I *think* this is a problem with Google search rather than with IP.Board.
media Posted May 28, 2013
Sonya* wrote: "Link to the solution above > This way I can transliterate and replace any sign I do not like. It does not have any impact on latin URLs, it deals only with special characters additionally. It works for every url generated in IPS, for all IPS and custom apps that uses furlTemplates.php, except of tags (tags are made another way in IPS, saved raw and not translated in the database and that's why it would not work for tags)."
Sonya, would you please write step-by-step instructions for that modification that I can follow to get my board ready too? Thanks
Sonya* Posted May 28, 2013
media wrote: "Would you please make a step by step instruction on that modification"
http://community.invisionpower.com/topic/387062-guide-to-replace-special-signs-in-urls/ ;)
media Posted May 28, 2013
Sonya* wrote: "http://community.invisionpower.com/topic/387062-guide-to-replace-special-signs-in-urls/ ;)"
Sonya, you are the best... Thank you
This topic is now archived and is closed to further replies.