More natural plurals in non Roman languages


KVentz

Most Romance languages (and English too) have only two forms of a noun: one singular and one plural (which usually ends in -s). This makes it very easy to develop programs in these languages using simple logic (if the number is 1, use the singular; otherwise use the plural), and the lang files need only two forms of each noun.

But things get much worse when we try to translate the software into other languages like Russian, and maybe some others (perhaps IP.Board's international customers can help me here). We have 3 forms of nouns in Russian:

  • one for the singular and for any number ending in 1 except 11 (1 файл (file), 21 файл (file), 31 файл (file) and so on),
  • one plural form for any number ending in 2, 3 or 4 except 12, 13 and 14 (2 файла (files), 3 файла (files), 4 файла (files), 22 файла (files), 23 файла (files), 24 файла (files), 32 файла (files) and so on)
  • and another for all other numbers: those ending in 5-9 or 0, and 11-19 (5 файлов (files), 6 файлов (files), 11 файлов (files), 25 файлов (files) and so on).


Now, the two-form language model simply cannot be translated into a three-form language model, because there are neither language strings for three forms nor the logic to select the appropriate string. And IBR said they cannot do anything about that. Since most applications are written in English by English-speaking developers, no one pays attention to this. Translators have only three options: leave the sentences grammatically wrong (like writing 'You have 1 messages' or 'There is 5 post in the topic'), make the sentences look computer generated ('Messages: 5', 'You have comments: 1'), or write all forms at once: '5 файл (-а, -ов)'.

Have any suggestions? I'm really not sure how the software can know and understand this, realistically. Even in English we have trouble (MOST areas do not support dual singular/plural nouns, and we actually have a bug report about this open (ACP fails at grammar, shows "1 settings")).


I know nothing about grammar in languages other than Russian, English and French. But I think the idea is to have special methods that analyze the number next to the noun, and maybe some information about the noun itself (if its kind influences the form), and then select the appropriate language string from the lang file. You would only need to write a special grammar class once and plug in some new logic whenever a new language is added.

For English we need two forms in the lang file:

$foo = array ('singular' => 'foo', 'plural' => 'foos');

And this logic:

// English: singular only for exactly 1; 0 takes the plural ("0 foos")
if ($num == 1)
{
	$form = $foo['singular'];
}
else
{
	$form = $foo['plural'];
}

For Russian:

$foo = array ('singular' => 'foo', 'plural_1' => 'fooa', 'plural_2' => 'fooov');

$last    = $num % 10;	// last digit
$lastTwo = $num % 100;	// last two digits, to catch 11-14, 111-114 and so on

if ($last == 1 && $lastTwo != 11)
{
	$form = $foo['singular'];	// 1, 21, 31, ...
}
elseif ($last >= 2 && $last <= 4 && ($lastTwo < 12 || $lastTwo > 14))
{
	$form = $foo['plural_1'];	// 2-4, 22-24, 32-34, ...
}
else
{
	$form = $foo['plural_2'];	// 0, 5-20, 25-30, ...
}

I hope I didn't get the logic wrong. :) We would have to modify the lang files this way:

$lang = array (
'approve_img_mem' => array (
	'singular' => '<i>There is <b>%s</b> pending image for this member</i>',
	'plural' => '<i>There are <b>%s</b> pending images for this member</i>'));

Different language files can have a different number of strings, according to each language's logic, for example:

$lang = array (
'approve_img_mem' => array (
	'singular' => '<i><b>%s</b> изображение ожидает подтверждения для этого пользователя</i>',
	'plural_1' => '<i><b>%s</b> изображения ожидает подтверждения для этого пользователя</i>',
	'plural_2' => '<i><b>%s</b> изображений ожидает подтверждения для этого пользователя</i>'));
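To tie the rule and the lang array together, here is a minimal sketch of how a string could be picked and filled in. The function names `ru_plural_key` and `lang_count` and the `files` key are made up for illustration and are not part of IP.Board; the three Russian forms are the ones from the first post.

```php
<?php
// Hypothetical helper: returns the array key of the Russian form for $num.
function ru_plural_key($num)
{
    $last    = $num % 10;   // last digit
    $lastTwo = $num % 100;  // last two digits, to catch 11-14, 111-114, ...
    if ($last == 1 && $lastTwo != 11) {
        return 'singular';
    }
    if ($last >= 2 && $last <= 4 && ($lastTwo < 12 || $lastTwo > 14)) {
        return 'plural_1';
    }
    return 'plural_2';
}

// Hypothetical helper: picks the form for $num and fills in the %s placeholder.
function lang_count(array $lang, $key, $num)
{
    $forms = $lang[$key];
    return sprintf($forms[ru_plural_key($num)], $num);
}

$lang = array(
    'files' => array(
        'singular' => '%s файл',
        'plural_1' => '%s файла',
        'plural_2' => '%s файлов',
    ),
);

// Print one line per sample number, including the 11 and 112 edge cases.
foreach (array(1, 3, 5, 11, 21, 112) as $n) {
    echo lang_count($lang, 'files', $n), "\n";
}
```

The calling code never needs to know how many forms a language has; it only passes the number and the string key.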



Maybe people from other countries can tell us about their grammar rules, so we can see whether my suggestion is enough to cover all forms.



The major issue with that approach is that we'd have no way to build those grammar classes for other languages, given that most of us only read/write English.



Well, you don't need to do it yourselves. You only need to add a way to plug in custom grammar logic and to store language strings as arrays (and let the forum use them). Since you don't make the translations yourselves, you don't need to write the foreign grammar logic yourselves either. Just leave it to foreign users. Maybe not in plain, complicated PHP, but at some level of abstraction, like you did with procedures in templates. I think it's very useful even for you and for the most popular English
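One way to picture this: the core only keeps a lookup of per-language rules, and each language pack brings its own rule. A minimal sketch, where `$pluralRules`, `plural_form` and the language codes are hypothetical and not an existing IP.Board API. The French rule (0 and 1 both take the singular) is included only to show that even two-form languages can differ.

```php
<?php
// Each language pack would register one callback that maps a number to a form key.
$pluralRules = array(
    'en' => function ($n) {
        return ($n == 1) ? 'singular' : 'plural';   // "0 posts", "1 post"
    },
    'fr' => function ($n) {
        return ($n < 2) ? 'singular' : 'plural';    // "0 message", "1 message"
    },
);

// Core code: knows nothing about grammar, it only calls the registered rule.
function plural_form(array $rules, $langCode, array $forms, $n)
{
    // Unknown language: assume the key 'plural' (an arbitrary choice for this sketch).
    $key = isset($rules[$langCode]) ? call_user_func($rules[$langCode], $n) : 'plural';
    // Fall back to the first available form if the pack lacks the chosen key.
    $text = isset($forms[$key]) ? $forms[$key] : reset($forms);
    return sprintf($text, $n);
}

$en = array('singular' => '%s post', 'plural' => '%s posts');
$fr = array('singular' => '%s message', 'plural' => '%s messages');

echo plural_form($pluralRules, 'en', $en, 0), "\n";
echo plural_form($pluralRules, 'fr', $fr, 0), "\n";
```

A Russian pack would register a third callback returning 'singular', 'plural_1' or 'plural_2' without any change to `plural_form`.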

  • 11 months later...

Actually, I've just come up against the same question and asked it in Support. Indeed, it's a bit of a problem, but I think it can even be solved globally. You don't need to know every language - you only need to ADD AN OPTION to use multiple plural forms. You could add a simple format for strings that allows as many plural checks as needed. Here's a small example that came to mind just now:

Single form - stays as is.

Multiple form - if no format is given, stays as is. When formatted - parse it.

Format:

  • %a ... %a - delimiter, needed for parsing purposes only
  • {nums}::X,Y,Z[X-Z] - the plural numbers that take a different suffix (X,Y,Z - list of numbers separated by commas, [X-Z] - range of numbers)
  • {suffix}::suffix[default] - suffix for the alternative numbers ([default] - default plural suffix)

Example: member%a{nums}::[2-4]{suffix}::i[s]%a - means that if the number is plural and is between 2 and 4, "memberi" will be displayed, and for the rest - "members".

More complex example: member%a{nums}::[2,3]::4::[5-8]{suffix}::i::ye::ses[s]%a - memberi, memberye, memberses

I think it's very easy to parse, and all the number-suffix work will be done by your customers, not by you.
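To get a feel for how much parsing this would take, here is a rough sketch of an expander for the format. It assumes the `{suffix}::suffix[default]` layout from the list above, treats exactly 1 as the singular (no suffix), and pairs the N-th number group with the N-th suffix. The function names `expand_plural` and `matches_group` are hypothetical.

```php
<?php
// Hypothetical helper: expands e.g. "member%a{nums}::[2-4]{suffix}::i[s]%a" for $num.
function expand_plural($template, $num)
{
    $pattern = '/%a\{nums\}::(.*?)\{suffix\}::(.*?)\[(.*?)\]%a/';
    return preg_replace_callback($pattern, function ($m) use ($num) {
        if ($num == 1) {
            return '';                        // singular: the stem stays as is
        }
        $groups   = explode('::', $m[1]);     // e.g. "[2,3]", "4", "[5-8]"
        $suffixes = explode('::', $m[2]);     // e.g. "i", "ye", "ses"
        foreach ($groups as $i => $group) {
            if (isset($suffixes[$i]) && matches_group(trim($group, '[]'), $num)) {
                return $suffixes[$i];
            }
        }
        return $m[3];                         // default plural suffix
    }, $template);
}

// True if $num is listed in $spec, a comma-separated mix of numbers and a-b ranges.
function matches_group($spec, $num)
{
    foreach (explode(',', $spec) as $part) {
        if (strpos($part, '-') !== false) {
            list($lo, $hi) = explode('-', $part, 2);
            if ($num >= (int) $lo && $num <= (int) $hi) {
                return true;
            }
        } elseif ((int) $part === (int) $num) {
            return true;
        }
    }
    return false;
}

$simple  = 'member%a{nums}::[2-4]{suffix}::i[s]%a';
$complex = 'member%a{nums}::[2,3]::4::[5-8]{suffix}::i::ye::ses[s]%a';

echo expand_plural($simple, 3), "\n";
echo expand_plural($complex, 6), "\n";
```

With these assumptions the first call gives "memberi" and the second "memberses"; any plural number outside the listed groups falls through to the default suffix.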


I think maybe 3 of our clients total would understand such a system. :P

Archived

This topic is now archived and is closed to further replies.
