URL Title Foreign Character Conversion
Posted: 10 February 2007 04:20 PM   [ Ignore ]  
Research Assistant
Total Posts:  943
Joined  2005-06-13

There are several systems for transliterating characters. Here is one of them:

/* -------------------------------------
        /*  Create Foreign Character Conversion JS
        /* -------------------------------------*/
       
        // Keys are decimal Unicode code points; values are the replacement strings.
        $foreign_characters = array(
'223'    =>    "ss", // ß
'1072'    =>    "a",
'1073'    =>    "b",
'1074'    =>    "v",
'1075'    =>    "g",
'1076'    =>    "d",
'1077'    =>    "e",
'1105'    =>    "yo",
'1078'    =>    "zh",
'1079'    =>    "z",
'1080'    =>    "i",
'1081'    =>    "j",
'1082'    =>    "k",
'1083'    =>    "l",
'1084'    =>    "m",
'1085'    =>    "n",
'1086'    =>    "o",
'1087'    =>    "p",
'1088'    =>    "r",
'1089'    =>    "s",
'1090'    =>    "t",
'1091'    =>    "u",
'1092'    =>    "f",
'1093'    =>    "h",
'1094'    =>    "c",
'1095'    =>    "ch",
'1096'    =>    "sh",
'1097'    =>    "sch",
'1099'    =>    "y",
'1101'    =>    "e",
'1102'    =>    "yu",
'1103'    =>    "ya",
'1040'    =>    "a",
'1041'    =>    "b",
'1042'    =>    "v",
'1043'    =>    "g",
'1044'    =>    "d",
'1045'    =>    "e",
'1025'    =>    "yo",
'1046'    =>    "zh",
'1047'    =>    "z",
'1048'    =>    "i",
'1049'    =>    "j",
'1050'    =>    "k",
'1051'    =>    "l",
'1052'    =>    "m",
'1053'    =>    "n",
'1054'    =>    "o",
'1055'    =>    "p",
'1056'    =>    "r",
'1057'    =>    "s",
'1058'    =>    "t",
'1059'    =>    "u",
'1060'    =>    "f",
'1061'    =>    "h",
'1062'    =>    "c",
'1063'    =>    "ch",
'1064'    =>    "sh",
'1065'    =>    "sch",
'1067'    =>    "y",
'1069'    =>    "e",
'1070'    =>    "yu",
'1071'    =>    "ya",
);

But that is a hack that has to be reapplied after each update.

So the request is for an extension that lets users apply their preferred conversion system without re-patching cp.publish.php each time.
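
For readers unfamiliar with how such an array is used, here is a minimal sketch of applying a code-point-keyed conversion array to a title. It is not ExpressionEngine's own code: the function names `utf8_code_point` and `transliterate_title` are hypothetical, and the sketch assumes the title is UTF-8 and the file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper, not part of ExpressionEngine: decode one UTF-8
// character (1 to 4 bytes) into its decimal Unicode code point.
function utf8_code_point($char)
{
    $bytes = array_values(unpack('C*', $char));
    $count = count($bytes);
    if ($count === 1) {
        return $bytes[0]; // plain ASCII
    }
    // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- or 4-byte sequences.
    $code = $bytes[0] & (0xFF >> ($count + 1));
    for ($i = 1; $i < $count; $i++) {
        // Each continuation byte contributes its low 6 bits.
        $code = ($code << 6) | ($bytes[$i] & 0x3F);
    }
    return $code;
}

// Hypothetical helper: replace every character whose code point is a key
// in $foreign_characters; all other characters pass through unchanged.
function transliterate_title($title, array $foreign_characters)
{
    // The "u" modifier splits the string into whole UTF-8 characters.
    $chars = preg_split('//u', $title, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return $title; // invalid UTF-8: leave the title untouched
    }
    $out = '';
    foreach ($chars as $char) {
        $key = utf8_code_point($char);
        $out .= isset($foreign_characters[$key]) ? $foreign_characters[$key] : $char;
    }
    return $out;
}

// A few entries taken from the array above.
$foreign_characters = array('223' => "ss", '1084' => "m", '1080' => "i", '1088' => "r");

echo transliterate_title("мир", $foreign_characters), "\n";    // prints "mir"
echo transliterate_title("straße", $foreign_characters), "\n"; // prints "strasse"
```

Decoding the bytes by hand keeps the sketch free of optional PHP extensions; on installations that have mbstring, a built-in such as `mb_ord` could do the same job.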

Thanks.

 Signature 

tulks.com

 
 
Posted: 24 June 2007 12:33 PM   [ Ignore ]   [ # 1 ]  
Summer Student
Total Posts:  24
Joined  2006-09-20

Same here; I am editing these files:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž

 
 
Posted: 24 June 2007 01:12 PM   [ Ignore ]   [ # 2 ]  
Moderator
Total Posts:  11077
Joined  2004-05-15

You really should make this a feature request. The developers will probably listen, but they are monolingual, so they would not know which letters to transliterate to what.

We have support for most characters in the Latin-1 (ISO-8859-1) charset already. Since we already get “ü -> ue” in URLs, there is no reason why other accented characters shouldn’t work.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.
No PM’s, please—use regular mail.

 
 
Posted: 24 June 2007 01:17 PM   [ Ignore ]   [ # 3 ]  
Summer Student
Total Posts:  24
Joined  2006-09-20

Thanks, Ingmar, I’ll make a feature request!

 
 
Posted: 24 June 2007 01:25 PM   [ Ignore ]   [ # 4 ]  
Summer Student
Total Posts:  24
Joined  2006-09-20

On this topic, I would like to request a change to part of the code in the files below, so that URL Title conversion supports multiple languages:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

Proposed conversion entries:

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž

 
 
Posted: 24 June 2007 03:36 PM   [ Ignore ]   [ # 5 ]  
Administrator
Total Posts:  13817
Joined  2002-06-03

No need.  From the 1.6 changelog:

“Added foreign_character_conversion_array extension hook to allow developers to use a custom foreign character conversion array for URL titles.”

 
 
Posted: 24 June 2007 07:21 PM   [ Ignore ]   [ # 6 ]  
Grad Student
Total Posts:  99
Joined  2002-05-25

I’ve made an extension based on the “foreign_character_conversion_array” hook.

Title: Strange URL Interpreter
Purpose: Converts foreign-language characters in an entry’s URL Title to their plain Latin equivalents.

Version 1.1.0 supports more than the Slovak, Czech, Hungarian and Russian alphabets. If any characters from your native language are unsupported or converted incorrectly, let me know.

Version: 1.1.0 (25/06/2007)
Version: 1.2.0 (07/07/2007)
- Added: Lira, Degree, Yen, Pound and Cent sign
- Fixed: Update and Disable function
Version: 1.3.0 (22/09/2007)
- Added: Character set supporting the Polish language (Thanks to Gabriel Borkowski).

EDIT: The current version of the extension is here

File Attachments
strange_url_interpreter_130.zip  (File Size: 8KB - Downloads: 162)
 Signature 

Caleydon Media

+ CM Strange URL Interpreter
+ CM Textarea Resizer
+ CM Template Editor (EditArea)

 
 
Posted: 24 June 2007 10:59 PM   [ Ignore ]   [ # 7 ]  
Administrator
Total Posts:  13817
Joined  2002-06-03

Fast work, Gabriel!  Say, if some others can check the accuracy of this conversion array, and if you name it something more specific than “Foreign URL Title”, we can probably add this to the repository.  Though, being rather ignorant of the alphabets of non-Latin languages, I do not have anything good to suggest that would cover those four alphabets.

 
 
Posted: 25 June 2007 01:44 AM   [ Ignore ]   [ # 8 ]  
Moderator
Total Posts:  11077
Joined  2004-05-15

Nice work. Of course, I totally missed that hook in the changelog.

As a native German speaker, I’d like to add that German is fully supported as well (as it has, in fact, been natively by EE for some time), and so is French, as far as I can tell.

 
 
Posted: 25 June 2007 02:57 PM   [ Ignore ]   [ # 9 ]  
Grad Student
Total Posts:  99
Joined  2002-05-25

I prepared a conversion table for Strange URL Interpreter v1.3.0.

 
 
Posted: 25 June 2007 03:04 PM   [ Ignore ]   [ # 10 ]  
Administrator
Total Posts:  13817
Joined  2002-06-03

lol, that’s not quite what I meant for changing the extension name.  Perhaps some others with knowledge of these alphabets will have a better idea.

 
 
Posted: 25 June 2007 03:54 PM   [ Ignore ]   [ # 11 ]  
Moderator
Total Posts:  11077
Joined  2004-05-15

Also, I don’t quite agree with some of the transliterations. I think “ä” should be “ae”, not “a”. Same for ö (oe), ü (ue) and probably æ (ae).

 
 
Posted: 25 June 2007 04:33 PM   [ Ignore ]   [ # 12 ]  
Grad Student
Total Posts:  99
Joined  2002-05-25

Ingmar: A universal conversion is not possible, but why should “ä” be “ae” and not “a”? I think the URL title was designed as a tool for search engines (SEO), not for people. Who reads titles in a URL, especially entry titles?

For example, take the Slovak word “mäso” (meat). When I search Google for “maso”, I get results for both “mäso” and “maso”. Nobody will search for “maeso”, because that word doesn’t exist! In other words, converting “ä” to “ae” simply and seriously harms your website’s PageRank.

I am sure you will find a similar example in your language. If not, let me know.

 
 
Posted: 25 June 2007 04:52 PM   [ Ignore ]   [ # 13 ]  
Grad Student
Total Posts:  99
Joined  2002-05-25

I’ve got an idea. It might be useful to create separate conversion sets for different languages. The extension’s settings would then let you choose whichever conversion set you want.
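
As a rough illustration of that idea (not the extension's actual code), per-language sets could be layered over a shared base set. Everything named here is hypothetical: the `$base_set` and `$language_overrides` arrays, the `conversion_set_for` and `apply_set` functions, and the language keys. The sketch assumes UTF-8 titles and a UTF-8 source file.

```php
<?php
// Hypothetical base set; keys are decimal Unicode code points.
$base_set = array(
    228 => "a",  // ä
    246 => "o",  // ö
    252 => "u",  // ü
    223 => "ss", // ß
);

// Hypothetical per-language overrides layered on top of the base set.
$language_overrides = array(
    'german' => array(228 => "ae", 246 => "oe", 252 => "ue"),
    'slovak' => array(), // keeps "ä" => "a"
);

// Pick the conversion set for one language.
function conversion_set_for($language, array $base_set, array $language_overrides)
{
    $overrides = isset($language_overrides[$language]) ? $language_overrides[$language] : array();
    // The + operator keeps the left-hand value on duplicate keys and,
    // unlike array_merge, does not renumber integer keys.
    return $overrides + $base_set;
}

// Apply a code-point-keyed set to a UTF-8 title.
function apply_set($title, array $set)
{
    $pairs = array();
    foreach ($set as $code_point => $replacement) {
        // Turn the decimal code point into its UTF-8 character.
        $char = html_entity_decode('&#' . $code_point . ';', ENT_QUOTES, 'UTF-8');
        $pairs[$char] = $replacement;
    }
    return strtr($title, $pairs);
}

$german = conversion_set_for('german', $base_set, $language_overrides);
$slovak = conversion_set_for('slovak', $base_set, $language_overrides);

echo apply_set("Nürnberg", $german), "\n"; // prints "Nuernberg"
echo apply_set("mäso", $slovak), "\n";     // prints "maso"
```

With this layout, a language without overrides simply falls back to the base set, so adding a new language means adding only the entries that differ.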

 
 
Posted: 25 June 2007 04:58 PM   [ Ignore ]   [ # 14 ]  
Research Assistant
Total Posts:  744
Joined  2002-03-16

Gabriel, I think Ingmar is right regarding the German umlauts ä, ö, ü: it is the common transliteration, and even search engines honor this form. Following your reasoning: nobody will search for “nurnberg”, but someone who doesn’t have a German keyboard might search for “Nuernberg” instead of “Nürnberg”. The built-in URL transliteration in EE replaces ü with ue as well.

Our examples suggest that an editable translation table would be preferable: depending on the language, one might want a different transliteration for Slovak than for German.

EDIT: Too late ... I see you had the same idea a couple of minutes before my post ;)

 Signature 

Markus Stolpmann • Expression Engine 1.6.3, build 20080319
Current project: kamaco.info - (german) website for Austrian publication coach K. Matejcek

 
 
Posted: 25 June 2007 06:34 PM   [ Ignore ]   [ # 15 ]  
Lab Assistant
Total Posts:  106
Joined  2002-10-13

I think English speakers—at least the American ones—would search for “Nuremberg” in all cases. Nürnberg is unknown to us. It’s a little bit like how there is no Moscow but that’s where we all want to go.

 
 
Posted: 25 June 2007 06:40 PM   [ Ignore ]   [ # 16 ]  
Research Assistant
Total Posts:  744
Joined  2002-03-16

Maybe, but they wouldn’t find anything with that search term on a German-language page, would they? I was referring to a German-speaking person looking for German information but not having a German keyboard. But this is far off topic: “Nürnberg” would be in the content of the page; it’s just a question of how to recode the URL title.

 
 
Posted: 25 June 2007 10:37 PM   [ Ignore ]   [ # 17 ]  
Moderator
Total Posts:  11077
Joined  2004-05-15
Gabriel - 25 June 2007 04:33 PM

… but why “ä” should be “ae” and not “a”?

Because that is the generally accepted transliteration in German (I can’t speak for Slovak, of course). Actually, EE handles it exactly that way right now. The good thing about an extension, of course, is that everybody can handle that as they like, without having to resort to hacks.
