URL Title Foreign Character Conversion
Posted: 10 February 2007 04:20 PM
Research Assistant
Total Posts:  968
Joined  06-13-2005

There are several systems for transliterating characters. Here is one of them:

/* -------------------------------------
/*  Create Foreign Character Conversion JS
/* -------------------------------------*/

$foreign_characters = array(
'223'    =>    "ss", // ß
// Cyrillic lowercase letters
'1072'    =>    "a",
'1073'    =>    "b",
'1074'    =>    "v",
'1075'    =>    "g",
'1076'    =>    "d",
'1077'    =>    "e",
'1105'    =>    "yo",
'1078'    =>    "zh",
'1079'    =>    "z",
'1080'    =>    "i",
'1081'    =>    "j",
'1082'    =>    "k",
'1083'    =>    "l",
'1084'    =>    "m",
'1085'    =>    "n",
'1086'    =>    "o",
'1087'    =>    "p",
'1088'    =>    "r",
'1089'    =>    "s",
'1090'    =>    "t",
'1091'    =>    "u",
'1092'    =>    "f",
'1093'    =>    "h",
'1094'    =>    "c",
'1095'    =>    "ch",
'1096'    =>    "sh",
'1097'    =>    "sch",
'1099'    =>    "y",
'1101'    =>    "e",
'1102'    =>    "yu",
'1103'    =>    "ya",
// Cyrillic uppercase letters, mapped to the same lowercase output
'1040'    =>    "a",
'1041'    =>    "b",
'1042'    =>    "v",
'1043'    =>    "g",
'1044'    =>    "d",
'1045'    =>    "e",
'1025'    =>    "yo",
'1046'    =>    "zh",
'1047'    =>    "z",
'1048'    =>    "i",
'1049'    =>    "j",
'1050'    =>    "k",
'1051'    =>    "l",
'1052'    =>    "m",
'1053'    =>    "n",
'1054'    =>    "o",
'1055'    =>    "p",
'1056'    =>    "r",
'1057'    =>    "s",
'1058'    =>    "t",
'1059'    =>    "u",
'1060'    =>    "f",
'1061'    =>    "h",
'1062'    =>    "c",
'1063'    =>    "ch",
'1064'    =>    "sh",
'1065'    =>    "sch",
'1067'    =>    "y",
'1069'    =>    "e",
'1070'    =>    "yu",
'1071'    =>    "ya",
);

But that is a hack that has to be reapplied after each update.

So, the request is for an extension that lets users apply their preferred conversion system without re-editing cp.publish.php each time.
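
To make the idea concrete, here is a minimal sketch of how a code-point map like the one above could be applied to a title. The function name transliterate_title is hypothetical and this is not the code from cp.publish.php; it assumes PHP 7.2 or later with the mbstring extension and UTF-8 input.

```php
<?php
// Hypothetical helper (not part of ExpressionEngine): replaces each character
// whose Unicode code point appears in $map and leaves all other characters as-is.
function transliterate_title(string $title, array $map): string
{
    $out = '';
    // Split the UTF-8 string into individual characters.
    foreach (preg_split('//u', $title, -1, PREG_SPLIT_NO_EMPTY) as $char) {
        $code = (string) mb_ord($char, 'UTF-8');
        $out .= $map[$code] ?? $char;
    }
    return $out;
}

// A few entries taken from the array in this post.
$map = array(
    '223'  => 'ss', // ß
    '1084' => 'm',  // м
    '1080' => 'i',  // и
    '1088' => 'r',  // р
);

echo transliterate_title('мир', $map), "\n"; // prints: mir
```

Characters without an entry pass through unchanged, so a partial map like this one is safe to try out before filling in a whole alphabet.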

Thanks.

 Signature 

tulks.com | expressionengine.lv

Posted: 24 June 2007 12:33 PM   [ # 1 ]
Summer Student
Total Posts:  24
Joined  09-20-2006

Same here. I am editing these files:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž

Posted: 24 June 2007 01:12 PM   [ # 2 ]
Moderator
Total Posts:  13102
Joined  05-15-2004

You really should make this a feature request. The developers will probably listen, but they are monolingual, so they would not know which letters to transliterate to what.

We have support for most characters in the Latin-1 (ISO-8859-1) charset already. Since we already get “ü -> ue” in URLs, there is no reason why other accented characters shouldn’t work.

 Signature 

Everything will be good in the end. If it’s not good, it’s not the end.

Posted: 24 June 2007 01:17 PM   [ # 3 ]
Summer Student
Total Posts:  24
Joined  09-20-2006

Thanks Ingmar, I’ll make a feature request!

Posted: 24 June 2007 01:25 PM   [ # 4 ]
Summer Student
Total Posts:  24
Joined  09-20-2006

Regarding this topic, I would like to request a change to part of the code in:

/core/core.regex.php; start line: 858
/cp/cp.publish.php; start line: 660
/modules/weblog/mod.weblog_standalone.php; start line: 307

so that it supports multilingual URL Title conversion.

'138'    =>    "s",    // Š
'142'    =>    "z",    // Ž
'154'    =>    "s",    // š
'158'    =>    "z",    // ž
'159'    =>    "y",    // Ÿ
'192'    => "a", '193'    => "a", '194'    => "a", '195'    => "a", '196'    => "a", '197'    => "a", '198'    => "a",    // À, Á, Â, Ã, Ä, Å, Æ
'200'    =>    "e", '201'    =>    "e", '202'    =>    "e", '203'    =>    "e",    // È, É, Ê, Ë
'204'    =>    "i", '205'    =>    "i", '206'    =>    "i", '207'    =>    "i",    // Ì, Í, Î, Ï
'210'    => "o", '211'    => "o", '212'    => "o", '213'    => "o", '214'    => "o",    // Ò, Ó, Ô, Õ, Ö
'217'    => "u", '218'    => "u", '219'    => "u", '220'    => "u",    // Ù, Ú, Û, Ü
'221'    => "y",    // Ý
'223'    =>    "ss", // ß
'224'    =>    "a", '225'    => "a", '226'    => "a", '229'    => "a",    // à, á, â, å
'227'    =>    "a", '228'    => "a", '230'    =>    "a",    // ã, ä, æ
'199'    =>    "c", '231'    =>    "c",    // Ç, ç
'232'    =>    "e", '233'    =>    "e", '234'    =>    "e", '235'    =>    "e",    // è, é, ê, ë
'236'    => "i", '237'    => "i", '238'    => "i", '239'    => "i",    // ì, í, î, ï
'241'    =>    "n",    // ñ
'242'    => "o", '243'    => "o", '244'    => "o", '245'    => "o", '246'    =>    "o",    // ò, ó, ô, õ, ö
'249'    => "u", '250'     => "u", '251'    => "u", '252'    =>    "u",    // ù, ú, û, ü
'253'    =>    "y", '255'    =>    "y",    // ý, ÿ
'256'    =>    "a", '257'    =>    "a",     // Ā, ā
'268'    =>    "c", '269'    =>    "c",     // Č, č
'270'    =>    "d", '271'    =>    "d",     // Ď, ď
'274'    =>    "e", '275'    =>    "e",     // Ē, ē
'276'    =>    "e", '277'    =>    "e",     // Ĕ, ĕ
'282'    =>    "e", '283'    =>    "e",     // Ě, ě
'290'    =>    "g", '291'    =>    "g",     // Ģ, ģ
'298'    =>    "i", '299'    =>    "i",     // Ī, ī
'310'    =>    "k", '311'    =>    "k",     // Ķ, ķ
'313'    =>    "l", '314'        =>    "l",     // Ĺ, ĺ
'315'    =>    "l", '316'        =>    "l",     // Ļ, ļ
'317'    =>    "l", '318'        =>    "l",     // Ľ, ľ
'321'    =>    "l", '322'        =>    "l",     // Ł, ł
'325'    =>    "n", '326'    =>    "n",    // Ņ, ņ
'327'    =>    "n", '328'    =>    "n",    // Ň, ň
'340'    =>    "r", '341'    =>    "r",    // Ŕ, ŕ
'344'    =>    "r", '345'    =>    "r",    // Ř, ř
'352'    =>    "s", '353'    =>    "s",    // Š, š
'356'    =>    "t", '357'    =>    "t",    // Ť, ť
'362'    =>    "u", '363'    =>    "u",    // Ū, ū
'366'    =>    "u", '367'    =>    "u",    // Ů, ů
'381'    =>    "z", '382'    =>    "z",    // Ž, ž

Posted: 24 June 2007 03:36 PM   [ # 5 ]
Administrator
Total Posts:  15005
Joined  06-03-2002

No need.  From the 1.6 changelog:

“Added foreign_character_conversion_array extension hook to allow developers to use a custom foreign character conversion array for URL titles.”
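
As a rough sketch of what an extension for this hook might look like: the class name and method name below are hypothetical, and the sketch assumes (without confirmation in this thread) that the hook uses the method's return value as the replacement conversion array. Registering the class for the hook is not shown.

```php
<?php
// Hypothetical extension class. The names are illustrative only, and the
// assumption is that the hook takes the returned array as the new
// code point => replacement map for URL titles.
class Custom_url_title_chars
{
    // Returns the custom conversion array.
    function conversion_array()
    {
        return array(
            '223'  => 'ss', // ß
            '352'  => 's',  // Š
            '353'  => 's',  // š
            '381'  => 'z',  // Ž
            '382'  => 'z',  // ž
            '1078' => 'zh', // ж
        );
    }
}

// Standalone usage, outside of ExpressionEngine:
$ext = new Custom_url_title_chars();
$foreign_characters = $ext->conversion_array();
```

Keeping the array in an extension rather than in cp.publish.php means it should survive updates, which is the point of the original request.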

Posted: 24 June 2007 07:21 PM   [ # 6 ]
Lab Assistant
Total Posts:  115
Joined  05-25-2002

I’ve made an extension based on the “foreign_character_conversion_array” hook.

Title: Strange URL Interpreter
Purpose: Lets you convert foreign-language characters in an entry’s URL Title to proper replacement characters.

Version 1.1.0 supports more than the Slovak, Czech, Hungarian and Russian alphabets. If any characters in your native language are unsupported or wrongly converted, let me know.

Version: 1.1.0 (25/06/2007)
Version: 1.2.0 (07/07/2007)
- Added: Lira, Degree, Yen, Pound and Cent sign
- Fixed: Update and Disable function
Version: 1.3.0 (22/09/2007)
- Added: Character set supporting the Polish language (Thanks to Gabriel Borkowski).

EDIT: The current version of the extension is here

Image Attachments
Strange_URL_Interpreter_Screenshot.png
File Attachments
strange_url_interpreter_130.zip  (File Size: 8KB - Downloads: 193)
 Signature 

Caleydon Media

+ CM Strange URL Interpreter
+ CM Textarea Resizer
+ CM Template Editor (EditArea)

Posted: 24 June 2007 10:59 PM   [ # 7 ]
Administrator
Total Posts:  15005
Joined  06-03-2002

Fast work, Gabriel!  Say, if some others can check the accuracy of this conversion array, and if you name it something more specific than “Foreign URL Title”, we can probably add this to the repository.  Though, being rather ignorant of the alphabets of non-Latin languages, I do not have a good name to suggest that would cover those four alphabets.

Posted: 25 June 2007 01:44 AM   [ # 8 ]
Moderator
Total Posts:  13102
Joined  05-15-2004

Nice work. Of course, I totally missed that hook in the changelog.

As a native German speaker, I’d like to add that German is fully supported as well (as it has, in fact, been by EE natively for some time), and so is French, as far as I can tell.

Posted: 25 June 2007 02:57 PM   [ # 9 ]
Lab Assistant
Total Posts:  115
Joined  05-25-2002

I prepared a conversion table for Strange URL Interpreter v1.3.0.

Posted: 25 June 2007 03:04 PM   [ # 10 ]
Administrator
Total Posts:  15005
Joined  06-03-2002

lol, that’s not quite what I meant for changing the extension name.  Perhaps some others with knowledge of these alphabets will have a better idea.

Posted: 25 June 2007 03:54 PM   [ # 11 ]
Moderator
Total Posts:  13102
Joined  05-15-2004

Also, I don’t quite agree with some of the transliterations. I think “ä” should be “ae”, not “a”. Same for ö (oe), ü (ue) and probably æ (ae).

Posted: 25 June 2007 04:33 PM   [ # 12 ]
Lab Assistant
Total Posts:  115
Joined  05-25-2002

Ingmar: A universal conversion is not possible, but why should “ä” be “ae” and not “a”? I think the URL title was developed as a tool for search engines (SEO), not for people. Who reads the titles in URLs, especially entry titles?

For example: take the Slovak word “mäso” (meat). When I search Google for “maso”, I get results related to both “mäso” and “maso”. Nobody will search for “maeso”, because this word doesn’t exist! That means converting “ä” to “ae” simply harms the PageRank of your website.

I am sure you will find a similar example in your language. If not, let me know.

Posted: 25 June 2007 04:52 PM   [ # 13 ]
Lab Assistant
Total Posts:  115
Joined  05-25-2002

I’ve got an idea. Maybe it would be useful to create separate conversion sets for different languages. In the extension’s settings you would then have the option to choose whichever conversion set you want.
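
A minimal sketch of that idea, assuming the sets are keyed by a language code. The codes 'de' and 'sk', the variable names and the function conversion_set_for are all made up for illustration and are not part of the extension.

```php
<?php
// Hypothetical per-language conversion sets, keyed by an illustrative language code.
// The same code point (228, ä) maps differently depending on the language.
$conversion_sets = array(
    'de' => array('228' => 'ae', '246' => 'oe', '252' => 'ue', '223' => 'ss'),
    'sk' => array('228' => 'a', '269' => 'c', '353' => 's', '382' => 'z'),
);

// Returns the set for the chosen language, or an empty array if it is unknown.
function conversion_set_for(array $sets, string $language): array
{
    return $sets[$language] ?? array();
}

$german = conversion_set_for($conversion_sets, 'de');
$slovak = conversion_set_for($conversion_sets, 'sk');

echo $german[228], "\n"; // prints: ae
echo $slovak[228], "\n"; // prints: a
```

With the German set “ä” becomes “ae”, while with the Slovak set “mäso” would become “maso”, so each site can pick the behaviour that suits its language.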

Posted: 25 June 2007 04:58 PM   [ # 14 ]
Research Assistant
Total Posts:  756
Joined  03-16-2002

Gabriel, I think Ingmar is right regarding the German umlauts ä, ö, ü: it is the common transliteration, and even search engines honor this form. To follow your line of thought: nobody will search for “nurnberg”, but someone who doesn’t have a German keyboard might search for “Nuernberg” instead of “Nürnberg”. The built-in URL transliteration in EE replaces ü with ue as well.

Taking our examples into consideration, this shows that an editable translation table would perhaps be preferable: depending on the language, one might want different transliterations for Slovak and German.

EDIT: Too late ... I see you had the same idea a couple of minutes before my post ;)

 Signature 

Markus Stolpmann • Expression Engine 1.6.4
Current project: kamaco.info - (german) website for Austrian publication coach K. Matejcek

Posted: 25 June 2007 06:34 PM   [ # 15 ]
Lab Assistant
Total Posts:  122
Joined  10-13-2002

I think English speakers—at least the American ones—would search for “Nuremberg” in all cases. Nürnberg is unknown to us. It’s a little bit like how there is no Moscow but that’s where we all want to go.

Posted: 25 June 2007 06:40 PM   [ # 16 ]
Research Assistant
Total Posts:  756
Joined  03-16-2002

Maybe - but they wouldn’t find anything with that search term on a German-language page, would they? I was referring to a German-speaking person looking for German information but not having a German keyboard. But this is far off topic - “Nürnberg” would be in the content of the page, it’s just a question of how to recode the URL title.

Posted: 25 June 2007 10:37 PM   [ # 17 ]
Moderator