Extension: Unicode Normalizer
Posted: 09 February 2010 10:57 AM
Lab Technician
Total Posts:  1135
Joined  02-02-2007

The attached add-on files may be out of date.  Please visit Devot-ee for the latest versions.

In many languages some characters can be used together with one or more diacritics.

In Unicode, a character with diacritics can be represented either by a single symbol or by a sequence of several symbols: the main character followed by one or more combining diacritical marks.
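To make the two representations concrete, here is a minimal sketch in Python (chosen only for illustration; the extension itself runs under PHP). The variable names are hypothetical.

```python
import unicodedata

# The same visible letter "é", stored in two different ways.
precomposed = "\u00e9"   # one symbol: LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # two symbols: "e" followed by COMBINING ACUTE ACCENT

# The strings render alike but are not equal as code point sequences.
print(len(precomposed), len(combining))   # 1 2
print(precomposed == combining)           # False

# List the individual symbols that make up the combining form.
for ch in combining:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The second form is the one that depends on the font and OS placing the mark correctly.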

Support for combining diacritics varies widely among OSes and fonts. In many cases combining diacritics are placed wrongly and make the text look ugly. To avoid this, characters with diacritics should be represented by single symbols wherever possible.

Site admins have no control over which symbols users enter. Many users know nothing about Unicode or the possible ways of representing characters with diacritics; they just type or copy/paste the text.

The Unicode Normalizer extension converts all user input (i.e. weblog entries, comments and forum posts) to Unicode Normalization Form C (NFC) when it is submitted. In NFC, characters with diacritics are represented by single symbols wherever a precomposed form exists.
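The conversion to Normalization Form C can be sketched as follows. This is an illustration in Python, not the extension's own code: the function name normalize_input is hypothetical, and the extension presumably relies on the equivalent facility that PHP's intl extension provides.

```python
import unicodedata

def normalize_input(text: str) -> str:
    """Return text converted to Unicode Normalization Form C (NFC)."""
    return unicodedata.normalize("NFC", text)

# "e" + COMBINING ACUTE ACCENT is replaced by the single symbol U+00E9.
submitted = "cafe\u0301"
stored = normalize_input(submitted)
print(len(submitted), len(stored))   # 5 4

# Unicode has no precomposed "x with acute", so this pair is left as it is.
print(len(normalize_input("x\u0301")))   # 2
```

The second call shows the limit of the approach: NFC can only compose sequences for which a single-symbol form exists.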

REQUIREMENTS: This extension requires either PHP version 5.3 or higher, or the PECL intl extension for PHP.

 Signature 

Full list of add-ons here

Child Categories
Browser Sniff
Category Id
Entries Number

Posted: 09 February 2010 04:06 PM   [ # 1 ]
Chancellor's Fellow
Total Posts:  33338
Joined  05-15-2004

To be honest, I am amazed to see that there is a need for such an extension. I have never heard of people using combining diacritical marks. Is that really an issue?

Posted: 10 February 2010 02:12 PM   [ # 2 ]
Lab Technician

Hi Ingmar,

Support for precomposed characters in the major European languages is good, and the use of combining diacritics is very rare. But for some other languages, good font support for precomposed characters is more recent, and there is a lot of digital content in which combining diacritical marks are used. Copy/pasting from such content can make the text on the website look ugly.

For example, I’m active in a project that publishes texts in Unicode’s Greek polytonic subrange. Vowels in this subrange can carry up to 3 diacritics, and in texts that were digitized several years ago those diacritics are often represented by combining marks.
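As a worked illustration of the Greek polytonic case, the sketch below (in Python, for illustration only; the variable names are hypothetical) takes an alpha carrying three combining marks and composes it into one symbol.

```python
import unicodedata

# Alpha followed by three combining marks: psili, oxia and ypogegrammeni.
decomposed = "\u03b1\u0313\u0301\u0345"

for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# NFC folds the four symbols into one precomposed polytonic letter.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))             # 4 1
print([f"U+{ord(ch):04X}" for ch in composed])    # ['U+1F84']
```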

Posted: 10 February 2010 02:15 PM   [ # 3 ]
Chancellor's Fellow

Thanks for clarifying, Laisvunas :)

Posted: 19 May 2010 02:44 PM   [ # 4 ]
Research Assistant
Total Posts:  400
Joined  03-07-2007

Indeed, a most obscure usage and an edge case. Glad that extensions like this are being made and are available. Thanks, Lais.
