Extension: Unicode Normalizer

Development and Programming

Laisvunas
879 posts
16 years ago

The attached add-on files may be out of date. Please visit Devot-ee for the latest versions.

In many languages some characters can be used together with one or more diacritics.

In Unicode, a character with diacritics can be represented either by one symbol (a precomposed character) or by a sequence of several symbols consisting of the base character followed by one or more combining diacritical marks.
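To make the two representations concrete, here is a minimal Python sketch (Python is used only for illustration; it is not the language of the extension). It builds "é" both ways and shows that the two strings look the same but are not equal when compared code point by code point. The variable names are made up for this example.

```python
import unicodedata

# "é" as one precomposed symbol: U+00E9
precomposed = "\u00e9"

# "é" as a base letter plus a combining mark: U+0065 followed by U+0301
combining = "e\u0301"

# Same appearance on screen, different length and different contents
print(len(precomposed), len(combining))
print(precomposed == combining)

# Inspect the individual code points of the combining sequence
for ch in combining:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Because the strings differ at the code point level, searching and comparing text can also behave unexpectedly when both forms are mixed in the same database.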

Support for combining diacritics varies widely among operating systems and fonts. In many cases combining diacritics are positioned incorrectly and make the text look ugly. To avoid this, characters with diacritics should be represented by single symbols whenever possible.

Site admins have no control over which symbols users enter. Many users know nothing about Unicode or the possible ways of representing characters with diacritics; they just type or copy and paste text.

The Unicode Normalizer extension converts all user input (i.e. weblog entries, comments, and forum posts) on submission into Unicode Normalization Form C (NFC), in which characters with diacritics are represented by single symbols wherever a precomposed form exists.
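The conversion step can be sketched roughly as follows. This is a Python illustration using the standard `unicodedata` module, not the extension's actual code: the extension is written for PHP, where the intl `Normalizer` class offers comparable functionality, and how the extension calls it is not shown in this thread. The function name `to_nfc` and the sample input are hypothetical.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)

# Simulated user input: "Café" typed with a combining acute accent
submitted = "Cafe\u0301"

# Normalize before storing, as the extension is described as doing on submission
stored = to_nfc(submitted)

print(len(submitted), len(stored))
print([f"U+{ord(ch):04X}" for ch in stored])
```

Normalizing text that is already in NFC leaves it unchanged, so applying the step to every submission should be safe for input that never contained combining marks.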

REQUIREMENTS: This extension requires either PHP 5.3 or higher, or the PECL intl extension for PHP.

       
Ingmar
29,245 posts
16 years ago

To be honest, I am amazed to see that there is a need for such an extension. I have never heard of people using composite diacritical marks. Is that really an issue?

       
Laisvunas
879 posts
16 years ago

Hi Ingmar,

Support for precomposed characters in the major European languages is good, and the use of combining diacritics there is very rare. But for some other languages, good font support for precomposed characters is more recent, and there is a lot of digital content in which combining diacritical marks are used. Copying and pasting from such content can make the text on a website look ugly.

For example, I’m active in a project that publishes texts in Unicode’s Greek polytonic subrange. Vowels in this subrange can carry up to 3 diacritics, and in texts digitized several years ago those diacritics are often represented by combining marks.
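As an illustration of the Greek polytonic case, the following Python sketch takes an omega written with three combining marks (dasia, perispomeni, and ypogegrammeni) and composes it into a single symbol. The specific letter is chosen only as an example and does not come from the project mentioned above.

```python
import unicodedata

# Omega followed by three combining marks:
# U+0314 (dasia), U+0342 (perispomeni), U+0345 (ypogegrammeni)
decomposed = "\u03c9\u0314\u0342\u0345"

# NFC composes the whole sequence into one precomposed symbol
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))
print(f"U+{ord(composed):04X}", unicodedata.name(composed))
```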

       
Ingmar
29,245 posts
16 years ago

Thanks for clarifying, Laisvunas 😊

       
GDmac - expocom
350 posts
15 years ago

Indeed, the most obscure of usages and edge cases. Glad that some of these extensions have been made and are available. Thanks, Lais.

       
