Unicode anomaly

From Wikipedia, the free encyclopedia


The Unicode Standard imposes strict stability rules on itself.^[1] Depending on how strict a given rule is, a change may be prohibited or allowed. For example, the "Name" assigned to a code point cannot and will not change, whereas the "Script" property is more flexible under Unicode's own rules. In version 2.0, Unicode changed many code point Names from version 1 and, at the same time, stated that from then on a Name assigned to a code point would never change. As a consequence, published mistakes cannot be corrected, even trivial ones (as happened in one instance with the spelling BRAKCET for BRACKET in a character name).

Anomalies

In 2006, Unicode published a list of anomalies in character names.^[2]

U+0818 ࠘ SAMARITAN MARK DAGESH and U+0819 ࠙ SAMARITAN MARK OCCLUSION: the names are mixed up.

Corrected text, with the names swapped:

U+0818 ࠘ SAMARITAN MARK OCCLUSION (HTML &#2072; · "strengthens" the consonant, for example changing /w/ to /b/) and

U+0819 ࠙ SAMARITAN MARK DAGESH (HTML &#2073; · indicates consonant gemination)^[3]

U+2118 ℘ SCRIPT CAPITAL P (HTML &#8472; · &weierp;): it is not a capital

The name says "capital", but the character is a small letter. The true capital is U+1D4AB 𝒫 MATHEMATICAL SCRIPT CAPITAL P (HTML &#119979;).^[4]
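The difference between the two characters can be inspected programmatically. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` and `html` modules; the variable names are illustrative only and do not come from the Unicode Standard.

```python
import html
import unicodedata

# The two characters discussed above.
weierstrass = "\u2118"      # named SCRIPT CAPITAL P, although it is not a capital
script_p = "\U0001D4AB"     # the actual script capital P

for ch in (weierstrass, script_p):
    # Print the code point and the formal character name from the database.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2118 SCRIPT CAPITAL P
# U+1D4AB MATHEMATICAL SCRIPT CAPITAL P

# The HTML named entity &weierp; resolves to U+2118, not to U+1D4AB.
print(html.unescape("&weierp;") == weierstrass)   # True
```

The formal names alone do not reveal which of the two is the capital; only the code points and the annotations in the standard distinguish them.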

U+FE18 ︘ PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET (HTML &#65048;): BRAKCET is misspelled (it should read BRACKET). Because this is the formal character name, which is fixed by policy, it cannot be changed.^[5]
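Because the name is frozen, the misspelling is what software sees when it queries the character database. A short sketch, again assuming Python 3's standard `unicodedata` module (the variable names are hypothetical):

```python
import unicodedata

bracket = "\uFE18"
name = unicodedata.name(bracket)

# The formal name still carries the published typo.
print(name)
# PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET

# Looking the character up by that exact (misspelled) name round-trips.
print(unicodedata.lookup(name) == bracket)   # True
```

Code that matches characters by name therefore has to use the misspelled form when it relies on the formal name.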

References

Retrieved from "https://en.wikipedia.org/w/index.php?title=Unicode_anomaly&oldid=677382609"
