The Chinese, Japanese and Korean (CJK) scripts share a common background, and the characters they have in common are collectively known as CJK characters. During the process called Han unification, these shared characters were identified and named CJK Unified Ideographs. As of Unicode 16.0, Unicode defines a total of 97,680 CJK unified ideographs.[1]
Until the early 20th century, Vietnam also used Chinese characters (chữ Nôm), so the abbreviation CJKV is sometimes used.
Sources
The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. IRG processes proposals for new CJK unified ideographs submitted by its member bodies, and after undergoing several rounds of expert review, IRG submits a consolidated set of characters to ISO/IEC JTC 1/SC 2 Working Group 2 (WG2) and the Unicode Technical Committee (UTC) for consideration for inclusion in the ISO/IEC 10646 and Unicode standards. The following IRG member bodies have been involved in the standardization of CJK unified ideographs:
The ideographs submitted by the UTC and the United Kingdom are not specific to any particular region, but are characters which have been suggested for encoding by individual experts. The ideographs submitted by SAT are required for the SAT Daizōkyō text database.
The table below gives the numbers of encoded CJK unified ideographs for each IRG source for Unicode 16.0.[2] The total number of characters (260,840) far exceeds the number of encoded CJK unified ideographs (97,680) as many characters have more than one source.
The basic block named CJK Unified Ideographs (4E00–9FFF) contains 20,992 basic Chinese characters in the range U+4E00 through U+9FFF. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system, hanja used in Korea, and chữ Nôm characters used in Vietnamese. Many characters in this block are shared by all of these writing systems, while others appear in only some of them. The first 20,902 characters in the block are arranged according to the Kangxi Dictionary ordering of radicals. Within each radical, the characters written with the fewest strokes are listed first. The remaining characters were added later, and so are not in radical order.
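The counts quoted above follow directly from the code point ranges. The following is a minimal Python sketch of that arithmetic; the constant and variable names are illustrative only and do not come from the Unicode Standard.

```python
# Bounds and counts quoted in the text; the names are illustrative.
BLOCK_START = 0x4E00
BLOCK_END = 0x9FFF
RADICAL_ORDERED_COUNT = 20_902  # characters arranged in Kangxi radical order

# Size of the whole block (both ends inclusive).
block_size = BLOCK_END - BLOCK_START + 1

# Last code point of the radical-ordered run, derived from the count above.
radical_ordered_end = BLOCK_START + RADICAL_ORDERED_COUNT - 1

# Code points left over for characters added later.
later_additions = block_size - RADICAL_ORDERED_COUNT

print(f"block size: {block_size}")
print(f"radical-ordered run ends at U+{radical_ordered_end:04X}")
print(f"later additions: {later_additions}")
```

Under these assumptions the radical-ordered run ends one code point before U+9FA6, which leaves 90 code points at the end of the block for later additions.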
The block is the result of Han unification,[4] which was somewhat controversial within East Asia.[5] Since Chinese, Japanese and Korean characters were coded in the same location, the appearance of a selected glyph could depend on the particular font being used. However, the source separation rule states that characters encoded separately in an earlier character set would remain separate in the new Unicode encoding.[6]
Using variation selectors, it is possible to specify certain variant CJK ideographs within Unicode.[7] The Adobe-Japan1 character set, which has 14,684 ideographic variation sequences,[8] is an extreme example of the use of variation selectors.[9]
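As an illustration, a variation sequence is simply a base character followed by a variation selector. The Python sketch below assumes that ideographic variation sequences use the supplementary selectors U+E0100 through U+E01EF; the base character 辻 (U+8FBB) is chosen only as an example, the helper name `split_sequences` is hypothetical, and whether a particular pair is registered in any collection is not checked here.

```python
import unicodedata

# Assumed range of the selectors used in ideographic variation sequences.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF

BASE = "\u8FBB"      # example base ideograph only
VS17 = "\U000E0100"  # first selector in the assumed range


def split_sequences(text):
    """Pair each character with the variation selector that follows it, if any."""
    pairs = []
    for ch in text:
        is_selector = IVS_FIRST <= ord(ch) <= IVS_LAST
        if is_selector and pairs and pairs[-1][1] is None:
            # Attach the selector to the preceding base character.
            pairs[-1] = (pairs[-1][0], ch)
        else:
            pairs.append((ch, None))
    return pairs


text = BASE + VS17 + BASE
pairs = split_sequences(text)
print(unicodedata.name(VS17))
print([(f"U+{ord(b):04X}", s and f"U+{ord(s):05X}") for b, s in pairs])
```

Because the selector is an ordinary code point in the text, a plain base character and the same character followed by a selector compare as different strings, so software that searches or compares text may need to strip or account for selectors.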
Note: Most characters appear in multiple sources, so the sum of individual character counts (108,480) is far greater than the number of encoded characters (20,992).[10]
In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to the code points U+9FA6 through U+9FBB. Since then, further characters have been added to this block for various reasons, all summarized in the version history section below.
Note: Most characters appear in more than one source, so the sum of individual character counts (23,954) is far greater than the number of encoded characters (6,592).[10]
The block named CJK Unified Ideographs Extension B (20000–2A6DF) contains 42,720 characters in the range U+20000 through U+2A6DF. These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Hán-Nôm characters that were formerly used to write Vietnamese.
Note: Many characters appear in more than one source, so the sum of individual character counts (99,784) is far greater than the number of encoded characters (42,720).[10]
The block named CJK Unified Ideographs Extension C (2A700–2B73F) contains 4,154 characters in the range U+2A700 through U+2B739. It was initially added in Unicode 5.2 (2009).
Note: Some characters appear in more than one source, so the sum of individual character counts (4,634) is greater than the number of encoded characters (4,154).[10]
The block named CJK Unified Ideographs Extension D (2B740–2B81F) contains 222 characters in the range U+2B740 through U+2B81D that were added in Unicode 6.0 (2010).
Note: Some characters appear in more than one source, so the sum of individual character counts (239) is greater than the number of encoded characters (222).[10]
The block named CJK Unified Ideographs Extension E (2B820–2CEAF) contains 5,762 characters in the range U+2B820 through U+2CEA1 that were added in Unicode 8.0 (2015).
Note: Some characters appear in more than one source, so the sum of individual character counts (5,919) is greater than the number of encoded characters (5,762).[10]
The block named CJK Unified Ideographs Extension F (2CEB0–2EBEF) contains 7,473 characters in the range U+2CEB0 through U+2EBE0 that were added in Unicode 10.0 (2017). It includes more than 1,000 Sawndip characters for Zhuang.
Note: Some characters appear in more than one source, so the sum of individual character counts (7,775) is greater than the number of encoded characters (7,473).[10]
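The ranges quoted above for the basic block and for Extensions B through F can be collected into a small lookup table. The Python sketch below is illustrative only: the table `RANGES` and the helper `block_of` are hypothetical names, the upper bounds are the last assigned code points given in the text rather than the block ends, and blocks whose ranges are not stated in this article are omitted.

```python
# (name, first code point, last assigned code point), as quoted in the text.
RANGES = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("Extension B", 0x20000, 0x2A6DF),
    ("Extension C", 0x2A700, 0x2B739),
    ("Extension D", 0x2B740, 0x2B81D),
    ("Extension E", 0x2B820, 0x2CEA1),
    ("Extension F", 0x2CEB0, 0x2EBE0),
]


def block_of(ch):
    """Return the name of the listed range containing ch, or None."""
    cp = ord(ch)
    for name, first, last in RANGES:
        if first <= cp <= last:
            return name
    return None


# Number of code points in each range (both ends inclusive).
sizes = {name: last - first + 1 for name, first, last in RANGES}

for name, size in sizes.items():
    print(f"{name}: {size:,}")
print(block_of("\u4E2D"), block_of("\U0002017B"), block_of("A"))
```

The computed sizes match the character counts stated for each block, which is a quick consistency check on the quoted ranges.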
Note: Some characters appear in more than one source, so the sum of individual character counts (5,081) is greater than the number of encoded characters (4,939).[10]
Note: Some characters appear in more than one source, so the sum of individual character counts (4,309) is greater than the number of encoded characters (4,192).[10]
Note: Some characters appear in more than one source, making the sum of individual character counts (625) more than the number of encoded characters (622).[10]
ID system of the Ministry of Public Security of China, 2023 | 622 | 622
Japan | JMJ | Character Information Development and Maintenance Project for e-Government "MojiJoho-Kiban Project" (文字情報基盤整備事業) | 1 | 1
n/a | UTC | UTC sources | 2 | 2
CJK Compatibility Ideographs
The block named CJK Compatibility Ideographs (F900–FAFF) was created to retain round-trip compatibility with other standards.
However, twelve characters in this block actually have the "Unified Ideograph" property: U+FA0E 﨎, U+FA0F 﨏, U+FA11 﨑, U+FA13 﨓, U+FA14 﨔, U+FA1F 﨟, U+FA21 﨡, U+FA23 﨣, U+FA24 﨤, U+FA27 﨧, U+FA28 﨨, and U+FA29 﨩.[1] None of the other characters in this and other "Compatibility" blocks relate to CJK unification.
While 龜 and 亀 are not considered unifiable, U+FA20 蘒 CJK COMPATIBILITY IDEOGRAPH-FA20 is considered a duplicate of U+8612 蘒 CJK UNIFIED IDEOGRAPH-8612.
Note: All characters appear in more than one source, so the sum of individual character counts (40) is greater than the number of encoded characters (12).[10]
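One way to observe this distinction in software is to look at canonical decomposition mappings. The Python sketch below relies on the standard `unicodedata` module as its source of character data, which is an assumption about tooling rather than something described in this article; the helper name `canonical_target` is hypothetical. It checks that the twelve code points listed above have no decomposition, while the duplicate U+FA20 maps to U+8612.

```python
import unicodedata

# The twelve code points listed in the text as having the Unified Ideograph property.
UNIFIED_IN_COMPAT_BLOCK = [
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
]


def canonical_target(cp):
    """Return the single code point that cp canonically decomposes to, or None."""
    parts = unicodedata.decomposition(chr(cp)).split()
    # Skip empty mappings, compatibility mappings (tagged "<...>"), and sequences.
    if len(parts) != 1 or parts[0].startswith("<"):
        return None
    return int(parts[0], 16)


duplicate_target = canonical_target(0xFA20)
unified_targets = [canonical_target(cp) for cp in UNIFIED_IN_COMPAT_BLOCK]

print(f"U+FA20 maps to U+{duplicate_target:04X}")
print("unified characters with a mapping:", [t for t in unified_targets if t is not None])
```

A practical consequence is that Unicode normalization replaces U+FA20 with U+8612, whereas the twelve unified characters pass through normalization unchanged.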
The character U+4039 (䀹) was a unification of two different characters (one with jiā 夾 phonetic and one with shǎn 㚒 phonetic) until Unicode 5.0. However, they were lexically different characters that should not have been unified; they have different pronunciations and different meanings.
The proposal to disunify U+4039[16] was accepted for Unicode 5.1, which encoded a new character at U+9FC3 (鿃) to represent shǎn.
Other 3 glyphs in Extension B
In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify glyphs from the Chinese Mainland and Vietnamese sources, while the last wrongly unifies glyphs from the Chinese Mainland and Taiwanese sources.[17]
Unifiable variants and exact duplicates
Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded by mistake.[18] Additionally, an ISO/IEC JTC 1/SC 2 report has found that six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a de facto disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:[19]
U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23.")
Other CJK ideographs in Unicode, not Unified
Apart from the ten blocks of "Unified Ideographs", Unicode has about a dozen more blocks of CJK characters that are not unified. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some characters have (decomposable) counterparts in other blocks, their usage can differ. An example of a CJK character that is not unified is U+3007 〇 IDEOGRAPHIC NUMBER ZERO in the CJK Symbols and Punctuation block. Although it is not covered by "CJK Unified Ideographs", it is treated as a CJK character for all other intents and purposes.[20]
Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:
They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.
Font support
The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being part of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually have fewer characters (about 13,000 and 8,000, respectively) than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista.[21]
Unicode version history
CJK unified ideographs additions per Unicode version
Suzanne Topping, "The secret life of Unicode". Archived from the original on 2007-11-14. Retrieved 2010-05-12.