GB stroke-based order

The GB stroke-based order, in full GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) (GB13000.1字符集汉字字序(笔画序)规范), is a standard released by the State Language Commission of China in 1999.[1] It is the current national standard for stroke-based sorting and has been applied to the arrangement of the List of Commonly Used Standard Chinese Characters (通用规范汉字表),[2] the new editions of Xinhua Zidian[3] and Xiandai Hanyu Cidian,[4] and other works.

GB13000.1 is a Chinese national standard whose Chinese character set is equivalent to that of the international standard ISO/IEC 10646:1993, or Unicode 1.1.[a] It is a large character set of 20,902 Chinese characters used in China, Japan and Korea (CJK).

The standard of GB stroke-based order includes two parts: (a) the sorting rules, and (b) a table with all the CJK characters of GB13000.1 Character Set sorted in standard stroke-based order.

Sorting rules

The rules for sorting Chinese characters are as follows.[5]

1. Rule of stroke counts

First, compare the numbers of strokes of the two Chinese characters to be sorted. If they are different, the one with fewer strokes is put before the one with more strokes. For example, character 十 (2 strokes) is before 干 (3), 沛 (7) before 泣 (8), and 爱 (10) before 愛 (13).

The strokes of a character are usually counted by the stroke order, to avoid missing or re-counting some of the strokes. For example, the stroke order of traditional Chinese character 愛 (love) is "㇓㇔㇔㇓㇔㇇㇔㇟㇔㇔㇓㇇㇏", altogether 13 strokes.
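Rule 1 can be illustrated with a minimal Python sketch. The stroke counts are the ones quoted above; the names STROKE_COUNTS, AI_STROKES and sort_by_stroke_count are hypothetical and not part of the standard.

```python
# Stroke counts quoted in the text for Rule 1. A real system would load
# them from a full stroke-order table instead of a hand-written dict.
STROKE_COUNTS = {"十": 2, "干": 3, "沛": 7, "泣": 8, "爱": 10, "愛": 13}

# Stroke order of 愛 as written in the text, one symbol per stroke.
AI_STROKES = "㇓㇔㇔㇓㇔㇇㇔㇟㇔㇔㇓㇇㇏"


def sort_by_stroke_count(chars):
    """Order characters by ascending stroke count (Rule 1 only)."""
    return sorted(chars, key=lambda ch: STROKE_COUNTS[ch])


# Counting the symbols of the stroke order gives the stroke count.
print(len(AI_STROKES))
print(sort_by_stroke_count(["愛", "泣", "十", "爱", "干", "沛"]))
```

Counting along the written stroke order, as the text recommends, reduces to taking the length of the stroke sequence.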

2. Rule of stroke orders

When the two Chinese characters have the same number of strokes, their corresponding strokes are compared pair by pair in stroke order to determine the order of the characters. The strokes of Chinese characters are divided into five groups: 1. heng (héng, 横, horizontal; this group includes primary stroke ㇐ and secondary stroke ㇀), 2. shu (shù, 竖, vertical, including primary 丨 and secondary 亅), 3. pie (piě, 撇, left falling, only stroke 丿, no secondary), 4. dian (diǎn, 点, dot, including primary 丶 and secondary ㇏), and 5. zhe (zhé, 折, fold, including primary 乛, and secondary ㇕, ㇅, ㇎, ㇡, ㇋, ㇊, ㇍, ㇈, ㇆, ㇌, ㇗, ㇞, ㇉, ㇙, ㇄, ㇟, ㇜, ㇛, ㇁, ㇢, ㇂, etc.). A stroke of group heng comes before a stroke of group shu, group shu before group pie, group pie before group dian, and group dian before group zhe. This is the so-called heng-shu-pie-dian-zhe (横竖撇点折) stroke group order.

For example, both characters 二 and 十 have two strokes, and both start with stroke 一. The second stroke of 二 is again 一 of the heng group, while the second stroke in 十 is 丨 of the shu group. According to the heng-shu-pie-dian-zhe order, heng is before shu, hence character 二 is before 十. Similarly, we have: 十 before 厂, 乃 before 又, and 义 before 叉.
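The comparison described above can be sketched in Python as a sort key that combines Rule 1 and Rule 2. The group-number codes below (1 heng, 2 shu, 3 pie, 4 dian, 5 zhe) are an illustrative transcription of the example characters, not data copied from the standard's table, and the names STROKE_CODES, stroke_key and sort_by_strokes are hypothetical.

```python
# Group-number stroke orders for the example characters
# (1 heng, 2 shu, 3 pie, 4 dian, 5 zhe). Illustrative transcription only.
STROKE_CODES = {
    "二": "11", "十": "12", "厂": "13",
    "乃": "53", "又": "54",
    "义": "434", "叉": "544",
}


def stroke_key(ch):
    """Rule 1 (stroke count) first, then Rule 2 (group order per stroke)."""
    code = STROKE_CODES[ch]
    # For codes of equal length, comparing the digit strings compares
    # the strokes pair by pair in heng-shu-pie-dian-zhe order.
    return (len(code), code)


def sort_by_strokes(chars):
    return sorted(chars, key=stroke_key)


print(sort_by_strokes(["叉", "十", "又", "二", "义", "乃", "厂"]))
```

Because the five groups are numbered in their sorting order, no custom comparison function is needed; plain string comparison of the codes suffices.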

In mainland China, there are two currently effective standards for stroke orders. One is the Stroke Orders of the Commonly-used Standard Chinese Characters (通用规范汉字笔顺规范),[6] with the stroke order of each character represented in specific strokes, e.g., 好: ㇛㇓㇐㇇㇚㇐.[b] The other is GB13000.1 Character Set Chinese Character Stroke Orders (GB13000.1字符集汉字笔顺规范),[7] with stroke orders represented in the five group numbers, e.g., 好: 531521. In Taiwan, the corresponding standard is the Handbook of Stroke Orders of Standard Commonly-used Chinese Characters (常用國字標準字體筆順手冊).[8]
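The relationship between the two representations can be sketched as a lookup from stroke symbol to group number. The mapping below covers only the stroke symbols that appear in this article's examples; STROKE_GROUP and to_group_code are hypothetical names, and a complete mapping would need every stroke form listed under Rule 2.

```python
# Partial mapping from stroke symbols to the five group numbers.
# Only symbols used in this article's examples are included.
STROKE_GROUP = {
    "㇐": 1, "㇀": 1,           # heng
    "㇑": 2, "㇚": 2,           # shu
    "㇓": 3,                    # pie
    "㇔": 4, "㇏": 4,           # dian
    "㇇": 5, "㇕": 5, "㇛": 5,  # zhe
}


def to_group_code(strokes):
    """Convert a stroke sequence into its group-number string."""
    return "".join(str(STROKE_GROUP[s]) for s in strokes)


print(to_group_code("㇛㇓㇐㇇㇚㇐"))               # 好
print(to_group_code("㇔㇇㇑㇔㇐㇑㇕㇐㇑㇕㇐㇑㇐"))  # 福
```

The conversion loses information (several stroke forms share one group number), which is why the later rules are needed to break ties.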

3. Rules of primary-secondary strokes

When two Chinese characters have the same number of strokes and the same stroke order by group, compare their primary and secondary stroke forms pair by pair in stroke order, according to the following rules.

3.1 The primary and secondary stroke forms of the heng, shu and dian groups

The primary stroke form comes before the secondary stroke forms. The order of stroke forms in each group is defined as follows.

Primary stroke form 一 is before secondary stroke ㇀, primary 丨 before secondary 亅, primary 丶 before secondary strokes in the order of ㇏ 乁 乀. For example, 子 is before 孑, 干 before 于, and 夕 before 久.
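As a rough sketch of Rule 3.1, each stroke form can be given a rank inside its group, and the ranks compared stroke by stroke. The ranks follow the order stated above; the stroke sequences for 干 and 于 are an illustrative transcription, and FORM_RANK and form_key are hypothetical names.

```python
# Rank of each stroke form inside its group: 0 for the primary form,
# higher numbers for secondary forms, in the order given in the text.
FORM_RANK = {
    "一": 0, "㇀": 1,                    # heng
    "丨": 0, "亅": 1,                    # shu
    "丶": 0, "㇏": 1, "乁": 2, "乀": 3,  # dian
}


def form_key(strokes):
    """Tie-break key for characters whose group sequences are identical.

    Forms not listed (for example zhe strokes, covered by Rule 3.2)
    are treated as rank 0 in this simplified sketch.
    """
    return tuple(FORM_RANK.get(s, 0) for s in strokes)


GAN = ["一", "一", "丨"]  # 干, ends with the primary shu form
YU = ["一", "一", "亅"]   # 于, ends with the secondary shu form

print(form_key(GAN) < form_key(YU))
```

Both characters have the group code 112, so only the form ranks of the last stroke decide that 干 comes before 于.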

3.2 The primary and secondary stroke forms of the zhe group

First, sort according to the number of turning points: a stroke with fewer turning points comes before one with more. For example, 山 (stroke order: ㇑㇗㇑) is before 巾 (㇑㇆㇑), because their first strokes are the same, while the second stroke of 山 (㇗) has one turning point and the second stroke of 巾 (㇆) has two. Other examples: 化 is before 仉, and 刀 before 乃.

When the numbers of turning points are the same, sort according to the heng-shu-pie-dian order of the starting segments of the two zhe strokes. For example, in 幻 and 乣 the last strokes (㇆ and ㇟) both have two turning points, but ㇆ starts with segment 一 while ㇟ starts with 丨. Therefore, 幻 comes before 乣. Similarly, 云 is before 弌.

When the number of turning points and the starting segment are the same, sort according to the segments after the turning point in the heng-shu-pie-dian order. For instance, in 凡 and 及 the second strokes ㇈ and ㇋ both start with 一, but after the first turning point the second segment of ㇈ is ㇑, while in ㇋ it is ㇓. Since ㇑ is before ㇓, 凡 is before 及.
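The three-step comparison of zhe strokes can be sketched as a tuple key: turning points first, then the group numbers of the segments in order. The descriptors below encode only what the examples above state; the turning-point count given for ㇈ and ㇋ is a placeholder (the text implies only that the two are equal), the starting segment of ㇗ is assumed to be 丨, and ZheStroke, ZHE and zhe_key are hypothetical names.

```python
from typing import NamedTuple, Tuple


class ZheStroke(NamedTuple):
    turns: int                 # number of turning points
    segments: Tuple[int, ...]  # leading segments as group numbers
                               # (1 heng, 2 shu, 3 pie, 4 dian)


# Descriptors for the zhe strokes in the examples above.
ZHE = {
    "㇗": ZheStroke(1, (2,)),    # one turning point; start assumed to be shu
    "㇆": ZheStroke(2, (1,)),    # two turning points, starts with heng
    "㇟": ZheStroke(2, (2,)),    # two turning points, starts with shu
    "㇈": ZheStroke(3, (1, 2)),  # placeholder count; heng, then shu
    "㇋": ZheStroke(3, (1, 3)),  # same placeholder count; heng, then pie
}


def zhe_key(stroke):
    """Fewer turning points first, then segment order heng-shu-pie-dian."""
    z = ZHE[stroke]
    return (z.turns, z.segments)


print(sorted(["㇋", "㇟", "㇈", "㇆", "㇗"], key=zhe_key))
```

Tuple comparison applies the later criteria only when the earlier ones tie, which mirrors the way the rule is worded.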

4. Rules of stroke combinations

When the number of strokes, stroke orders, and primary and secondary strokes are the same, compare the combinations of strokes.

The combinational relationships of strokes are divided into separation, connection and intersection. Separation is before connection, and connection before intersection. For example, character 八 is before 人, and 人 is before 乂.

When they are both connected, sort according to the modes of connection. Head-head connected strokes go before tail-head connected strokes, tail-head connected before tail-tail connected, and tail-tail connected before body connected. For example, 目 is before 且, where the last stroke 一 of 且 is in body connection. If both characters are in body connection, then sort according to the strokes being connected in the order of heng-shu-pie-dian-zhe, for example, 人 is before 入.

When two characters differ only in the location of stroke separation, upper-part separation is before lower-part separation, and left-right separation is before up-down separation. For example, 玊 is before 玉, and 埒 is before 埓.

When two characters differ only in the proportion of stroke lengths, the short-long proportion is before the long-short proportion. For example, 未 is before 末, and 土 before 士.
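The first two levels of Rule 4 amount to two small ordered enumerations, sketched below in Python. The class and variable names (Relation, ConnectionMode, RELATION_OF, sort_by_relation) are hypothetical, and the sketch does not model the finer distinctions of separation location or length proportion.

```python
from enum import IntEnum


class Relation(IntEnum):
    """Combinational relationship of strokes, in sorting order."""
    SEPARATION = 0
    CONNECTION = 1
    INTERSECTION = 2


class ConnectionMode(IntEnum):
    """Modes of connection, in sorting order."""
    HEAD_HEAD = 0
    TAIL_HEAD = 1
    TAIL_TAIL = 2
    BODY = 3


# Relationships of the three example characters from the text.
RELATION_OF = {
    "八": Relation.SEPARATION,
    "人": Relation.CONNECTION,
    "乂": Relation.INTERSECTION,
}


def sort_by_relation(chars):
    """Order characters that tie on Rules 1 to 3 by stroke relationship."""
    return sorted(chars, key=lambda ch: RELATION_OF[ch])


print(sort_by_relation(["乂", "八", "人"]))
```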

5. Rules of structures

If two characters have the same number of strokes, stroke orders, primary and secondary strokes, and stroke combinations, they are ordered according to their (component) structures. A character in left-right structure is before a character in up-down structure, the left-center-right structure is before the up-middle-bottom structure, and the up-down structure is before the surrounding structure. For example: 旼 is before 旻, 嚻 is before 囂, and 旮 is before 旭. When the structures are the same but the sizes of the whole characters are different, the smaller character comes first. For example, 口 is before 囗.

Chinese characters should be sorted based on their real forms, applying the sorting rules one by one in the order given above. A rule is used only when sorting cannot be completed by the rules before it, until all characters are properly ordered.
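The cascade of rules maps naturally onto a composite sort key, where each later component is consulted only when all earlier ones tie. The following Python sketch assumes that each character has already been annotated with a group code and with numeric ranks for Rules 3 to 5; the CharRecord class, its field names and the sample annotations are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CharRecord:
    char: str
    group_code: str                   # Rule 2, e.g. "12" for 十
    form_ranks: Tuple[int, ...] = ()  # Rule 3, one rank per stroke
    combination_rank: int = 0         # Rule 4
    structure_rank: int = 0           # Rule 5

    def sort_key(self):
        # Tuples compare left to right, so a later rule only matters
        # when every earlier rule gives a tie.
        return (len(self.group_code), self.group_code, self.form_ranks,
                self.combination_rank, self.structure_rank)


records = [
    CharRecord("于", "112", (0, 0, 1)),
    CharRecord("十", "12"),
    CharRecord("干", "112", (0, 0, 0)),
    CharRecord("二", "11"),
    CharRecord("乂", "34", (0, 1), 2),
    CharRecord("八", "34", (0, 1), 0),
    CharRecord("人", "34", (0, 1), 1),
]

ordered = "".join(r.char for r in sorted(records, key=CharRecord.sort_key))
print(ordered)
```

In this simplified model the form ranks of Rule 3 are plain integers; a fuller version would use the zhe-stroke comparison of Rule 3.2 for strokes of the zhe group.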

The sorting rules for the GB13000.1 Chinese Characters Set are applicable to other character sets as well. When more characters are added, the rules can be extended accordingly, so that the character orders of different character sets remain compatible with each other.

Table of the GB13000.1 Character Set in stroke-based order

In this table (Chinese name: GB13000.1字符集汉字字序表), all 20,902 CJK (China, Japan and Korea) Chinese characters are sorted in standard order, covering over 700 A4 pages. Each character is represented by an entry consisting of serial number, Chinese character, number of strokes, stroke order, Unicode code point, etc. For example, the entry of character 札 is "407, 札, 5, 12345, 672D". The stroke order is given in numerical form, with 1, 2, 3, 4 and 5 representing the five groups heng, shu, pie, dian and zhe respectively. A sample of the first two pages of the table can be found on the Web.[9]
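An entry of this kind can be read into a small record and checked for internal consistency. The sketch below assumes a comma-separated layout exactly like the quoted example; the actual file or print layout of the table may differ, and Entry, parse_entry and is_consistent are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    serial: int
    char: str
    stroke_count: int
    group_code: str   # stroke order as group numbers 1 to 5
    code_point: int   # Unicode code point


def parse_entry(line):
    """Parse an entry written as 'serial, char, strokes, order, hex'."""
    serial, char, count, code, hex_cp = [f.strip() for f in line.split(",")]
    return Entry(int(serial), char, int(count), code, int(hex_cp, 16))


def is_consistent(entry):
    """Check that the stroke count matches the code length and that
    the code point really is the listed character."""
    return (len(entry.group_code) == entry.stroke_count
            and chr(entry.code_point) == entry.char)


entry = parse_entry("407, 札, 5, 12345, 672D")
print(entry)
print(is_consistent(entry))
```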

To look up a character (such as 福, blessing, good fortune) in the table, apply the sorting rules in the same order. First, according to Rule 1, count the number of strokes (福, ㇔㇇㇑㇔㇐㇑㇕㇐㇑㇕㇐㇑㇐, has 13 strokes), then turn to a page with characters of 13 strokes (all characters are first sorted by stroke count). Second, convert the stroke order to numerical form (福, ㇔(4) ㇇(5) ㇑(2) ㇔(4) ㇐(1) ㇑(2) ㇕(5) ㇐(1) ㇑(2) ㇕(5) ㇐(1) ㇑(2) ㇐(1), 4524125125121), then look up the target character with that stroke order among the characters of that stroke count (福 is on page 425 of the book, character number 12197). Among the 13-stroke characters, only "禊 福 禋 禖" start with 45241, so checking just the first 5 strokes reduces the search range to 4 characters, from which the target can be found quickly without checking more strokes. A skilled user can also directly compare the stroke groups of two characters stroke by stroke until the target is found.
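The lookup procedure can be sketched as a binary search over a table sorted by stroke count and group code, plus a prefix filter that narrows the candidates. The toy table below contains only a few characters from this article, not the real 20,902-entry table, so the prefix search returns one character rather than the four mentioned above; TABLE, table_key, find_position and candidates are hypothetical names.

```python
from bisect import bisect_left


def table_key(code):
    """Sort key for Rules 1 and 2: stroke count, then group code."""
    return (len(code), code)


# Toy table of (character, group code) pairs in stroke-based order.
TABLE = sorted(
    [("福", "4524125125121"), ("好", "531521"), ("札", "12345"),
     ("十", "12"), ("二", "11")],
    key=lambda row: table_key(row[1]),
)


def find_position(table, code):
    """Return the index of the first entry with this code, or None."""
    keys = [table_key(c) for _, c in table]
    i = bisect_left(keys, table_key(code))
    if i < len(keys) and keys[i] == table_key(code):
        return i
    return None


def candidates(table, stroke_count, prefix):
    """Characters of a given stroke count whose code starts with prefix."""
    return [ch for ch, c in table
            if len(c) == stroke_count and c.startswith(prefix)]


pos = find_position(TABLE, "4524125125121")
print(TABLE[pos][0])
print(candidates(TABLE, 13, "45241"))
```

Characters that share the same group code would still need Rules 3 to 5 to be told apart; the binary search only locates the first of them.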

Comments

Radical-based sorting, sound-based sorting, four-corner sorting and stroke-based sorting are the methods commonly used in modern Chinese dictionaries. Among them, stroke-based sorting is usually used as an auxiliary order within all the other methods: in radical-based sorting to sort the index of radicals, the characters in each radical family, and the index of characters difficult to look up; in sound-based sorting to sort homophone characters; and in four-corner sorting to sort characters of the same code.[10] Stroke-based sorting is therefore indispensable for Chinese lexicography.

GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) has greatly improved the accuracy of stroke-based ordering by adding more layers of sorting rules, making it possible to sort a large character set with high accuracy without support from other sorting methods. However, the many layers (or tiers) of rules and comparisons make lookup very time-consuming.

Ideally, sorting would use a single tier, as in English alphabetical sorting, or at most two tiers for accurate sorting, provided that the user can conveniently look up a character or word with the first tier alone. One newer proposal in this direction is the YES method,[11] a simplified stroke-based sorting method free of stroke counting and grouping. It has been applied to the indexing of all the characters in Xinhua Zidian (新华字典) and Xiandai Hanyu Cidian (现代汉语词典), as well as the 20,902 Unicode CJK characters.[12]

Notes

  a. See GBK (character encoding) (paragraph 1).
  b. Printed in red in the original document.

References

  1. People's Republic of China, National Language Commission (October 1, 1999). GB13000.1 Character Set Chinese Character Order (Stroke-Based Order) (GB13000.1字符集汉字字序(笔画序)规范) (PDF) (in Chinese). Shanghai Education Press. ISBN 7-5320-6674-6.
  2. 国务院关于公布《通用规范汉字表》的通知 (in Chinese). State Council of the People's Republic of China. 5 June 2013.
  3. Linguistic Institute, the Social Science Academy of China (中国社会科学院语言研究所) (2020). Xinhua Zidian (Xinhua Chinese Character Dictionary) (in Chinese) (12th ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-17093-2.
  4. Linguistic Institute, the Social Science Academy of China (中国社会科学院语言研究所) (2016). Xiandai Hanyu Cidian (Modern Chinese Dictionary) (in Chinese) (7th ed.). Beijing: 商务印书馆 (Commercial Press). ISBN 978-7-100-12450-8.
  5. People's Republic of China 1999, pp. 3–4.
  6. PRC, the National Language Commission (2021). Stroke Orders of Commonly-used Standard Chinese Characters (通用规范汉字笔顺规范) (in Chinese). Beijing: The Commercial Press (商务印书馆). ISBN 978-7-100-19347-4.
  7. PRC, National Language Commission (October 1, 1999). "GB13000.1 Character Set Chinese Character Stroke Order (《GB13000.1字符集汉字笔顺规范》)" (PDF) (in Chinese). Shanghai Education Press.
  8. Mandarin Promotion Committee of the Ministry of Education (教育部國語推行委員會) (1996). "Handbook of Stroke Orders of Standard Commonly-used Chinese Characters (常用國字標準字體筆順手冊)". Taipei: Ministry of Education.
  9. People's Republic of China 1999, pp. 5–6.
  10. Wang, Ning (王寧,鄒曉麗) (2003). 工具書 (Reference Books) (in Chinese). Hong Kong: 和平圖書有限公司. p. 24. ISBN 962-238-363-7.
  11. Zhang, Xiaoheng et al. (张小衡 李笑通) (2013). 一二三笔顺检字手册 (Handbook of the YES Indexing Method) (in Chinese). Beijing: 语文出版社 (The Language Press). ISBN 978-7-80241-670-3.
  12. Zhang, Xiaoheng (2015). "Building a collation element table for a large Chinese character set in YES". Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data - Lecture Notes in Computer Science. Switzerland: Springer Verlag. pp. 3–14. ISBN 9783319258157.