Unicode input

Unicode input is the insertion of a specific Unicode character on a computer by a user; it is a common way to input characters not directly supported by a physical keyboard. Unicode characters can be produced either by selecting them from a display or by typing a certain sequence of keys on a physical keyboard. In addition, a character produced by one of these methods in one web page or document can be copied into another. In contrast to ASCII's 128-code-point character set (which Unicode contains, and of which 95 are printable characters), Unicode encodes well over a hundred thousand characters from almost all of the world's written languages, along with many other signs and symbols.[1]

A Unicode input system must provide for a large repertoire of characters, ideally all valid Unicode code points. This differs from a keyboard layout, which defines keys and their combinations only for a limited number of characters appropriate for a certain locale.

Unicode numbers

Unicode characters are distinguished by code points, which are conventionally represented by "U+" followed by four, five or six hexadecimal digits, for example U+00AE or U+1D310. Characters in the Basic Multilingual Plane (BMP), containing modern scripts – including many Chinese and Japanese characters – and many symbols, have a 4-digit code. Historic scripts, but also many modern symbols and pictographs (such as emoticons, emojis, playing cards and many CJK characters) have 5-digit codes.
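The "U+" notation can be produced and parsed mechanically. The following Python sketch is illustrative only; the helper names format_code_point, parse_code_point and in_bmp are hypothetical and are not part of any standard library.

```python
def format_code_point(char: str) -> str:
    """Return the conventional U+XXXX notation for a single character."""
    # At least four hex digits are shown; more are used when needed.
    return f"U+{ord(char):04X}"


def parse_code_point(notation: str) -> str:
    """Turn a string such as 'U+1D310' back into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not a U+ notation: {notation!r}")
    return chr(int(notation[2:], 16))


def in_bmp(char: str) -> bool:
    """True if the character lies in the Basic Multilingual Plane."""
    return ord(char) <= 0xFFFF


print(format_code_point("\u00AE"))       # U+00AE
print(format_code_point("\U0001D310"))   # U+1D310
```

Characters for which in_bmp returns True are the ones with 4-digit codes; all others need five or six digits.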

Glyph availability

Glyph 0 must be assigned to a .notdef glyph. The .notdef glyph is very important for providing the user feedback that a glyph is not found in the font. This glyph should not be left without an outline as the user will only see what looks like a space if a glyph is missing and not be aware of the active font’s limitation.[2]

Recommendations for OpenType Fonts (Microsoft.com)

An application can display a character only if it can access a computer font which contains a glyph for that character.[3] Fonts usually have incomplete Unicode coverage; most only contain the glyphs needed to support a few writing systems. However, most modern browsers and other text-processing applications are able to display multilingual content because they perform font substitution, automatically switching to a fallback font when necessary to display characters which are not supported in the current font. Which fonts are used for fallback and the thoroughness of Unicode coverage varies by software and operating system; some software will search for a suitable glyph in all of the installed fonts, others only search within certain fonts.

If an application does not have access to a glyph, the character will usually be shown as the font's .notdef glyph, which often appears as an empty box, ☐ (nicknamed "tofu" after its shape), a box with an X in it, ☒, a diamond with a question mark, �, or a box with a question mark in it, ⍰.

Techniques

Extended keyboard mapping

Most operating systems support extended keyboard mapping, the facility to increase the repertoire of available characters. Techniques include Alternate graphic ("AltGr"), which gives a third and fourth meaning to every key; the Compose key (sometimes called the multi key), which indicates that the following keystrokes (usually two or more) trigger the insertion of an alternate character, typically a precomposed character or a symbol;[4] dead keys, typically used to attach a specific diacritic to a base letter;[5] and combinations of these.

These techniques facilitate entry of character sets beyond the basic set provided as standard with the computer.

Selection from a screen

Many systems provide a way to select Unicode characters visually. ISO/IEC 14755 refers to this as a screen-selection entry method.[6]

Microsoft Windows has provided a Unicode version of the Character Map program, which has appeared in the consumer editions since Windows XP. This is limited to characters in the Basic Multilingual Plane (BMP). Characters are searchable by Unicode character name, and the table can be limited to a particular code block.[7] Starting with Windows 10, Microsoft Windows also includes a so-called "emoji keyboard". It can be opened by holding down the Windows key (the one with the Windows symbol on it) and pressing the period key. The emoji keyboard allows emojis as well as symbols to be entered.[8]

More advanced third-party tools of the same type are also available (a notable freeware example is BabelMap, which supports all Unicode characters). On most Linux desktop environments, equivalent tools – such as gucharmap (GNOME) or kcharselect (KDE) – are available.[9]

Generally these tools let the user copy the selected characters to the clipboard and then paste them into the document, rather than simulating direct typing.

It is often practical to just find the desired character on the web or in another document, and copy and paste it from there.

Decimal input (Alt codes)

Some programs running in Microsoft Windows, including recent versions of Word and Notepad, can produce characters from their Unicode code points expressed in decimal and entered on the numeric keypad with the Alt key held down. For example, the Euro sign has 20AC as its hexadecimal code point, which is 8364 in decimal, so Alt+8364 will produce the symbol. Similarly, Alt+120132 produces the double-struck (blackboard bold) character 𝕄.
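The decimal value needed for an Alt code is simply the code point converted from hexadecimal. A minimal Python sketch follows; the function name alt_code_for is hypothetical, and the sketch deliberately ignores the leading-zero rule that applies to values below 256.

```python
def alt_code_for(char: str) -> str:
    """Return the decimal Alt code for one character (values above 255 only)."""
    return f"Alt+{ord(char)}"


print(int("20AC", 16))             # 8364
print(alt_code_for("\u20AC"))      # Alt+8364   (euro sign)
print(alt_code_for("\U0001D544"))  # Alt+120132 (double-struck M)
```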

Decimal code points in the range 160–255 must be entered with a leading zero (so that the Windows code page is chosen), and furthermore the Windows code page must be set to match Unicode (CP1252 must be used[a]). For example, Alt+0247 yields a ÷, corresponding to its code point, but the character produced by Alt+247 depends on the OEM code page, such as Code page 437, and may yield a ≈. Also, Alt+0128 through Alt+0159 yield the characters assigned in rows 8 and 9 of the CP1252 layout, rather than the C1 control codes that are assigned to those numbers in Unicode.
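The dependence on the code page can be explored with Python's built-in codecs, which include both CP1252 and Code page 437. This is only a sketch of how one byte value maps to different characters; it does not reproduce the Windows Alt-code mechanism itself, and decode_byte is a hypothetical helper name.

```python
import unicodedata


def decode_byte(value: int, code_page: str) -> str:
    """Interpret a single byte value under the named code page."""
    return bytes([value]).decode(code_page)


# The same number names different characters in different code pages.
print(decode_byte(247, "cp1252"))  # division sign, same as U+00F7
print(decode_byte(247, "cp437"))   # the OEM code page 437 assignment

# In CP1252, 128 is the euro sign; in Unicode, U+0080 is a C1 control code.
print(decode_byte(128, "cp1252"))
print(unicodedata.category(chr(128)))  # Cc (control character)
```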

In programs which were not designed to handle Alt codes over 255, the character retrieved usually corresponds to the remainder when the number is divided by 256.[citation needed]
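As a worked illustration of the remainder behaviour described above (which the source itself marks as needing a citation), the arithmetic for the Euro sign's Alt code can be sketched in Python. The function name truncated_alt_code is hypothetical.

```python
def truncated_alt_code(number: int) -> int:
    """Value that a program limited to 0-255 would see for an Alt code."""
    return number % 256


# 8364 = 32 * 256 + 172, so only 172 survives the truncation.
print(truncated_alt_code(8364))  # 172
```

Which character the resulting value produces then depends on the code page in use.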

The text editor Vim allows characters to be specified by two-character mnemonics referred to as digraphs. The installed set can be augmented by custom mnemonics defined for arbitrary code points, specified in decimal. For example, as decimal 9881 is equal to hexadecimal 2699, the command :digraph Gr 9881 associates "Gr" with U+2699 GEAR.

See below for use of decimal code points in HTML.

Hexadecimal input

Clause 5.1 of ISO/IEC 14755 describes a Basic method whereby a beginning sequence is followed by the hex number representation of the code point and the ending sequence. Most modern systems have some method to emulate this, sometimes limited to four digits (thus only the Basic Multilingual Plane).

In Microsoft Windows

Hexadecimal Unicode input can be enabled by adding a string type (REG_SZ) value called EnableHexNumpad to the registry key HKEY_CURRENT_USER\Control Panel\Input Method and assigning the value data 1 to it. Users will need to log off and back in after editing the registry for this input method to start working. (In versions earlier than Vista, users needed to reboot for it to start working.)

Unicode characters can then be entered by holding down Alt, and typing + on the numeric keypad, followed by the hexadecimal code, and then releasing Alt.[3] This may not work for 5-digit hexadecimal codes like U+1F937. Some versions of Windows may require the digits 0-9 to be typed on the numeric keypad or require NumLock to be on.[citation needed]

In some applications (Word, Notepad and LibreOffice programs) Alt+X will replace the hexadecimal number to the left of the cursor with the matching Unicode character. Unless it is six hexadecimal digits long, the code must not be preceded by any digit or letters a–f as they may be treated as part of the code to be converted. For example, entering af1 followed by Alt+X (or Alt+C if using a French version) will produce '૱' (U+0AF1), but entering a0000f1 followed by Alt+X will produce 'añ' ('a' followed by character U+00F1).
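The two examples above are consistent with a simple rule: take at most six hexadecimal digits immediately to the left of the cursor and replace them with the corresponding character. The Python sketch below models only that simplified rule. It is an assumption made for illustration, not a description of how Word or any other application is actually implemented, and alt_x is a hypothetical name.

```python
import re


def alt_x(text: str) -> str:
    """Simplified model: convert up to six trailing hex digits to a character."""
    match = re.search(r"[0-9A-Fa-f]{1,6}$", text)
    if match is None:
        return text  # nothing convertible before the cursor
    code_point = int(match.group(), 16)
    if code_point > 0x10FFFF:
        return text  # beyond the Unicode range: left unchanged in this sketch
    return text[:match.start()] + chr(code_point)


print(alt_x("af1"))      # U+0AF1
print(alt_x("a0000f1"))  # 'a' followed by U+00F1
```

The second call shows why padding to six digits protects a preceding letter: only the last six digits are consumed.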

This facility enables Unicode characters to be entered in other applications: one can create a desired character in Notepad, for example, and then cut and paste it wherever desired.

In macOS

Hex input of Unicode must be enabled. In Mac OS 8.5 and later, one can choose the Unicode Hex Input keyboard layout; in OS X (10.10) Yosemite, this can be added in Keyboard → Input Sources.

Holding down ⌥ Option, one types the four-digit hexadecimal Unicode code point and the equivalent character appears; one can then release the ⌥ Option key.[10] Characters outside of the BMP (the Basic Multilingual Plane) exceed the four-digit limit of the Unicode hex input mechanism but can be entered by using surrogate pairs: holding down the ⌥ Option key while entering the first surrogate, the +, the second surrogate, then releasing the Option key.
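The surrogate pair for a code point outside the BMP follows from the standard UTF-16 arithmetic. A short Python sketch, with the hypothetical helper name to_surrogate_pair, shows the calculation for U+1F937:

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP have surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F937)
print(f"{high:04X} {low:04X}")  # D83E DD37
```

Each half is a four-digit hexadecimal value, which is what makes it fit within the four-digit limit of the input method.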

In X11 (Linux and other Unix variants including ChromeOS)

In many applications one or both of the following methods work to directly input Unicode characters:

  • Holding Ctrl+⇧ Shift and typing u followed by the hex digits, then releasing Ctrl+⇧ Shift.
  • Entering Ctrl+⇧ Shift+u, releasing, then typing the hex digits and pressing ↵ Enter (or Space or even, on some systems, pressing and releasing ⇧ Shift or Ctrl).[11]

This is supported by GTK and Qt applications, and possibly others. In ChromeOS, this is an operating system function.[11]

In platform-independent applications

  • In Emacs, Ctrl+x 8 Return invokes the insert-char command, which accepts input either as a hexadecimal code point or as a Unicode character name.
  • In LibreOffice 5.1 onwards, the Alt+X method described above for Windows works.
  • In Opera versions that use the Presto layout engine (that is, up to and including version 12.xx), the user enters the hexadecimal number of the desired symbol or character and then presses Ctrl+⇧ Shift+x (alternative shortcut Meta+⇧ Shift++x on macOS).
  • In the Vim editor, in insert mode, the user first types Ctrl+V u (for codepoints up to 4 hex digits long; using Ctrl+V ⇧ Shift+U for longer), then types in the hexadecimal number of the symbol or character desired, and it will be converted into the symbol. (On Microsoft Windows, Ctrl+Q may be required instead of Ctrl+V.[12])
  • In AutoCAD, a character can be entered as \U2300, or with one of three shortcuts: %%c, %%d and %%p.

HTML

In HTML and XML, character codes to be rendered as characters are prefixed by ampersand and number sign (&#), and are followed by a semicolon (;). The code point can be either in decimal or in hexadecimal; in the latter case it is preceded by an "x". Leading zeros may be omitted. A number of characters may be represented by a named entity.

Example: In HTML/XML, the copyright sign © (U+00A9) may be coded as:

  • &#169; (decimal code point)
  • &#xA9; (hexadecimal code point)
  • &copy; (entity name)

This works in many pieces of software that accept HTML markup, such as Thunderbird and Wikipedia editing.
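Numeric character references can be generated from a code point and decoded again with Python's standard html module. The sketch below is illustrative; numeric_references is a hypothetical helper name, while html.unescape and html.entities.codepoint2name are part of the standard library.

```python
import html
from html.entities import codepoint2name


def numeric_references(char: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


decimal_ref, hex_ref = numeric_references("\u00A9")
print(decimal_ref, hex_ref)                      # &#169; &#xA9;
print(f"&{codepoint2name[0xA9]};")               # &copy;
print(html.unescape("&#169; &#xA9; &copy;"))     # all three decode to U+00A9
```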

Notes

  1. ^ CP1252 is the default in North and South America including the Caribbean islands, Western Europe, Central and Southern Africa, Australia, New Zealand, and the (former) European colonies and possessions in Oceania

References

  1. ^ Lafontaine, Sylvain (February 17, 2012). "Unicode vs ASCII difference and benefits". MSDN. Retrieved 28 February 2014.
  2. ^ "Recommendations for OpenType Fonts". Microsoft.com.
  3. ^ a b Marcuse, Andrew. "How to enter Unicode characters in Microsoft Windows". Retrieved 2012-09-13.
  4. ^ "Linux Keyboard Text Symbols: Compose-Key Shortcuts". FSymbols. 2013-07-24. Retrieved 2015-07-07.
  5. ^ "Dead Key | Definition of Dead Key by Merriam-Webster". Merriam-webster.com. Retrieved 2017-05-01.
  6. ^ "ISO/IEC 14755:1997 Information technology -- Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input device". ISO. Retrieved 2017-10-14.
  7. ^ "How to Use Special Characters in Windows Documents". support.microsoft.com. Jul 31, 2019. Retrieved 2020-10-17.
  8. ^ "Windows 10 Tip: Get started with the emoji keyboard shortcut". blogs.windows.com. Feb 5, 2018. Retrieved 2024-06-04.
  9. ^ Peck, Akkana (2009-11-25). "Mastering Characters Sets in Linux (Weird Characters, part 2)". LinuxPlanet. Archived from the original on 2010-11-26. Retrieved 2018-12-05.
  10. ^ Typing special and accented characters Archived 2008-03-09 at the Wayback Machine
  11. ^ a b Jack Busch (April 20, 2018). "Type Special Characters with a Chromebook (Accents, Symbols, Em Dashes)". groovypost.com. Retrieved February 28, 2020.
  12. ^ Vim documentation: gui_w32
