Homoglyph

The homoglyphs U+0061 a LATIN SMALL LETTER A and U+0430 а CYRILLIC SMALL LETTER A overlaid. In the image, both characters are set in Helvetica LT Std Roman.

In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar but may have differing meaning. The designation is also applied to sequences of characters sharing these properties.

In 2008, the Unicode Consortium published its Technical Report #36[1] on a range of issues arising from the visual similarity of characters, both within a single script and between characters in different scripts.

Examples of homoglyphic symbols are (a) the diaeresis and the umlaut (both a pair of dots with different meanings, yet encoded with the same code points); and (b) the hyphen and the minus sign (both a short horizontal stroke with different meanings, yet often encoded with the same code point). Among digits and letters, the digit 1 and lowercase l, and the digit 0 and capital O, are always encoded separately, but many typefaces give each pair very similar glyphs. Virtually every homoglyphic pair could be differentiated with clearly distinguishable glyphs and separate code points, but this is not always done. Typefaces that do not clearly distinguish the one/el and zero/oh homoglyphs are considered unsuitable for formulas, URLs, source code, IDs and other text in which context does not always reveal which character is meant. For such uses, fonts that distinguish these glyphs, for example by means of a slashed zero, are preferred.
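That look-alike characters are nonetheless separate code points can be checked directly. The following Python sketch uses only the standard unicodedata module to print the code point and official Unicode name of each character in a few of the groups mentioned above; the grouping and the helper name describe are chosen for illustration only.

```python
import unicodedata

# Groups of characters that many typefaces draw almost identically.
LOOKALIKE_GROUPS = {
    "zero/oh": "0O",
    "one/el/eye": "1lI",
    "hyphen/minus": "-\u2010\u2212",  # HYPHEN-MINUS, HYPHEN, MINUS SIGN
}


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for label, group in LOOKALIKE_GROUPS.items():
        print(label)
        for ch in group:
            print("   ", describe(ch))
```

Running it lists, for example, U+0030 DIGIT ZERO and U+004F LATIN CAPITAL LETTER O as distinct entries, and shows that the everyday keyboard hyphen is U+002D HYPHEN-MINUS, a single code point serving both roles, while Unicode also provides a separate HYPHEN and MINUS SIGN.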

The term homograph is sometimes used, incorrectly, as a synonym for homoglyph, but in the usual linguistic sense homographs are words that are spelled the same but have different meanings: a property of words, not characters.

Allographs are typeface design variants that look different but mean the same thing, for example ⟨ɡ⟩ and ⟨g⟩ (the single-storey and double-storey forms of g), or a dollar sign with one or two strokes. The term synoglyph has a similar but somewhat more abstract meaning: for example, the symbol ⟨£⟩ and the letter ⟨L⟩ (in Lsd) both mean the pound sterling,[2] but only in that context. Allographs and synoglyphs are also known informally as display variants.

Umlaut and diaeresis

In the days of early mechanical typewriters, the umlaut and the diaeresis were typed with the same key (using the "backspace and over-type" technique), which was also used for the double quotation mark. However, the umlaut originated specifically as a pair of short vertical lines rather than two dots (see Sütterlin). Incidentally, the two dots over the letter E in Albanian (ë) are described as a diaeresis but do not fulfil the function of one.[3]

0 and O; 1, l and I

Two common and important sets of homoglyphs in use today are the digit zero and the capital letter O (0 and O), and the digit one, the lowercase letter L and the uppercase letter I (1, l and I). On early mechanical typewriters there was little or no visual difference between these glyphs, and typists treated them as interchangeable keyboarding shortcuts. In fact, most keyboards did not even have a key for the digit 1, so users typed the letter l instead, and some also omitted the 0. When these typists became computer keyboard operators in the 1970s and 1980s, their old keyboarding habits went with them and were an occasional source of confusion.

Most current type designs carefully distinguish between these homoglyphs, usually by drawing the digit zero narrower and giving the digit one prominent serifs. Early computer print-outs went further and marked the zero with a slash or a dot, which created a new conflict with the Scandinavian letter ⟨Ø⟩ and the Greek letter ⟨Φ⟩ (phi). Redesigning these characters so that they can be told apart has reduced confusion. The degree to which two different characters appear the same to a given observer is called their "visual similarity".[4]

Some type designs conform to the DIN 1450 legibility standard by making such characters easy to distinguish: a slashed zero to distinguish it from the capital O; a lowercase l with a tail and an uppercase I with serifs to distinguish them from the digit 1; a numeral 5 clearly distinct from the capital S; and so on.[5]

Confusion between near-homoglyphs also arose from the use of ⟨y⟩ to represent ⟨þ⟩ (thorn). Early English typesetters imported Dutch type that did not contain thorn, so they used the letter ⟨y⟩ instead, because the two look sufficiently similar in blackletter.[6] In modern times this has led to forms such as Ye olde shoppe, which imply, incorrectly, that the word the was formerly written ye and pronounced with /j/ rather than written þe. The spelling of the name Menzies (pronounced Mengis and originally spelled Menȝies) arose for the same reason: the letter ⟨z⟩ was substituted for ⟨ȝ⟩ (yogh).

Multi-letter homoglyphs

The letter m and the letter pair rn set in Arial, Calibri, Times New Roman, Cambria, Walbaum-Fraktur and Comic Sans
On this gravestone, "Stefan Szczotkowski" can be read as "Aeffan Szczotkowski".

Some combinations of letters look similar to a single letter: for instance, rn looks similar to m, cl to d, and vv to w.
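Comparisons that only look at one character at a time miss these cases. A minimal Python sketch, assuming a small hand-written table containing just the three sequences named above (the table and the function names collapse and looks_alike are hypothetical), rewrites each look-alike sequence as the letter it resembles before comparing two strings:

```python
# Hypothetical table: letter sequence -> single letter it resembles.
MULTI_LETTER_LOOKALIKES = {"rn": "m", "cl": "d", "vv": "w"}


def collapse(text: str) -> str:
    """Replace each look-alike sequence with the letter it resembles, scanning left to right."""
    out = []
    i = 0
    while i < len(text):
        for seq, target in MULTI_LETTER_LOOKALIKES.items():
            if text.startswith(seq, i):
                out.append(target)
                i += len(seq)
                break
        else:  # no sequence starts at position i, so keep the character as-is
            out.append(text[i])
            i += 1
    return "".join(out)


def looks_alike(a: str, b: str) -> bool:
    """True if the two strings become identical once look-alike sequences are collapsed."""
    return collapse(a) == collapse(b)


print(looks_alike("rnicrosoft", "microsoft"))  # True
```

Because the rewrite is applied to both strings, genuine words are affected too: "modern" and "modem" are reported as look-alikes, which is the intended behaviour for a visual comparison rather than a spelling check.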

In certain narrowly spaced fonts (such as Tahoma), placing the letter c next to a letter such as j, l or i can create a homoglyph: cj, cl and ci can resemble g, d and a respectively.

When some characters are placed next to each other, at a glance they can give the visual impression of another, unrelated character. More precisely, some typographic ligatures can look similar to standalone glyphs. For example, the ligature ﬁ (fi) can look similar to A in some typefaces or fonts. This potential for confusion is sometimes cited as an argument against the use of ligatures.[citation needed]

Canonicalization

Homoglyphs of all kinds can be detected through a process called 'dual canonicalization'.[4] The first step is to identify homoglyph sets, that is, sets of characters that appear the same to a given observer. Next, a single token, called a canon, is chosen to represent each homoglyph set. Each character in the text is then converted to the canon of its set, a process called canonicalization. If two runs of text have the same canons but differ in their original characters, the text contains a homoglyph.
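The steps above can be sketched in Python as follows. The homoglyph sets are a small hypothetical sample chosen for illustration (a real system would derive them from the judgements of a particular observer or from a curated data file), and taking the first member of each set as its canon is an arbitrary choice. This is a sketch of the idea as described here, not the implementation from the cited paper.

```python
# Step 1 (hypothetical sample): homoglyph sets for one observer and typeface.
# The first character of each set is used as its canon; no character may
# appear in more than one set.
HOMOGLYPH_SETS = [
    "0O\u041e\u039f",  # digit zero, Latin O, Cyrillic O, Greek Omicron
    "o\u043e\u03bf",   # Latin o, Cyrillic o, Greek omicron
    "1lI\u0406",       # digit one, Latin l, Latin I, Cyrillic Byelorussian-Ukrainian I
    "a\u0430",         # Latin a, Cyrillic a
    "c\u0441",         # Latin c, Cyrillic es
    "e\u0435",         # Latin e, Cyrillic ie
    "p\u0440",         # Latin p, Cyrillic er
]

# Step 2: map every member of every set to that set's canon.
CANON = {ch: group[0] for group in HOMOGLYPH_SETS for ch in group}


def canonicalize(text: str) -> str:
    """Step 3: replace each character by its canon; characters in no set are kept."""
    return "".join(CANON.get(ch, ch) for ch in text)


def contains_homoglyph(a: str, b: str) -> bool:
    """Step 4: identical canons but different original text means a homoglyph is present."""
    return a != b and canonicalize(a) == canonicalize(b)


print(contains_homoglyph("paypal", "p\u0430ypal"))  # True: Cyrillic a in the second string
print(contains_homoglyph("G00GLE", "GOOGLE"))       # True: zeros versus capital O
```

Note that the canonical form itself need not be readable ("paypal" canonicalizes to "paypa1" here, because l shares a set with the digit 1); only equality of canons matters.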

Homoglyph prevention

Homoglyph attacks can be mitigated through a combination of user awareness and technical measures. Users should be taught about the risks of homoglyph attacks and encouraged to inspect URLs carefully before clicking them.[7] Security tools that scan domain names for homoglyph variations can automate the detection and blocking of likely threats. Domain name monitoring and strict registration policies can also help identify and neutralize homoglyph-based registrations early. Together, user vigilance and such tools strengthen an organization's defenses against homoglyph attacks.
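As a rough illustration of what scanning for homoglyph variations can involve, the Python sketch below enumerates every name that differs from a protected domain name by a single look-alike substitution; a defender could then compare such candidates against newly registered names. The substitution table and the function name single_substitution_variants are hypothetical and deliberately tiny.

```python
# Hypothetical look-alike table: ASCII letter -> characters that may be mistaken for it.
LOOKALIKES = {
    "a": ["\u0430"],       # CYRILLIC SMALL LETTER A
    "e": ["\u0435"],       # CYRILLIC SMALL LETTER IE
    "o": ["\u043e", "0"],  # CYRILLIC SMALL LETTER O, DIGIT ZERO
    "l": ["1"],            # DIGIT ONE
}


def single_substitution_variants(domain: str) -> set[str]:
    """Return all names that differ from `domain` by exactly one look-alike character."""
    variants = set()
    for i, ch in enumerate(domain):
        for alt in LOOKALIKES.get(ch, []):
            variants.add(domain[:i] + alt + domain[i + 1:])
    return variants


candidates = single_substitution_variants("google.com")
print(len(candidates))  # 8
```

For simplicity the sketch also substitutes inside the top-level domain (producing names such as google.c0m); a practical tool would restrict substitutions to the registrable label and would also consider multiple substitutions and multi-letter look-alikes.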

Unicode homoglyphs

The three most prominent European alphabets (Greek, Cyrillic and Latin) share many letter forms that are encoded in Unicode under separate code points.

Unicode has code points for many strongly homoglyphic characters, known as "confusables".[1] These present security risks in a variety of situations (addressed in UTR #36)[8] and have drawn particular attention in connection with internationalized domain names. In theory at least, one might deliberately spoof a domain name by replacing one character with its homoglyph, thus creating a second domain name, not readily distinguishable from the first, that can be exploited in phishing (see main article IDN homograph attack). In many typefaces, the Greek letter 'Α', the Cyrillic letter 'А' and the Latin letter 'A' are visually identical, as are the Latin letter 'a' and the Cyrillic letter 'а' (the same applies to the Latin letters "aBceHKopTxy" and the Cyrillic letters "аВсеНКорТху"). A domain name can be spoofed simply by substituting one of these forms for another in a separately registered name. There are also many examples of near-homoglyphs within the same script, such as 'í' (i with an acute accent) and 'i' (i with a tittle); É (E-acute), Ė (E with dot above) and È (E-grave); and Í (capital I with an acute accent) and ĺ (lowercase L with an acute accent). When discussing this specific security issue, any two sequences of similar characters may be assessed for their potential to be taken as a 'homoglyph pair', or, if the sequences clearly appear to be words, as 'pseudo-homographs' (noting again that these terms may themselves cause confusion in other contexts). In the Chinese language, many simplified Chinese characters are homoglyphs of the corresponding traditional Chinese characters.
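The substitution described above can be reproduced with Python's standard library. This sketch replaces the Latin ⟨a⟩ in an arbitrary example name with the Cyrillic ⟨а⟩ (written as the escape \u0430 so that the two cannot be confused in the source) and shows that the result is a different string. It then encodes the name with Python's built-in idna codec, which implements the older IDNA 2003 rules, to show that the spoofed name reaches DNS as an ASCII "xn--" label rather than as the original name.

```python
import unicodedata

genuine = "paypal.com"
spoofed = "p\u0430ypal.com"  # second character is CYRILLIC SMALL LETTER A

# On screen the two may be indistinguishable, but they are different strings.
print(genuine == spoofed)  # False

# Identify the differing character by its Unicode name.
for g, s in zip(genuine, spoofed):
    if g != s:
        print(unicodedata.name(g), "vs", unicodedata.name(s))

# DNS sees the ASCII-compatible encoding, which carries the "xn--" prefix.
print(spoofed.encode("idna"))
```

The loop prints LATIN SMALL LETTER A vs CYRILLIC SMALL LETTER A. A browser that shows the "xn--" form instead of the Unicode form makes such a spoof visible, which is one of the countermeasures browser designers use.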

Efforts by TLD registries and Web browser designers aim to minimize the risks of homoglyphic confusion. Commonly, this is done by prohibiting names that mix scripts from multiple languages (toys-Я-us.org, using the Cyrillic letter Я, would be invalid, but wíkipedia.org and wikipedia.org still exist as different websites); Canada's .ca registry goes one step further by requiring names that differ only in diacritics to have the same owner and the same registrar.[9] The handling of Chinese characters varies: in .org and .info, registering one variant makes the other unavailable to anyone, while in .biz the traditional and simplified versions of the same name are delivered as a bundle of two domains that both point to the same name server.
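A minimal sketch of the mixed-script rule in Python follows. The standard library does not expose the Unicode Script property, so the sketch assumes that the first word of a letter's Unicode name (LATIN, CYRILLIC, GREEK and so on) is an adequate stand-in for its script; registries and browsers use the real Script property and more detailed policies. Each dot-separated label is checked separately, so an all-Cyrillic name under .org is not flagged. As noted above, the rule does not catch wíkipedia.org, because í and i are both Latin letters.

```python
import unicodedata


def scripts_in(label: str) -> set[str]:
    """Approximate the set of scripts used by the letters in one domain label.

    Assumption: the first word of a letter's Unicode name stands in for its
    Script property. Digits and hyphens are skipped as script-neutral.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(domain: str) -> bool:
    """True if any dot-separated label mixes letters from more than one script."""
    return any(len(scripts_in(label)) > 1 for label in domain.split("."))


for name in ["toys-\u042f-us.org", "w\u00edkipedia.org", "wikipedia.org"]:
    print(name, is_mixed_script(name))
```

The name-prefix proxy is crude: Japanese names, for example, legitimately combine kanji (named "CJK ...") with hiragana or katakana and would be wrongly flagged, which is one reason real registry policies list permitted script combinations per language.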

Relevant documentation can be found on the developers' websites and on an IDN forum[10] provided by ICANN.

ES1845 JCUKEN-QWERTY hybrid layout keyboard

The Cyrillic letter ⟨С⟩ (U+0421 С CYRILLIC CAPITAL LETTER ES) not only looks like the Latin ⟨C⟩ (U+0043 C LATIN CAPITAL LETTER C) but also occupies the same key on JCUKEN-QWERTY hybrid layout keyboards. This design detail can be seen on the C/С key of the Keyboard Monument in Yekaterinburg.

References

  1. ^ a b "UTR #36: Unicode Security Considerations". www.unicode.org.
  2. ^ Walton, Chas (October 7, 2020). "A writer's guide to diacritics and special characters". Text Wizard.
  3. ^ Describing these as homoglyphs is questionable as there are probably no languages in which the glyph can fulfil both these roles. It would be just as valid to describe, say, a grave accent as a homoglyph because it fulfils different roles in different languages.
  4. ^ a b Helfrich, James; Neff, Rick (2012). "Dual canonicalization: An answer to the homograph attack". 2012 eCrime Researchers Summit (eCrime). pp. 1–10. doi:10.1109/eCrime.2012.6489517. ISBN 978-1-4673-2543-1.
  5. ^ Tao, Nigel; Bigelow, Chuck; Pike, Rob (2016). "Go fonts: DIN Legibility Standard".
  6. ^ Hill, Will (30 June 2020). "Chapter 25: Typography and the printed English text" (PDF). The Routledge Handbook of the English Writing System. Taylor & Francis. p. 6. ISBN 9780367581565. Archived from the original (PDF) on 10 July 2022. Retrieved 24 January 2024. The types used by Caxton and his contemporaries originated in Holland and Belgium, and did not provide for the continuing use of elements of the Old English alphabet such as thorn <þ>, eth <ð>, and yogh <ʒ>. The substitution of visually similar typographic forms has led to some anomalies which persist to this day in the reprinting of archaic texts and the spelling of regional words. The widely misunderstood 'ye' occurs through a habit of printer's usage that originates in Caxton's time, when printers would substitute the <y> (often accompanied by a superscript <e>) in place of the thorn <þ> or the eth <ð>, both of which were used to denote both the voiced and non-voiced sounds, /ð/ and /θ/ (Anderson, D. (1969) The Art of Written Forms. New York: Holt, Rinehart and Winston, p 169)
  7. ^ https://governance.dev/phishing-domain-check. Retrieved 12 February 2024.
  8. ^ "UTR #36: Unicode Security Considerations". unicode.org.
  9. ^ "Register a .CA in French!". Archived from the original on 2013-03-28. Retrieved 2013-03-29.
  10. ^ "ICANN Email Archives: [idn-guidelines]". forum.icann.org.
