Plain text

Text file with portion of The Human Side of Animals by Royal Dixon, displayed by the command cat in an xterm window

In computing, plain text is a loose term for data (e.g. file contents) that represent only characters of readable material but not its graphical representation nor other objects (floating-point numbers, images, etc.). It may also include a limited number of "whitespace" characters that affect simple arrangement of text, such as spaces, line breaks, or tabulation characters. Plain text is different from formatted text, where style information is included; from structured text, where structural parts of the document such as paragraphs, sections, and the like are identified; and from binary files in which some portions must be interpreted as binary objects (encoded integers, real numbers, images, etc.).

The term is sometimes used quite loosely to mean files that contain only "readable" content, or simply files free of whatever the speaker prefers to exclude. For example, that could exclude any indication of fonts or layout (such as markup, Markdown, or even tabs); characters such as curly quotes, non-breaking spaces, soft hyphens, em dashes, and/or ligatures; or other things.

In principle, plain text can be in any encoding, but occasionally the term is taken to imply ASCII. As Unicode-based encodings such as UTF-8 and UTF-16 become more common, that usage may be shrinking.

Plain text is also sometimes used only to exclude "binary" files: those in which at least some parts of the file cannot be correctly interpreted via the character encoding in effect. For example, a file or string consisting of "hello" (in any encoding), followed by 4 bytes that express a binary integer that is not a character, is a binary file. Converting a plain text file to a different character encoding does not change the meaning of the text, as long as the correct character encoding is used. However, converting a binary file to a different format may alter the interpretation of the non-textual data.
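The "hello" example can be sketched in Python, assuming UTF-8 is the character encoding in effect. The helper name is_plain_text and the particular integer value are illustrative choices, not part of any standard.

```python
import struct


def is_plain_text(data: bytes, encoding: str = "utf-8") -> bool:
    """Return True if every byte of data decodes under the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


text_only = "hello".encode("utf-8")

# Append a 4-byte little-endian integer. The value is chosen so that its
# bytes (FE FF FF FF) do not form valid UTF-8.
mixed = text_only + struct.pack("<I", 0xFFFFFFFE)

# Converting plain text to another encoding keeps the same characters.
as_utf16 = text_only.decode("utf-8").encode("utf-16")
round_trip = as_utf16.decode("utf-16")
```

A check of this kind only works for encodings that reject some byte sequences. Under Latin-1, for instance, every byte value decodes to some character, so the same test would accept any data at all.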

Plain text and rich text

According to The Unicode Standard:[1]

  • "Plain text is a pure sequence of character codes; plain Unicode-encoded text is therefore a sequence of Unicode character codes.
  • In contrast, styled text, also known as rich text, is any text representation containing plain text plus added information such as a language identifier, font size, color, hypertext links, and so on.
  • SGML, RTF, HTML, XML, and TeX are examples of rich text fully represented as plain text streams, interspersing plain text data with sequences of characters that represent the additional data structures."

According to other definitions, however, files that contain markup or other metadata are generally considered plain text, so long as the markup is also in a directly human-readable form (as in HTML, XML, and so on). Thus, representations such as SGML, RTF, HTML, XML, wiki markup, and TeX, as well as nearly all programming language source code files, are considered plain text. The particular content is irrelevant to whether a file is plain text. For example, an SVG file can express drawings or even bitmapped graphics, but is still plain text.

The use of plain text rather than binary files enables files to survive much better "in the wild", in part by making them largely immune to computer architecture incompatibilities. For example, all the problems of endianness can be avoided. (With encodings such as UCS-2, unlike UTF-8, endianness does matter, but it applies uniformly to every character rather than to potentially unknown subsets of the data.)
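The endianness point can be illustrated with a short Python sketch. Python has no codec named UCS-2, so UTF-16 stands in for it here; for a character such as "A" the byte layout is the same. The variable names are illustrative.

```python
ch = "A"  # U+0041

# UTF-8 uses a single byte for this character, so byte order never arises.
utf8 = ch.encode("utf-8")

# A 16-bit encoding must pick an order for the two bytes of each code unit.
utf16_le = ch.encode("utf-16-le")  # low byte first
utf16_be = ch.encode("utf-16-be")  # high byte first

# Reading little-endian data as big-endian swaps the bytes of every
# character in the same way, giving U+4100 instead of U+0041.
misread = utf16_le.decode("utf-16-be")
```

Because the swap affects every code unit identically, a reader that knows or detects the byte order can correct the whole text at once, which is not true of arbitrary binary fields.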

Usage

The purpose of using plain text today is primarily independence from programs that require their own special encoding, formatting, or file format. Plain text files can be opened, read, and edited with ubiquitous text editors and utilities.

A command-line interface allows people to give commands in plain text and get a response, also typically in plain text.

Many other computer programs are also capable of processing or creating plain text, such as countless programs in DOS, Windows, classic Mac OS, and Unix and its kin; as well as web browsers (a few browsers such as Lynx and the Line Mode Browser produce only plain text for display) and other e-text readers.

Plain text files are almost universal in programming; a source code file containing instructions in a programming language is almost always a plain text file. Plain text is also commonly used for configuration files, which are read for saved settings at the startup of a program.

Plain text is used for much e-mail.

A comment, a ".txt" file, or a TXT Record generally contains only plain text (without formatting) intended for humans to read.

Andrew Hunt and David Thomas argue in The Pragmatic Programmer that the best format for storing knowledge persistently is plain text, rather than some binary format.[2]

Encoding

Character encodings

Before the early 1960s, computers were mainly used for number-crunching rather than for text, and memory was extremely expensive. Computers often allocated only 6 bits for each character, permitting only 64 characters—assigning codes for A-Z, a-z, and 0-9 would leave only 2 codes: nowhere near enough. Most computers opted not to support lower-case letters. Thus, early text projects such as Roberto Busa's Index Thomisticus, the Brown Corpus, and others had to resort to conventions such as keying an asterisk preceding letters actually intended to be upper-case.

Fred Brooks of IBM argued strongly for going to 8-bit bytes, on the grounds that someday people might want to process text, and won. Although IBM used EBCDIC, most text from then on came to be encoded in ASCII, using values from 0 to 31 (and 127) for non-printing control characters, and values from 32 to 126 for graphic characters such as letters, digits, and punctuation. Most machines stored characters in 8 bits rather than 7, ignoring the remaining bit or using it as a checksum.

The near-ubiquity of ASCII was a great help, but failed to address international and linguistic concerns. The dollar sign ("$") was not as useful in England, and the accented characters used in Spanish, French, German, Portuguese, Italian and many other languages were entirely unavailable in ASCII (not to mention characters used in Greek, Russian, and most Eastern languages). Many individuals, companies, and countries defined extra characters as needed, often reassigning control characters or using values in the range from 128 to 255. Using values of 128 and above conflicts with using the 8th bit as a checksum, but the checksum usage gradually died out.

These additional characters were encoded differently in different countries, making texts impossible to decode without figuring out the originator's rules. For instance, a browser might display ¬A rather than ` if it tried to interpret one character set as another. The International Organization for Standardization (ISO) eventually developed several code pages under ISO 8859, to accommodate various languages. The first of these (ISO 8859-1) is also known as "Latin-1", and covers the needs of most (not all) European languages that use Latin-based characters (there was not quite enough room to cover them all). ISO 2022 then provided conventions for "switching" between different character sets in mid-file. Many other organisations developed variations on these, and for many years Windows and Macintosh computers used incompatible variations.
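The decoding problem can be seen by interpreting one byte under several 8-bit character sets. The sketch below uses Python's standard codecs; the choice of byte 0xE9 and of these three character sets is illustrative only.

```python
raw = bytes([0xE9])

# The same byte maps to a different character in each 8-bit character set.
as_latin1 = raw.decode("latin-1")    # ISO 8859-1
as_greek = raw.decode("iso8859_7")   # ISO 8859-7 (Greek)
as_mac = raw.decode("mac_roman")     # classic Mac OS encoding

# The reverse mistake: UTF-8 bytes for one character read as Latin-1
# come out as two unrelated characters.
garbled = "é".encode("utf-8").decode("latin-1")
```

Nothing in the bytes themselves says which interpretation is right, which is why the recipient has to learn the originator's encoding by some other route.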

The text-encoding situation became more and more complex, leading to efforts by ISO and by the Unicode Consortium to develop a single, unified character encoding that could cover all currently known languages. After some conflict,[3] these efforts were unified. Unicode currently allows for 1,114,112 code values, and assigns codes covering nearly all modern text writing systems, as well as many historical ones and many non-linguistic characters such as printer's dingbats, mathematical symbols, etc.
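The figure of 1,114,112 code values follows from the layout of the Unicode code space, which consists of 17 planes of 65,536 values each (a detail not stated above). A short Python check, with illustrative constant names:

```python
import sys

PLANES = 17                       # planes 0 through 16
VALUES_PER_PLANE = 0x10000        # 65,536 values in each plane

total_code_values = PLANES * VALUES_PER_PLANE

# Python exposes the highest code value it supports as sys.maxunicode.
highest = sys.maxunicode
```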

Text is considered plain text regardless of its encoding. To properly understand or process it the recipient must know (or be able to figure out) what encoding was used; however, they need not know anything about the computer architecture that was used, or about the binary structures defined by whatever program (if any) created the data.

Perhaps the most common way of explicitly stating the specific encoding of plain text is with a MIME type. For email and HTTP, the default MIME type is "text/plain", meaning plain text without markup. Another MIME type often used in both email and HTTP is "text/html; charset=UTF-8", meaning plain text represented using the UTF-8 character encoding with HTML markup. Another common MIME type is "application/json", meaning plain text represented using the UTF-8 character encoding with JSON markup.
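As a rough sketch of how a recipient might read the encoding out of such a value, the following Python uses the standard library's email.message.Message to split a Content-Type value into its type and charset parameter. The function name parse_content_type is hypothetical.

```python
from email.message import Message


def parse_content_type(header_value: str, default_charset=None):
    """Split a Content-Type value into (media type, charset or default)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Both results are returned in lower case by the library.
    return msg.get_content_type(), msg.get_content_charset(default_charset)


html = parse_content_type("text/html; charset=UTF-8")
plain = parse_content_type("text/plain", default_charset="us-ascii")
json_type = parse_content_type("application/json")
```

When no charset parameter is present, the caller has to supply a default or fall back on the rules of the format itself, which is why the sketch takes default_charset as an argument rather than guessing.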

When a document is received without any explicit indication of the character encoding, some applications use charset detection to attempt to guess what encoding was used.
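A very simplified form of such guessing is sketched below in Python. It is not how production detectors work (they typically use statistical models of byte frequencies); it only checks for a byte order mark, then tries strict UTF-8, then falls back to Latin-1. The function name guess_encoding and the fallback choice are assumptions made for illustration, and UTF-32 is deliberately ignored.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess a codec name for data using a few simple checks."""
    # A byte order mark at the start is the strongest hint available.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Text that is not UTF-8 rarely happens to be valid UTF-8 by accident.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Latin-1 accepts every byte, so it serves as a last resort.
        return "latin-1"


utf8_guess = guess_encoding("héllo".encode("utf-8"))
latin1_guess = guess_encoding("héllo".encode("latin-1"))
utf16_guess = guess_encoding("hi".encode("utf-16"))
```

A guess of this kind can be wrong, since the Latin-1 fallback will "succeed" on text in any other 8-bit encoding, so an explicit declaration is preferable whenever one is available.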

Control codes

ASCII reserves the first 32 codes (numbers 0–31 decimal) for control characters known as the "C0 set": codes originally intended not to represent printable information, but rather to control devices (such as printers) that make use of ASCII, or to provide meta-information about data streams such as those stored on magnetic tape. They include common characters like the newline and the tab character.

In 8-bit character sets such as Latin-1 and the other ISO 8859 sets, the first 32 characters of the "upper half" (128 to 159) are also control codes, known as the "C1 set". They are rarely used directly; when they turn up in documents that are ostensibly in an ISO 8859 encoding, their code positions generally refer instead to the characters at those positions in a proprietary, system-specific encoding, such as Windows-1252 or Mac OS Roman, which uses those codes to provide additional graphic characters.
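The two control ranges, and the reinterpretation of a C1 position as a graphic character, can be checked with Python's unicodedata module. Byte 0x93 is chosen here only as an example.

```python
import unicodedata

c0 = [chr(i) for i in range(0, 32)]      # codes 0 to 31
c1 = [chr(i) for i in range(128, 160)]   # codes 128 to 159

# Unicode assigns all of these the general category "Cc" (control).
all_controls = all(unicodedata.category(c) == "Cc" for c in c0 + c1)

# The same byte is a C1 control in Latin-1 but a curly quotation mark
# in Windows-1252.
byte = bytes([0x93])
as_latin1 = byte.decode("latin-1")
as_cp1252 = byte.decode("cp1252")
```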

Unicode defines additional control characters, including bi-directional text direction override characters (used to explicitly mark right-to-left writing inside left-to-right writing and the other way around) and variation selectors to select alternate forms of CJK ideographs, emoji and other characters.
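Two of these characters can be inspected with Python's unicodedata module. The specific code points shown (U+202E and U+FE0F) are examples picked for illustration, as is the heart symbol used to show a variation selector in use.

```python
import unicodedata

RLO = "\u202e"    # a bidirectional override character
VS16 = "\ufe0f"   # a variation selector

rlo_name = unicodedata.name(RLO)
vs16_name = unicodedata.name(VS16)

# Unlike the C0 and C1 codes, neither falls in the "Cc" category.
rlo_category = unicodedata.category(RLO)
vs16_category = unicodedata.category(VS16)

# A variation selector follows the character it modifies, so the
# emoji-style heart is a sequence of two code points.
heart_with_selector = "\u2764" + VS16
```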

References

  1. "The Unicode Standard, version 14.0" (PDF). pp. 18–19.
  2. Andrew Hunt, David Thomas. "The Pragmatic Programmer". 1999. Chapter 14: "The Power of Plain Text". p. 73.
  3. "ISO/Unicode Merger: Ed Hart Memo". www.unicode.org. Retrieved 2024-10-21.
