Text file

Filename extension
.text, .txt
Internet media type
text/plain
Type code
TEXT
Uniform Type Identifier (UTI)
public.plain-text
UTI conformation
public.text
Type of format
Document file format, Generic container format

A text file (sometimes spelled textfile; an old alternative name is flat file) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file is stored as data within a computer file system.

In operating systems such as CP/M, where the operating system does not keep track of the file size in bytes, the end of a text file is denoted by placing one or more special characters, known as an end-of-file (EOF) marker, as padding after the last line in a text file. In modern operating systems such as DOS, Microsoft Windows and Unix-like systems, text files do not contain any special EOF character, because file systems on those operating systems keep track of the file size in bytes.
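The padding scheme described above can be sketched as follows. This is a minimal illustration, not a description of any particular tool: it assumes the EOF marker is the byte 0x1A (Ctrl-Z, the character CP/M used) and that files are allocated in 128-byte records; the function names are hypothetical.

```python
CPM_EOF = 0x1A      # Ctrl-Z, assumed EOF marker
RECORD_SIZE = 128   # assumed allocation unit in bytes


def pad_to_record(text: bytes) -> bytes:
    """Pad text with EOF markers up to the next record boundary.

    If the text already ends exactly on a boundary, no marker is added.
    """
    padding = -len(text) % RECORD_SIZE
    return text + bytes([CPM_EOF]) * padding


def strip_eof_padding(data: bytes) -> bytes:
    """Return everything before the first EOF marker, or all of data if none."""
    end = data.find(bytes([CPM_EOF]))
    return data if end == -1 else data[:end]
```

A reader on a system that tracks file sizes in bytes never needs strip_eof_padding, because the file length already says where the text stops.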

Some operating systems, such as Multics, Unix-like systems, CP/M, DOS, the classic Mac OS, and Windows, store text files as a sequence of bytes, with an end-of-line delimiter at the end of each line. Other operating systems, such as OpenVMS and OS/360 and its successors, have record-oriented filesystems, in which text files are stored as a sequence either of fixed-length records or of variable-length records with a record-length value in the record header.
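The three storage layouts mentioned above can be compared with a short sketch. The concrete details are assumptions chosen for illustration only: fixed-length records are padded with spaces, and each variable-length record starts with a 2-byte big-endian length. Real record-oriented file systems define their own header layouts.

```python
import struct


def read_delimited(data: bytes, delimiter: bytes = b"\n") -> list[bytes]:
    """Byte-stream layout: lines end with a delimiter."""
    lines = data.split(delimiter)
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final delimiter
    return lines


def read_fixed(data: bytes, record_length: int) -> list[bytes]:
    """Fixed-length records, assumed to be right-padded with spaces."""
    return [
        data[i:i + record_length].rstrip(b" ")
        for i in range(0, len(data), record_length)
    ]


def read_variable(data: bytes) -> list[bytes]:
    """Variable-length records with an assumed 2-byte big-endian length header."""
    records = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        records.append(data[offset:offset + length])
        offset += length
    return records
```

In the record-oriented layouts no byte value is reserved as a line terminator, so a record may contain any character, including one that a byte-stream system would treat as end of line.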

"Text file" refers to a type of container, while plain text refers to a type of content.

At a generic level of description, there are two kinds of computer files: text files and binary files.[1]

Data storage

A stylized iconic depiction of a CSV-formatted text file

Because of their simplicity, text files are commonly used for storage of information. They avoid some of the problems encountered with other file formats, such as endianness, padding bytes, or differences in the number of bytes in a machine word. Further, when data corruption occurs in a text file, it is often easier to recover and continue processing the remaining contents. A disadvantage of text files is that they usually have a low entropy, meaning that the information occupies more storage than is strictly necessary.
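The endianness point can be made concrete with a small sketch using Python's standard struct module. The value 1234 is an arbitrary example. The same integer has two different 4-byte binary layouts depending on byte order, while its decimal text form is the same sequence of bytes everywhere.

```python
import struct

value = 1234

# Binary storage: the byte order must be agreed on by writer and reader.
little = struct.pack("<I", value)   # little-endian 32-bit unsigned
big = struct.pack(">I", value)      # big-endian 32-bit unsigned

# Reading big-endian bytes as little-endian silently yields a wrong number.
misread = struct.unpack("<I", big)[0]

# Text storage: one byte per digit, no byte-order question arises.
as_text = str(value).encode("ascii")
round_trip = int(as_text)
```

The text form here takes four bytes for four digits, but larger values need up to ten digits to represent what the binary form stores in four bytes, which illustrates the storage overhead noted above.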

A simple text file may need no additional metadata (other than knowledge of its character set) to assist the reader in interpretation. A text file may contain no data at all, in which case it is a zero-byte file.

Encoding

The ASCII character set is the most common compatible subset of character sets for English-language text files, and is generally assumed to be the default encoding in many situations. It covers American English, but for the British pound sign, the euro sign, or characters used outside English, a richer character set must be used. In many systems, this is chosen based on the default locale setting on the computer on which the file is read. Before UTF-8, the locale's encoding was traditionally a single-byte encoding (such as one of ISO-8859-1 through ISO-8859-16) for European languages and a wide character encoding for Asian languages.

Because encodings necessarily have only a limited repertoire of characters, often very small, many are only usable to represent text in a limited subset of human languages. Unicode is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is UTF-8, which has the advantage of being backwards-compatible with ASCII; that is, every ASCII text file is also a UTF-8 text file with identical meaning. UTF-8 also has the advantage that it is easily auto-detectable. Thus, a common operating mode of UTF-8-capable software, when opening files of unknown encoding, is to try UTF-8 first and fall back to a locale-dependent legacy encoding when the content is definitely not UTF-8.
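The "try UTF-8 first" strategy can be sketched in a few lines. The function name decode_text and the default fallback of Windows-1252 ("cp1252") are assumptions made for the example; real software would typically take the fallback from the user's locale.

```python
def decode_text(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Decode bytes as UTF-8 if valid, otherwise with a legacy encoding.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        # Strict decoding fails on any byte sequence that is not valid UTF-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Definitely not UTF-8: fall back to the assumed legacy encoding.
        return data.decode(fallback, errors="replace"), fallback
```

Pure ASCII input always takes the first branch, which reflects the backwards compatibility described above. Text in a legacy encoding that contains non-ASCII characters is unlikely to form valid UTF-8 by accident, which is why the detection works well in practice.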

Formats

On most operating systems, the name text file refers to a file format that allows only plain text content with very little formatting (e.g., no bold or italic types). Such files can be viewed and edited on text terminals or in simple text editors. Text files usually have the MIME type text/plain, usually with additional information indicating an encoding.

Microsoft Windows text files

DOS and Microsoft Windows use a common text file format, with each line of text separated by a two-character combination: carriage return (CR) and line feed (LF). It is common for the last line of text not to be terminated with a CR-LF marker, and many text editors (including Notepad) do not automatically insert one on the last line.
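A reader of this format has to cope with a final line that lacks the CR-LF pair. The following sketch, with the hypothetical name split_crlf, splits raw bytes into lines and reports whether the last line was terminated.

```python
def split_crlf(data: bytes) -> tuple[list[bytes], bool]:
    """Split CR-LF separated text into lines.

    Returns the lines and a flag telling whether the last line ended in CR-LF.
    An empty input is treated as zero lines.
    """
    if not data:
        return [], True
    terminated = data.endswith(b"\r\n")
    lines = data.split(b"\r\n")
    if terminated:
        lines.pop()  # drop the empty piece after the final CR-LF
    return lines, terminated
```

For example, split_crlf(b"a\r\nb") yields the same two lines as split_crlf(b"a\r\nb\r\n"); only the flag differs.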

On Microsoft Windows operating systems, a file is regarded as a text file if the suffix of the name of the file (the "filename extension") is .txt. However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the programming language in which the source is written.

Most Microsoft Windows text files use ANSI, OEM, Unicode or UTF-8 encoding. What Microsoft Windows terminology calls "ANSI encodings" are usually single-byte ISO/IEC 8859 encodings (i.e. ANSI in the Microsoft Notepad menus is really "System Code Page", a non-Unicode legacy encoding), except in locales such as Chinese, Japanese and Korean that require double-byte character sets. ANSI encodings were traditionally used as default system locales within Microsoft Windows, before the transition to Unicode. By contrast, OEM encodings, also known as DOS code pages, were defined by IBM for use in the original IBM PC text mode display system. They typically include graphical and line-drawing characters common in DOS applications. "Unicode"-encoded Microsoft Windows text files contain text in UTF-16 Unicode Transformation Format. Such files normally begin with a byte order mark (BOM), which communicates the endianness of the file content. Although UTF-8 does not suffer from endianness problems, many Microsoft Windows programs (e.g. Notepad) prepend a BOM to the contents of UTF-8-encoded files,[2] to differentiate UTF-8 encoding from other 8-bit encodings.[3]

Unix text files

On Unix-like operating systems, the text file format is precisely defined: POSIX defines a text file as a file that contains characters organized into zero or more lines,[4] where lines are sequences of zero or more non-newline characters plus a terminating newline character,[5] normally LF.
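The part of the definition quoted above implies that a non-empty text file must end with a newline, since every line carries its own terminator. A minimal check, assuming LF as the newline character and using hypothetical function names, might look like this; it does not test any other requirement of the standard.

```python
def is_posix_text(data: bytes) -> bool:
    """True if data is zero or more lines, each terminated by LF."""
    return data == b"" or data.endswith(b"\n")


def count_lines(data: bytes) -> int:
    """Number of complete lines; an unterminated tail is not a line."""
    return data.count(b"\n")
```

Under this reading, b"a\nb" holds one line followed by trailing characters that do not form a line, which is why many Unix tools warn about a missing newline at the end of a file.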

Additionally, POSIX defines a printable file as a text file whose characters are printable or space or backspace according to regional rules. This excludes most control characters, which are not printable.[6]

Apple Macintosh text files

Prior to the advent of macOS, the classic Mac OS system regarded the content of a file (the data fork) to be a text file when its resource fork indicated that the type of the file was "TEXT".[7] Lines of classic Mac OS text files are terminated with CR characters.[8]

Being a Unix-like system, macOS uses the Unix format for text files.[8] The Uniform Type Identifier (UTI) used for text files in macOS is "public.plain-text"; additional, more specific UTIs are "public.utf8-plain-text" for UTF-8-encoded text, "public.utf16-external-plain-text" and "public.utf16-plain-text" for UTF-16-encoded text, and "com.apple.traditional-mac-plain-text" for classic Mac OS text files.[7]
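Since the three platform conventions differ only in the line terminator (CR-LF, LF, or CR), converting between them is a matter of byte replacement. The sketch below, with hypothetical function names, converts any mix of the three to the Unix form and from the Unix form to the classic Mac OS form.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert CR-LF and lone CR terminators to LF.

    CR-LF is replaced first so that it does not turn into two line breaks.
    """
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_classic_mac_newlines(data: bytes) -> bytes:
    """Convert any of the three terminators to CR."""
    return to_unix_newlines(data).replace(b"\n", b"\r")
```

The order of the two replacements in to_unix_newlines matters: replacing lone CR first would leave every CR-LF pair as two LF characters.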

Rendering

When a text file is opened in a text editor, human-readable content is presented to the user. This usually consists of the file's plain text. Depending on the application, control codes may be rendered either as literal instructions acted upon by the editor, or as visible escape characters that can be edited as plain text. Even when a text file contains plain text, control characters within the file (especially the end-of-file character) can hide that text from a particular method of viewing the file.


Notes and references

  1. ^ Lewis, John (2006). Computer Science Illuminated. Jones and Bartlett. ISBN 0-7637-4149-3.
  2. ^ "Using Byte Order Marks". Internationalization for Windows Applications. Microsoft. 2021-01-07. Archived from the original on 2023-02-21. Retrieved 2022-04-21.
  3. ^ Freytag, Asmus (2015-12-18). "FAQ – UTF-8, UTF-16, UTF-32 & BOM". The Unicode Consortium. Retrieved 2016-05-30. Yes, UTF-8 can contain a BOM. However, it makes no difference as to the endianness of the byte stream. UTF-8 always has the same byte order. An initial BOM is only used as a signature — an indication that an otherwise unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do not expect a BOM. Where UTF-8 is used transparently in 8-bit environments, the use of a BOM will interfere with any protocol or file format that expects specific ASCII characters at the beginning, such as the use of "#!" of at the beginning of Unix shell scripts.
  4. ^ "3.403 Text File". IEEE Std 1003.1, 2017 Edition. IEEE Computer Society. Retrieved 2019-03-01.
  5. ^ "3.206 Line". IEEE Std 1003.1, 2013 Edition. IEEE Computer Society. Retrieved 2015-12-15.
  6. ^ "3.284 Printable File". IEEE Std 1003.1, 2013 Edition. IEEE Computer Society. Retrieved 2015-12-15.
  7. ^ a b "System-Declared Uniform Type Identifiers". Guides and Sample Code. Apple Inc. 2009-11-17. Retrieved 2016-09-12.
  8. ^ a b "Designing Scripts for Cross-Platform Deployment". Mac Developer Library. Apple Inc. 2014-03-10. Retrieved 2016-09-12.
