FASTA

Stable release: 36
Type: Bioinformatics
License: Apache 2.0

FASTA is a DNA and protein sequence alignment software package first described by David J. Lipman and William R. Pearson in 1985.[1] Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.

History

The original FASTP program was designed for protein sequence similarity searching. Because genetic information was expanding exponentially and the computers of the 1980s had limited speed and memory, heuristic methods were introduced for aligning a query sequence against entire databases. FASTA, published in 1988, added the ability to do DNA:DNA searches and translated protein:DNA searches, and also provided a more sophisticated shuffling program for evaluating statistical significance.[2] Several programs in the package allow the alignment of protein sequences and DNA sequences. Nowadays, increased computer performance makes it possible to search a database for local alignments using the Smith–Waterman algorithm.

FASTA is pronounced "fast A", and stands for "FAST-All", because it works with any alphabet, an extension of the original "FAST-P" (protein) and "FAST-N" (nucleotide) alignment tools.

Mappers timeline (since 2001). DNA mappers are plotted in blue, RNA mappers in red, miRNA mappers in green and bisulphite mappers in purple. Grey dotted lines connect related mappers (extensions or new versions). The timeline only includes mappers with peer-reviewed publications, and the date corresponds to the earliest date of publication (e.g. the advance publication date as opposed to the date of publication).

Uses

The current FASTA package contains programs for protein:protein, DNA:DNA, protein:translated DNA (with frameshifts), and ordered or unordered peptide searches. Recent versions of the FASTA package include special translated search algorithms that correctly handle frameshift errors (which six-frame-translated searches do not handle very well) when comparing nucleotide to protein sequence data.

In addition to rapid heuristic search methods, the FASTA package provides SSEARCH, an implementation of the optimal Smith–Waterman algorithm.
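As a rough illustration of what an optimal local alignment computes, the following minimal sketch scores the best local alignment between two sequences using the Smith–Waterman recurrence. It is not SSEARCH's implementation: the function name `smith_waterman_score` and the match, mismatch and linear gap values are assumptions chosen for brevity, whereas a real search would use a substitution matrix such as BLOSUM50.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of a and b.

    Scoring values are illustrative assumptions, not SSEARCH defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # start a new local alignment
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # shared "ACGT" scores 8
```

Because every cell of the matrix is filled in, the running time grows with the product of the two sequence lengths, which is why heuristic pre-filtering matters for database-scale searches.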

A major focus of the package is the calculation of accurate similarity statistics, so that biologists can judge whether an alignment is likely to have occurred by chance, or whether it can be used to infer homology. The FASTA package is available from the University of Virginia[3] and the European Bioinformatics Institute.[4]

The FASTA file format used as input for this software is now widely used by other sequence database search tools (such as BLAST) and sequence alignment programs (Clustal, T-Coffee, etc.).
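For readers unfamiliar with the format, a record consists of a header line beginning with ">" followed by one or more lines of sequence. The sketch below shows one simple way to read such text; the function name `parse_fasta` and the example records are hypothetical, and the parser ignores edge cases such as duplicate headers.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()  # drop the leading ">"
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


# Hypothetical example input, made up for illustration only.
example = """>seq1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical DNA
ACGTACGT
"""
records = parse_fasta(example.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```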

Search method

FASTA takes a given nucleotide or amino acid sequence and searches a corresponding sequence database by using local sequence alignment to find matches of similar database sequences.

The FASTA program follows a largely heuristic method, which contributes to its high speed of execution. It first examines the pattern of word hits (word-to-word matches of a given length) and marks potential matches before performing a more time-consuming optimized search using a Smith–Waterman type of algorithm.

The word size, given by the k-mer parameter, controls the sensitivity and speed of the program. Increasing the k-mer value decreases the number of background hits that are found. From the word hits that are returned, the program looks for segments that contain a cluster of nearby hits. It then investigates these segments for a possible match.
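The word-hit idea can be sketched in a few lines. The example below builds a lookup table of k-mers for the query, scans a second sequence for matching words, and counts hits per diagonal (subject position minus query position), so that a cluster of hits on one diagonal stands out. The function names and the toy sequences are assumptions made for illustration; the actual program's data structures are not described here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map each k-mer in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query: str, subject: str, k: int) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs for every shared k-mer."""
    index = kmer_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], ()):
            hits.append((i, j))
    return hits


def hits_per_diagonal(hits: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count hits on each diagonal, identified by subject_pos - query_pos."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return dict(counts)


query, subject = "ACGTACGT", "TTACGTACGTTT"  # toy sequences
counts = hits_per_diagonal(word_hits(query, subject, k=4))
best = max(counts, key=counts.get)
print(best, counts[best])  # diagonal 2 collects 5 hits

# A smaller word size yields more (mostly background) hits.
print(len(word_hits(query, subject, k=2)), len(word_hits(query, subject, k=4)))
```

In this toy case the densest diagonal corresponds to the offset at which the query occurs inside the second sequence, which is the kind of segment the program would go on to examine.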

There are some differences between FASTN and FASTP relating to the type of sequences used, but both use four steps and calculate three scores to describe and format the sequence similarity results. The steps are:

  • Identify regions of highest density in each sequence comparison, taking a k-mer value of 1 or 2.
In this step, all or a group of the identities between two sequences are found using a lookup table. The k-mer value determines how many consecutive identities are required for a match to be declared. Thus, the lower the k-mer value, the more sensitive the search. k-mer=2 is frequently chosen by users for protein sequences and k-mer=4 or 6 for nucleotide sequences. Short oligonucleotides are usually run with k-mer=1. The program then finds all similar local regions between the two sequences, represented as diagonals of a certain length in a dot plot, by counting k-mer matches and penalizing intervening mismatches. In this way, local regions with the highest density of matches in a diagonal are isolated from background hits. For protein sequences, BLOSUM50 values are used for scoring k-mer matches. This ensures that groups of identities with high similarity scores contribute more to the local diagonal score than identities with low similarity scores. Nucleotide sequences use the identity matrix for the same purpose. The 10 best local regions selected from all the diagonals together are then saved.
  • Rescan the regions taken using the scoring matrices, trimming the ends of each region to include only those residues contributing to the highest score.
The 10 regions taken are rescanned. This time the relevant scoring matrix is used while rescoring, which allows runs of identities shorter than the k-mer value. Conservative replacements that contribute to the similarity score are also taken into account while rescoring. Although protein sequences use the BLOSUM50 matrix, scoring matrices based on the minimum number of base changes required for a specific replacement, on identities alone, or on an alternative measure of similarity such as PAM can also be used with the program. For each of the diagonal regions rescanned this way, a subregion with the maximum score is identified. The initial scores found in step 1 are used to rank the library sequences. The highest score is referred to as the init1 score.
  • If several initial regions with scores greater than a CUTOFF value are found in an alignment, check whether the trimmed initial regions can be joined to form an approximate alignment with gaps. Calculate a similarity score that is the sum of the scores of the joined regions, penalising each gap by 20 points. This initial similarity score (initn) is used to rank the library sequences. The score of the single best initial region found in step 2 is reported (init1).
Here the program calculates an optimal alignment of initial regions as a combination of compatible regions with maximal score. This optimal alignment of initial regions can be rapidly calculated using a dynamic programming algorithm. The resulting score, initn, is used to rank the library sequences. This joining process increases sensitivity but decreases selectivity. A carefully calculated cut-off value is therefore used to control where this step is applied: a value approximately one standard deviation above the average score expected from unrelated sequences in the library. For example, a 200-residue query sequence with k-mer=2 uses a cut-off value of 28.
  • Calculate an optimised score (opt) for each alignment using a banded Smith–Waterman algorithm.
This step uses a banded Smith–Waterman algorithm to create an optimised score (opt) for each alignment of the query sequence to a database (library) sequence. It takes a band of 32 residues centered on the init1 region of step 2 for calculating the optimal alignment. After all sequences are searched, the program plots the initial scores of each database sequence in a histogram and calculates the statistical significance of the "opt" score. For protein sequences, the final alignment is produced using a full Smith–Waterman alignment. For DNA sequences, a banded alignment is provided.
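The joining of initial regions in step 3 can be sketched as a small dynamic program over regions. In the minimal example below, each region is described by its start and end positions in the query and library sequence plus its score; two regions are treated as compatible when the second starts after the first ends in both sequences, and each join costs the 20-point gap penalty mentioned above. The `Region` type, the function name `initn_score`, the compatibility rule and the example coordinates are all assumptions made for illustration, not the program's actual code.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A trimmed initial region (hypothetical representation)."""
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    score: int


def initn_score(regions: List[Region], gap_penalty: int = 20) -> int:
    """Best total score of a chain of compatible regions.

    Each join between consecutive regions costs gap_penalty points.
    """
    ordered = sorted(regions, key=lambda r: (r.q_start, r.s_start))
    best_ending = []  # best chain score ending at each region
    for i, r in enumerate(ordered):
        best = r.score  # chain consisting of this region alone
        for j in range(i):
            p = ordered[j]
            # Assumed rule: p must end before r starts in both sequences.
            if p.q_end < r.q_start and p.s_end < r.s_start:
                best = max(best, best_ending[j] + r.score - gap_penalty)
        best_ending.append(best)
    return max(best_ending, default=0)


# Made-up regions: the first two can be joined, the third conflicts.
regions = [
    Region(0, 20, 5, 25, 50),
    Region(30, 55, 40, 65, 45),
    Region(25, 40, 0, 15, 30),
]
init1 = max(r.score for r in regions)
print(init1, initn_score(regions))  # 50 and 50 + 45 - 20 = 75
```

With a much larger penalty the join no longer pays off and the result falls back to the single best region, which mirrors the trade-off between sensitivity and selectivity described in step 3.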

FASTA can remove low-complexity regions before aligning the sequences by encoding low-complexity regions in lower case and using the -S option. However, the BLAST program offers more options for correcting for biased composition statistics. Therefore, the program PRSS is included in the FASTA distribution package. PRSS shuffles the matching sequences in the database either at the level of single letters or in short segments whose length the user can determine. The shuffled sequences are then aligned again; if the score is still higher than expected, this is because the low-complexity regions, although mixed up, still map to the query. From the scores that the shuffled sequences still attain, PRSS can then estimate the significance of the score of the original sequences. The higher the score of the shuffled sequences, the less significant the matches found between the original database and query sequences.[5]
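The shuffling idea can be illustrated with a simple permutation test. The sketch below shuffles the letters of a matching sequence many times, rescores each shuffled copy against the query, and reports the fraction of shuffled scores that reach the observed score. This is only an approximation of the concept: the function names, the toy scoring scheme and the empirical fraction are assumptions, segment-wise shuffling is not shown, and the statistical model PRSS itself applies is not reproduced here.

```python
import random
from typing import List


def local_score(a: str, b: str, match: int = 2,
                mismatch: int = -1, gap: int = -2) -> int:
    """Toy Smith-Waterman local score (illustrative scoring values)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def shuffled_scores(query: str, subject: str,
                    n_shuffles: int = 200, seed: int = 0) -> List[int]:
    """Score the query against letter-shuffled copies of subject."""
    rng = random.Random(seed)  # seeded for reproducibility
    letters = list(subject)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        scores.append(local_score(query, "".join(letters)))
    return scores


def empirical_p_value(observed: int, shuffled: List[int]) -> float:
    """Fraction of shuffled scores >= observed, with add-one smoothing."""
    exceed = sum(1 for s in shuffled if s >= observed)
    return (exceed + 1) / (len(shuffled) + 1)


query = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence
subject = query                    # identical match, for illustration
observed = local_score(query, subject)
background = shuffled_scores(query, subject, n_shuffles=50, seed=1)
print(observed, max(background), empirical_p_value(observed, background))
```

If shuffled copies of a sequence still score nearly as well as the original, the match is probably driven by composition rather than by a meaningful ordering of residues.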

The FASTA programs find regions of local or global similarity between protein or DNA sequences, either by searching protein or DNA databases or by identifying local duplications within a sequence. Other programs provide information on the statistical significance of an alignment. Like BLAST, FASTA can be used to infer functional and evolutionary relationships between sequences, as well as to help identify members of gene families.

References

  1. Lipman, DJ; Pearson, WR (1985). "Rapid and sensitive protein similarity searches". Science. 227 (4693): 1435–41. Bibcode:1985Sci...227.1435L. doi:10.1126/science.2983426. PMID 2983426.
  2. Pearson, WR; Lipman, DJ (1988). "Improved tools for biological sequence comparison". Proceedings of the National Academy of Sciences of the United States of America. 85 (8): 2444–8. Bibcode:1988PNAS...85.2444P. doi:10.1073/pnas.85.8.2444. PMC 280013. PMID 3162770.
  3. "FASTA Programs". Archived from the original on 2000-03-04.
  4. "FASTA/SSEARCH/GGSEARCH/GLSEARCH < Sequence Similarity Searching < EMBL-EBI".
  5. Mount, David W. (2001). Bioinformatics: Sequence and Genome Analysis (1st ed.). Cold Spring Harbor Laboratory Press. pp. 295–297.
