Highway network

In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous neural networks.[1][2][3] It uses skip connections modulated by learned gating mechanisms to regulate information flow across its many layers ("information highways"),[1][2] an idea inspired by long short-term memory (LSTM) recurrent neural networks.[4][5] Its main advantage over other deep learning architectures is that it overcomes or at least mitigates the vanishing gradient problem,[6] which makes very deep networks easier to optimize.

Highway Networks have found use in text sequence labeling and speech recognition tasks.[7][8]

In 2014, state-of-the-art deep neural networks had 20 to 30 layers.[9] Stacking more layers led to a steep reduction in training accuracy,[10] known as the "degradation" problem.[11] In 2015, two techniques were developed to train much deeper networks: the Highway Network (published in May) and the residual neural network, or ResNet[12] (published in December). ResNet behaves like an open-gated Highway Net.

Model

The model has two gates in addition to the $H(x, W_H)$ gate: the transform gate $T(x, W_T)$ and the carry gate $C(x, W_C)$. The latter two gates are non-linear transfer functions (sigmoid by convention). The function $H$ can be any desired transfer function.

The carry gate is defined as:

$$C(x, W_C) = 1 - T(x, W_T)$$

while the transform gate $T(x, W_T)$ is simply a gate with a sigmoid transfer function.

Structure

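The structure of a hidden layer in the Highway Network follows the equation:

$$y = H(x, W_H) \cdot T(x, W_T) + x \cdot C(x, W_C) = H(x, W_H) \cdot T(x, W_T) + x \cdot (1 - T(x, W_T))$$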
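The layer equation y = H(x)·T(x) + x·(1 − T(x)) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of tanh for H, the weight shapes, the random initialization, and the extreme gate biases are all assumptions made for illustration. Note that the formula requires x and H(x) to have the same dimension.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_H, b_H, W_T, b_T):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x))."""
    h = np.tanh(x @ W_H + b_H)    # H: candidate transform (tanh is an assumption)
    t = sigmoid(x @ W_T + b_T)    # T: transform gate, values in (0, 1)
    return h * t + x * (1.0 - t)  # carry gate C = 1 - T

rng = np.random.default_rng(0)
d = 4                                  # input and output width must match
x = rng.normal(size=(3, d))            # batch of 3 illustrative inputs
W_H = rng.normal(size=(d, d)) * 0.1
W_T = rng.normal(size=(d, d)) * 0.1
b_H = np.zeros(d)

# Strongly negative transform bias: T ≈ 0, so the layer carries x through.
y_carry = highway_layer(x, W_H, b_H, W_T, np.full(d, -20.0))
# Strongly positive transform bias: T ≈ 1, so the layer outputs H(x).
y_transform = highway_layer(x, W_H, b_H, W_T, np.full(d, 20.0))
```

With the transform gate nearly closed the layer approximates the identity, and with it nearly open the layer behaves like a plain feedforward layer; learned gates interpolate between these two extremes per unit.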

Sepp Hochreiter analyzed the vanishing gradient problem in 1991 and identified it as the reason why deep learning did not work well.[6] To overcome this problem, Long Short-Term Memory (LSTM) recurrent neural networks[4] have residual connections with a weight of 1.0 in every LSTM cell (called the constant error carrousel) to compute $y_{t+1} = y_t + F(x_t)$. During backpropagation through time, this becomes the residual formula $y = F(x) + x$ for feedforward neural networks. This enables training very deep recurrent neural networks with a very long time span t. A later LSTM version published in 2000[5] modulates the identity LSTM connections by so-called "forget gates" such that their weights are not fixed to 1.0 but can be learned. In experiments, the forget gates were initialized with positive bias weights,[5] so that they start out open, which addresses the vanishing gradient problem. As long as the forget gates of the 2000 LSTM are open, it behaves like the 1997 LSTM.

The Highway Network of May 2015[1] applies these principles to feedforward neural networks. It was reported to be "the first very deep feedforward network with hundreds of layers".[13] It is like a 2000 LSTM with forget gates unfolded in time,[5] while the later Residual Nets have no equivalent of forget gates and are like the unfolded original 1997 LSTM.[4] If the skip connections in Highway Networks are "without gates," or if their gates are kept open (activation 1.0), they become Residual Networks.

The residual connection is a special case of the "short-cut connection" or "skip connection" by Rosenblatt (1961)[14] and Lang & Witbrock (1988)[15], which has the form $x \mapsto F(x) + Ax$. Here the randomly initialized weight matrix A does not have to be the identity mapping. Every residual connection is a skip connection, but most skip connections are not residual connections.
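The relationship between the two kinds of connection can be sketched in NumPy: a general skip connection adds a linear map A applied to the input, and the residual connection is the case where A is the identity. The layer function F, the dimension, and the random matrices below are hypothetical and chosen only for illustration.

```python
import numpy as np

def skip_connection(x, F, A):
    """General skip connection: x -> F(x) + A x."""
    return F(x) + A @ x

def residual_connection(x, F):
    """Residual connection: x -> F(x) + x (the case A = identity)."""
    return F(x) + x

rng = np.random.default_rng(1)
d = 5
W = rng.normal(size=(d, d)) * 0.1

def F(v):
    # Hypothetical layer function, used only for this illustration.
    return np.tanh(W @ v)

x = rng.normal(size=d)
A_random = rng.normal(size=(d, d))  # a skip connection with arbitrary weights
A_identity = np.eye(d)              # the residual special case

y_skip = skip_connection(x, F, A_random)
y_res_as_skip = skip_connection(x, F, A_identity)
y_res = residual_connection(x, F)
```

Here `y_res_as_skip` and `y_res` coincide, while `y_skip` generally differs from both because a randomly initialized A does not act as the identity.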

The original Highway Network paper[16] not only introduced the basic principle for very deep feedforward networks, but also included experimental results with 20-, 50-, and 100-layer networks, and mentioned ongoing experiments with up to 900 layers. Networks with 50 or 100 layers had lower training error than their plain network counterparts, but no lower training error than their 20-layer counterpart (on the MNIST dataset, Figure 1 in [16]). No improvement in test accuracy was reported with networks deeper than 19 layers (on the CIFAR-10 dataset; Table 1 in [16]). The ResNet paper,[17] however, provided strong experimental evidence of the benefits of going deeper than 20 layers. It argued that the identity mapping without modulation is crucial and mentioned that modulation in the skip connection can still lead to vanishing signals in forward and backward propagation (Section 3 in [17]). This is also why the forget gates of the 2000 LSTM[18] were initially opened through positive bias weights: as long as the gates are open, it behaves like the 1997 LSTM. Similarly, a Highway Net whose gates are opened through strongly positive bias weights behaves like a ResNet. The skip connections used in modern neural networks (e.g., Transformers) are predominantly identity mappings.
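The point about modulation can be made concrete with a deliberately simplified calculation. It assumes that every skip connection multiplies the signal by the same constant gate value sigmoid(b), and it ignores the transform path entirely; real gates are input-dependent and learned, so this is only a rough sketch of the scaling argument, not a reproduction of any reported experiment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def carry_path_scale(gate_bias, depth):
    """Factor applied to a signal that passes through `depth` skip
    connections, each scaled by the same gate value sigmoid(gate_bias)."""
    return sigmoid(gate_bias) ** depth

depth = 100
for bias in (0.0, 2.0, 10.0):
    scale = carry_path_scale(bias, depth)
    print(f"gate bias {bias:5.1f}: scale after {depth} layers = {scale:.3e}")

# An unmodulated identity skip connection leaves the signal unscaled.
identity_scale = 1.0 ** depth
```

Under these assumptions a half-open gate shrinks the carried signal to a negligible value over 100 layers, whereas a gate held open by a strongly positive bias stays close to the identity path.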

References

  1. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (2 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  2. Srivastava, Rupesh K; Greff, Klaus; Schmidhuber, Juergen (2015). "Training Very Deep Networks". Advances in Neural Information Processing Systems. 28. Curran Associates, Inc.: 2377–2385.
  3. Schmidhuber, Jürgen (2021). "The most cited neural networks all build on work done in my labs". AI Blog. IDSIA, Switzerland. Retrieved 2022-04-30.
  4. Sepp Hochreiter; Jürgen Schmidhuber (1997). "Long short-term memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
  5. Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
  6. Hochreiter, Sepp (1991). Untersuchungen zu dynamischen neuronalen Netzen (PDF) (diploma thesis). Technical University Munich, Institute of Computer Science, advisor: J. Schmidhuber.
  7. Liu, Liyuan; Shang, Jingbo; Xu, Frank F.; Ren, Xiang; Gui, Huan; Peng, Jian; Han, Jiawei (12 September 2017). "Empower Sequence Labeling with Task-Aware Neural Language Model". arXiv:1709.04109 [cs.CL].
  8. Kurata, Gakuto; Ramabhadran, Bhuvana; Saon, George; Sethy, Abhinav (19 September 2017). "Language Modeling with Highway LSTM". arXiv:1709.06436 [cs.CL].
  9. Simonyan, Karen; Zisserman, Andrew (2015-04-10), Very Deep Convolutional Networks for Large-Scale Image Recognition, arXiv:1409.1556
  10. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification". arXiv:1502.01852 [cs.CV].
  11. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (10 Dec 2015). Deep Residual Learning for Image Recognition. arXiv:1512.03385.
  12. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas, NV, USA: IEEE. pp. 770–778. arXiv:1512.03385. doi:10.1109/CVPR.2016.90. ISBN 978-1-4673-8851-1.
  13. Schmidhuber, Jürgen (2015). "Highway Networks (May 2015): First Working Really Deep Feedforward Neural Networks With Over 100 Layers".
  14. Rosenblatt, Frank (1961). Principles of neurodynamics. perceptrons and the theory of brain mechanisms (PDF).
  15. Lang, Kevin; Witbrock, Michael (1988). "Learning to tell two spirals apart" (PDF). Proceedings of the 1988 Connectionist Models Summer School: 52–59.
  16. Srivastava, Rupesh Kumar; Greff, Klaus; Schmidhuber, Jürgen (3 May 2015). "Highway Networks". arXiv:1505.00387 [cs.LG].
  17. He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian (2016). "Identity Mappings in Deep Residual Networks". arXiv:1603.05027 [cs.CV].
  18. Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). "Learning to Forget: Continual Prediction with LSTM". Neural Computation. 12 (10): 2451–2471. CiteSeerX 10.1.1.55.5709. doi:10.1162/089976600300015015. PMID 11032042. S2CID 11598600.
