HITS algorithm

Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that they held, but were used as compilations of a broad catalog of information that led users direct to other authoritative pages. In other words, a good hub represents a page that pointed to many other pages, while a good authority represents a page that is linked by many different hubs.[1]

The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages.

History

In journals

Many methods have been used to rank the importance of scientific journals. One such method is Garfield's impact factor. Journals such as Science and Nature are filled with numerous citations, making these magazines have very high impact factors. Thus, when comparing two more obscure journals which have received roughly the same number of citations but one of these journals has received many citations from Science and Nature, this journal needs be ranked higher. In other words, it is better to receive citations from an important journal than from an unimportant one.[2]

On the Web

This phenomenon also occurs in the Internet. Counting the number of links to a page can give us a general estimate of its prominence on the Web, but a page with very few incoming links may also be prominent, if two of these links come from the home pages of sites like Yahoo!, Google, or MSN. Because these sites are of very high importance but are also search engines, a page can be ranked much higher than its actual relevance.

Algorithm

Expanding the root set into a base set

Steps

In the HITS algorithm, the first step is to retrieve the most relevant pages to the search query. This set is called the root set and can be obtained by taking the top pages returned by a text-based search algorithm. A base set is generated by augmenting the root set with all the web pages that are linked from it and some of the pages that link to it. The web pages in the base set and all hyperlinks among those pages form a focused subgraph. The HITS computation is performed only on this focused subgraph. According to Kleinberg the reason for constructing a base set is to ensure that most (or many) of the strongest authorities are included.

Authority and hub values are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values that point to that page. A hub value is the sum of the scaled authority values of the pages it points to. Some implementations also consider the relevance of the linked pages.

The algorithm performs a series of iterations, each consisting of two basic steps:

  • Authority update: Update each node's authority score to be equal to the sum of the hub scores of each node that points to it. That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information.
  • Hub update: Update each node's hub score to be equal to the sum of the authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject.

The Hub score and Authority score for a node is calculated with the following algorithm:

  • Start with each node having a hub score and authority score of 1.
  • Run the authority update rule
  • Run the hub update rule
  • Normalize the values by dividing each Hub score by square root of the sum of the squares of all Hub scores, and dividing each Authority score by square root of the sum of the squares of all Authority scores.
  • Repeat from the second step as necessary.

Comparison to PageRank

HITS, like Page and Brin's PageRank, is an iterative algorithm based on the linkage of the documents on the web. However it does have some major differences:

  • It is processed on a small subset of ‘relevant’ documents (a 'focused subgraph' or base set), instead of the set of all documents as was the case with PageRank.
  • It is query-dependent: the same page can receive a different hub/authority score given a different base set, which appears for a different query;
  • It must, as a corollary, be executed at query time, not at indexing time, with the associated drop in performance that accompanies query-time processing.
  • It computes two scores per document (hub and authority) as opposed to a single score;
  • It is not commonly used by search engines (though a similar algorithm was said to be used by Teoma, which was acquired by Ask Jeeves/Ask.com).

In detail

To begin the ranking, we let and for each page . We consider two types of updates: Authority Update Rule and Hub Update Rule. In order to calculate the hub/authority scores of each node, repeated iterations of the Authority Update Rule and the Hub Update Rule are applied. A k-step application of the Hub-Authority algorithm entails applying for k times first the Authority Update Rule and then the Hub Update Rule.

Authority update rule

For each , we update to where is all pages which link to page . That is, a page's authority score is the sum of all the hub scores of pages that point to it.

Hub update rule

For each , we update to where is all pages which page links to. That is, a page's hub score is the sum of all the authority scores of pages it points to.

Normalization

The final hub-authority scores of nodes are determined after infinite repetitions of the algorithm. As directly and iteratively applying the Hub Update Rule and Authority Update Rule leads to diverging values, it is necessary to normalize the matrix after every iteration. Thus the values obtained from this process will eventually converge.

Pseudocode

G := set of pages
for each page p in G do
    p.auth = 1 // p.auth is the authority score of the page p
    p.hub = 1 // p.hub is the hub score of the page p
for step from 1 to k do // run the algorithm for k steps
    norm = 0
    for each page p in G do  // update all authority values first
        p.auth = 0
        for each page q in p.incomingNeighbors do // p.incomingNeighbors is the set of pages that link to p
            p.auth += q.hub
        norm += square(p.auth) // calculate the sum of the squared auth values to normalise
    norm = sqrt(norm)
    for each page p in G do  // update the auth scores 
        p.auth = p.auth / norm  // normalise the auth values
    norm = 0
    for each page p in G do  // then update all hub values
        p.hub = 0
        for each page r in p.outgoingNeighbors do // p.outgoingNeighbors is the set of pages that p links to
            p.hub += r.auth
        norm += square(p.hub) // calculate the sum of the squared hub values to normalise
    norm = sqrt(norm)
    for each page p in G do  // then update all hub values
        p.hub = p.hub / norm   // normalise the hub values

The hub and authority values converge in the pseudocode above.

The code below does not converge, because it is necessary to limit the number of steps that the algorithm runs for. One way to get around this, however, would be to normalize the hub and authority values after each "step" by dividing each authority value by the square root of the sum of the squares of all authority values, and dividing each hub value by the square root of the sum of the squares of all hub values. This is what the pseudocode above does.

Non-converging pseudocode

G := set of pages
for each page p in G do
    p.auth = 1 // p.auth is the authority score of the page p
    p.hub = 1 // p.hub is the hub score of the page p

function HubsAndAuthorities(G)
    for step from 1 to k do // run the algorithm for k steps
        for each page p in G do  // update all authority values first
            p.auth = 0
            for each page q in p.incomingNeighbors do // p.incomingNeighbors is the set of pages that link to p
                p.auth += q.hub
        for each page p in G do  // then update all hub values
            p.hub = 0
            for each page r in p.outgoingNeighbors do // p.outgoingNeighbors is the set of pages that p links to
                p.hub += r.auth

See also

References

  1. ^ Christopher D. Manning; Prabhakar Raghavan; Hinrich Schütze (2008). "Introduction to Information Retrieval". Cambridge University Press. Retrieved 2008-11-09.
  2. ^ Kleinberg, Jon (December 1999). "Hubs, Authorities, and Communities". Cornell University. Retrieved 2008-11-09.

Read other articles:

Pour les articles homonymes, voir Heavy metal (homonymie) et Metal (homonymie). Heavy metalLe guitariste Janick Gers et le bassiste Steve Harris du groupe Iron Maiden.DétailsOrigines stylistiques Hard rock, rock 'n' roll, blues rock, rock psychédélique, garage rock[1]Origines culturelles Fin des années 1960 ; Royaume-Uni et États-UnisInstruments typiques Guitare électrique, basse, batterie, chant, parfois : claviersPopularité Mondiale, surtout dans les années 1980Scènes r�...

 

 

العلاقات البحرينية الجزائرية البحرين الجزائر   البحرين   الجزائر تعديل مصدري - تعديل   العلاقات البحرينية الجزائرية هي العلاقات الثنائية التي تجمع بين البحرين والجزائر.[1][2][3][4][5] مقارنة بين البلدين هذه مقارنة عامة ومرجعية للدولتين: وجه ال...

 

 

Membabi ButaIndeks kartuSutradaraJoel FadlyProtagonisPrisia NasutionIvanka SuwandiLenny CharlotteProduksi seni pertunjukanDevi MonicaPenampilan perdana4 Mei 2017Bahasa asli (film atau acara televisi)Indonesia Lokasi pemfilmanIndonesia DeskripsiGenrefilm horor Latar tempatIndonesia lbs Membabi Buta (digayakan sebagai Membabi-Buta) merupakan film hantu-seru Indonesia yang dirilis pada 4 Mei 2017 dan disutradarai oleh Joel Fadly. Film ini dibintangi oleh Prisia Nasution, Ivanka Suwandi, dan Len...

† Человек прямоходящий Научная классификация Домен:ЭукариотыЦарство:ЖивотныеПодцарство:ЭуметазоиБез ранга:Двусторонне-симметричныеБез ранга:ВторичноротыеТип:ХордовыеПодтип:ПозвоночныеИнфратип:ЧелюстноротыеНадкласс:ЧетвероногиеКлада:АмниотыКлада:Синапсиды�...

 

 

Major watercourse in northwestern North America Yukon RiverCanoeing the Yukon RiverLocation of the Yukon River and watershedNative nameŲųg Han (Gwichʼin)Yuk Han (Gwichʼin)Kuigpak (Central Yupik)Kuukpak (Inupiaq)Yeqin (Degexit'an)Tth'echù' (Hän)Chuu k'onn (Hän)Chu Nìikwän (Southern Tutchone)LocationCountriesUnited StatesCanadaStateAlaskaProvince/TerritoryBritish ColumbiaYukonPhysical characteristicsSourceLlewellyn Glacier at Atlin Lake ...

 

 

School Success and Opportunity ActCalifornia State LegislatureFull nameSchool Success and Opportunity ActIntroducedFebruary 22, 2013Assembly votedJuly 3, 2013Senate votedJuly 3, 2013Signed into lawAugust 12, 2013Sponsor(s)Sen. Mark Leno, Assem. Tom AmmianoGovernorJerry BrownCodeEducation CodeSectionSection 221.5ResolutionAB 1266Websitehttp://www.leginfo.ca.gov/pub/13-14/bill/asm/ab_1251-1300/ab_1266_cfa_20130501_170107_asm_floor.htmlStatus: Current legislation The School Success and Opportuni...

Lemony Snicket - Una serie di sfortunati eventiIl conte Olaf (Jim Carrey) in una scena del filmTitolo originaleLemony Snicket's A Series of Unfortunate Events Lingua originaleinglese Paese di produzioneStati Uniti d'America Anno2004 Durata108 min Rapporto1,85:1 Generecommedia, fantastico, avventura RegiaBrad Silberling SoggettoLemony Snicket (romanzi) SceneggiaturaRobert Gordon ProduttoreJim Van Wyck, Walter F. Parkes, Laurie MacDonald Produttore esecutivoJulia Pistor, Scott Rudin, Ba...

 

 

Star in the constellation Cygnus V1500 Cygni Nova Cygni 1975 (center), photographed at 07:00 UT August 30, 1975. Also shown are 59 Cygni (magnitude 4.8), 60 Cygni (magnitude 5.4) and 63 Cygni (magnitude 4.5) Observation dataEpoch J2000.0      Equinox J2000.0 Constellation Cygnus Right ascension 21h 11m 36.5810s[1] Declination +48° 09′ 01.952″[1] Apparent magnitude (V) 1.69 to <21[2] Characteristics V...

 

 

The reconstruction of armed forces in West Germany after World War II West Germany joins NATO: Walter Hallstein (left) and Konrad Adenauer (centre) at the NATO Conference in Paris in 1954 West German rearmament (German: Wiederbewaffnung) began in the decades after World War II. Fears of another rise of German militarism caused the new military to operate within an alliance framework, under NATO command.[1] The events led to the establishment of the Bundeswehr, the West German military...

Mirogojkyrkogården Entrén till Mirogojkyrkogården år 2007.Plats Zagreb, KroatienKoordinater45°50′08″N 15°59′04″Ö / 45.835694444444°N 15.984444444444°Ö / 45.835694444444; 15.984444444444 Invigd6 november 1876[1]ÄgareZagrebs stad Mirogojkyrkogården (kroatiska: Mirogoj) är den centrala begravningsplatsen i Kroatiens huvudstad Zagreb.[2] På begravningsplatsen som ligger i stadsdelen Gornji grad-Medveščak, cirka 4 km från Ban Jelačićs torg och Zag...

 

 

American actor (born 1967) For the British composer, musician, and music critic, see Michael Easton (composer). Michael EastonBornMichael Easton (1967-02-15) February 15, 1967 (age 57)Inglewood, California, USAlma materUniversity of California, Los AngelesOccupationActorYears active1990–presentSpouse Ginevra Arabia ​(m. 2004)​Children2Websitehttps://www.michaeleaston.com/ Michael Easton (born February 15, 1967) is an American film, television and ...

 

 

  لمعانٍ أخرى، طالع محمد الحسن (توضيح). محمد الحسن معلومات شخصية الميلاد 9 يناير 1984 (العمر 40 سنة)غانا  مركز اللعب حارس مرمى الجنسية غانا  معلومات النادي النادي الحالي New York Clarkstown SC Eagles الرقم 1 مسيرة الشباب سنوات فريق –2003 أشانتي كوتوكو المسيرة الاحترافية1 سنوات فريق م...

Merna KennedyKennedy 1933LahirMaude Kahler(1908-09-07)7 September 1908Kankakee, Illinois, Amerika SerikatMeninggal20 Desember 1944(1944-12-20) (umur 36)Los Angeles, California, Amerika SerikatPekerjaanAktrisTahun aktif1928–1934Suami/istriBusby Berkeley ​ ​(m. 1934; bercerai 1935)​Forrest Brayton ​(m. 1944)​ Merna Kennedy (nama lahir Maude Kahler;[1] 7 September 1908 – 20 Desember ...

 

 

Sihanoukville Administration Pays Cambodge Province Province de Sihanoukville Gouverneur Eav Chanvatanak Démographie Population 156 691 hab. (2021[1]) Densité 1 959 hab./km2 Géographie Coordonnées 10° 38′ nord, 103° 30′ est Altitude 15 m Superficie 8 000 ha = 80 km2 Localisation L'aire urbaine de la ville (jaune) est comprise dans le district de Mittakpheap (rouge), l'une des trois subdivisions de la province de ...

 

 

Pour les articles homonymes, voir LGG. Aéroport de LiègeLiege Airport (en)Aéroport de Liège-Bierset (fr) Terminal passager de l'aéroport de Bierset. Localisation Pays Belgique Région  Région wallonne Province  Province de Liège Ville Grâce-Hollogne Date d'ouverture 13 février 1920 (base aèrienne)1957 (terminal Sabena) Coordonnées 50° 38′ 11″ nord, 5° 26′ 34″ est Superficie 400 ha Altitude 201 m (659 ft) Informations aér...

Layout plan of the Gem-pa-Aten, constructed by Amenhotep IV The Temple of Amenhotep IV was an ancient monument at Karnak in Luxor, Egypt. The structures were used during the New Kingdom, in the first four years of the 18th Dynasty reign of the Egyptian Pharaoh Akhenaten, when he still used the name Amenhotep IV. The edifices may have been constructed at the end of the reign of his father, Amenhotep III, and completed by Akhenaten.[1] Location and layout The temple was constructed outs...

 

 

This article is about the now-closed tramways of Tianjin. For the current network, see TEDA Modern Guided Rail Tram. Tianjin, a major port and industrial center in China. Tianjin once had a standard steel-wheeled tramway network. But the original tram service was completely stopped in 1973. In 2006, tram service returned to Tianjin in the form of the TEDA Modern Guided Rail Tram. History The first tram of Tianjin in 1906 Postcard of Tianjin trams Tianjin was the first city to have its own cit...

 

 

American laborer and assassin (1873–1901) Leon CzolgoszCzolgosz in 1899BornLeon F. CzolgoszMay 5, 1873[1]Detroit, Michigan, U.S.DiedOctober 29, 1901(1901-10-29) (aged 28)Auburn Prison, Auburn, New York, U.S.Cause of death Execution by electric chairOccupationLaborerKnown forAssassination of William McKinleyConviction(s)First degree murderCriminal penaltyDeath Signature Leon F. Czolgosz (/ˈtʃɒlɡɒʃ/ CHOL-gosh,[2] Polish: [ˈlɛɔn ˈt͡ʂɔwɡɔʂ]...

Military term Part of a series onWar(outline) History Prehistoric Ancient Post-classical castles Early modern pike and shot napoleonic Late modern industrial fourth-gen Military Organization Command and control Defense ministry Army Navy Air force Marines Coast guard Space force Reserves Regular / Irregular Ranks Standing army / Militia Specialties: Rifleman Staff Engineers Intel Recon Medical Police Diving Comms Pilot Commissar Land units: Infantry Armor Cavalry Artillery Special forces Sign...

 

 

John Wesley TurnerBorn(1833-07-19)July 19, 1833Saratoga, New York, U.S.DiedApril 8, 1899(1899-04-08) (aged 65)St. Louis, Missouri, U.S.Place of burialCalvary Cemetery and Mausoleum, St. Louis, MissouriAllegianceUnited States of AmericaUnionService/branchUnited States ArmyUnion ArmyYears of service1851–1871Rank Brigadier General Brevet Major General, U.S.V.CommandsXXIV CorpsBattles/wars Third Seminole War American Civil War Battle of Fort Pulaski Second Battle of Fort Wagner S...