HITS algorithm

Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that they held, but were used as compilations of a broad catalog of information that led users direct to other authoritative pages. In other words, a good hub represents a page that pointed to many other pages, while a good authority represents a page that is linked by many different hubs.[1]

The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages.

History

In journals

Many methods have been used to rank the importance of scientific journals. One such method is Garfield's impact factor. Journals such as Science and Nature are filled with numerous citations, making these magazines have very high impact factors. Thus, when comparing two more obscure journals which have received roughly the same number of citations but one of these journals has received many citations from Science and Nature, this journal needs be ranked higher. In other words, it is better to receive citations from an important journal than from an unimportant one.[2]

On the Web

This phenomenon also occurs in the Internet. Counting the number of links to a page can give us a general estimate of its prominence on the Web, but a page with very few incoming links may also be prominent, if two of these links come from the home pages of sites like Yahoo!, Google, or MSN. Because these sites are of very high importance but are also search engines, a page can be ranked much higher than its actual relevance.

Algorithm

Expanding the root set into a base set

Steps

In the HITS algorithm, the first step is to retrieve the most relevant pages to the search query. This set is called the root set and can be obtained by taking the top pages returned by a text-based search algorithm. A base set is generated by augmenting the root set with all the web pages that are linked from it and some of the pages that link to it. The web pages in the base set and all hyperlinks among those pages form a focused subgraph. The HITS computation is performed only on this focused subgraph. According to Kleinberg the reason for constructing a base set is to ensure that most (or many) of the strongest authorities are included.

Authority and hub values are defined in terms of one another in a mutual recursion. An authority value is computed as the sum of the scaled hub values that point to that page. A hub value is the sum of the scaled authority values of the pages it points to. Some implementations also consider the relevance of the linked pages.

The algorithm performs a series of iterations, each consisting of two basic steps:

  • Authority update: Update each node's authority score to be equal to the sum of the hub scores of each node that points to it. That is, a node is given a high authority score by being linked from pages that are recognized as Hubs for information.
  • Hub update: Update each node's hub score to be equal to the sum of the authority scores of each node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject.

The Hub score and Authority score for a node is calculated with the following algorithm:

  • Start with each node having a hub score and authority score of 1.
  • Run the authority update rule
  • Run the hub update rule
  • Normalize the values by dividing each Hub score by square root of the sum of the squares of all Hub scores, and dividing each Authority score by square root of the sum of the squares of all Authority scores.
  • Repeat from the second step as necessary.

Comparison to PageRank

HITS, like Page and Brin's PageRank, is an iterative algorithm based on the linkage of the documents on the web. However it does have some major differences:

  • It is processed on a small subset of ‘relevant’ documents (a 'focused subgraph' or base set), instead of the set of all documents as was the case with PageRank.
  • It is query-dependent: the same page can receive a different hub/authority score given a different base set, which appears for a different query;
  • It must, as a corollary, be executed at query time, not at indexing time, with the associated drop in performance that accompanies query-time processing.
  • It computes two scores per document (hub and authority) as opposed to a single score;
  • It is not commonly used by search engines (though a similar algorithm was said to be used by Teoma, which was acquired by Ask Jeeves/Ask.com).

In detail

To begin the ranking, we let and for each page . We consider two types of updates: Authority Update Rule and Hub Update Rule. In order to calculate the hub/authority scores of each node, repeated iterations of the Authority Update Rule and the Hub Update Rule are applied. A k-step application of the Hub-Authority algorithm entails applying for k times first the Authority Update Rule and then the Hub Update Rule.

Authority update rule

For each , we update to where is all pages which link to page . That is, a page's authority score is the sum of all the hub scores of pages that point to it.

Hub update rule

For each , we update to where is all pages which page links to. That is, a page's hub score is the sum of all the authority scores of pages it points to.

Normalization

The final hub-authority scores of nodes are determined after infinite repetitions of the algorithm. As directly and iteratively applying the Hub Update Rule and Authority Update Rule leads to diverging values, it is necessary to normalize the matrix after every iteration. Thus the values obtained from this process will eventually converge.

Pseudocode

G := set of pages
for each page p in G do
    p.auth = 1 // p.auth is the authority score of the page p
    p.hub = 1 // p.hub is the hub score of the page p
for step from 1 to k do // run the algorithm for k steps
    norm = 0
    for each page p in G do  // update all authority values first
        p.auth = 0
        for each page q in p.incomingNeighbors do // p.incomingNeighbors is the set of pages that link to p
            p.auth += q.hub
        norm += square(p.auth) // calculate the sum of the squared auth values to normalise
    norm = sqrt(norm)
    for each page p in G do  // update the auth scores 
        p.auth = p.auth / norm  // normalise the auth values
    norm = 0
    for each page p in G do  // then update all hub values
        p.hub = 0
        for each page r in p.outgoingNeighbors do // p.outgoingNeighbors is the set of pages that p links to
            p.hub += r.auth
        norm += square(p.hub) // calculate the sum of the squared hub values to normalise
    norm = sqrt(norm)
    for each page p in G do  // then update all hub values
        p.hub = p.hub / norm   // normalise the hub values

The hub and authority values converge in the pseudocode above.

The code below does not converge, because it is necessary to limit the number of steps that the algorithm runs for. One way to get around this, however, would be to normalize the hub and authority values after each "step" by dividing each authority value by the square root of the sum of the squares of all authority values, and dividing each hub value by the square root of the sum of the squares of all hub values. This is what the pseudocode above does.

Non-converging pseudocode

G := set of pages
for each page p in G do
    p.auth = 1 // p.auth is the authority score of the page p
    p.hub = 1 // p.hub is the hub score of the page p

function HubsAndAuthorities(G)
    for step from 1 to k do // run the algorithm for k steps
        for each page p in G do  // update all authority values first
            p.auth = 0
            for each page q in p.incomingNeighbors do // p.incomingNeighbors is the set of pages that link to p
                p.auth += q.hub
        for each page p in G do  // then update all hub values
            p.hub = 0
            for each page r in p.outgoingNeighbors do // p.outgoingNeighbors is the set of pages that p links to
                p.hub += r.auth

See also

References

  1. ^ Christopher D. Manning; Prabhakar Raghavan; Hinrich Schütze (2008). "Introduction to Information Retrieval". Cambridge University Press. Retrieved 2008-11-09.
  2. ^ Kleinberg, Jon (December 1999). "Hubs, Authorities, and Communities". Cornell University. Retrieved 2008-11-09.

Read other articles:

1968 filmRoma come ChicagoFrench film posterDirected byAlberto De MartinoScreenplay by Giacinto Ciaccio Massimo D'Avack Carlo Romano Dino Maiuri Massimo De Rita Alberto De Martino Fabio Carpi Lianella Carell Piero Tellini[1] Story by Giacinto Ciaccio Massimo D'Avack Carlo Romano[1] Produced byDino de Laurentiis[1]Starring John Cassavetes Gabriele Ferzetti Anita Sanders Nikos Kourkoulos CinematographyAldo Tonti[1]Edited byOtello Colangeli[1]Music by Enni...

 

Official or legally recognized title for a person or entity This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) This article appears to contradict the article Royal and noble styles. Please discuss at the talk page and do not remove this message until the contradictions are resolved. (June 2013) This article needs additional citations for verification. Please help improve this article by addi...

 

الدين في الأردن دين الدولة الإسلام الأديان في الأردن الإسلام المسيحية مؤسسات دينية دائرة قاضي القضاة دائرة الإفتاء بطريركية القدس وشرق الأردن منظمات دينية رابطة علماء الشام الإخوان المسلمون ديانات قديمة وثنية عربية - وثنية رومانية ديانات غير معترف بها بهائية - ديانات اخ�...

DC Universe supervillain who is primarily an enemy of Green Lantern Comics character Hector HammondHector HammondInterior artwork from Who's Who: The Definitive Directory of the DC Universe 10 (December 1985 DC Comics) Art by Gil KanePublication informationPublisherDC ComicsFirst appearanceGreen Lantern (vol. 2) #5 (March–April 1961)Created byJohn BroomeGil KaneIn-story informationAlter egoHector HammondSpeciesMetahumanTeam affiliationsThe SocietyRoyal Flush GangOrange Lantern Cor...

 

Veikkausliiga 2020 Competizione Veikkausliiga Sport Calcio Edizione 111ª Organizzatore SPL/FBF Date dal 1º luglio 2020al 4 novembre 2020 Luogo  Finlandia Partecipanti 12 Formula due fasi Sito web Veikkausliiga Risultati Vincitore HJK(30º titolo) Retrocessioni TPSRoPS Statistiche Miglior marcatore Roope Riski (16) Incontri disputati 132 Gol segnati 372 (2,82 per incontro) Cronologia della competizione 2019 2021 Manuale HIFKHJK Inter TurkuTPS Haka SJK KuPS Lahti IFK M...

 

† Человек прямоходящий Научная классификация Домен:ЭукариотыЦарство:ЖивотныеПодцарство:ЭуметазоиБез ранга:Двусторонне-симметричныеБез ранга:ВторичноротыеТип:ХордовыеПодтип:ПозвоночныеИнфратип:ЧелюстноротыеНадкласс:ЧетвероногиеКлада:АмниотыКлада:Синапсиды�...

هذه المقالة تحتاج للمزيد من الوصلات للمقالات الأخرى للمساعدة في ترابط مقالات الموسوعة. فضلًا ساعد في تحسين هذه المقالة بإضافة وصلات إلى المقالات المتعلقة بها الموجودة في النص الحالي. (يونيو 2016) صمويل فوكس معلومات شخصية الميلاد 17 يونيو 1815(1815-06-17) تاريخ الوفاة 25 فبراير 1887 (71 س�...

 

Jan SixPortrait de Jan Six par Rembrandt (1654).FonctionBourgmestre d'Amsterdam1691BiographieNaissance 14 janvier 1618AmsterdamDécès 28 mai 1700 (à 82 ans)AmsterdamDomicile Herengracht 619, Amsterdam (d)Formation Université de LeydeActivités Écrivain, homme politique, collectionneur d'œuvres d'artFamille Famille Six (en)Père Jean Six (d)Mère Anna Wijmer (d)Conjoint Margaretha Tulp (d)Enfant Jan Six (d)Parentèle Nicolaes TulpAutres informationsPropriétaire de Elsbroek (d)Membre...

 

It has been suggested that Owensmouth be merged into this article. (Discuss) Proposed since April 2024. Neighborhood of Los AngelesCanoga Park, Los AngelesNeighborhood of Los AngelesThe Platt Building, also known as The Victorian — a c. 1970s office building at 19725 Sherman Way, Canoga Park−WinnetkaCanoga Park, Los AngelesLocation within Los Angeles/San Fernando ValleyShow map of San Fernando ValleyCanoga Park, Los AngelesCanoga Park, Los Angeles (the Los Angeles metropolitan area)...

1975 San Francisco mayoral election ← 1971 November 4, 1975December 9, 1975 1979 →   Candidate George Moscone John J. Barbagelata Dianne Feinstein Party Democratic Republican Democratic First-round vote 66,195 40,540 39,344 First-round percentage 31.52% 19.31% 18.74% Second-round vote 101,528 97,213 Second-round percentage 51.09% 48.91%   Candidate John A. Ertola Milton Marks Party Nonpartisan Republican First-round vote 30,360 27,910 First-round percentage 14...

 

Benedict CumberbatchCBECumberbatch di San Diego Comic-Con 2019LahirBenedict Timothy Carlton Cumberbatch19 Juli 1976 (umur 47)Hammersmith, London, InggrisKebangsaanBritaniaAlmamater Universitas Manchester Akademi Musik dan Seni Drama London PekerjaanAktorTahun aktif1998–sekarangKaryaDaftar lengkapSuami/istriSophie Hunter ​(m. invalid year)​Anak3Orang tuaTimothy Carlton (bapak)Wanda Ventham (ibu)PenghargaanDaftar lengkap Cumberbatch's voice dari program...

 

See also: Neighbourhoods in North Bay, Ontario This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: North Bay, Ontario – news · newspapers · books · scholar · JSTOR (July 2021) (Learn how and when to remove this message) City in Ontario, CanadaNorth BayCity (single-tier)City of North BayMain Street FlagNicknames:...

2000 studio album by Meat PuppetsGolden LiesStudio album by Meat PuppetsReleasedSeptember 26, 2000Genre Hard rock[1] Length58:36LabelBreaking Records/Atlantic Records[2]ProducerCurt Kirkwood,[3] John Plymale[citation needed]Meat Puppets chronology You Love Me (EP)(1999) Golden Lies(2000) Live(2002) Professional ratingsAggregate scoresSourceRatingMetacritic60/100[4]Review scoresSourceRatingAllMusic[5]The Encyclopedia of Popular Music[...

 

Biografi ini memerlukan lebih banyak catatan kaki untuk pemastian. Bantulah untuk menambahkan referensi atau sumber tepercaya. Materi kontroversial atau trivial yang sumbernya tidak memadai atau tidak bisa dipercaya harus segera dihapus, khususnya jika berpotensi memfitnah.Cari sumber: Asy-Syaukani – berita · surat kabar · buku · cendekiawan · JSTOR (Pelajari cara dan kapan saatnya untuk menghapus pesan templat ini) MuhammadImam SyaukaniSampul depan bu...

 

Italian footballer Marco Tardelli Tardelli in 2016Personal informationFull name Marco TardelliDate of birth (1954-09-24) 24 September 1954 (age 69)Place of birth Capanne di Careggine, ItalyHeight 1.78 m (5 ft 10 in)Position(s) Midfielder, defenderSenior career*Years Team Apps (Gls)1972–1974 Pisa 41 (4)1974–1975 Como 36 (2)1975–1985 Juventus 259 (35)1985–1987 Internazionale 43 (2)1987–1988 St. Gallen 14 (0)Total 393 (43)International career1976–1986 Italy 81 (6)...

У этого топонима есть и другие значения, см. Марсас. КоммунаМарсасMarsas Герб 43°03′10″ с. ш. 0°13′37″ в. д.HGЯO Страна  Франция Регион Юг — Пиренеи Департамент Верхние Пиренеи Кантон Баньер-де-Бигор Мэр Филипп Вьо(2014—2020) История и география Площадь 2,77 км² Высота цент...

 

تفتقر سيرة هذه الشخصية الحيّة إلى الاستشهاد بمصدر موثوق به يمكن التحقق منه. فضلاً، ساهم في تطويرها من خلال إضافة مصادر موثوق بها. في سير الأحياء، يُزال المحتوى فوراً إذا كان من غير مصدر يدعمه أو إذا كان المصدر المُستشهد به مشكوكاً بأمره. (مارس 2016) مايكل سبنس (بالإنجليزية: Michae...

 

Town in Hesse, GermanyErlensee Town Coat of armsLocation of Erlensee within Main-Kinzig-Kreis district Erlensee Show map of GermanyErlensee Show map of HesseCoordinates: 50°8′N 08°56′E / 50.133°N 8.933°E / 50.133; 8.933CountryGermanyStateHesseAdmin. regionDarmstadt DistrictMain-Kinzig-Kreis Subdivisions2 districtsGovernment • Mayor (2019–25) Stefan Erb[1] (SPD)Area • Total18.59 km2 (7.18 sq mi)Elevation110...

ЯрмаркаРор-бай-Хартбергнем. Rohr bei Hartberg 47°14′15″ с. ш. 16°03′09″ в. д.HGЯO Страна  Австрия Федеральная земля Штирия Округ Хартберг-Фюрстенфельд Бургомистр Юрген Пайндль(АНП) История и география Площадь 16,79 км²27,7 км² (1 января 2018)[1] Высота центра 356 м Ча�...

 

Disambiguazione – Se stai cercando il diritto di voto in Italia, vedi Diritto di voto (Italia). Disambiguazione – Se stai cercando il racconto di Isaac Asimov, vedi Diritto di voto (racconto). Questa pagina sull'argomento diritto sembra trattare argomenti unificabili alla pagina Suffragio. Puoi contribuire unendo i contenuti in una pagina unica. Segui i suggerimenti del progetto di riferimento. Questa voce sull'argomento diritto è solo un abbozzo. Contribuisci a migliorarla se...