Hirschberg was elected a member of the National Academy of Engineering in 2017 for contributions to the use of prosody in text-to-speech and spoken dialogue systems, and to audio browsing and retrieval.
Julia Linn Bell Hirschberg received her first Ph.D., in History (16th-century Mexico), from the University of Michigan in 1976. She served on the History faculty of Smith College from 1974 to 1982. She subsequently shifted to Computer Science, receiving an M.S. in Computer and Information Science from the University of Pennsylvania in 1982 and a Ph.D. in the same field from the University of Pennsylvania in 1985.
Upon graduation from the University of Pennsylvania in 1985, Hirschberg joined AT&T Bell Labs as a Member of Technical Staff in the Linguistics Research Department, where she worked on improving prosody assignment for Text-to-Speech Synthesis (TTS) in the Bell Labs TTS system. She was promoted to Department Head in 1994, when she created a new Human Computer Interface Research Lab. She and her department remained at Bell Labs until 1996, when they moved to AT&T Labs Research as part of a corporate reorganization. In 2002, she joined the Columbia University faculty as a Professor in the Department of Computer Science. She served as Chair of the Computer Science Department from 2012 to 2018.
Research
Hirschberg's research has included prosody, discourse structure, spoken dialogue systems, speech search, and, more recently, the analysis of deceptive speech.[2] Hirschberg was among the first to combine Natural Language Processing (NLP) approaches to discourse and dialogue with speech research. She pioneered techniques in text analysis for prosody assignment in Text-to-Speech synthesis at Bell Laboratories in the 1980s and 1990s, developing corpus-based statistical models based upon syntactic and discourse information that are in general use today in TTS systems.[3][4] With Janet Pierrehumbert, she developed a theoretical model of intonational meaning.[5] She was a leader in the development of the ToBI conventions for intonational description, which have been extended to numerous languages and are today the most widely used standard for intonational annotation.[6]
Together with Gregory Ward, Hirschberg has pioneered experimental work on intonational sources of language meaning and how these interact with pragmatic phenomena, particularly on the meaning of accented (intonationally prominent) items and the meaning of intonational contours.[7][8] She has also innovated in numerous other areas involving prosody and meaning, including the role of grammatical function and surface position in pitch accent location,[9] the use of prosody in disambiguating cue phrases (discourse markers) with Diane Litman,[10] the role of prosody in disambiguation in English, Italian, and Spanish with Cinzia Avesani and Pilar Prieto,[11] and the automatic identification of speech recognition errors using prosodic information.[12] At AT&T Labs she worked with Fernando Pereira, Steve Whittaker, and others on speech search[13] and on developing new interfaces for speech navigation.[14] At Columbia, she and her students have continued and extended research on spoken dialogue systems (automatically detecting speech recognition errors[15] and inappropriate system queries,[16] modeling turn-taking behavior,[17] dialogue entrainment,[18] and modeling and generating clarification dialogues[19]); on the automatic classification of trust, charisma,[20] deception,[21] and emotion[22] from speech; on speech summarization;[23] and on prosody translation, hedging behavior in text and speech,[24] text-to-speech synthesis, and speech search in low-resource languages.[25] She also holds several patents in TTS and in speech search. Corpora that she and her collaborators have collected include the Boston Directions Corpus, the Columbia SRI Colorado Deception Corpus, and the Columbia Games Corpus.
She has served on numerous technical boards and editorial committees, and is now on the Computing Research Association's (CRA) Board of Directors. She is also noted for her leadership in broadening participation in computing: she has been a member of the CRA Committee on the Status of Women in Computing Research (CRA-W) since 2010 and serves as its co-chair.[26]
^Pierrehumbert & Hirschberg (1990). "The Meaning of Intonational Contours in the Interpretation of Discourse". Intentions and Plans in Communication and Discourse: 271–311. doi:10.7551/mitpress/3839.003.0016. ISBN 978-0-262-27054-0.
^Beckman, M. E.; Hirschberg, J. & Shattuck-Hufnagel, S. (2004). "The original ToBI system and the evolution of the ToBI framework". Prosodic Typology: The Phonology of Intonation and Phrasing: 9–54.
^J. Terken; J. Hirschberg (1994). "Deaccentuation and Persistence of Grammatical Function and Surface Position". Language and Speech. 37 (2): 125–145. doi:10.1177/002383099403700202. S2CID 145696152.
^J. Hirschberg; D. Litman (1993). "Empirical Studies on the Disambiguation of Cue Phrases". Computational Linguistics.
^J. Hirschberg; C. Avesani (2000). "Prosodic Disambiguation in English and Italian". Intonation. Text, Speech and Language Technology. Vol. 15. pp. 87–95. doi:10.1007/978-94-011-4317-2_4. ISBN 978-0-7923-6723-9.
^Julia Hirschberg; Diane Litman; Marc Swerts (2004). "Prosodic and Other Cues to Speech Recognition Failures". Speech Communication. 43 (1–2): 155–175. doi:10.1016/j.specom.2004.01.006.
^J. Choi; D. Hindle; J. Hirschberg; F. Pereira; A. Singhal; S. Whittaker (1999). "Spoken Content-Based Audio Navigation (SCAN)". ICPhS-99.
^S. Whittaker; J. Hirschberg; J. Choi; D. Hindle; F. Pereira; A. Singhal (1999). "SCAN: Designing and evaluating user interfaces to support retrieval from speech archives". Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval. pp. 26–33. doi:10.1145/312624.312639. ISBN 978-1581130966. S2CID 15089338.
^E. Pincus; S. Stoyanchev; J. Hirschberg (2013). "Exploring Features for Localized Detection of Speech Recognition Errors". SigDIAL.
^Alex Liu; Rose Sloan; Mei-Vern Then; Svetlana Stoyanchev; Julia Hirschberg; Elizabeth Shriberg (2014). "Detecting Inappropriate Clarification Requests in Spoken Dialogue Systems". SigDIAL.
^Agustín Gravano; Julia Hirschberg (2011). "Turn-taking cues in task-oriented dialogue". Computer Speech and Language. 25 (3): 601–634. doi:10.1016/j.csl.2010.10.003. hdl:11336/68351.
^Z. Xia; R. Levitan; J. Hirschberg (2014). "Prosodic Entrainment in Mandarin and English: A Cross-Linguistic Comparison". Speech Prosody. doi:10.21437/SpeechProsody.2014-1. S2CID 15063969.
^S. Stoyanchev; A. Liu; J. Hirschberg (2013). "Modeling Human Clarification Strategies". SigDIAL.
^F. Biadsy; A. Rosenberg; R. Carlson; J. Hirschberg; E. Strangert (2008). "A Cross-cultural Comparison of American, Palestinian, and Swedish Perception of Charismatic Speech". Speech Prosody.
^J. Hirschberg; S. Benus; J. M. Brenier; F. Enos; S. Friedman; S. Gilman; C. Girand; M. Graciarena; A. Kathol; L. Michaelis; B. Pellom; E. Shriberg; A. Stolcke (2005). "Distinguishing deceptive from non-deceptive speech". Interspeech: 1833–1836. doi:10.21437/Interspeech.2005-580. S2CID 6415344.
^S. Maskey; J. Hirschberg (2005). "Comparing Lexical, Acoustic/Prosodic, Discourse and Structural Features for Speech Summarization". Interspeech. doi:10.21437/Interspeech.2005-66.
^A. Prokofieva; J. Hirschberg (2014). "Hedging and Speaker Commitment". LREC.
^V. Soto; L. Mangu; A. Rosenberg; J. Hirschberg (2014). "A Comparison of Multiple Methods for Rescoring Keyword Search Lists for Low Resource Languages". Interspeech.