Human Language Technology Lecture Series MIT

    • Technology

This CSAIL seminar series, organized in cooperation with the Siri team at Apple, invites leading researchers in HLT to give lectures that introduce the fundamentals of spoken language systems, assess the current state of the art, outline challenges, and speculate on how they can be met. Lectures occur 2-3 times per semester and should be accessible to undergraduates with some technical background.

    • video
    Towards Open-Domain Spoken Dialogue Systems

    In contrast to traditional rule-based approaches to building spoken dialogue systems, recent research has shown that it is possible to implement all of the required functionality with statistical models trained using a combination of supervised learning and reinforcement learning. This new approach to spoken dialogue is based on the mathematics of partially observable Markov decision processes (POMDPs), in which user inputs are treated as noisy observations of an underlying dialogue state; the system maintains a belief over that state, and its responses are determined by a policy which maps belief states into actions.
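
    As a rough illustration of the POMDP framing sketched above, here is a minimal belief-update and policy loop over a toy two-goal domain (Python). The goals, observation probabilities, and hand-written policy are invented for illustration and are not from the talk; a trained system would learn the policy with reinforcement learning.

        import numpy as np

        # Toy POMDP-style dialogue sketch: the hidden state is the user's goal,
        # and observations are noisy ASR/SLU hypotheses of that goal.
        states = ["wants_italian", "wants_chinese"]

        # p(observation | state): rows = true goals, columns = observed hypotheses.
        obs_model = np.array([
            [0.8, 0.2],   # true goal "italian" is usually heard as "italian"
            [0.3, 0.7],   # true goal "chinese" is usually heard as "chinese"
        ])

        def belief_update(belief, obs_idx):
            """Bayesian update: b'(s) is proportional to p(o|s) * b(s) (goal assumed static)."""
            new_belief = obs_model[:, obs_idx] * belief
            return new_belief / new_belief.sum()

        def policy(belief):
            """Map a belief state to an action; hand-written here, learned in practice."""
            if belief.max() > 0.9:
                return f"offer_restaurant({states[int(belief.argmax())]})"
            return "confirm(cuisine)"

        belief = np.array([0.5, 0.5])       # uniform prior over user goals
        for obs in [0, 0, 0]:               # three noisy observations of "italian"
            belief = belief_update(belief, obs)
            print(belief, "->", policy(belief))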

    Virtually all current spoken dialogue systems are designed to operate in a specific carefully defined domain such as restaurant information, appointment booking, product installation support, etc. However, if voice is to become a significant input modality for accessing web-based information and services, then techniques will be needed to enable spoken dialogue systems to operate within open domains.

    The first part of the talk will briefly review the basic ideas of POMDP dialogue systems as currently applied to closed domains. Unlike many other areas of machine learning, spoken dialogue systems always have a user on hand to provide supervision. Based on this idea, the second part of the talk describes a number of techniques by which implicit user supervision can allow a spoken dialogue system to adapt online to extended domains.

    Steve Young is Professor of Information Engineering and Senior Pro-Vice Chancellor at Cambridge University. His main research interests lie in the area of spoken language systems including speech recognition, speech synthesis and dialogue management. He is the inventor and original author of the HTK Toolkit for building hidden Markov model-based recognition systems, and he co-developed the HTK large vocabulary speech recognition system. More recently he has worked on statistical dialogue systems and pioneered the use of Partially Observable Markov Decision Processes for modelling them.

    He is a Fellow of the Royal Academy of Engineering, the International Speech Communication Association, the Institution of Engineering and Technology, and the Institute of Electrical and Electronics Engineers. In 2004, he was a recipient of an IEEE Signal Processing Society Technical Achievement Award; in 2010, he received the ISCA Medal for Scientific Achievement; and in 2013, he received the European Signal Processing Society Individual Technical Achievement Award.

    • 1 hr 19 min
    • video
    Spoken Term Detection - A Loss for Words

    As speech recognition continues to improve, new applications of the technology have been enabled. It is now common to search for information and send accurate short messages by speaking into a cellphone - something completely impractical just a few years ago. Another application that has recently been gaining attention is "Spoken Term Detection" - using speech recognition technology to locate key words or phrases of interest in running speech of variable quality. Spoken Term Detection can be used to issue real time alerts, rapidly identify multimedia clips of interesting content, and, when combined with search technology, even provide real-time commentary during broadcasts and meetings. This talk will describe the basics of Spoken Term Detection systems, including recent advances in core speech recognition technology, performance metrics, how out-of-vocabulary queries are handled, and ways of using score normalization and system combination to dramatically improve system performance.
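
    As a concrete (and deliberately simplified) illustration of the score normalization and system combination ideas mentioned above, the sketch below rescales each keyword's detection scores to sum to one and then averages overlapping detections from two systems. The detection lists, keywords, and merging rule are invented for illustration, not taken from the talk.

        from collections import defaultdict

        def normalize_per_keyword(detections, gamma=1.0):
            """Keyword-specific sum-to-one normalization: rescale scores so each
            keyword's scores sum to one, which keeps a single global detection
            threshold meaningful across rare and frequent query terms."""
            totals = defaultdict(float)
            for kw, loc, score in detections:
                totals[kw] += score ** gamma
            return [(kw, loc, (score ** gamma) / totals[kw]) for kw, loc, score in detections]

        def combine_systems(system_outputs):
            """Average normalized scores across systems; detections are merged if they
            share a keyword and (rounded) time, and a system that misses a detection
            contributes a score of zero to the average."""
            merged = defaultdict(list)
            for detections in system_outputs:
                for kw, loc, score in detections:
                    merged[(kw, round(loc))].append(score)
            return [(kw, loc, sum(scores) / len(system_outputs))
                    for (kw, loc), scores in merged.items()]

        # Hypothetical (keyword, time-in-seconds, raw score) detections from two systems.
        sys_a = normalize_per_keyword([("budget", 12.4, 0.9), ("budget", 80.1, 0.3), ("quorum", 45.0, 0.2)])
        sys_b = normalize_per_keyword([("budget", 12.5, 0.7), ("quorum", 45.1, 0.4)])
        print(combine_systems([sys_a, sys_b]))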

    Michael Picheny is the Senior Manager of the Speech and Language Algorithms Group at the IBM TJ Watson Research Center. Michael has worked in the Speech Recognition area since 1981, joining IBM after finishing his doctorate at MIT. He has been heavily involved in the development of almost all of IBM's recognition systems, ranging from the world's first real-time large vocabulary discrete system through IBM's product lines for telephony and embedded systems. He has published numerous papers in both journals and conferences on almost all aspects of speech recognition. He has received several awards from IBM for his work, including a corporate award, three outstanding Technical Achievement Awards and two Research Division Awards. He is the co-holder of over 30 patents and was named a Master Inventor by IBM in 1995 and again in 2000. Michael served as an Associate Editor of the IEEE Transactions on Acoustics, Speech, and Signal Processing from 1986-1989, was the chairman of the Speech Technical Committee of the IEEE Signal Processing Society from 2002-2004, and is a Fellow of the IEEE. He served as an Adjunct Professor in the Electrical Engineering Department of Columbia University in 2009 and co-taught a course in speech recognition. He recently completed an eight-year term of service on the board of ISCA (International Speech Communication Association). Most recently he was the co-general chair of the IEEE ASRU 2011 Workshop in Hawaii.

    • 1 hr 25 min
    • video
    Language as Influence(d): Power and Memorability

    What effect does language have on people, and what effect do people have on language?

    You might say in response, "Who are you to discuss these problems?" and you would be right to do so; these are Major Questions that science has been tackling for many years. But as a field, I think natural language processing and computational linguistics have much to contribute to the conversation, and I hope to encourage the community to further address these issues. To this end, I'll describe two efforts I've been involved in.

    The first project provides evidence that in group discussions, power differentials between participants are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. We consider multiple types of power: status differences (which are relatively static), and dependence (a more "situational" relationship). Using a precise probabilistic formulation of the notion of linguistic coordination, we study how conversational behavior can reveal power relationships in two very different settings: discussions among Wikipedians and arguments before the U.S. Supreme Court.
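
    The following sketch shows one simple way to compute a coordination score along a single style dimension: how much more likely a reply is to contain a marker (here, personal pronouns) when the message it answers contains that marker. The exchanges and marker list are invented, and the exact probabilistic formulation used in the work may differ.

        def exhibits_marker(utterance, marker_words):
            return any(w in marker_words for w in utterance.lower().split())

        def coordination(exchanges, marker_words):
            """exchanges: list of (original_message, reply) pairs. Returns
            P(reply has marker | original has marker) - P(reply has marker)."""
            reply_flags = [exhibits_marker(reply, marker_words) for _, reply in exchanges]
            base_rate = sum(reply_flags) / len(exchanges)
            triggered = [flag for (orig, _), flag in zip(exchanges, reply_flags)
                         if exhibits_marker(orig, marker_words)]
            if not triggered:
                return 0.0
            return sum(triggered) / len(triggered) - base_rate

        pronouns = {"i", "we", "you", "he", "she", "they", "it"}
        exchanges = [
            ("We should merge these articles.", "I agree, we can merge them."),
            ("The sources look weak.", "Citation quality matters here."),
            ("You reverted my edit.", "I reverted it because of the policy."),
        ]
        print(coordination(exchanges, pronouns))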

    Our second project is motivated by the question of what information achieves widespread public awareness. We consider whether, and how, the way in which the information is phrased (the choice of words and sentence structure) can affect this process. We introduce an experimental paradigm that seeks to separate contextual from language effects, using movie quotes as our test case. We find that there are significant differences between memorable and non-memorable quotes in several key dimensions, even after controlling for situational and contextual factors. One example is lexical distinctiveness: in aggregate, memorable quotes use less common word choices (as measured by statistical language models), but at the same time are built upon a scaffolding of common syntactic patterns.
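
    As a toy version of the lexical distinctiveness measurement, the sketch below scores a quote by its average per-word log-probability under a unigram language model built from a small background word list; lower scores mean less common word choices. The background list and example quotes are placeholders, not the corpora or data used in the study.

        import math
        from collections import Counter

        # A tiny stand-in for a large background corpus of "ordinary" text.
        background = ("the of and to a in that it is was he for on are as with "
                      "his they i at be this have from or one had by word").split()
        counts = Counter(background)
        vocab_size = len(counts)
        total = sum(counts.values())

        def unigram_logprob(word, alpha=1.0):
            # Add-one smoothing so unseen words still get a small probability.
            return math.log((counts[word] + alpha) / (total + alpha * (vocab_size + 1)))

        def avg_logprob(quote):
            words = quote.lower().split()
            return sum(unigram_logprob(w) for w in words) / len(words)

        for quote in ["it is what it is", "elementary my dear watson"]:
            print(f"{quote!r}: {avg_logprob(quote):.2f}")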

    Joint work with Justin Cheng, Cristian Danescu-Niculescu-Mizil, Jon Kleinberg, and Bo Pang.

    Lillian Lee is a professor of computer science at Cornell University. Her research interests include natural language processing, information retrieval, and machine learning. She is the recipient of the inaugural Best Paper Award at HLT-NAACL 2004 (joint with Regina Barzilay), a citation in "Top Picks: Technology Research Advances of 2004" by Technology Research News (also joint with Regina Barzilay), and an Alfred P. Sloan Research Fellowship; and in 2013, she was named a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI). Her group's work has received several mentions in the popular press, including The New York Times, NPR's All Things Considered, and NBC's The Today Show.

    • 1 hr 5 min
    • video
    Extracting Social Meaning from Language: The Computational Linguistics of Food and the Spread of Innovation

    Automatically extracting social meaning from language is one of the most exciting challenges in natural language understanding. In this talk I’ll summarize a number of recent results using the tools of natural language processing to help extract and understand social meaning from texts of different sorts. We’ll explore the relationship between language, economics and social psychology in the automatic processing of the language of restaurant menus and reviews. And I’ll show how natural language processing can help model different aspects of the spread of innovation through communities: how interdisciplinarity plays a crucial role in the spread of scientific innovation, and how the spread of linguistic innovation is intricately tied up with people's lifecycle in online communities.

    Dan Jurafsky is Professor and Chair of Linguistics, and Professor of Computer Science, at Stanford University. He is the co-author of the widely used textbook “Speech and Language Processing”, co-created one of the first massive open online courses, Stanford’s course in Natural Language Processing, and is the recipient of a 2002 MacArthur Fellowship. His trade book “The Language of Food” comes out in September 2014. His research focuses on computational linguistics and its application to the social and behavioral sciences.

    • 1 hr
    • video
    Appropriate and Inappropriate Clarification Questions in Spoken Dialogue Systems

    Clarification in spoken dialogue systems, such as those in mobile applications, often consists of simple requests to “Please repeat” or “Please rephrase” when the system fails to understand a word or phrase. However, human-human dialogues rarely include such questions. When humans ask for clarification of user input such as “I want to travel on XXX”, they typically use targeted clarification questions, such as “When do you want to travel?” Yet systems frequently make mistakes when they try to behave more like humans, sometimes asking inappropriate clarification questions. We present research on more human-like clarification behavior based on a series of crowd-sourcing experiments whose results are implemented in a speech-to-speech translation system. We also describe strategies for detecting when our system has asked the ‘wrong’ question of a user, based upon features of the user’s response.
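
    A minimal sketch of the targeted-clarification idea appears below: rather than asking a generic "Please repeat", the system asks about the specific slot it failed to understand. The slot names, question templates, and confidence threshold are hypothetical and are not taken from the system described in the talk.

        TARGETED_QUESTIONS = {
            "date": "When do you want to travel?",
            "destination": "Where do you want to travel to?",
            "time": "What time would you like to leave?",
        }

        def clarification_question(slots, threshold=0.5):
            """slots: dict mapping slot name -> (value, confidence). Returns a
            targeted question for the least confident problem slot, falling back
            to a generic rephrase request only if the error cannot be localized."""
            uncertain = [(conf, name) for name, (value, conf) in slots.items()
                         if value is None or conf < threshold]
            if not uncertain:
                return None                 # nothing to clarify
            _, name = min(uncertain)
            return TARGETED_QUESTIONS.get(name, "Sorry, could you rephrase that?")

        # "I want to travel on <unrecognized>": destination was heard, the date was not.
        parsed = {"destination": ("boston", 0.92), "date": (None, 0.10)}
        print(clarification_question(parsed))   # -> "When do you want to travel?"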

    Julia Hirschberg is Percy K. and Vida L. W. Hudson Professor of Computer Science and Chair of the Computer Science Department at Columbia University. She worked at Bell Laboratories and AT&T Laboratories -- Research from 1985-2003 as a Member of Technical Staff and a Department Head, creating the Human-Computer Interface Research Department in 1994. She served as editor-in-chief of Computational Linguistics from 1993-2003 and co-editor-in-chief of Speech Communication from 2003-2006. She served on the Executive Board of the Association for Computational Linguistics (ACL) from 1993-2003, on the Permanent Council of International Conference on Spoken Language Processing (ICSLP) since 1996, and on the board of the International Speech Communication Association (ISCA) from 1999-2007 (as President 2005-2007); she has served on the CRA Executive Board (2013-14). She now serves on the IEEE Speech and Language Processing Technical Committee, the Association for the Advancement of Artificial Intelligence (AAAI) Council, the Executive Board of the North American ACL, and the board of the CRA-W. She has been an AAAI fellow since 1994, an ISCA Fellow since 2008, and a (founding) ACL Fellow since 2011, and was elected to the American Philosophical Society in 2014. She is a winner of the IEEE James L. Flanagan Speech and Audio Processing Award (2011) and the ISCA Medal for Scientific Achievement (2011).

    • 1 hr 10 min
    • video
    Human-like Singing and Talking Machines

    Human-like Singing and Talking Machines: Flexible Speech Synthesis in Karaoke, Anime, Smart Phones, Video Games, Digital Signage, TV and Radio Programs

    This talk will give an overview of the statistical approach to flexible speech synthesis. For constructing human-like talking machines, speech synthesis systems are required to have the ability to generate speech in an arbitrary speaker's voice, in various speaking styles in different languages, with varying emphasis and focus, and/or with emotional expressions. The main advantage of the statistical approach is that such flexibility can easily be realized using mathematically well-defined algorithms. In this talk, the system architecture will be outlined, and then recent results and demos will be presented.
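
    The sketch below gives a heavily simplified flavor of the parameter generation step used in this kind of statistical synthesis (in the spirit of the HMM-based speech parameter generation algorithm mentioned in the bio below): given per-frame Gaussian means and variances for a static feature and its delta, it solves for the smooth trajectory c that maximizes the likelihood of o = W c, i.e. c = (Wᵀ Σ⁻¹ W)⁻¹ Wᵀ Σ⁻¹ μ. The feature values are invented, and a real synthesizer works with many feature streams and richer models.

        import numpy as np

        T = 6
        # Per-frame means and variances for a [static; delta] feature pair, as a
        # trained model might supply after state alignment (numbers made up).
        mean_static = np.array([0.0, 0.0, 1.0, 1.0, 0.2, 0.2])
        mean_delta  = np.zeros(T)
        var_static  = np.full(T, 0.10)
        var_delta   = np.full(T, 0.05)

        # Window matrix W mapping statics c to observations [c; delta(c)], with
        # delta(c)_t = (c_{t+1} - c_{t-1}) / 2 (one-sided differences at the edges).
        W = np.zeros((2 * T, T))
        W[:T, :] = np.eye(T)
        for t in range(T):
            lo, hi = max(t - 1, 0), min(t + 1, T - 1)
            W[T + t, hi] += 0.5
            W[T + t, lo] -= 0.5

        mu = np.concatenate([mean_static, mean_delta])
        prec = np.diag(1.0 / np.concatenate([var_static, var_delta]))   # Sigma^-1

        # Closed-form maximum-likelihood trajectory: smooth, not a step function.
        c = np.linalg.solve(W.T @ prec @ W, W.T @ prec @ mu)
        print(np.round(c, 3))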

    Keiichi Tokuda is a Professor in the Department of Computer Science at Nagoya Institute of Technology and is currently visiting Google on sabbatical. He is also an Honorary Professor at the University of Edinburgh. He was an Invited Researcher at the National Institute of Information and Communications Technology (NICT), formerly known as the ATR Spoken Language Communication Research Laboratories, Kyoto, Japan, from 2000 to 2013, and was a Visiting Researcher at Carnegie Mellon University from 2001 to 2002. He has been working on statistical parametric speech synthesis since he proposed an algorithm for speech parameter generation from HMMs in 1995. He has received six paper awards and two achievement awards. He is an IEEE Fellow and an ISCA Fellow.

    • 1 hr 1 min
