VizWiz

Danna Gurari

Welcome to the VizWiz podcast, where we explore two questions: "What visual challenges do blind people encounter in their daily lives?" and "What technological advancements are helping blind people live more independently?" The hosts are a diverse team of researchers from academia and industry with a shared passion for developing more inclusive artificial intelligence (AI) solutions. We interview the world's leading AI researchers, technology advocates from the blind community, and industry specialists to raise awareness about current access technologies and to help inspire their future.

Episodes

  1. 01/08/2022

    EP02: Perspectives of Industry Developers on the Challenges and Opportunities for Advancing Visual Interpretation Technologies

    Brief summary of the episode
    Saqib Shaikh, Will Butler, and Karthik Kannan share their experiences developing visual interpretation products and services for blind and low vision users.

    Questions asked in the episode
    [03:10] Could you share how you are trying to improve products and services for people with vision impairments within your company, and how you got into your current line of work?
    [11:18] What challenges do you encounter in trying to improve the state of accessibility in practice? Please comment both on challenges your team has overcome and on unsolved problems you currently face.
    [22:29] There is a spectrum of solutions for visual assistance, ranging from sole reliance on remote sighted humans to sole reliance on artificial intelligence. Could you talk about which solution you use or prefer? Could you also discuss which aspects you think should be done by humans versus artificial intelligence?
    [35:10] Do you engage your target audience in developing or improving your products or services and, if so, how?
    [42:45] How can students and others get involved? For example, what skills does your team look for when recruiting, and how might a computer vision or accessibility person make themselves appealing to an accessibility team?

    Guest bios
    Saqib Shaikh is an Engineering Manager at Microsoft, where he founded Seeing AI, an app that enables someone who is visually impaired to hold up their phone and hear more about the text, people, and objects in their surroundings.
    Will Butler is the Chief Experience Officer at Be My Eyes, a free app with one of the largest online communities, which connects visually impaired individuals to free, on-demand, live video support from around 4 million volunteers and companies. Will has also hosted two podcasts on the topic of vision loss and accessibility.
    Karthik Kannan is the co-founder and chief technology officer of Envision, a company that provides technology to help people with visual impairments in their daily lives. His company builds an app and smart glasses that help people with visual impairments learn about their surroundings, including reading text and recognizing faces.
    Samreen Anjum is a PhD student at the University of Colorado Boulder. Her research focuses on computer vision and its applications in the fields of biomedical sciences and assistive technologies, as well as on designing systems that enable collaboration between humans and machines.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    51 min
  2. 01/08/2022

    EP01: Computer Vision: What People with Vision Impairments Experience and Want

    Brief summary of the episode
    Stephanie Enyart, Robin Christopherson, and Daniel Kish share their experiences advocating for better visual interpretation technology for blind and low vision users, as well as their experiences, as people with visual impairments, using such technologies.

    Questions asked in the episode
    [03:50] How did you get into your current line of work?
    [13:50] What technologies do you use to overcome accessibility barriers you encounter in your daily life, and what features do you find most useful versus least useful?
    [20:15] Could you share a bit about your journey in developing the photography skills needed to solicit visual assistance using technology, as well as any ongoing challenges you encounter with photography?
    [26:00] For a few years now, computer vision researchers have been developing models for two tasks for describing images to people with vision impairments: creating captions for images and answering questions about images. Could you talk about the usefulness of these different approaches in practice? In doing so, can you also comment on whether one approach is preferable over the other? (An illustrative code sketch of both tasks appears after the episode list below.)
    [32:05] What are your thoughts about researchers or industry service providers collecting and using data gathered while providing visual assistance in order to improve existing algorithms and develop new ones?
    [36:10] If data collection is okay, what type of data record lineage should be provided? By record lineage I mean the record of who has handled data, what decisions are made with the data, and how. Please share your individual perspective as well as any perspectives you have heard from the community.
    [39:15] What level of interaction do you want from a visual assistant that interprets visual information? Also, do you prefer that assistance to come from humans or computers?
    [43:35] From your perspective, what are the most important open accessibility obstacles today that should be solved first? Please share your individual perspective as well as what you hear from the community.

    Guest bios
    Stephanie Enyart is the Chief Public Policy and Research Officer at the American Foundation for the Blind. Stephanie serves as a strategic leader in developing policy that benefits people who are blind in education, employment, aging, and the intersectional issues of technology and transportation.
    Robin Christopherson is a co-founder and Head of Digital Inclusion at AbilityNet. His work has led to accessibility improvements in many organizations spanning industry, government, and universities. Robin has also served as an expert technical witness on assistive technology in software, systems, and websites.
    Daniel Kish is the President of World Access for the Blind. He is a world leader in perceptual navigation and echolocation, through which he has developed his own method of generating vocal clicks and using echoes to identify his surroundings and navigate.
    Abigale Stangl is a CRA/NSF Computing Innovation Fellow at the University of Washington. Her research lies at the intersection of human-computer interaction, non-visual accessibility, and data privacy and ownership.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    52 min
  3. 01/08/2022

    EP04: A Product Developer, Blind Technology Advocate, and Computer Vision Researcher Discuss the Future for Visual Interpretation Technologies (Stephanie, James, Karthik)

    Brief summary of the episode
    Stephanie Enyart, James Coughlan, and Karthik Kannan share diverse perspectives on the development of visual interpretation technologies to meet the interests and needs of people with vision impairments.

    Questions asked in the episode
    [02:24] Could you share what has surprised you the most about the progress that has taken place over the past 10-20 years in technologies that provide visual assistance to real-world users?
    [08:25] What do you see as the current limiting factors or barriers in developing better visual interpretation technologies?
    [15:00] Could you describe how you envision technology working in 10 years for interpreting visual information for real-world users? For example, what skills will the technology have? Also, how will the technology deliver information, such as via a live video feed, augmented reality, or something else?
    [23:18] Could you discuss how you think we should decide what information to include in a visual description?
    [32:28] I next want to dig into one of the issues that is critical for designing vision assistance technology: access to large datasets from people with vision impairments to support evaluation and training of computer vision models. What are your expectations about how such datasets can be built responsibly, and what experience do you have in building them?
    [41:55] Could you please share to what extent each of you already has conversations or collaborations with researchers, industry developers, and blind technology advocates to advance visual assistance products and services? What do you find works well versus does not work well in these collaborations or conversations?

    Guest bios
    Stephanie Enyart is the Chief Public Policy and Research Officer at the American Foundation for the Blind. Stephanie serves as a strategic leader in developing policy that benefits people who are blind in education, employment, aging, and the intersectional issues of technology and transportation.
    James Coughlan is a Senior Scientist at the Smith-Kettlewell Eye Research Institute, with a PhD in Physics from Harvard University. James has been at Smith-Kettlewell since 1998 and over this time has developed a wide array of impactful technologies for the blind and low-vision community.
    Karthik Kannan is the co-founder and chief technology officer of Envision, a company that provides technology to help people with visual impairments in their daily lives. His company builds an app and smart glasses that help people with visual impairments learn about their surroundings, including reading text and recognizing faces.
    Danna Gurari is an Assistant Professor at the University of Colorado Boulder, where she also leads the Image and Video Computing research group.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    54 min
  4. 01/08/2022

    EP05: A Product Developer, Blind Technology Advocate, and Computer Vision Researcher Discuss the Future for Visual Interpretation Technologies (Daniel, Andrew, Will)

    Brief summary of the episode
    Daniel Kish, Andrew Howard, and Will Butler share diverse perspectives on the development of visual interpretation technologies to meet the interests and needs of people with vision impairments.

    Questions asked in the episode
    [03:04] Could you share what has surprised you the most about the progress that has taken place over the past 10-20 years in technologies that provide visual assistance to real-world users?
    [08:10] What do you see as the current limiting factors or barriers in developing better visual interpretation technologies?
    [12:21] Could you describe how you envision technology working in 10 years for interpreting visual information for real-world users? For example, what skills will the technology have? Also, how will the technology deliver information, such as via a live video feed, augmented reality, or something else?
    [18:34] Could you discuss how you think we should decide what information to include in a visual description?
    [26:44] I next want to dig into one of the issues that is critical for designing vision assistance technology: access to large datasets from people with vision impairments to support evaluation and training of computer vision models. What are your expectations about how such datasets can be built responsibly, and what experience do you have in building them?
    [35:18] Could you please share to what extent each of you already has conversations or collaborations with researchers, industry developers, and blind technology advocates to advance visual assistance products and services? What do you find works well versus does not work well in these collaborations or conversations?

    Guest bios
    Daniel Kish is the President of World Access for the Blind. He is a world leader in perceptual navigation and echolocation, through which he has developed his own method of generating vocal clicks and using echoes to identify his surroundings and navigate.
    Andrew Howard is a Senior Staff Software Engineer at Google Research, with a PhD in Computer Science from Columbia University. Andrew is most well-known for his work on mobile-friendly deep learning models. Starting with MobileNets, then MobileNetV2, MobileNetV3, and MnasNet, his work has been broadly adopted in deep learning packages like PyTorch and TensorFlow, as well as across a host of mobile phone platforms and apps.
    Will Butler is the Chief Experience Officer at Be My Eyes, a free app with one of the largest online communities, which connects visually impaired individuals to free, on-demand, live video support from around 4 million volunteers and companies. Will has also hosted two podcasts on the topic of vision loss and accessibility.
    Ed Cutrell is a Senior Principal Research Manager at Microsoft Research (MSR), where he leads the MSR Ability Team, a group of researchers focused on innovating new technologies for people with a range of disabilities.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    43 min
  5. 01/08/2022

    EP03: Perspectives of Computer Vision Researchers on the Challenges and Opportunities for Advancing Visual Interpretation Technologies

    Brief summary of the episode
    Marcus Rohrbach, Andrew Howard, and James Coughlan share their experiences developing state-of-the-art research on visual interpretation algorithms and systems.

    Questions asked in the episode
    [04:30] For the sake of the audience, could you share how you got into your current line of work as a computer vision researcher, and what problems do you work on?
    [17:46] What kinds of methods are you using to tackle the problems you are working on, and what kinds of errors and limitations do you encounter with these methods?
    [38:03] What do you see as the key turning points in the computer vision community over the years or decades that have shifted which problems our community is able to address, and what do these changes mean for end users?
    [48:11] What does the next decade look like? What do you think is going to be the next big thing?

    Guest bios
    Marcus Rohrbach is a Research Scientist at Meta AI Research, with a PhD from the Max Planck Institute for Informatics. Marcus is most well-known for his work at the intersection of computer vision and natural language processing. Over his career, he has driven key progress in visual question answering, language grounding, and generating descriptions of images and videos, in particular movies, all of which is highly relevant for the blind/low-vision community.
    Andrew Howard is a Senior Staff Software Engineer at Google Research, with a PhD in Computer Science from Columbia University. Andrew is most well-known for his work on mobile-friendly deep learning models. Starting with MobileNets, then MobileNetV2, MobileNetV3, and MnasNet, his work has been broadly adopted in deep learning packages like PyTorch and TensorFlow, as well as across a host of mobile phone platforms and apps.
    James Coughlan is a Senior Scientist at the Smith-Kettlewell Eye Research Institute, with a PhD in Physics from Harvard University. James has been at Smith-Kettlewell since 1998 and over this time has developed a wide array of impactful technologies for the blind and low-vision community.
    Daniela Massiceti is a machine learning researcher at Microsoft Research. Her research focuses on the intersection of ML and human-computer interaction. She is primarily interested in ML systems that learn and evolve with human input, so-called "teachable" systems, giving users the power to completely customize their AI experiences, from personalized assistive tools for people who are blind/low-vision to personalized avatars in the metaverse.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    52 min
  6. 01/08/2022

    EP06: A Product Developer, Blind Technology Advocate, and Computer Vision Researcher Discuss the Future for Visual Interpretation Technologies (Robin, Saqib, Marcus)

    Brief summary of the episode
    Robin Christopherson, Saqib Shaikh, and Marcus Rohrbach share diverse perspectives on the development of visual interpretation technologies to meet the interests and needs of people with vision impairments.

    Questions asked in the episode
    [02:22] Could you share what has surprised you the most about the progress that has taken place over the past 10-20 years in technologies that provide visual assistance to real-world users?
    [08:10] What do you see as the current limiting factors or barriers in developing better visual interpretation technologies?
    [11:52] Could you describe how you envision technology working in 10 years for interpreting visual information for real-world users? For example, what skills will the technology have? Also, how will the technology deliver information, such as via a live video feed, augmented reality, or something else?
    [17:00] Could you discuss how you think we should decide what information to include in a visual description?
    [22:17] I next want to dig into one of the issues that is critical for designing vision assistance technology: access to large datasets from people with vision impairments to support evaluation and training of computer vision models. What are your expectations about how such datasets can be built responsibly, and what experience do you have in building them?
    [30:48] Could you please share to what extent each of you already has conversations or collaborations with researchers, industry developers, and blind technology advocates to advance visual assistance products and services? What do you find works well versus does not work well in these collaborations or conversations?

    Guest bios
    Robin Christopherson is a co-founder and Head of Digital Inclusion at AbilityNet. His work has led to accessibility improvements in many organizations spanning industry, government, and universities. Robin has also served as an expert technical witness on assistive technology in software, systems, and websites.
    Saqib Shaikh is an Engineering Manager at Microsoft, where he founded Seeing AI, an app that enables someone who is visually impaired to hold up their phone and hear more about the text, people, and objects in their surroundings.
    Marcus Rohrbach is a Research Scientist at Meta AI Research, with a PhD from the Max Planck Institute for Informatics. Marcus is most well-known for his work at the intersection of computer vision and natural language processing. Over his career, he has driven key progress in visual question answering, language grounding, and generating descriptions of images and videos, in particular movies, all of which is highly relevant for the blind/low-vision community.
    Danna Gurari is an Assistant Professor at the University of Colorado Boulder, where she also leads the Image and Video Computing research group.

    Links to resources mentioned
    https://vizwiz.org/workshops/2022-workshop/

    36 min
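
In EP01, the guests discuss two computer vision tasks for describing images: generating a caption and answering a question about an image. Below is a minimal, illustrative Python sketch of what those two tasks look like in code, using the Hugging Face transformers pipelines. The model checkpoints, photo path, and sample question are assumptions chosen for illustration; they are not tools mentioned or endorsed on the podcast.

    # A minimal sketch of the two image-description tasks discussed in EP01.
    # Assumes `pip install transformers torch pillow`; the checkpoints, photo
    # path, and question below are illustrative, not from the podcast.
    from transformers import pipeline
    from PIL import Image

    # Hypothetical photo, e.g., one taken by a blind or low vision user.
    image = Image.open("photo.jpg")

    # Task 1: image captioning -- generate a free-form description of the photo.
    captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
    print(captioner(image))  # e.g., [{"generated_text": "a can of soup on a table"}]

    # Task 2: visual question answering -- answer a specific question about the photo.
    vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")
    print(vqa(image=image, question="What color is the can?"))  # ranked answers with scores

As the episodes make clear, real assistive systems involve far more than a single model call: photo quality, latency, privacy, and the choice between human and AI assistance all shape the experience for blind and low vision users.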
