R25 VOICE Section 3 - Datasets

R25 VOICE Podcast

Papers discussed in this Section 3 podcast:

  • Liao, F., Liang, M., Li, Z., Hu, X., & Song, S. Evaluate the Malignancy of Pulmonary Nodules Using the 3D Deep Leaky Noisy-or Network. arXiv:1711.08324, 2017.
  • Pollard, T. J., & Johnson, A. E. W. The MIMIC-III Clinical Database. http://dx.doi.org/10.13026/C2XW26, 2016.
  • Rajpurkar, P., Irvin, J., Bagul, A., Ding, D., Duan, T., Mehta, H., Yang, B., Zhu, K., Laird, D., Ball, R. L., Langlotz, C., Shpanskaya, K., Lungren, M. P., & Ng, A. MURA Dataset: Towards Radiologist-Level Abnormality Detection in Musculoskeletal Radiographs. arXiv:1712.06957, 2017.
  • Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., & Summers, R. M. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. IEEE CVPR (spotlight); arXiv:1705.02315, 2017.

Podcast Contents:

  • Why Are Datasets Important?
  • What Kinds of Datasets Are There?
  • What's a Gold Standard?
  • Best Practices in Dataset Descriptions
    • Sample distribution
    • Meta-data
      • Patients
      • Radiologists
      • PACS Systems Used for Annotation
      • Images
  • Strategies for Labeling Data
    • Natural Language Processing
    • Amazon Mechanical Turk
    • Natural Language Processing Validation Sets
