9 episodes

Data mining, pattern recognition, image processing, remote sensing.

Enjoy!

Thales Sehn Körting Thales

    • Science


    What is Data Science? (Part 1)


    In this podcast I provide a detailed discussion of what Data Science is. In Part 2 I will continue...
    Follow my podcast: http://anchor.fm/tkorting
    Subscribe to my YouTube channel: http://youtube.com/tkorting
    The intro and the final sounds were recorded at my home, using an old clock that belonged to my grandmother.
    Thanks for listening

    • 25 sec
    Is Deep Learning FAIR?


    Deep Learning articles use benchmarks to measure the quality of their results. However, the providers of several benchmarks do not hold the copyright to all the data they use. So how can we be sure that every paper uses the same benchmark?
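One way to check that two groups are working with the same copy of a benchmark is to compare a checksum of its files. The sketch below only illustrates that idea and is not something described in the episode: the helper name `dataset_fingerprint` and the tiny in-memory "datasets" are assumptions, and a real benchmark would be read from disk.

```python
import hashlib


def dataset_fingerprint(files):
    """Return one SHA-256 hex digest for a whole dataset.

    `files` is a dict mapping a file name to its content as bytes
    (a hypothetical stand-in for the files of a benchmark).
    """
    digest = hashlib.sha256()
    # Sort the names so the result does not depend on insertion order.
    for name in sorted(files):
        digest.update(name.encode("utf-8"))
        digest.update(hashlib.sha256(files[name]).digest())
    return digest.hexdigest()


# Two copies with the same files, listed in a different order.
copy_a = {"img_001.jpg": b"pixels-1", "img_002.jpg": b"pixels-2"}
copy_b = {"img_002.jpg": b"pixels-2", "img_001.jpg": b"pixels-1"}
# A copy where one image was replaced, for example after a takedown.
copy_c = {"img_001.jpg": b"pixels-1", "img_002.jpg": b"other-pixels"}

print(dataset_fingerprint(copy_a) == dataset_fingerprint(copy_b))
print(dataset_fingerprint(copy_a) == dataset_fingerprint(copy_c))
```

If the images cannot be redistributed, publishing such a fingerprint next to a paper would at least let readers check whether their download matches.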

    From https://www.go-fair.org/fair-principles/ we have the description of the FAIR acronym:


    Findable: The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. 
    Accessible: Once the user finds the required data, she/he needs to know how can they be accessed, possibly including authentication and authorisation.
    Interoperable: The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
    Reusable: The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.

    From the article Implementing FAIR Data Principles: The Role of Libraries (https://libereurope.eu/wp-content/uploads/2017/12/LIBER-FAIR-Data.pdf) we include the following additional description of the term Reusable: Data and collections have a clear usage licenses and provide accurate information on provenance.

    Top 3 datasets for Deep Learning, based on a list of 25 (https://www.analyticsvidhya.com/blog/2018/03/comprehensive-collection-deep-learning-datasets/):


    From http://cocodataset.org/#termsofuse: The COCO Consortium does not own the copyright of the images. 
    From http://image-net.org/download-faq: The images in their original resolutions may be subject to copyright, so we do not make them publicly available on our server.
    From https://storage.googleapis.com/openimages: While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.


    • 8 min
    Do you trust in pretrained Deep Learning models?


    Several authors rely on transfer learning from pretrained models, arguing that, by using well-known datasets available on the internet (e.g. ImageNet), their model will be able to handle a specific problem with a reduced training step.

    In Remote Sensing this perspective is also becoming a trend when using Deep Learning techniques to classify Remote Sensing datasets.

    In my opinion, the datasets used for pretraining are very different from Remote Sensing targets, mainly in two aspects:


    spatial resolution: a sensor can have ultra high spatial resolution (50 cm, for example) or very low resolution (2 km for a single pixel), and the edges in all these images are different
    spectral resolution: the datasets found on the internet are composed of color pictures, obtained mainly by phone cameras, which have 3 channels (red, green and blue). In Remote Sensing we can have several spectral channels, such as yellow or red-edge bands (available in WorldView-2), or infra-red channels, available in most satellites. How can you train a model using 3 bands, when in reality you can have at least 5 bands with such different information?
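To give a feeling for the size of these two gaps, here is a minimal sketch. The function names and the 5-band list are assumptions made for illustration. Only the 50 cm and 2 km pixel sizes and the red, green and blue channels come from the notes above.

```python
def fine_pixels_per_coarse_pixel(fine_m, coarse_m):
    """Number of fine pixels needed to cover the ground area of one coarse pixel."""
    return (coarse_m / fine_m) ** 2


def bands_without_pretrained_weights(pretrained_bands, sensor_bands):
    """Sensor bands that have no matching channel in the pretrained model."""
    return [band for band in sensor_bands if band not in pretrained_bands]


# Spatial gap: one 2 km pixel covers the same ground as this many 50 cm pixels.
ratio = fine_pixels_per_coarse_pixel(0.5, 2000)
print(ratio)

# Spectral gap: a model pretrained on RGB photos versus a hypothetical 5-band sensor.
rgb = ["red", "green", "blue"]
sensor = ["blue", "green", "red", "red-edge", "near-infrared"]
missing = bands_without_pretrained_weights(rgb, sensor)
print(missing)
```

The bands in `missing` are the ones for which a model pretrained on RGB pictures has learned nothing.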

    If you agree, or if you do not agree, please give some feedback and let's learn together.


    • 6 min
    Are you sure you apply only Data Mining to your database?


    In this podcast I discuss the (sometimes) wrong use of the term Data Mining. In the paper From Data Mining to Knowledge Discovery in Databases, written in 1996 by Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth, it is defined as follows: Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that produce a particular enumeration of patterns (or models) over the data.

    KDD means Knowledge Discovery in Databases, and is composed of the following steps:

    Data -> (selection) -> Target Data -> (preprocessing) -> Preprocessed Data -> (transformation) -> Transformed Data -> (data mining) -> Patterns -> (interpretation/evaluation) -> Knowledge
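The chain above can be sketched as a sequence of small functions, one per step. This is only a toy illustration under stated assumptions: the records, the field name `value`, and the threshold split that stands in for a real clustering algorithm are all hypothetical and do not come from the paper.

```python
from statistics import mean


def select(records, field):
    """Selection: keep only the attribute of interest (Target Data)."""
    return [record.get(field) for record in records]


def preprocess(values):
    """Preprocessing: drop missing values (Preprocessed Data)."""
    return [v for v in values if v is not None]


def transform(values):
    """Transformation: rescale to the [0, 1] range (Transformed Data)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def mine(values, threshold=0.5):
    """Data mining: a toy two-group split standing in for clustering (Patterns)."""
    return {
        "low": [v for v in values if v < threshold],
        "high": [v for v in values if v >= threshold],
    }


def interpret(patterns):
    """Interpretation/evaluation: summarise each group (Knowledge)."""
    return {
        name: {"count": len(group), "mean": round(mean(group), 2)}
        for name, group in patterns.items()
        if group
    }


records = [{"value": 10}, {"value": 80}, {"value": None}, {"value": 20}, {"value": 90}]
knowledge = interpret(mine(transform(preprocess(select(records, "value")))))
print(knowledge)
```

Only the `mine` function corresponds to the data mining step. The full call chain, from `select` to `interpret`, is the KDD process.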

    Several authors use the term Data Mining when they are performing the entire cycle (from Data to Knowledge), and not only the data mining step, which can be represented by the use of classification/clustering algorithms.

    The reference paper is available at: https://wvvw.aaai.org/ojs/index.php/aimagazine/article/download/1230/1131


    • 5 min
    When the high resolution is not so high...


    In this podcast I discuss the wrong use of the term Resolution in scientific articles or in the general media. Resolution in Remote Sensing can be used to describe several aspects of images, such as:


    temporal resolution: the time difference between two images of the same place
    spectral resolution: related to the number of bands and wavelengths, such as in Panchromatic, Multispectral, Hyperspectral, or Ultraspectral
    radiometric resolution: the number of bits needed to store a pixel value (e.g. 8 bits in Landsat 7 or 11 bits in WorldView-2)
    spatial resolution: the focus of this podcast, related to the area represented by a single pixel in an image
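Two of these definitions reduce to simple arithmetic, shown in the short sketch below. The function names and the example pixel sizes (0.5 m and 30 m) are illustrative assumptions. The 8-bit and 11-bit depths are the ones quoted in the list above.

```python
def radiometric_levels(bits):
    """Number of distinct values a pixel can store with the given bit depth."""
    return 2 ** bits


def pixel_area_m2(pixel_size_m):
    """Ground area, in square metres, covered by one square pixel."""
    return pixel_size_m ** 2


# Radiometric resolution: 8 bits versus 11 bits.
print(radiometric_levels(8))
print(radiometric_levels(11))

# Spatial resolution: area covered by one pixel for two illustrative pixel sizes.
print(pixel_area_m2(0.5))
print(pixel_area_m2(30))
```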

    I provide an interesting reference with an easy-to-use table that helps to understand what can be considered High Spatial Resolution or Low Spatial Resolution:

    Taxonomy of Remote Sensing Systems - Spatial Ground Resolution


    Ultra High: 250m

    The reference is:

    Ehlers, M., Janowsky, R., Gähler, M., 2001. New remote sensing concepts for environmental monitoring. Proceedings of SPIE - The International Society for Optical Engineering.

    The original paper is available at https://www.researchgate.net/publication/252130745_New_remote_sensing_concepts_for_environmental_monitoring


    • 5 min
    Is there an "Almost Perfect" agreement in a classification?


    I discuss the extensive use of the "Strength of Agreement" table, based on different Kappa values, provided by:
    Landis, J.R. and Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics, pp.159-174.
    According to Google Scholar, this paper has more than 53,000 citations (as of October 2019). In my opinion, this table has sometimes been used for a purpose different from that of the original paper, whose methods, according to the authors, "have been illustrated with an example involving only two observers", and whose "divisions are clearly arbitrary".
    The original paper is available at https://www.jstor.org/stable/pdf/2529310.pdf
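For readers who have not seen the table, the sketch below computes Cohen's Kappa from a confusion matrix and attaches the Landis and Koch label to it. The confusion matrix is made-up example data and the function names are assumptions. As the authors themselves say, the divisions are arbitrary, so the label should be read with care.

```python
def cohen_kappa(matrix):
    """Cohen's Kappa for a square confusion matrix given as a list of rows."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Expected agreement by chance, from the row and column totals.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Strength of agreement according to the Landis and Koch (1977) table."""
    if kappa < 0:
        return "Poor"
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost Perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost Perfect"


# Hypothetical two-class confusion matrix (rows: reference, columns: classification).
example = [[45, 5], [10, 40]]
kappa = cohen_kappa(example)
print(round(kappa, 2), landis_koch_label(kappa))
```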

    • 7 min
