Data Sets

Conversational English audio annotations

Audio-based NER annotations for selected Switchboard and Fisher conversations

Overview: Named Entity Recognition (NER) has mostly been studied in the context of written text. In particular, NER is an important step in the de-identification (de-ID) of medical records, many of which are recorded conversations between a patient and a doctor. In such recordings, audio spans containing personal information should be redacted, much as sensitive character spans are redacted in de-ID for written text. This dataset was used to test the performance of our Audio De-id pipeline in our NAACL 2019 paper 'Audio De-identification: A New Entity Recognition Task'. We evaluated our pipeline on a random subset of conversations from the Switchboard (LDC2001S13) and Fisher (LDC2004S13) datasets, both of which consist of English conversations.
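To make the idea of redacting audio spans concrete, here is a minimal Python sketch. It is not the pipeline from the paper, and it does not reflect this dataset's annotation format, which is not described here. The function name `redact_spans` and the `(start_sec, end_sec)` span representation are assumptions chosen only for illustration. Given a list of audio samples and a list of time spans, it silences the samples that fall inside each span.

```python
def redact_spans(samples, sample_rate, spans):
    """Return a copy of `samples` with every (start_sec, end_sec) span silenced.

    Hypothetical helper: `spans` is an assumed representation of annotated
    time ranges, not the dataset's actual format.
    """
    redacted = list(samples)
    total = len(redacted)
    for start_sec, end_sec in spans:
        # Convert times to sample indices and clamp them to the signal length.
        start = max(0, int(round(start_sec * sample_rate)))
        end = min(total, int(round(end_sec * sample_rate)))
        for i in range(start, end):
            redacted[i] = 0  # replace the sensitive audio with silence
    return redacted


# Toy signal: 8 samples at 4 samples per second (2 seconds of audio).
toy_audio = [1] * 8
toy_redacted = redact_spans(toy_audio, sample_rate=4, spans=[(0.5, 1.0)])
print(toy_redacted)
```

Silencing is only one possible redaction strategy; a real system might instead insert a tone or noise over the span, and would need to decide how much padding to add around each annotated boundary.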
  • Data analysis
  • Education
File Size: 4MB
