I'm trying to find free test data for a conversation dataset. I have looked at "Speech audio files dataset with language labels", but unfortunately it does not meet my requirements.

I am specifically looking for a natural conversation dataset (dialog corpus?), such as phone conversations, talk shows, and meetings. I've considered two approaches:

1) Find a suitable dataset

2) Scrape talk radio podcasts for audio content.

The files need to be stored in .wav format.

Any suggestions and help would be appreciated.

Patrick Hoefler
andor kesselman
  • a) i wouldn't focus on file format; conversions are typically easy with media, getting the data seems more difficult. b) .wav is not an open data format. – albert Oct 03 '16 at 21:31
  • for a) you are correct, conversion isn't too difficult. The main advantage of WAV, however, is that it is lossless. Reconsidering my goals with this project, though, it actually shouldn't matter too much. b) Thanks. I did not realise that. – andor kesselman Oct 03 '16 at 22:10
  • lol...i hope that didn't come across as snide. i've found converting formats, especially media, relatively painless in most cases. the not open format comment though was legit. proprietary formats only hold open data back. – albert Oct 03 '16 at 22:28
  • not at all. Thanks for the input. :) Still looking for an easy dataset to get my hands on....I would rather avoid creating a scraper if possible as I'm trying to simply do some quick prototyping. – andor kesselman Oct 03 '16 at 22:33

1 Answer

Oyez: all recorded audio of the Supreme Court since 1955. Not sure if that fits...

Internet Archive's Audio Collection looks like it has a few channels worth checking out. I'd have checked them out and linked to them, but for some reason the Internet Archive doesn't use anchor elements....
EDIT: since I posted this, they have started using anchor elements. Here's one:
Old Radio Shows

Orson Welles Show Recordings

List of sites with more public domain offerings

Not sure if any of these are 100% match for your request; feel free to pick these options apart.

albert
  • sorry, haven't had time yet to delve deeply into this. Once I have, if it turns out to answer the question then I will give it the answer vote. This is a great start – andor kesselman Oct 06 '16 at 21:56
  • appreciate that, but no rush whatsoever. i do like having them curated, so again, it's appreciated. whatever does/doesn't fit should be edited in the answer. – albert Oct 07 '16 at 01:14
  • Are any of these recordings provided as separated into different channels for the individual speakers, or are they all just mashed together into one audio channel in all of the sources? – HelloGoodbye Apr 05 '19 at 00:57
  • i have no idea. if you check it out, please come back and let us know. – albert Apr 06 '19 at 20:07
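
On the channel question above: one way to check a downloaded file yourself is to read its header with Python's standard `wave` module. This is only a rough sketch, and it assumes the recording has already been converted to uncompressed PCM .wav (the `wave` module does not read MP3 or other compressed formats). The function names `describe_wav` and `split_channels` are made up for illustration. Note also that a file with two channels does not guarantee one speaker per channel; it only tells you that a per-channel split is possible.

```python
import wave


def describe_wav(path):
    """Return basic layout facts for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return {
            "channels": wf.getnchannels(),
            "sample_rate_hz": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": wf.getnframes() / wf.getframerate(),
        }


def split_channels(path, out_prefix):
    """Write each channel of a PCM WAV file to its own mono file.

    Hypothetical helper: output files are named <out_prefix>_ch<N>.wav.
    """
    with wave.open(path, "rb") as wf:
        n_ch = wf.getnchannels()
        width = wf.getsampwidth()
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())

    frame_size = n_ch * width
    out_paths = []
    for ch in range(n_ch):
        # Samples are interleaved per frame, so pick this channel's
        # sample out of every frame and concatenate them.
        mono = b"".join(
            frames[i + ch * width : i + (ch + 1) * width]
            for i in range(0, len(frames), frame_size)
        )
        out_path = f"{out_prefix}_ch{ch}.wav"
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(mono)
        out_paths.append(out_path)
    return out_paths
```

If `describe_wav` reports a single channel, all speakers are mixed together and you would need speaker diarization rather than a simple channel split.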