I am creating a learning program that should learn to answer a binary yes/no question from numeric information.

For now, I used this data to train it:

The problem is that I only have 569 records. That is not a tiny number, but I'd like more. I should also train it on other types of data (not related to breast cancer) to see how it does with different classes of problems.

What I need is a list of records, each containing a set of numeric fields and a yes/no answer (such as "Is this tumor malignant?" in the data I already use).
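To make the shape concrete, here is a minimal sketch of the layout I have in mind. The CSV layout (numeric columns followed by a final yes/no column), the `Record` type, and the sample values are assumptions made up for illustration, not taken from any real data set.

```python
import csv
import io
from typing import List, NamedTuple


class Record(NamedTuple):
    features: List[float]  # the numeric fields
    answer: bool           # the yes/no answer


def parse_records(text: str) -> List[Record]:
    """Parse CSV rows whose last column is yes/no and whose other columns are numeric.

    This layout is an assumption; real repositories may order columns differently.
    """
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        *fields, label = row
        records.append(
            Record([float(x) for x in fields], label.strip().lower() == "yes")
        )
    return records


# Made-up sample values, only to show the layout.
SAMPLE = "1.5,2.0,3.25,yes\n0.5,4.0,1.0,no\n"
records = parse_records(SAMPLE)
```

Any data set that can be brought into this form (numeric features plus one boolean label per record) would work for me.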

Does anybody know where to find such repositories?

Orophile
Aspie96

1 Answer


If you want to stay with breast cancer data: http://www.kdd.org/kdd-cup/view/kdd-cup-2008/Data

There is a data set for every year (1997-2014). Just change the year in the URL:

http://www.kdd.org/kdd-cup/view/kdd-cup-xxxx/Data
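If you want to walk through all of them, the year substitution can be sketched like this. The helper name `kdd_cup_urls` is made up for illustration, and the sketch only builds the strings from the pattern above; it does not check that every page actually exists.

```python
# URL pattern from the answer, with the year as a placeholder.
URL_TEMPLATE = "http://www.kdd.org/kdd-cup/view/kdd-cup-{year}/Data"


def kdd_cup_urls(first_year: int = 1997, last_year: int = 2014) -> dict:
    """Return a {year: url} mapping for the given inclusive year range."""
    return {
        year: URL_TEMPLATE.format(year=year)
        for year in range(first_year, last_year + 1)
    }


urls = kdd_cup_urls()
for year, url in sorted(urls.items()):
    print(year, url)
```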

You can also check http://gallery.cortanaintelligence.com/experiments. If you sign up for Azure ML (it is free) you can download any data set used in any of these experiments.

Stealth
  • Thank you so much. The reason I took so long to answer is that I don't really know how to read that data, so at first I couldn't tell whether it's what I am looking for. Each record contains a list of 117 values, but from the page you linked I could not work out how to associate each record with its answer (malignant or benign). Can you understand that data better than I do? – Aspie96 Jun 23 '16 at 21:10
  • Here is a description of how to use the data sets. It also includes a step-by-step tutorial for Azure ML. – Stealth Jun 23 '16 at 21:24