Questions tagged [machine-learning]

Algorithmic learning from data. A form of Pattern Recognition.

Use this tag for questions about systems that learn from data. Machine learning, a branch of artificial intelligence, concerns the construction and study of such systems. This usually involves training an algorithm on data and then testing it on data it has not seen. Data sets are typically highly structured, and some are calibrated or designed specifically for a machine learning approach. For example, a machine learning system could be trained on email messages to learn to distinguish spam from non-spam. After training, it can be used to sort new email messages into spam and non-spam folders.
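The spam example above can be sketched in a few lines. This is only a minimal illustration, assuming a word-count naive Bayes classifier with add-one smoothing; the four training messages, the two held-out messages, and the function names `tokenize`, `train` and `classify` are all made up for the sketch and do not come from any real data set or library.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}

def classify(model, text):
    """Return the label with the highest log-probability (add-one smoothing)."""
    total = sum(model["label_counts"].values())
    best_label, best_score = None, -math.inf
    for label, n in model["label_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + len(model["vocab"])
        score = math.log(n / total)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training data (invented for illustration).
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(training)

# Test on messages the model has not seen during training.
held_out = [
    ("claim your free prize", "spam"),
    ("agenda for the team meeting", "ham"),
]
accuracy = sum(classify(model, t) == y for t, y in held_out) / len(held_out)
print(accuracy)
```

Scoring on `held_out` rather than on `training` is the point of the train-then-test split: it measures how the system behaves on new messages, not how well it memorized the old ones.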

The core of machine learning deals with representation and generalization. Representation of data instances and functions evaluated on these instances are part of all machine learning systems. Generalization is the property that the system will perform well on unseen data instances; the conditions under which this can be guaranteed are a key object of study in the subfield of computational learning theory.

There is a wide variety of machine learning tasks and successful applications. Optical character recognition, in which printed characters are recognized automatically based on previous examples, is a classic example of machine learning.

265 questions
11
votes
4 answers

Netflix Data set

The Netflix Prize data set comes from one of the canonical big data competitions. It seems to have disappeared from the Internet. Is that the case, or is it still accessible somewhere?
Sycorax
  • 213
  • 1
  • 2
  • 6
6
votes
3 answers

List of iOS and/or Android apps

I am looking for as large a list of app names as possible. The app names can be iOS or Android; it does not really matter. The data is needed for a machine learning solution that requires a large number of training examples. I can manually harvest a couple…
5
votes
3 answers

Where can I find an automobile insurance claims data set?

I really need a dataset about automobile insurance claims to train and test learning algorithms. I found references to Massachusetts PIP claims data and to Spanish claims data in many scientific articles, but I couldn't find them... The best would be…
StackUser
  • 231
  • 1
  • 2
  • 7
5
votes
1 answer

Dataset suggestions for teaching data science in a for-profit setting

I'm developing an ebook for a publishing company on Data Science. I'm hunting for a dataset that would be appropriate for this. I've seen many tutorials use iris, but I don't want to - I want to use a larger dataset that allows the audience to have…
3
votes
0 answers

Gold standard dataset for entity recognition in email

I'm looking for an annotated dataset for named entity recognition and classification which I could use as a gold standard. Preferably I'm looking for an email dataset, something like Enron (though I couldn't actually find a version with named entity…
khal28
3
votes
1 answer

Where do I get data to train a program?

I am creating a learning program that should learn to answer a binary yes/no question given numeric information. For now, I used this data to train it: The problem is, I only have 569 records. That is not a small number, but I'd like more. Also, I should…
Aspie96
  • 31
  • 1
3
votes
0 answers

Which agencies and which part to get training data for passports, ID cards and licenses

I am interested in doing some research relating to computer vision and passports, ID cards and licenses. I have reached out to several agencies seeking data to use for training but keep running into dead ends. Who would you start with for…
rmhrisk
  • 131
  • 1
3
votes
2 answers

Looking for a malware detection dataset

I'm looking for a malware detection dataset. I have found some, but they have very few features from the PE headers, which is not helpful for detecting malware as APIs/resource .... It would be better if there were raw files so that I could extract the features…
0xDEADC0DE
  • 143
  • 4
2
votes
1 answer

Mental health diagnosis datasets?

What I am looking for is a dataset that has a number of independent variables, such as age, sex, smoker/non-smoker etc. for which I intend to carry out supervised learning for the dependent variable which would be a mental health diagnosis such as…
2
votes
0 answers

Regression datasets for benchmarking

Is there a collection of regression datasets for the purpose of benchmarking? For classification I found this paper (https://arxiv.org/abs/1708.03731) which has a collection of classification datasets (some of the UCI ML Repository). Any suggestions…
PhilippPro
  • 121
  • 2
2
votes
1 answer

How do I import a 2 GB CSV file into R and be able to work with it on my PC?

Importing data into the R programming environment. I've tried importing the data set with read.csv(), but it's telling me it cannot import a vector of more than 500 MB.
1
vote
1 answer

List of complex datasets for ML in the cloud comparison

I'm looking for complex datasets for comparing cloud machine learning solutions such as Amazon Machine Learning, Microsoft Azure, IBM Watson, Google Prediction API, Rapid Miner and some others. In particular I will try the…
Javierfdr
1
vote
0 answers

Heavy tailed dataset for heavy hitters problem

I'm looking for datasets for evaluating algorithms for finding top-k on data streams (e.g.). I currently have a network trace from CAIDA and some self-generated i.i.d. Zipf-distributed datasets. I'm looking for real-life data sets which are heavy…
R B
  • 111
  • 2
1
vote
1 answer

Where can I download a tagged dataset of text related to finance, programming, analytics etc

I want to create and train a model which classifies new text content into finance, programming, analytics, design etc. Where can I get enough data to train my models? TIA.
Abhishek
  • 111
  • 3
1
vote
0 answers

Trying to understand output of model.get_weights() in keras

To get a feeling of how neural nets work, I decided to train a super simple 2-1 net to add up its two inputs. Here's my code. import tensorflow as tf import numpy as np import math x_train = np.random.rand(10000,2) y_train =…
shebuesh
  • 11
  • 1