Most Popular

1500 questions
9 votes · 1 answer

How can I do simple machine learning without hard-coding behavior?

I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior? For example, if I wanted to "teach" a bot how to avoid randomly placed…
asked by Doorknob
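One standard answer to "no hard-coded behavior" is reinforcement learning, where the policy emerges from reward alone. A tiny Q-learning sketch on a 1-D track (toy problem, all parameters hypothetical):

```python
import random

random.seed(0)

# Tiny Q-learning sketch: states 0..4 on a line, goal at state 4.
# The move policy is learned from reward, not hard-coded.
n_states = 5
actions = [-1, +1]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != n_states - 1:
        if random.random() < eps:
            a = random.choice(actions)                     # explore
        else:
            a = max(actions, key=lambda act: Q[(s, act)])  # exploit
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0             # reward only at the goal
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy action in each non-terminal state after training.
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)
```

The same loop generalizes to a grid with randomly placed obstacles: only the state/reward definitions change, never the behavior itself.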
9 votes · 1 answer

Generate predictions that are orthogonal (uncorrelated) to a given variable

I have an X matrix, a y variable, and another variable ORTHO_VAR. I need to predict the y variable using X; however, the predictions from that model need to be orthogonal to ORTHO_VAR while being as correlated with y as possible. I would prefer…
asked by Chris
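One common approach is to fit the model as usual and then residualize the predictions against ORTHO_VAR, i.e. subtract the OLS projection of the predictions onto it. A minimal numpy sketch on synthetic data (variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ortho_var = rng.normal(size=n)                       # variable to be uncorrelated with
preds = 1.5 * ortho_var + 0.3 * rng.normal(size=n)   # raw model predictions

# Project out the ORTHO_VAR component: subtract the OLS fit of preds on ortho_var.
o = ortho_var - ortho_var.mean()
beta = (preds - preds.mean()) @ o / (o @ o)
preds_orth = preds - beta * o

print(np.corrcoef(preds_orth, ortho_var)[0, 1])  # ~0 up to floating-point error
```

By construction the residual is uncorrelated with ORTHO_VAR; correlation with y is then whatever the model retained after removing that component.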
9 votes · 5 answers

Tutorials on topic models and LDA

I would like to know if you have some good (fast and straightforward) tutorials about topic models and LDA that teach intuitively how to set the parameters, what they mean, and, if possible, include some real examples.
asked by pedrobisp
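For a hands-on complement to any tutorial, scikit-learn's LDA exposes the usual knobs directly: `n_components` is the number of topics, and `doc_topic_prior` / `topic_word_prior` are the Dirichlet hyperparameters (alpha, eta) the tutorials discuss. A toy sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four tiny documents from two obvious topics (toy data).
docs = [
    "cats dogs pets animals",
    "dogs pets cats",
    "stocks market trading",
    "market stocks finance",
]
X = CountVectorizer().fit_transform(docs)

# n_components = number of topics; priors left at their defaults here.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
theta = lda.transform(X)  # per-document topic mixture; each row sums to 1
print(theta.shape)
```

Inspecting `lda.components_` (topic-word weights) alongside `theta` is a quick way to build intuition for what the priors control.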
9 votes · 1 answer

Similarity measure based on multiple classes from a hierarchical taxonomy?

Could anyone recommend a good similarity measure for objects which have multiple classes, where each class is part of a hierarchy? For example, let's say the classes look like: 1 Produce 1.1 Eggs 1.1.1 Duck eggs 1.1.2 Chicken eggs 1.2…
asked by Dave Challis
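One family of answers is path-based taxonomy similarity. A Wu-Palmer-style measure adapted to dotted class codes (treating code length as depth and the shared prefix as the lowest common ancestor) can be sketched in a few lines — an illustration, not a recommendation of any specific library:

```python
def wu_palmer(code_a: str, code_b: str) -> float:
    """Wu-Palmer-style similarity on dotted taxonomy codes like '1.1.2':
    2 * depth(shared prefix) / (depth(a) + depth(b))."""
    a, b = code_a.split("."), code_b.split(".")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 2 * shared / (len(a) + len(b))

# Duck eggs vs chicken eggs share the 'Eggs' subtree, so they score higher
# than duck eggs vs the sibling branch '1.2'.
print(wu_palmer("1.1.1", "1.1.2"))
print(wu_palmer("1.1.1", "1.2"))
```

For objects carrying multiple classes, one common aggregation is the average of the best-matching class pairs in each direction.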
9 votes · 2 answers

In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into the one-hot encoding concept. Unlike in statistics, where you always want to drop the first level to have k−1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…
asked by Dan Chaltiel
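The short version usually given: drop a level for unregularized linear/statistical models (to avoid perfect collinearity with the intercept), keep all k for tree models and regularized ones, where the redundancy is harmless or even helpful. Both variants are one flag apart in pandas:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

full = pd.get_dummies(df["color"])                       # k columns
dropped = pd.get_dummies(df["color"], drop_first=True)   # k-1 columns (drops 'blue')

print(full.shape[1], dropped.shape[1])
```

scikit-learn's `OneHotEncoder(drop='first')` offers the same choice inside a pipeline.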
9 votes · 2 answers

Transform an Autoencoder to a Variational Autoencoder?

I would like to compare training an autoencoder (AE) and a variational autoencoder (VAE). I have already run the training using the AE. I would like to know if it's possible to transform this AE into a VAE while keeping the same inputs and outputs. Thank you.
asked by Kahina
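In principle yes: a VAE keeps the encoder/decoder and the same input/output shapes; only the bottleneck changes. The encoder emits (mu, log_var) instead of a code, the code is sampled via the reparameterization trick, and a KL term is added to the reconstruction loss. A minimal numpy sketch of just those two added pieces (not the asker's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps the sampling step differentiable w.r.t. mu, log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL(q(z|x) || N(0, I)), added to the AE's reconstruction loss.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu, log_var = np.zeros(4), np.zeros(4)
z = reparameterize(mu, log_var)
print(kl_divergence(mu, log_var))  # 0.0 when the posterior is exactly N(0, I)
```

The trained AE weights can often be reused to initialize the VAE's shared layers; only the new mean/log-variance heads start fresh.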
9 votes · 2 answers

predict gives the same output value for every image (Keras)

I am trying to classify images and assign them label 1 or 0 (skin cancer or not). I am aware of the three main issues that cause the same output for every input. I did not split the set and I'm just trying to apply the CNN on the train set,…
asked by Florian Laborde
9 votes · 6 answers

When to use mean vs median

I'm new to data science and stats, so this might seem like a beginner question. I'm working on a dataset of a user's Twitter follower gain per day. I want to measure the average growth over a period of time, which I did by finding the…
asked by Mukul Jain
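The usual rule of thumb: the mean summarizes total growth but is pulled around by outliers (e.g. one viral day), while the median reports the typical day. A two-line numpy illustration with made-up follower counts:

```python
import numpy as np

# Daily follower gains; one viral day dominates the mean.
gains = np.array([10, 12, 11, 9, 13, 500])

print(np.mean(gains))    # 92.5 — pulled up by the single outlier
print(np.median(gains))  # 11.5 — the typical day
```

If the goal is total growth over the period, the mean (times the number of days) is the right summary; if it is "what a normal day looks like", the median is.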
9 votes · 1 answer

BERT fine-tuning with additional features

I want to use BERT for an NLP task, but I also have additional features that I would like to include. From what I have seen, with fine-tuning one only changes the labels and retrains the classification layer. Is there a way to use pre-trained…
asked by Jeff
9 votes · 2 answers

How to implement hierarchical labeling classification?

I am currently working on the task of eCommerce product name classification, so I have categories and subcategories in the product data. I noticed that using subcategories as labels delivers worse results (84% accuracy) than using categories (94% accuracy). But…
asked by chacid
9 votes · 1 answer

What is GridSearchCV doing after it finishes evaluating the performance of parameter combinations that takes so long?

I'm running GridSearchCV to tune some parameters. For example: params = { 'max_depth':[18,21] } gscv = GridSearchCV( xgbc, params, scoring='roc_auc', verbose=50, cv=StratifiedKFold(n_splits=2,…
asked by Dan Scally
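The usual explanation is the refit step: with `refit=True` (the default), after every parameter combination has been scored, GridSearchCV retrains the best estimator once on the entire training set — for a heavy model that final fit can take as long as any single fold. A small sketch contrasting the two settings (toy data, hypothetical parameters):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
params = {"max_depth": [2, 3]}

# refit=True (default): a final fit on ALL of (X, y) happens after the CV loop —
# that is the extra work observed after the last fold is scored.
gscv = GridSearchCV(DecisionTreeClassifier(random_state=0), params,
                    cv=StratifiedKFold(n_splits=2), refit=True)
gscv.fit(X, y)

# refit=False: the search stops after scoring; no best_estimator_ is produced.
gscv_nofit = GridSearchCV(DecisionTreeClassifier(random_state=0), params,
                          cv=StratifiedKFold(n_splits=2), refit=False)
gscv_nofit.fit(X, y)
print(hasattr(gscv, "best_estimator_"), hasattr(gscv_nofit, "best_estimator_"))
```

Setting `refit=False` skips that final fit if only the CV scores are needed.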
9 votes · 4 answers

Suggest text classifier training datasets

Which freely available datasets can I use to train a text classifier? We are trying to enhance our users' engagement by recommending the most relevant content to them, so we thought if we classified our content based on a predefined bag of words we…
asked by Abdelmawla
9 votes · 4 answers

Loss Function for Probability Regression

I am trying to predict a probability with a neural network, but having trouble figuring out which loss function is best. Cross entropy was my first thought, but other resources always talk about it in the context of a binary classification problem…
asked by ahbutfore
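For what it's worth, binary cross-entropy generalizes directly to "soft" targets in [0, 1] — it is not restricted to hard 0/1 labels, and in expectation it is minimized exactly when the predicted probability equals the true one. A small numpy sketch:

```python
import numpy as np

def bce(p_true, p_pred, eps=1e-12):
    """Binary cross-entropy with soft targets in [0, 1]."""
    p_pred = np.clip(p_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))

p_true = np.array([0.1, 0.5, 0.9])

# Predicting the true probabilities achieves the minimum for these targets;
# any other prediction (e.g. a flat 0.5) scores strictly worse.
print(bce(p_true, p_true))
print(bce(p_true, np.full(3, 0.5)))
```

A sigmoid output with this loss is the usual setup; MSE on probabilities also works but penalizes tail errors less sharply.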
9 votes · 1 answer

Learning signal encoding

I have a large number of samples which represent Manchester encoded bit streams as audio signals. The frequency at which they are encoded is the primary frequency component when it is high, and there is a consistent amount of white noise in the…
asked by ragingSloth
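As background for readers unfamiliar with the scheme: Manchester coding maps each bit to a pair of half-bit levels, so decoding reduces to reading the mid-bit transition. A minimal sketch (assuming the IEEE 802.3 convention, where a rising edge encodes 1; the other common convention inverts this):

```python
def manchester_encode(bits):
    """0 -> high->low, 1 -> low->high (IEEE 802.3 convention assumed)."""
    out = []
    for b in bits:
        out.extend([0, 1] if b else [1, 0])
    return out

def manchester_decode(levels):
    """A rising edge mid-bit decodes to 1, a falling edge to 0."""
    return [1 if levels[i] < levels[i + 1] else 0
            for i in range(0, len(levels), 2)]

bits = [1, 0, 1, 1, 0]
print(manchester_decode(manchester_encode(bits)))
```

With noisy audio, the levels would first be recovered by thresholding/filtering the primary frequency component before this decode step applies.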
9 votes · 1 answer

Relational Data Mining without ILP

I have a huge dataset from a relational database which I need to create a classification model for. Normally for this situation I would use Inductive Logic Programming (ILP), but due to special circumstances I can't do that. The other way to tackle…
asked by user697110
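The standard alternative to ILP is propositionalization: flatten the one-to-many relations into per-entity aggregate features, then hand the single table to any ordinary classifier. A toy pandas sketch (tables and columns are made up):

```python
import pandas as pd

# Parent table (one row per entity) and a related child table (many rows per entity).
customers = pd.DataFrame({"cust_id": [1, 2], "age": [30, 40]})
orders = pd.DataFrame({"cust_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]})

# Propositionalization: aggregate the child rows into per-customer features.
agg = orders.groupby("cust_id")["amount"].agg(["count", "sum", "mean"]).reset_index()
flat = customers.merge(agg, on="cust_id", how="left")
print(flat)
```

Libraries such as featuretools automate this aggregation over deeper relational schemas; the resulting flat table works with any standard model.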