Most Popular

1500 questions
9 votes · 1 answer

How can I do simple machine learning without hard-coding behavior?

I've always been interested in machine learning, but I can't figure out one thing about starting out with a simple "Hello World" example - how can I avoid hard-coding behavior? For example, if I wanted to "teach" a bot how to avoid randomly placed…
asked by Doorknob
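One standard answer to "no hard-coded behavior" is reinforcement learning, where the policy emerges from reward alone. A tiny Q-learning sketch on a 1-D track (toy problem, all parameters hypothetical):

```python
import random

random.seed(0)

# Tiny Q-learning sketch: states 0..4 on a line, goal at state 4.
# The move policy is learned from reward, not hard-coded.
n_states = 5
actions = [-1, +1]
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != n_states - 1:
        if random.random() < eps:
            a = random.choice(actions)                     # explore
        else:
            a = max(actions, key=lambda act: Q[(s, act)])  # exploit
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == n_states - 1 else 0.0             # reward only at the goal
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# Greedy action in each non-terminal state after training.
policy = [max(actions, key=lambda act: Q[(s, act)]) for s in range(n_states - 1)]
print(policy)
```

The same loop generalizes to a grid with randomly placed obstacles: only the state/reward definitions change, never the behavior itself.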
9 votes · 1 answer

Generate predictions that are orthogonal (uncorrelated) to a given variable

I have an X matrix, a y variable, and another variable ORTHO_VAR. I need to predict the y variable using X; however, the predictions from that model need to be orthogonal to ORTHO_VAR while being as correlated with y as possible. I would prefer…
asked by Chris
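One common approach is to fit the model as usual and then residualize the predictions against ORTHO_VAR, i.e. subtract the OLS projection of the predictions onto it. A minimal numpy sketch on synthetic data (variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
ortho_var = rng.normal(size=n)                       # variable to be uncorrelated with
preds = 1.5 * ortho_var + 0.3 * rng.normal(size=n)   # raw model predictions

# Project out the ORTHO_VAR component: subtract the OLS fit of preds on ortho_var.
o = ortho_var - ortho_var.mean()
beta = (preds - preds.mean()) @ o / (o @ o)
preds_orth = preds - beta * o

print(np.corrcoef(preds_orth, ortho_var)[0, 1])  # ~0 up to floating-point error
```

By construction the residual is uncorrelated with ORTHO_VAR; correlation with y is then whatever the model retained after removing that component.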
9 votes · 5 answers

Tutorials on topic models and LDA

I would like to know if you have some good (fast and straightforward) tutorials about topic models and LDA that teach intuitively how to set the parameters, what they mean, and, if possible, include some real examples.
asked by pedrobisp
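For a hands-on complement to any tutorial, scikit-learn's LDA exposes the usual knobs directly: `n_components` is the number of topics, and `doc_topic_prior` / `topic_word_prior` are the Dirichlet hyperparameters (alpha, eta) the tutorials discuss. A toy sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Four tiny documents from two obvious topics (toy data).
docs = [
    "cats dogs pets animals",
    "dogs pets cats",
    "stocks market trading",
    "market stocks finance",
]
X = CountVectorizer().fit_transform(docs)

# n_components = number of topics; priors left at their defaults here.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
theta = lda.transform(X)  # per-document topic mixture; each row sums to 1
print(theta.shape)
```

Inspecting `lda.components_` (topic-word weights) alongside `theta` is a quick way to build intuition for what the priors control.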
9 votes · 1 answer

Similarity measure based on multiple classes from a hierarchical taxonomy?

Could anyone recommend a good similarity measure for objects which have multiple classes, where each class is part of a hierarchy? For example, let's say the classes look like: 1 Produce 1.1 Eggs 1.1.1 Duck eggs 1.1.2 Chicken eggs 1.2…
asked by Dave Challis
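One family of answers is path-based taxonomy similarity. A Wu-Palmer-style measure adapted to dotted class codes (treating code length as depth and the shared prefix as the lowest common ancestor) can be sketched in a few lines — an illustration, not a recommendation of any specific library:

```python
def wu_palmer(code_a: str, code_b: str) -> float:
    """Wu-Palmer-style similarity on dotted taxonomy codes like '1.1.2':
    2 * depth(shared prefix) / (depth(a) + depth(b))."""
    a, b = code_a.split("."), code_b.split(".")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return 2 * shared / (len(a) + len(b))

# Duck eggs vs chicken eggs share the 'Eggs' subtree, so they score higher
# than duck eggs vs the sibling branch '1.2'.
print(wu_palmer("1.1.1", "1.1.2"))
print(wu_palmer("1.1.1", "1.2"))
```

For objects carrying multiple classes, one common aggregation is the average of the best-matching class pairs in each direction.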
9 votes · 2 answers

In which cases shouldn't we drop the first level of categorical variables?

As a beginner in machine learning, I'm looking into the one-hot encoding concept. Unlike in statistics, where you always want to drop the first level to have k−1 dummies (as discussed here on SE), it seems that some models need to keep it and have k…
asked by Dan Chaltiel
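The short version usually given: drop a level for unregularized linear/statistical models (to avoid perfect collinearity with the intercept), keep all k for tree models and regularized ones, where the redundancy is harmless or even helpful. Both variants are one flag apart in pandas:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

full = pd.get_dummies(df["color"])                       # k columns
dropped = pd.get_dummies(df["color"], drop_first=True)   # k-1 columns (drops 'blue')

print(full.shape[1], dropped.shape[1])
```

scikit-learn's `OneHotEncoder(drop='first')` offers the same choice inside a pipeline.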
9 votes · 2 answers

Transform an Autoencoder to a Variational Autoencoder?

I would like to compare training an autoencoder (AE) and a variational autoencoder (VAE). I have already run the training using the AE. I would like to know if it's possible to transform this AE into a VAE while keeping the same inputs and outputs. Thank you.
asked by Kahina
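In principle yes: a VAE keeps the encoder/decoder and the same input/output shapes; only the bottleneck changes. The encoder emits (mu, log_var) instead of a code, the code is sampled via the reparameterization trick, and a KL term is added to the reconstruction loss. A minimal numpy sketch of just those two added pieces (not the asker's model):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # z = mu + sigma * eps keeps the sampling step differentiable w.r.t. mu, log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    # KL(q(z|x) || N(0, I)), added to the AE's reconstruction loss.
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

mu, log_var = np.zeros(4), np.zeros(4)
z = reparameterize(mu, log_var)
print(kl_divergence(mu, log_var))  # 0.0 when the posterior is exactly N(0, I)
```

The trained AE weights can often be reused to initialize the VAE's shared layers; only the new mean/log-variance heads start fresh.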
9 votes · 2 answers

predict gives the same output value for every image (Keras)

I am trying to classify images and assign them label 1 or 0 (skin cancer or not). I am aware of the three main issues that cause the same output for every input. I did not split the set and I'm just trying to apply the CNN on the train set,…
asked by Florian Laborde
9 votes · 6 answers

When to use mean vs median

I'm new to data science and stats, so this might seem like a beginner question. I'm working on a dataset of a user's Twitter follower gain per day. I want to measure the average growth over a period of time, which I did by finding the…
asked by Mukul Jain
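The usual rule of thumb: the mean summarizes total growth but is pulled around by outliers (e.g. one viral day), while the median reports the typical day. A two-line numpy illustration with made-up follower counts:

```python
import numpy as np

# Daily follower gains; one viral day dominates the mean.
gains = np.array([10, 12, 11, 9, 13, 500])

print(np.mean(gains))    # 92.5 — pulled up by the single outlier
print(np.median(gains))  # 11.5 — the typical day
```

If the goal is total growth over the period, the mean (times the number of days) is the right summary; if it is "what a normal day looks like", the median is.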
9 votes · 1 answer

BERT fine-tuning with additional features

I want to use BERT for an NLP task, but I also have additional features that I would like to include. From what I have seen, with fine-tuning one only changes the labels and retrains the classification layer. Is there a way to use pre-trained…
asked by Jeff
9 votes · 2 answers

How to implement hierarchical labeling classification?

I am currently working on the task of eCommerce product name classification, so I have categories and subcategories in the product data. I noticed that using subcategories as labels delivers worse results (84% accuracy) than using categories (94% accuracy). But…
asked by chacid
9 votes · 1 answer

What is GridSearchCV doing after it finishes evaluating the performance of parameter combinations that takes so long?

I'm running GridSearchCV to tune some parameters. For example: params = { 'max_depth':[18,21] } gscv = GridSearchCV( xgbc, params, scoring='roc_auc', verbose=50, cv=StratifiedKFold(n_splits=2,…
asked by Dan Scally
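The usual explanation is the refit step: with `refit=True` (the default), after every parameter combination has been scored, GridSearchCV retrains the best estimator once on the entire training set — for a heavy model that final fit can take as long as any single fold. A small sketch contrasting the two settings (toy data, hypothetical parameters):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
params = {"max_depth": [2, 3]}

# refit=True (default): a final fit on ALL of (X, y) happens after the CV loop —
# that is the extra work observed after the last fold is scored.
gscv = GridSearchCV(DecisionTreeClassifier(random_state=0), params,
                    cv=StratifiedKFold(n_splits=2), refit=True)
gscv.fit(X, y)

# refit=False: the search stops after scoring; no best_estimator_ is produced.
gscv_nofit = GridSearchCV(DecisionTreeClassifier(random_state=0), params,
                          cv=StratifiedKFold(n_splits=2), refit=False)
gscv_nofit.fit(X, y)
print(hasattr(gscv, "best_estimator_"), hasattr(gscv_nofit, "best_estimator_"))
```

Setting `refit=False` skips that final fit if only the CV scores are needed.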
9 votes · 4 answers

Suggest text classifier training datasets

Which freely available datasets can I use to train a text classifier? We are trying to enhance our users' engagement by recommending the most relevant content to them, so we thought if we classified our content based on a predefined bag of words we…
asked by Abdelmawla
9 votes · 4 answers

Loss Function for Probability Regression

I am trying to predict a probability with a neural network, but having trouble figuring out which loss function is best. Cross entropy was my first thought, but other resources always talk about it in the context of a binary classification problem…
asked by ahbutfore
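For what it's worth, binary cross-entropy generalizes directly to "soft" targets in [0, 1] — it is not restricted to hard 0/1 labels, and in expectation it is minimized exactly when the predicted probability equals the true one. A small numpy sketch:

```python
import numpy as np

def bce(p_true, p_pred, eps=1e-12):
    """Binary cross-entropy with soft targets in [0, 1]."""
    p_pred = np.clip(p_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(p_true * np.log(p_pred) + (1 - p_true) * np.log(1 - p_pred))

p_true = np.array([0.1, 0.5, 0.9])

# Predicting the true probabilities achieves the minimum for these targets;
# any other prediction (e.g. a flat 0.5) scores strictly worse.
print(bce(p_true, p_true))
print(bce(p_true, np.full(3, 0.5)))
```

A sigmoid output with this loss is the usual setup; MSE on probabilities also works but penalizes tail errors less sharply.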
9 votes · 1 answer

Learning signal encoding

I have a large number of samples which represent Manchester encoded bit streams as audio signals. The frequency at which they are encoded is the primary frequency component when it is high, and there is a consistent amount of white noise in the…
asked by ragingSloth
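As background for readers unfamiliar with the scheme: Manchester coding maps each bit to a pair of half-bit levels, so decoding reduces to reading the mid-bit transition. A minimal sketch (assuming the IEEE 802.3 convention, where a rising edge encodes 1; the other common convention inverts this):

```python
def manchester_encode(bits):
    """0 -> high->low, 1 -> low->high (IEEE 802.3 convention assumed)."""
    out = []
    for b in bits:
        out.extend([0, 1] if b else [1, 0])
    return out

def manchester_decode(levels):
    """A rising edge mid-bit decodes to 1, a falling edge to 0."""
    return [1 if levels[i] < levels[i + 1] else 0
            for i in range(0, len(levels), 2)]

bits = [1, 0, 1, 1, 0]
print(manchester_decode(manchester_encode(bits)))
```

With noisy audio, the levels would first be recovered by thresholding/filtering the primary frequency component before this decode step applies.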
9 votes · 1 answer

Relational Data Mining without ILP

I have a huge dataset from a relational database which I need to create a classification model for. Normally for this situation I would use Inductive Logic Programming (ILP), but due to special circumstances I can't do that. The other way to tackle…
asked by user697110
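The standard alternative to ILP is propositionalization: flatten the one-to-many relations into per-entity aggregate features, then hand the single table to any ordinary classifier. A toy pandas sketch (tables and columns are made up):

```python
import pandas as pd

# Parent table (one row per entity) and a related child table (many rows per entity).
customers = pd.DataFrame({"cust_id": [1, 2], "age": [30, 40]})
orders = pd.DataFrame({"cust_id": [1, 1, 2], "amount": [10.0, 20.0, 5.0]})

# Propositionalization: aggregate the child rows into per-customer features.
agg = orders.groupby("cust_id")["amount"].agg(["count", "sum", "mean"]).reset_index()
flat = customers.merge(agg, on="cust_id", how="left")
print(flat)
```

Libraries such as featuretools automate this aggregation over deeper relational schemas; the resulting flat table works with any standard model.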