Highest Voted Questions - Data Science Stack Exchange

9

votes

5 answers

Why 100% accuracy on test data is not good?

I was asked this question in an interview and wasn't able to give an answer that satisfied either the interviewer or myself. The question was exactly as stated above; he later gave an example, asking why, if my model predicted the…

asked Dec 30 '18 at 17:40

Rishabh Sharma

659
2
8
18

9

votes

1 answer

What is meant by Distributed for a gradient boosting library?

I am checking out the XGBoost documentation, which states that XGBoost is an optimized distributed gradient boosting library. What is meant by distributed?

asked Nov 15 '18 at 14:24

Tommaso Bendinelli

275
1
9

9

votes

3 answers

Can we remove features that have zero-correlation with the target/label?

I drew a pairplot/heatmap of the feature correlations of a dataset and see a set of features that have zero correlation both with every other feature and with the target/label. A reference code snippet in Python is below: corr =…

asked Nov 02 '18 at 08:48

karthiks

342
1
2
10

9

votes

0 answers

Why is my Keras model not learning image segmentation?

Edit: as it turns out, not even the model's initial creator could successfully fine-tune it. This is most likely a problem of implementation, or possibly related to the non-intuitive way in which the Keras batch normalization layer works. I'm trying…

asked Oct 30 '18 at 12:06

Matt

199
8

9

votes

1 answer

Validation loss is lower than the training loss

I am using an autoencoder for anomaly detection in warranty data. Architecture 1: The plot shows the training vs validation loss based on Architecture 1. As we see in the plot, the validation loss is lower than the training loss, which is totally weird.…

asked Oct 14 '18 at 13:16

Ashwini

235
1
2
7

9

votes

1 answer

How to make two parallel convolutional neural networks in Keras?

I created two convolutional neural networks (CNNs), and I want to make these networks work in parallel. Each network takes a different type of image, and they join at the last fully connected layer. How can I do this?

asked Oct 09 '18 at 06:31

N.IT

1,995
4
19
35

9

votes

1 answer

clipping the reward for adam optimizer in keras

I would like to clip the reward in Keras. I saw it is possible to clip the norm and clip the value in SGD as follows: sgd = optimizers.SGD(lr=0.01, clipnorm=1.) sgd = optimizers.SGD(lr=0.01, clipvalue=0.5) What are clipping the norm and clipping…

asked Oct 03 '18 at 20:07

user10296606

1,834
5
17
31
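
For readers of the question above, here is a minimal NumPy sketch of what "clipping the norm" and "clipping the value" of a gradient are generally understood to mean. It is an illustration only, not the Keras implementation; the helper names `clip_by_norm` and `clip_by_value` and the sample gradient are hypothetical, and only the thresholds 1.0 and 0.5 come from the snippet quoted in the question.

```python
import numpy as np


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction is kept)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad


def clip_by_value(grad, limit):
    """Clamp every component of grad into [-limit, limit] (direction may change)."""
    return np.clip(grad, -limit, limit)


# Hypothetical gradient with L2 norm 5.
grad = np.array([3.0, -4.0])

by_norm = clip_by_norm(grad, 1.0)    # same direction, norm scaled down to 1
by_value = clip_by_value(grad, 0.5)  # each component clamped independently

print(by_norm)
print(by_value)
```

Norm clipping shrinks the whole vector uniformly, while value clipping treats each component separately, so the two can produce different update directions for the same gradient.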

9

votes

2 answers

Python - Converting 3D numpy array to 2D

I have a 3D matrix like this: array([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]], [[ 8, 9], [10, 11]], [[12, 13], [14, 15]]]) and would like to stack them in a grid format, ending up with: array([[ 0, …

asked Oct 03 '18 at 15:14

Tarlan Ahad

271
2
5
15

9

votes

1 answer

keras' ModelCheckpoint not working

I'm trying to train a model in Keras and I'm using ModelCheckpoint to save the best model according to a monitored validation metric (in my case the Jaccard index). While I can see the model improving in TensorBoard, when I try to load the weights…

asked Sep 16 '18 at 18:41

ILM91

338
1
7

9

votes

1 answer

When does decision tree perform better than the neural network?

I was experimenting with different modelling methods, including KNN, Decision Trees, Neural Networks and SVM, and trying to fit my data to see which works best. To my surprise, the decision tree works best, with a training accuracy of 1.0 and…

asked Sep 16 '18 at 11:29

Suhail Gupta

601
8
15

9

votes

2 answers

Display Images (url) Inside Pandas Dataframe

I would like to display images (mostly jpg and png formats) directly from their url link inside a pandas dataframe. Imagine I already have the following dataframe: id image_url 1 …

asked Sep 11 '18 at 06:38

TwinPenguins

4,249
3
19
53

9

votes

2 answers

Is there any consensus on choosing an appropriate ML approach?

I am studying data science at the moment and we are taught a dizzying variety of basic regression/classification techniques (linear, logistic, trees, splines, ANN, SVM, MARS, and so on), along with a variety of extra tools (bootstrapping,…

asked Sep 09 '18 at 06:23

Brendan Hill

155
8

9

votes

3 answers

R random forest on Amazon ec2 Error: cannot allocate vector of size 5.4 Gb

I am training random forest models in R using randomForest() with 1000 trees and data frames with about 20 predictors and 600K rows. On my laptop everything works fine, but when I move to Amazon EC2 to run the same thing, I get the error: Error:…

asked Dec 19 '14 at 16:02

SOUser

9

votes

2 answers

Dealing with feature vectors of variable length

How does one deal with a feature vector that can vary in size? Let's say per object, I calculate 4 features. In order to solve a certain regression problem, I may have 1, 2, or more of these objects (no more than 10). Thus, the feature vector is…

asked Aug 21 '18 at 20:59

Otto Nahmee

91
1
4

9

votes

3 answers

Interactive Graphing while logging data

I'm looking to graph and interactively explore live/continuously measured data. There are quite a few options out there, with Plot.ly being the most user-friendly. Plot.ly has a fantastic and easy-to-use UI (easily scalable, pannable, easily…

asked Dec 17 '14 at 21:17

Clayton Pipkin

93
3
