Highest Voted Questions - Artificial Intelligence Stack Exchange

5

votes

1 answer

How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG?

In section 3 of the paper "Continuous control with deep reinforcement learning", the authors write: As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated…

asked Aug 21 '20 at 20:00

dani

51
1
3

5

votes

1 answer

Why is the mean used to compute the expectation in the GAN loss?

From Goodfellow et al. (2014), we have the adversarial loss: $$ \min_G \, \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)} \, [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} \, [\log (1 - D(G(z)))] \text{.} $$ In practice, the expectation is…

asked Aug 21 '20 at 05:01

A is for Ambition

153
4

5

votes

1 answer

Can you convert an MDP problem to a Contextual Multi-Arm Bandits problem?

I'm trying to get a better understanding of Multi-Arm Bandits, Contextual Multi-Arm Bandits and Markov Decision Processes. Basically, Multi-Arm Bandits is a special case of Contextual Multi-Arm Bandits where there is no state (features/context). And…

asked Aug 17 '20 at 03:17

peidaqi

151
2

5

votes

2 answers

Why are policy iteration and value iteration studied as separate algorithms?

In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms. This is very confusing because policy iteration includes an update/change of value and value iteration…

asked Aug 13 '20 at 13:31

User007

51
3

5

votes

2 answers

How can we prevent AGI from doing drugs?

I recently read some introductions to AI alignment, AIXI, and decision theory. As far as I understood, one of the main problems in AI alignment is how to define a utility function well, so that it does not cause something like the paperclip apocalypse. Then…

asked Aug 10 '20 at 05:26

user3584499

153
2

5

votes

1 answer

Why does TD Learning require Markovian domains?

One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…

asked Aug 07 '20 at 05:19

stoic-santiago

1,141
8
19

5

votes

1 answer

How can I find a specific word in an audio file?

I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio clip of 2-3 seconds, and the neural network must determine whether the input audio (the voice of a person)…

asked Aug 03 '20 at 09:28

Ali.kavari76

111
6

5

votes

1 answer

What is eager learning and lazy learning?

What is the difference between eager learning and lazy learning? How does eager learning or lazy learning help me build a neural network system? And how can I use it for any target function?

asked Jul 30 '20 at 13:31

mogoja

73
5

5

votes

1 answer

Why do DQNs tend to forget?

Why do DQNs tend to forget? Is it because when you feed highly correlated samples, your model (function approximator) doesn't give a general solution? For example: if I use level 1 experiences, my model $p$ is fitted to learn how to play that…

asked Jul 27 '20 at 11:51

Chukwudi

369
2
7

5

votes

2 answers

Could an AI be sentient?

In theory, could an AI become sentient, as in learning and becoming self-aware, all from its source code?

asked Nov 04 '16 at 12:22

MountainSide Studios

353
1
9

5

votes

3 answers

Why is symbolic AI not as popular as ANNs, yet used by IBM's Deep Blue?

Everybody is implementing and using DNNs with, for example, TensorFlow or PyTorch. I thought IBM's Deep Blue was an ANN-based AI system, but this article says that IBM's Deep Blue was symbolic AI. Are there any special features in symbolic AI that…

asked Jul 20 '20 at 07:07

Dan D.

1,283
1
11
38

5

votes

1 answer

Why do we need a target network in deep Q-learning?

I already know deep RL, but to learn it deeply I want to know why we need 2 networks in deep RL. What does the target network do? I know there is a lot of mathematics behind this, but I want to know deep Q-learning deeply, because I am about to make…

asked Jul 15 '20 at 17:03

dato nefaridze

862
8
20

5

votes

1 answer

What is a "closed expression" in the context of logic?

I was reading about logic systems and the following phrase appeared: "any closed expression that is not derivable inside the same system". What is a "closed expression" in this context? What does "closed expression that is not derivable" mean?

asked Nov 01 '16 at 17:48

Ale

153
3
11

5

votes

2 answers

What is a trap function in the context of a genetic algorithm?

What is a trap function in the context of a genetic algorithm? How is it related to the concepts of local and global optima?

asked Oct 31 '16 at 21:09

mountaincloud

63
7

5

votes

1 answer

Which paper introduced the term "softmax"?

Nowadays, the softmax function is widely used in deep learning and, specifically, classification with neural networks. However, the origins of this term and function are almost never mentioned anywhere. So, which paper introduced this term?

asked Jul 10 '20 at 00:37

nbro

40,454
12
105
192
