Most Popular
1500 questions
5
votes
1 answer
How does the Ornstein-Uhlenbeck process work, and how is it used in DDPG?
In section 3 of the paper Continuous control with deep reinforcement learning, the authors write
As detailed in the supplementary materials we used an Ornstein-Uhlenbeck process (Uhlenbeck & Ornstein, 1930) to generate temporally correlated…
dani
5
votes
1 answer
Why is the mean used to compute the expectation in the GAN loss?
From Goodfellow et al. (2014), we have the adversarial loss:
$$ \min_G \, \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)} \, [\log D(x)] + \mathbb{E}_{z \sim p_z(z)} \, [\log (1 - D(G(z)))] \text{.} $$
In practice, the expectation is…
A is for Ambition
5
votes
1 answer
Can you convert an MDP problem to a Contextual Multi-Arm Bandits problem?
I'm trying to get a better understanding of Multi-Arm Bandits, Contextual Multi-Arm Bandits and Markov Decision Processes.
Basically, Multi-Arm Bandits is a special case of Contextual Multi-Arm Bandits where there is no state (features/context). And…
peidaqi
5
votes
2 answers
Why are policy iteration and value iteration studied as separate algorithms?
In Sutton and Barto's book about reinforcement learning, policy iteration and value iteration are presented as separate/different algorithms.
This is very confusing because policy iteration includes an update/change of value and value iteration…
User007
5
votes
2 answers
How can we prevent AGI from doing drugs?
I recently read some introductions to AI alignment, AIXI, and decision theory.
As far as I understood, one of the main problems in AI alignment is how to define a utility function well, so that it does not cause something like the paperclip apocalypse.
Then…
user3584499
5
votes
1 answer
Why does TD Learning require Markovian domains?
One of my friends and I were discussing the differences between Dynamic Programming, Monte-Carlo, and Temporal Difference (TD) Learning as policy evaluation methods - and we agreed on the fact that Dynamic Programming requires the Markov assumption…
stoic-santiago
5
votes
1 answer
How can I find a specific word in an audio file?
I'm trying to train and use a neural network to detect a specific word in an audio file. The input of the neural network is an audio of 2-3 seconds duration, and the neural network must determine whether the input audio (the voice of a person)…
Ali.kavari76
5
votes
1 answer
What is eager learning and lazy learning?
What is the difference between eager learning and lazy learning?
How does eager learning or lazy learning help me build a neural network system? And how can I use it for any target function?
mogoja
5
votes
1 answer
Why do DQNs tend to forget?
Why do DQNs tend to forget? Is it because when you feed highly correlated samples, your model (function approximation) doesn't give a general solution?
For example:
I use level 1 experiences, my model $p$ is fitted to learn how to play that…
Chukwudi
5
votes
2 answers
Could an AI be sentient?
In theory, could an AI become sentient, as in learning and becoming self-aware, all from its source code?
MountainSide Studios
5
votes
3 answers
Why is symbolic AI not as popular as ANNs, yet used by IBM's Deep Blue?
Everybody is implementing and using DNNs with, for example, TensorFlow or PyTorch.
I thought IBM's Deep Blue was an ANN-based AI system, but this article says that IBM's Deep Blue was symbolic AI.
Are there any special features in symbolic AI that…
Dan D.
5
votes
1 answer
Why do we need a target network in deep Q-learning?
I already know deep RL, but to understand it more deeply I want to know why we need 2 networks in deep RL. What does the target network do? I know there is a lot of mathematics behind this, but I want to understand deep Q-learning deeply, because I am about to make…
dato nefaridze
5
votes
1 answer
What is a "closed expression" in the context of logic?
I was reading about logic systems and the following phrase appeared.
any closed expression that is not derivable inside the same system
What is a "closed expression" in this context? What does "closed expression that is not derivable" mean?
Ale
5
votes
2 answers
What is a trap function in the context of a genetic algorithm?
What is a trap function in the context of a genetic algorithm? How is it related to the concepts of local and global optima?
mountaincloud
5
votes
1 answer
Which paper introduced the term "softmax"?
Nowadays, the softmax function is widely used in deep learning and, specifically, classification with neural networks. However, the origins of this term and function are almost never mentioned anywhere. So, which paper introduced this term?
nbro