I wish to reproduce the following diagram in TikZ (source: MATLAB):

So far, I have managed to draw the main nodes, but I am struggling to add the arrows. Below is my MWE, which I put together after going through the solution at Tikz flow chart coloring.
\documentclass[tikz,multi,border=10pt]{standalone}
\usetikzlibrary{shadows,arrows.meta,positioning,backgrounds,fit,chains,scopes}
% Define block styles
\tikzset{%
materia/.style={draw, fill=blue!20, text width=6.0em, text centered, minimum height=1.5em,drop shadow},
etape/.style={materia, text width=8em, minimum width=10em, minimum height=3em, rounded corners, drop shadow},
linepart/.style={draw, thick, color=black!50, -LaTeX},
line/.style={draw, thick, color=black!50, -LaTeX},
ur/.style={draw, text centered, minimum height=0.01em},
back group/.style={fill=yellow!20,rounded corners, draw=black!50, thick, inner xsep=15pt, inner ysep=10pt},
}
% \transreceptor{<from>}{<label>}{<to>}: labelled arrow from <from>.east to <to>
\newcommand{\transreceptor}[3]{%
\path [linepart] (#1.east) -- node [above] {\scriptsize #2} (#3);}
\begin{document}
\begin{tikzpicture}
[
start chain=p going above,
every on chain/.append style={etape},
every join/.append style={line},
node distance=1 and -.25,
]
{
\node [on chain] {\textbf{ENVIRONMENT}};
\node [on chain, join] {REINFORCEMENT LEARNING ALGORITHM};
\node [on chain, join] {POLICY};
}
\begin{scope}[on background layer]
\node (bk1) [back group] [fit=(p-2) (p-3)] {};
\end{scope}
\path (bk1.east)+(+6.0,0) node (ur1)[ur] {};
\transreceptor{bk1}{Action $A_t$}{ur1}% the macro already ends the path with ;
\end{tikzpicture}
\end{document}

