
[Image: Figure 1 from Attention Is All You Need] [Image: Figure 2 from Attention Is All You Need]

I quite like the styling of these figures, and I've been working with TikZ a bit to create diagrams. I'd like to learn how to create such nice diagrams, if possible, with TikZ.

These two figures come from a popular paper, Attention Is All You Need, by researchers at Google Brain and Google Research: https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

ctesta01
  • Welcome to TeX.SX! For such questions, it is always helpful (and de facto required) to provide a minimal working example (MWE) that shows what you have tried so far. As for the styling: I think the rounded corners option will help you a lot here. Also, maybe have a look at the chains library provided by TikZ for easier positioning of the nodes. For the larger gray boxes, the fit and backgrounds libraries might be helpful. – Jasper Habicht Jun 08 '23 at 22:43
  • Welcome. There's a lot to break down. First image: the chains library for placing and connecting nodes in a, well, chain. For the few extra connections, the symbol beside the circled + can be done with a path picture (ex). The gray boxes behind a selection of nodes can be done with the backgrounds and fit libraries. The second image is interesting. Usually, the shadows library can help with producing these triple nodes; however, connecting lines to them won't be easy. A bit of manual work is probably necessary here, especially for the h brace on the right side. – Qrrbrbirlbel Jun 08 '23 at 22:43
  • This site contains dozens if not hundreds of various flowcharts. The Internet has also examples. – Qrrbrbirlbel Jun 08 '23 at 22:46
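As a rough illustration of the suggestions in the comments above, here is a minimal sketch that combines rounded corners with the chains, fit and backgrounds libraries. The node names, the block style and the colors are assumptions made for this example; they are not taken from the paper or from the comments.

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{arrows.meta, backgrounds, chains, fit}
\begin{document}
\begin{tikzpicture}[
  thick, node distance=5mm,
  % "block" is a hypothetical style name used only in this sketch
  block/.style={draw, rounded corners, fill=#1!20, minimum width=3cm},
  start chain=going above, every join/.style={-Stealth}]
  % "on chain" stacks the nodes upwards, "join" connects each to its predecessor
  \node[on chain, join]               (in)   {Inputs};
  \node[on chain, join, block=orange] (att)  {Multi-Head Attention};
  \node[on chain, join, block=yellow] (norm) {Add \& Norm};
  \node[on chain, join, block=cyan]   (ff)   {Feed Forward};
  \node[on chain, join]               (out)  {Outputs};
  % the gray box is fitted around three nodes and drawn behind them
  \begin{scope}[on background layer]
    \node[fit=(att)(ff), draw, rounded corners, fill=gray!15, inner sep=3mm] {};
  \end{scope}
\end{tikzpicture}
\end{document}
```

The fit node computes its size from the listed nodes, so the box follows automatically if the blocks are resized or more nodes are added to the list.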

1 Answer


I like the problem of the second diagram …

The shadows library already provides a way to shadow a node like this, but it does so in a way that makes it hard to reference the shadows when connecting edges to them. (It's not impossible, though, with some careful calculations.)
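For comparison, a minimal sketch of the shadows-library approach, assuming the same shift values as the defaults in the code further down. It uses the library's double copy shadow key; the node name and the colors are illustrative only.

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{shadows}
\begin{document}
\begin{tikzpicture}[thick]
  \node[draw, rounded corners, fill=green!20,
        double copy shadow={shadow xshift=1.5ex, shadow yshift=1ex}]
    (Linear) {Linear};
  % Only (Linear) is a named node. The two copies behind it are drawn
  % together with it and have no names or anchors of their own,
  % so an edge can only target the front node directly.
  \draw[->] (0, -1) -- (Linear);
\end{tikzpicture}
\end{document}
```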

The only vertical first key of the ext.paths.ortho library is used (via the shortcut |*) to make the strictly vertical connections between nodes.
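A small sketch of that shortcut in isolation, assuming the tikz-ext package is installed. It mirrors how the code below uses ortho/install shortcuts and edge[|*]; the node names and positions are made up for the example.

```latex
\documentclass[tikz]{standalone}
\usetikzlibrary{ext.paths.ortho} % part of the tikz-ext package
\begin{document}
\begin{tikzpicture}[thick]
  \node[draw]                    (a) at (0, 0) {a};
  \node[draw, minimum width=3cm] (b) at (1, 2) {wide node};
  \path[->, ortho/install shortcuts]
    (a) edge[|*] (b)            % strictly vertical, straight up from (a)
    (a) edge[red, dashed] (b);  % ordinary edge, shown only for comparison
\end{tikzpicture}
\end{document}
```

The ordinary edge aims at the center of the wide node and therefore runs diagonally, while the |* edge stays vertical.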

The phantom node key makes sure that the shadowed nodes have no text while they still appear the same size as if they had text.
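To see the effect of that key on its own, here is a minimal sketch. The key definition is copied from the code below and relies on internal PGF macros; the third node uses \phantom as a documented alternative for simple text.

```latex
\documentclass[tikz]{standalone}
\makeatletter
% copied from the full code below; uses internal, undocumented macros
\tikzset{phantom node/.code=\tikz@addoption{\expandafter\let\csname pgf@sh@boxes@\tikz@shape\endcsname\pgfutil@empty}}
\makeatother
\begin{document}
\begin{tikzpicture}[every node/.style={draw, rounded corners}]
  \node               at (0, 0) {Linear};           % visible text
  \node[phantom node] at (2, 0) {Linear};           % same size, text not placed
  \node               at (4, 0) {\phantom{Linear}}; % same size via \phantom
\end{tikzpicture}
\end{document}
```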


I don't like either version of the brace on the right side:

The left one is using the brace decoration, the right one is using Hooks arrows.

[Images: brace decoration (left), Hooks arrow tips (right)]

Both are commented out in the code below.


This answer doesn't provide any information on how to place nodes in relation to each other. Plenty of resources on that are available on this site.

To make the Attention node as wide as the three Linear nodes, the ext.positioning-plus library's above = of -(Linear-1)(Linear-3) can be used.
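A sketch of how that could look, assuming the tikz-ext package is installed and that the - prefix makes the new node span the horizontal extent of the listed nodes. This is not what the code below does (it places the Attention pic above Linear-2), and the coordinates here are made up for the example.

```latex
\documentclass[tikz]{standalone}
% fit and positioning are loaded explicitly here to be on the safe side
\usetikzlibrary{fit, positioning, ext.positioning-plus}
\begin{document}
\begin{tikzpicture}[thick, node distance=7.5mm]
  \node[draw] (Linear-1) at (0, 0) {Linear};
  \node[draw] (Linear-2) at (2, 0) {Linear};
  \node[draw] (Linear-3) at (4, 0) {Linear};
  % placed above the group, with its width taken from (Linear-1)...(Linear-3)
  \node[draw, above = of -(Linear-1)(Linear-3)] (Attention) {Attention};
\end{tikzpicture}
\end{document}
```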

I'm using the layers that each path provides (behind path and in front of path) to place some edges behind nodes that were placed earlier in the code; see the comments in the code.
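The layering idea can be tried out in a reduced sketch like the following, which mirrors the way the code below switches to behind path in the middle of a path. The node names and the n style are assumptions made for this example.

```latex
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}[thick, n/.style={draw, fill=white}]
  \path node[n] (a) at (0, 0) {A}
        node[n] (b) at (3, 0) {B}
        % this edge comes later in the code, but "behind path" puts it
        % on the layer below the nodes, so A and B cover its ends
        [behind path] (a.center) edge[red] (b.center);
\end{tikzpicture}
\end{document}
```

Without behind path, the edge would be painted after both nodes and would run visibly across them from center to center.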

Code

\documentclass[tikz]{standalone}
\usetikzlibrary{arrows.meta, calc, decorations.pathreplacing,
                ext.paths.ortho, positioning, quotes}
\makeatletter
\tikzset{phantom node/.code=\tikz@addoption{\expandafter\let\csname pgf@sh@boxes@\tikz@shape\endcsname\pgfutil@empty}}
\makeatother
\tikzset{
  shadowed node xshift/.initial=1.5ex, shadowed node yshift/.initial=1ex, shadowed node list/.initial={2, 1},
  pics/shadowed node/.default=\pgfkeysvalueof{/tikz/shadowed node list},
  shadowed node/.pic={
    \foreach[expand list] \elem in {#1}
      \scoped[transparency group, shadowed node calculation={\elem}]
        \node[style/.expand once=\tikzpictextoptions, phantom node,
              xshift={\elem*\pgfkeysvalueof{/tikz/shadowed node xshift}},
              yshift={\elem*\pgfkeysvalueof{/tikz/shadowed node yshift}}] (-\elem) {\tikzpictext};
    \node[alias=-0, style/.expand once=\tikzpictextoptions] () {\tikzpictext};},
  set shadowed node calculation parameter/.style={shadowed node calculation/.style={opacity={(#1-##1+1)/(#1+1)}}},
  set shadowed node calculation parameter=2,
  overshoot line to/.style={to path={($(\tikztostart)!-(#1)!(\tikztotarget)$)--($(\tikztotarget)!-(#1)!(\tikztostart)$)\tikztonodes}},
  edges have transparency group/.style={execute at begin to={\scope[transparency group,#1]}, execute at end to=\endscope}}
\begin{document}
\sffamily
\begin{tikzpicture}[
  thick, x=2cm, node distance=7.5mm,
  n/.style={rounded corners, draw, fill={#1!20}},
  > = {Stealth[round, sep]}]
% By default, all nodes/pics/edges are placed "in front of path"
% here, the actual path is empty (edges create their own path)
\node foreach[count=\i] \t in {V, K, Q} (VKQ-\i) at (\i, 0) {\t}
  pic foreach \i in {1, 2, 3}
    ["Linear" n=green, above=of VKQ-\i] (Linear-\i) {shadowed node}
  pic["Scaled Dot-Product Attention" n=blue, above = of Linear-2]
    (Attention) {shadowed node}
  node[above = of Attention, n = yellow] (Concat) {Concat}
  node[above = of Concat,    n = green]  (Linear) {Linear}
  node[above = of Linear]                (MHAtt)  {Multi-Head Attention}
%
  [->, ortho/install shortcuts]
  foreach \i in {1, 2, 3}{
    (VKQ-\i) edge coordinate [pos=.2] (@) (Linear-\i)
    (Linear-\i) edge[|*] (Attention)
    foreach \j in {2, 1}{
      [
        edges have transparency group={shadowed node calculation=\j},
        behind path % these edges should be behind the other nodes/pics
      ]
      (@) edge[out=90, in=-90] (Linear-\i-\j)
      [in front of path] % split the following path midway (→ @@)
      (Linear-\i-\j) edge[path only, |*] coordinate (@@) (Attention-\j)
                     edge[-] (@@)                % draw the first half on top
      [behind path] (@@) edge[|*] (Attention-\j) % and the second behind
    }
  }
  foreach \i in {0, 1, 2}{
    [edges have transparency group={shadowed node calculation=\i}]
    (Attention-\i) edge[|*] (Concat)
  }
  (Concat) edge (Linear)
  (Linear) edge (MHAtt);
%\draw[decorate, decoration = {name = brace, amplitude = 2mm}]
%  (Attention-2.east) to[overshoot line to=1mm] (Attention.east);
%\path[every pin edge/.style={black, thin}] coordinate [pin=right:h] ()
%  at ($(Attention-2.east)!.5!(Attention.east)!2mm+.5\pgflinewidth!90:(Attention.east)$);
%\draw[
%  arrows={[arc=135]}, arrows={Hooks[left]-Hooks[right]},
%  s/.style={shift={(.75mm,-1mm)}}]
%  ([s]Attention-2.east) to[overshoot line to=2mm] coordinate(@) ([s]Attention.east);
%\path[every pin edge/.style={black, thin}] node also[pin=right:h](@);
\end{tikzpicture}
\end{document}

Output

[Image: the rendered Multi-Head Attention diagram]

Qrrbrbirlbel
  • Thanks for the link. Is this better than \myphantom? I don't exactly know what it's doing. – cfr Aug 31 '23 at 12:58
  • @cfr The \pgf@sh@boxes@<shape name> macro contains a list of the TeX boxes the shape uses. In most cases this is just text (which stands for the \pgfnodeparttextbox); the coordinate shape has it empty. By temporarily setting this to empty, the box will not be placed in the picture, but the shape's definition will still consider its size when determining the node's measurements. It's basically a key version of \phantom but should work for all kinds of content. (I'm not sure how \phantom will interact with multiline nodes.) And well, the user doesn't have to use \myphantom. – Qrrbrbirlbel Aug 31 '23 at 13:11
  • How safe is it to use pgf@sh@boxes@\tikzshape compared with \phantom? I guess my inclination is to use something documented when possible. – cfr Aug 31 '23 at 13:11
  • \phantom works OK with multiline nodes as far as I can tell. And the user doesn't have to use \myphantom in my real case. That's just a product of turning it into an MWE. – cfr Aug 31 '23 at 13:13
  • Store the content of the boxes list in your own macro to be able to unphantom it, and it shouldn't break anything. The node path operation will construct the box and typeset its content as usual. The same already happens when you do node[shape=coordinate] {stuff}. That stuff will be typeset inside a box but will not be put onto the page. (And internally, the coordinate path operation will do exactly that, but with stuff being empty.) – Qrrbrbirlbel Aug 31 '23 at 13:14
  • I was thinking more whether the macro might disappear in a future release of tikz. I figure \phantom is likely here to stay. I'm not concerned about losing content in this case as the placeholder is getting dummy content anyway. I know the content of the placeholder. What I don't know is the style. (The placeholders are not a very accurate method of placement, but they are sufficient to be helpful.) – cfr Aug 31 '23 at 13:19
  • My actual placeholders have e.g. \foreach \i [count=\ino] in {2,...,\chronos@uchod} \node (u\i) [anchor=south west, alias=level \i] at (u\ino.north west) {\phantom{Enw}u\i{} \textbar{} level \i\\\phantom{1234}}; – cfr Aug 31 '23 at 14:53