I'm trying to parse an SGF file (a plain-text format) into a TikZ diagram, something like this (ideally, I would point to the SGF file itself instead of inlining a string):
\parsesgf{(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2024-02-05];B[as];W[bs];B[cs])}
Under the hood, this would become the following (I've added some boilerplate so that we have a minimal, complete example):
\documentclass{article}
\usepackage{tikz}
\newlength{\step}
\begin{document}
\begin{tikzpicture}
\setlength{\step}{\dimexpr 10cm / 18 \relax}
\draw[step=\step] (0, 0) grid (10, 10);
\draw[draw = white, fill = black, line width = 0.1mm]
(0 * 10cm / 18, 0 * 10cm / 18)
circle [radius = 0.2575cm]
node[color = white] {1};
\draw[draw = black, fill = white, line width = 0.1mm]
(1 * 10cm / 18, 0 * 10cm / 18)
circle [radius = 0.2575cm]
node[color = black] {2};
\draw[draw = white, fill = black, line width = 0.1mm]
(2 * 10cm / 18, 0 * 10cm / 18)
circle [radius = 0.2575cm]
node[color = white] {3};
\end{tikzpicture}
\end{document}
I don't really know the best way to do this. Is there a good package for this kind of thing? I suppose regexes would do, but is there a better way? I'll update this question as I try more things.
In this answer, somebody helped me with a PCRE-complete way of parsing SGFs (in PHP), but I don't know to what extent that can be done in TeX. At any rate, maybe the key-value part of that recipe is useful:
/(?<key>[A-Z]+)\[(?<value>(?:\\\]|[^\]])*)\]/gy
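Here is a minimal sketch of how the key-value part of that regex might carry over to TeX, assuming a LaTeX release recent enough to have expl3 and l3regex in the kernel. As far as I know, l3regex has neither named groups nor the sticky flag, so plain capture groups are used instead. The names \listSgfProperties and \l_sgfdemo_... are hypothetical and exist only for this illustration.

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical names, used only in this sketch.
\seq_new:N \l_sgfdemo_matches_seq
\tl_new:N  \l_sgfdemo_key_tl
\tl_new:N  \l_sgfdemo_value_tl
\NewDocumentCommand{\listSgfProperties}{ m }
  {
    % Every match contributes three items to the sequence:
    % the full match, the key (group 1) and the value (group 2).
    \regex_extract_all:nnN
      { ([A-Z]+)\[((?:\\\]|[^\]])*)\] }
      {#1}
      \l_sgfdemo_matches_seq
    % Consume the sequence three items at a time.
    \bool_until_do:nn { \seq_if_empty_p:N \l_sgfdemo_matches_seq }
      {
        \seq_pop_left:NN \l_sgfdemo_matches_seq \l_sgfdemo_value_tl % full match, overwritten below
        \seq_pop_left:NN \l_sgfdemo_matches_seq \l_sgfdemo_key_tl
        \seq_pop_left:NN \l_sgfdemo_matches_seq \l_sgfdemo_value_tl
        \texttt{ \l_sgfdemo_key_tl } ~ = ~ \texttt{ \l_sgfdemo_value_tl } \par
      }
  }
\ExplSyntaxOff
\begin{document}
\listSgfProperties{(;GM[1]FF[4]SZ[19];B[as];W[bs];B[cs])}
\end{document}
```

This only prints one key = value line per property. It ignores the node and branch structure, and it picks up only the first bracketed value after each key.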
Parsing SGFs is something I would like to include in a package I'm trying to build here. Hopefully, you can find some useful macros for building Go diagrams there.
SGF is the representation of a tree structure in text, but, for simplicity, at first, I think I would like to only parse the main branch, in files with only one branch.
(There are some SGF parsers in JS available out there — including mine and Sabaki's — maybe they're useful references, or maybe they could be invoked by TeX somehow?)
In Go, since the board is much bigger than in chess, we typically record games with visual editors (or paper kifus) rather than by typing coordinates; two of the best editors are CGoban and Sabaki. The standard file format is the Smart Game Format (SGF), which also supports other games and is plain text anyway.
Here's an example of an SGF file:
(;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]
RU[Japanese]SZ[19]KM[6.50]
DT[2023-12-25]
;B[pd]
;W[dd]
;B[dp]
;W[pp]
;AW[ji]AB[jj]PL[B]
;B[jq]CR[pp]LB[dd:A][jd:C][pd:B]TR[jj]SQ[ji]MA[dp])
- () denote branches, and data is within []
- The label before [] denotes the type of the data
- ; are node delimiters
- B and W are Black and White moves, respectively
- AB and AW are added or edited stones
- LB is a label on top of a coordinate; :A is the A label
- CR is a circle label
- TR is a triangle label
- SQ is a square label
- MA is a cross (X) label
The whole grammar can be defined as:
Collection = { GameTree }
GameTree = "(" RootNode NodeSequence { Tail } ")"
Tail = "(" NodeSequence { Tail } ")"
NodeSequence = { Node }
RootNode = Node
Node = ";" { Property }
Property = PropIdent PropValue { PropValue }
PropIdent = UcLetter { UcLetter }
PropValue = "[" Value "]"
UcLetter = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
"J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
"S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"
Edit after @StevenB.Segletes' Answers
Using @StevenB.Segletes' listofitems package and following his very helpful answers, especially this one, I've been able to create the following MWE:
\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{shapes.geometric}
\usepackage{listofitems}
%-----------------------------------------------------------
% Drawing Stones
% From this answer by @DavidCarlisle.
\newcommand\notwhite{black}
\newcommand\notblack{white}
% From this answer by @DavidCarlisle.
\ExplSyntaxOn
\cs_generate_variant:Nn \int_from_alph:n {e}
\NewExpandableDocumentCommand{\stringToCoordX}{ m }{
\int_from_alph:e { \use_i:nn #1 }
}
\NewExpandableDocumentCommand{\stringToCoordY}{ m }{
\boardSize + 1 - ~\int_from_alph:e { \use_ii:nn #1 }
}
\ExplSyntaxOff
\newcommand{\setCoords}[1]{
\pgfmathsetmacro{\x}{\stringToCoordX{#1} - 1}
\pgfmathsetmacro{\y}{\stringToCoordY{#1} - 1}
}
\newcommand{\drawStoneFromSgfCoords}[2]{%
\setCoords{#2}
\draw[draw = \UseName{not#1}, fill = #1, line width = 0.1mm]
(\x * \step, \y * \step)
circle [radius = 0.2575cm];
}
\newcommand{\drawMoveFromSgfCoords}[2]{
\drawStoneFromSgfCoords{#1}{#2}
\textLabel{#1}{#2}{\themoveCounter}
\stepMoveCounter
}
\newcounter{moveCounter}
\setcounter{moveCounter}{1}
\newcommand{\stepMoveCounter}{
\stepcounter{moveCounter}
}
%-----------------------------------------------------------
% Labels
\newcommand{\textLabel}[3]{
\setCoords{#2}
\draw (\x * \step, \y * \step)
node[color = \UseName{not#1}] {#3};
}
\newcommand{\crossLabel}[2]{
\setCoords{#2}
\draw (\x * \step, \y * \step)
node[color = \UseName{not#1}] {X};
}
\newcommand{\triangleLabel}[2]{
\setCoords{#2}
\draw (\x * \step, \y * \step)
node[
isosceles triangle,
draw = #1,
line width = 0.5mm,
fill = \UseName{not#1},
minimum height = \step * 10,
minimum width = \step * 10,
rotate = 90,
isosceles triangle apex angle = 60,
inner sep = 0pt,
] {};
}
\newcommand{\squareLabel}[2]{
\setCoords{#2}
\draw (\x * \step, \y * \step)
node[
draw = #1,
line width = 0.5mm,
fill = \UseName{not#1},
minimum size = \step * 10,
inner sep = 0pt,
] {};
}
\newcommand{\circleLabel}[2]{
\setCoords{#2}
\draw[
draw = #1,
line width = 0.5mm,
fill = \UseName{not#1},
inner sep = 0pt,
] (\x * \step, \y * \step)
circle[radius = \step / 4];
}
%-----------------------------------------------------------
% SGF Parser
% From this answer by @StevenB.Segletes.
\long\def\Firstof#1#2\endFirstof{#1}
\newcommand\thecolorofB{black}
\newcommand\thecolorofAB{black}
\newcommand\thecolorofW{white}
\newcommand\thecolorofAW{white}
\newcommand\thecolorofMA{white}
\newcommand\thecolorofCR{white}
\newcommand\thecolorofTR{white}
\newcommand\thecolorofSQ{white}
\newcommand\thecolorofLB{white}
\long\def\Keytypeof#1{\csname thekeytypeof#1\endcsname}
\newcommand\thekeytypeofB{M} % black move
\newcommand\thekeytypeofAB{A} % added (edited) black stone
\newcommand\thekeytypeofW{M} % white move
\newcommand\thekeytypeofAW{A} % added (edited) white stone
\newcommand\thekeytypeofMA{K} % cross (mark) label
\newcommand\thekeytypeofCR{C} % circle label
\newcommand\thekeytypeofTR{T} % triangle label
\newcommand\thekeytypeofSQ{S} % square label
\newcommand\thekeytypeofLB{L} % text label
\ignoreemptyitems
\newcommand{\parseSgf}[1]{%
\setsepchar{;||(||)/]/[}%
\readlist*\Z{#1}%
\foreachitem \i \in \Z[]{%
\foreachitem \z \in \Z[\icnt]{%
\itemtomacro\Z[\icnt, \zcnt, 1]\KeyName
\itemtomacro\Z[\icnt, \zcnt, 2]\KeyValue
\edef\tmp{{\csname thecolorof\KeyName\endcsname}{\KeyValue}}
\if\Keytypeof\KeyName M
\expandafter\drawMoveFromSgfCoords\tmp
\fi
\if\Keytypeof\KeyName A
\expandafter\drawStoneFromSgfCoords\tmp
\fi
\if\Keytypeof\KeyName K
\expandafter\crossLabel\tmp
\fi
\if\Keytypeof\KeyName C
\expandafter\circleLabel\tmp
\fi
\if\Keytypeof\KeyName T
\expandafter\triangleLabel\tmp
\fi
\if\Keytypeof\KeyName S
\expandafter\squareLabel\tmp
\fi
\if\Keytypeof\KeyName L
\expandafter\textLabel\tmp % this macro has 1 extra argument compared to the other ones (the text itself)
\fi
}
}%
}
%-----------------------------------------------------------
% SGFs
\def\sgfA{;B[ab];W[cd]}
\def\sgfB{(
;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2024-02-05]
;B[as]
;W[bs]
;B[cs]
)}
\def\sgfC{(
;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.2]KM[6.5]SZ[19]DT[2024-02-05]
;B[as]
;W[bs]
;B[cs]
;PL[W]AB[dq]AW[eq]
)}
\def\sgfD{( % Basically sgfE with no labels (LB)
;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]RU[Japanese]SZ[19]KM[6.50]DT[2023-12-25]
;B[pd]
;W[dd]
;B[dp]
;W[pp]
;AW[ji]AB[jj]PL[B]
;B[jq]CR[pp]TR[jj]SQ[ji]MA[dp]
)}
\def\sgfE{(
;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]RU[Japanese]SZ[19]KM[6.50]DT[2023-12-25]
;B[pd]
;W[dd]
;B[dp]
;W[pp]
;AW[ji]AB[jj]PL[B]
;B[jq]CR[pp]LB[dd:A][jd:C][pd:B]TR[jj]SQ[ji]MA[dp]
% The LB data is apparently the only weird one.
% Besides having the LB[<coords>:<label>] format,
% If many of them are on the same node, there will only be one LB for all of them.
)}
%-----------------------------------------------------------
% Setup
\pgfmathsetmacro{\boardDimension}{10}
\pgfmathsetmacro{\boardSize}{19}
\pgfmathsetmacro{\step}{\boardDimension / (\boardSize - 1)}
%-----------------------------------------------------------
\begin{document}
\begin{tikzpicture}
\draw[step=\step] (0, 0) grid (10, 10);
\parseSgf{\sgfD}
\textLabel{white}{ab}{A}
\triangleLabel{black}{ac}
\squareLabel{black}{ad}
\circleLabel{black}{ae}
\crossLabel{white}{af}
\end{tikzpicture}
\end{document}
Essentially here's what's missing:
- The label key (LB) is the only one of the essential keys with a weird format. In the previous answer, @StevenB.Segletes did manage to cover the format LB[<coords>:<label>]. However, when multiple labels are on the same node, it gets even weirder, because they are grouped together with the key written only once, e.g. LB[dd:A][jd:C][pd:B] (this apparently happens to every label key; see the example below).
- Since labels are usually placed after a stone has been played, the parser doesn't know whether there's a black or a white stone underneath a label, so it can't tell which color to draw the label in. Maybe there should be a list of coords vs. colors that the parser checks before drawing the label, I don't know.
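To make the "list of coords vs. colors" idea concrete, here is a minimal sketch that assumes one control sequence per occupied coordinate. \rememberStone and \labelColorAt are hypothetical helpers, not part of the MWE above; \notwhite and \notblack are the same macros as in the MWE.

```latex
\documentclass{article}
\newcommand\notwhite{black}
\newcommand\notblack{white}
% Hypothetical helper: record that a stone of colour #1 sits at SGF coords #2.
% \gdef is used so the record survives any group the parser may be working in.
\newcommand{\rememberStone}[2]{%
  \expandafter\gdef\csname stoneAt@#2\endcsname{#1}}
% Hypothetical helper: expands to the colour that contrasts with the stone at #1,
% or to black if no stone has been recorded there (empty intersection).
\newcommand{\labelColorAt}[1]{%
  \ifcsname stoneAt@#1\endcsname
    \csname not\csname stoneAt@#1\endcsname\endcsname
  \else
    black%
  \fi}
\begin{document}
\rememberStone{black}{pd}
\rememberStone{white}{dd}
pd: \labelColorAt{pd}, dd: \labelColorAt{dd}, jd: \labelColorAt{jd}
\end{document}
```

The idea would be to call \rememberStone from \drawStoneFromSgfCoords and to ask \labelColorAt for the color inside the label macros. I haven't checked how this interacts with captured stones, which would have to be removed from the record.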
Actually, the grouping shown for LB applies to every label key and to added or edited stones as well:
(
;GM[1]FF[4]CA[UTF-8]AP[CGoban:3]ST[2]RU[Japanese]SZ[19]KM[6.50]DT[2023-12-25]
;B[pd]
;W[dd]
;B[dp]
;W[pp]
;AW[ji][jn][kn][ln]AB[jj][jm][km]TR[jp][kp][lp]PL[B]
;B[jq]CR[pp]LB[dd:A][jd:C][pd:B]TR[jj]SQ[ji]MA[dp]
)
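Here is a minimal sketch of how such a grouped run of values might be walked once the key has been split off, using a plain delimited-macro loop rather than listofitems. \forEachSgfValue and the \sgf@... internals are hypothetical names, and \showStone is only a stand-in for \drawStoneFromSgfCoords so that the example runs without TikZ.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper, not part of listofitems or of the MWE above:
% \forEachSgfValue{<code>}{[v1][v2]...} runs <code>{v1}, <code>{v2}, ...
\newcommand{\forEachSgfValue}[2]{\sgf@values#1#2[\sgf@stop]}
\def\sgf@stopmarker{\sgf@stop}
\def\sgf@values#1[#2]{%
  \def\sgf@tmp{#2}%
  \ifx\sgf@tmp\sgf@stopmarker
    \expandafter\@gobble      % sentinel reached: drop the continuation
  \else
    \expandafter\@firstofone  % otherwise handle this value and recurse
  \fi
  {#1{#2}\sgf@values#1}}
\makeatother
% Stand-in for \drawStoneFromSgfCoords, so the sketch runs without TikZ.
\newcommand{\showStone}[2]{(#1 stone at #2) }
\begin{document}
\forEachSgfValue{\showStone{white}}{[ji][jn][kn][ln]}
\end{document}
```

The sketch assumes the value list is given literally; a list stored in a macro such as \KeyValue would have to be expanded first. It also assumes that no value contains an escaped \], and an LB value such as dd:A would still have to be split at the colon.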

[] and ; – David Carlisle Feb 05 '24 at 14:41
\foreach on each letter of the string with some sort of variable keeping track of where things start (;) and things end (])? Is there a package for this kind of thing? – psygo Feb 05 '24 at 15:47
\foreach :-) I may have a go at coding something later. You could use an expl3 loop but probably I'd just use a macro that looked at #1 and tested if it was ( or ; and branched accordingly to call the next macro.... (similar to the way tikz parses path expressions or \bm parses stuff or xmltex parses xml or .... – David Carlisle Feb 05 '24 at 16:06
listofitems can probably deal with even the nested case. – psygo Feb 07 '24 at 13:40
AB and AW, strictly speaking. I've added B and W to my repo, which is basically an if on top of your code to draw numbers on moves (I'll work on a MWE soon, but here's the code in the repo). Besides that, I guess I would need extra ifs for CR, TR, SQ, and MA (how to draw them is also in my repo). And one last if for LB, which I think you managed to do in your bonus edit from the last question. – psygo Feb 14 '24 at 12:39
KEY[label]. But now you introduce LB[dd:A][jd:C][pd:B], in which we have KEY[label][label][label]. How am I to interpret this new syntax? If it could be required that LB[dd:A]LB[jd:C]LB[pd:B], I could easily fix it – Steven B. Segletes Feb 15 '24 at 15:49
LB[<coords>:<label>]...[<coords>:<label>] either. And I didn't know SGF had that syntax at first, sorry. I thought everything was of the format key[value]. – psygo Feb 15 '24 at 16:08
LB allowed this odd syntax, or could there be AB[dq][eq]? – Steven B. Segletes Feb 15 '24 at 16:29