-27

First, what is the proper terminology for a "string" in TeX/LaTeX that is an argument for a macro, e.g., \macro{somestring}? Obviously somestring is a token, set of tokens, string, argument, etc. I will call it a string since I do not know what else to call it. I'm not thinking of it as another macro but as sort of "inert" text.

What I want to be able to do is parse a string that represents macros in LaTeX but codified in a very simple way:

The best way to describe it is to give an example

The following string

x4y3[cgreen, t4[-3,4,a90]]

represents setting properties

x-position = 4
y-position = 3
color = green

test = 4 with x offset -3, y offset 4 and angle 90
[ ] = optional (optional list of optional specifiers in no particular order)

It should be VERY self explanatory how I went from the condensed string to the properties it represents. If not stare at the string for a few hours. You should be able to read the string and tell exactly what it means after knowing what the indicator symbols are. (This is no different from what xparse does with its specification string, except mine is a little more complex.)

Now the hard part, I want to be able to parse that string into a series of macros that set data.

As you can probably guess it, the string represents something to do with placing a graphic at a location.

If you want to have some interpretation for

x4y3[cgreen, t4[-3,4,a90]]

to make you feel better: Place a green circle at coordinates(4,3) with a piece of text "4" offset (-3,4) from it and rotated 90 degrees.

To do this cleanly and efficiently I would like to create macros that are called for each corresponding macro in the string

\def\x#1{}
\def\y#1{}
\def\c#1{}
\def\t#1#2{}

So we could see the string as simply calling \x{4}\y{3}\c{green}\t{4}{-3,4,a90}

So as you can see the string is just a "condensed" version of a macro. The string simply is a command value pair with possible optional arguments.

One could almost simply go in and add \'s and brackets to get the command.

For example,

pseudo-code:

parse_string{
    foreach token in command-list
        write '\'token'{'next_token'}'    
}

But I would like to add commands/macros that have more than one character (the smallest token size would take precedence) AND do other things like default values.

Anyways, as usual this is relatively straightforward in most procedural programming languages but I'm sure I'll spend the next few weeks trying to figure out how to do it in TeX/LaTeX and then 10k lines of code later I'll probably end up with a buggy as hell version.

egreg
  • 1,121,712
Uiy
  • 6,132
  • 45
    It should be VERY self explanatory., If you want to have some interpretation for x4y3[cgreen, t4[-3,4,a90]] to make you feel better: etc. Do you really need such a tone? – percusse Mar 16 '12 at 20:59
  • 6
    Well, I'm pretty sure that nobody is going to give you a complete answer. What you describe is a programming language/markup and what you want is to have a translator of this language written in LaTeX, which is a Task. I would recommend that you pre-process your codes in some other software that would produce the desired LaTeX code. In my humble opinion, you should know very well what you are doing (I mean, you should have at least some knowledge of formal grammars and translators). – yo' Mar 16 '12 at 21:25
  • 1
    This is already done in latex. It is not impossible. The gchords package does this to represent strings on a chord. It is also not difficult programmatically. one does not need a full blown context sensitive grammar to parse what I am doing. The grammar is very simple, context free, etc. I could write up a parser in a few mins in C#. – Uiy Mar 16 '12 at 21:36
  • 2
    Then I say: better do it in C#. You can make it like \mymacro{x2y3[...]} that gets recognized by your program, which appends its correct interpretation in some environment, like \begin{myenv}...\end{myenv}. You also have to remove every myenv left over from the previous run of your C# program. In LaTeX, you can put \def\mymacro#1{} so that the original snippet gets ignored and \newenvironment{myenv}{}{} so that the myenv contents get processed. This is how I would do it myself, just in C++ instead of C#. – yo' Mar 16 '12 at 21:49
  • ? You mean write a preparser in C++/C# or that latex has some way to associate a macro with a compiled C++/C# code? I really don't want to write a parser as there can be too many pitfalls (although in this case it probably would be easier than trying to do it in latex) – Uiy Mar 16 '12 at 22:03
  • 1
    I don't know about any real "association" of macro with a C# code. I say that you can search your file for \mymacro{...} in your C# program. And I really don't understand what you want. You want to parse some code but you don't want a parser? – yo' Mar 16 '12 at 22:21
  • 27
    @Uiy: Please note that this site is considered in "the better part of the Internet". Please don't post just some "stuff" here. Keep your posts and comments short, meaningful and polite. – Martin Scharrer Mar 17 '12 at 00:29
  • 6
    @Uiy: After seeing your other questions I would now like to recommend learning a little TeX (the underlying engine of LaTeX) and how it works. See e.g. The TeXBook or TeX for the Impatient. Note that while TeX is Turing-complete it is not meant as a general purpose programming language and lacks a lot of features taken for granted in software programming languages, especially modern ones. Have a look at the (closed!) question Why are there no alternatives to TeX, or, why is TeX still used?. Your questions remind me of it a little bit. – Martin Scharrer Mar 17 '12 at 00:41
  • @MartinScharrer The biggest problem is I already know too much!! ;) I've programmed in over 15+ programming languages (pascal, assembly, C\C++\C#, js, vb, etc...), excluding all these little scripting languages. The last thing I want to do is learn something else! Everything starts to bleed together at some point. Now TeX isn't a programming language and it makes it even worse since it seems to use "hacks" from my perspective. If I want to add two numbers, for example, I should just be able to do a+b and not \FPadd\result{a}{b}. What I've learned is leave typesetting to TeX and do programming in Lua. – Uiy Mar 21 '12 at 12:46
  • 19
    @Uiy: I know the too-many-programming-languages syndrome myself. You should always keep in mind that TeX is a Turing-complete document preparation language, not a general purpose programming language. If you write "a+b" it will simply typeset "a+b" in the output document. This makes perfect sense! It doesn't provide a full mathematical evaluation mode because it doesn't have to. Have a look at the pgfmath package of PGF/TikZ which actually adds this. – Martin Scharrer Mar 21 '12 at 12:54

2 Answers

42

First, TeX has no notion of a string as we understand it in other computer languages. TeX works using tokens and boxes. Each token is either typeset as a glyph (or series of glyphs) or triggers some other operation. So how does one parse a string like the following and build commands?

x4y3[cgreen, t4[-3,4,a90]]

As with any computer language you will need to define your grammar and UI first, which you have partially done, but let us assume for a minute that you would like to do this on a letter by letter basis. In this case one can borrow macros from the LaTeX2e kernel (texdoc source2e for a general view) or some packages (the soul package has some good parsers). Let us instead use the kernel's macro \@tfor:

\makeatletter
  \@tfor \i:= x4y3[cgreen, t4[-3,4,a90]]\do{%
     \i
  }
\makeatother

The above iterates token by token over the token list and stores each token in turn in the macro \i. I have used \i to make it look more familiar. However, note that \i is pre-defined by TeX as the dotless i, and the loop overwrites it. Assuming your music package has nothing to do with a dotless i, you can throw it away or use another variable like \next.
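The loop above can be tried in a complete document. This is only a minimal sketch: \mytoken is a hypothetical loop variable chosen so that \i is left alone, and the parentheses are there purely to make the token boundaries visible.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% \mytoken is a hypothetical loop variable; \@tfor redefines it on every pass.
\@tfor\mytoken:=x4y3[cgreen, t4[-3,4,a90]]\do{%
  (\mytoken)% typeset the current token between parentheses
}
\makeatother
\end{document}
```

Because \@tfor picks up each item as an undelimited argument, the space after the comma is skipped rather than passed to the loop body, and a brace group in the input would arrive as a single item.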

You can also parse using delimited argument macros.

\def\amacro x#1y#2[#3]{
     ... do something with #1, #2 and #3
   }

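When you call the above macro as \amacro x4y3[cgreen, t4[-3,4,a90]], the values of #1, #2 and #3 will be 4, 3 and cgreen, t4[-3,4,a90 respectively. Note that #3 ends at the first ], so the inner closing bracket is consumed as the delimiter and the final ] is left in the input. You can then parse #3 further along the line, using the same method.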
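Since a delimited argument ends at the first occurrence of its delimiter, #3 in the definition above stops at the first ]. One possible way around this, sketched here under the assumption that you are free to append an end marker, is to make the delimiter the two-token sequence ]\stopparse. Both \stopparse and the wrapper \placeitem are hypothetical names, not part of any package.

```latex
\documentclass{article}
\begin{document}
% \stopparse is a hypothetical end marker: it is only matched as a
% delimiter and never executed, so it needs no definition.
\def\amacro x#1y#2[#3]\stopparse{%
  x is #1, y is #2, options are \texttt{#3}%
}
% Hypothetical wrapper that appends the marker for the user.
\newcommand*\placeitem[1]{\amacro #1\stopparse}

\placeitem{x4y3[cgreen, t4[-3,4,a90]]}
\end{document}
```

Here the first ] is followed by another ], so it does not match the delimiter; only the last ] does, and #3 receives the whole bracketed list including the inner [ ].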

Now that you have got the first values of the parameters, you can use them in macros. Let us define the \x and \y macros (I have renamed them \pos@x and \pos@y, just in case \x and \y are defined somewhere else).

\def\pos@x#1{This is position x1}

These definitions belong inside \amacro:

\makeatletter
\def\amacro x#1y#2[#3]{%
     \def\pos@x{#1}% store the x value
     \def\pos@y{#2}% store the y value
     % ... parse #3 further here
   }
\makeatother

As you can observe, the \pos@x and \pos@y macros take their values from the arguments of the mother of all macros, \amacro, each time it is called. With a lot of patience and perseverance you stitch the parser together piece by piece. Talking about patience, we try to avoid flames here, such as "It should be VERY self explanatory". I would appreciate it if you could modify or delete it.

yannisl
  • 117,160
19

You can parse strings by using a macro with a delimited parameter text. A simple example (which creates a compiler error if the input is not as expected) would be as follows:

\def\parsestuff x#1y#2[#3, t#4[#5,#6,#7]]{%
  \setx{#1}%
  \sety{#2}%
  \setcolor{#3}%
  \settest{#4}{#5}{#6}{#7}%
}

\parsestuff x4y3[cgreen, t4[-3,4,a90]]

You might want to use a wrapper function so that the string can be enclosed in braces, like:

\newcommand*\mywrapper[1]{\parsestuff #1}
\mywrapper{x4y3[cgreen, t4[-3,4,a90]]}

This can be improved on, but I think you get the idea. Note that the delimiting characters must appear in exactly the given form, with the same catcodes as during the macro definition. For example, the trailing ]] must not have a space between them. If you want to relax these conditions, you need to parse the string in multiple steps, where an internal macro reads the inner [ ] etc.
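A multi-step version could look roughly like the following. It is only a sketch under assumptions: the helper \parse@inner and the end marker \parse@end are hypothetical names, the \set... calls are replaced by plain typeset output, and \parsestuff and \mywrapper are redefined variants of the macros above rather than the same definitions.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% All names below are hypothetical; \parse@end is a sentinel that is only
% matched as a delimiter and never executed.
% Step 1: read x, y and the first option; #4 is the rest of the outer [ ].
\def\parsestuff x#1y#2[#3,#4]\parse@end{%
  x=#1, y=#2, first option=#3,
  \parse@inner#4\parse@end
}
% Step 2: read the inner t...[...] group. #1 soaks up anything before the t
% (such as a space) and #6 anything after the inner ].
\def\parse@inner#1t#2[#3,#4,#5]#6\parse@end{%
  test=#2, x offset=#3, y offset=#4, last option=#5%
}
\newcommand*\mywrapper[1]{\parsestuff #1\parse@end}
\makeatother

\mywrapper{x4y3[cgreen, t4[-3,4,a90]]}\par
\mywrapper{x4y3[cgreen, t4[-3,4,a90] ]}% a space between the closing brackets
\end{document}
```

Splitting the work this way means the two closing brackets no longer have to be adjacent, because each step matches only one of them. The option order is still fixed; accepting the options in any order would need a loop over the list instead of a fixed parameter text.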

It is also possible to parse an input string token by token. I do this in my tikz-timing package where a timing sequence is given by characters and turned into TikZ code. Actually, TikZ itself does a form of string parsing to implement its syntax. Another example is my ydoc bundle where I parse tokens to format LaTeX macros in package/class documentation. Both use a state machine to implement the string parser. Check out the code if you are interested.
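To give a rough idea of the token-by-token approach, here is a small sketch. It is not the code from tikz-timing or ydoc. All macro names are hypothetical, and it assumes a simplified input in which every key is a single letter followed by a non-letter value such as x4y-3t12. The state is kept in two macros: the current key and the value collected so far.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% Hypothetical names; a sketch, not the tikz-timing or ydoc code.
\let\parse@end\relax            % sentinel marking the end of the input
\def\parse@key{}\def\parse@val{}% state: current key and the value read so far
\def\parse@flush{% emit the finished pair, if there is one
  \ifx\parse@key\@empty\else [\parse@key=\parse@val] \fi}
\def\parse@next#1{% look at one token and decide what to do
  \ifx#1\parse@end
    \parse@flush                 % end of input: emit the last pair and stop
  \else
    \ifcat\noexpand#1a%          a letter starts a new key
      \parse@flush
      \def\parse@key{#1}\def\parse@val{}%
    \else                        % any other character extends the value
      \edef\parse@val{\parse@val#1}%
    \fi
    \expandafter\parse@next      % continue with the next token
  \fi}
\newcommand*\parsepairs[1]{%
  \def\parse@key{}\def\parse@val{}\parse@next#1\parse@end}
\makeatother

\parsepairs{x4y-3t12}
\end{document}
```

In a real parser \parse@flush would call a handler for the key, for example through \csname, instead of typesetting the pair. Values made of letters, such as the colour in cgreen, need an extra state that stops a letter from being treated as a new key.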

Martin Scharrer
  • 262,582