
I have a log file generated in a Unix shell, with a bunch of shell/terminal color escape sequences. E.g. echo -e '\e[1;34mBLUE TEXT\e[0m' outputs "BLUE TEXT" in blue, and so on. How can I insert this log file into my LaTeX document and make sure that the text in the resulting PDF looks exactly the same as it did in the terminal (preserving colors)?

To clarify: the echo above is an example of how the colored output is generated. What I need to insert is the resulting output, not the echo command itself, so \e will actually be a literal escape byte with hex code 1B.

Example output:

https://www.filedropper.com/typescript

Picture of how it actually looks (or you can just cat it in a Unix terminal):

[Image: screenshot of the example output as displayed in a terminal]

2 Answers


Like this, for instance.

\documentclass{article}
\usepackage{fancyvrb}
\usepackage{color}

\def\defaultcode{[0}
\def\bluecode{[1;34}
\def\redcode{[1;31}
\makeatletter
\def\e#1m{%
\def\colcode{#1}%
\ifx\colcode\defaultcode\color{black}%
\else\ifx\colcode\bluecode\color{blue}%
\else\ifx\colcode\redcode\color{red}%
\fi\fi\fi}
\makeatother

\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
echo -e '\e[1;34mBLUE TEXT\e[0m'
echo -e '\e[1;31mRED TEXT\e[0m'
echo -e 'DEFAULT COLOURED TEXT'
\end{Verbatim}
\end{document}

[Image: the compiled output of the example above]

The idea is that control codes start with \e and end with m. Therefore we pick up everything in between and compare it with some predefined codes. It's easy to add additional codes; for example, green would be supported by

\def\greencode{[1;32}

and then

\else\ifx\colcode\greencode\color{green}%

with another \fi at the end. After this, you just need a verbatim environment that allows macros. Here I used the fancyvrb package. Note that you will need to change the colour in

\ifx\colcode\defaultcode\color{black}

if your default colour is not black.
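Putting the pieces together, the macro with the green branch added might look like this. It is only a sketch that follows the pattern described above (one \ifx branch per code, one \fi per \ifx); the test line still uses a literal \e, as in the original example. Codes that match no branch are silently ignored, because the chain has no final \else.

```latex
\documentclass{article}
\usepackage{fancyvrb}
\usepackage{color}

% Codes recognised by \e: everything between \e and the terminating m
\def\defaultcode{[0}
\def\bluecode{[1;34}
\def\redcode{[1;31}
\def\greencode{[1;32}

% One \ifx branch per code, closed by one \fi per \ifx
\def\e#1m{%
\def\colcode{#1}%
\ifx\colcode\defaultcode\color{black}%
\else\ifx\colcode\bluecode\color{blue}%
\else\ifx\colcode\redcode\color{red}%
\else\ifx\colcode\greencode\color{green}%
\fi\fi\fi\fi}

\begin{document}
\begin{Verbatim}[commandchars=\\\{\}]
echo -e '\e[1;32mGREEN TEXT\e[0m'
\end{Verbatim}
\end{document}
```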

Answer by Ian Thompson, edited by egreg
  • @egreg --- Do those extra percent symbols make a difference? – Ian Thompson Oct 12 '19 at 09:00
  • It's true that \color eventually does \ignorespaces, but those unprotected end-of-lines are wrong from a pedagogical point of view; they're unnecessary here, but only because of the implementation of the specific command. – egreg Oct 12 '19 at 09:09
  • @egreg --- Interesting; originally I included the % signs, but then I deleted them to see if they made a difference. I didn't know the detail about the implementation of \color. – Ian Thompson Oct 12 '19 at 09:12
  • Mhhh... Let's see what the OP says, but the problem is that in the text generated by the shell, that \e is a literal ESC ASCII byte. I will try to add an example as soon as I have a bit of time. So the problem is having a verbatim environment where a literal ESC is translated into your \e macro... – Rmano Oct 12 '19 at 09:23
  • @Rmano --- I think it may depend on the way in which the terminal output is inserted (e.g. I tried pasting something similar, and the ESC characters came out as ^[, not \e). At worst a simple search and replace would fix this issue, but it may be possible to automate the solution. As you say, we need to wait for more information from the OP. – Ian Thompson Oct 12 '19 at 09:32
  • @IanThompson It's exactly as @Rmano says: the output is already generated by the shell, so \e is actually a literal escape byte, 1B. – Thunderbeef Oct 12 '19 at 09:56
  • @IanThompson I also have a question: won't this cause problems with literal m letters in the output? – Thunderbeef Oct 12 '19 at 10:06
  • @Thunderbeef --- Can you post a link to a file that I can experiment with? – Ian Thompson Oct 12 '19 at 10:06
  • @Thunderbeef --- There won't be any problems with m. The macro looks for \e m. On its own, m is still just m. – Ian Thompson Oct 12 '19 at 10:07
  • @Thunderbeef --- You can try \DeclareUnicodeCharacter{001B}{\e}. It fixes the issue for me when I include a file containing escape characters using \input. – Ian Thompson Oct 12 '19 at 10:15
  • @IanThompson I thought I was supposed to use \verbatiminput here? – Thunderbeef Oct 12 '19 at 10:18
  • @Thunderbeef ---- The problem is I don't know exactly what you are doing. The example I posted works for me. If I put \input{file.log} inside a Verbatim environment, where file.log contains escape characters, then the trick with \DeclareUnicodeCharacter works for me. I can probably make this idea work for you, but I need more information. Exactly how are you including these .log files in your document? – Ian Thompson Oct 12 '19 at 10:25
  • @IanThompson I've added a link to a small example output file to my question. As for how to properly input text from a file in this case, I don't know myself yet. Whatever works will do; the main point is to include the log file, not to write its contents in LaTeX directly. I didn't even know that you can \input inside verbatim. – Thunderbeef Oct 12 '19 at 10:39
  • @Thunderbeef --- OK, that file contains a lot of control codes, not just colour switches. I have some ideas, but right now I need to take my children to the park! – Ian Thompson Oct 12 '19 at 11:09
  • @IanThompson Thanks for the help so far! – Thunderbeef Oct 12 '19 at 11:11
  • Maybe https://github.com/chriskuehl/pygments-ansi-color can be useful, as it extends Pygments. – Rmano Oct 12 '19 at 16:52
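Following up on Rmano's point that the shell output contains a literal ESC byte rather than the two characters \e, one possible approach (a sketch, not something tested in this thread) is to make ESC an active character that expands to \e, and to read the log with fancyvrb's \VerbatimInput, passing the catcode change through its codes option. It assumes a hypothetical file colors.log that contains only the colour sequences handled by \e and no backslashes or braces, since those act as command characters here. The sample typescript file, with its many other control codes, would need more work.

```latex
\documentclass{article}
\usepackage{fancyvrb}
\usepackage{color}

\def\defaultcode{[0}
\def\bluecode{[1;34}
\def\redcode{[1;31}
\def\e#1m{%
\def\colcode{#1}%
\ifx\colcode\defaultcode\color{black}%
\else\ifx\colcode\bluecode\color{blue}%
\else\ifx\colcode\redcode\color{red}%
\fi\fi\fi}

% Make the ESC byte (0x1B) an active character that expands to \e
\begingroup
\catcode`\^^[=\active
\gdef^^[{\e}
\endgroup

\begin{document}
% colors.log is hypothetical: colour sequences only, no \ { }
\VerbatimInput[commandchars=\\\{\},codes={\catcode`\^^[=\active}]{colors.log}
\end{document}
```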

Here is an approximation of what you are asking for.

TeX doesn't provide a way to input binary files reliably, but for pdfTeX there's the \pdffiledump primitive, which reads a binary file and returns its contents as a string of hexadecimal digits, two per byte. We first need to preprocess this string to get a sequence of characters, each of catcode 12 (\term@preprocess).
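To see what \term@preprocess actually receives, it can help to typeset the first few bytes of the dump directly. This is a small sketch with a hypothetical \showhexdump helper, built on the same pdftexcmds wrapper \pdf@filedump{offset}{length}{file} used in the full code; like that code, it needs pdflatex or lualatex. Each ESC byte shows up as the pair 1B, typically followed by 5B for [.

```latex
\documentclass{article}
\usepackage{pdftexcmds}

\makeatletter
% Hypothetical helper: typeset the first #2 bytes of file #1 in hex,
% i.e. the raw string that \term@preprocess works on
\newcommand\showhexdump[2]{\texttt{\pdf@filedump{0}{#2}{#1}}}
\makeatother

\begin{document}
\showhexdump{typescript.bin}{16}
\end{document}
```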

Then we parse this input string (\term@process), building the current output line as a token list stored in the macro \term@line (for technical reasons, in reversed order). Depending on the current input character, an element is either appended to or removed from the current line, or the line is printed and a new one is started.
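The reversed order can be illustrated in isolation. In this sketch, the hypothetical macros \demopush and \demopop mirror \term@line@push and \term@line@pop without the colour handling: each new character is prepended as a braced group, so dropping the most recent one (as a backspace requires) takes a single \@gobble. This appears to be the point of the "technical reasons": the line then only has to be reversed once, when it is printed, which is the job of \term@line@reverse. The document should typeset macro:->{b}{a}.

```latex
\documentclass{article}

\makeatletter
\def\demoline{}
% Prepend #1 as a braced group: the newest element comes first
\def\demopush#1{\edef\demoline{{#1}\unexpanded\expandafter{\demoline}}}
% Remove the newest element with one \@gobble, as for a backspace
\def\demopop{\edef\demoline{\unexpanded\expandafter\expandafter\expandafter
  {\expandafter\@gobble\demoline}}}
\makeatother

\begin{document}
\demopush{a}\demopush{b}\demopush{c}\demopop
\texttt{\meaning\demoline}
\end{document}
```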

Whenever an escape character (\x1B) occurs, the next character determines the type of escape sequence. If it is [, what follows is a series of parameter numbers and a final character that specifies the type of terminal output. We then branch over this type and the given parameter numbers to modify global variables for the current display attributes (\term@process@termout). The latter are taken into account whenever a new character is added to the current output line.
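The parameter parsing in \term@process@csi leans on TeX's own number scanner: an assignment such as \term@param@a=0#1 keeps reading digits from the input until it meets a non-digit, and \afterassignment then hands that character to the next macro. Below is a stripped-down sketch of the same technique with hypothetical macro names; it handles one or two parameters and ignores the private ? prefix. It should typeset "a = 1, b = 34, final = m".

```latex
\documentclass{article}
\newcount\parama
\newcount\paramb
% Start parsing "<a>;<b><final>" (the part after ESC[); b = -1 means absent
\def\parsecsi#1{\paramb=-1\relax\afterassignment\parsecsiA\parama=0#1}
% #1 is the first non-digit after parameter a
\def\parsecsiA#1{\ifx;#1\expandafter\parsecsiB\else\expandafter\parsecsiC\fi#1}
% Skip the semicolon, then read parameter b the same way
\def\parsecsiB;{\afterassignment\parsecsiC\paramb=0}
% Store the final byte, which selects the sequence type (m = SGR)
\def\parsecsiC#1{\def\finalchar{#1}}

\begin{document}
\parsecsi1;34m
a = \the\parama, b = \the\paramb, final = \finalchar
\end{document}
```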

The full code:

\documentclass{article}

\usepackage{xcolor}
\usepackage{pdftexcmds}
\usepackage{mdframed}
\usepackage[utf8]{inputenc}
\usepackage{textcomp}

\makeatletter
\endlinechar=-1


% Manipulation of current output line

\newif\ifterm@cr@

\def\term@line{}

\def\term@line@push#1{
    \ifterm@cr@
        \def\term@line{}
    \fi
    \xdef\term@line{
        {{\noexpand\strut
          \noexpand\color{\term@fgcolor\ifterm@intense@!75!white\fi}
          \unexpanded{#1}}}
        \unexpanded\expandafter{\term@line}
    }
    \term@cr@false
}

\def\term@line@pop{
    \xdef\term@line{\expandafter\unexpanded\expandafter\expandafter\expandafter
        {\expandafter\@gobble\term@line}}
    \term@cr@false
}

\def\term@line@print{
    \leavevmode
    \expandafter\term@line@reverse\expandafter\@sep\term@line{}\@end
}
\def\term@line@reverse#1\@sep#2#3\@end{
    \if@empty{#3}{
        #2#1
    }{
        \term@line@reverse#2#1\@sep#3\@end
    }
}


% Display attributes

\def\term@fgcolor{}
\def\term@default@fgcolor{lightgray}
\newif\ifterm@intense@


% Input parsing

\newcount\term@param@a
\newcount\term@param@b

\begingroup
\count0=0\relax
\loop
    \lccode`\*=\count0\relax
    \lowercase{
        \expandafter\gdef\csname term@preprocess@char@\number\count0\endcsname{*}
    }
    \advance\count0 by 1\relax
\ifnum\count0<256\relax
\repeat
\endgroup

\def\term@preprocess#1#2{
    \if@empty{#1}{}{
        \csname term@preprocess@char@\number"#1#2\endcsname
        \term@preprocess
    }
}

\def\term@process#1{
    % End of input
    \if@char@eq#1\relax{
        \@@par
        \condition@true{}
    }{}
    % Escape sequence
    \if@char@eq#1\term@escape@char{
        \condition@true\term@process@esc
    }{}
    % Newline
    \if@char@eq#1\term@lf@char{
        \term@line@print
        \@@par
        \condition@true\term@process
    }{}
    % Carriage return
    \if@char@eq#1\term@cr@char{
        \term@cr@true
        \condition@true\term@process
    }{}
    \if@char@eq#1\term@backspace@char{
        \term@line@pop
        \condition@true\term@process
    }{}
    \if@char@eq#1\term@space@char{
        \term@line@push{\ }
        \condition@true\term@process
    }{}
    \if@num@range{`#1}{194}{223}{
        \condition@true{\term@process@unicode@ii#1}
    }{}
    \if@num@range{`#1}{224}{239}{
        \condition@true{\term@process@unicode@iii#1}
    }{}
    \if@num@range{`#1}{240}{244}{
        \condition@true{\term@process@unicode@iv#1}
    }{}
    \condition@false{
        \term@line@push{#1}
        \term@process
    }
}

\def\term@process@esc#1{
    \if@char@eq#1[{
        \term@process@csi
    }{
        \GenericWarning{}{Warning: Ignored unknown escape sequence of type `#1'}
        \term@process
    }
}

\def\term@process@csi#1{
    \term@param@b=-1\relax
    % Is private sequence?
    \if@char@eq#1?{
        \afterassignment\term@process@csi@
        \term@param@a=0
    }{
        \afterassignment\term@process@csi@
        \term@param@a=0#1
    }
}

\def\term@process@csi@#1{
    % Check for second parameter
    \if@char@eq#1;{
        \afterassignment\term@process@termout
        \term@param@b=0
    }{
        \term@process@termout #1
    }
}

\def\term@process@termout#1{
    % SGR parameter
    \if@char@eq#1m{
        \term@process@sgr\term@param@a
        \ifnum\term@param@b<0\else
            \term@process@sgr\term@param@b
        \fi
        \condition@true{}
    }{}
    \condition@false{
%        \GenericWarning{}{Warning: Ignored unknown terminal output sequence of type `#1'}
    }
    \term@process
}

\def\term@process@sgr#1{
    % Reset
    \if@num@eq{#1}{0}{
        \let\term@fgcolor=\term@default@fgcolor
        \term@intense@false
        \condition@true{}
    }{}

    % Bold/intense
    \if@num@eq{#1}{1}{
        \term@intense@true
        \condition@true{}
    }{}

    % Standard colors
    \if@num@eq{#1}{30}{
        \def\term@fgcolor{black}
        \condition@true{}
    }{}
    \if@num@eq{#1}{31}{
        \def\term@fgcolor{red}
        \condition@true{}
    }{}
    \if@num@eq{#1}{32}{
        \def\term@fgcolor{green}
        \condition@true{}
    }{}
    \if@num@eq{#1}{33}{
        \def\term@fgcolor{yellow}
        \condition@true{}
    }{}
    \if@num@eq{#1}{34}{
        \def\term@fgcolor{blue}
        \condition@true{}
    }{}
    \if@num@eq{#1}{35}{
        \def\term@fgcolor{magenta}
        \condition@true{}
    }{}
    \if@num@eq{#1}{36}{
        \def\term@fgcolor{cyan}
        \condition@true{}
    }{}
    \if@num@eq{#1}{37}{
        \def\term@fgcolor{white}
        \condition@true{}
    }{}
    \condition@false{}
}

% n-byte Unicode sequences

\def\term@process@unicode@ii#1#2{
    \scantokens{\csname term@line@push\endcsname{#1#2}}
    \term@process
}

\def\term@process@unicode@iii#1#2#3{
    \scantokens{\csname term@line@push\endcsname{#1#2#3}}
    \term@process
}

\def\term@process@unicode@iv#1#2#3#4{
    \scantokens{\csname term@line@push\endcsname{#1#2#3#4}}
    \term@process
}


% Helper macros

\def\if@empty#1{
    \if\relax\detokenize{#1}\relax
        \expandafter\@firstoftwo
    \else
        \expandafter\@secondoftwo
    \fi
}

\def\if@char@eq#1#2{
    \if#1#2%
        \expandafter\@firstoftwo
    \else
        \expandafter\@secondoftwo
    \fi
}

\def\if@num@eq#1#2{
    \ifnum#1=#2 %
        \expandafter\@firstoftwo
    \else
        \expandafter\@secondoftwo
    \fi
}

\def\if@num@range#1#2#3{
    \ifnum#1<#2 %
        \expandafter\@firstoftwo
    \else
        \expandafter\@secondoftwo
    \fi
    {
        \@secondoftwo
    }{
        \ifnum#1>#3 %
            \expandafter\@secondoftwo
        \else
            \expandafter\@firstoftwo
        \fi
    }
}

\def\condition@true#1#2\condition@false#3{#1}
\def\condition@false#1{#1}

\def\@temp#1#2{
    \begingroup
    \lccode`\*=`#2
    \lowercase{\global\let#1=*}
    \endgroup
}
\@temp\term@backspace@char \^^H
\@temp\term@lf@char        \^^J
\@temp\term@cr@char        \^^M
\@temp\term@escape@char    \^^[
\@temp\term@space@char     \ %


% User macros

% Print terminal session stored in #1
\newcommand\terminalinput[1]{
    \begin{mdframed}[
        backgroundcolor=black,
        innerleftmargin=0pt,
        innerrightmargin=0pt,
        innertopmargin=0pt,
        innerbottommargin=0pt
    ]
    \begingroup
    \parindent=0pt
    \frenchspacing
    \ttfamily
    \fboxsep=0pt
    \term@process@sgr{0}

    \xdef\@temp{\pdf@filedump{0}{\pdf@filesize{#1}}{#1}}
    \expandafter\xdef\expandafter\term@input\expandafter{
        \expandafter\term@preprocess\@temp{}{}
    }
    \expandafter\term@process\term@input\relax
    \endgroup
    \end{mdframed}
}

\endlinechar=`\^^M
\makeatother

\DeclareUnicodeCharacter{279C}{\textrightarrow}

\begin{document}
\terminalinput{typescript.bin}
\end{document}

outputs

[Image: the compiled output of typescript.bin]

There are still several problems with the current implementation:

  • The code only works with the pdflatex or lualatex compilers, as it relies on the \pdffiledump primitive or its reimplementation in Lua code, respectively.
  • Currently no Unicode characters are supported, because the file is decoded as a single-byte stream. Characters outside the printable ASCII range are ignored for now.
    EDIT: Unicode characters are now supported. Whenever a UTF-8 2-, 3-, or 4-byte sequence is encountered, the bytes are re-mapped to their standard catcodes via \scantokens so that inputenc can do its job normally. New character mappings can be added via \DeclareUnicodeCharacter.
  • The "terminal window" currently is just a black box spanning the whole text width.
  • The current implementation only covers a small subset of all the available escape sequences/terminal output sequences. Most notably, background colors aren't handled at all, bold/intense colors are only handled crudely (by appending !75!white to the current color), and only the eight standard colors (codes 30-37) are implemented.
  • Cursor positioning isn't handled correctly either; e.g. carriage return (\x0D) clears the complete line instead of only moving the cursor to the beginning of the line.
  • In terminal output sequences, only one or two parameter numbers are supported, while any number of parameters should be supported.
  • The code hasn't been tested well, actually only with the given sample file. ;-)
Answer by siracusa
  • I've just tested it, and it works. Maybe you could consider making a package out of this; it seems sophisticated enough. Anyway, I have a question: you say that the file is treated as binary. AFAIK, the only non-text character is escape (\x1b). If we redefine it with \DeclareUnicodeCharacter, wouldn't that eliminate the need to treat the input file as binary? – Thunderbeef Oct 15 '19 at 13:08
  • @Thunderbeef No, \DeclareUnicodeCharacter is unrelated to the way TeX reads an input file at a low level. The problematic characters here are \x0A and \x0D, which TeX can't distinguish with its normal input mechanism but which have different meanings in this context. – siracusa Oct 15 '19 at 13:41
  • This looks like the answer, even though I don't really understand how the code works. If there are no updates, I'll accept this tomorrow. – Thunderbeef Oct 16 '19 at 11:22
  • @Thunderbeef Do you have questions about specific parts of the code? Describing all macros in detail would be too tedious. I thought most of the code would be quite straightforward. – siracusa Oct 17 '19 at 00:02
  • @siracusa Could you make a package out of that? – Tobias Oct 17 '19 at 09:53
  • I just tried it and it does show console output with color (even the right colors) but each line is shown multiple times and in a disorderly fashion. I am trying to display colored output from git that I grabbed like this: git show --color | head -n 20 > show.bin – eftshift0 Aug 27 '20 at 04:08