20

I want to make a pdf book on Linux manual pages for C functions using LaTeX.

For example, man malloc looks as below:

[image: man ls output]

I want this type of style in my PDF book. The specific colors are not mandatory, but they should match the terminal's colors.

alhelal
  • 9
    why use latex rather than groff since the man pages are all marked up for the (g)roff typesetting system? (Although convertors exist eg google for man2latex) – David Carlisle Sep 10 '17 at 14:06
  • 2
    Oh and also: the final “style” depends on your pager and terminal; it's not entirely under the control of man. For example, try PAGER=cat man malloc, PAGER=more man malloc, and PAGER=most man malloc (if you have most installed). You can also vary the LESS_TERMCAP_md (etc.) termcap settings, to change your terminal's interpretation of what to do when it's asked to display something “bold” (whether to show it in a particular colour, etc). – ShreevatsaR Sep 10 '17 at 15:30
  • 6
    I just want to say that you have to be very careful with the licenses of the man pages because your book will have to respect every license of used man page and sometimes (if you will not have only one license) the licenses are not compatible. – koleygr Sep 10 '17 at 18:35
  • A lot of GNU utilities have info pages, which are done through texinfo with more detailed documentation. – Batman Sep 10 '17 at 23:24
  • @ShreevatsaR: It's also quite convenient (or at least I've found it so) to use none of those pagers, but instead write a macro that formats the page and dumps it into your editor-of-choice. – jamesqf Sep 11 '17 at 04:46
  • Just out of curiosity: what terminal are you using? Can you mention your exact terminal (and settings within that program) that led to that display / those colours? At a similar column width, I actually get different line breaks in my terminal for the last paragraph (in your screenshot), so I'm curious… – ShreevatsaR Sep 11 '17 at 15:54
  • @ShreevatsaR terminal type xterm, default terminal size 150columns, 100rows, text color=B58900 background color=FDF6F3, bold color = same as text color – alhelal Sep 12 '17 at 14:06

3 Answers

46

If you did not know, man pages are already written in a typesetting system that is even older than TeX. This is the *roff family: at different points in history, related programs have gone by names like RUNOFF, runoff, roff, nroff, troff, ditroff, groff, etc. If you're using a Unix-like system (a good guess as you're asking about man pages), you probably have groff installed on your system already, and possibly under names like troff and nroff.

In fact, man pages are written in a macro package for the *roff typesetter. If you type man -d malloc, you get some debugging information: on my computer the last line shows what command it would have run, to (typeset and) display the malloc man page:

cd '/usr/share/man' && (echo ".ll 18.3i"; echo ".nr LL 18.3i"; /bin/cat '/usr/share/man/man3/malloc.3') | /usr/bin/tbl | /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c | (/usr/bin/less -is || true)

This shows that the file /usr/share/man/man3/malloc.3 is passed through first the tbl preprocessor (which deals with tables), and then is formatted by groff for display on screen. The “input” /usr/share/man/man3/malloc.3 file itself has instructions like this:

The
.Fn malloc
function allocates
.Fa size
bytes of memory and returns a pointer to the allocated memory.

This is analogous to writing (in some hypothetical LaTeX package) something like

The \functionname{malloc} function allocates \functionarg{size} bytes of memory and returns a pointer to the allocated memory.

This is “typeset” by the pipeline ending with /usr/bin/groff -Wall -mtty-char -Tascii -mandoc -c. The definitions of the .Fn and .Fa *roff macros, including the fact that function names should be typeset in bold and arguments underlined, live in the manpage-related macro package. This is why the line ends up on screen with appropriate bold and underlines as:

[image: line from malloc man page]


Therefore: if you want to generate PDF instead, you just have to change the output format. This you can do in multiple ways (unfortunately there are some differences in the output so you may want to try each of them and pick the one you like the most):

man -t malloc > malloc.ps
ps2pdf malloc.ps

or (yes there exist programs other than TeX that can generate DVI files!)

groff -T dvi -m mandoc '/usr/share/man/man3/malloc.3' > malloc.dvi
dvipdfmx malloc.dvi

or:

groff -T ps -m mandoc '/usr/share/man/man3/malloc.3' > malloc.ps
ps2pdf malloc.ps

or on Linux, you may need:

zcat '/usr/share/man/man3/malloc.3.gz' | tbl | groff -T ps -m mandoc > malloc.ps
ps2pdf malloc.ps

or variants where you use a different converter from PS to PDF or from DVI to PDF. Then you can include the PDF directly into your LaTeX document; you can search on this site for many ways of doing that. If you don't like the page margins, line lengths etc., there are ways you can specify them to groff.
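Since the goal is a book covering many functions, the per-page commands above could be scripted. Here is a minimal sketch in Python, assuming gzipped pages as on Linux, that zcat, tbl, groff and ps2pdf are on the PATH, and that the pdfpages package is an acceptable way to include the resulting PDFs. The function names ps_pipeline, convert_page and book_tex are made up for illustration:

```python
import shlex
import subprocess


def ps_pipeline(page_path, ps_path):
    """Return the zcat | tbl | groff shell pipeline for one gzipped man page."""
    return "zcat %s | tbl | groff -T ps -m mandoc > %s" % (
        shlex.quote(page_path), shlex.quote(ps_path))


def convert_page(page_path, name):
    """Produce name.pdf in the current directory (needs the external tools)."""
    ps_path = name + ".ps"
    subprocess.run(ps_pipeline(page_path, ps_path), shell=True, check=True)
    subprocess.run(["ps2pdf", ps_path], check=True)


def book_tex(pdf_names):
    """Return LaTeX source that includes every page of each PDF via pdfpages."""
    lines = ["\\documentclass{book}",
             "\\usepackage{pdfpages}",
             "\\begin{document}"]
    lines += ["\\includepdf[pages=-]{%s}" % name for name in pdf_names]
    lines.append("\\end{document}")
    return "\n".join(lines) + "\n"
```

For example, calling convert_page('/usr/share/man/man3/malloc.3.gz', 'malloc') and then writing book_tex(['malloc.pdf']) to a file such as book.tex should give something that pdflatex can compile. Margins and line lengths would still need to be adjusted on the groff side.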

Another alternative is to use the mandoc program, which understands the source format of man files:

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T pdf > malloc.pdf

or

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T html > malloc.html
# convert from html to pdf or to latex in your preferred way

Note that a conversion to html opens up various possibilities for converting into LaTeX. For example, you can use pandoc. Here is an example that matches some aspects of the display in your screenshot:

  • all bold text is displayed in red (as your terminal evidently does)
  • the background is not white
  • instead of italics, underlines are used (your terminal does that because it does not have an italic font; you may consider whether you want to match that in PDF: underlining is usually considered poor typography)

Create mancolours.tex containing:

\usepackage{pagecolor}

% Set background colour (of the page)
\definecolor{weirdbgcolor}{HTML}{FCF4F0}
\pagecolor{weirdbgcolor}

% Make bold text appear in a particular colour
\definecolor{boldcolor}{HTML}{6E0002}
\let\realtextbf=\textbf
\renewcommand{\textbf}[1]{\textcolor{boldcolor}{\realtextbf{#1}}}

% Use underlines instead of emphasis (ugh)
\renewcommand{\emph}[1]{\underline{#1}}

% % Use fixed-width font by default
% \renewcommand*\familydefault{\ttdefault}

and then:

zcat '/usr/share/man/man3/malloc.3.gz' | mandoc -T html > malloc.html
pandoc -s -o malloc.tex --include-in-header=mancolours.tex malloc.html
pdflatex malloc.tex

This produces stuff like:

[image: matching the screenshot closely]


Finally, if none of these are satisfactory, you can look at the source of the man page and write your own tool for translating *roff macros into whatever LaTeX macros you'd like as equivalents. There aren't too many of those, so this should be reasonably doable. (There are some scripts online where people have written similar translators, but I tried a couple and neither worked well enough. So it would be better to write your own.) You may also consider operating on the output from mandoc -Thtml or mandoc -Ttree, if you find those easier.
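As a rough illustration of the “write your own tool” idea, here is a minimal Python sketch that handles only the two macros from the excerpt earlier, .Fn and .Fa, and maps them to the hypothetical \functionname and \functionarg LaTeX macros used above. The function name and the macro table are my own assumptions; a real translator would need many more macros, escape sequences and punctuation handling, and Linux pages typically use the man macro set rather than the mdoc-style macros shown here:

```python
# Hypothetical mapping from the two macros in the excerpt to made-up
# LaTeX macro names; nothing here is an existing package.
MACRO_TO_LATEX = {
    ".Fn": "\\functionname",
    ".Fa": "\\functionarg",
}


def translate_roff_fragment(source):
    """Translate a tiny fragment of man page source into one line of LaTeX."""
    pieces = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] in MACRO_TO_LATEX and len(parts) == 2:
            # Macro line: wrap its argument in the corresponding LaTeX macro.
            pieces.append("%s{%s}" % (MACRO_TO_LATEX[parts[0]], parts[1]))
        elif line.startswith("."):
            # Fail loudly rather than silently dropping unknown macros.
            raise ValueError("unsupported macro line: %r" % line)
        else:
            # Plain text line: keep as-is.
            pieces.append(line)
    return " ".join(pieces)
```

Fed the five-line malloc excerpt shown earlier, this returns the single LaTeX line given as its analogue.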


Yet another option, if you want to match formatted terminal output exactly, is dumping that to a file along with the formatting. When you run man malloc, the pager invoked is most likely something like less. If you dump to a file everything that is displayed, and open the file in a decent editor, you'll see how the terminal does it:

 The m^Hma^Hal^Hll^Hlo^Hoc^Hc() function allocates _^Hs_^Hi_^Hz_^He bytes of memory and…

(the actual character in the file is byte 8, I have changed it to the two characters ^H above so that you can see it). So: to make a character bold, it prints the character, then ^H, and then the character again. To make something underlined, it prints a _, then ^H and then the character. (These make sense if you imagine that ^H acted like moving backwards, and overprinting a character on itself made it bold — this is actually how things worked at some point historically.) On top of that, your terminal preferences get applied, for how it displays such bold and underlined characters.
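To make the overstrike convention concrete, here is a small Python sketch that builds such sequences and strips them back to plain text. The function names are mine and not part of any existing tool:

```python
import re

BACKSPACE = "\x08"  # the byte displayed as ^H above


def overstrike_bold(text):
    """Encode text as bold: each character, a backspace, the character again."""
    return "".join(c + BACKSPACE + c for c in text)


def overstrike_underline(text):
    """Encode text as underlined: an underscore, a backspace, the character."""
    return "".join("_" + BACKSPACE + c for c in text)


def strip_overstrikes(text):
    """Remove every 'character + backspace' pair, keeping what was printed last."""
    return re.sub(".\x08", "", text)
```

Stripping throws the formatting away, which is only useful for checking the plain text; to keep the bold and underline information, the sequences have to be parsed instead.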

So, now that you have this file, you can extract the formatting in it, into a format suitable for LaTeX. For example, with the following Python script I turn those into \bold{...} and \underline{...} respectively (man malloc happens to contain no backslashes, but if it did you'd probably want to replace those too):

import re
import sys

def parseFormatting(text):
    """Detects 'bold' and 'underlined' characters in the output to a terminal."""
    chars = [(c, '') for c in text]
    while True:  # Detect bold characters
        m = re.search('(.)\x08\\1', ''.join(c[0] for c in chars))
        if not m: break
        s = m.start()
        chars[s : s + 3] = [(chars[s + 2][0], 'bold')]
    while True:  # Detect underlined characters
        m = re.search('_\x08.', ''.join(c[0] for c in chars))
        if not m: break
        s = m.start()
        chars[s : s + 3] = [(chars[s + 2][0], 'underline')]
    i = 0
    while i < len(chars):  # Collapse runs of identical formatting (for efficiency later)
        j = i
        while j < len(chars) and chars[j][1] == chars[i][1]: j += 1
        chars[i : j] = [(''.join(chars[k][0] for k in range(i, j)), chars[i][1])]
        i += 1
    return chars

def parseFileReplaceFormatting(filename):
    """Reads a saved terminal dump and returns its text with TeX formatting macros."""
    with open(filename, 'rb') as f:
        text = f.read().decode('utf-8').split('\n')
    newtext = ''
    for line in text:
        for c in parseFormatting(line):
            if c[1] == '':
                newtext += c[0]
            elif c[1] == 'bold':
                newtext += '\\bold{%s}' % c[0]
            elif c[1] == 'underline':
                newtext += '\\underline{%s}' % c[0]
            else: assert False, ('Unknown formatting', c[1], 'for', c[0])
        newtext += '\n'
    return newtext

if __name__ == '__main__':
    infilename = sys.argv[1]
    outfilename = sys.argv[2]
    updated = parseFileReplaceFormatting(infilename)
    with open(outfilename, 'wb') as f:
        f.write(updated.encode('utf-8'))

So after saving the above script as, say, replaceFormatting.py and running it with something like:

python replaceFormatting.py malloc.man.less malloc.man.less.py2

you can process (\input) the resulting file with TeX. If you wish, you can even preserve line-breaks and whatever crude hyphenation-and-justification your terminal did! (Of course by doing this you lose all the benefits of TeX's beautiful line-breaking algorithm, but you get to match the terminal output exactly.) You just have to make sure that the width of your pages and your terminal are roughly compatible:

\documentclass{article}

\usepackage[paperwidth=11in, textwidth=10in, textheight=4in, paperheight=5in]{geometry}

\usepackage{fontspec}
\setmainfont{Consolas}

\usepackage{xcolor}
% Set background colour (of the page)
\definecolor{weirdbgcolor}{HTML}{FCF4F0}
\usepackage[pagecolor=weirdbgcolor]{pagecolor}
% Make bold text appear in a particular colour
\definecolor{boldcolor}{HTML}{6E0002}
\newcommand{\bold}[1]{\textcolor{boldcolor}{\textbf{#1}}}

\begin{document}
% Foreground colour
\definecolor{fgcolor}{HTML}{A57716}
\color{fgcolor}

\def\nextline{\null\par} % \null so that a blank line in input (two consecutive newlines) becomes an empty paragraph.
{\catcode`\^^M=\active \def^^M{\nextline} \catcode`#=12 \catcode`_=12 \catcode32=12\relax\input{malloc.man.less.py2}}

\end{document}

[image: page 1 of output generated from above TeX document]

(You can tell that the above output was generated by TeX because of the black page number at the bottom!)
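On the earlier caveat about backslashes: a man page that does contain them (or braces, %, and so on) would need those characters escaped before the text is wrapped in \bold{...} or \underline{...}. The following is a minimal sketch of such an escaping step in Python. The name escape_tex and the choice of replacements are my assumptions; # and _ are deliberately left alone because the catcode changes in the TeX document above already make them harmless:

```python
# Characters that remain special under the catcode settings used above,
# mapped to text-mode replacements. The trailing {} ends each control word
# explicitly, since spaces are not skipped once they have catcode 12.
TEX_SPECIALS = {
    "\\": "\\textbackslash{}",
    "{": "\\{",
    "}": "\\}",
    "$": "\\$",
    "&": "\\&",
    "%": "\\%",
    "^": "\\textasciicircum{}",
    "~": "\\textasciitilde{}",
}

# str.translate works in a single pass, so the backslashes and braces
# introduced by one replacement are never escaped a second time.
_TRANSLATION = {ord(char): repl for char, repl in TEX_SPECIALS.items()}


def escape_tex(text):
    """Return text with TeX-special characters replaced by printable macros."""
    return text.translate(_TRANSLATION)
```

One way to use it would be to call escape_tex(c[0]) in place of each bare c[0] in parseFileReplaceFormatting, under Python 3.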

ShreevatsaR
  • The file /usr/share/man/man3/malloc.3 is not found. In /usr/share/man/man3, ls | grep malloc gives: malloc.3.gz malloc_get_state.3.gz __malloc_hook.3.gz malloc_hook.3.gz malloc_info.3.gz __malloc_initialize_hook.3.gz malloc_set_state.3.gz malloc_stats.3.gz malloc_trim.3.gz malloc_usable_size.3.gz – alhelal Sep 10 '17 at 16:20
  • The first option in your solution creates a black-and-white PDF. How can I change the color? – alhelal Sep 10 '17 at 16:29
  • @BandaMuhammadAlHelal You need to use the actual path that exists on your system: try zcat '/usr/share/man/man3/malloc.3.gz' | tbl | groff -T ps -m mandoc > malloc.ps. Also, as I said in a comment on the question, the colours are determined by your terminal, not by man. So if you need different colours, you should probably convert to an intermediate format like html (or dvi if you're familiar with that format) and hack it to display "bold" in a different colour for instance. – ShreevatsaR Sep 10 '17 at 16:36
  • Please give the full image, not only the description part. Your answer is preferred. – alhelal Sep 11 '17 at 05:07
  • Yes, underlining is usually considered poor typography. But I don't know how to show italics in the terminal. – alhelal Sep 11 '17 at 05:26
  • Some terminals support italics just fine, man just doesn't know how to use that yet... @ShreevatsaR, does the tool you mention (mandoc) support both the man and mdoc macro packages? Other sites say it's optimized for BSD mdoc, and handles Linux man poorly. – user1686 Sep 11 '17 at 07:17
  • If groff is installed, mandb man (the one usually found on Linux systems) can be used as a frontend to groff (man -Tps, man -Tdvi, man --html/man -H, etc.). More convenient that way. – muru Sep 11 '17 at 08:46
  • @ShreevatsaR Your “yet another option” is better. If you provided a short script that does everything for that option, it would be helpful for a beginner like me. Thank you. – alhelal Sep 16 '17 at 16:42
  • @ShreevatsaR Italic is better than underlined. If you provide an image with italics, I will edit my question to use your image. – alhelal Sep 16 '17 at 16:44
  • @BandaMuhammadAlHelal The “script” is partly manual: (1) In a terminal, run man malloc. It will show the man page. Hit s and a filename, say malloc.man.less. Quit man with q. (2) Save the above Python script (the one in the answer) to a file, say replaceFormatting.py. If you prefer italic, replace \underline in the Python script with \textit. (3) Run the Python script on that file, as python replaceFormatting.py malloc.man.less malloc.man.less.py2. (4) Compile the above file with xelatex (after replacing Consolas with your preferred monospaced font, and adjusting paper size). – ShreevatsaR Sep 16 '17 at 17:08
  • @ShreevatsaR The colors in my man pages have vanished, and bold text is not shown. I don't understand why this happened. See this. – alhelal Sep 18 '17 at 12:57
  • And the text isn't spread across the terminal. – alhelal Sep 18 '17 at 13:59
  • 2
    @BandaMuhammadAlHelal That's not a question about TeX/LaTeX :-) But if I had to guess, I'd say your terminal's pager settings have changed; try restarting your terminal or computer or whatever. Beyond that I don't know, sorry. – ShreevatsaR Sep 18 '17 at 17:24
  • @ShreevatsaR Yes, it is not a TeX/LaTeX question. But I guess it happened because of the things suggested in the comments and answers to this question. – alhelal Sep 19 '17 at 00:42
  • @ShreevatsaR Using the mancolours.tex solution for the man page of ber_alloc_t results in an error. See this. It doesn't support Unicode. Is there any problem with supporting Unicode? – alhelal Dec 28 '17 at 08:36
  • 1
    @alhelal It's just U+2002 (en space); you can just write a definition for it, or replace the character in the .tex file. Or just compile with xelatex or lualatex (instead of pdflatex). Any of the standard solutions for dealing with Unicode characters will work; I don't think there's any special problem here. – ShreevatsaR Dec 28 '17 at 08:43
  • @ShreevatsaR would you enter in chat? – alhelal Dec 28 '17 at 08:56
  • @ShreevatsaR Your solution creates a COLOPHON section, but man man says that COLOPHON is not a man page section. See this. – alhelal Dec 28 '17 at 09:07
  • @ShreevatsaR You don't need the \renewcommand of \emph, as underlining is bad typography. Please edit your mancolours.tex. You can also add \hypersetup{breaklink=false}. – alhelal Dec 28 '17 at 17:17
  • 1
    @alhelal I have given you some basic ideas that answer the question originally asked; it is somewhat unreasonable to provide “support” here for every single thing that you may encounter during your project. E.g. Unicode has nothing particularly to do with man pages; it's a general question about TeX. Similarly, nothing in this answer inserts a “colophon” section; if you get such a section, it's there because it's there in your manpages. Finally, about underlines, I already mentioned in the answer that it's “considered poor typography”—just shown how you can achieve them if you really want. – ShreevatsaR Dec 28 '17 at 17:25
  • @ShreevatsaR Actually, I mention it because I always use this as a reference and copy mancolours.tex from here. And, obviously, I forget to remove the \renewcommand of \emph each time, which costs me a test compilation. Thank you. – alhelal Dec 28 '17 at 17:30
  • 1
    @alhelal If you think this workflow (for generating PDF from man pages, in terminal style) will be useful to others (or even yourself in future), how about writing a blog post (or your own answer to this question), in which you describe all the changes you made, for all the assorted things that came up, etc.? You could keep editing it as you change your mind or encounter new concerns. Then there would be a better reference for you and for anyone who wanted to do the same thing. You could copy from that reference (your answer or blog post) instead of copying from this (old) answer. – ShreevatsaR Dec 28 '17 at 17:41
14

Maybe man2html plus pandoc could be a simple, good start:

$ zcat '/usr/share/man/man3/malloc.3.gz' | man2html > malloc.html
$ pandoc -s -f html -t latex  malloc.html -o malloc.tex

But if you do not need to modify the LaTeX source, you can export directly to PDF:

$ pandoc -f html -t latex malloc.html -o malloc.pdf


Then, with a new preamble (and removing some first lines):


\documentclass[10pt]{hitec}
\usepackage[tmargin=.5in]{geometry}
\usepackage[english]{babel}
\settextfraction {1}
\setlength\leftmarginwidth{4em}
\setlength\textwidth{.84\paperwidth}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[colorlinks]{hyperref}
\usepackage{longtable,booktabs}
\usepackage{parskip}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
\setcounter{secnumdepth}{0}
\usepackage{xcolor}
\definecolor{texto}{HTML}{801c35}
\definecolor{fondo}{HTML}{FDF6F3}
\definecolor{textob}{HTML}{BB8B04}
\pagecolor{fondo}\color{textob}
\let\oldbfseries\bfseries
\def\bfseries{\color{texto}\oldbfseries}
\def\textbf#1{\textcolor{texto}{\oldbfseries #1}}
\pagestyle{empty}
\title{Man page of MALLOC}

\begin{document}

\section{MALLOC}\label{malloc}
\subsection{NAME}\label{name}

malloc, free, calloc, realloc - allocate and free dynamic memory

... % remaining text is not changed 

\end{document}
Fran
1

man -w man returns the path to the file that stores the man page of the command man.

On my machine, man -w man returns /usr/share/man/fr/man1/man.1.gz

so you can easily combine this command with others:

man -l -Tdvi $(man -w man) > man.dvi && dvipdf man.dvi && rm -f man.dvi && xpdf man.pdf

Explanation:

  1. man -l -Tdvi $(man -w man) > man.dvi asks man to tell groff to produce a DVI file of the man page for man.

  2. dvipdf man.dvi tells dvipdf to create man.pdf from the man.dvi file.

  3. rm -f man.dvi removes the temporary man.dvi file.

  4. xpdf man.pdf uses xpdf to display the file. I've used it here because I guess most systems will have it by default.

On a KDE system you can use okular instead: okular man.pdf

gilles