
I have to generate a number of tables from CSV data that together span a couple of hundred pages or more, so I need some kind of automation for this.

Please note the following salient points:

  1. The data has two main parts in each row, some kind of name in the first column followed by data in the rest of the columns.
  2. The number of columns in the CSV data varies from one table to another, but it is fixed within a single table.
  3. The data is generated automatically by a software application. The number of rows in a table is variable and may span multiple pages. Hence, I need to use longtable.
  4. A variable number of columns is grouped under each main category; that is, the number of sub-categories under each category is not fixed.
  5. Since the application generating this data is also local (own-built), the data format can be customized (up to a certain extent).
  6. The number of rows in a table will vary from table to table and may (vertically) exceed one page.
  7. A number of leftmost columns will have to be repeated when the table is split horizontally.

The CSV file will look something like this.

Name,4:Category 1,3:Category 2,6:Category 3,8:Category 4,5:Category 5,7:Category 6
Sub-category,Sc11,Sc12,Sc13,Sc14,Sc21,Sc22,Sc23,Sc31,Sc32,Sc33,Sc34,Sc35,Sc36,Sc41,Sc42,Sc43,Sc44,Sc45,Sc46,Sc47,Sc48,Sc51,Sc52,Sc53,Sc54,Sc55,Sc61,Sc62,Sc63,Sc64,Sc65,Sc66,Sc67
Name 1,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
Name 2,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67

Each number m preceding Category n gives the number of sub-categories under that category. For example, 4:Category 1 means that there are 4 sub-categories under Category 1.
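To make the m:Category n convention concrete, here is a small sketch of how such a field could be turned into a spanning header cell. The helper \catheader is hypothetical (not part of any package); it assumes the count is always present and that the category name itself contains no colon.

```latex
\documentclass{article}
\usepackage{longtable}

% Hypothetical helper: \catheader{4:Category 1} -> \multicolumn{4}{c|}{Category 1}
% The auxiliary macro splits the field at the colon via a delimited argument.
% Everything is expandable, so \multicolumn is still found at the start of the cell.
\newcommand*{\catheader}[1]{\catheaderaux#1\catheaderend}
\def\catheaderaux#1:#2\catheaderend{\multicolumn{#1}{c|}{#2}}

\begin{document}
\begin{longtable}{|l|*{7}{c|}}
  \hline
  Name & \catheader{4:Category 1} & \catheader{3:Category 2} \\ \hline
  Sub-category & Sc11 & Sc12 & Sc13 & Sc14 & Sc21 & Sc22 & Sc23 \\ \hline
  Name 1 & 11 & 12 & 13 & 14 & 21 & 22 & 23 \\ \hline
\end{longtable}
\end{document}
```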

Although the above is more or less pure CSV input, it is also possible to wrap each row in a predefined macro and the whole table in an environment. So, the input for one table could look like this:

\begin{csvwidetable}{1}
  \csvwidetableheader{Name,4:Category 1,3:Category 2,6:Category 3,8:Category 4,5:Category 5,7:Category 6}
  \csvwidetablescheader{Sub-category,Sc11,Sc12,Sc13,Sc14,Sc21,Sc22,Sc23,Sc31,Sc32,Sc33,Sc34,Sc35,Sc36,Sc41,Sc42,Sc43,Sc44,Sc45,Sc46,Sc47,Sc48,Sc51,Sc52,Sc53,Sc54,Sc55,Sc61,Sc62,Sc63,Sc64,Sc65,Sc66,Sc67}
  \csvwidetablerow{Name 1,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67}
  \csvwidetablerow{Name 2,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67}
\end{csvwidetable}

Here the environment argument 1 indicates that one leftmost column has to be repeated.
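As an illustration of the macro-wrapped input, here is one way the proposed \csvwidetablerow could turn a comma-separated row into a table row. It is a sketch assuming etoolbox (\forcsvlist, \eappto); the helper names \csvrowappend, \csvrowtmp and \csvrowsep are hypothetical, and the environment, the header macros and the column splitting are not covered. The row is first collected in \csvrowtmp with & between the items and only then inserted, so no alignment tab is met while the list is still being parsed.

```latex
\documentclass{article}
\usepackage{longtable}
\usepackage{etoolbox}

% Append one item to \csvrowtmp, preceded by the current separator
% (empty for the first item, & for all later ones).
\newcommand*{\csvrowappend}[1]{%
  \eappto\csvrowtmp{\expandonce\csvrowsep\unexpanded{#1}}%
  \def\csvrowsep{&}}
% Sketch of the row macro: build "a&b&c" first, then insert it and end the row.
\newcommand*{\csvwidetablerow}[1]{%
  \def\csvrowtmp{}%
  \def\csvrowsep{}%
  \forcsvlist{\csvrowappend}{#1}%
  \csvrowtmp\\}

\begin{document}
\begin{longtable}{|l|*{4}{c|}}
  \hline
  \csvwidetablerow{Sub-category,Sc11,Sc12,Sc13,Sc14}\hline
  \csvwidetablerow{Name 1,11,12,13,14}\hline
  \csvwidetablerow{Name 2,11,12,13,14}\hline
\end{longtable}
\end{document}
```

Using a separator macro rather than testing for the first item keeps the sketch correct even when the first field (the name) is empty.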

Please see the code below, which I believe illustrates the effect I am trying to achieve.

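\documentclass{article}
% page setup assumed (taken from the first answer below)
\usepackage[legalpaper,landscape,left=25.0mm,right=25.0mm]{geometry}
\usepackage{longtable}
\usepackage{array}
% centred fixed-width column type, used below as C{<width>}
\newcolumntype{C}[1]{>{\centering\arraybackslash\hspace{0pt}}p{#1}}

\begin{document}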

\newlength{\ncw}
\setlength{\ncw}{10.00mm}


If we put each CSV row into a single row of the table,
it will not fit. Moreover, we need a way to put
headers such as `Category 1' over multiple columns.

\begin{longtable}[l]{|l|*{33}{C{\ncw}|}}
  \hline
  Name&\multicolumn{4}{|c|}{Category 1}&\multicolumn{3}{c|}{Category 2}&\multicolumn{6}{c|}{Category 3}&\multicolumn{8}{|c|}{Category 4}&\multicolumn{5}{c|}{Category 5}&\multicolumn{7}{c|}{Category 6}\\\hline
  Sub-category&Sc11&Sc12&Sc13&Sc14&Sc21&Sc22&Sc23&Sc31&Sc32&Sc33&Sc34&Sc35&Sc36&Sc41&Sc42&Sc43&Sc44&Sc45&Sc46&Sc47&Sc48&Sc51&Sc52&Sc53&Sc54&Sc55&Sc61&Sc62&Sc63&Sc64&Sc65&Sc66&Sc67\\\hline
  \endhead
  \hline
  \endfoot
  Name 1&11&12&13&14&21&22&23&31&32&33&34&35&36&41&42&43&44&45&46&47&48&51&52&53&54&55&61&62&63&64&65&66&67\\\hline
  Name 2&11&12&13&14&21&22&23&31&32&33&34&35&36&41&42&43&44&45&46&47&48&51&52&53&54&55&61&62&63&64&65&66&67\\
\end{longtable}

At the moment, we use some fixed heuristics in the
application to break the columns at convenient points, and we need to
repeat the leftmost column. See below for examples.

In this table, I can accommodate up to Category 3.

\begin{longtable}[l]{|l|*{13}{C{\ncw}|}}
  \hline
    Name&\multicolumn{4}{|c|}{Category 1}&\multicolumn{3}{c|}{Category 2}&\multicolumn{6}{c|}{Category 3}\\\hline
  Sub-category&Sc11&Sc12&Sc13&Sc14&Sc21&Sc22&Sc23&Sc31&Sc32&Sc33&Sc34&Sc35&Sc36\\\hline
  \endhead
  \hline
  \endfoot
  Name 1&11&12&13&14&21&22&23&31&32&33&34&35&36\\\hline
  Name 2&11&12&13&14&21&22&23&31&32&33&34&35&36\\
\end{longtable}

The rest (Category 4--Category 6) goes in this table.

\begin{longtable}[l]{|l|*{20}{C{\ncw}|}}
  \hline
    Name&\multicolumn{8}{|c|}{Category 4}&\multicolumn{5}{c|}{Category 5}&\multicolumn{7}{c|}{Category 6}\\\hline
  Sub-category&Sc41&Sc42&Sc43&Sc44&Sc45&Sc46&Sc47&Sc48&Sc51&Sc52&Sc53&Sc54&Sc55&Sc61&Sc62&Sc63&Sc64&Sc65&Sc66&Sc67\\\hline
  \endhead
  \hline
  \endfoot
  Name 1&41&42&43&44&45&46&47&48&51&52&53&54&55&61&62&63&64&65&66&67\\\hline
  Name 2&41&42&43&44&45&46&47&48&51&52&53&54&55&61&62&63&64&65&66&67\\
\end{longtable}

\end{document}

Here is the output from the above code.

[Image: output of the code above]
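A rough estimate shows why the single-row version overflows. Assuming the article default \tabcolsep of t = 6pt (about 2.11 mm) and ignoring the rules and the Name column, each of the 33 data columns of width w = \ncw = 10 mm occupies w + 2t:

```latex
\[
  33\,(w + 2t) \approx 33\,(10\,\mathrm{mm} + 4.22\,\mathrm{mm})
  \approx 469\,\mathrm{mm}
\]
```

On a landscape legal page with 25 mm side margins (the setup used in the first answer), the text width is about 355.6 mm − 50 mm ≈ 306 mm, so at most about 21 such columns fit (306 / 14.22 ≈ 21.5), before the Name column is even counted.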

So, what we are trying to achieve is to generate long tables from CSV data, break them at convenient columns whenever the width exceeds the text width (never in the middle of a Category n), and (optionally) repeat some of the leftmost columns.
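The requirement of never breaking inside a Category n amounts to greedy packing of whole categories. The following is a minimal sketch, not a full solution: it assumes every data column has the same width, so the limit can be expressed as a number of columns, and \maxcols (a hypothetical name, set to the example value 20) stands for what fits next to the repeated Name column. It only reports in the log which table each category would go to.

```latex
\documentclass{article}
\usepackage{etoolbox}

\newcount\tablecols % data columns already placed in the current table
\newcount\tablenum  % number of the current table
\newcount\catnum    % running category number
\def\maxcols{20}    % hypothetical limit on data columns per table
\tablenum=1

% Greedy packing: add whole categories to the current table and open a
% new table only when the next category would push it past \maxcols.
% A category wider than \maxcols still gets a table of its own.
\newcommand*{\placecategory}[1]{%
  \global\advance\catnum by 1
  \ifnum\tablecols>0
    \ifnum\numexpr\tablecols+#1\relax>\maxcols\relax
      \global\advance\tablenum by 1
      \global\tablecols=0
    \fi
  \fi
  \global\advance\tablecols by #1
  \typeout{Category \the\catnum\space (#1 columns) -> table \the\tablenum}%
}

\begin{document}
% the m values from the first CSV line: 4:Category 1, 3:Category 2, ...
\forcsvlist{\placecategory}{4,3,6,8,5,7}
\end{document}
```

With these values the log should assign Categories 1 to 3 to table 1 and Categories 4 to 6 to table 2, the same split as in the hand-made example above. With real column widths one would accumulate measured widths and compare them with \linewidth instead of counting columns.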

(Please ignore the issue of vertical lines. I know that vertical rules in tables are not popular on this forum.)

Masroor
  • Related: https://tex.stackexchange.com/questions/134381/ – Dr. Manuel Kuehner Jan 19 '18 at 06:40
  • @Dr.ManuelKuehner No, the solution provided there manually splits the table into two tables. For my more than a hundred tables, I am looking for a solution that splits at convenient points when the width exceeds the current text width. – Masroor Jan 19 '18 at 08:18
  • Yes, it's related but not a duplicate. I wanted to provide a small contribution after you did not get any response for over 12 hours. Nothing more. And maybe the manual solutions activate your creativity and help to find a solution that fits your needs. – Dr. Manuel Kuehner Jan 19 '18 at 08:52
  • @Dr.ManuelKuehner Thanks for your input. Actually, I have been facing this problem for more than a year. The data I mentioned needs to be processed on a regular basis, and I do not like the manual solution I have to provide every time. – Masroor Jan 20 '18 at 02:42
  • IMO for machine-generated output there is not much to gain from using the meant-for-humans conveniences of LaTeX (as opposed to doing the typesetting directly with TeX commands). So I disagree with “Hence, I need to use longtable”. BTW, (1) are you willing to use LuaTeX, and (2) the decision on which columns to keep and where to split — are you willing to do it manually? – ShreevatsaR Jan 21 '18 at 06:03
  • @ShreevatsaR I am confused about your comment on longtable. When there are more than a hundred rows in a table, what else do you propose I use? About LuaTeX: no problem if the code can be embedded inside my LaTeX file (though my coding experience with LuaTeX is zero). About the second one: if by columns to keep you mean the leftmost columns to be repeated, yes, we can do that manually. About where to split: no, we need to do that automatically. Actually, the last issue is the whole essence of my question. – Masroor Jan 21 '18 at 06:39
  • You should imho do it with boxes and not with tabular or longtable. – Ulrike Fischer Jan 21 '18 at 18:31
  • @UlrikeFischer The actual scenario where it will be used will require a number of tweaks and last minute adjustment about appearance. If the same (longtable level) can be achieved with boxes, why not? The bottom line is, the solution needs to be flexible and modular enough. – Masroor Jan 22 '18 at 00:54
  • I agree with @UlrikeFischer, and that's what I was getting at as well. Also, as @Harald Hanche-Olsen said at the other question, this looks like something that can be done especially with LuaTeX, but I estimate a few days of effort for me (may be less for the experts), which unfortunately I am coming to realize (as I get older) is worth more than a few points on an online website. :-( – ShreevatsaR Jan 22 '18 at 03:01
  • These ones seem related: https://tex.stackexchange.com/questions/148421/auto-break-wide-table-vertically-not-horizontally https://tex.stackexchange.com/questions/93808/column-wise-break-of-extra-wide-tables/93810#93810 – Flinston Jan 23 '18 at 08:24
  • One possible solution is to read the CSV file as a data frame in R, where the xtable, kable or similar packages can export nice LaTeX tables of selected columns and rows (also in longtable format, at least with xtable). Coding it all within your own LaTeX source with knitr is straightforward. The main problem is the top-level multicolumn categories, which do not fit the structure of a data frame, but they can be added to a particular sub-data frame. – Fran Jan 24 '18 at 10:38
  • @Fran Can the whole process be automated? Or does it require any manual intervention? I have more than hundred tables. – Masroor Jan 24 '18 at 14:28

2 Answers


This is a partial solution because it doesn't automate the building of the headers, but I think it could be useful because, at least, you don't have to re-type your data.

If you manage to put a # or a % before the first row of your .csv, you could try this:

\documentclass{article}

\usepackage{longtable}
\usepackage{array}
\newcolumntype{C}[1]{>{\centering\arraybackslash\hspace{0pt}}p{#1}}

\usepackage[legalpaper,landscape,left=25.0mm,right=25.0mm]{geometry}

\parindent 0.0mm

\usepackage{pgfplotstable}
\pgfplotsset{compat=1.14} 
\usepackage{filecontents}

% The following code lines are added only to create myfile.csv. Of course, you don't need them in your code
\begin{filecontents*}{myfile.csv}
    #Name,4:Category 1,3:Category 2,6:Category 3,8:Category 4,5:Category 5,7:Category 6
    Sub-category,Sc11,Sc12,Sc13,Sc14,Sc21,Sc22,Sc23,Sc31,Sc32,Sc33,Sc34,Sc35,Sc36,Sc41,Sc42,Sc43,Sc44,Sc45,Sc46,Sc47,Sc48,Sc51,Sc52,Sc53,Sc54,Sc55,Sc61,Sc62,Sc63,Sc64,Sc65,Sc66,Sc67
    Name 1,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
    Name 2,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
\end{filecontents*}
% end code to create myfile.csv

\begin{document}

    \newlength{\ncw}
    \setlength{\ncw}{10.00mm}

    \pgfplotstableread[col sep=comma]{myfile.csv}{\mytable}

    \pgfplotstabletypeset[
        begin table={\begin{longtable}},
        begin table/.add={}{[l]},
        every head row/.append style={before row={%
                \hline
                Name&\multicolumn{4}{c|}{Category 1}&\multicolumn{3}{c|}{Category 2}&\multicolumn{6}{c|}{Category 3}\\
                \hline
                \endhead
            },
        },      
        after row=\hline,       
        end table={\end{longtable}},
        every first column/.style={column type={|l|}},
        every column/.style={column type={C{\ncw}|}},
        columns={[index]0,[index]1,[index]2,[index]3,[index]4,[index]5,
            [index]6,[index]7,[index]8,[index]9,[index]10,[index]11,
            [index]12,[index]13},
        string type
        ]{\mytable}

    \pgfplotstabletypeset[
        begin table={\begin{longtable}},
        begin table/.add={}{[l]},
        every head row/.append style={before row={%
                \hline
                 Name&\multicolumn{8}{|c|}{Category 4}&\multicolumn{5}{c|}{Category 5}&\multicolumn{7}{c|}{Category 6}\\
                \hline
                \endhead
            },
        },      
        after row=\hline,       
        end table={\end{longtable}},
    every first column/.style={column type={|l|}},
    every column/.style={column type={C{\ncw}|}},
    columns={[index]0,[index]14,[index]15,[index]16,[index]17,[index]18,
        [index]19,[index]20,[index]21,[index]22,[index]23,[index]24,
        [index]25,[index]26,[index]27,[index]28,[index]29,[index]30,[index]31,
        [index]32,[index]33},
    string type
    ]{\mytable}
\end{document}

[Image: output of the code above]
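The hard-coded index lists are what keeps this approach manual. As a partial step, the list can be generated with pgffor's \foreach (loaded by pgfplots) and passed through the pgfkeys .expanded handler. In the sketch below, \makecolumnlist, \mycolumns and the file name myfile-raw.csv are hypothetical; skip first n=1 is used so that the category line is ignored without putting a # in front of it, and \pgfplotstablegetcolsof reports the total number of columns. Choosing the split points and writing the category row are still left to the user.

```latex
% Data as in the question, without the # in front of the first line.
\begin{filecontents*}{myfile-raw.csv}
Name,4:Category 1,3:Category 2,6:Category 3,8:Category 4,5:Category 5,7:Category 6
Sub-category,Sc11,Sc12,Sc13,Sc14,Sc21,Sc22,Sc23,Sc31,Sc32,Sc33,Sc34,Sc35,Sc36,Sc41,Sc42,Sc43,Sc44,Sc45,Sc46,Sc47,Sc48,Sc51,Sc52,Sc53,Sc54,Sc55,Sc61,Sc62,Sc63,Sc64,Sc65,Sc66,Sc67
Name 1,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
Name 2,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
\end{filecontents*}
\documentclass{article}
\usepackage{longtable}
\usepackage{pgfplotstable}
\pgfplotsset{compat=1.14}

% Hypothetical helper: \makecolumnlist{<first>}{<last>} stores
% "[index]0,[index]<first>,...,[index]<last>" in \mycolumns, i.e. the
% Name column followed by one contiguous range of data columns.
\newcommand*{\makecolumnlist}[2]{%
  \gdef\mycolumns{[index]0}%
  \foreach \i in {#1,...,#2}{\xdef\mycolumns{\mycolumns,[index]\i}}%
}

\begin{document}
% skip first n=1 ignores the category line
\pgfplotstableread[col sep=comma,skip first n=1]{myfile-raw.csv}{\mytable}
% total number of columns (Name included) ends up in \pgfplotsretval
\pgfplotstablegetcolsof{\mytable}
The table has \pgfplotsretval\ columns.

\makecolumnlist{1}{13}% Name plus Categories 1 to 3
\pgfplotstabletypeset[
  begin table={\begin{longtable}},
  end table={\end{longtable}},
  columns/.expanded={\mycolumns},
  string type
]{\mytable}
\end{document}
```

Calling \makecolumnlist{14}{33} before a second \pgfplotstabletypeset would select Categories 4 to 6 in the same way.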

CarLaTeX
  • How does your solution take care of (1) automatic splitting of the table at an appropriate column? I see that you are manually putting Categories 4 to 6 in the second table. (2) A variable number of columns in the CSV file? The total number of data columns is not always 33; it may be more or less. – Masroor Jan 19 '18 at 08:16
  • @Masroor I said this is only a partial solution, just to avoid retyping the data. I'm afraid a complete solution is beyond my level of knowledge. Maybe you could start a bounty, since I saw nobody else answered. – CarLaTeX Jan 19 '18 at 08:22

Here's a possible solution. It uses datatooltk to split off the first line from the CSV file. Either run datatooltk before LaTeX or run LaTeX with -shell-escape. (Version 1.8 is needed for the option --csv-skiplines.)

Assumptions:

  1. The first column (Name) doesn't have sub-categories and always appears at the start of each table block.
  2. The number of rows doesn't exceed the page height. (Replace tabular with longtable if that occurs.)
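The MWE below reads testdata.csv but does not create it. Assuming the file holds the question's sample data unchanged, a filecontents* environment placed before \documentclass is one way to make the example self-contained (recent LaTeX kernels do not overwrite an existing file of the same name unless the overwrite option is given):

```latex
\begin{filecontents*}{testdata.csv}
Name,4:Category 1,3:Category 2,6:Category 3,8:Category 4,5:Category 5,7:Category 6
Sub-category,Sc11,Sc12,Sc13,Sc14,Sc21,Sc22,Sc23,Sc31,Sc32,Sc33,Sc34,Sc35,Sc36,Sc41,Sc42,Sc43,Sc44,Sc45,Sc46,Sc47,Sc48,Sc51,Sc52,Sc53,Sc54,Sc55,Sc61,Sc62,Sc63,Sc64,Sc65,Sc66,Sc67
Name 1,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
Name 2,11,12,13,14,21,22,23,31,32,33,34,35,36,41,42,43,44,45,46,47,48,51,52,53,54,55,61,62,63,64,65,66,67
\end{filecontents*}
```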

MWE:

\documentclass{article}    

\usepackage[a4paper]{geometry}       
\usepackage{datatool}       

\immediate\write18{datatooltk --name datablocks --csv testdata.csv --nocsv-header --truncate 1 -o datablocks.dbtex}
\immediate\write18{datatooltk --name data --csv testdata.csv --csv-header --csv-skiplines 1 -o data.dbtex}

\DTLloaddbtex{\datablocks}{datablocks.dbtex}
\DTLloaddbtex{\data}{data.dbtex}

% split <n>:header

\def\parseblockheader#1:#2\endparseblock{%
 \def\blockspan{#1}%
 \def\blockheader{#2}%
}

\newcount\blockidx
\newcount\colidx

% First column is the name, which is a special case.

\blockidx=1

\DTLgetvalue{\currentvalue}{\datablocks}{1}{\blockidx}
\cslet{blockheader\number\blockidx}{\currentvalue}
\csdef{blockrange\number\blockidx}{1}
\csdef{blockspan\number\blockidx}{1}

\colidx=1

\loop
  \advance\blockidx by 1\relax
  \ifnum\blockidx>\DTLcolumncount\datablocks
  \else
    \DTLgetvalue{\currentvalue}{\datablocks}{1}{\blockidx}%
    \ifdefempty\currentvalue
    {% empty columns caused by discrepancy between column count on
     % line 1 of CSV being less than column count of remaining lines
      \edef\totalblocks{\number\numexpr\blockidx-1}%
     % break loop
      \blockidx=\DTLcolumncount{\datablocks}%
    }%
    {%
      \expandafter\parseblockheader\currentvalue\endparseblock
      \cslet{blockheader\number\blockidx}{\blockheader}%
      \cslet{blockspan\number\blockidx}{\blockspan}%
      \edef\endrange{\number\numexpr\colidx+\blockspan}%
      \global\advance\colidx by 1\relax
      \csedef{blockrange\number\blockidx}{\number\numexpr\colidx}%
      {%
        \loop
          \global\advance\colidx by 1
          \csxappto{blockrange\number\blockidx}{,\number\numexpr\colidx}%
        \ifnum\colidx<\endrange
        \repeat
      }%
    }%
  \fi
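\ifnum\blockidx<\DTLcolumncount{\datablocks}
\repeat
% if line 1 produced no empty columns, every column is a block
\ifdefined\totalblocks\else
  \edef\totalblocks{\number\DTLcolumncount{\datablocks}}%
\fi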

\makeatletter
% iterate over columns in given block
\newcommand*{\forblock}[2]{%
 \letcs{\rangelist}{blockrange\number#1}%
 \@for\thiscol:=\rangelist\do{#2}%
}

% iterate over columns in given table
\newcommand*{\fortableblock}[2]{%
 \letcs{\blockrangelist}{blocktablecolspan\number#1}%
 \@for\thiscol:=\blockrangelist\do{#2}%
}

% iterate over blocks in given table
\newcommand*{\fortableblockrange}[2]{%
 \letcs{\blockrangelist}{blocktablerange\number#1}%
 \@for\thisblock:=\blockrangelist\do{#2}%
}
\makeatother

% find maximum widths (including \tabcolsep)

\newcommand*{\defandsetlength}[2]{%
  \global\newlength{#1}%
  \global\setlength{#1}{#2}%
}

\newcount\rowidx

% in case header needs to use a different font:
\newcommand{\headerfont}[1]{#1}

\newcommand{\computeblockwidths}{%
% compute header widths
  \dtlforeachkey(\thiskey,\thiscol,\thistype,\thisheader)\in\data\do
  {%
    \settowidth{\dimen0}{\headerfont\thisheader}%
    \dimen0=\dimexpr\dimen0+2\tabcolsep\relax
    \expandafter\defandsetlength
      \csname columnwidth\number\thiscol\endcsname{\dimen0}%
    % save header
    \cslet{columnheader\number\thiscol}{\thisheader}%
  }%
  \loop
    \advance\rowidx by 1\relax
    {%
      \blockidx = 0\relax
      \loop
       \advance\blockidx by 1\relax
       \forblock{\blockidx}{%
         \DTLgetvalue{\currentvalue}{\data}{\rowidx}{\thiscol}%
         \settowidth{\dimen0}{\currentvalue}%
         \dimen0=\dimexpr\dimen0+2\tabcolsep\relax
         \ifcsdef{columnwidth\number\thiscol}
         {%
           \ifdim\dimen0>\csname columnwidth\number\thiscol\endcsname
             % global: a local update would be lost when the row group ends
             \global\csname columnwidth\number\thiscol\endcsname=\dimen0
           \fi
         }%
         {%
           \expandafter\defandsetlength
             \csname columnwidth\number\thiscol\endcsname{\dimen0}%
         }%
       }%
      \ifnum\blockidx<\totalblocks
      \repeat
    }%
  \ifnum\rowidx<\DTLrowcount{\data}
  \repeat
  % compute block widths
  \blockidx=0
  \loop
   \advance\blockidx by 1\relax
   \expandafter\newlength\csname blockwidth\number\blockidx\endcsname
   \forblock{\blockidx}{%
     \advance\csname blockwidth\number\blockidx\endcsname by
       \csname columnwidth\number\thiscol\endcsname
   }%
   % check if block headers are wider
   \settowidth{\dimen0}%
     {\headerfont{\csname blockheader\number\blockidx\endcsname}}%
   \dimen0=\dimexpr\dimen0+2\tabcolsep\relax
   \ifdim\dimen0>\csname blockwidth\number\blockidx\endcsname
     \csname blockwidth\number\blockidx\endcsname=\dimen0\relax
   \fi
  \ifnum\blockidx<\totalblocks
  \repeat
}

% create table code

\newlength\currentwidth
\newcount\currenttable
\newcount\maxtables

\newcommand{\createtable}[1]{%
  \csgdef{blocktablecolspec\number#1}{l}%
  \csxdef{blocktablecolspan\number#1}{\csuse{blockrange1}}%
  \csxdef{blocktablerange\number#1}{1}%
}

\newcommand{\dotable}{%
 \computeblockwidths
 % first column always present
 \global\currentwidth=\csname blockwidth1\endcsname
 \global\currenttable=1\relax
 \createtable{1}%
 % loop over remaining blocks
 \blockidx=1
 \loop
  \advance\blockidx by 1\relax
  \global\currentwidth=\dimexpr\currentwidth
    +\csname blockwidth\number\blockidx\endcsname\relax
  \relax
  \ifdim\currentwidth>\linewidth
    \global\currentwidth=\dimexpr\csname blockwidth1\endcsname
      +\csname blockwidth\number\blockidx\endcsname\relax
    \global\advance\currenttable by 1\relax
    \createtable\currenttable
  \fi
  \csgappto{blocktablecolspec\number\currenttable}{|}%
  {%
    \colidx=0
    \loop
      \advance\colidx by 1\relax
      \csgappto{blocktablecolspec\number\currenttable}{l}%
    \ifnum\colidx<\csname blockspan\number\blockidx\endcsname
    \repeat
  }%
  \csxappto{blocktablecolspan\number\currenttable}%
    {,\csuse{blockrange\number\blockidx}}%
  \csxappto{blocktablerange\number\currenttable}{,\number\blockidx}%
 \ifnum\blockidx<\totalblocks
 \repeat
 % save table count
 \maxtables=\currenttable
 % create table code
 \currenttable=0\relax
 \loop
  \advance\currenttable by 1\relax
  % tabular setup
  \def\currenttablecode{\begin{tabular}}%
  \eappto\currenttablecode{%
   {|\csname blocktablecolspec\number\currenttable\endcsname|}%
   \noexpand\hline}%
  % block headers
  \fortableblockrange{\currenttable}{%
    \ifnum\thisblock>1
     \appto\currenttablecode{&}%
    \fi
    \eappto\currenttablecode{%
     \noexpand\multicolumn
     {\csname blockspan\number\thisblock\endcsname}
     {|c|}%
     {%
       \noexpand\headerfont{%
          \expandonce{\csname blockheader\number\thisblock\endcsname}}%
     }%
    }%
  }%
  % column sub-headers
  \appto\currenttablecode{\\}%
  \fortableblock{\currenttable}{%
    % get header
    \letcs{\thisheader}{columnheader\number\thiscol}%
    \ifnum\thiscol>1
     \appto\currenttablecode{&}%
    \fi
    \eappto\currenttablecode{\noexpand\headerfont{\expandonce\thisheader}}%
  }%
  % table body
  {%
   \rowidx=0
   \loop
     \advance\rowidx by 1\relax
     \gappto\currenttablecode{\\\hline}%
     \fortableblock{\currenttable}{%
       \DTLgetvalue{\currentvalue}{\data}{\rowidx}{\thiscol}%
       \ifnum\thiscol>1
         \gappto\currenttablecode{&}%
       \fi
       \xappto\currenttablecode{\expandonce\currentvalue}%
     }%
   \ifnum\rowidx<\DTLrowcount{\data}
   \repeat
  }%
  % tabular end
  \appto\currenttablecode{\\\hline\end{tabular}}%
  % do table
  \bigskip\par\noindent\currenttablecode
 \ifnum\currenttable<\maxtables
 \repeat
}

\begin{document} 

\dotable

\end{document}

Result:

[Image: resulting tables]
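Assumption 2 says to replace tabular with longtable for tables taller than a page. The fragment below is an untested sketch of what that could mean for the table-building part of \dotable: the environment name changes at both ends, the two header rows are closed with \endhead so that they repeat on every page, and each body row is now terminated by \\\hline instead of being started with it, so that no empty row follows \endhead. Only the changed parts of the \loop over \currenttable are shown; the block-header and sub-header code stays as in the MWE, and \usepackage{longtable} has to be added to the preamble.

```latex
  % longtable setup (replaces \begin{tabular})
  \def\currenttablecode{\begin{longtable}}%
  \eappto\currenttablecode{%
   {|\csname blocktablecolspec\number\currenttable\endcsname|}%
   \noexpand\hline}%
  % (block headers and column sub-headers unchanged)
  % close the header rows and mark them for repetition on every page
  \appto\currenttablecode{\\\hline\endhead}%
  % table body: each row is now terminated by \\\hline
  {%
   \rowidx=0
   \loop
     \advance\rowidx by 1\relax
     \fortableblock{\currenttable}{%
       \DTLgetvalue{\currentvalue}{\data}{\rowidx}{\thiscol}%
       \ifnum\thiscol>1
         \gappto\currenttablecode{&}%
       \fi
       \xappto\currenttablecode{\expandonce\currentvalue}%
     }%
     \gappto\currenttablecode{\\\hline}%
   \ifnum\rowidx<\DTLrowcount{\data}
   \repeat
  }%
  % longtable end; the last row already ends with \\\hline
  \appto\currenttablecode{\end{longtable}}%
  % do table
  \bigskip\par\noindent\currenttablecode
```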

Nicola Talbot