1

I have a CSV file that has various columns. One of the columns is called summits. This column contains multiple summits in the following form:

Summit name (elevation m)

For example

Mount Everest (8848m)

What I'm trying to do is to to create a table that lists all unique summits in descending order. For example like this:

enter image description here

I was already able to create a list with unique entries using the datatool package.

\DTLforeach*{tours}{\Summit=summits}{%
  \expandafter\DTLifinlist\expandafter{\Summit}{\uniquesummit}%
  {}
  {%
    \ifdefempty{\uniquesummit}%
    {\let\uniquesummit\Summit}% 
    {% 
      \eappto\uniquesummit{,\Summit}%
    }%
  }%
}

What is left is:

  • Splitting the name of the summit and the elevation
  • Sorting it
  • Creating a table

I thought I figured out creating the table with this answer, but in my case, all the summits are just in the first row and not split around multiple rows. I guess the reason is, that my list is in a slightly different form? But why?

Here is a MWE that should better describe my problem. Please note that in the actual CSV file, the ; is a tab. I couldn't get it working with a tab in the MWE:

\documentclass{article}
\usepackage{filecontents}
\usepackage{datatool}

\begin{filecontents}{summits.csv} title;summits;description test1;Matterhorn (4500m);description1 test2;Mount Everest (8848m);description2 test3;K2 (8611m), Kangchenjunga (8586m);description3 test4;Mount Everest (8848m);description4 \end{filecontents}

\begin{document} \DTLsetseparator{;}% \DTLloaddb{tours}{summits.csv}

\newcommand*{\uniquesummit}{}

\DTLforeach*{tours}{\Summit=summits}{%
    \expandafter\DTLifinlist\expandafter{\Summit}{\uniquesummit}%
    {}{
        \ifdefempty{\uniquesummit}%
        {\let\uniquesummit\Summit}%
    {%
        \eappto\uniquesummit{,\Summit}%
        }%
    }%
}

% Shows that I can find the unique summits List of unique summits: \uniquesummit.

\bigskip

% This I would like to generate based on the summits.csv \begin{tabular}{l l l} 1 &Mount Everest & 8848\ 2 & K2 & 8611\ 3 & Kangchenjunga & 8586\ 4 & Matterhorn & 4500\ \end{tabular}

\end{document}

SCM
  • 57

1 Answers1

1

The following is using expl3 for the data processing. It works in the following way:

First read in the first line of the file and parse the header. Find the column which is called summits, so that in further processing we know which column is the interesting one.

Next read the file line by line and extract the $n$-th column (the one named summits). Split that column on commas (to get all the summits in that line), and then extract the name and height of each summit via a regular expression. Put the found values into a single seq variable.

Afterwards we remove duplicates from this seq, and sort for the height of the mountains (in descending order).

Then output the table. Each seq element in the now sorted table is given to \readmountains_output_single:n which formats every line.

This is all done by the macro \readmountains, which takes an optional and a mandatory argument. The mandatory argument is the file name, the optional one is the column delimiter (defaulting to a Tab, which is ^^I if you want to input it directly from within TeX, but the optional argument wants the delimiter escaped, so you'd provide \^^I for a Tab, which is the default).

(there is a bit of error checking going on, so that

\documentclass[]{article}

\begin{filecontents}{\jobname.csv} title;summits;description test1;Matterhorn (4500m);description1 test2;Mount Everest (8848m);description2 test3;K2 (8611m), Kangchenjunga (8586m);description3 test4;Mount Everest (8848m);description4 \end{filecontents}

\ExplSyntaxOn \ior_new:N \g_readmountains_file_ior \seq_new:N \l_readmountains_mountains_seq \seq_new:N \l_readmountains_extract_seq \int_new:N \g_readmountains_output_int \int_new:N \l_readmountains_col_int \int_new:N \l_readmountains_catcode_int \int_new:N \l_readmountains_line_int \seq_new:N \l_readmountains_csv_seq \scan_new:N \s_readmountains_end \msg_new:nnn { readmountains } { no-summits } { Column ~ summits' ~ not ~ found. ~ Aborting. } \msg_new:nnn { readmountains } { short-line } { Short ~ line ~ encountered ~ in ~ line ~ \int_use:N \l_readmountains_line_int. ~ Aborting. } \msg_new:nnn { readmountains } { malformatted-summit } { Malformatted ~ summit ~ encountered ~ in ~ line ~ \int_use:N \l_readmountains_line_int. ~ Aborting. } \NewDocumentCommand \readmountains { O{\^^I} m } { \readmountains:Nn #1 {#2} } \cs_new_protected:Npn \readmountains:Nn #1#2 { \readmountains_read_file:NnN #1 {#2} \l_readmountains_mountains_seq \seq_remove_duplicates:N \l_readmountains_mountains_seq \readmountains_sort:N \l_readmountains_mountains_seq \readmountains_output:N \l_readmountains_mountains_seq \use_none:n \s_readmountains_end } \cs_new_protected:Npn \readmountains_read_file:NnN #1#2#3 { \int_set:Nn \l_readmountains_catcode_int { \char_value_catcode:n {#1 } } \char_set_catcode_other:N #1 \ior_open:Nn \g_readmountains_file_ior {#2} \seq_clear:N #3 \ior_get:NN \g_readmountains_file_ior \l_tmpa_tl \exp_args:NNx \seq_set_split:NnV \l_readmountains_csv_seq { \char_generate:nn { #1 } { 12 } } \l_tmpa_tl \int_zero:N \l_readmountains_col_int \seq_map_inline:Nn \l_readmountains_csv_seq { \int_incr:N \l_readmountains_col_int \str_if_eq:nnT { summits } {##1} { \seq_map_break: } } \str_if_eq:eeF { \seq_item:Nn \l_readmountains_csv_seq \l_readmountains_col_int } { summits } { \msg_error:nn { readmountains } { no-summits } \readmountains_use_none_delimit_by_s_end:w } \int_set:Nn \l_readmountains_line_int \c_one_int \ior_map_inline:Nn \g_readmountains_file_ior { \int_incr:N \l_readmountains_line_int \exp_args:NNx \seq_set_split:Nnn \l_readmountains_csv_seq { \char_generate:nn {#1 } { 12 } } {##1} \int_compare:nNnT \l_readmountains_col_int > { \seq_count:N \l_readmountains_csv_seq } { \msg_error:nn { readmountains } { short-line } \ior_map_break:n { \readmountains_use_none_delimit_by_s_end:w } } \exp_args:Nnnx \seq_set_split:Nnn \l_readmountains_csv_seq {,} { \seq_item:Nn \l_readmountains_csv_seq \l_readmountains_col_int } \seq_map_inline:Nn \l_readmountains_csv_seq { \tl_if_empty:nF { ####1 } { \regex_extract_once:nnNF { (.*)\s((\d+)m) } { ####1 } \l_readmountains_extract_seq { \msg_error:nn { readmountains } { malformatted-summit } \readmountains_use_none_delimit_by_s_end:w } \seq_push:Nx #3 { { \seq_item:Nn \l_readmountains_extract_seq {2} } { \seq_item:Nn \l_readmountains_extract_seq {3} } } \use_none:n \s_readmountains_end } } } \ior_close:N \g_readmountains_file_ior \char_set_catcode:nn { `#1 } \l_readmountains_catcode_int } \cs_new_protected:Npn \readmountains_sort:N #1 { \seq_sort:Nn #1 { \int_compare:nNnTF { \use_ii:nn ##1 } < { \use_ii:nn ##2 } \sort_return_swapped: \sort_return_same: } } \cs_new_protected:Npn \readmountains_output:N #1 { \int_gzero:N \g_readmountains_output_int \begin { tabular } { r l r } \seq_map_function:NN #1 \readmountains_output_single:n \end { tabular } } \cs_new_protected:Npn \readmountains_output_single:n #1 { \int_gincr:N \g_readmountains_output_int \int_use:N \g_readmountains_output_int . & \use_i:nn #1 & \use_ii:nn #1 \ } \cs_new:Npn \readmountains_use_none_delimit_by_s_end:w #1 \s_readmountains_end {} \ExplSyntaxOff

\begin{document} % since the file was written with ; as the delimiter use that \readmountains[;]{\jobname.csv} \end{document}

enter image description here

Skillmon
  • 60,462
  • Wow that's awesome. I just tried it an it works perfectly! Also thanks a lot for the explanation. That is very helpful.

    I hope with this explanation I can figure out the last bit that I forget to put into the MWE. Sometimes, the summits row can also be empty. At the moment this code then fails, because it does not match the regex in the readmountains_read_file.

    – SCM Jan 16 '21 at 21:18
  • @SCM I've changed the code such that it ignores empty summits (while still throwing an error on malformatted summits). Also, now after throwing the error for malformatted summits the macro only ignores this one entry. (I didn't test these changes, but only done them directly in this answer, they should work but poke me if they don't) – Skillmon Jan 16 '21 at 21:30
  • @SCM also, feel free to leave a comment if you need more explanations (or want me to add comments to the code with more explanations), if time permits I'd add them. – Skillmon Jan 16 '21 at 21:32
  • I just tested it. Works perfectly! Thanks a lot! – SCM Jan 16 '21 at 21:32
  • @SCM if you want to add a table headrow, make these changes in \readmountains_output:N (keep in mind that a space must be added as ~ in between of \ExplSyntaxOn ...\ExplSyntaxOff). To change the formatting of individual rows change the definition of \readmountains_output_single:n (use \use_i:nn #1 to access the name, \use_ii:nn for the height). – Skillmon Jan 16 '21 at 21:42
  • Thanks! Your code was so readable that I was already able doing that on my own. Thansk! – SCM Jan 16 '21 at 22:21