
I have a huge CSV file with about 25.5k records that (structurally) looks like the sample below. In essence, I want to display the values of the columns U,V,W,X,Y,Z of each row as a colored box. The color of each box should represent its value, i.e. the value dictates what color along some gradient is to be drawn. E.g.: The larger the value, the darker the box.

Now, there are a few things that I'd like to highlight using colors:

  • Group names should be written to the left of the item names, turned 90 degrees, once per group. (items are sorted by group)
  • The background colors of groups (and their items) should alternate
  • Within a group, item's background colors should alternate
  • There is a column special, which when TRUE/1 should change the item's bg-color(map) to a special/alternative color for the current group's bg-color. In total there are two possible group bg-colors, and four possible item bg-colors.
  • Also, when special is TRUE/1, the value color should be picked from a special/alternative gradient. In total there are two gradients/colormaps to pick from.
  • The labels/columns U,V,W,X,Y,Z at the bottom should also be colored with alternating bg-colors.
  • The labels/columns at the bottom should be at a 90 degree angle, and aligned to the right (or top, in absolute terms).
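
The coloring rules in the list above could be encoded roughly as follows. This is only a sketch: the function name and the style strings are made up for illustration. It keeps both the group parity and the item parity, which gives up to eight item styles; collapsing these to the four item colors mentioned above would mean dropping one of the parities, which is left open here.

```python
def row_style(group_index, item_index, special):
    """Return (group_bg, item_bg, gradient) names for one row.

    All names are hypothetical labels, not existing TikZ or pgfplots keys.
    """
    # Group background alternates with the position of the group.
    group_bg = f"group{group_index % 2}"
    # A special row switches to the alternative color of its group.
    variant = "/special" if special else ""
    # Item background alternates with the position of the item in its group.
    item_bg = f"{group_bg}{variant}/item{item_index % 2}"
    # Special rows also draw their value cells from the second gradient.
    gradient = "special" if special else "normal"
    return group_bg, item_bg, gradient


# Fourth item (index 3) of the first group, marked special.
print(row_style(0, 3, True))
```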

Since the number of rows is so large, and I want the complete drawing to fit on a single page, the rows will need to be quite thin. (The columns should fit without compression.) Nevertheless, I do want the group and item text in there, because the dedicated reader should be able to zoom into the digital version of the final document, while regular/paper readers can obtain sufficient information from the alternating group colors.
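
To get a feel for how thin the rows would have to be, here is a rough calculation. It assumes an A4 page (297 mm tall) with 10 mm margins at top and bottom; neither figure is a given, and the function name is made up. Under those assumptions each of the 25.5k rows ends up roughly 0.03pt tall, so the text is only readable when zooming into a vector PDF.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.27  # TeX points per inch


def row_height_pt(rows, page_height_mm=297.0, margin_mm=10.0):
    """Height available per row in TeX points (page size and margins are assumptions)."""
    usable_mm = page_height_mm - 2 * margin_mm
    usable_pt = usable_mm / MM_PER_INCH * PT_PER_INCH
    return usable_pt / rows


height = row_height_pt(25_500)
print(f"{height:.4f} pt per row")
```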

Sadly, I have no idea how to start solving this problem.

conceptual rendering (done in a spreadsheet):
This is roughly what I hope to achieve. (Minus the grid lines.)

[image: conceptual rendering of the desired output]

sample csv:

group,item,special,U,V,W,X,Y,Z
a,a1,0,0.2,,0.2,,,
a,a2,0,,0.1,,,0.4,1
a,a3,0,,0.5,,,,
a,a4*,1,0.1,,0.8,,,
a,a5*,1,,,,,0.5,0.5
a,a6,0,,,,0.3,,
b,b1,0,,0.1,,,,
b,b2,0,0.6,,,0.4,,
b,b3*,1,,,0.4,,,
c,c1*,1,,,,,,0.1
c,c2*,1,,0.2,,,0.3,0.2
c,c3,0,,,0.7,,,
c,c4,0,,0.6,,0.3,,
c,c5,0,,,,,,0.7
...

columns explained:

  • group: String
  • item: String
  • special: Boolean
  • U,V,W,X,Y,Z: real number in the range 0..1
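
For reference, the column types above could be read in along these lines. This is a minimal sketch: the `Row` type and `parse_rows` function are made-up names, empty cells are represented as `None`, and both `1` and `TRUE` are accepted for the special column.

```python
import csv
import io
from typing import NamedTuple

SAMPLE = """group,item,special,U,V,W,X,Y,Z
a,a1,0,0.2,,0.2,,,
a,a4*,1,0.1,,0.8,,,
"""


class Row(NamedTuple):
    group: str
    item: str
    special: bool
    values: tuple  # one float or None per value column


def parse_rows(text):
    """Parse CSV text into typed rows; empty value cells become None."""
    reader = csv.DictReader(io.StringIO(text))
    value_cols = reader.fieldnames[3:]  # everything after group,item,special
    rows = []
    for rec in reader:
        values = tuple(float(rec[c]) if rec[c] else None for c in value_cols)
        special = rec["special"].strip().upper() in ("1", "TRUE")
        rows.append(Row(rec["group"], rec["item"], special, values))
    return rows


rows = parse_rows(SAMPLE)
print(rows[0])
```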

1 Answer


EDIT: After the OP stressed again that he wants to use an existing colormap from pgfplots, I think he might be better off using \pgfplotstabletypeset with a bit of preprocessing of the CSV.

However, if, like me, you are more comfortable with TikZ/PGF and Python, the answer below should provide more flexibility. The one thing I'm not satisfied with in this answer is that the table will not adjust automatically if a group label is too long.


in.csv:

group,item,special,U,V,W,X,Y,Z
a,a1,0,0.2,,0.2,,,
a,a2,0,,0.1,,,0.4,1
a,a3,0,,0.5,,,,
a,a4*,1,0.1,,0.8,,,
a,a5*,1,,,,,0.5,0.5
a,a6,0,,,,0.3,,
b,b1,0,,0.1,,,,
b,b2,0,0.6,,,0.4,,
b,b3*,1,,,0.4,,,
c,c1*,1,,,,,,0.1
c,c2*,1,,0.2,,,0.3,0.2
c,c3,0,,,0.7,,,
c,c4,0,,0.6,,0.3,,
c,c5,0,,,,,,0.7

main.py:

import csv

# read csv into matrix
with open('in.csv') as csvf:
    f = csv.reader(csvf)
    a = [r for r in f]

# separate the header
header = a[0]
a = a[1:]

# separate by group
def sep_group(a):
    cur_group = None
    groups = []
    for r in a:
        # new group?
        if cur_group != r[0]:
            groups.append([r])
            cur_group = r[0]
        else:
            groups[-1].append(r)
    return groups
a = sep_group(a)

# calculating shade based on value in csv
def shade(x):
    # always return a string; round() avoids float truncation (e.g. 0.29*100 -> 28)
    return '0' if x == '' else str(round(float(x)*100))

# output while traversing data
with open('out.tex','w') as f:
    f.write('\\matrix [nodes={cell}] {\n')
    for gi,g in enumerate(a):
        for ri,r in enumerate(g):
            # empty node to place group label upon later
            f.write('\\node{}{} {{}}; &\n'.format(
                ' [alias=g{}begin]'.format(gi) if ri == 0 else '',
                ' [alias=g{}end]'.format(gi) if ri == len(g)-1 else ''))
            # item label
            f.write('\\node [group{}{}/item{},minimum width=3em] {{{}}}; &\n'.format(gi%2,'/special' if r[2] == '1' else '',ri%2,r[1]))
            # cells in the row
            f.write(' &\n'.join(
                '\\node [fill={}!{}] {{}};'.format('Emerald' if r[2]=='1' else 'black',shade(c))
                for c in r[3:]))
            f.write(' \\\\\n')
    # footer
    f.write('&')
    # take the footer labels from the CSV header instead of the leaked loop variable g
    for ci,name in enumerate(header[3:]):
        f.write('&\n\\node [footer{},rotate=90] {{{}}}; '.format(ci%2,name))
    f.write('\\\\\n')
    # end matrix
    f.write('};\n')
    # now overlay the group labels
    for gi,g in enumerate(a):
        f.write('\\node (last) [inner sep=0,group{},fit=(g{}begin) (g{}end)] {{}};\n'.format(gi%2,gi,gi))
        f.write('\\node [rotate=90,anchor=mid] at (last) {{{}}};\n'.format(g[0][0]))

running main.py produces out.tex:

\matrix [nodes={cell}] {
\node [alias=g0begin] {}; &
\node [group0/item0,minimum width=3em] {a1}; &
\node [fill=black!20] {}; &
\node [fill=black!0] {}; &
\node [fill=black!20] {}; &
\node [fill=black!0] {}; &
\node [fill=black!0] {}; &
\node [fill=black!0] {}; \\
\node {}; &
\node [group0/item1,minimum width=3em] {a2}; &
\node [fill=black!0] {}; &
...

out.tex is used in main.tex:

\documentclass{article}
\usepackage[dvipsnames]{xcolor}
\usepackage{tikz}
\usepackage[active,tightpage]{preview}
\usetikzlibrary{matrix}
\usetikzlibrary{fit}

\PreviewEnvironment{tikzpicture}
\setlength\PreviewBorder{5pt}

\begin{document}
\tikzset{
    cell/.style={outer sep=0pt, minimum size=2em},
    group0/.style={fill=blue!30},
    group0/.cd,
        item0/.style={fill=blue!30},
        item1/.style={fill=blue!60},
        special/.cd,
            item0/.style={fill=Emerald!30},
            item1/.style={fill=Emerald!60},
    %
    /tikz/.cd,
    group1/.style={fill=Dandelion!30},
    group1/.cd,
        item0/.style={fill=Dandelion!30},
        item1/.style={fill=Dandelion!60},
        special/.cd,
            item0/.style={fill=LimeGreen!30},
            item1/.style={fill=LimeGreen!60},
    %
    /tikz/.cd,
    footer0/.style={fill=YellowGreen!60},
    footer1/.style={fill=RedOrange!60},
}

\begin{tikzpicture}
    \input{out.tex}
\end{tikzpicture}
\end{document}

result:

[image: rendered output]

This shows the matrix painted in shades according to the values in the CSV file. I tried to use colors as similar to the OP's as possible.
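
The mapping from a CSV cell to a shade can be looked at in isolation. The sketch below uses a made-up helper name, `fill_option`, and builds an xcolor mix expression such as `black!20`, where the number is the percentage of the base color. It uses `round` so that a value like 0.29 lands on 29 despite floating-point error.

```python
def fill_option(cell, special):
    """Build a TikZ fill option for one CSV cell (hypothetical helper).

    cell is the raw CSV string; an empty cell gets 0% of the base color.
    """
    base = "Emerald" if special else "black"
    percent = 0 if cell == "" else round(float(cell) * 100)
    return f"fill={base}!{percent}"


print(fill_option("0.2", False))
print(fill_option("0.8", True))
```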

  • edits to add header and row labels to this answer are welcomed, as I might be too busy to modify this. – Apiwat Chantawibul Jan 15 '14 at 16:03
  • I also haven't alternated the color between adjacent rows yet. – Apiwat Chantawibul Jan 15 '14 at 16:09
  • updated, only 1 thing left: group label – Apiwat Chantawibul Jan 15 '14 at 16:40
  • Looks good so far. Does it work with pgfplots's colormaps as well? N.B.: I only need two color(map)s to pick from for the value cells. For the item column, there should be four, though. I updated the question to reflect this clarification. – derabbink Jan 15 '14 at 17:29
  • Actually, I have not used pgfplots before, but since pgfplots is based on tikz/pgf, which is what I am using here, I assumed that it would work. Can you send me a link to documentation or an example of a colormap? – Apiwat Chantawibul Jan 15 '14 at 17:40
  • I am adding the group label, but you raised a good point about colormaps, which should be checked before I get any deeper into this. I'm waiting. – Apiwat Chantawibul Jan 15 '14 at 17:42
  • found colormap documentation, http://www.bakoma-tex.com/doc/latex/pgfplots/pgfplots.pdf. Seeing those examples, it should work together. I'm experimenting. – Apiwat Chantawibul Jan 15 '14 at 17:47
  • actually, this question is very relevant: Drawing heatmaps using TikZ – Apiwat Chantawibul Jan 15 '14 at 17:59
  • The only thing missing from \pgfplotstabletypeset (as I see in those questions) is how to do a multirow cell for the group label. The complex coloring scheme in this question can be achieved by defining a custom colormap with some discontinuous points to account for the 'special & normal' coloring, with the values in the CSV preprocessed (maybe by Python) accordingly. This seems like a better solution than mine, but it has gone outside my comfort zone. So, I'm going to leave this where it is now. – Apiwat Chantawibul Jan 15 '14 at 18:16
  • Since this answer is still incomplete I will not accept it yet. However, to recognize your effort I have awarded you the bounty. – derabbink Jan 22 '14 at 13:21
  • @derabbink I got time to come back and finish this off. What do you think? – Apiwat Chantawibul Feb 06 '14 at 22:34
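
The preprocessing idea raised in the comments (one colormap with a discontinuity, fed with shifted values) might look something like this on the Python side. It is only a sketch under assumptions: the function name and the offset of 1 are made up, so normal rows stay in 0..1 and special rows move to 1..2. The matching colormap definition on the pgfplots side is not shown.

```python
import csv
import io


def shift_special(text, offset=1.0):
    """Return CSV text in which the values of special rows are shifted by offset.

    Assumes the column order group,item,special,<values...> from the question.
    """
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header unchanged
    for row in reader:
        if row[2] == "1":
            # shift only the non-empty value cells
            row[3:] = [str(float(c) + offset) if c else c for c in row[3:]]
        writer.writerow(row)
    return out.getvalue()


print(shift_special("group,item,special,U,V\na,a1,0,0.2,\na,a4*,1,0.1,\n"))
```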