Simple way to concatenate Dataset columns

Question

Consider these two datasets:

d1 = Transpose@Dataset[<|"a" -> Range[5]|>];
d2 = Transpose@Dataset[<|"b" -> Range[5, 1, -1]|>];
{d1, d2}

What's the simplest way to concatenate them "horizontally" and get the following?

[image: a single Dataset with the two columns "a" and "b" side by side]

Is there a way without explicitly extracting the contents of the datasets, i.e. resorting to Normal?

I would have expected Join[d1, d2, 2] to work, but it doesn't. Dataset@Join[Normal@d1, Normal@d2, 2] works but it's complicated. Transpose@Join[Transpose[d1], Transpose[d2]] is also complicated. For plain old matrices (lists of lists) I'd just use ArrayFlatten, which doesn't work on Datasets.

I have the same question for the case where the rows are labelled too:

d1 = Dataset[<|"x" -> <|"a" -> 1|>, "y" -> <|"a" -> 2|>, "z" -> <|"a" -> 3|>|>];
d2 = Dataset[<|"x" -> <|"b" -> 4|>, "y" -> <|"b" -> 5|>, "z" -> <|"b" -> 6|>|>];
{d1, d2}

Assume an identical number of rows and identical row labels between d1 and d2.

+1. I regard the Association[.] as the more fundamental data structure and I use Dataset[.] as a mere wrapper to limit large outputs. — Romke Bontekoe, May 19 '15 at 14:31
By the way I think it could be considered a bug that Join[d1, d2, 2] does not work given that Join otherwise does. Have you filed a report? — Mr.Wizard, May 20 '15 at 08:02
@RomkeBontekoe, Association:brick ::Dataset:building. The functionality is sophisticated, for example, I wrote 1-line recursive Trie constructor Query to index (reconstruct) a variable-depth file system tree. — alancalvitti, May 26 '15 at 21:48

Mr.Wizard · Answer 1 · 2015-05-20T14:03:57.267

12

This looks nicer in a Notebook:

Join[d1\[Transpose], d2\[Transpose]]\[Transpose]

Unfortunately transposing a Dataset is very slow. Gordon Coale's alternative is much faster, but the original Dataset@Join[Normal@d1, Normal@d2, 2] is more than an order of magnitude faster than that.
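If the Normal-based form is the one to use, it can be wrapped in a small helper so the call site stays short. The following is only a minimal sketch: the name joinColumns is hypothetical (not a built-in), and it assumes every argument is a Dataset with the same number of rows, as in the question's first example.

```mathematica
(* joinColumns is a hypothetical helper name, not part of the Wolfram Language. *)
(* It unwraps each Dataset with Normal, joins the rows at level 2, *)
(* and wraps the result in a Dataset again. *)
joinColumns[ds__Dataset] := Dataset[Join[Sequence @@ (Normal /@ {ds}), 2]]

(* The two single-column datasets from the question *)
d1 = Transpose@Dataset[<|"a" -> Range[5]|>];
d2 = Transpose@Dataset[<|"b" -> Range[5, 1, -1]|>];

joined = joinColumns[d1, d2]
```

Because the pattern is ds__Dataset, the same definition accepts more than two datasets, for example joinColumns[d1, d2, d3].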

edited May 20 '15 at 14:03

answered May 19 '15 at 14:12

Mr.Wizard

So I didn't overlook anything then, and this is really the shortest way. I'll just make a function for it. This problem came up when processing a dataset in two different ways, ending up with two datasets of compatible shapes but different contents (columns). It is often inconvenient and sometimes very cumbersome to redo the calculations in a way that produces a single dataset in one go. It's simpler to just continue using what I have and combine them into a single dataset. – Szabolcs May 19 '15 at 18:17
@Szabolcs Damn, I missed Transpose@Join[Transpose[d1], Transpose[d2]] in the question and I'm guessing five voters did too. Should I just delete this? – Mr.Wizard May 19 '15 at 18:56
I upvoted based on the "looks nicer part" ;) – Gordon Coale May 20 '15 at 08:00
@Gordon Okay :-) – Mr.Wizard May 20 '15 at 08:00
@Mr.Wizard No, keep it. – Szabolcs May 20 '15 at 08:58
I am actually surprised that Join works on Dataset-s, since the documentation states that Join works on Association objects (in Details). There is no mention of Dataset-s. – Romke Bontekoe May 20 '15 at 15:01
@Romke Much of the new functionality is still in development and largely undocumented. The best course of action is, in my opinion, to simply try stuff and see what works. – Mr.Wizard May 20 '15 at 15:32

score 9 · Answer 2 · answered May 20 '15 at 08:54

This is fugly but fulfils the need of staying in the Dataset domain, and it is much quicker for large datasets. Using the analogy of a Dataset as an SQL table, I do what I would do in the same situation there: create a dummy key on each, join, then drop the dummy key. Personally, for small Datasets I prefer the @Mr.Wizard approach from a readability perspective :D

d1 = Transpose@Dataset[<|"a" -> Range[50000]|>];
d2 = Transpose@Dataset[<|"b" -> Range[50000, 1, -1]|>];

{
 (* JoinAcross approach: add the row index as a "dummy" key, join on it, keep only the data columns *)
 JoinAcross[
    d1[MapIndexed[Append[#1, "dummy" -> First@#2] &]],
    d2[MapIndexed[Append[#1, "dummy" -> First@#2] &]],
    "dummy"][All, {"a", "b"}] // AbsoluteTiming,
 (* Transpose approach *)
 Join[d1\[Transpose], d2\[Transpose]]\[Transpose] // AbsoluteTiming
}
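For comparison, the Normal-based expression from the question can be timed on the same 50000-row data. This is only a sketch of how one might measure it; the variable names time and result are illustrative, and no timing figure is given here because it depends on the machine and the version.

```mathematica
(* Same test data as above *)
d1 = Transpose@Dataset[<|"a" -> Range[50000]|>];
d2 = Transpose@Dataset[<|"b" -> Range[50000, 1, -1]|>];

(* AbsoluteTiming returns {seconds, value}; keep both parts *)
{time, result} = AbsoluteTiming[Dataset@Join[Normal@d1, Normal@d2, 2]];
time
```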

This demonstrates some interesting functionality but now that I test it Szabolcs's original Dataset@Join[Normal@d1, Normal@d2, 2] is more than an order of magnitude faster on your own example. — Mr.Wizard, May 20 '15 at 14:01

score 5 · Answer 3 · answered Sep 29 '19 at 20:35

In version 12 Join[d1, d2, 2] seems to work, albeit a bit slower than Szabolcs's Dataset@Join[Normal@d1, Normal@d2, 2].
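One way to check this on a given installation is to compare the direct Join against the Normal-based form. This is a sketch only: it assumes version 12 or later for the direct Join, and the names viaJoin and viaNormal are illustrative. On versions where Join[d1, d2, 2] is not supported, the comparison should simply not give True.

```mathematica
d1 = Transpose@Dataset[<|"a" -> Range[5]|>];
d2 = Transpose@Dataset[<|"b" -> Range[5, 1, -1]|>];

(* Direct join at level 2; reported above to work in version 12 *)
viaJoin = Join[d1, d2, 2];

(* Normal-based form from the question *)
viaNormal = Dataset@Join[Normal@d1, Normal@d2, 2];

(* Compare the underlying data rather than the Dataset wrappers *)
Normal[viaJoin] === Normal[viaNormal]
```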

wigg0t
