Split dataset based on the first column

Question

I have a dataset which I want to split based on the first column.

The data looks like this:

AA =
 {{"Symbol", "Full Name", "Date", "Open", "High", "Low", "Close", 
  "Adj Close", "Volume", "Type"},
 {"KO", "The Coca-Cola Company", {2013, 7, 19, 0, 0, 0.}, 40.88, 41.1,
   40.79, 41.09, 41.09, 1.14011*10^7, "Stock"},
 {"KO", "The Coca-Cola Company", {2013, 7, 18, 0, 0, 0.}, 40.86, 
  41.07, 40.74, 40.81, 40.81, 9.7088*10^6, "Stock"},
 {"KO", "The Coca-Cola Company", {2013, 7, 17, 0, 0, 0.}, 40.55, 
  40.98, 40.31, 40.84, 40.84, 1.85131*10^7, "Stock"},
 {"KO", "The Coca-Cola Company", {2013, 7, 16, 0, 0, 0.}, 39.78, 40.5,
   39.5, 40.23, 40.23, 3.35756*10^7, "Stock"},
 {"KO", "The Coca-Cola Company", {2013, 7, 15, 0, 0, 0.}, 41.05, 
  41.25, 40.93, 41.01, 41.01, 1.14184*10^7, "Stock"},
 {"KO", "The Coca-Cola Company", {2013, 7, 12, 0, 0, 0.}, 41.03, 
  41.13, 40.73, 41.03, 41.03, 1.06864*10^7, "Stock"},
 {"MCD", "McDonald", {2013, 7, 19, 0, 0, 0.}, 100.2, 100.41, 99.53, 
  100.27, 100.27, 4.5083*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 18, 0, 0, 0.}, 100.48, 100.77, 99.99, 
  100.18, 100.18, 3.4016*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 17, 0, 0, 0.}, 100.05, 100.35, 99.3, 
  100.1, 100.1, 5.3774*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 16, 0, 0, 0.}, 100.18, 101.12, 99.47, 
  100.88, 100.88, 4.4062*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 15, 0, 0, 0.}, 101.6, 101.73, 100.7, 
  100.75, 100.75, 4.4807*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 12, 0, 0, 0.}, 100.59, 101.81, 100.5, 
  101.58, 101.58, 4.7668*10^6, "Stock"},
 {"MCD", "McDonald", {2013, 7, 11, 0, 0, 0.}, 100.75, 100.96, 99.76, 
  100.79, 100.79, 4.0641*10^6, "Stock"}}

I've been able to do it:

Split[AA, First[#1] === First[#2] &]

But I do not understand the theory behind it. This is how far I got with the documentation:

Split[list, test] treats pairs of adjacent elements as identical whenever applying the function test to them yields True.

From this I get that Split is a function where AA is my list, and that it splits the data into groups when the test yields true.

But I get confused here.

First[#1] === First[#2] &

First[{a, b, c}]

a

Based on this I get that it picks the first element of the list within the list, in my case "KO". This means that when First[#1] equals First[#2], it yields True. However, I don't understand the function of #.

How does this predicate function work? Does it take

Row 1 column 1 = Row 2 column 1 --> False
Row 2 column 1 = Row 3 column 1 --> True 
Row 3 Column 1 = Row 4 column 1 --> True 
...

and then group all the True and all the False results together and create separate lists?

I'm just not sure what #1 and #2 do in the list, and why & needs to be added at the end. Here is the same function again:

Split[AA, First[#1] === First[#2] &]

Hello. # is the Slot. You may want to take a look at Slot and PureFunction in documentation. Also, this part of common pitfalls awaiting new users is going to help you. Next time please check this in order to know how to format your question. — Kuba, Jul 22 '13 at 06:28
Thank you for the reply! I have looked at the reference; I always do before I ask a question. I do not understand what the reference is stating. Is my assumption about how the True/False function works correct, and what does Slot actually mean? — ALEXANDER, Jul 22 '13 at 06:48
No, you are not right, but close. The False results given by the test are not grouped at the end; they mark where the list is split. Notice that the test "is applied" n-1 times, where n = Length@list. For example, Split[{1,1,2,2}, #1==#2&] gives the test sequence {True, False, True}, so you can see that it is not going to "group" all the sequences of False and True. Also, notice that === is SameQ, not Equal. If you are new to Mathematica you will need to spend some time with the documentation. We are certainly going to help you, but with more complicated problems. — Kuba, Jul 22 '13 at 07:03
In cases like this one I'd prefer to use GatherBy: GatherBy[AA, First] — bobknight, Jul 22 '13 at 08:10
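
The test sequence mentioned in the comments can be reproduced by hand. The following is a minimal sketch: the name list is introduced here for illustration, and Partition is used only to expose the adjacent pairs, which is an assumption about how to visualise the process rather than a description of how Split is implemented internally.

```mathematica
(* A small illustrative list, not taken from the question's data *)
list = {1, 1, 2, 2};

(* Apply the test to each pair of adjacent elements:
   Partition[list, 2, 1] gives {{1, 1}, {1, 2}, {2, 2}} *)
(#1 == #2 &) @@@ Partition[list, 2, 1]
(* {True, False, True} *)

(* Split starts a new sublist wherever the test gives False *)
Split[list, #1 == #2 &]
(* {{1, 1}, {2, 2}} *)
```

Each False marks a boundary between sublists; the True and False values themselves are never collected into groups.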

score 6 · Accepted Answer · answered Jul 22 '13 at 08:08

You seem to be on the right track but are having some trouble with the anonymous function syntax. I highly recommend you read up on this, as it's a very useful concept. Essentially, #1+#2& is shorthand for Function[{arg1,arg2},arg1+arg2]: #1 and #2 stand for the first and second arguments, and & marks the end of the function body. So a good place to start is the documentation for Function.
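
To make the shorthand concrete, here is a small sketch comparing the equivalent forms and then applying the question's predicate by hand. The names f1, f2, f3 and sameSymbol, and the two-column rows, are made up for illustration.

```mathematica
(* Three equivalent ways to write "add two arguments" *)
f1 = (#1 + #2 &);               (* slots, terminated by & *)
f2 = Function[{x, y}, x + y];   (* named arguments *)
f3 = Function[#1 + #2];         (* slots without the & shorthand *)

{f1[1, 2], f2[1, 2], f3[1, 2]}
(* {3, 3, 3} *)

(* The predicate from the question, applied to two rows at a time.
   #1 is the first row passed in, #2 the second. *)
sameSymbol = (First[#1] === First[#2] &);

sameSymbol[{"KO", 41.09}, {"KO", 40.81}]
(* True *)

sameSymbol[{"KO", 41.09}, {"MCD", 100.27}]
(* False *)
```

Without the trailing &, an expression such as First[#1] === First[#2] is just an expression containing slots and not a function that can be applied to arguments.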

Now to the problem at hand. As I understand it, you want to group your list so that you get one list containing all entries starting with "KO", one with all the entries starting with "MCD", and so forth. In that case you should not be using Split on its own, since it only separates out runs of adjacent identical elements, e.g. Split[{a,a,b,a,a}] => {{a,a},{b},{a,a}}. So what you need to do is first sort your list, and then split it.

It happens that there are convenient functions for sorting and splitting based on the output of a function, so you don't actually need the First[#1] === First[#2] & syntax that you find confusing. Still, I would again highly recommend you read up on it. Anyway, here's an example of code that solves your problem:

 sortedAA = SortBy[AA, First];
 splitAA  = SplitBy[sortedAA, First];

First we sort the list so that all entries with the same first element appear next to each other, e.g. Sort[{a,a,b,a,a}] => {a,a,a,a,b}; then we split this into sublists, e.g. Split[{a,a,a,a,b}] => {{a,a,a,a},{b}}.
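
The difference between the approaches is easiest to see when rows with the same symbol are not adjacent. Below is a small sketch on a made-up miniature of the data (the name data and its interleaved two-column rows are assumptions for illustration, not the question's AA), which also includes the GatherBy alternative suggested in the comments.

```mathematica
(* Hypothetical miniature of the data: symbol and closing price, interleaved *)
data = {{"KO", 41.09}, {"MCD", 100.27}, {"KO", 40.81}, {"MCD", 100.18}};

(* Split alone only groups adjacent rows, so nothing is merged here *)
Split[data, First[#1] === First[#2] &]
(* {{{"KO", 41.09}}, {{"MCD", 100.27}}, {{"KO", 40.81}}, {{"MCD", 100.18}}} *)

(* Sorting first brings equal symbols together; rows with the same
   symbol are then ordered by their remaining entries *)
SplitBy[SortBy[data, First], First]
(* {{{"KO", 40.81}, {"KO", 41.09}}, {{"MCD", 100.18}, {"MCD", 100.27}}} *)

(* GatherBy groups without sorting and keeps the original row order *)
GatherBy[data, First]
(* {{{"KO", 41.09}, {"KO", 40.81}}, {{"MCD", 100.27}, {"MCD", 100.18}}} *)
```

Note that with the full AA from the question, the header row {"Symbol", ...} ends up in a group of its own with any of these approaches, so it may be worth dropping it first, for example with Rest[AA].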
