Extracting lists from list based on length

Question

I have a list looking like this;

{{"ADS GY Equity", "", "", "ALV GY Equity", "", "", "BAS GY Equity", 
"", "", "BAYN GY Equity", "", "", "BEI GY Equity", "", "", 
"BMW GY Equity", ""}, {"Date", "PX_LAST", "", "Date", "PX_LAST", "",
"Date", "PX_LAST", "", "Date", "PX_LAST", "", "Date", "PX_LAST", 
"", "Date", "PX_LAST"}, {"addidas", 17.925, "", "alvac", 287.292, 
"", "basse", 24.875, "", "bayern", 42.34, "", "begge", 23.667, "", 
"BMW", 29.49}, {{2000, 1, 4, 0, 0, 0.}, 17.5, 
"", {2000, 1, 4, 0, 0, 0.}, 285.934, "", {2000, 1, 4, 0, 0, 0.}, 
23.925, "", {2000, 1, 4, 0, 0, 0.}, 41.216, 
"", {2000, 1, 4, 0, 0, 0.}, 21.333, "", {2000, 1, 4, 0, 0, 0.}, 
28.3}, {{2000, 1, 5, 0, 0, 0.}, 17.5, "", {2000, 1, 5, 0, 0, 0.}, 
294.078, "", {2000, 1, 5, 0, 0, 0.}, 23.375, 
"", {2000, 1, 5, 0, 0, 0.}, 40.176, "", {2000, 1, 5, 0, 0, 0.}, 21.,
 "", {2000, 1, 5, 0, 0, 0.}, 27.74}, {{2000, 1, 6, 0, 0, 0.}, 18.25,
 "", {2000, 1, 6, 0, 0, 0.}, 297.698, "", {2000, 1, 6, 0, 0, 0.}, 
24.015, "", {2000, 1, 6, 0, 0, 0.}, 41.356, 
"", {2000, 1, 6, 0, 0, 0.}, 21.867, "", {2000, 1, 6, 0, 0, 0.}, 
27.65}, {{2000, 1, 7, 0, 0, 0.}, 18., "", {2000, 1, 7, 0, 0, 0.}, 
305.977, "", {2000, 1, 7, 0, 0, 0.}, 25., 
"", {2000, 1, 7, 0, 0, 0.}, 43.08, "", {2000, 1, 7, 0, 0, 0.}, 
22.33, "", {2000, 1, 7, 0, 0, 0.}, 27.6}, {{2000, 1, 10, 0, 0, 0.}, 
18.272, "", {2000, 1, 10, 0, 0, 0.}, 307.742, 
"", {2000, 1, 10, 0, 0, 0.}, 25.11, "", {2000, 1, 10, 0, 0, 0.}, 
44.644, "", {2000, 1, 10, 0, 0, 0.}, 22.667, 
"", {2000, 1, 10, 0, 0, 0.}, 28.7}, {{2000, 1, 11, 0, 0, 0.}, 
18.103, "", "", "", "", {2000, 1, 11, 0, 0, 0.}, 23.995, 
"", {2000, 1, 11, 0, 0, 0.}, 43.323, "", {2000, 1, 11, 0, 0, 0.}, 
21.883, "", {2000, 1, 11, 0, 0, 0.}, 
28.6}, {{2000, 1, 12, 0, 0, 0.}, 18., "", "", "", 
"", {2000, 1, 12, 0, 0, 0.}, 24., "", {2000, 1, 12, 0, 0, 0.}, 
41.45, "", {2000, 1, 12, 0, 0, 0.}, 21.717, 
"", {2000, 1, 12, 0, 0, 0.}, 28.19}, {{2000, 1, 13, 0, 0, 0.}, 
17.462, "", "", "", "", {2000, 1, 13, 0, 0, 0.}, 23.75, 
"", {2000, 1, 13, 0, 0, 0.}, 41.169, "", {2000, 1, 13, 0, 0, 0.}, 
22.163, "", {2000, 1, 13, 0, 0, 0.}, 
27.4}, {{2000, 1, 14, 0, 0, 0.}, 16.837, "", "", "", 
"", {2000, 1, 14, 0, 0, 0.}, 23.75, "", {2000, 1, 14, 0, 0, 0.}, 
42.152, "", {2000, 1, 14, 0, 0, 0.}, 22.583, 
"", {2000, 1, 14, 0, 0, 0.}, 27.2}, {{2000, 1, 17, 0, 0, 0.}, 17., 
"", "", "", "", {2000, 1, 17, 0, 0, 0.}, 23.935, "", "", "", 
"", {2000, 1, 17, 0, 0, 0.}, 23.33, "", {2000, 1, 17, 0, 0, 0.}, 
27.28}, {{2000, 1, 18, 0, 0, 0.}, 17., "", "", "", 
"", {2000, 1, 18, 0, 0, 0.}, 23.975, "", "", "", 
"", {2000, 1, 18, 0, 0, 0.}, 22.667, "", {2000, 1, 18, 0, 0, 0.}, 
27.}, {{2000, 1, 19, 0, 0, 0.}, 16.65, "", "", "", 
"", {2000, 1, 19, 0, 0, 0.}, 24.92, "", "", "", 
"", {2000, 1, 19, 0, 0, 0.}, 23.3, "", {2000, 1, 19, 0, 0, 0.}, 
27.59}}

I want to extract every list which has length 14 or 15. This should leave me with eight lists, because I also want to drop the lists/columns with no values in it.

This is how the output should look like after extraction

  {{{"ADS GY Equity", "", "BAS GY Equity", "", "BEI GY Equity", "", 
 "BMW GY Equity", ""}, {"Date", "PX_LAST", "Date", "PX_LAST", 
 "Date", "PX_LAST", "Date", "PX_LAST"}, {"addidas", 17.925, "basse",
  24.875, "begge", 23.667, "BMW", 29.49}, {{2000, 1, 4, 0, 0, 0.}, 
 17.5, {2000, 1, 4, 0, 0, 0.}, 23.925, {2000, 1, 4, 0, 0, 0.}, 
 21.333, {2000, 1, 4, 0, 0, 0.}, 28.3}, {{2000, 1, 5, 0, 0, 0.}, 
 17.5, {2000, 1, 5, 0, 0, 0.}, 23.375, {2000, 1, 5, 0, 0, 0.}, 
 21., {2000, 1, 5, 0, 0, 0.}, 27.74}, {{2000, 1, 6, 0, 0, 0.}, 
 18.25, {2000, 1, 6, 0, 0, 0.}, 24.015, {2000, 1, 6, 0, 0, 0.}, 
 21.867, {2000, 1, 6, 0, 0, 0.}, 27.65}, {{2000, 1, 7, 0, 0, 0.}, 
  18., {2000, 1, 7, 0, 0, 0.}, 25., {2000, 1, 7, 0, 0, 0.}, 
 22.33, {2000, 1, 7, 0, 0, 0.}, 27.6}, {{2000, 1, 10, 0, 0, 0.}, 
 18.272, {2000, 1, 10, 0, 0, 0.}, 25.11, {2000, 1, 10, 0, 0, 0.}, 
 22.667, {2000, 1, 10, 0, 0, 0.}, 28.7}, {{2000, 1, 11, 0, 0, 0.}, 
 18.103, {2000, 1, 11, 0, 0, 0.}, 23.995, {2000, 1, 11, 0, 0, 0.}, 
 21.883, {2000, 1, 11, 0, 0, 0.}, 28.6}, {{2000, 1, 12, 0, 0, 0.}, 
 18., {2000, 1, 12, 0, 0, 0.}, 24., {2000, 1, 12, 0, 0, 0.}, 
 21.717, {2000, 1, 12, 0, 0, 0.}, 28.19}, {{2000, 1, 13, 0, 0, 0.}, 
 17.462, {2000, 1, 13, 0, 0, 0.}, 23.75, {2000, 1, 13, 0, 0, 0.}, 
 22.163, {2000, 1, 13, 0, 0, 0.}, 27.4}, {{2000, 1, 14, 0, 0, 0.}, 
 16.837, {2000, 1, 14, 0, 0, 0.}, 23.75, {2000, 1, 14, 0, 0, 0.}, 
 22.583, {2000, 1, 14, 0, 0, 0.}, 27.2}, {{2000, 1, 17, 0, 0, 0.}, 
 17., {2000, 1, 17, 0, 0, 0.}, 23.935, {2000, 1, 17, 0, 0, 0.}, 
 23.33, {2000, 1, 17, 0, 0, 0.}, 27.28}, {{2000, 1, 18, 0, 0, 0.}, 
 17., {2000, 1, 18, 0, 0, 0.}, 23.975, {2000, 1, 18, 0, 0, 0.}, 
 22.667, {2000, 1, 18, 0, 0, 0.}, 27.}, {{2000, 1, 19, 0, 0, 0.}, 
 16.65, {2000, 1, 19, 0, 0, 0.}, 24.92, {2000, 1, 19, 0, 0, 0.}, 
 23.3, {2000, 1, 19, 0, 0, 0.}, 27.59}}}

If I where to explain simply what has been done, It only extracted the companies date and closing price where the number of closing prices was equal to the number of longest number of closing prices for any of the companies.

If I delete all fields with value "" none of the lists are length 14 or 15. With your data as dat Length /@ DeleteCases[dat, "", {2}] yields {6, 12, 12, 12, 12, 12, 12, 12, 10, 10, 10, 10, 8, 8, 8} — Mr.Wizard, Aug 31 '13 at 05:56
To make it simple I want to pick out every stock, all its input. But I want to drop the empty columns as well as the stocks with dates not as late as the latest date. — ALEXANDER, Aug 31 '13 at 06:03
This means that it should drop ALV GY Equity, its column with dates and close price, the same with BAYN GY Equity. This is because the last date in this lists is not equal to 19/01/00. In other words they do not have equal amount of rows. — ALEXANDER, Aug 31 '13 at 06:05
The list is going to grow larger and larger, so I want a function that I can run to extract the values automatically. — ALEXANDER, Aug 31 '13 at 06:06
I'm sorry, but I still don't understand. Could you please give a somewhat shorter example and the exact output you desire? — Mr.Wizard, Aug 31 '13 at 06:09

score 4 · Accepted Answer · edited Apr 13 '17 at 12:55

4

Starting with your data assigned to dat please try this:

tdat = Partition[dat\[Transpose], 2, 3];

newdat = Join @@ DeleteCases[tdat, {{__, ""}, _}]\[Transpose];

I am going by the assumption that any primary column ending with "" means that it is short and should be pared from the table.

Sample:

newdat // MatrixForm

enter image description here

Explanation

As requested here is an explanation of my code. Due to the format of tables in Mathematica it is usually much easier to operate on rows rather than columns therefore my first step is to Transpose the data. (\[Transpose] is a special transpose symbol; see the linked documentation.)

When looking at the data a structure becomes apparent: there are two columns of data in each group, and a spacing or dividing column containing nothing but "". (Indicentially these dividing columns are completely omitted in my output; if you need them, ask, and I'll put them back in.) I therefore Partition the data into groups of two columns, moving three places between each group and thereby skipping the "empty" dividing columns. I call this data tdat.

Now that the data is in a row-based form and grouped we can use fairly simple pattern matching to extract only the full-length sections. It is simpler in this case to delete the unwanted ones so I use DeleteCases. My pattern is:

{{__, ""}, _}

This pattern will be matched to each group of two rows (originally columns). It is a literal list with two elements. The second is _ which is short for Blank[] and matches any single expression; I use this because I want to allow any second column. The first element is {__, ""} which matches a list (here row, originally column) that starts with any series of elements (__, short for BlankSequence[]) and ends with literal "". We therefore match any data group the first row (column) of which ends with "" and delete these.

Finally, the data groups are joined into a single table using Join and @@ (short for Apply), then transposed back to column form. Note that the low precedence of the \[Transpose] operator means that this transpose is done after the Join.

By the way, if you are doing this on a large volume of data there will be faster methods than pattern matching, perhaps using Pick in place of the DeleteCases step. However, I find pattern matching more versatile and concise, and therefore the best place to start unless speed or large data is a priority.

edited Apr 13 '17 at 12:55

Community

1

answered Aug 31 '13 at 06:40

Mr.Wizard

271,378
34
587
1,371

Thank you! Could you explain step by step the logic behind? – ALEXANDER Aug 31 '13 at 06:45
When you say paired from the table does it mean that you are dropping every row with "" in it or does it mean that you are joining them? – ALEXANDER Aug 31 '13 at 06:58
@ALEXANDER I am dropping them from the table. – Mr.Wizard Aug 31 '13 at 07:02
Then its exactly how I want it. – ALEXANDER Aug 31 '13 at 07:02
btw \ does not work when I am writing it. It becomes a small t – ALEXANDER Aug 31 '13 at 07:09
When I use classroom assistance I am able to put in "" but I get an error message when running the function. – ALEXANDER Aug 31 '13 at 07:13
@ALEXANDER Please see my explanation. When you paste \[Transpose] into Mathematica it should indeed become a small capital T -- it is a special transpose operator. If you do not like this you can use e.g. newdat = Transpose[ Join @@ DeleteCases[tdat, {{__, ""}, _}] ]; – Mr.Wizard Aug 31 '13 at 07:18
Could you give me the first function in this output format as well? – ALEXANDER Aug 31 '13 at 07:25
@ALEXANDER it would be: tdat = Partition[Transpose[dat], 2, 3]; -- sorry for the confusion. – Mr.Wizard Aug 31 '13 at 07:28
No its great, the more confused I get and the more I understand after the explanation, usually means that I have learned more! Not only did I learn how extract, but also what small capital T means. Thank you very much! – ALEXANDER Aug 31 '13 at 07:33
Do you know where I can go to see good examples for how to extract and manipulate lists? – ALEXANDER Aug 31 '13 at 07:35
@ALEXANDER I think this site itself has many good examples. You can search for questions with a specific tag by enclosing the tag name in square brackets; here is a search for the word "extract" within the list-manipulation tag: search. See also the basic tutorial on this topic. – Mr.Wizard Aug 31 '13 at 07:53
I am still struggling understanding the pattern matching. If we divide; {{, ""}, _}.. Does { - Represent column 1, "" - Represent column 2, and __ - Represent column 3 and 4 – ALEXANDER Aug 31 '13 at 11:03
@ALEXANDER In my use there is no column 3 or 4; there is only a group of two columns. {__, ""} represents the first column and _ the second. – Mr.Wizard Aug 31 '13 at 11:38
does that mean that the last part _) --> represents group two? – ALEXANDER Aug 31 '13 at 11:39
@ALEXANDER Each group is separately checked against the entire pattern {{__, ""}, _} -- if it matches it is discarded, if it does not match it is kept. I'm going to bed so I won't be able to answer more questions today. – Mr.Wizard Aug 31 '13 at 11:53
Okey! That means that in some groups the pattern doesnt match, would you give an example of how one row where the pattern does not match look like. Looking forward to hear from you! – ALEXANDER Aug 31 '13 at 12:24
@ALEXANDER please meet me in Chat. – Mr.Wizard Sep 01 '13 at 10:40
@ALEXANDER I guess I missed you. I'll try again later. – Mr.Wizard Sep 01 '13 at 12:34

Extracting lists from list based on length

1 Answers1

Explanation

Linked