
I have two lists, age and score, for 50 people, and I want to compute the correlation between them. For some people, the score is blank.

Example data:

age={60., 21., 24., 63., 66., 62., 56., 54., 62.,...}
score={720., 880., 980., 820., , 820., 970., 950., 170.,...}

If I simply use DeleteCases to remove the " " from the score list, that list gets shorter, and I cannot correlate two lists of different lengths.

Suggestions?

Mr.Wizard
Shannon
    I think you can only compare the entries that have both values, so: Pick[age, NumericQ /@ score], and then compare with score /. "" -> Sequence[], or use DeleteCases as in your approach. – Kuba Jul 25 '13 at 15:29

2 Answers


First, let's generate some sample data. In your list you actually have an implicit Null rather than the string " " which you also mention. I'll use Null in mine, but the method should be the same whichever it is: Null, " ", "", etc.

age = 1 / Range[10];
score = 10 / age;
score[[{2, 6, 7, 10}]] = Null;

You can pair and filter the lists as Anon showed. I would use Cases or DeleteCases myself, rather than Sequence[]:

DeleteCases[{age, score}\[Transpose], {_, }]
{{1, 10}, {1/3, 30}, {1/4, 40}, {1/5, 50}, {1/8, 80}, {1/9, 90}}
Cases[{age, score}\[Transpose], {_, _?NumberQ}]
{{1, 10}, {1/3, 30}, {1/4, 40}, {1/5, 50}, {1/8, 80}, {1/9, 90}}

Kuba recommended Pick and DeleteCases in a comment:

Pick[age, NumericQ /@ score]
DeleteCases[score, Null]
{1, 1/3, 1/4, 1/5, 1/8, 1/9}

{10, 30, 40, 50, 80, 90}

You could also make use of Position:

pos = Position[score, _?NumberQ];
Extract[#, pos] & /@ {age, score}
{{1, 1/3, 1/4, 1/5, 1/8, 1/9}, {10, 30, 40, 50, 80, 90}}

A different method using Pick:

Pick[{age, score}, {#, #}] &[NumericQ /@ score]
{{1, 1/3, 1/4, 1/5, 1/8, 1/9}, {10, 30, 40, 50, 80, 90}}
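Once both lists have been filtered to the same positions, the correlation asked about in the question can be computed directly. The following is a minimal sketch using the sample data from above; the names mask, ageClean, scoreClean and r are introduced here for illustration and are not part of the original code.

```mathematica
(* Sample data as above; Null marks a missing score *)
age = 1/Range[10];
score = 10/age;
score[[{2, 6, 7, 10}]] = Null;

(* True where a score is present, False where it is missing *)
mask = NumericQ /@ score;

(* Keep the same positions in both lists so the lengths stay equal *)
{ageClean, scoreClean} = Pick[{age, score}, {mask, mask}];

(* Pearson correlation of the two equal-length lists, as a machine number *)
r = N[Correlation[ageClean, scoreClean]]
```

Assigning to new names leaves the original age and score untouched, which may be convenient if the unfiltered data is still needed.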

Performance

SetAttributes[timeAvg, HoldFirst]
timeAvg[func_] := Do[If[# > 0.3, Return[#/5^i]] & @@ Timing@Do[func, {5^i}], {i, 0, 15}]

age = 1/Range[100000];
score = 10/age;
score[[ RandomSample[Range@100000, 10000] ]] = Null;

Transpose[{age, score}] /. {__, } -> Sequence[]                       // timeAvg
Cases[{age, score}\[Transpose], {_, _?NumberQ}]                       // timeAvg
DeleteCases[{age, score}\[Transpose], {_, }]                          // timeAvg
(pos = Position[score, _?NumberQ]; Extract[#, pos] & /@ {age, score}) // timeAvg
{Pick[age, NumericQ /@ score], DeleteCases[score, Null]}              // timeAvg
Pick[{age, score}, {#, #}] &[NumericQ /@ score]                       // timeAvg
0.03432

0.04244

0.02684

0.05616

0.01872

0.01872

Both Pick variants are the fastest on this data (0.01872 seconds each); DeleteCases on the transposed pairs comes next at 0.02684 seconds.
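To re-run a comparison like this without the timeAvg helper, the built-in RepeatedTiming could be used instead, assuming Mathematica 10 or later (it is not available in older versions). This is only a sketch for two of the methods, with the variable names tPick and tDelete introduced here; absolute numbers will differ by machine and version.

```mathematica
(* Same test data as in the benchmark above *)
age = 1/Range[100000];
score = 10/age;
score[[RandomSample[Range[100000], 10000]]] = Null;

(* RepeatedTiming returns {averaged time in seconds, result}; keep the time only *)
tPick = First@RepeatedTiming[(Pick[{age, score}, {#, #}] &[NumericQ /@ score]);];
tDelete = First@RepeatedTiming[DeleteCases[Transpose[{age, score}], {_, Null}];];

{tPick, tDelete}
```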

Mr.Wizard

One way to get rid of the missing elements and the corresponding values from the score list would be this:

tmp=Transpose[{age, score}] /. {__, " "} -> Sequence[]

Then to update age and score you could do this:

{age, score} = Transpose[tmp]
C. E.