
Sometimes, when dealing with arrays that contain intermittent non-numerical values (e.g., NaNs imported from external sources), common arithmetic functions (e.g., Median) break annoyingly. Although it's possible to replace those NaNs with Indeterminate and then carefully remove them before applying the arithmetic functions, such operations are rather tedious compared to other computing environments (e.g., numpy), where similar functions quietly ignore the NaNs and still produce results.

I'm wondering whether we can create a similar purely numeric environment in Mathematica that handles such jobs more easily.

For example, for an arbitrary array generated using the code below:

arNaN = Array[
  RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}]

How can we apply the common arithmetic functions (e.g., Median, Quartiles, etc.) without deliberately removing the non-numerical items?

For those who are also familiar with numpy/pandas: I would like something similar to numpy.nanmedian or pandas.DataFrame.median, which quietly ignore NaN values.

user64494
sunt05
  • Replace those values with Missing[] rather than Indeterminate, then combine those numeric functions with DeleteMissing. For instance, instead of Median try Median@*DeleteMissing. – MarcoB Apr 14 '21 at 19:56
  • Yes, this could be one approach. But, as I said in the question, the Missing-based approach usually requires deliberately specifying the levels at which DeleteMissing should be applied, and it creates ragged arrays that make the common arithmetic functions fail. So it's not an ideal approach for regular arrays of arbitrary dimensions. – sunt05 Apr 14 '21 at 20:01
  • how about Block[{Indeterminate = Nothing}, arNaN /. a : {___?NumericQ} :> Median[a]]? – kglr Apr 14 '21 at 20:54
  • Thanks @kglr, but this approach is not ideal either, as it only applies at the second-to-last level. – sunt05 Apr 14 '21 at 21:29
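A minimal sketch of the Missing-based approach from the comments that keeps the array rectangular: Indeterminate is replaced by Missing[] in place, and DeleteMissing is only ever applied to one innermost vector at a time, so no ragged intermediate array is built. The sketch assumes the reduction should run along the last level (roughly numpy's axis=-1); lastLevelReduce is a hypothetical helper name, not a built-in.

```mathematica
arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}];

(* Replace Indeterminate by Missing[]; the array stays rectangular *)
arMiss = arNaN /. Indeterminate -> Missing[];

(* Hypothetical helper: apply f @* DeleteMissing to every innermost vector
   (the parts at level ArrayDepth - 1), i.e. reduce along the last level;
   DeleteMissing only ever sees a 1D list, so nothing becomes ragged *)
lastLevelReduce[f_, a_?ArrayQ] :=
  Map[f @* DeleteMissing, a, {ArrayDepth[a] - 1}]

lastLevelReduce[Median, arMiss]
```

Reducing along a different level would need the array to be transposed first so that level ends up last.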

3 Answers


Query adds automatic handling for Missing to many built-in functions. It's not always fool-proof, but it deals with easy cases quite well:

Query[Median] @ {1., 2., Missing[]}

1.5
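The example above uses Missing[], while the question's array contains Indeterminate. A sketch for arNaN, assuming Indeterminate has to be turned into Missing[] before Query's automatic handling applies, and assuming a single median over all entries of the array is wanted:

```mathematica
arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}];

(* Turn Indeterminate into Missing[] so that Query can skip it *)
arMiss = arNaN /. Indeterminate -> Missing[];

(* Median over all numeric entries of the whole array *)
Query[Median] @ Flatten[arMiss]
```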

Sjoerd Smit

First, define the test data:

arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}];

Now we can redefine Indeterminate, e.g. as Sequence[], which has the effect that all occurrences of Indeterminate disappear:

Unprotect[Indeterminate]
Indeterminate = Sequence[]
Protect[Indeterminate]

You can now say e.g.

arNaN


and use arNaN as a purely numeric array.
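A scoped variant of the same idea, along the lines of the Block in kglr's comment on the question: Block gives Indeterminate the value Nothing only while its body is evaluated, so the built-in meaning of Indeterminate is restored afterwards and arNaN itself is left unchanged. This is a sketch; it assumes arNaN is re-evaluated inside the Block, which is what makes each Indeterminate drop out.

```mathematica
arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}];

(* Inside Block, every Indeterminate evaluates to Nothing and vanishes from
   its list; the resulting ragged array is flattened before taking the median *)
Block[{Indeterminate = Nothing}, Median[Flatten[arNaN]]]

(* Outside the Block, arNaN still contains Indeterminate as before *)
MemberQ[Flatten[arNaN], Indeterminate]
```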

Daniel Huber
  • Unfortunately, this doesn't work and will break the system definition of Indeterminate: all code using Indeterminate will lose its computational meaning. More dangerously, arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}]; no longer produces the required test array with mixed numerical and non-numerical values, but always produces a regular numerical array. – sunt05 Apr 14 '21 at 20:41

Edits:

  1. Sorry, the code was written in a rush; a correction has now been added.
  2. Tests for ordinary numerical arrays have been added.
  3. Relaxed the ArrayQ restriction so the function can be applied to Associations as well.

After several trials, I seem to have found a way to deal with this (not very elegant, though):

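Clear[nanFun]
(* 1D case: drop non-numeric entries (e.g. Indeterminate), then apply fun *)
nanFun[x_/; ArrayDepth[x] == 1, fun_] := 
 fun[Select[x, NumericQ]]
(* nD case: Transpose swaps the first two levels, so mapping over it
   recursively reduces along the first level, as Median does on a matrix *)
nanFun[x_/; ArrayDepth[x] >= 2, fun_] := 
 nanFun[#, fun] & /@ Transpose[x]
(* operator form, e.g. nanFun[Median][arNaN] *)
nanFun[fun : (_Function | _Symbol)] := nanFun[#, fun] &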
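The recursion above reduces along the first level, which is what Median and similar functions do on a matrix. A non-recursive sketch of the same reduction, under the hypothetical name nanFun2, moves level 1 to the last position with a single Transpose and then applies fun to every innermost vector. Unlike nanFun, this sketch only accepts full arrays (ArrayQ), so it would not cover the Association case.

```mathematica
Clear[nanFun2]
(* Hypothetical non-recursive variant of nanFun (full arrays only) *)
nanFun2[fun_][x_?ArrayQ] := With[{d = ArrayDepth[x]},
  If[d == 1,
   (* 1D: drop non-numeric entries, then apply fun *)
   fun[Select[x, NumericQ]],
   (* nD: the permutation {d, 1, ..., d - 1} sends level 1 of x to the last
      level, so the parts at level d - 1 are the slices along original level 1 *)
   Map[fun[Select[#, NumericQ]] &,
    Transpose[x, RotateRight[Range[d]]], {d - 1}]]]
```

Usage mirrors the original, e.g. nanFun2[Median][arNaN]. If a slice contains no numeric values at all, fun receives {} and will generally not return a number, exactly as with nanFun.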

Some tests for arNaN = Array[RandomChoice[{RandomReal[], Indeterminate}] &, {4, 2, 3, 5}] (results may vary):

  1. nanFun[Median][arNaN]: {{0.545609, 0.78447, 0.343546}, {0.293374, 0.243323, 0.183473}, {0.283449, 0.459367, 0.615839}, {0.686463, 0.157145, 0.412199}};

  2. nanFun[Quartiles][arNaN]// Dimensions: {4, 3, 3}.

Then, in scenarios where only numeric values should be considered, one just needs to replace the normal fun with nanFun[fun] (e.g., Median -> nanFun[Median]).

Also, for purely numerical arrays, nanFun[fun] and fun should produce identical results. More tests:

arNum = RandomReal[10, {2, 4, 5}];
With[{fun = #}, fun[arNum] == nanFun[fun][arNum]] & /@ {Median, 
  Quartiles, Skewness}
(*out: {True, True, True}*)

Comments/suggestions are welcome.

sunt05