
I would like a discrete distance measure between two binary vectors (or strings), similar to HammingDistance, except that two vectors should be considered closer when they have more matching 1s that are separated by zeros (or by some default value).

For example, given the four vectors below and a distance measure thedistancemeasure,

           vec1={1,0,0,0,0,1,0,1};
           vec2={1,0,1,0,0,1,0,0};

           vec3={1,0,0,0,1,1,0,0};
           vec4={0,1,0,0,1,1,0,0};

the measure should satisfy:

  thedistancemeasure[vec1,vec2]< thedistancemeasure[vec3,vec4]

True

The measure should favor small groups of matches that are well separated over a large group of matches that are "connected" (adjacent) or less separated.

The number of zeros shouldn't matter, but if it does, I would prefer more zeros to give a smaller measure: the more separated, the better.

If possible, I also want the measure to give even smaller distances for a higher count of well-separated, correctly matched ones. For example,

            vec5={1,0,0,1,0,1,0,1};
            vec6={1,0,0,1,0,0,0,1};

should give:

 thedistancemeasure[vec1,vec2]>thedistancemeasure[vec5,vec6]

True

The size of the vectors would always be fixed.
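The two requirements above can be written down as a predicate, which makes it easy to check any candidate measure against the examples. This is a minimal sketch: satisfiesQ is a hypothetical helper name, and it assumes the candidate measure is called with two vectors, like thedistancemeasure[vec1, vec2]. Plain HammingDistance fails the first requirement, because both the vec1/vec2 and vec3/vec4 pairs differ in exactly two positions.

```mathematica
vec1 = {1, 0, 0, 0, 0, 1, 0, 1};
vec2 = {1, 0, 1, 0, 0, 1, 0, 0};
vec3 = {1, 0, 0, 0, 1, 1, 0, 0};
vec4 = {0, 1, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 1, 0, 1};
vec6 = {1, 0, 0, 1, 0, 0, 0, 1};

(* hypothetical helper: True when the measure d orders the example pairs as required *)
satisfiesQ[d_] := d[vec1, vec2] < d[vec3, vec4] && d[vec1, vec2] > d[vec5, vec6]

{HammingDistance[vec1, vec2], HammingDistance[vec3, vec4], HammingDistance[vec5, vec6]}
(* {2, 2, 1} *)

satisfiesQ[HammingDistance]
(* False *)
```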

It might be possible to build this from the output of ListCorrelate, since it gives the correlation of the two lists at each relative offset.

lalmei

2 Answers

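ClearAll[distF1, distF2]
(* p = positions where both vectors have a 1; more shared 1s gives a smaller distF1 *)
distF1 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Length@p] &;
(* tie-breaker: the wider the shared 1s are spread, the smaller distF2 *)
distF2 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Total[Differences@p]] &;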

Example:

vec1 = {1, 0, 0, 0, 0, 1, 0, 1};
vec2 = {1, 0, 1, 0, 0, 1, 0, 0};
vec3 = {1, 0, 0, 0, 1, 1, 0, 0};
vec4 = {0, 1, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 1, 0, 1};
vec6 = {1, 0, 0, 1, 0, 0, 0, 1};
vecs = {vec1, vec2, vec3, vec4, vec5, vec6};
pairs = Partition[vecs, 2];
plabels = {"v1v2", "v3v4", "v5v6"};

Sort pairs lexicographically in ascending order using the distance function distF1 and breaking ties with the distance function distF2:

SortBy[pairs, {distF1, distF2}] /. Thread[pairs -> plabels]
{"v5v6", "v1v2", "v3v4"}
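To see why this is the resulting order, it can help to print the shared positions p and the {distF1, distF2} sort key for each pair. This is a minimal sketch that repeats the definitions above so it runs on its own; commonPositions is a hypothetical helper name, not part of the answer. Note that Total[Differences[p]] telescopes to Last[p] - First[p], so distF2 measures the span between the first and last shared 1 rather than how evenly the shared 1s are spaced.

```mathematica
ClearAll[distF1, distF2, commonPositions];
distF1 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Length@p] &;
distF2 = With[{p = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #)},
   -Total[Differences@p]] &;

(* hypothetical helper exposing the shared positions p used inside distF1 and distF2 *)
commonPositions = Intersection @@ (Flatten@SparseArray[#]["NonzeroPositions"] & /@ #) &;

vecs = {{1, 0, 0, 0, 0, 1, 0, 1}, {1, 0, 1, 0, 0, 1, 0, 0},
   {1, 0, 0, 0, 1, 1, 0, 0}, {0, 1, 0, 0, 1, 1, 0, 0},
   {1, 0, 0, 1, 0, 1, 0, 1}, {1, 0, 0, 1, 0, 0, 0, 1}};
pairs = Partition[vecs, 2];

(* shared 1 positions for each pair *)
commonPositions /@ pairs
(* {{1, 6}, {5, 6}, {1, 4, 8}} *)

(* the lexicographic sort key {distF1, distF2} for each pair *)
{distF1[#], distF2[#]} & /@ pairs
(* {{-2, -5}, {-2, -1}, {-3, -7}} *)

(* the sum of gaps telescopes to the span between the first and last shared 1 *)
Total[Differences[{1, 4, 8}]] == 8 - 1
(* True *)
```

So v5v6 comes first because it has three shared 1s, and v1v2 beats v3v4 on the tie-breaker because its shared 1s span 5 positions instead of 1.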
kglr

I have trouble following your examples, but perhaps you can Split your vectors and compare the lengths:

vec1 = {1, 0, 0, 0, 0, 1, 0, 0};
vec3 = {0, 0, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 0, 0, 1};

Length /@ Split /@ {vec1, vec3, vec5}
{4, 3, 5}

Combine this with HammingDistance (or an alternative) using whatever weighting you prefer.
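The three vectors used in this answer are exactly the element-wise products vec1*vec2, vec3*vec4 and vec5*vec6 of the question's pairs, i.e. the shared 1s. Below is a minimal sketch of one possible combination, assuming 0/1 vectors: multiply the two vectors element-wise, count the runs in the product with Split, and subtract that count from the Hamming distance. The name runDistance and the equal weighting are hypothetical choices, not part of the answer; adding Length[a] only keeps the value non-negative, since the run count can never exceed the vector length.

```mathematica
ClearAll[runDistance];
(* hypothetical measure: Hamming distance minus the number of runs in the
   shared-1s vector a*b; Length[a] is added only to keep the result non-negative *)
runDistance[a_List, b_List] :=
  Length[a] + HammingDistance[a, b] - Length[Split[a*b]]

vec1 = {1, 0, 0, 0, 0, 1, 0, 1};
vec2 = {1, 0, 1, 0, 0, 1, 0, 0};
vec3 = {1, 0, 0, 0, 1, 1, 0, 0};
vec4 = {0, 1, 0, 0, 1, 1, 0, 0};
vec5 = {1, 0, 0, 1, 0, 1, 0, 1};
vec6 = {1, 0, 0, 1, 0, 0, 0, 1};

(* the shared 1s of the first pair, i.e. the answer's first vector *)
vec1*vec2
(* {1, 0, 0, 0, 0, 1, 0, 0} *)

{runDistance[vec1, vec2], runDistance[vec3, vec4], runDistance[vec5, vec6]}
(* {6, 7, 4} *)

(* both orderings requested in the question *)
runDistance[vec1, vec2] < runDistance[vec3, vec4] &&
 runDistance[vec1, vec2] > runDistance[vec5, vec6]
(* True *)
```

One caveat of counting runs: the count also depends on where the shared 1s sit relative to the ends of the vector (for example, Split[{0, 1, 0}] has 3 runs while Split[{1, 0, 0}] has 2), so the weighting may need adjusting if edge positions should not matter.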

Mr.Wizard
I changed the example so that it no longer compares two identical vectors, thanks. This looks like the way forward! – lalmei May 18 '15 at 10:30