
I have a list of long directory names. For each directory name, I need a label just long enough to distinguish it from the rest. Each label should be a FileNameJoin of one or more components of the corresponding directory name.

You can assume that all the directory names share a common beginning and are unique. They tend to differ in the last one or two components.

Example list: {"common/a/b/c", "common/b/c", "common/x/y/z"}

Result: {"a/b/c", "common/b/c", "z"} or equivalent.

Failing to find any system function that does this directly, I'm thinking along the lines of a recursive GroupBy, e.g.

f[dirs_List] := If[Length[dirs] == 1,
    FileNameTake @@ dirs,
    Normal @ GroupBy[dirs, FileNameTake -> FileNameDrop, f]]

f[{"common/a/b/c", "common/b/c", "common/x/y/z"}]

(*Out: {"c" -> {"b" -> {"a" -> "common", "common" -> ""}}, "z" -> "y"} *)

I'm stuck on how to collect the labels from such a structure. Also, I'd like to hear your ideas before going further :)
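One way to collect the labels from the nested rules, offered here as a sketch: walk each chain of keys from the top down to a leaf. The keys met along the way, read in reverse, form the label. The name `collect` is hypothetical. The sketch joins components with `StringRiffle` (Mathematica 10.1 or later) using "/" so that the result does not depend on the platform's path separator.

```mathematica
(* Hypothetical helper: walk each chain of keys from the top down to a leaf,
   prepending each key, so the path comes out in directory order *)
collect[rules_List, path_List : {}] := Join @@ Map[
    If[ListQ[Last[#]],
      collect[Last[#], Prepend[path, First[#]]],          (* nested rules: descend *)
      {StringRiffle[Prepend[path, First[#]], "/"]}] &,    (* leaf: emit one label *)
    rules]

collect[{"c" -> {"b" -> {"a" -> "common", "common" -> ""}}, "z" -> "y"}]
(* {"a/b/c", "common/b/c", "z"} *)
```

Applied to the output of `f` above, this should give the desired result from the example.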

asterix314

2 Answers


I'm not quite sure your proposed nomenclature will be clear enough for end users. Anyway, here is a non-optimal way:

dirs = {"\\\\server\\share\\path\\last", 
        "\\\\server\\share\\path\\last2\\last1", 
        "\\\\server\\share\\path\\last2\\last\\last1"};

(* remove the common header*)
rcp = NestWhile[Rest /@ # &, FileNameSplit /@ dirs, Length@Union@#[[All, 1]] == 1 &]

(* Reverse, so as to get the shortest distinguishable cutoff *)
r = Reverse /@ rcp

(* Form all possible nomenclatures*)
allcombs = FoldList[Join[#1, #2] &, List /@ #] & /@ r

(* Select the shortest one (should be improved) *)
FileNameJoin /@  Reverse /@ 
  First@SortBy[Select[Tuples@allcombs, Length@Union@# == Length@# &], Length@Flatten@# &]

(* {"last", "last1", "last\\last1"} *)
Dr. belisarius
  • Directory names like "\server\share\path" are completely removed from the result. Moreover, the Tuples@allcombs grows exponentially with the length of the directory list... – asterix314 Oct 09 '14 at 07:41
  • @asterix314 The removal of the common header was done on purpose at the first line. If you have files hanging directly on it, just don't do the removal. Your second consideration is the sad truth. – Dr. belisarius Oct 09 '14 at 07:57

A "suffix trie" provides a good way to think about the problem. However, the following implementation uses a more straightforward approach based on the same principle. Given a list of directory names, we want to produce the mapping {"dir" -> "label" ...}, where each directory is labeled by its last n components joined together, with n the smallest number for which those components are distinctive.

Algorithm: first pick out from the directory list those distinguishable by the last component. For the rest, pick by the last 2 components, and so on.
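The first pass (n = 1) on the example list from the question can be checked on its own: group the directories by their last component and keep only the groups of size one. This is just the grouping step in isolation, assuming "/" is accepted as the path separator.

```mathematica
dirs = {"common/a/b/c", "common/b/c", "common/x/y/z"};

(* group by the last component (n = 1) *)
groups = GatherBy[dirs, FileNameTake[#, -1] &]
(* {{"common/a/b/c", "common/b/c"}, {"common/x/y/z"}} *)

(* singleton groups are already distinguishable at this step *)
Select[groups, Length[#] == 1 &]
(* {{"common/x/y/z"}} *)
```

The two directories ending in "c" stay behind and are compared again by their last 2 components, then 3.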

Recursive implementation: let labelRules[dirs, n] give the list of rules {"dir" -> "label" ...} where each label consists of at least n components. It picks out from dirs those distinguishable by exactly their last n components, and appends labelRules[rest, n+1] for the remaining ones. The recursion stops at labelRules[{}, _] = {}. The final result is given by labelRules[dirs, 1].

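labelRules[{}, _Integer] := {}
labelRules[dirs : {__String}, n_Integer: 1] := Module[{s},
    (* directories whose last n components are shared by no other directory *)
    s = Join @@
            Select[
                GatherBy[dirs, FileNameTake[#, -n] &],
                Length[#] == 1 &];
    (* label them by those n components, joined with "." *)
    Map[# -> StringReplace[
            FileNameTake[#, -n],
            $PathnameSeparator -> "."] &, s]
    ~Join~
    (* recurse on the remaining directories with one more component *)
    labelRules[Complement[dirs, s], n + 1]]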


labelRules[{"common/a/b/c", "common/b/c", "common/x/y/z"}]

(*Out: {"common/x/y/z" -> "z", "common/a/b/c" -> "a.b.c", "common/b/c" -> "common.b.c"} *)

Note that the above does not always give the shortest possible label. It will give {"x/a/c" -> "a.c", "x/y/c" -> "y.c"} instead of e.g. {"x/a/c" -> "c", "x/y/c" -> "y.c"}.
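For comparison, here is a sketch of the same rule written without the staged recursion: for each directory, find the smallest n such that no other directory ends in the same n components. As far as I can tell, this picks the same n as labelRules, since a directory labeled at an earlier step already has a unique shorter suffix and so cannot collide with anything at a later step. It therefore shares the limitation just described. The names shortestSuffixLength and suffixLabels are hypothetical, and the code assumes Mathematica 10.1 or later (for SelectFirst and StringRiffle).

```mathematica
(* Hypothetical helper: the smallest n such that no other directory in dirs
   ends with the same n components as d *)
shortestSuffixLength[d_String, dirs_List] :=
  Module[{parts = FileNameSplit[d],
          others = FileNameSplit /@ DeleteCases[dirs, d]},
    SelectFirst[Range[Length[parts]],
      Function[n,
        (* last n components of every other directory (all of it if shorter) *)
        ! MemberQ[Take[#, -Min[n, Length[#]]] & /@ others, Take[parts, -n]]],
      Length[parts]]]

(* Hypothetical wrapper: label each directory by that suffix, joined with "." *)
suffixLabels[dirs_List] :=
  Map[# -> StringRiffle[Take[FileNameSplit[#], -shortestSuffixLength[#, dirs]], "."] &, dirs]

suffixLabels[{"common/a/b/c", "common/b/c", "common/x/y/z"}]
(* {"common/a/b/c" -> "a.b.c", "common/b/c" -> "common.b.c", "common/x/y/z" -> "z"} *)
```

Unlike labelRules, the output keeps the order of the input list.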

asterix314