7

Say I have a number of files in a directory at some location (C:\dir1\dir2\dir3...), and I pull an array of strings corresponding to the file names with the command:

fileList = FileNames["*", "C:\\dir1\\dir2\\dir3"]

Here, the files are named with consecutive integers (1 through N) and the output is automatically sorted like the following:

C:\dir1\dir2\dir3\10.tif
C:\dir1\dir2\dir3\11.tif
C:\dir1\dir2\dir3\12.tif
C:\dir1\dir2\dir3\13.tif
C:\dir1\dir2\dir3\14.tif
C:\dir1\dir2\dir3\15.tif
C:\dir1\dir2\dir3\16.tif
C:\dir1\dir2\dir3\17.tif
C:\dir1\dir2\dir3\18.tif
C:\dir1\dir2\dir3\19.tif
C:\dir1\dir2\dir3\1.tif
C:\dir1\dir2\dir3\20.tif
C:\dir1\dir2\dir3\2.tif
C:\dir1\dir2\dir3\3.tif
C:\dir1\dir2\dir3\4.tif
C:\dir1\dir2\dir3\5.tif
C:\dir1\dir2\dir3\6.tif
C:\dir1\dir2\dir3\7.tif
C:\dir1\dir2\dir3\8.tif
C:\dir1\dir2\dir3\9.tif

For any value N, is there an ordering function p I can apply to fileList (using Sort[fileList, p]) so that the entries in fileList follow the proper integer ordering of the file names? Can I supply this ordering function directly to FileNames?

Kuba
user9564

3 Answers

8

You can do this in many ways. For example:

SortBy[filenames, ToExpression@FileBaseName[#] &]

Before @Leonid's comment I used StringDrop[FileNameTake[#], -4] & to strip the trailing ".tif". FileBaseName is more general because it works for any extension.
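ToExpression evaluates whatever the base name happens to contain. As a minimal sketch, assuming every base name consists only of digits, FromDigits can do the parsing instead. The sample list below is made up for illustration and stands in for the output of FileNames.

```mathematica
(* Illustrative stand-in for the FileNames output; FileNameJoin builds
   paths with the separator of the current operating system *)
fileList = FileNameJoin[{"dir1", "dir2", "dir3", #}] & /@
   {"10.tif", "1.tif", "20.tif", "2.tif"};

(* FileBaseName strips directory and extension; FromDigits turns the
   remaining digit string into an integer without evaluating it as code *)
sorted = SortBy[fileList, FromDigits[FileBaseName[#]] &];

FileBaseName /@ sorted
(* {"1", "2", "10", "20"} *)
```

If some names may contain non-digit characters, this key would need a guard; that case is not covered here.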

Kuba
3

You can extract the numbers from each string using StringCases and ToExpression and sort according to those:

numbersInString[s_] := ToExpression@StringCases[s, NumberString]
SortBy[{"str2", "str10", "str1", "str11"}, numbersInString]
(* {"str1", "str2", "str10", "str11"} *)

However, since this extracts only the numbers, all other characters in the strings are ignored when sorting.

Another way to do it is to pad all integers with 0's so they have the same length, and reorder the original list according to that:

(* Lowercase name: StringPadLeft is a protected built-in since version 10.1 *)
stringPadLeft[s_String, n_, x_: " "] := StringJoin@PadLeft[Characters[s], n, x]

Attributes[padStringIntegers] = {Listable};
padStringIntegers[s_String, n_] := 
 StringReplace[s, i : (DigitCharacter ..) :> stringPadLeft[i, n, "0"]]

integerAwareStringSort[l_List] := Module[{
   max = Max[{1, ToExpression@StringCases[l, (DigitCharacter ..)]}]
   },
  (* IntegerLength gives the digit count, also when max is a power of 10 *)
  l[[ Ordering@padStringIntegers[l, IntegerLength[max]] ]]
  ]

integerAwareStringSort@{"a10", "b3", "a1", "b20", "3a40", "1a1", "1a50"}
(* {"1a1", "1a50", "3a40", "a1", "a10", "b3", "b20"} *)
ssch
  • I like the padding. About your first approach: we have to be careful, because the OP doesn't say that the directory numbers matter, so they could affect the result if we take {"C:\\dir1\\dir3\\dir3\\4.tif", "C:\\dir1\\dir2\\dir3\\10.tif"} – Kuba Sep 17 '13 at 11:46
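The concern in this comment can be illustrated with a small sketch. The helper digitsIn is a hypothetical variant of numbersInString that matches digit runs only, and the two paths are built with FileNameJoin so that the example does not depend on the operating system.

```mathematica
(* Hypothetical helper: every run of digits in a string, as integers *)
digitsIn[s_String] := FromDigits /@ StringCases[s, DigitCharacter ..]

(* Two files in different directories, mirroring the comment's example *)
paths = {
   FileNameJoin[{"dir1", "dir3", "dir3", "4.tif"}],
   FileNameJoin[{"dir1", "dir2", "dir3", "10.tif"}]};

(* Keys from the full path are {1, 3, 3, 4} and {1, 2, 3, 10}:
   the directory numbers decide, so 10.tif comes first *)
byFullPath = SortBy[paths, digitsIn];

(* Keys from the base name are {4} and {10}: only the file number decides *)
byBaseName = SortBy[paths, digitsIn[FileBaseName[#]] &];

FileBaseName /@ byFullPath   (* {"10", "4"} *)
FileBaseName /@ byBaseName   (* {"4", "10"} *)
```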
1

Suppose you know that you will never have more than 1,000,000 files. A nice trick is then to name the i-th file:

 "C:\\dir1\\dir2\\dir3\\"<>ToString[1000000+i]<>".tif"

All names then contain the same number of digits, so the lexicographic order of the file names agrees with the numeric order.
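A short sketch of this naming scheme, using a handful of made-up indices. The second variant, based on IntegerString with a fixed width of 7, is an assumption of mine rather than part of the answer; it produces zero-padded names instead of adding an offset.

```mathematica
(* Illustrative indices, deliberately including 1-digit and 2-digit values *)
indices = {1, 2, 10, 20};

(* Offset naming as in the answer: every name has exactly seven digits *)
offsetNames = Table[ToString[1000000 + i] <> ".tif", {i, indices}];

(* Alternative (assumption): zero-pad to width 7 with IntegerString *)
paddedNames = Table[IntegerString[i, 10, 7] <> ".tif", {i, indices}];
(* {"0000001.tif", "0000002.tif", "0000010.tif", "0000020.tif"} *)

(* Plain Sort already returns both lists in numeric order *)
Sort[offsetNames] === offsetNames   (* True *)
Sort[paddedNames] === paddedNames   (* True *)

(* Recovering the original index from an offset name *)
FromDigits[FileBaseName["1000010.tif"]] - 1000000   (* 10 *)
```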

ssch
jibe