9

To avoid conflicts between file names produced by code running in multiple instances of Mathematica, I append the date and a random string to the file name, hoping to make it unique.

The date is good enough most of the time, but the random integer is there for the (unlikely) event that two files are created in the same second. I create it with:

Random[Integer,10^5]

I recently found out that this is completely useless: multiple instances of Mathematica apparently use exactly the same data to seed the random number generator, so they all produce exactly the same random integer.

What is a simple way to get a better random integer that is not the same when multiple instances of Mathematica run the same code at almost exactly the same time?

rhermans
Kvothe
  • One possibility would be to use SeedRandom (for example, use the time of day to set the seed at the start of the process). Or maybe to create the file names sequentially rather than randomly. – bill s Jul 04 '19 at 14:17
  • Why not use unique information such as $ProcessID or even better $SessionID ? – rhermans Jul 04 '19 at 14:27
  • Related: https://mathematica.stackexchange.com/a/99795/12 I thought recent versions will never start with the same seed, even if they start at the same time. What you describe just should not happen. – Szabolcs Jul 04 '19 at 14:52
  • @Szabolcs, ah that depends on how recent it needs to be. Indeed I should have added the version where I encountered this which is indeed a bit outdated (11.1.1 for Linux x86 (64-bit)). – Kvothe Jul 04 '19 at 16:37
  • @bill, how would I do this? I might need to read the documentation on SeedRandom more clearly but it seems it would not help here. The idea is that the code should produce different output for the same nb (and possibly the same time or very close to it on the system clock), without having to change anything in the .nb, such as the n in SeedRandom manually. – Kvothe Jul 04 '19 at 16:42
  • @rhermans, $SessionID seems to be exactly the thing I need (and magic the way it is described in the documentation, since it leaves out that only probabilistically it will be different from a $SessionID on a different system). – Kvothe Jul 04 '19 at 16:46
  • @Kvothe glad that helped, I posted it as an answer. – rhermans Jul 04 '19 at 17:13
  • Just as an extra remark: if you need to create unique temporary files, CreateFile[] is a good method. You can change the directory where it is created by changing/Blocking the value of $TemporaryDirectory. – Sjoerd Smit Jul 05 '19 at 09:55

3 Answers

15

It might be best to use CreateUUID.

CreateUUID[]
 "73ccc27c-687f-4eca-8214-ceeb8a8b7773"

The Properties & Relations section shows a way to express this string as an integer if that's what you're after:

FromDigits[StringReplace[CreateUUID[], "-" -> ""], 16]
296740835687065620982102887154699649600
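
The integer form above can run to 39 decimal digits. As a hedged sketch (not part of the original answer), the same 128-bit value can be re-encoded in base 36 with IntegerString, which needs at most 25 characters. The name `uuidTag` is made up for illustration and is not a built-in.

```mathematica
(* uuidTag is a hypothetical helper name, not a built-in function. *)
(* Strip the dashes, read the 32 hex digits as one integer, *)
(* then re-encode that integer in base 36 (digits 0-9 and a-z). *)
uuidTag[] := IntegerString[
  FromDigits[StringReplace[CreateUUID[], "-" -> ""], 16], 36]

uuidTag[]
```

The result uses only lowercase letters and digits, so it should also be safe on case-insensitive file systems.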
Greg Hurst
  • I don't think this is a very good idea, adding an empty-digit number to a file name. UUIDs have their uses, but this shouldn't be one of them. – High Performance Mark Jul 04 '19 at 15:55
  • In my previous comment I wrote umpty-digit number, which got autocorrected to something meaningless. – High Performance Mark Jul 04 '19 at 16:52
  • Why do you think it is a bad idea, @HighPerformanceMark? – ktm Jul 04 '19 at 17:59
  • In this case the question seeks a way to disambiguate file names generated by different processes at approximately the same time; about 90% of the digits of the UUID are wasted for this. More generally, UUIDs are a non-human-friendly way of identifying resources. Quickly now, are the following two UUIDs the same ... 7adba397-30fa-45a4-9bd3-6283712a942c, 7adba397-30fa-45a4-9bd3-6283712a942c ? – High Performance Mark Jul 04 '19 at 18:55
  • @HighPerformanceMark In practice, it is almost always enough to just look at the first couple of characters or the last. The situation you describe is extremely unlikely. StringTake[#, 3] & /@ Table[CreateUUID[], {1000}] // DeleteDuplicates // Length gave me 118 duplicates and that's just for the first three characters. With four characters it dropped to ten duplicates. – C. E. Jul 05 '19 at 04:07
7

This should give you strings that are unique. It uses $SessionID and $ProcessID which are a unique combination by definition, either hashed (almost unique) or plain (unique by design).

Short name, almost unique.

The hash is rendered as a "Base36String" to keep the whole name short (28 characters). The date and $KernelID stay in plain text for easy identification.

StringJoin[
 Riffle[
  {
   Hash[{$SessionID , $ProcessID}, "CRC32", "Base36String"],
   DateString[{"Year", "MonthNameShort", "Day", "Hour24", "Minute", 
     "Second", "MillisecondShort"}],
   ToString[$KernelID]
   }, "-"]
 ]

"02y4q0o-2019Jul04180113431-0"

The probability that a new "CRC32" hash collides with one of $n - 1$ existing ones is extremely low, about $(n - 1) / 2^{32}$, which is far better than your $(n - 1) / 10^5$. A collision that also occurs at the same time (within a millisecond) and in the same kernel is, in practical terms, impossible.
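
For reference, here is a back-of-the-envelope version of that estimate. It assumes the hash values behave like independent uniform draws from $N$ possible values, which is an idealization (CRC32 of structured input is not guaranteed to behave this way). Under that assumption, the chance that at least two of $n$ names share a value is

$$P(\text{collision}) = 1 - \prod_{k=1}^{n-1}\left(1 - \frac{k}{N}\right) \approx \frac{n(n-1)}{2N} \qquad (n \ll \sqrt{N}).$$

For $n = 100$ names this gives about $1.2 \times 10^{-6}$ with $N = 2^{32}$, against roughly $5\%$ with $N = 10^5$.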


Long names, absolutely unique

If you can afford extremely long names, you could leave $SessionID and $ProcessID unhashed.

You can also shorten the string by using IntegerString with "Base64" encoding.

StringJoin[
 Riffle[
  Flatten@{
    StringDelete[
     IntegerString[{$SessionID , $ProcessID}, "Base64"], {"+", "/", 
      "="}],
    DateString[{"Year", "MonthNameShort", "Day", "Hour24", "Minute", 
      "Second", "MillisecondShort"}],
    ToString[$KernelID]
    }, "-"]
 ]

Otherwise, use them to define folder (directory) names.

FileNameJoin[
 ToString /@ {
   $MachineName,
   $SessionID ,
   $ProcessID,
   DateString[
    {"Year",
     "MonthNameShort",
     "Day",
     "Hour24",
     "Minute",
     "Second",
     "MillisecondShort"
     }]
   }
 ]
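
As a rough sketch of how such a directory name might be used (an illustration, not part of the original answer), the snippet below creates a per-process directory and writes one file into it. The file name "result.txt" and its content are placeholders, and the path is relative to the current Directory[].

```mathematica
(* Sketch only: "result.txt" and the sample content are placeholders. *)
dir = FileNameJoin[ToString /@ {$MachineName, $SessionID, $ProcessID}];

(* CreateDirectory also creates missing parent directories by default. *)
If[! DirectoryQ[dir], CreateDirectory[dir]];

(* Write a file inside the per-process directory. *)
Export[FileNameJoin[{dir, "result.txt"}], "some output", "Text"]
```

Because the directory is already specific to one process, the file names inside it can stay short and readable.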
rhermans
0

If the process that created the file is important information, I would simply disambiguate the file names by adding the process name to them.

processname+datetime

Only if you have multiple instances of the same process (through multithreading or multitasking) would I generate a small unique ID at the start of each process to distinguish between the writing processes.

processname+shortuuid+datetime

Alternatively, if it's possible in Mathematica, name your threads and use their names to distinguish the files. This will also be useful for forensics.
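
A minimal Wolfram Language sketch of the processname+shortuuid+datetime scheme might look like the following. The names `processName`, `runID`, and `makeName` are hypothetical, and taking the first 8 characters of a UUID as the short ID is an assumption made for illustration. It makes collisions unlikely but does not rule them out.

```mathematica
(* Hypothetical process label; replace with something meaningful for your job. *)
processName = "fit";

(* Short per-process identifier, generated once when the process starts. *)
runID = StringTake[CreateUUID[], 8];

(* Build "processname-shortid-datetime.ext" file names on demand. *)
makeName[ext_String] := StringJoin[processName, "-", runID, "-",
  DateString[{"Year", "Month", "Day", "Hour24", "Minute", "Second"}],
  ".", ext]

makeName["dat"]
```

Since `runID` is set with `=` rather than `:=`, every file written by the same process carries the same short ID, which keeps the names easy to group by origin.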

  • Hi, care to share some code to do what you suggest? – rhermans Jul 05 '19 at 09:25
  • @rhermans Unfortunately, I do not know Mathematica. I could spend some time learning it to try to provide more specificity, but mostly I wanted to offer an alternative to using a UUID, which generates unnecessarily long and difficult to parse names. I see that since I wrote my answer you have provided a more in-depth answer, one which I quite like! – Connor McCormick Jul 05 '19 at 18:15