
When an important process needs to be kept alive, there are monitoring tools that restart it if it dies (e.g. god tasks in Ruby). In my case, I have an overnight scraping task that needs to be done by morning. My code maintains state, so all that is required is a watchful eye and a few Shift+Enters, but that doesn't work if I'm asleep!

My question: Is there any way to detect when a kernel dies, automatically restart it, and then run specific code or perhaps enqueue specific cells to evaluate?

More Details:

My cell runs happily in the notebook, and then at random intervals, for unknown reasons, the notebook's kernel silently dies (you can tell because all the symbol colorings change). The notebook itself is fine, but the symbol table is empty. However, since my code maintains state, all I have to do to resume processing where I left off is Shift+Enter a single cell.

Kuba
M.R.

2 Answers


One approach would be to run the evaluation in a second kernel which is controlled from a main kernel through MathLink/WSTP. Then your main kernel can detect if the MathLink connection dies.

You can implement this manually (a lot of work), or you can try to do it using the parallel computing tools, where much of the groundwork is already laid down.
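A rough sketch of the manual route follows. It assumes that the kernel executable can be started from First[$CommandLine] (this may need adjusting on your system, for example if the path contains spaces) and that LinkRead on a dead link returns $Failed. The helper names launchWorker, evalOnWorker and superviseEval are made up for illustration.

```mathematica
(* Start a worker kernel over WSTP and discard its initial prompt packet. *)
launchWorker[] := Module[{link},
  link = LinkLaunch[First[$CommandLine] <> " -mathlink"];
  LinkRead[link];  (* the worker first sends an InputNamePacket *)
  link]

(* Send expr unevaluated, then read packets until the result arrives
   or the link fails (assumed to show up as $Failed). *)
SetAttributes[evalOnWorker, HoldRest];
evalOnWorker[link_, expr_] := Module[{packet},
  LinkWrite[link, Unevaluated[EvaluatePacket[expr]]];
  packet = Quiet @ LinkRead[link];
  While[packet =!= $Failed && Head[packet] =!= ReturnPacket,
    packet = Quiet @ LinkRead[link]];
  packet]

(* Run expr on a worker; if the worker dies, relaunch it and resubmit,
   up to maxRestarts times. Returns ReturnPacket[result] or $Failed. *)
SetAttributes[superviseEval, HoldFirst];
superviseEval[expr_, maxRestarts_: 3] :=
 Module[{link = launchWorker[], result, restarts = 0},
  result = evalOnWorker[link, expr];
  While[result === $Failed && restarts < maxRestarts,
    Quiet @ LinkClose[link];
    link = launchWorker[];
    restarts++;
    result = evalOnWorker[link, expr]];
  Quiet @ LinkClose[link];
  result]
```

The worker starts with an empty symbol table, so the submitted expression has to load whatever definitions and saved state it needs itself, for example with Get on a package file.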

In fact, it turns out that the parallel tools already have a feature for relaunching dead subkernels.

To test how it works, I killed a subkernel manually using my operating system's process manager. None of the evaluations were missed; the dead kernel's task was resubmitted to the relaunched one.

If you use this approach, it is still up to you to write your code in a way that is usable with the parallel tools. You can, if desired, limit the number of subkernels to one. SetSharedVariable and SetSharedFunction could be used to maintain some of the state in the main kernel, but for safety it would be good to minimize the interaction between the subkernel and the main kernel as much as possible. For example, if your code is crawling websites, then the subkernel can process a complete page without any main kernel interaction, then send the new links to be followed back to the main kernel in one step.
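The crawling pattern described above could be sketched as follows, with a single subkernel and all state kept in the main kernel. processPage is a hypothetical stand-in for the real scraping step, and the start URL and the limit of 5 pages are placeholders.

```mathematica
(* Hypothetical stand-in for the real scraping step:
   takes a URL and returns a list of newly discovered links. *)
processPage[url_String] := (Pause[0.1]; {url <> "/a", url <> "/b"})

LaunchKernels[1];                    (* a single subkernel *)
DistributeDefinitions[processPage];

queue = {"http://example.com"};      (* state lives in the main kernel *)
visited = {};

While[queue =!= {} && Length[visited] < 5,
  url = First[queue]; queue = Rest[queue];
  (* the whole page is processed on the subkernel;
     only the list of new links is sent back, in one step *)
  newLinks = WaitAll[ParallelSubmit[{url}, processPage[url]]];
  AppendTo[visited, url];
  queue = DeleteDuplicates @ Join[queue, Complement[newLinks, visited]];
];
```

This sketch adds no recovery logic of its own; whether a task is resubmitted after the subkernel dies depends on the relaunch behaviour of the parallel tools described above.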

Szabolcs

Assuming the front end survives, prepare three cells:

(*init cell, won't be needed later*)

state = CurrentValue[EvaluationNotebook[], {"TaggingRules", "state"}] = 0;

SetOptions[ #, {CellTags -> {"Procedure"}, ShowCellTags -> True} ]& /@ {NextCell[], NextCell @ NextCell[]};

CurrentValue[$FrontEndSession, "ClearEvaluationQueueOnKernelQuit"] = False;


(*main procedure cell*)

Print["cell init session id: ", $SessionID];

Do[
  CurrentValue[EvaluationNotebook[], {"TaggingRules", "state"}] = i;
  Print[i];
  If[ MemberQ[{2, 3, 4}, i], Quit[] ]
, {i, state + 1, 5}
]


(* restarting procedure *)

If[
  # < 5
, state = #;
  Print["procedure was interrupted at state: ", #];
  NotebookLocate["Procedure"];
  SelectionEvaluate @ EvaluationNotebook[];
] & @ CurrentValue[EvaluationNotebook[], {"TaggingRules", "state"}]

Select all three cells and evaluate them. The TaggingRules are not important; they are just a minimal example of preserving state across kernel restarts.

cell init session id: 25310486074412977156

1

2

procedure was interrupted at state: 2

cell init session id: 25310486139919804003

3

procedure was interrupted at state: 3

cell init session id: 25310486231460654483

4

procedure was interrupted at state: 4

cell init session id: 25310486323762607669

5
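Since the TaggingRules only serve to preserve state, the same loop could also keep its state in a file on disk. This is a minimal sketch of that variant; the file location and the helper names loadState and saveState are made up for illustration.

```mathematica
(* Hypothetical location for the saved state. *)
stateFile = FileNameJoin[{$TemporaryDirectory, "scrape-state.m"}];

(* Read the last completed step, or 0 if nothing has been saved yet. *)
loadState[] := If[FileExistsQ[stateFile], Get[stateFile], 0]

(* Record that step i has been completed. *)
saveState[i_Integer] := Put[i, stateFile]

Do[
  (* ... do the work for step i here ... *)
  saveState[i],
  {i, loadState[] + 1, 5}
]
```

After a kernel restart, re-evaluating this cell resumes from the step after the last one saved.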

Kuba