
I'm trying the Torque scheduler. I have previous experience with the LSF scheduler, and I quite like being able to use 'bsub -I make -j 12' to compile a program really quickly, i.e. running an 'interactive' job, and seeing any errors as they happen.

Now, there is an interactive option in Torque, but it's really, really interactive: you actually have to type the commands by hand, tying up the machine (and being billed) the whole time.

So, I'm trying to avoid using Torque's 'interactive' facility, but I'd still like to be able to see the output of a job in real time, e.g. by doing 'tail -f' on the output file. As far as I can tell, there doesn't appear to be any way of making Torque flush the output file in real time, as things happen. Or is there?

Summary of goal: be able to simulate the '-I' option from LSF in Torque

Summary of sub-goal: be able to request that the output file be flushed to disk continuously while a job is running, in Torque

Hugh Perkins

3 Answers


From the Torque website:

Spooling can be disabled by using the qsub '-k' option. With this option, job output and error streams can be sent directly to a file in the job's working directory bypassing the intermediate spooling step. This is useful if the job is submitted from within a parallel filesystem or if the output is small and the user would like to view it in real-time. If the output is large and the remote working directory is not available via a high performance network, excessive use of this option may result in reduced cluster performance.
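A minimal sketch of the workflow this enables (the script name build.sh and the job name are hypothetical, and depending on your site's configuration the output file may land in your home directory rather than the working directory):

```shell
# Keep stdout (o) and stderr (e) out of the spool, so they are written
# directly to a file as the job runs rather than at job completion.
jobid=$(qsub -k oe -N build build.sh)

# Torque names output files <jobname>.o<numeric id>; follow the stdout
# file while the job is still running:
tail -f "build.o${jobid%%.*}"
```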

GertVdE
  • I know it's an old post, but I wish to comment in case anyone hits the same problem. I tried Gert's answer with the -k option, but it changed nothing. The log was indeed created when the job started, but it was only written once, when the job terminated. I tried tail -f on the file to monitor it, but even on the execution host (in my case the nodes share a filesystem with the control node) the behaviour was the same. Still wondering how to read real-time output of qsub. – Evergreen.F Mar 10 '20 at 20:51

I've marked Gert's answer as the correct answer given the title of my question. However, for completeness, note that what I did in the end was add the -x option. It looks like -I -x in Torque is approximately equivalent to -I in LSF.
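To make the correspondence concrete, a hedged sketch of the two equivalent invocations (build.sh is a hypothetical wrapper script containing 'make -j 12'; -x requires a Torque version that supports it, e.g. 4.x):

```shell
# LSF: run the build as an interactive job, streaming output to the terminal.
bsub -I make -j 12

# Torque approximation: -I gives an interactive session, and -x makes the
# script actually execute in it; the job ends when the script finishes.
qsub -I -x build.sh
```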

Reference to the -x option: http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/commands/qsub.htm :

"By default, if you submit an interactive job with a script, the script will be parsed for PBS directives but the rest of the script will be ignored since it's an interactive job. The -x option allows the script to be executed in the interactive job and then the job completes.

"For example:

script.sh 
#!/bin/bash 
ls 
---end script--- 
qsub -I script.sh 
qsub: waiting for job 5.napali to start 
dbeer@napali:# 
<displays the contents of the directory, because of the ls command> 
qsub: job 5.napali completed

"

Hugh Perkins
  • Strange. I can't find any docs explaining the -x option... Could you add a reference to a man page or doc page from the Torque website? – GertVdE Jun 26 '13 at 18:59
  • Sure! It's here: http://docs.adaptivecomputing.com/torque/4-1-3/help.htm#topics/commands/qsub.htm – Hugh Perkins Jun 27 '13 at 00:45
  • Thanks for the pointer. We must be running an older version here. I'll talk to our sysadmin ;-) – GertVdE Jun 27 '13 at 06:48

If automatic connection to the nodes is enabled, you should be able to do this for batch jobs:

qpeek <jobid> 

This will dump the output log locally.

Usage:  qpeek [options] JOBID

Options:
  -c      Show all of the output file ("cat", default)
  -h      Show only the beginning of the output file ("head")
  -t      Show only the end of the output file ("tail")
  -f      Show only the end of the file and keep listening ("tail -f")
  -e      Show the stderr file of the job
  -o      Show the stdout file of the job

  -ssh               Use the ssh command rather than rsh to remote access the mother superior node
  -spool=<spool_loc> Specify the location of the spool directory, defaults to /var/spool/torque/spool
  -host=<host>       The name of the host to use in the filename for the jobs stdout or stderr

  -help|? Display help message
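For example, to follow a running job's stdout in the spirit of 'tail -f' (the job id 1234 is hypothetical):

```shell
# Stream the tail of the job's stdout file and keep listening:
qpeek -f -o 1234

# If rsh is not available on the cluster, fall back to ssh:
qpeek -ssh -f 1234
```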
internetscooter