What software is good to use for parallel debugging?

Question

I'm not running any parallel code right now, but I anticipate running parallel code in the future using a hybrid of OpenMP and MPI. Debuggers have been invaluable tools for me when running serial projects.

Can anyone recommend a parallel debugger (or multiple debuggers) to use for debugging parallel software? Free software would be preferable, but don't hesitate to mention effective commercial software.

I don't see how the answers here will differ significantly from http://stackoverflow.com/questions/329259/how-do-i-debug-an-mpi-program. MPI is the hard part here, not the OpenMP. In any case, debugging race conditions in threaded programs is borderline unsolvable right now. — Jeff Hammond, Jul 23 '13 at 06:44
ThreadSanitizer is a good solution for debugging race conditions in threaded programs, though I know of nobody who's tried to add MPI to the mix! — mabraham, Aug 26 '15 at 21:06

score 18 · Accepted Answer · answered Dec 14 '11 at 04:58


There are basically two major, commercial choices out there: DDT from Allinea (which is what we use at TACC) and Totalview (as mentioned in the other comment). They have comparable features, are both actively developed, and are direct competitors.

Eclipse has their Parallel Tools Platform, which should include MPI and OpenMP programming support and a parallel debugger.

answered Dec 14 '11 at 04:58

Bill Barth


I've never heard of anyone using the PTP parallel debugger. I'm not sure what that means... – Jeff Hammond Jul 23 '13 at 06:50
I have a few colleagues that have tried, but I've never played with it myself. – Bill Barth Jul 23 '13 at 19:31
For parallel programs on the GPU, there's also CUDA-GDB. – Anderson Green Dec 17 '20 at 03:48

Matt Knepley · Answer 2 · 2011-12-14T21:12:18.757

17

I must give the curmudgeon answer. My productivity has never been improved by any of the suggestions above. They are slow and expensive compared to my preferred option in parallel: one gdb session per process. Each gdb can connect to an MPI process and sit in an xterm (this happens automatically in PETSc using -start_in_debugger). I have used this for 15 years, happily. Objections:

1) I can't look at global data

Since MPI is a shared-nothing model, there is no global data, only local data.

2) This strategy does not scale to lots of processes

Neither do bugs. Bugs happen on individual processes, maybe with input from 1 or 2 neighbors. You can easily spawn gdb only on the participating processes (in PETSc you use -debugger_nodes 0,5,17 for example). Also, the above systems give up a lot when they run on every process, which makes them slow. The gdb method is, in fact, much more scalable.

gdb is also very portable. It runs everywhere, understands C++ and Fortran, and allows you to execute arbitrary code inside the run. I have written special functions to easily display data when running in it.
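
For readers wondering how the gdb sessions get started, here is a minimal sketch that only assembles the launch lines and prints them. It assumes the `mpirun -np N xterm -e gdb ...` pattern from the Open MPI FAQ and the PETSc options named above (-start_in_debugger, -debugger_nodes). The function names and the executable names `./my_mpi_application` and `./my_petsc_app` are hypothetical, and your launcher may want different flags.

```python
import shlex


def xterm_gdb_command(executable, nprocs, launcher="mpirun"):
    """Build a launch line that opens one xterm running gdb per MPI process."""
    return [launcher, "-np", str(nprocs), "xterm", "-e", "gdb", executable]


def petsc_debugger_command(executable, nprocs, ranks=None, launcher="mpiexec"):
    """Build a launch line for a PETSc program started under the debugger.

    If `ranks` is given, the debugger is requested only on those ranks
    via -debugger_nodes; otherwise it is requested on every process.
    """
    cmd = [launcher, "-n", str(nprocs), executable, "-start_in_debugger"]
    if ranks:
        cmd += ["-debugger_nodes", ",".join(str(r) for r in sorted(ranks))]
    return cmd


if __name__ == "__main__":
    # Executable names are placeholders; nothing is actually launched here.
    print(shlex.join(xterm_gdb_command("./my_mpi_application", 4)))
    print(shlex.join(petsc_debugger_command("./my_petsc_app", 32, ranks=[0, 5, 17])))
```

Printing the command instead of running it keeps the sketch safe to try on a login node; paste the printed line into a shell with X forwarding once it looks right.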

edited Dec 14 '11 at 21:12

answered Dec 14 '11 at 15:10

Matt Knepley


4

Hey Coward, if you downvote, leave a comment. – Matt Knepley Dec 14 '11 at 16:48
5

I wasn't the down vote, but I do disagree to some extent. I've encountered a few bugs at scale that didn't show at small sizes, and using a parallel debugger was an efficient way to find them. I do most of my debugging with printf and attaching to individual processes with gdb, but I have seen the benefit of having a parallel debugger. – Bill Barth Dec 14 '11 at 17:08
3

The only time I have ever encountered a bug at scale was a performance bug due to an improper collective communication algorithm being chosen. Then again, my view is even more extreme than Matt's, as the closest thing to a debugger that I ever use is valgrind. – Jack Poulson Dec 14 '11 at 17:19
1

@BillBarth I know you are right that bugs exist on 1000 processes that do not show up on smaller problems (Dinesh had a famous PETSc one that lasted for months that only showed up on 82 procs). My point was more to counter the prevailing wisdom. I think parallel debuggers are a good last resort, not first resort. – Matt Knepley Dec 14 '11 at 17:20
3

I downvoted you. Your answer is not what was asked. – aterrel Dec 14 '11 at 21:08
+1 for gdb. I was going to suggest printf() for C/C++ and write() in Fortran, but didn't want to get yelled at. :) – milancurcic Dec 15 '11 at 03:20
@aterrel I disagree. Just because N instances of GDB is trivially parallel does not mean that it is incorrect to say that GDB is parallel debugging software. Let's just call Matt's approach the Monte Carlo of parallel debugging. – Jeff Hammond Jul 23 '12 at 02:08
I agree with @Matt. I do not have massive experience with heavy asynchronous communication. A long time ago we tried TotalView, but in the end, gdb/lldb per process, with some initial scripts, is the quickest and simplest way to work. – likask May 28 '17 at 09:54
I'd be interested to see a more complete explanation of how to get the gdb processes running. – Richard Aug 24 '19 at 17:26

score 7 · Answer 3 · answered Jul 23 '13 at 06:46


I use only two debuggers for serial and parallel programs:

1) The Kernighan debugger, i.e. judicious print statements and careful thinking.
2) Multiple instances of GDB as described at http://www.open-mpi.org/faq/?category=debugging#serial-debuggers.

In the case where (2) is not sufficiently scalable, I refer to (1b).
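
As a rough illustration of the print-statement approach in a parallel run, the sketch below tags each message with the process rank and flushes immediately so output is not lost if the process dies. It assumes the rank can be read from an environment variable set by the launcher (OMPI_COMM_WORLD_RANK for Open MPI, PMI_RANK for MPICH-style launchers, SLURM_PROCID under Slurm). The helper names `detect_rank` and `debug_print` are hypothetical, and the same idea carries over to printf() in C or write() in Fortran.

```python
import os
import sys

# Environment variables that common launchers set to the process rank.
RANK_ENV_VARS = ("OMPI_COMM_WORLD_RANK", "PMI_RANK", "SLURM_PROCID")


def detect_rank(environ=None):
    """Return the rank from the first launcher variable found, else 0."""
    environ = os.environ if environ is None else environ
    for name in RANK_ENV_VARS:
        value = environ.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 0  # Serial run, or a launcher this sketch does not know about.


def debug_print(message, rank=None, stream=None):
    """Write a rank-tagged message and flush it right away."""
    rank = detect_rank() if rank is None else rank
    stream = sys.stderr if stream is None else stream
    print(f"[rank {rank}] {message}", file=stream, flush=True)


if __name__ == "__main__":
    debug_print("entering solver loop")
```

Writing to stderr and flushing on every call is a deliberate choice here: buffered stdout from a crashed rank often never reaches the terminal, which defeats the purpose of the print.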

answered Jul 23 '13 at 06:46

Jeff Hammond


1

I've never heard the name "Kernighan debugger", but I approve, as it is how I always debug. – Jack Poulson Jul 23 '13 at 07:26

score 4 · Answer 4 · answered Dec 14 '11 at 05:12

There is Intel Parallel Studio which includes a parallel debugger. I've never worked with it but I've seen it used in a few demos. Here's a video tutorial that shows some of the features.

I've also seen a few wrappers around gdb that worked reasonably well in certain cases.

Yann · Answer 5 · 2011-12-14T03:33:23.470

3

TotalView. It's a commercial debugger. It's very easy to view the stack on each processor. You can see variable values (and change them) across processors/threads. You can plot vectors or matrices to visualize variable values. Apparently scripting is possible too (Tk/Tcl), for sophisticated watchpoint analysis, though I've never worked with this myself.

edited Dec 14 '11 at 03:33

answered Dec 14 '11 at 03:28

Yann


On the subjective side, when my university's HPC center installed this I thought it was overkill. Then I found out how easy it was to do very complicated debugging. It's really a great program. – Yann Dec 14 '11 at 03:30
I second totalview too. I have used it in many instances and it is extremely powerful, albeit highly expensive... – BlaB May 30 '17 at 17:10

score 1 · Answer 6 · edited Aug 24 '19 at 15:20

I wonder why no one mentioned Padb (Parallel Application Debugger), which is free and open-source software, as the OP prefers, though it is not as powerful as commercial counterparts such as TotalView for HPC.

edited Aug 24 '19 at 15:20

Anton Menshov


answered Aug 24 '19 at 13:48

Hefnawi


score 1 · Answer 7 · edited May 28 '17 at 07:11

For a couple of simple ways to debug parallel codes, we've collected a few answers in the deal.II FAQs in the section on debugging: https://github.com/dealii/dealii/wiki/Frequently-Asked-Questions#debugging-dealii-applications

edited May 28 '17 at 07:11

Jakub Klinkovský


answered Dec 16 '11 at 04:39

Wolfgang Bangerth


score -1 · Answer 8 · answered Dec 14 '11 at 15:27

Here's a digest of some answers given to me previously:

OpenMP has timing functions: omp_get_wtime() and omp_get_wtick() -- online docs

Google has a CPU profiler

There's Scalasca that does OpenMP and MPI profile and analysis

Then there's Tau and vtune which I haven't used.

Good luck!

answered Dec 14 '11 at 15:27

Mikhail


I don't think the question is about timing, but I might be wrong. Good suggestions though... – Yann Dec 14 '11 at 15:44
This answer is more about profiling than debugging... – mbq Dec 14 '11 at 15:44
I have found that profiling tools make good substitutes for parallel debuggers. I find it often the case that parallel bugs are related to performance problems, such as logjam in MPI. Performance tools will often reveal this. TAU's memory profiler is good for figuring out why random segfaults may occur. – Jeff Hammond Jul 23 '12 at 02:10
