9

How can one import the coordinates stored in a binary CHARMM/NAMD DCD file?

This is the structure of the file:

HDR     NSET    ISTRT   NSAVC   5-ZEROS NATOM-NFREAT    DELTA   9-ZEROS
`CORD'  #files  step 1  step    zeroes  (zero)          timestep  (zeroes)
                        interval
C*4     INT     INT     INT     5INT    INT             DOUBLE  9INT
==========================================================================
NTITLE          TITLE
INT (=2)        C*MAXTITL
                (=32)
==========================================================================
NATOM
#atoms
INT
==========================================================================
X(I), I=1,NATOM         (DOUBLE)
Y(I), I=1,NATOM         
Z(I), I=1,NATOM         
==========================================================================

A sample pdb and dcd file is here.

A C plugin for reading the DCD can be found here. As with any file format without a rigid formal specification there is a lot of variation and different edge cases...


Surprisingly googling did not turn up any ready made recipes. I'm hoping somebody has something on the shelf:)

Of course this is trivial to implement using BinaryReadList, so if nobody else answers, I will answer this question myself, there by still making it useful to the community.

Ajasja
  • 13,634
  • 2
  • 46
  • 104
  • Is the dcd format related to the pdb format? – rcollyer Apr 10 '12 at 15:06
  • Well, normally one needs a combination of a protein structure file - psf (http://www.lrz.de/~heller/ego/manual/node88.html) or pdb file, which contains the atom types and connectivity and a dcd file which contains the coordinates at different times (the trajectory). But the internal structure of the two formats is completely different. – Ajasja Apr 10 '12 at 17:50

1 Answers1

6

I ported most of the matlab package to Mathematica. Here is the result

(*Some utility functions*)
SetAttributes[MapShowIt, {HoldAll, Listable}];
MapShowIt[code__] := MapShowIt[{code}];
MapShowIt[code_] := With[{y = code}, Defer[code = y]]

ClearAll[Puts]
SetAttributes[Puts, HoldAll];
Options[Puts] = {DisplayFunction -> Shallow};
$VerbosePrint = False;
Puts[msg_, data_, opt : OptionsPattern[]] := 
  If[$VerbosePrint, Print[msg, OptionValue[DisplayFunction][data]]];
Puts[msg_List] := If[$VerbosePrint, Print[MapShowIt[msg]]];
Puts[msg_] := If[$VerbosePrint, Print[msg]];
(*warning this is still unsafe ... it should be done as \
answered  herehttp://mathematica.stackexchange.com/a/4189/745*)

Unprotect[Dot];
SetAttributes[Dot, HoldRest];
Dot[h_, key_Symbol | key_String] := 
  System`Utilities`HashTableGet[h, ToString[Unevaluated[key]]];
Dot /: Set[Dot[h_, key_Symbol | key_String], value_] := (
   Quiet[
      System`Utilities`HashTableRemove[h, ToString[Unevaluated[key]]]
    , System`Utilities`HashTableRemove::norem];
   System`Utilities`HashTableAdd[h, ToString[Unevaluated[key]], 
    value]);
Protect[Dot];
On[Assert];

ClearAll[readDCDHeader];
Options[readDCDHeader] = {"Verbose" -> False};
readDCDHeader::nonintframes = 
  "Number of frames calculated from files size (`1`) is not an \
integer!";
readDCDHeader::diffframes = 
  "Header claims `1` frames, but there are `2` frames!";
readDCDHeader[fileName_, opts : OptionsPattern[]] :=
  Module[{str, magicnum, h, i, newsize, newsize1, numlines, fint}, 
   Block[{$VerbosePrint = OptionValue["Verbose"]},
      h = System`Utilities`HashTable[];

      str = OpenRead[fileName, BinaryFormat -> True];


      fint = BinaryRead[str, "Integer32"];
      (*if we don't read an 84 then try to reverse the endianes
      If[fint=!=84,(*then*)
        Close[str];
       str= OpenRead[fileName,BinaryFormat -> True,
    ByteOrdering->-$ByteOrdering];]*)
      Assert[fint == 84, "First integer must be 84"];
      h.stream = str;
      magicnum = BinaryReadList[str, "Character8", 4];

      Assert[magicnum == {"C", "O", "R", "D"}, "CORD not present"];

      h.nset = BinaryRead[str, "Integer32"];
      h.istart = BinaryRead[str, "Integer32"];
      h.nsavc = BinaryRead[str, "Integer32"];

      (*read free indexes*)
      SetStreamPosition[str, 40];
      h.numfree = BinaryRead[str, "Integer32"];

      Puts[{h.nset, h.istart, h.nsavc, h.numfree}];

      (*find out if is charmm DCD*)
      SetStreamPosition[str, 84];
      i = BinaryRead[str, "Integer32"];
      If[i == 0, (*then*)
         h.charmm = False;
       ,(*else*)
       h.charmm = True;

       (* check for extra block*)
       SetStreamPosition[str, 48];
       i = BinaryRead[str, "Integer32"];
       h.charmm$extrablock = (i == 1);

                     SetStreamPosition[str, 52];
                      i = BinaryRead[str, "Integer32"];
       h.charmm$4dims = (i == 1);
       ];

      Puts[{h.charmm, h.charmm$extrablock, h.charmm$4dims}];

      (*read the timestep*)  
      SetStreamPosition[str, 44];
      If[h.charmm, (*then*)
       h.DELTA = BinaryRead[str, "Real32"];
       ,(*else*)
       h.DELTA = BinaryRead[str, "Real64"];];
      h.step = h.DELTA;

      (*get the title*)
      SetStreamPosition[str, 92];
      newsize = BinaryRead[str, "Integer32"];
      numlines = BinaryRead[str, "Integer32"];
      (*TODO check for curoupted Ntitle values*)
      h.title = 
     StringJoin@BinaryReadList[str, "Character8", numlines*80];
      newsize1 = BinaryRead[str, "Integer32"];
      Assert[newsize == newsize1];
      Puts[h.title];
      i = BinaryRead[str, "Integer32"]; 
    Assert[i == 4, "4 must be read before num of atoms"];
      h.numatoms = BinaryRead[str, "Integer32"];
      h.N = h.numatoms;
      i = BinaryRead[str, "Integer32"]; 
    Assert[i == 4, "4 must be read after num of atoms"];
      Puts[{h.DELTA, h.N}];

      (*love this comment from the original matdcd package*)
      (*stuff with freeindexes.  Just smile and nod.*)
      If[ h.numfree =!= 0, (*then*)
          i = BinaryRead[str, "Integer32"];  (* should be N-
     NAMNF*4*)

     h.freeindexes = BinaryReadList[str, "Integer32", h.N - h.numfree];
          i = BinaryRead[str, "Integer32"];  (* should be N-
     NAMNF*4*)
       ];

    h.headerend = StreamPosition[str];  
    (*calculate one frame size in bytes*)
    h.framesize = 3*(h.numatoms*4 + 2*4(*for the blocksize*))
        + If[h.charmm$extrablock, 12*4 + 2*4, 0]
        + If[h.charmm$4dims, +h.numatoms*4 + 4*2, 0];

    h.numframes = (FileByteCount[fileName] - h.headerend)/h.framesize;
    (* Warn if noninteger frame number and if the actuaal frames \
differ from h.nset*)
    If[Head[h.numframes] =!= Integer,(*then*)
       Message[readDCDHeader::nonintframes, h.numframes]];
    If[Head[h.numframes] != h.nset,(*then*)
       Message[readDCDHeader::diffframes, h.nset, h.numframes]];

    Return[h];
    ]];


ClearAll[readDCDStep];
Options[readDCDStep] = {"Verbose" -> False,

   "Atoms" -> All(*or a one based list of atoms to take*)};
readDCDStep[h_System`Utilities`HashTable, opts : OptionsPattern[]] :=


  Module[{x, y, z, str, blocksize, ind}, 
   Block[{$VerbosePrint = OptionValue["Verbose"]},
      ind = OptionValue["Atoms"];
      Assert[h.numfree == 0, 
     "Fixed atoms anad free indices are not supported"];
      str = h.stream;

      If[h.charmm && h.charmm$extrablock, (*then*)
       (*unit cell info*) 
         blocksize = BinaryRead[str, "Integer32"];
         Puts[
      "Skipping unit info cords. (blocksize: " <> 
       ToString[blocksize] <> ")"];
         Skip[str, "Byte", blocksize];

         Assert[blocksize == BinaryRead[str, "Integer32"], 
      "Wrong blocksize in extra block "];
       ];
     (* Get x coordinates *)
      blocksize = BinaryRead[str, "Integer32"];
      Puts[
     "Getting x cords. (blocksize: " <> ToString[blocksize] <> ")"];
      x = BinaryReadList[str, "Real32", blocksize/4]; 
      If[Head[ind] == List, x = Part[x, ind]];
      Puts["x:\n", x];
      Assert[blocksize == BinaryRead[str, "Integer32"], 
     "Wrong blocksize in x coords"];

      (* Get y coordinates *)
      blocksize = BinaryRead[str, "Integer32"];
      Puts[
     "Getting y cords. (blocksize: " <> ToString[blocksize] <> ")"];
      y = BinaryReadList[str, "Real32", blocksize/4];
      If[Head[ind] == List, y = Part[y, ind]];
      Puts["y:\n", y];
      Assert[blocksize == BinaryRead[str, "Integer32"], 
     "Wrong blocksize in y coords"];

      (* Get z coordinates *)
      Puts[
     "Getting z cords. (blocksize: " <> ToString[blocksize] <> ")"];
      blocksize = BinaryRead[str, "Integer32"];
      z = BinaryReadList[str, "Real32", blocksize/4];
      If[Head[ind] == List, z = Part[z, ind]];
      Puts["z:\n", z];
      Assert[blocksize == BinaryRead[str, "Integer32"], 
     "Wrong blocksize in z coords"];

      (*skip 4th dimension if it exists*)
      If[h.charmm && h.charmm$4dims,(*then*)
         Puts["Skipping w cords."];   
         blocksize = BinaryRead[str, "Integer32"];
                       Skip[str, "Byte", blocksize];

     Assert[blocksize == BinaryRead[str, "Integer32"], 
      "Wrong blocksize in 4th dim"];
       ];
    Assert[Length[x] == Length[y] == Length[z], 
     "Wrong size of x or y or z"];  
    Return[Developer`ToPackedArray[Transpose@{x, y, z}]];
    ]];

ClearAll[CloseDCD];
CloseDCD[h_System`Utilities`HashTable] := Close[h.stream];

ClearAll[ImportDCD];
Options[ImportDCD] = Evaluate[Options[readDCDStep]];
ImportDCD[fileName_, options : OptionsPattern[]] :=
  Module[{h, data, ropts, opts}, 
   Block[{$VerbosePrint = OptionValue["Verbose"]},
      opts = 
     DeleteDuplicates[{options}~Join~Options[ImportDCD], 
      ToString[First[#1]] == ToString[First[#2]] &];
      Puts[opts];  
      h = readDCDHeader[fileName, Verbose -> OptionValue["Verbose"]];

      ropts = Evaluate[FilterRules[opts, Options[readDCDStep]]];
      data = (readDCDStep[h, ropts])
                  & /@ Range[h.numframes];
      CloseDCD[h];

      Return[data]
    ]];

Some example usage:

  • Read the header

    h = readDCDHeader["wat.dcd", Verbose -> True]; CloseDCD[h];

  • Read one step

    h = readDCDHeader["wat.dcd", Verbose -> True]; data = readDCDStep[h, Verbose -> True]; Short[data] CloseDCD[h];

  • Import the whole trajectory

    data = ImportDCD["wat.dcd", Verbose -> False]; Dimensions[data] (*and for fun plot it:*)

    Manipulate[ pdata = Sphere[#] & /@ data[[i]]; Graphics3D[pdata] , {i, 1, Length[data], 1}]

Mathematica graphics

Its also possible to specify a list of one based indexes of which atoms to load using the "Atoms" option.

I really should put this on github and convert it into a package...

rcollyer
  • 33,976
  • 7
  • 92
  • 191
Ajasja
  • 13,634
  • 2
  • 46
  • 104
  • 1
    Have you thought about adapting it to the ImportConverter paradigm? With the functionality written, it shouldn't be terribly difficult to do. – rcollyer Apr 26 '12 at 15:30
  • A good idea and it should be fairly trivial to implement. – Ajasja Apr 27 '12 at 08:54
  • 2
    You should host this at a more suitable place, such as GitHub. Very useful package, works as advertised! – Szabolcs Dec 02 '13 at 23:40
  • @Szabolcs indeed, good idea! Do you know if this nice code ever made it to GitHub? It still works ! How amazing and discouraging that nearly 10 years later, in 13.0.1 , MemberQ[$ImportFormats, "DCD"] still gives False .. – Rolf Mertig Mar 19 '22 at 23:04
  • 1
    I put those Dot changes above into a function changeDot[], made up a restoreDot[]:= ( Unprotect[Dot]; ClearAttributes[Dot, HoldRest]; DownValues[Dot] = {}; UpValues[Dot] = {}; Protect[Dot]; ); and added changeDot[] at the beginning in ImportDCD and restoreDot[] at the end, and then no side effects will occur. – Rolf Mertig Mar 21 '22 at 10:46