7

I have a set of data read in from a larger CSV file, and I want it to match the format I have from another analysis. It is structured as a list of strings containing letters and numbers. Here it is, copy-pasted from Mathematica:

{"{H, 1}", "{H, 2}", "{H, 3}", "{Mg, 1}", "{Mg, 1}", "{Mg, 1}",
 "{C, 1}", "{C, 1, H, 1}", "{N, 1}", "{N, 1, H, 1}"}

I want to convert it to:

{{"H", "1"}, {"H", "2"}, {"H", "3"}, {"Mg", "1"}, {"Mg", "1"}, {"Mg", "1"},
 {"C", "1"}, {"C", "1", "H", "1"}, {"N", "1"}, {"N", "1", "H", "1"}}

I've tried Read and StringToStream but I haven't been able to do what I need.


I've also solved it. I wasn't on the right track at first; once I looked at the simple string manipulation tools after posting the question, the solution jumped out at me.

data = Map[StringTrim[#, ("{" | "}") ...] &, data];
data = Map[StringSplit[#, ","] &, data]
data = StringTrim/@data
rm -rf
s0rce

3 Answers

7

Another solution using StringSplit:

list = {"{H, 1}", "{H, 2}", "{H, 3}", "{Mg, 1}", "{Mg, 1}", "{Mg, 1}",
   "{C, 1}", "{C, 1, H, 1}", "{N, 1}", "{N, 1, H, 1}"};

StringTrim /@ StringSplit[list, {"{", ",", "}"}]
{{"H", "1"}, {"H", "2"}, {"H", "3"}, {"Mg", "1"}, {"Mg", "1"}, {"Mg", "1"}, 
 {"C", "1"}, {"C", "1", "H", "1"}, {"N", "1"}, {"N", "1", "H", "1"}}
Heike
  • 1
    Nice, the two calls in mine were unnecessary. Note, you'll need the second delimiter to be ", " (note the space) for the elements in the result to not have a leading space. – rm -rf Feb 24 '12 at 10:22
  • @R.M I hadn't noticed the leading spaces. I decided to go for StringTrim to fix it. – Heike Feb 24 '12 at 10:27
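
The leading-space issue discussed in these comments can be seen on a small input. This is only an illustrative sketch: the two-element `list` and the names `raw` and `trimmed` are made up for the example and are not part of the original answer.

```mathematica
(* Shortened, hypothetical sample of the input *)
list = {"{H, 1}", "{C, 1, H, 1}"};

(* Splitting on a bare "," keeps the space that follows each comma *)
raw = StringSplit[list, {"{", ",", "}"}]
(* {{"H", " 1"}, {"C", " 1", " H", " 1"}} *)

(* StringTrim strips that whitespace from every piece *)
trimmed = StringTrim /@ raw
(* {{"H", "1"}, {"C", "1", "H", "1"}} *)
```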
2

Here is an approach using StringSplit:

strlist = {"{H, 1}", "{H, 2}", "{H, 3}", "{Mg, 1}", "{Mg, 1}", 
  "{Mg, 1}", "{C, 1}", "{C, 1, H, 1}", "{N, 1}", "{N, 1, H, 1}"}   

Flatten[StringSplit[strlist, "{" ~~ x__ ~~ "}" :> StringSplit[x, ", "]], 1]

(* Out[1]= {{"H", "1"}, {"H", "2"}, {"H", "3"}, {"Mg", "1"}, {"Mg", "1"}, 
            {"Mg", "1"}, {"C", "1"}, {"C", "1", "H", "1"}, {"N", "1"}, 
            {"N", "1", "H", "1"}} 
*)

Per my comment under Heike's answer, slightly modifying her StringSplit solution to

StringSplit[strlist, {"{", ", ", "}"}]

would've given the desired result. I'm including it here since she went with StringTrim to get rid of the leading space.
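
If the spacing after the commas is not guaranteed to be exactly one space, the delimiter can be written as a string pattern instead of the literal ", ". The following is a sketch under that assumption; the irregularly spaced `messy` input and the name `split` are invented for illustration.

```mathematica
(* Hypothetical input with inconsistent spacing after the commas *)
messy = {"{H, 1}", "{C,1,  H, 1}"};

(* A comma followed by zero or more whitespace characters acts as one delimiter *)
split = StringSplit[messy, {"{", "," ~~ (WhitespaceCharacter ...), "}"}]
(* {{"H", "1"}, {"C", "1", "H", "1"}} *)
```

Unlike StringTrim, this only absorbs whitespace that directly follows a comma, so spaces before a comma or next to the braces would still survive.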

rm -rf
  • 1
    +1. I like this approach more than those based on ToExpression, because they create new symbols H, Mg, etc., which is a side effect. – Leonid Shifrin Feb 24 '12 at 09:33
0

I think a more flexible answer is the following:

data = {"{H, 1}", "{H, 2}", "{H, 3}", "{Mg, 1}", "{Mg, 1}", "{Mg, 1}",
 "{C, 1}", "{C, 1, H, 1}", "{N, 1}", "{N, 1, H, 1}"}
newData=Map[ToExpression,data]

That doesn't assume that the elements are of string type.

Jens
  • This converts the whole string into an expression, which can result in some of the letters becoming numbers if they have had a value set. For example, if Mg=4 was set somewhere, then Mg would be replaced by 4 here. – s0rce Feb 24 '12 at 04:29
  • Sure. In that case, my suggestion would require you to prepare the input data in a way that is more consistent with Mathematica syntax. I.e., replace "{H, 1}" by "{"H", 1}" etc. – Jens Feb 24 '12 at 06:48
  • I guess I should have seen that pre-existing symbol definitions would be a problem for you - otherwise you would have just said ToExpression[data]. So I agree you need string manipulation after all... – Jens Feb 24 '12 at 07:07
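
The problem raised in these comments can be reproduced in a few lines. This is a sketch only: the assignment `Mg = 4` is the hypothetical pre-existing definition from the comment above, and the names `evaluated`, `held`, and `viaStrings` are introduced just for the example.

```mathematica
Mg = 4;  (* hypothetical definition made earlier in the session *)

(* ToExpression evaluates the parsed list, so Mg is replaced by its value *)
evaluated = ToExpression["{Mg, 1}"]
(* {4, 1} *)

(* Wrapping in Hold prevents the evaluation, though the symbol Mg is still created *)
held = ToExpression["{Mg, 1}", InputForm, Hold]
(* Hold[{Mg, 1}] *)

(* The string-based approach never touches the symbol at all *)
viaStrings = StringTrim /@ StringSplit["{Mg, 1}", {"{", ",", "}"}]
(* {"Mg", "1"} *)

Clear[Mg]
```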