
I can't see what I'm doing wrong in the following - I'd like to import some XML as a Tree (for structure visualisation) and convert it to an Association for querying content:

xtree = ExpressionTree[Import["ExampleData/paintings.xml"], "XML"]

[Image: Out[1], the resulting tree]

Which appears to work as it should.

I then use TreeExpression to obtain an Association but I get a nested list:

TreeExpression[xtree, "Association"]

Which returns this nested list and not an Association:

    {{{"Version" -> "1.0", "Encoding" -> "ISO8859-1"}},
     {{{"No.5, 1948"}, {"Jackson Pollock"}, {"1948"}, {"$140,000,000"}},
      {{"Woman III"}, {"Willem de Kooning"}, {"1953"}, {"$137,500,000"}},
      {{"Portrait of Adele Block-Bauer I"}, {"Gustav Klimt"}, {"1907"}, {"$135,000,000"}},
      {{"Portrait of Dr. Gachet"}, {"Vincent van Gogh"}, {"1890"}, {"$82,500,000"}},
      {{"Bal au moulin de la Galette, Montmartre"}, {"Pierre-Auguste Renoir"}, {"1876"}, {"$78,100,000"}}},
     {}}

I've tried various options (e.g. Heads-> True makes no difference). Any suggestions?

$VersionNumber
14.
EstabanW
  • "Association" works only if the tree is already represented with associations, for example: TreeExpression[Tree[Null, <|a -> Tree[1, None], b -> Tree[Null, <|c -> Tree[2, None]|>]|>], "Association"]. Yours is not (check the InputForm). For general instructions regarding XML, you can read the XML Capabilities Tutorial. – Domen Jan 29 '24 at 15:09

1 Answer


TreeExpression[ExpressionTree[expr, struct1], struct2] does not semantically transform an expression from structure struct1 to structure struct2. Instead, it converts expr to a Tree on the assumption that the expression has structure struct1, and then converts that Tree back to an expression on the assumption that the tree has structure struct2. These assumptions don't hold in general. In particular, TreeExpression[tree, "Association"] is equivalent to TreeFold[{data, children} |-> children, tree].
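To see why a fold that returns only the children produces a bare nested list, here is a minimal sketch on a small hand-built tree. The tree and the names `small` and `folded` are made up for illustration. The two-element form TreeFold[{f, g}, tree] is used so that the treatment of leaves (left unchanged via Identity) is explicit rather than assumed.

```wolfram
(* Hypothetical three-level tree: inner nodes carry labels, leaves carry values *)
small = Tree["root", {Tree["x", None], Tree["inner", {Tree["y", None]}]}];

(* Keep only the folded children at each inner node; leave leaf data unchanged *)
folded = TreeFold[{{data, children} |-> children, Identity}, small]
(* expected: {"x", {"y"}} *)
```

The labels "root" and "inner" are discarded at every level, which mirrors how the XML tag names vanish from the nested list in the question.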

TreeFold is what you want for querying a tree in a single depth-first traversal. For example:

In[1]:= xtree = 
  ExpressionTree[Import["ExampleData/paintings.xml"], "XML"];

In[2]:= root = TreeExtract[xtree, 2];

In[3]:= f[{prop : "TITLE" | "ARTIST" | "YEAR" | "PRICE", {}}, {value_}] := prop -> value

In[4]:= f[{"SALE", {}}, rules_] := <|"SALE" -> <|rules|>|>

In[5]:= f[{"PAINTINGS", {}}, associations_] := <|"PAINTINGS" -> associations|>

In[6]:= TreeFold[f, root]

Out[6]= <|"PAINTINGS" -> {<|"SALE" -> <|"TITLE" -> "No.5, 1948", "ARTIST" -> "Jackson Pollock", "YEAR" -> "1948", "PRICE" -> "$140,000,000"|>|>, <|"SALE" -> <|"TITLE" -> "Woman III", "ARTIST" -> "Willem de Kooning", "YEAR" -> "1953", "PRICE" -> "$137,500,000"|>|>, <|"SALE" -> <|"TITLE" -> "Portrait of Adele Block-Bauer I", "ARTIST" -> "Gustav Klimt", "YEAR" -> "1907", "PRICE" -> "$135,000,000"|>|>, <|"SALE" -> <|"TITLE" -> "Portrait of Dr. Gachet", "ARTIST" -> "Vincent van Gogh", "YEAR" -> "1890", "PRICE" -> "$82,500,000"|>|>, <|"SALE" -> <|"TITLE" -> "Bal au moulin de la Galette, Montmartre", "ARTIST" -> "Pierre-Auguste Renoir", "YEAR" -> "1876", "PRICE" -> "$78,100,000"|>|>}|>

In[7]:= Dataset[%]

[Image: the example paintings XML converted to a Dataset]

You can likely modify this example and use other Trees functionality to query the content of the Tree directly without first converting to a Dataset, but this example shows how to convert the paintings example into associations. Note that the PAINTINGS key has a list value, since the SALE key is repeated.
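Once the data is in association form, it can also be queried with ordinary association functions, with or without Dataset. The following is a minimal sketch rather than part of the original example: `result` is a shortened, hand-copied version of the output above (first two sales only), and the names `sales`, `titles`, `recent` and `artists` are chosen here for illustration.

```wolfram
(* Shortened copy of the association shown above (first two sales only) *)
result = <|"PAINTINGS" -> {
     <|"SALE" -> <|"TITLE" -> "No.5, 1948", "ARTIST" -> "Jackson Pollock",
        "YEAR" -> "1948", "PRICE" -> "$140,000,000"|>|>,
     <|"SALE" -> <|"TITLE" -> "Woman III", "ARTIST" -> "Willem de Kooning",
        "YEAR" -> "1953", "PRICE" -> "$137,500,000"|>|>}|>;

(* Unwrap the repeated "SALE" key to get a flat list of records *)
sales = Lookup[result["PAINTINGS"], "SALE"];

(* Extract one field from every record *)
titles = Lookup[sales, "TITLE"]
(* {"No.5, 1948", "Woman III"} *)

(* Filter on a field; YEAR is stored as a string, so convert it first *)
recent = Select[sales, FromDigits[#["YEAR"]] > 1950 &];
artists = Lookup[recent, "ARTIST"]
(* {"Willem de Kooning"} *)

(* The same title lookup written as a Query over the nested structure *)
Query["PAINTINGS", All, "SALE", "TITLE"][result]
```

Flattening out the "SALE" wrapper first keeps the later lookups simple, since each record is then a plain association of field names to strings.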

Ian Ford
  • Thanks @Ian, that's very clean and elegant (and a lot nicer than the repeated rule replacement thing I'd tried). I was trying to stay within the Trees framework but clearly need a bit more practice with it – EstabanW Jan 29 '24 at 21:13