
Consider an XML document containing these elements:

...
<example>An example</example>
<project>A project</project>
<projectName>A project name</projectName>
<projectDate>A project date</projectDate>
...

To pick out one of them, this code suffices:

Cases[dataXML, XMLElement["project", __, __], Infinity]

But what if I need all elements whose names start with "project"?

None of these works (each returns {}):

Cases[dataXML, XMLElement["project" ~~ _, __, __], Infinity]
Cases[dataXML, XMLElement["project" ~~ __, __, __], Infinity]
Cases[dataXML, XMLElement["project" ~~ ___, __, __], Infinity]

and the same happens with regular expressions.

An obvious, though somewhat unsatisfying, workaround is:

data = ToString @ dataXML;
ptr = Shortest @ RegularExpression["XMLElement\\[project[^\\]]*\\]"];
StringCases[data, ptr]

Nevertheless, I would like to understand why the earlier attempts fail, and whether this teaches a broader lesson.

Alexey Popkov
mitochondrial

2 Answers


You should use Condition with StringMatchQ:

Cases[dataXML, 
 XMLElement[tag_String /; StringMatchQ[tag, "project*"], __], Infinity]

because Cases doesn't support string patterns.

As to why it is designed this way, I would quote Leonid Shifrin:

I would say that the reason is dead simple <…>. Cases and DeleteCases work on parsed expressions, while string functions work on strings. These are just so different that mixing them together would be a very wrong design decision IMO.

You can find a more detailed discussion in this answer by WReach and in the comments under the answer by R. M.
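A minimal sketch of why the attempts in the question return {}, assuming a hand-built `dataXML` that mirrors the XML shown (built directly rather than imported, so Import's whitespace handling plays no role; `projectElements` is likewise a hypothetical name). Inside `Cases`, `"project" ~~ __` stays an unevaluated `StringExpression`, which a `String` atom can never match structurally; only string functions such as `StringMatchQ` interpret it as a string pattern. Note also that `__` needs at least one character, so even in a string function it would miss the bare `"project"` tag, whereas `___` or the `"project*"` shorthand match it.

```mathematica
(* hypothetical hand-built stand-in for the imported document *)
dataXML = XMLElement["test", {}, {
    XMLElement["example", {}, {"An example"}],
    XMLElement["project", {}, {"A project"}],
    XMLElement["projectName", {}, {"A project name"}],
    XMLElement["projectDate", {}, {"A project date"}]}];

(* in ordinary pattern matching a string pattern is just an inert expression *)
Head["project" ~~ __]                         (* StringExpression *)
MatchQ["projectName", "project" ~~ __]        (* False: structural match only *)

(* string functions interpret the same object as a string pattern *)
StringMatchQ["projectName", "project" ~~ __]  (* True *)
StringMatchQ["project", "project" ~~ __]      (* False: __ needs at least one character *)
StringMatchQ["project", "project" ~~ ___]     (* True *)

(* hence the tag test has to be delegated to a string function via Condition *)
projectElements = Cases[dataXML,
   XMLElement[tag_String /; StringMatchQ[tag, "project*"], __], Infinity];
projectElements[[All, 1]]    (* {"project", "projectName", "projectDate"} *)
```

The same reasoning covers `RegularExpression`: it is only meaningful to string functions, so it has to sit inside `StringMatchQ` (or a similar test) rather than directly in the `XMLElement` pattern.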

Alexey Popkov

This is not really an answer, since the main question (why the string patterns fail) is left unexplained, but it is a useful workaround.

Suppose the XML data are:

<test>
<example>An example</example>
<project>A project</project>
<projectName>A project name</projectName>
<projectDate>A project date</projectDate>
</test>

This code:

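(* fileIn is the path to the XML file above *)
dataXML = Import[fileIn, "XML"];
tagsList = Import[fileIn, {"XML", "Tags"}];
(* keep only the tag names starting with "project" *)
requestedTags = 
  Select[tagsList, StringMatchQ[#, RegularExpression["project.*"]] &];
Cases[dataXML, XMLElement[#, __, __], Infinity] & /@ requestedTags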

accomplishes the goal:

{{XMLElement["project", {}, {"A project"}]},
 {XMLElement["projectDate", {}, {"A project date"}]},
 {XMLElement["projectName", {}, {"A project name"}]}}
mitochondrial
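The per-tag loop in this answer returns one sublist per tag. If a single flat list in document order is preferred, the selected tag names can be folded into one `Alternatives` pattern so that `Cases` walks the expression only once. A minimal sketch, assuming again a hypothetical hand-built `dataXML` in place of `Import[fileIn, "XML"]`, and collecting the tag names from the expression itself instead of the `"Tags"` import element (only to keep the example self-contained); `flat` is a hypothetical name, and `StringStartsQ` requires Mathematica 10.4 or later.

```mathematica
(* hypothetical hand-built stand-in for Import[fileIn, "XML"] *)
dataXML = XMLElement["test", {}, {
    XMLElement["example", {}, {"An example"}],
    XMLElement["project", {}, {"A project"}],
    XMLElement["projectName", {}, {"A project name"}],
    XMLElement["projectDate", {}, {"A project date"}]}];

(* collect every tag name, including the root element at level 0 *)
tagsList = DeleteDuplicates @
   Cases[dataXML, XMLElement[t_String, __] :> t, {0, Infinity}];

(* keep the tags starting with "project" (operator form of StringStartsQ) *)
requestedTags = Select[tagsList, StringStartsQ["project"]];

(* a single traversal: the tag slot is an Alternatives of the requested names *)
flat = Cases[dataXML, XMLElement[Alternatives @@ requestedTags, __], Infinity];
flat[[All, 1]]    (* {"project", "projectName", "projectDate"} *)
```

Unlike the `Map` over tags, this keeps the elements in the order they appear in the document rather than grouping them by tag name.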