8

I am trying to generate an Excel-compatible XML file from Mathematica. First, I created a test XML file by saving a Microsoft Excel workbook as "XML Table". Then I Imported this file:

XMLdata = Import["test.xml", "IncludeNamespaces" -> True]

Now I am Exporting this back to XML:

Export["test_exported.xml", XMLdata, "AttributeQuoting" -> "\""]

The generated file cannot be opened by Excel. Comparing it with the original file shows that most of namespace prefixes are dropped although they were present in XMLdata. Why does this happen? How can I correctly Export an XML file which will be compatible with Excel?

Alexey Popkov
  • 61,809
  • 7
  • 149
  • 368
  • 2
    Could you provide more details about the XML export out of Excel? I see two options there, save as Spreadsheet 2003 XML and save as Data XML. The simple spreadsheet that I tried can only be save using the former and not the latter format (Excel complains about missing XML mappings). I can't get Mathematica to import the spreadheet 2003 XML file using your Import line. I can import it as text and then importing that using ImportString. – Sjoerd C. de Vries Jul 03 '12 at 19:57
  • 1
    The simple spreadsheet wasn't simple enough. It contained a few names and formulas. If I simplify it so that it only contains a 2x3 set of numbers, it imports fine, but the exported version cannot be read by Excel. I either get errors or Excel just hangs depending on the options I choose. – Sjoerd C. de Vries Jul 03 '12 at 20:12
  • @Sjoerd I use Excel 2003 and save a very simple spreadsheet (containing only one cell with number formatted as "Text") as "XML Table". On my system (Windows XP 32 bit, Mathematica 8.0.4) the file is imported without errors. I think of this file as of a template for generating such XML tables by Mathematica for Excel. – Alexey Popkov Jul 04 '12 at 04:53
  • 1
    I'm using Excel 2010 and don't see that XML option. Do you have reasons not to export from mma using Excel's native xls format? – Sjoerd C. de Vries Jul 04 '12 at 06:07
  • @Sjoerd I think that "Spreadsheet 2003 XML" is the same as my "XML Table" (I use localized version of MS Office and it is backward translation). The reason to use XML format is explained in the UPDATE section here. – Alexey Popkov Jul 04 '12 at 06:19
  • Have you looked at the options section of ref/format/XML? there is an option "IncludeNamespaces" for import and "NamespacePrefixes" for export which I would expect to probably help, although I haven't tried it with your example. – Albert Retey Jul 04 '12 at 13:17
  • @Albert I had tried them but without success. I suppose that tuning "NamespacePrefixes" is some way can solve the problem but my current knowledge is insufficient. From the other side, I really do not understand why does Export drop namespace prefixes? I see no reasons since these prefixes are present in XMLData after importing the file. – Alexey Popkov Jul 04 '12 at 15:33
  • @AlexeyPopkov: As usual one cannot be sure whether that is a bug or a more or less useful default setting. I don't understand the details either, but I remember that it is possible to define a (default) namespace per document or tag (?) so it even may be a correct transformation and the problem is with the other side. Anyway, finding a solution to this might need some experimenting which I can't do, not now and not in the near future. But probably you are lucky and find someone who knows and needs no guessing and experimenting... – Albert Retey Jul 04 '12 at 20:35

1 Answers1

7

I have found the solution. From the Documentation,

Some documents use names in a non-namespace-compliant fashion, because the XML namespace recommendation, which extends XML, was made after the initial XML recommendation. "IncludeNamespaces"->"Unparsed" is provided to allow parsing of these documents. The name is always represented as the exact single string that appears in the XML file. Unless absolutely necessary, this option value should not be used.

With the option "IncludeNamespaces"->"Unparsed" everything is nice and even the "AttributeQuoting" -> "\"" option is not needed:

XMLData = Import["test.xml", "IncludeNamespaces" -> "Unparsed"];
Export["test-from MMa.xml", XMLData]

The exported file is opened by Excel without warnings.

Alexey Popkov
  • 61,809
  • 7
  • 149
  • 368
  • I'm glad you found a solution. Thanks for sharing it! – Mr.Wizard Aug 18 '12 at 05:19
  • The final file format is a XML file that Excel can open. I tryed to rename it to XLS or XLSX with no success (Excel can't recognize it as a XLS file). Do you know some way to convert this XML to a XLSX? Or some change in the structure of XML that can make it recognized? I noted that XLSX is a Zip file with a lot of XML, but no success in figure out how to transform the XML into XLSX. For me it's necessary for file distribution. Tks – Murta Jan 05 '14 at 14:55
  • @Murta XML is not identical to XLS but AFAICT it is equivalent: XML is a textual format while XLS is binary format. Probably only Excel can convert XML to XLS or XLSX. But I do not think that it is a big problem since on my machine the "XML Table" files saved from Excel are automatically opened by it when I double-click on them in Windows Explorer. – Alexey Popkov Jan 06 '14 at 07:35