0

I have a folder of about a million JSON files and I wrote this to convert them quickly to CSV, but the formatting is not correct for the software I'm feeding them into.

Here's the code I'm using:

Do[(Export["CSV_OUT_" <> FileBaseName@jsonlist[[i]] <> ".csv", 
Import[jsonlist[[i]]]]),{i, 1, Length[jsonlist]}];

The result looks like this: """from"" -> {""address_city"" -> ""MORGANVILLE"", ""address_country"" -> ""UNITED STATES"", ""address_line1"" -> ""123 TEST ST"", ""address_line2"" -> ""STE 1"", ""address_state"" -> ""NJ"", ""address_zip"" -> ""07751"", ""company"" -> """", ""name"" -> ""ELENA CROSS""}" """id"" -> ""sfm_c4kjaugl7u8psvqfatp0""" """imb_code"" -> ""897714123456789""" """mail_date"" -> """"" """mail_type"" -> ""usps_first_class""" etc.

The correct results would look like this in CSV format: from address_city,address_country,address_line1,address_line2,address_state,address_zip,company,name,id,imb_code,mail_date,mail_type,object,press_proof,size,target_delivery_date,,to address_city,address_country,address_line1,address_line2,address_state,address_zip,company,name MORGANVILLE,UNITED STATES,123 TEST ST,STE 1,NJ,7751,,ELENA CROSS,sfm_c4kjaugl7u8psvqfatp0,897714123456789,,usps_first_class,

Here are the contents of the JSON file:

{"from":{"address_city":"MORGANVILLE","address_country":"UNITED       STATES","address_line1":"123 TEST ST","address_line2":"STE 1","address_state":"NJ","address_zip":"07751","company":null,"name":"ELENA CROSS"},"id":"sfm_c4kjaugl7u8psvqfatp0","imb_code":"897714123456789","mail_date":null,"mail_type":"usps_first_class","object":"self_mailer","press_proof":"https://lob-assets.com/sid-self_mailers/sfm_c4kjaugl7u8psvqfatp0.pdf?version=v1&expires=1635274618&signature=GHUTiyDy11kHT8VMN5UqwDfGuZmamn3W80c5LAd7Le5Hebwne5q-N7zwtmwD0sOfxeekezdSefZLnscBwiajCw","size":"6x18_bifold","target_delivery_date":null,"to":{"address_city":"MORGANVILLE","address_country":"UNITED STATES","address_line1":"123 TEST ST","address_line2":"STE 1","address_state":"NJ","address_zip":"07751","company":null,"name":"ELENA CROSS"}}

How can I export correctly formatted CSV files like the one above quickly?

Nothingtoseehere
  • 4,518
  • 2
  • 30
  • 58
  • It would be helpful to know the contents of your JSON. Does the JSON import correctly? Have you tried to specify formats explicitly in your import and export calls? – MarcoB Oct 16 '21 at 19:45
  • @MarcoB I'll post the contents of the JSON file thanks. I was trying to keep the code quick given the large number of files I needed to process so I may have taken some shortcuts. – Nothingtoseehere Oct 16 '21 at 19:48
  • Explicit calls make no changes to the CSV returned. – Nothingtoseehere Oct 18 '21 at 12:50
  • CSV is does not support nesting. In your JSON, your "from" and "to" fields have nested members. You need to flatten this out. – Gustavo Delfino Oct 18 '21 at 14:51

1 Answers1

3

This is an example on how to convert your posted JSON to CSV. Notice that we need to flatten out the data to remove nested fields:

ImportString["{\"from\":{\"address_city\":\"MORGANVILLE\",\"address_country\":\"UNITED       STATES\",\"address_line1\":\"123 TEST ST\",\"address_line2\":\"STE 1\",\"address_state\":\"NJ\",\"address_zip\":\"07751\",\"company\":null,\"name\":\"ELENA CROSS\"},\"id\":\"sfm_c4kjaugl7u8psvqfatp0\",\"imb_code\":\"897714123456789\",\"mail_date\":null,\"mail_type\":\"usps_first_class\",\"object\":\"self_mailer\",\"press_proof\":\"https://lob-assets.com/sid-self_mailers/sfm_c4kjaugl7u8psvqfatp0.pdf?version=v1&expires=1635274618&signature=GHUTiyDy11kHT8VMN5UqwDfGuZmamn3W80c5LAd7Le5Hebwne5q-N7zwtmwD0sOfxeekezdSefZLnscBwiajCw\",\"size\":\"6x18_bifold\",\"target_delivery_date\":null,\"to\":{\"address_city\":\"MORGANVILLE\",\"address_country\":\"UNITED STATES\",\"address_line1\":\"123 TEST ST\",\"address_line2\":\"STE 1\",\"address_state\":\"NJ\",\"address_zip\":\"07751\",\"company\":null,\"name\":\"ELENA CROSS\"}}
", "RawJSON"] //
    Prepend[#, KeyMap["from_" <> # &, #from]] & //
   Append[#, KeyMap["to_" <> # &, #to]] & //
  KeyDrop["from"] //
 KeyDrop["to"] //
ExportString[{Values[#]}, "CSV", "TableHeadings" -> Keys[#]] &

produces this CSV output:

"from_address_city","from_address_country","from_address_line1","from_address_line2","from_address_state","from_address_zip","from_company","from_name","id","imb_code","mail_date","mail_type","object","press_proof","size","target_delivery_date","to_address_city","to_address_country","to_address_line1","to_address_line2","to_address_state","to_address_zip","to_company","to_name"
"MORGANVILLE","UNITED       STATES","123 TEST ST","STE 1","NJ","07751","","ELENA CROSS","sfm_c4kjaugl7u8psvqfatp0","897714123456789","","usps_first_class","self_mailer","https://lob-assets.com/sid-self_mailers/sfm_c4kjaugl7u8psvqfatp0.pdf?version=v1&expires=1635274618&signature=GHUTiyDy11kHT8VMN5UqwDfGuZmamn3W80c5LAd7Le5Hebwne5q-N7zwtmwD0sOfxeekezdSefZLnscBwiajCw","6x18_bifold","","MORGANVILLE","UNITED STATES","123 TEST ST","STE 1","NJ","07751","","ELENA CROSS"
Gustavo Delfino
  • 8,348
  • 1
  • 28
  • 58
  • Thanks for your response. Although the CSV is formatted the same as I have above, not the desired format removing Quotes and extra characters as I have shown as the desired result. – Nothingtoseehere Oct 18 '21 at 16:06
  • See https://mathematica.stackexchange.com/questions/41047/exporting-strings-without-the-quotes for that. You just need to add a "TextDelimiters" -> None option. – Gustavo Delfino Oct 18 '21 at 20:37
  • This code will not run on a million files in a folder. Can it be wrapped in Do? – Nothingtoseehere Oct 19 '21 at 12:48