Currently, I am working on a large time series dataset, which contains 928,586 rows and 35 time intervals in columns, in excel format (.xlsx) and using Import to read it in. Actually, I cannot use Import to handle this file since it gives me the Java heap error message: Import::nojmem: There was insufficient Java heap space for the operation. Try increasing the Java Virtual Machine heap size.
Here is only a part of my data after using Import.
data = {{"BRANCHNO", "Date", "JustDay", "JustDate", "JustMonth", "JustYear",
"DoM.Train", "DoM.Test", "Total TXN", "08:00:00", "08:30:00",
"09:00:00", "09:30:00", "10:00:00", "10:30:00", "11:00:00",
"11:30:00", "12:00:00", "12:30:00", "13:00:00", "13:30:00",
"14:00:00", "14:30:00", "15:00:00", "15:30:00", "16:00:00",
"16:30:00", "17:00:00", "17:30:00", "18:00:00", "18:30:00",
"19:00:00", "19:30:00",
"20:00:00"}, {"B0001", {2011, 1, 4, 0, 0, 0.}, "Tuesday", 4.,
"January", 2011., 4., 4., 1538., 47., 59., 71., 77., 115., 111.,
103., 75., 101., 121., 135., 113., 82., 88., 79., 72., 54., 13.,
21., 1., "", "", "", "", ""}, {"B0001", {2011, 1, 5, 0, 0, 0.},
"Wednesday", 5., "January", 2011., 5., 5., 1589., 20., 51., 33.,
68., 97., 82., 101., 78., 104., 119., 137., 109., 114., 142., 85.,
77., 62., 40., 51., 19., "", "", "", "",
""}, {"B0001", {2011, 1, 6, 0, 0, 0.}, "Thursday", 6., "January",
2011., 6., 6., 1399., 3., 39., 61., 59., 63., 108., 66., 85., 108.,
119., 114., 121., 94., 80., 62., 88., 56., 46., 16., 9., "", "", 2.,
"", ""}, {"B0001", {2011, 1, 7, 0, 0, 0.}, "Friday", 7., "January",
2011., 7., 7., 1510., 13., 55., 74., 70., 105., 95., 71., 82., 86.,
113., 124., 91., 98., 76., 112., 84., 78., 64., 18., 1., "", "",
"", "", ""}}
I am wondering that:
- How can I import this file ?
- What is the best way to handle excel file ?
- After importing the excel file, can we save the data in other format (like MATLAB can save in .mat file) so that next time we can import it faster ?
Thanks for help