The framework I developed here should be well suited for this task. All you have to do is write a converter from your papers into the list structure, and that is very easy to do.
To load the framework, you can either collect all the code pieces from that post and evaluate them or, much more simply, call:
Import["https://gist.githubusercontent.com/lshifr/2696189/raw/largeData.m"]
Next, note that by default the list chunks are stored using a cross-platform but much slower option based on Compress. You may want to execute the following after you load the main code but before you start your work:
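(* switch chunk storage from the default Compress-based format to .mx files *)
$fileNameFunction = mxFileName;
$importFunction = mxImport;
$exportFunction = mxExport;
$compressFunction = Identity;    (* no Compress step when storing .mx chunks *)
$uncompressFunction = Identity;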
This switches to using .mx files, which are much faster (assuming you won't frequently change machines or platforms). See the examples in the linked post for more details.
Assuming you have that code loaded, you basically need something like the following (papers is the same variable you defined in the question):
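initList[allpapers];  (* initialize allpapers as a new framework list *)
Do[appendTo[allpapers, Import[paper]], {paper, papers}];  (* import each paper and append it *)
storeMainList[allpapers, DestinationDirectory :> "your-directory-to-store-chunks"]  (* write the chunks to disk *)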
After that, you can work with the allpapers variable much as with an ordinary list, except that it won't all be in memory at once. See the link mentioned above, and the questions linked from it, for usage examples.
To load the list in a different session, you have to load the framework's code and use
retrieveMainList[allpapers, DestinationDirectory :> "your-directory-to-store-chunks"]
If you later change the list (add more elements or delete some), you will have to call storeMainList again for these changes to persist on disk.
The allpapers variable is not a usual list; however, a number of usual list operations work on it. You will have to use something like FindList[allpapers[[12]], "Vol.", 1] if, for example, you are interested in paper #12. Note also that once you take a part, it resides in memory, so if you no longer need it at the moment, it is a good idea to call releasePart[allpapers, 12]. It will be loaded again the next time you call allpapers[[12]]. – Leonid Shifrin Apr 27 '14 at 21:58

You can define, for example,
map[f_, lst_] := Table[With[{result = f[lst[[i]]]}, releasePart[lst, i]; result], {i, Length[lst]}]
and then use map[FindList[#, "Vol.", 1] &, allpapers]. And so on: it is quite easy to build your own iteration and other functions. You just have to understand how these structures work and decide when you want to release the loaded parts from memory. – Leonid Shifrin Apr 27 '14 at 22:05

1. When you call allpapers[[i]], the i-th part of the "list" is loaded from disk into memory; after that first call it resides in memory and is returned every time you call allpapers[[i]].
2. When you call releasePart[allpapers, i], it is unloaded from memory, so the next time you call allpapers[[i]] it will be loaded into memory again.
And that's it: everything else you do with these parts, including iterating through them or deciding how many to keep in memory at once, is completely up to you. – Leonid Shifrin Apr 27 '14 at 22:10

There are some functions, such as Take, Drop, First, Last, Rest and Most, which I included and which work "out of the box", but generally you can always add your own. – Leonid Shifrin Apr 27 '14 at 22:12
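Since how many parts to keep in memory at once is left up to you, here is one possible sketch of a variant of the map function above that processes the list in batches of k parts and releases each batch before loading the next. The name mapInBatches and its parameter k are hypothetical, not part of the framework; the sketch only assumes that the list supports Length, Part and releasePart as described above. Without the framework loaded, it can also be tried on an ordinary list, where releasePart simply stays unevaluated.

```mathematica
(* Hypothetical helper, not part of the framework: apply f to every part of a
   disk-backed list, keeping at most k parts loaded in memory at any time.
   Assumes lst supports Length, Part and releasePart, as described above. *)
ClearAll[mapInBatches];
mapInBatches[f_, lst_, k_Integer?Positive] :=
  Module[{n = Length[lst]},
    Join @@ Table[
      With[{idx = Range[start, Min[start + k - 1, n]]},
        Module[{results},
          (* each lst[[i]] loads part i into memory *)
          results = Table[f[lst[[i]]], {i, idx}];
          (* release this batch before the next one is loaded *)
          Scan[releasePart[lst, #] &, idx];
          results]],
      {start, 1, n, k}]]
```

With the framework loaded, something like mapInBatches[FindList[#, "Vol.", 1] &, allpapers, 10] would mirror the map example while keeping up to ten papers in memory at once. With k = 1 it behaves like the map function above, releasing each part right after f is applied.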