Problem statement

I have a large .bib file that I would like to split into multiple smaller .bib files.

For this I am looking for a parser that allows me to extract, say, all the @book entries from the original .bib file, so I can save those to a new file. The entries themselves should basically remain untouched.

Just in case it helps: I am using biblatex & biber (under Debian GNU/Linux), which IIRC offer some parsing capabilities that I'd be happy to employ here.

Florian H.
  • I would say you should not use LaTeX for that, but some external processing. Which platform (OS) are you on? – TeXnician Feb 16 '18 at 06:45
  • Debian (added this to the original question). I'd be happy to use Perl, sed, or whatnot. However, I remember using biblatex/biber some time ago to extract only the actually cited references from a paper into a new .bib file. I forgot how I did this, but it seemed to work well, so I was wondering whether it might be possible to do this without external tools (not a strict requirement, though). – Florian H. Feb 16 '18 at 06:53

1 Answer

This is possible with Biber's tool mode. Unfortunately, versions before 2.11 had a small bug in exactly the functionality you need. The issue has been reported (https://github.com/plk/biber/issues/212) and is resolved in Biber 2.11.

Create a file onlybooks.conf with the contents

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <output_align>true</output_align>
  <output_fieldcase>lower</output_fieldcase>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
      <map>
        <per_nottype>book</per_nottype>
        <map_step map_entry_null="1" />
      </map>
    </maps>
  </sourcemap>
</config>

Then call Biber with

biber --tool --configfile=onlybooks.conf <yourfile>.bib

and you should be presented with a file called <yourfile>_bibertool.bib that only contains the @book entries of <yourfile>.bib.
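Since the goal is to split the file into several smaller ones, you would need one such config per entry type. A minimal sketch of how these could be generated, assuming the XML above works unchanged for other types once `book` is swapped out; the names `make_config` and `write_configs` and the `only<type>.conf` naming scheme are made up for this illustration:

```python
from pathlib import Path

# Same config as above, with the entry type left as a placeholder.
CONFIG_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<config>
  <output_align>true</output_align>
  <output_fieldcase>lower</output_fieldcase>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
      <map>
        <per_nottype>{entry_type}</per_nottype>
        <map_step map_entry_null="1" />
      </map>
    </maps>
  </sourcemap>
</config>
"""


def make_config(entry_type):
    """Return a tool-mode config that keeps only entries of entry_type."""
    return CONFIG_TEMPLATE.format(entry_type=entry_type)


def write_configs(entry_types, directory="."):
    """Write one only<type>.conf per entry type and return the paths."""
    paths = []
    for entry_type in entry_types:
        path = Path(directory) / f"only{entry_type}.conf"
        path.write_text(make_config(entry_type), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_configs(["book", "article", "online"])` writes three config files, and you would then run the Biber command above once per file. Each run writes to the same `<yourfile>_bibertool.bib` by default, so rename the output before the next run.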

The caveat that Biber deletes fields not known in its data model of course also applies here; see the question "Prevent `biber --tool` from removing non-standard fields in .bib files".


Alternatively, you can use bib2bib, which is part of bibtex2html. The PDF documentation explains it in much more detail.

Use

bib2bib -c '$type = "BOOK"' -ob onlybooks.bib <yourfile>.bib

to obtain only the @book entries of <yourfile>.bib in onlybooks.bib.

The type must always be in all caps and must be enclosed in quotation marks. On Windows the outer quotation marks should be double and the inner ones single, as in -c "$type = 'BOOK'", while on Unix it should be the other way round, as in -c '$type = "BOOK"'.


You can also use bibtool:

bibtool --select{@book} all.bib -o some.bib

This writes only the @book entries of all.bib to some.bib.

Some bibliography managers, such as JabRef, can also filter .bib files, so that may be an option here as well.
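If none of these tools fits, for instance because custom fields must survive verbatim, a short script can do the split as well. The following is only a rough sketch and not a full BibTeX parser. It assumes every entry is delimited by braces (not parentheses), that braces inside an entry are balanced, and that no `@type{` pattern appears in free text between entries. `@string`, `@preamble` and `@comment` blocks are simply grouped under their own names, so entries that rely on `@string` abbreviations need extra care. All function names are made up for this example.

```python
import re
from collections import defaultdict
from pathlib import Path

# Start of an entry: "@", the type name, then the opening brace.
ENTRY_START = re.compile(r"@\s*([A-Za-z]+)\s*\{")


def split_entries(text):
    """Yield (lowercased type, raw entry text) for each brace-delimited entry."""
    pos = 0
    while True:
        match = ENTRY_START.search(text, pos)
        if match is None:
            return
        depth = 0
        end = None
        # Start at the opening brace and count until it is closed again.
        for i in range(match.end() - 1, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            return  # unbalanced braces: stop instead of guessing
        yield match.group(1).lower(), text[match.start():end]
        pos = end


def group_by_type(text):
    """Map each entry type to the list of its entries, copied verbatim."""
    groups = defaultdict(list)
    for entry_type, raw in split_entries(text):
        groups[entry_type].append(raw)
    return dict(groups)


def write_split_files(bib_path, out_dir="."):
    """Write one <type>.bib per entry type found in bib_path."""
    text = Path(bib_path).read_text(encoding="utf-8")
    for entry_type, entries in group_by_type(text).items():
        target = Path(out_dir) / f"{entry_type}.bib"
        target.write_text("\n\n".join(entries) + "\n", encoding="utf-8")
```

Because the entries are copied character for character, non-standard fields such as `mynote` are kept, but nothing is validated or normalized either.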

moewe
  • moewe, I think you didn't mean onlybook.bcf in the second paragraph. – gusbrs Feb 16 '18 at 12:49
  • @gusbrs Ah yes, indeed. Thank you for spotting that. Edited. – moewe Feb 16 '18 at 13:21
  • I'd love to accept this answer, but might have to wait for the biber fix to try the first proposed solution. I tried the bib2bib and bibtool suggestions, but run into parsing errors with both: bibtool does not seem to be flexible enough to handle "unusual" types such as @online or @report. bib2bib also runs into parse errors, without providing useful information as to what exactly caused it. I'll try to remember to check in on this question after biber updates to v2.11 on my machine. – Florian H. Feb 24 '18 at 21:06
  • @FlorianH. At least bibtool can be made to accept all common biblatex types: https://tex.stackexchange.com/a/415044/35864. – moewe Feb 25 '18 at 06:39
  • I checked in on the first biber --tool solution (now that biber v2.11 is installed on my machine), and it works so beautifully I want to cry! The only flaw is that the output file entirely omits any custom fields in any of the entries. For example, I sometimes add a custom field mynote = {This is a note}, to a bibliography entry, and biber, not recognizing mynote as a valid field, will ignore it in the output. Would there be a way to formally define the field mynote somewhere, such that biber does not ignore it? – Florian H. Jun 06 '18 at 23:36
  • @FlorianH. https://tex.stackexchange.com/q/415028/35864 – moewe Jun 07 '18 at 04:55
  • Very nice, I'll accept the answer. – Florian H. Jun 07 '18 at 13:00