
Problem statement

I am looking for a way to use biber's "tool mode" to run a (CLI) command on a large .bib file in order to

  1. extract from it only those entries whose keywords = {} field includes a user-definable value (such as mykeyword in the example below), and
  2. write the resulting entries to a new .bib file.

Example

Imagine a .bib file with the following entries:

@article{First,
    [...]
    keywords = {foo, bla},
}

@article{Second,
    [...]
    keywords = {test, foo},
}

@article{Third,
    [...]
    keywords = {bla},
}

Running the "magic command" I am looking for on this .bib file would match the following entries:

  • for keyword foo: First and Second
  • for keyword bla: First and Third
  • for keyword test: Second

What I am not looking for

  • For a variety of reasons beyond the scope of this question, I specifically want to solve this problem with biber's "tool mode" (rather than with other tools such as bibexport, bib2bib, or bibtool).
  • I am looking for a solution that uses a .conf file for biber's "tool mode" and does not rely on any .tex file in the process.

Minimal (non-)working example

The solution to a related problem uses biber's "tool mode" to extract .bib entries by document type (rather than by matched keywords). Specifically, it seems to use a customized biber .conf file to identify entries that are not books via <per_nottype> and then exclude those from the bibliography via <map_step map_entry_null="1" />.

However, no equivalent <per_notkeyword> tag seems to exist, which is why I am guessing my solution would have to rely on some combination of map_field_set and map_field_value. The following (very ad-hoc!) example does not work as expected, though (not even for the somewhat simpler 'reverse' case it tries to implement for now, i.e., removing those entries that include the mykeyword keyword):

<?xml version="1.0" encoding="UTF-8"?>
<config>
  <output_align>true</output_align>
  <output_fieldcase>lower</output_fieldcase>
  <sourcemap>
    <maps datatype="bibtex" map_overwrite="1">
          <map>
                <map_step map_field_set="KEYWORDS" map_field_value="mykeyword"/>
                <map_step map_entry_null="1" />
          </map>
    </maps>
  </sourcemap>
</config>

Is there a way to amend the above .conf file for biber so as to get the desired result?

Bonus points (not really)

An added bonus would be a generic solution, which allows me to specify the target keyword(s?) as a command-line argument, while relying on only a single .conf file.

Florian H.

1 Answer


The following shell script solves all aspects of the problem, including the ability to pass the desired keyword as a command-line argument. Note that it only accepts a single keyword, which can, however, contain spaces (as long as the keyword is enclosed by double quotes in the script call).

Shell script bibextract.sh

#!/bin/bash

# Write temporary config file
# NOTE: Escapes " as \" and replaces ${2} by the second command-line argument
echo "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<config>
  <output_align>true</output_align>
  <output_fieldcase>lower</output_fieldcase>
  <sourcemap>

    <!-- Main source: https://tex.stackexchange.com/a/616241/38212 -->
    <maps datatype=\"bibtex\">

        <!-- Step 1: Remove all entries that do not contain any keywords whatsoever (necessary because later 'map_notmatch' is FALSE in that case) -->
        <map>
            <map_step map_notfield=\"keywords\" map_final=\"1\"/>   <!-- map_final=\"1\" means that next step is only executed if this condition is met -->
            <map_step map_entry_null=\"1\"/>                        <!-- remove entry from output -->
        </map>

        <!-- Step 2: From remaining entries, remove any where keyword does /not/ match regular expression -->
        <map>
            <map_step map_field_source=\"keywords\" map_notmatch=\".*${2}.*\" map_final=\"1\"/>
            <map_step map_entry_null=\"1\"/>
        </map>

    </maps>

  </sourcemap>
</config>" > /tmp/by_keyword.conf

# Run biber in "tool mode" using that temporary config file
# (arguments are quoted so that file names containing spaces survive)
biber --tool --configfile=/tmp/by_keyword.conf --output-file "$3" "$1"

# Clean up
rm "$1.blg"
rm /tmp/by_keyword.conf

exit 0

  1. Save to ./bibextract.sh
  2. Make executable: chmod +x ./bibextract.sh

Usage

$ ./bibextract.sh input.bib  mykeyword output.bib
$ ./bibextract.sh input.bib  "my keyword with spaces" output.bib

Explanation

The script first writes a temporary configuration file for biber's "tool mode" to /tmp/by_keyword.conf, which implements a two-step mapping process inspired by this post:

  1. Remove all entries that have no keywords field at all (necessary because the later map_notmatch is FALSE in that case).
  2. From the remaining entries, remove any whose keywords field does not match the regular expression (essentially .*my keyword with spaces.*).
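
Because the pattern in step 2 is unanchored, it acts as a substring test: the keyword foo would also select an entry whose only keyword is foobar. The sketch below illustrates the difference with grep -E rather than with biber itself. The stricter pattern and both helper names are illustrative assumptions; biber uses Perl regular expressions, so the pattern would need adapting and testing before being placed in map_notmatch.

```shell
#!/bin/bash
# Illustration only: uses grep -E, not biber. Helper names are hypothetical.

# Substring test, equivalent in spirit to the unanchored .*KEYWORD.* pattern.
matches_substring() {
    printf '%s\n' "$1" | grep -Eq -- ".*$2.*"
}

# Whole-keyword test: the keyword must be delimited by commas
# or by the start/end of the field value.
matches_whole_keyword() {
    printf '%s\n' "$1" | grep -Eq -- "(^|,)[[:space:]]*$2[[:space:]]*(,|\$)"
}

matches_substring     'foobar, bla' 'foo'             && echo 'substring: match'
matches_whole_keyword 'foobar, bla' 'foo'             || echo 'whole keyword: no match'
matches_whole_keyword 'bla, the word, foo' 'the word' && echo 'whole keyword: match'
```

In both helpers the keyword is still interpreted as a regular expression, so characters such as . or + in a keyword keep their special meaning.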

Biber is then run in "tool mode" using that config, after which the temporary config file (and a .blg file created in the process) are deleted again.
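
One possible variation, sketched here under assumptions, writes the config with a here-document instead of an escaped echo string and lets mktemp pick the temporary file name. The function name write_conf is hypothetical, mktemp is assumed to be available (it is common on Linux and macOS but not required by POSIX), and the biber call is left commented out so that the snippet runs without biber installed; its options are the ones used in the script above.

```shell
#!/bin/bash
# Sketch only: write_conf is a hypothetical helper, not part of biber.

# Print a tool-mode config that drops every entry whose keywords field
# does not match the regular expression given as $1.
write_conf() {
    cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <output_align>true</output_align>
  <output_fieldcase>lower</output_fieldcase>
  <sourcemap>
    <maps datatype="bibtex">
      <map>
        <map_step map_notfield="keywords" map_final="1"/>
        <map_step map_entry_null="1"/>
      </map>
      <map>
        <map_step map_field_source="keywords" map_notmatch=".*$1.*" map_final="1"/>
        <map_step map_entry_null="1"/>
      </map>
    </maps>
  </sourcemap>
</config>
EOF
}

conf=$(mktemp)                  # unique temporary file instead of a fixed /tmp name
write_conf "foo" > "$conf"
grep -c 'map_notmatch' "$conf"  # sanity check: counts the lines holding the pattern

# Assumed invocation, mirroring the script above (requires biber):
# biber --tool --configfile="$conf" --output-file out.bib in.bib

rm -f "$conf"                   # clean up the temporary config
```

Inside an unquoted here-document the double quotes need no escaping, while $1 is still expanded, which is what removes the need for the \" sequences.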

Example test.bib to test script with

@book{FooBla,
    title = {FooBla},
    author = {FooBla},
    keywords = {foo, bla},
}

@article{MooFoo,
    title = {MooFoo},
    author = {MooFoo},
    keywords = {moo, foo},
}

@misc{Bla,
    title = {Bla},
    author = {Bla},
    keywords = {bla},
}

@book{BlaThe-WordFoo,
    title = {BlaThe-WordFoo},
    author = {BlaThe-WordFoo},
    keywords = {bla, the word, foo},
}

@collection{NoKeywords,
    title = {NoKeywords},
    author = {NoKeywords},
}

Testing routine

$ ./bibextract.sh test.bib foo foo.bib
$ ./bibextract.sh test.bib bla bla.bib
$ ./bibextract.sh test.bib moo moo.bib
$ ./bibextract.sh test.bib "the word" the-word.bib
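
Since the script accepts one keyword per call, the same test calls could also be driven by a loop. The sketch below assumes the naming scheme used here (spaces in the keyword become hyphens in the file name); out_name is a hypothetical helper, and the loop is a dry run that only prints each keyword with its output file.

```shell
#!/bin/bash
# Sketch only: out_name is a hypothetical helper.

# Derive an output file name from a keyword: "the word" -> "the-word.bib".
out_name() {
    printf '%s.bib\n' "$(printf '%s' "$1" | tr ' ' '-')"
}

for kw in foo bla moo "the word"; do
    out=$(out_name "$kw")
    # Dry run: print what would be processed.
    # To run for real: ./bibextract.sh test.bib "$kw" "$out"
    printf '%s -> %s\n' "$kw" "$out"
done
```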

Resulting .bib files

foo.bib

@book{FooBla,
  author   = {FooBla},
  keywords = {foo,bla},
  title    = {FooBla},
}

@article{MooFoo,
  author   = {MooFoo},
  keywords = {moo,foo},
  title    = {MooFoo},
}

@book{BlaThe-WordFoo,
  author   = {BlaThe-WordFoo},
  keywords = {bla,the word,foo},
  title    = {BlaThe-WordFoo},
}

bla.bib

@book{FooBla,
  author   = {FooBla},
  keywords = {foo,bla},
  title    = {FooBla},
}

@misc{Bla,
  author   = {Bla},
  keywords = {bla},
  title    = {Bla},
}

@book{BlaThe-WordFoo,
  author   = {BlaThe-WordFoo},
  keywords = {bla,the word,foo},
  title    = {BlaThe-WordFoo},
}

moo.bib

@article{MooFoo,
  author   = {MooFoo},
  keywords = {moo,foo},
  title    = {MooFoo},
}

the-word.bib

@book{BlaThe-WordFoo,
  author   = {BlaThe-WordFoo},
  keywords = {bla,the word,foo},
  title    = {BlaThe-WordFoo},
}
cfr
Florian H.
  • This is nice but note that it is unlikely to be secure or portable. (These things may not matter at all in your case, but they might if somebody else wants to use your script.) – cfr Mar 20 '24 at 19:15
  • I added syntax highlighting to the bash stuff. Please revert if you prefer. – cfr Mar 20 '24 at 19:25
  • Thanks for the syntax coloring, that was indeed the idea. Regarding portability & security: yes, I first had planned to only provide .conf file and biber command, without a shell script, but the latter was the only method I could think of for getting an argument ($2, that is, which represents the keyword) into that .conf file. I'd be happy to consider other ways to achieve that goal, though. – Florian H. Mar 21 '24 at 21:35
  • It wasn't really a criticism. Mostly in case somebody else wanted to use it. I believe that echo, for example, isn't very portable and a secure script would do more checking on arguments etc. But I don't really see why you should worry about those things so long as it works for you. – cfr Mar 22 '24 at 01:29
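
To make the concerns raised in these comments concrete: a more defensive variant might check the argument count and escape characters that are special in XML before the keyword is written into the config. A minimal sketch follows. The helper names are hypothetical, and it does not address regular-expression metacharacters in the keyword, which biber would still interpret as part of the pattern.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Escape the characters that are special inside an XML attribute value.
# The ampersand must be handled first so that later entities are not re-escaped.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                           -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# Succeed only if exactly three non-empty arguments were given
# (input file, keyword, output file).
check_args() {
    [ "$#" -eq 3 ] && [ -n "$1" ] && [ -n "$2" ] && [ -n "$3" ]
}

if check_args input.bib 'R&D "core"' output.bib; then
    xml_escape 'R&D "core"'; echo
else
    echo 'usage: bibextract.sh INPUT.bib KEYWORD OUTPUT.bib' >&2
fi
```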