3

How can I generate a plain-text file (without any formatting) of the references used in a .tex file? Journals often ask for the bibliography as a separate list when a paper is submitted. If I convert the PDF to a .docx document, the "ff" and "fi" letter combinations are not copied properly, so a lot of manual correction would be needed.

Does anyone know of a way to get the list of references from either the LaTeX file or the PDF document?

Bernard
Hendrik
  • Welcome to TeX.SX! You tagged this biblatex, so you are using a .bib file, aren't you? It is a text format without real formatting. But what should the references look like for submission? – TeXnician Jun 09 '18 at 08:22
  • Yes, I use a .bib file. However, it includes all references I have, not just the ones I use in the respective .tex file. Also, it contains all information, like publisher = {{XXX}}, etc. in a different format. I would like the text file to look like a reference list, in order, etc., but it should be in plain text without formatting. – Hendrik Jun 09 '18 at 08:25
  • Well, but what should the plain text contain? Everything as output in the PDF, less, more? – TeXnician Jun 09 '18 at 08:28
  • Everything that is in the PDF file (that's why, theoretically, a tool for PDF --> docx would work; however, it does not properly translate "ff" and "fi" text to docx. For some reason these letter combinations get deleted in the docx file). – Hendrik Jun 09 '18 at 08:28
  • About the ff and fi --> https://tex.stackexchange.com/questions/86614 – Dr. Manuel Kuehner Jun 09 '18 at 08:39
  • Thanks Manuel. Solving that allowed me to create the wanted list manually without too much hassle. Still, if someone knows how to create the bibliography as a .doc or .txt file without any hassle, I would greatly appreciate that. – Hendrik Jun 09 '18 at 12:26
  • I doubt there are converters out there that take the .bib file and output a .docx or .tex representation of what your biblatex bibliography would look like in a LaTeX document without the intermediate step of converting a TeX document. Of course there is htlatex/tex4ht... – moewe Jun 09 '18 at 12:32
  • What about https://tex.stackexchange.com/q/23878/35864? – moewe Jun 09 '18 at 12:59

3 Answers

3

There are several possibilities, but what works for you depends on the desired format of the text file. You say you want no formatting, but I doubt you would be happy with just the format you get in the .bib file.

From biblatex/BibTeX directly. If you use biblatex there is no easy way to obtain the text of the bibliography as it is printed by \printbibliography. (See also "Is it possible to mimic the compiled bibliography automatically?") But if you use BibTeX the .bbl file contains the bibliography with a bit of formatting and LaTeX commands. It is not impossible to change the .bst file that produced this output to only give plain text, but it does take some work. An example can be found in https://gist.github.com/moewew/50795d6f171269e949d71d8c4149468e. You would change your document to use the new bibliography style plain-plain and a compile run would give you a plain-text output in the .bbl file.

biblatex users can try biblatex2bibitem, which combines the copy-and-paste-from-PDF solution with the approach suggested in "Output bibliography as a standard itemized list in bib latex".

Copy and paste from the PDF. A simple copy-and-paste job from the PDF (I take it that is what you tried) can be a good one-off solution, but it may require manual intervention to remove unwanted line breaks, page numbers and other oddities. You may also run into problems with copied characters not being recognised as intended (you mention the "fi" and "ff" ligatures; I have never had a problem with these, but that may well be font- and even viewer-dependent).
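One way to clean up ligatures after copying is sketched below. It assumes the pasted or extracted text contains the Unicode ligature characters (U+FB00 to U+FB04) rather than dropping the letters altogether; if the letters are missing entirely, as in the .docx case from the question, there is nothing left to replace and the PDF itself has to be fixed first. The function name fix_ligatures is made up for this illustration.

```shell
# Replace the Unicode Latin ligature characters with their ASCII letters.
# Each ligature is its own code point, so the order of the expressions
# does not matter.
fix_ligatures() {
  sed -e 's/ﬃ/ffi/g' \
      -e 's/ﬄ/ffl/g' \
      -e 's/ﬀ/ff/g' \
      -e 's/ﬁ/fi/g' \
      -e 's/ﬂ/fl/g'
}

# Example: reads text on standard input, writes the cleaned text.
printf 'eﬃcient diﬀerent ﬁeld\n' | fix_ligatures
```

The function works as a filter, so it can sit at the end of any pipeline that produces the reference text.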

Remove LaTeX commands with OpenDetex. detex can remove (La)TeX commands and produce a plain-text version of your document. See https://github.com/pkubowicz/opendetex.

Convert the document to HTML et al. You could also convert your .tex file to HTML, .odt or another format to obtain an almost-plain-text version of your bibliography. Pandoc and htlatex spring to mind. There are also tools to extract plain text from .dvi or .pdf files, namely dvi2tty and pdftotext; see https://texfaq.org/FAQ-recovertex.
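As a rough sketch of the pdftotext route: once the PDF has been dumped to text, the bibliography can be cut out by printing everything from its heading to the end of the file. The helper name refs_from_text and the default heading "References" are assumptions for this example, and the file names are placeholders.

```shell
# Print everything from the bibliography heading to the end of the file.
# $1 is the extracted text file, $2 the heading (defaults to "References",
# an assumption; adjust it to whatever the document actually prints).
# Leading and trailing whitespace around the heading is tolerated, since
# layout-preserving extraction may indent it or prefix a form feed.
refs_from_text() {
  heading=${2:-References}
  sed -n "/^[[:space:]]*${heading}[[:space:]]*\$/,\$p" "$1"
}

# Typical use, assuming pdftotext (Poppler or Xpdf) is installed:
#   pdftotext -layout paper.pdf paper.txt
#   refs_from_text paper.txt > references.txt
```

Page numbers and running headers that fall inside the bibliography pages still have to be removed by hand.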

Convert the .bib to HTML. Finally, there are tools like bib2html to convert the entries in your .bib file to HTML directly; see also "How to quickly convert a single BibTeX reference into a formatted reference?". Some reference managers like JabRef and BibDesk also offer a preview of the .bib entries in certain styles that could be used to produce a plain-text bibliography.

moewe
  • You see that my idea of having biblatex somehow deliver results in other formats, though extravagant, is not that outlandish. Is this the third or fourth question in the last couple of weeks demanding something of the sort? – gusbrs Jun 17 '18 at 13:09
  • A note: pandoc does a reasonable job converting LaTeX in general (far from perfect though, in my view), but is not a faithful way to convert biblatex results in particular, for the simple reason that it does not use biblatex at all, but rather a CSL style file. Of course, this might well suffice for this particular OP. – gusbrs Jun 17 '18 at 13:13
  • @gusbrs Thanks for the comments. Yes, it is interesting to see that people seem to need this. It seems to be a symptom of publishers trying to accept LaTeX submission, but only half-heartedly. Of course biblatex is tricky for publishers if they want to convert the document into their internal format, but at least BibTeX's .bbl files seem doable if they can deal with LaTeX markup commands at all. – moewe Jun 17 '18 at 13:29
  • I think the problem is somewhat deeper. LaTeX was conceived and works beautifully to typeset text within a well defined "page". But, even when the final product is indeed a book which fits that bill, other outlets are frequently desired which do not have a well defined page (ebook, web page, you name it). Of course, publishers indeed have a number of hard to explain idiosyncrasies, but you don't have to be a publisher to feel that thrust. – gusbrs Jun 17 '18 at 13:38
  • JabRef has some inbuilt exporters for different formats (endnote, csv, custom html) and the closest one for plain-text seems to be ISO690 which produces a txt-file from the selected references. – Christoph S Jun 17 '18 at 14:33
1

Step 1: Get detex from https://github.com/pkubowicz/opendetex

detex paper.bbl > references.txt

Step 2:

Edit references.txt to remove header and footer so that only references remain. Remove any empty lines within references, so that each reference becomes a separate paragraph. There should be no empty lines before the first reference.

Step 3:

perl -00pe 's/^/\[$.\]/' references.txt | \
sed 's/^\(\[[0-9]\{1,\}\]\).*/\1/g' | \
perl -0777 -pe 's/\n(?=[^\n])//g' | \
sed 's/^\(\[[0-9]\{1,\}\]\)/\1 /g' > references_n.txt

The first command numbers the reference paragraphs, the second deletes the id string, the third removes line breaks within each reference, and the last one adds a space between the number in brackets and the reference text.

  • Welcome! Are you sure this is applicable to a Biblatex/Biber workflow as opposed to BibTeX? – cfr Oct 25 '19 at 23:04
  • I was looking for the functionality referenced in the original question at this page, found some clues here, and proposed a solution that worked for me. – user2585500 Oct 27 '19 at 06:15
  • But are you using Biblatex with Biber, as the OP is? – cfr Oct 27 '19 at 17:43
  • I thought this was TeX-LaTeX stack exchange. – user2585500 Oct 28 '19 at 20:11
  • Biblatex/Biber produces a rather different .bbl from BibTeX, which makes it more problematic when trying to do the kind of thing the OP wants. Your solution seems to be aimed at BibTeX, whereas the OP needs something for Biblatex/Biber. – cfr Oct 28 '19 at 21:28
  • Of course, the biblatex tag might be a mistake ... – cfr Oct 28 '19 at 21:29
-1

You may extract the references from the PDF using Nitro Pro. It is paid software, but a limited-time free trial is available.

In Nitro Pro, go to the "Convert" tab and click on "To Plain Text".

Amin