How do I convert a Word file's references to .bib format? Is any tool available to convert it to .bib format?
I think there isn't one. References are usually formatted in accordance with some standard: alphabetic, numeric, several styles... A fully automatic conversion would therefore be very complicated. Perhaps you'd be better off copying the contents to a spreadsheet and separating the fields yourself (or, if you are familiar with them, using something like regex or a programming language). – Guilherme Zanotelli Oct 20 '16 at 06:33
@GuilhermeZ.Santos - I fully agree with your comments. In fact, I've just written a long-ish answer to expand on the thoughts you've expressed. :-) – Mico Nov 27 '16 at 08:18
4 Answers
I think that there are two fundamental information-related issues which must be dealt with when converting a formatted bibliography to a bib file:
1. Information that needs to be deleted. This includes almost all formatting-related information, such as the italicizing of journal names and book titles; placing single or double quotes (smart or dumb) around title fields; commas, colons, and periods used as separators between fields; etc. If you use biblatex and biber, you can leave "accented" characters such as ö, é, and ß as is. If you use BibTeX, though, you should replace all accented characters with their LaTeX representations. For the examples mentioned earlier, write \"o, \'e, and \ss, respectively. For more on this subject, see the posting How to write “ä” and other umlauts and accented letters in bibliography. This is the easier of the two issues!
2. Meta information that needs to be created. This is the much harder issue. Among the crucial pieces of meta-information that must be supplied are
   - the entry type: @article, @book, @misc, etc. I've seen "auto-converted" bib files in which the only entry type is @article, even though most of the entries should have been of type @book. Trying to clean up the resulting mess can be enormously frustrating.
   - the "key", i.e., the item used in the argument of \cite instructions. It's helpful in the long run to employ a key system that's at least somewhat mnemonic. Don't just use "A", "B", "C", etc.
   - the association of the various pieces of information within a (formerly formatted) entry with fields. Suppose, for instance, that one has correctly identified a given entry as being of type @article and has settled on a key for this entry. One still has to decide which pieces of information copied from the formatted MS Word bibliography belong to the author, title, year, journal, volume, number, pages, url and (quite possibly!) further fields.
   Within the author and editor fields, one has to make sure that (a) the keyword "and" is used correctly and consistently to separate individual authors and (b) the first, von, surname, and junior components of every author have been identified correctly. For instance, Spanish names often contain a two-part surname that's separated by a space rather than a hyphen. It's important to notice that the surname of "Antonio Garcia Pascual" is "Garcia Pascual", not just "Pascual". (Movie actors frequently have two-part last names as well; cf. Kristin Scott Thomas and Helena Bonham Carter.) Do also be on the lookout for "corporate" authors. An author field of "National Aeronautics and Space Administration" may be correct in the narrow sense of not containing typos, but it'll confuse BibTeX into thinking that it's dealing with two separate authors separated by the keyword "and": the first is named "National Aeronautics" and the second is named "Space Administration". (Ouch!) Be prepared to insert an extra pair of curly braces around such corporate authors.
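Two of the chores described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names LATEX_ACCENTS, to_bibtex_ascii, and split_authors are hypothetical, the accent table covers just the three characters mentioned, the extra braces around each replacement are an added convention, and the splitter only approximates how BibTeX reads the keyword "and".

```python
# Illustrative mapping only; a real bibliography will need many more entries.
LATEX_ACCENTS = {
    "ö": r'\"o',
    "é": r"\'e",
    "ß": r"\ss",
}


def to_bibtex_ascii(text):
    """Replace each mapped accented character with its LaTeX form.

    Each replacement is wrapped in braces (an assumption of this sketch)
    so that it is kept together as one unit.
    """
    return "".join(
        "{" + LATEX_ACCENTS[ch] + "}" if ch in LATEX_ACCENTS else ch
        for ch in text
    )


def split_authors(field):
    """Split an author field on the word 'and' outside of curly braces.

    This is a simplified approximation of BibTeX's behaviour: it works on
    whitespace-separated tokens and tracks brace depth token by token.
    """
    names, current, depth = [], [], 0
    for token in field.split():
        if depth == 0 and token == "and":
            names.append(" ".join(current))
            current = []
        else:
            current.append(token)
        depth += token.count("{") - token.count("}")
    names.append(" ".join(current))
    return names
```

With this sketch, split_authors("National Aeronautics and Space Administration") yields two "authors", whereas the braced form "{National Aeronautics and Space Administration}" stays in one piece. Name parts (first, von, surname, junior) still have to be checked by hand.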
For a bib file to be really useful in the long run, one needs to take care of (at least) three further attributes, which I call the "three c's": completeness, correctness, and consistency.
- Is the stuff that was obtained from a formatted external source (say, an MS Word file) complete? For instance, if only the initials of the authors' given names are provided in the source file, it's probably a very good idea to take some time to find out what the full first names are and to insert these full names in the bib file. That way, if at some future point you need to use a bibliographic style that prints out full first names rather than just initials, you needn't go back and find out what those first names may be.
- Is the information obtained from the formatted external source correct? Are there misspellings of names and of words in the title, and is there missing information (e.g., volume numbers and page ranges) of journal articles? Are you taking care to use curly braces to encase words in title fields that shouldn't be converted to lowercase?
- Once you've assembled all the bib entries and are reasonably confident that the information is complete and correct, you should also check if the information is consistent across entries. Do you have "The Review of Economics and Statistics" as the journal name in one entry and just "Review of Economics and Statistics" in another entry? Both versions are correct and complete, but they're not consistent with each other. Another consistency issue to look out for is the spelling of author names if their original spelling doesn't use the Latin alphabet. E.g., is it Chebyshef, Tschebysheff, Tschebishev, etc.? Is it Ito, Itoh, Itou, etc.? Is it Goto or Gotoh? Your readers will thank you if you provide consistency in these areas.
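The journal-name consistency check lends itself to a small helper. The following Python sketch is one possible approach, not an established tool: the function names are hypothetical, and the normalization rule (ignore case, extra whitespace, and a leading "The") is an assumption that will not catch every kind of variant, such as transliterated author names.

```python
from collections import defaultdict


def normalize_journal(name):
    """Lowercase the name, collapse whitespace, and drop a leading 'the'."""
    words = name.lower().split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def find_inconsistent_journals(journals):
    """Return groups of distinct spellings that normalize to the same name."""
    groups = defaultdict(set)
    for name in journals:
        groups[normalize_journal(name)].add(name)
    # Keep only the names that appear in more than one spelling.
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

Feeding it the journal fields of all entries flags "The Review of Economics and Statistics" and "Review of Economics and Statistics" as two spellings of one journal; which spelling to keep remains a manual decision.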
Unless you have a tool that (a) does a very good job handling the chores listed under "2" above and (b) provides some assistance with the final three bullet points, you may be better off performing the conversion entirely by hand. Just set up a few templates for entries of type @article, @book, etc. that contain the required and optional fields and take it from there.
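For the by-hand route, the "templates" idea could look roughly like the following Python sketch. It assumes the standard BibTeX required fields for @article and @book (simplified: @book also accepts editor in place of author); REQUIRED_FIELDS and render_entry are hypothetical names, and any key and field values passed in are up to you.

```python
# Required fields per entry type, following standard BibTeX conventions.
# Simplification: @book may use "editor" instead of "author".
REQUIRED_FIELDS = {
    "article": ["author", "title", "journal", "year"],
    "book": ["author", "title", "publisher", "year"],
}


def render_entry(entry_type, key, fields):
    """Render one bib entry as text, refusing entries with missing required fields."""
    missing = [f for f in REQUIRED_FIELDS.get(entry_type, []) if f not in fields]
    if missing:
        raise ValueError(
            "entry %r is missing required fields: %s" % (key, ", ".join(missing))
        )
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Each value is wrapped in one pair of braces; add further braces
        # yourself where needed (corporate authors, words that must keep case).
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```

Calling render_entry with an entry type, a mnemonic key, and a dictionary of fields returns text that can be pasted into a .bib file, while an incomplete entry raises an error instead of silently producing a gap.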
Great answer, there are some things you said here that I hadn't thought of, like the three c's. In my comment I was referring to issue 2; that's definitely the big problem for such a tool: how does the computer know the entry type it's reading? Also, which field (author and the like) receives which value? Furthermore, there are so many other details requiring special treatment when creating the bib file that making a tool to do so is doomed to fail (IMO). – Guilherme Zanotelli Nov 27 '16 at 11:15
An online tool is available to convert the text to .bib format; see http://text2bib.economics.utoronto.ca/
Full disclosure: I developed the tool referred to below and am the founder of Scholarcy.
You could try this converter:
https://ref.scholarcy.com/api/
It parses .docx and PDF files and extracts any references it finds into a BibTeX or RIS file.
To all the downvoters: How is this link-only answer worse than the other one? – Henri Menke Jul 04 '18 at 00:32
When you are promoting your services, you have to disclose your affiliation. Otherwise this is considered spam. – Henri Menke Jul 04 '18 at 00:38
For what it is worth, this worked flawlessly for me and produced sensible output when provided with a colleague's .docx file. Even the cite keys were sane. – Landak Jan 07 '22 at 10:00
- Install Mendeley and its Word plugin
- Export the references from Word to Mendeley
- Highlight the relevant references in Mendeley, choose "Copy As" and then "BibTeX Entry", and paste the result into a ".bib" file