Unfortunately, for reasons beyond my control, I am stuck with Mendeley as my citation manager, which provides an option to export its library to a bib file.
My Mendeley library is organised into folders of relevant topics. I have a few identical entries in two folders because there is a large overlap in the technical areas/topics pertinent to those folders. For example, paper "x" may be in topic A (folder A) but also equally belong to topic B (folder B). This results in duplicate entries in the exported bib file.
There is a point-and-click deduplication facility within Mendeley, akin to similar functionality in JabRef, EndNote and other such tools. However, I do not wish to use such a procedure, for two reasons:
- My citation manager (in conjunction with its cloud interface) is the only way I organise my reference literature for later reading. It provides a portable way to manage a large library across multiple devices. I wish to retain paper "x" in both folder A and folder B, since at times I want to read on topic A and at other times on topic B. For example, I sometimes use the sort-by-year option within a certain topic, and I don't want to miss out on paper "x" just because it has been de-duplicated.
- Point-and-click interfaces are not conducive to a seamless workflow. I am currently writing a thesis. As I collect more references in my library on the fly (a few go into topic A and topic B simultaneously), Mendeley continuously exports them to a bib file, resulting in a mess of duplicates. This necessitates a manual point-and-click de-duplication, which defeats the first point above and is also tedious.
I understand that de-duplication of BibTeX entries is hard. I am only talking about de-duplication of absolutely identical entries. Given its utility, is there a script (shell, Perl, Python or other) that can handle this de-duplication gracefully? Again, invoking such a script from the command line every time is tedious, so an automated way to call it from the main.tex document (perhaps a line in the preamble resembling \bibtexdedup{}) would be advantageous. I am thinking of something like makeindex or makeglossaries, which use the shell-escape mechanism to do their job.
Is there any solution available that will achieve these goals?
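For the narrow case asked about here (absolutely identical entries), a minimal Python sketch might look as follows. It assumes that every entry in the exported file is written as @type{...} with balanced braces and that two entries count as duplicates only if their text is byte-for-byte identical. Entries delimited by parentheses, or a stray @ sign in free text between entries, are not handled. The script name bibdedup.py and the file names are hypothetical.

```python
import sys


def split_entries(text):
    """Split BibTeX source into top-level chunks.

    Each '@type{...}' entry (found by brace counting) becomes one chunk;
    text between entries (blank lines, comments) is kept as its own chunk.
    Assumes brace-delimited entries and no stray '@' between entries.
    """
    chunks = []
    i, n = 0, len(text)
    while i < n:
        at = text.find("@", i)
        if at == -1:
            chunks.append(text[i:])  # trailing text after the last entry
            break
        if at > i:
            chunks.append(text[i:at])  # text between two entries
        brace = text.find("{", at)
        if brace == -1:
            chunks.append(text[at:])  # malformed tail: keep it verbatim
            break
        depth, j = 0, brace
        while j < n:
            if text[j] == "{":
                depth += 1
            elif text[j] == "}":
                depth -= 1
                if depth == 0:
                    break
            j += 1
        chunks.append(text[at:j + 1])  # whole entry, closing brace included
        i = j + 1
    return chunks


def dedup_identical(text):
    """Drop every entry whose text exactly matches an earlier entry."""
    seen = set()
    out = []
    for chunk in split_entries(text):
        if chunk.startswith("@"):
            if chunk in seen:
                continue  # exact duplicate: skip it
            seen.add(chunk)
        out.append(chunk)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python bibdedup.py library.bib deduped.bib
    with open(sys.argv[1], encoding="utf-8") as f:
        source = f.read()
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        f.write(dedup_identical(source))
```

The bibliography command would then point at deduped.bib instead of the Mendeley export. To run it on every compilation, one untested option is a preamble line such as \immediate\write18{python bibdedup.py library.bib deduped.bib}, which only executes when the document is compiled with shell escape enabled (e.g. pdflatex --shell-escape). Because the LaTeX pass runs before BibTeX or Biber reads the file, the de-duplicated file would be ready in time. Matching on the whole entry text is deliberately conservative: two entries that differ in a single field are both kept.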
bibtool could help you if the keys of duplicate entries could be guaranteed to coincide (see also https://tex.stackexchange.com/q/76420/35864 and https://tex.stackexchange.com/q/20027/35864), but I assume this is not the case. Then it of course gets much harder to de-duplicate entries. I believe JabRef has a command-line interface, but I don't know if it can be used to de-duplicate. – moewe Jun 28 '18 at 20:54
Mendeley web importer plugin. I looked at the JabRef CLI, but this page is sparse. How would one call it from the source document? Is my case valid enough to somehow entice the elder geeks here? Using another full-blown citation manager doesn't sound elegant. Shouldn't we look for a "do one thing well" solution? We have a bib file in plain text. We have one task: dedup it. I thought sed, awk, or their elder brother Perl might be up to the task with some seriously heavy regex-fu. – Dr Krishnakumar Gopalakrishnan Jun 28 '18 at 21:02
bibtool. – moewe Jun 28 '18 at 21:04
bibtool. It does not mention the word "duplicate" anywhere in its manual. Can you point me to the specific section that could help? How could I call bibtool from within the preamble every time I compile my document? I am looking for something like \makeglossaries{}. – Dr Krishnakumar Gopalakrishnan Jun 28 '18 at 21:09
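moewe's first comment notes that the problem becomes easier if duplicate entries are guaranteed to share the same citation key. Under that assumption, a line-oriented approach in the spirit of the sed/awk suggestion is enough; the following Python sketch (not bibtool) keeps only the first entry for each key. It additionally assumes that each entry starts on a line whose first character is @ and continues until the next such line; @string, @preamble and @comment blocks are always kept. The file names in the usage comment are hypothetical.

```python
import re
import sys

# Entry header such as "@article{smith2018," -> groups ("article", "smith2018").
HEADER = re.compile(r"^@\s*(\w+)\s*\{\s*([^,\s]+)\s*,")
ALWAYS_KEEP = ("string", "preamble", "comment")


def dedup_by_key(lines):
    """Keep only the first entry for each citation key.

    Line-oriented sketch: assumes every entry starts on a line beginning
    with '@' and runs until the next such line (or end of file).
    """
    out = []
    seen = set()
    keep = True  # whether lines of the current block are being kept
    for line in lines:
        if line.startswith("@"):
            m = HEADER.match(line)
            if m and m.group(1).lower() not in ALWAYS_KEEP:
                key = m.group(2)
                keep = key not in seen  # drop the block if the key was seen
                seen.add(key)
            else:
                keep = True  # @string/@preamble/@comment or unparsed header
        if keep:
            out.append(line)
    return out


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python bibkeydedup.py library.bib deduped.bib
    with open(sys.argv[1], encoding="utf-8") as f:
        lines = f.readlines()
    with open(sys.argv[2], "w", encoding="utf-8") as f:
        f.writelines(dedup_by_key(lines))
```

The trade-off is that two genuinely different papers that happen to share a key would silently lose one entry, so comparing the whole entry text is the safer test when keys cannot be trusted.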