
I want to know if there is a way to optimize the order of my citations so that they print the minimum number of numbers in the text.

So, if I have citations \cite{A,B,C} and later \cite{A,C,D}, it would realize that writing \cite{B,A,C} and \cite{A,C,D} instead gives a shorter output. That is, in the first case these print as (1-3) and (1,3,4), but after reordering you would instead get (1-3) and (2-4).

Another example: \cite{A,B,C}, \cite{D,E,F}, \cite{A,D,F} could be optimized to \cite{B,C,A}, \cite{D,F,E}, \cite{A,D,F}, which would go from (1-3), (4-6), (1,4,6) to (1-3), (4-6), (3-5).
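To make the two examples above checkable, here is a minimal Python sketch that numbers keys in order of first citation and compresses each block. The function names (`number_keys`, `compress`, `render`) are hypothetical, and the rule that only runs of three or more numbers collapse into a range is an assumption about the citation style, not something tied to a particular package.

```python
def number_keys(blocks):
    """Assign 1, 2, 3, ... to keys in order of first appearance."""
    numbers = {}
    for block in blocks:
        for key in block:
            numbers.setdefault(key, len(numbers) + 1)
    return numbers


def compress(nums):
    """Sort the numbers and collapse runs of 3+ into 'a-b' (assumed rule)."""
    nums = sorted(set(nums))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend j to the end of the consecutive run starting at i.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{nums[i]}-{nums[j]}")
        else:
            parts.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return ",".join(parts)


def render(blocks):
    """Return the printed form of every citation block."""
    numbers = number_keys(blocks)
    return [compress(numbers[key] for key in block) for block in blocks]


before = [["A", "B", "C"], ["D", "E", "F"], ["A", "D", "F"]]
after = [["B", "C", "A"], ["D", "F", "E"], ["A", "D", "F"]]
print(render(before))  # ['1-3', '4-6', '1,4,6']
print(render(after))   # ['1-3', '4-6', '3-5']
```

Each inner list stands for the argument of one \cite command, in document order.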

Now, that only removes one number from the output, but I regularly have 9+ citations inside each \cite{} command (e.g. \cite{Miro2012,Ling2010,Takao2010,Unruh2009,Goff2008,Kubatko2007,DeAquino2001,Rose1994,David1993}), so an optimization like this could change things from (1,3,5,7,9,11,13,15,17) to (1-9) at the extreme (going from 21 characters to 3).

I don't have a way to do this outside of LaTeX. If I had to do it by hand, I would first make a list of every group of citations with more than 3 citations that also occur earlier in the document. Then I would check whether any of those occur in the same citation block and put them next to one another, provided that doesn't make other blocks worse. Then I would check whether any of the citations that occur in earlier blocks occur in adjacent blocks, and try to move those to the ends, if I could do so without messing up other blocks. This would not generate an optimal solution, but I suspect it would generate a better one. If doing this by computer, I would also run multiple trials, optimizing the blocks in different orders, so that an early reordering doesn't prevent later ones.

Potentially you could do something with a badness score so that you could mess up one citation block if it made two others better, or one a LOT better?
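One way to read the badness-score idea is sketched below in Python: score a whole document by the total number of printed characters, then keep any swap inside a block that lowers the total, even if a single block gets worse. This is only a sketch under assumptions: the names `render`, `badness` and `improve` are made up for illustration, keys are numbered by first citation, and runs of three or more numbers are assumed to print as a range.

```python
from itertools import combinations


def render(blocks):
    """Number keys by first use; print each block with runs of 3+ as 'a-b'."""
    numbers = {}
    for block in blocks:
        for key in block:
            numbers.setdefault(key, len(numbers) + 1)
    out = []
    for block in blocks:
        nums = sorted({numbers[key] for key in block})
        parts, i = [], 0
        while i < len(nums):
            j = i
            while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
                j += 1
            if j - i >= 2:
                parts.append(f"{nums[i]}-{nums[j]}")
            else:
                parts.extend(str(n) for n in nums[i:j + 1])
            i = j + 1
        out.append(",".join(parts))
    return out


def badness(blocks):
    """Global score: total characters printed over all citation blocks."""
    return sum(len(text) for text in render(blocks))


def improve(blocks):
    """Hill climbing: keep any in-block swap that lowers the global badness."""
    blocks = [list(block) for block in blocks]
    best = badness(blocks)
    improved = True
    while improved:
        improved = False
        for block in blocks:
            for i, j in combinations(range(len(block)), 2):
                block[i], block[j] = block[j], block[i]
                score = badness(blocks)
                if score < best:
                    best, improved = score, True
                else:
                    # Undo the swap: it did not help globally.
                    block[i], block[j] = block[j], block[i]
    return blocks, best


result, score = improve([["A", "B", "C"], ["A", "C", "D"]])
print(render(result), score)  # ['1-3', '2-4'] 6
```

Single swaps are not enough in general. On \cite{A,B,C}, \cite{D,E,F}, \cite{A,D,F} this sketch stays at 11 characters, because reaching (3-5) needs two coordinated moves, which is why trying the blocks in different orders, or random restarts, would still be needed.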

Rewrite inspired by @JosephWright, who explained it well in the comments:

I think the point here is that if one has several multiple citations, \cite{A,B,C}, \cite{A,D,E}, \cite{B,D,F}, there will be an optimal approach to the ordering in each argument which places as many as possible in continuous runs and so reduces the output complexity. – Joseph Wright May 14 '16 at 5:30

The optimisation might for example spot that if you have a first use \cite{A,B,C} and later \cite{B,D,E} then reordering the first case to \cite{A,C,B} will swap two refs and go from output 1--3 (A = 1, B = 2, C = 3) and 2,4,5 (D = 4, E = 5), to 1--3 (A = 1, C = 2, B = 3) and 3--5 (D = 4, E = 5). The point here is that the refs are all still in use order, but within sets used at one time the order is optimised to globally minimise the number of numbers which need to be printed. – Joseph Wright May 14 '16 at 13:34

Original text:

Currently, within each citation block, I'm listing citations by date. So I'll have a command with something like: \cite{Miro2012,Ling2010,Takao2010,Unruh2009,Goff2008,Kubatko2007,DeAquino2001,Rose1994,David1993}

This works quite well: LaTeX does the ordering for me and gives me an output like 14,19,62,70,71,75–78.

That is pretty ugly and takes up a good inch of space. However, most of those are referenced together in the same citation block earlier in the paper. So really, if I'd cited them in order, I'd get something more like 14,19,62-68. Given that most of my ordering within each citation is arbitrary (by date), I'm wondering if there is a citation optimizer that goes through and tries to compress your later citation references as much as possible by reordering your earlier ones. Has anyone done something like this in LaTeX?

It would be better still if I could specify times not to do this (e.g. this citation must be first within this block, if it hasn't been cited already), or if it could work with `achemso', but I'm mostly curious if this is possible at all.

(MWEs removed, as they didn't add anything except making it look like I was trying to do this with a specific citation package, when I am curious whether it is possible to do this at all.)

siracusa
Canageek
  • You surely must know by now that we need a minimal example to work with. You have given us no information at all about how you are managing the bibliography and citations. Why don't you just number in order of first citation? – cfr May 14 '16 at 00:24
  • @cfr I think the point here is that if one has several multiple citations, \cite{A,B,C}, \cite{A,D,E}, \cite{B,D,F}, there will be an optimal approach to the ordering in each argument which places as many as possible in continuous runs and so reduces the output complexity. However, I also think this is a difficult problem (not just in TeX terms)! – Joseph Wright May 14 '16 at 05:30
  • @JosephWright Won't that be very confusing for readers? – cfr May 14 '16 at 13:31
  • @cfr Not if I understand correctly. The optimisation might for example spot that if you have a first use \cite{A,B,C} and later \cite{B,D,E} then reordering the first case to \cite{A,C,B} will swap two refs and go from output 1--3 (A = 1, B = 2, C = 3) and 2,4,5 (D = 4, E = 5), to 1--3 (A = 1, C = 2, B = 3) and 3--5 (D = 4, E = 5). The point here is that the refs are all still in use order, but within sets used at one time the order is optimised to globally minimise the number of numbers which need to be printed. – Joseph Wright May 14 '16 at 13:34
  • @JosephWright I think I see. Maybe I just find it a bit confusing because I don't use numbered systems much. – cfr May 14 '16 at 13:37
  • @cfr That is it, I'm wondering if it is possible in LaTeX at all. It almost certainly isn't possible in my citation system. Joseph has it exactly right. As is typical, my first few citations have a very large number of references in them, often 10 or so. Later on, I get large, ugly citation blocks. If the order within the first block were changed, you'd get a continuous range later on. – Canageek May 14 '16 at 23:07
  • @Canageek How would you do it not in LaTeX? – cfr May 14 '16 at 23:11
  • Note that you really should provide a complete minimal example if you want people to play with this. – cfr May 14 '16 at 23:12
  • @cfr I don't know of any software that can do this, but OK, I'll try. – Canageek May 16 '16 at 18:01
  • @cfr There, I added them. But they are VERY arbitrary (i.e. random citation package, arbitrary citations and a contrived example), and well, I don't think add much to actually illustrating the problem, which is why I didn't add them before. – Canageek May 16 '16 at 18:23
  • If you are writing a book then the best way to get more compressed citations is to have a bibliography for each chapter. – Al-Motasem Aldaoudeyeh Mar 10 '18 at 05:53
  • It looks like the main question is whether the bibliography sorting has access to the grouping information in citations. For bibtex, it is not clear to me: if you load the cite package, then \citation{A,B,C} is in the .aux file, but I don't know whether one can see that grouping at the .bst stage. For biber it seems biblatex does not preserve the grouping when writing to the .bcf file, so biber doesn't see it. – Andrew Swann Sep 07 '18 at 11:04
  • @AndrewSwann This could be done on the LaTeX end before sending the information to bibtex, couldn't it, since it knows the order already? – Canageek Sep 07 '18 at 22:37
  • I am not sure. For bibtex, part of the point of my comment was the cite package already does that, but I don't know how to see this at the .bst stage. biblatex+biber can probably be configured more directly by writing additional info to the .bcf – Andrew Swann Sep 08 '18 at 08:42
  • I once worked on a similar problem (see http://www.elfsoft2000.com/projects/hash2.txt) with respect to perfect hash tables. – John Kormylo Sep 10 '18 at 19:08
  • This seems very hard (in the computer science sense). However, the easy (and most certainly correct) solution is to just not do this. Make your bibliography alphabetical. Anything else is harder to use, and therefore worse. – user3482749 Feb 01 '20 at 14:25
  • @user3482749 Alphabetical is not standard for any chemistry journal I know of, and would be harder anyway than seeing citation 1, and then finding 1 in the index. – Canageek Feb 05 '20 at 00:24
  • I don't think there is a way to do it in LaTeX. From a programming standpoint, it sounds a lot like travelling salesman. The graph should contain all citations, with the weight of each edge to every other citation initialized to the number of citations. For any paired citations, reduce the edge weight of the pair by 1. Implementations can be found online. Be warned: travelling salesman is hard. – Michael Bölting Apr 13 '21 at 10:29
  • \biboptions{sort&compress} – fromthebeeland Jul 16 '21 at 19:02
  • 1. Just because the problem is NP-complete doesn't mean that there isn't a "reasonably efficient approximate solver" that "works in practical cases" (in fact, if it's very well studied, there would be plenty of them). 2. TeX is Turing-complete, so it's definitely possible to implement in TeX, but for something like this it's not a good idea to implement it in such a hard-to-write "programming language". Before thinking about the algorithm, some program must be able to extract the citations from the TeX file (remember that parsing TeX files is hard), then edit them once a good order is computed. – user202729 Dec 20 '21 at 10:51
  • @user202729 You wouldn't need to parse the TeX file, though, because there are already solutions which extract the citation information and write it to one or more separate files. Whichever tool you use to do that gives you highly structured information. Sadly, Biblatex apparently fails to track grouping (according to the comment above). But presumably you could adapt something so you did preserve that information. I don't think extracting the information from the TeX file is the issue. – cfr Aug 09 '23 at 02:19
  • @cfr In retrospect, that's true. Maybe I'll work a bit on the theoretical side of this, starting from some constant-factor approximation, as the problem is clearly NP-complete; to benchmark the performance on real-life problems, real-life examples are necessary. Unsurprisingly, the intersection between people who know TeX-related tools and people doing theoretical CS is small. – user202729 Aug 10 '23 at 00:33
  • The problem is that any algorithm that does it by brute force scales as N!, where N is the length of the longest citation block (e.g., \cite{key1,key2,key3,key4,key5} would have N = 5). This is OK if N = 3, say, but if N = 10, there are millions of permutations, and with 20, there are ~10^18. I made it slightly more tractable by Monte Carlo sampling the citations I changed around (e.g., if there are 10 in a block, I only test, say, 10,000 permutations rather than over 3 million). It is better than doing it by hand, but not by much, and it can take a loooong time. – karlh Dec 08 '23 at 00:17
  • @karlh and if I didn't have five to ten citations in a block, I wouldn't need to do this, so yeah, not going to happen without a quantum computer version of TeX. – Canageek Dec 12 '23 at 02:29
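The sampling approach described in the last comments can be sketched in Python as below. This is not karlh's code, only an illustration under assumptions: keys are numbered by first citation, runs of three or more numbers print as a range, and the names `render`, `badness` and `sample_orders`, as well as the fixed seed and sample count, are made up for the example.

```python
import random


def render(blocks):
    """Number keys by first use; print each block with runs of 3+ as 'a-b'."""
    numbers = {}
    for block in blocks:
        for key in block:
            numbers.setdefault(key, len(numbers) + 1)
    out = []
    for block in blocks:
        nums = sorted({numbers[key] for key in block})
        parts, i = [], 0
        while i < len(nums):
            j = i
            while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
                j += 1
            if j - i >= 2:
                parts.append(f"{nums[i]}-{nums[j]}")
            else:
                parts.extend(str(n) for n in nums[i:j + 1])
            i = j + 1
        out.append(",".join(parts))
    return out


def badness(blocks):
    """Total characters printed over all citation blocks."""
    return sum(len(text) for text in render(blocks))


def sample_orders(blocks, samples=2000, seed=0):
    """Try random in-block orderings and keep the best one seen."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    best = [list(block) for block in blocks]
    best_score = badness(best)
    for _ in range(samples):
        # Shuffle every block independently; the original is left untouched.
        trial = [rng.sample(block, len(block)) for block in blocks]
        score = badness(trial)
        if score < best_score:
            best, best_score = trial, score
    return best, best_score


blocks = [["A", "B", "C"], ["D", "E", "F"], ["A", "D", "F"]]
best, best_score = sample_orders(blocks)
print(render(best), best_score)
```

Sampling never returns something worse than the input order, but it gives no guarantee of finding the optimum. For blocks of ten or more keys, the chance that a random shuffle hits one of the good orderings drops quickly, which matches the experience reported above.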