
Using BibTeX and natbib, consider the following:

\def\grpA{key1,key2,key3}
\def\grpB{key3,key4,key5}

The first set\citep{\grpA} and the second set\citep{\grpB} discuss XYZ. 
In total\citep{\grpA,\grpB}, the bla bla bla is in common.

The problem is that, since key3 is common to \grpA and \grpB, the combined citation contains key3 twice. natbib does not recognize the two occurrences as the same reference, so the citation is printed twice and the output of the sort&compress option is broken.

Is there a way to detect and discard duplicates? I am trying to summarize more than 500 papers, and defining groups like this is handy, but some keys will be common to several groups.

lockstep
  • A few tests suggest what is happening is that sort&compress works fine as far as it goes, but there is no 'unique set' stage. Thus if you want to allow removal of duplicate keys you'll need a preprocessing step. Are we allowed to load for example expl3 to avoid needing to define a suitable 'remove duplicates' command? – Joseph Wright Jun 26 '13 at 08:38
  • I have never used expl3, so I am open ears.... – Nicholas Hamilton Jun 26 '13 at 08:57
  • BTW, I agree that sort&compress still works; however, [1-4] gets changed to [1-3,3,4] if there are two instances of 3, rather than recognizing that 3,3 is still within the range [1-4]. – Nicholas Hamilton Jun 26 '13 at 09:01

2 Answers


Here's a way to remove duplicates that works with all the \cite... commands of natbib (\citep, \citet, and so on):

\begin{filecontents*}{\jobname.bib}
@article{key1,
 author={A. Author},
 title={Title},
 journal={Journal},
 year={2000},
}
@article{key2,
 author={A. Buthor},
 title={Title},
 journal={Journal},
 year={2000},
}
@article{key3,
 author={A. Cuthor},
 title={Title},
 journal={Journal},
 year={2000},
}
@article{key4,
 author={A. Duthor},
 title={Title},
 journal={Journal},
 year={2000},
}
@article{key5,
 author={A. Euthor},
 title={Title},
 journal={Journal},
 year={2000},
}
\end{filecontents*}

\documentclass{article}
\usepackage[sort&compress]{natbib}

\usepackage{expl3}
\ExplSyntaxOn
\AtBeginDocument{
 \cs_set_eq:Nc \adp_orig_citex:wwn { @citex }
 \cs_set:cpn { @citex } [ #1 ] [ #2 ] #3 
  {
   \clist_set:Nx \l_tmpa_clist { #3 }
   \clist_remove_duplicates:N \l_tmpa_clist
   \adp_orig_citex:nnV { #1 } { #2 } \l_tmpa_clist
  }
} % end of \AtBeginDocument
\cs_new:Npn \adp_orig_citex:nnn #1 #2 #3
 {
  \adp_orig_citex:wwn [ #1 ] [ #2 ] { #3 }
 }
\cs_generate_variant:Nn \adp_orig_citex:nnn { nnV }
\ExplSyntaxOff


\begin{document}

\def\grpA{key1,key2,key3}
\def\grpB{key3,key4,key5}

The first set \citep{\grpA} and the second set \citep{\grpB} discuss XYZ. 
In total \citep{\grpA,\grpB}, the bla bla bla is in common.

\bibliographystyle{plainnat}
\bibliography{\jobname}

\end{document}

The code works as follows:

  1. \cs_set_eq:Nc \adp_orig_citex:wwn { @citex } saves a copy of \@citex, which is the internal command used by natbib

  2. the code

     \cs_set:cpn { @citex } [ #1 ] [ #2 ] #3 
      {
       \clist_set:Nx \l_tmpa_clist { #3 }
       \clist_remove_duplicates:N \l_tmpa_clist
       \adp_orig_citex:nnV { #1 } { #2 } \l_tmpa_clist
      }
    

    redefines \@citex to perform two extra steps: it sets the variable \l_tmpa_clist to the fully expanded contents of the third argument (the list of keys), removes duplicates from it, and passes the result to the following instruction

  3. \adp_orig_citex:nnV { #1 } { #2 } \l_tmpa_clist is equivalent to

    \adp_orig_citex:nnn { #1 } { #2 } {<contents of \l_tmpa_clist>}
    

    because the V variant, created by the \cs_generate_variant:Nn command at the end of the code, passes the value of the variable as a braced argument

  4. \adp_orig_citex:nnn just calls the saved copy of \@citex, restoring the bracket-delimited syntax of its first two arguments
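The deduplication step can also be tried on its own, without natbib. The following is a minimal sketch, not part of the answer above: \showuniquekeys is a hypothetical helper name introduced only for this illustration. It expands its argument, removes duplicates, and typesets the remaining keys.

```latex
\documentclass{article}
\usepackage{expl3}

\ExplSyntaxOn
% \showuniquekeys is a hypothetical helper, used only for this demonstration
\newcommand \showuniquekeys [1]
  {
    % expand macros such as \grpA inside the list before storing it
    \clist_set:Nx \l_tmpa_clist { #1 }
    % keep only the first occurrence of each key
    \clist_remove_duplicates:N \l_tmpa_clist
    % typeset the remaining keys, separated by a comma and a space
    \clist_use:Nn \l_tmpa_clist { ,~ }
  }
\ExplSyntaxOff

\begin{document}

\def\grpA{key1,key2,key3}
\def\grpB{key3,key4,key5}

Unique keys: \showuniquekeys{\grpA,\grpB}

\end{document}
```

With the two groups from the question, this should typeset key1, key2, key3, key4, key5, with key3 listed once. The redefinition of \@citex hands this same kind of list to natbib.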

egreg

One way of removing duplicates is:

\documentclass{article}
\usepackage{natbib}

\begin{document}

\def\grpA{key1,key2,key3}
\def\grpB{key3,key4,key5}
\def\merge#1{\mergexx#1,\relax,}
\def\mergexx{\expandafter\mergex}
\def\mergex#1,{%
\ifx\relax#1\else
\ifcsname @??#1\endcsname\else
#1,\expandafter\ifx\csname @??#1\endcsname\relax\fi
\fi
\expandafter\mergexx
\fi}

\let\oldcitep\citep
\protected\def\citep#1{{\xdef\tmp{\noexpand\oldcitep{\merge{#1}}}}\tmp}

The first set\citep{\grpA} and the second set\citep{\grpB} discuss XYZ. 
In total\citep{\grpA,\grpB}, the bla bla bla is in common.


\end{document}

which produces an aux file containing

\relax 
\citation{key1,key2,key3,}
\citation{key3,key4,key5,}
\citation{key1,key2,key3,key4,key5,}

Note that key3 appears only once on the last line.
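Because \merge works purely by expansion, it can be checked in isolation with \typeout, without natbib or the redefinition of \citep. The following is a minimal sketch that reuses the definitions above unchanged. The third group \grpC and its keys are assumptions added only for illustration.

```latex
\documentclass{article}

\def\grpA{key1,key2,key3}
\def\grpB{key3,key4,key5}
\def\grpC{key5,key6}% hypothetical third group, for illustration only

% definitions copied from the answer
\def\merge#1{\mergexx#1,\relax,}
\def\mergexx{\expandafter\mergex}
\def\mergex#1,{%
\ifx\relax#1\else
\ifcsname @??#1\endcsname\else
#1,\expandafter\ifx\csname @??#1\endcsname\relax\fi
\fi
\expandafter\mergexx
\fi}

\begin{document}

% \typeout writes inside a group, so the marker macros created by
% \csname are forgotten again after each call
\typeout{\merge{\grpA,\grpB}}
\typeout{\merge{\grpA,\grpB,\grpC}}

Test.

\end{document}
```

The first line written to the terminal and log should be key1,key2,key3,key4,key5, (including the trailing comma seen in the aux file). Since \merge accepts a flat comma list, further groups can be listed side by side in a single call, as in the second \typeout.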

David Carlisle
  • Very clever, thank you. Could I trouble you to put it into a function/command so it can be used more than once? – Nicholas Hamilton Jun 26 '13 at 10:17
  • Well, the intention was that you could write \citep{\merge{\grpA,\grpB}}, which is why it uses only expansion, and in most places you can do that (try it in \typeout). It's just that \citep starts processing the list before it is fully expanded, so I had to redefine \citep here to expand its argument. So I'm not sure how a more generic function could be defined. – David Carlisle Jun 26 '13 at 10:31
  • What you intended is exactly what I wanted; apologies, I am still learning the syntax beyond the basics. Cheers. – Nicholas Hamilton Jun 26 '13 at 10:39
  • Am I right in assuming that if there were a \grpC, then \citep{\merge{\grpC,\merge{\grpA,\grpB}}} would do the trick? – Nicholas Hamilton Jun 26 '13 at 10:44
  • David, egreg's solution worked out of the box for me; I had errors with your code for some reason, not sure why. – Nicholas Hamilton Jun 26 '13 at 11:20
  • @ADP shocking:-) – David Carlisle Jun 26 '13 at 12:04