Something I think would be absolutely amazing but is way beyond my biblatex skills: whenever an entry has no url field, generate a link that searches Google Scholar for the title, authors and year. So for a paper named "Rocket Science", written by J. Doe in 1999, the link http://scholar.google.com/scholar?q=%22Rocket+Science%22+author%3Adoe&as_ylo=1999&as_yhi=1999 would be generated and attached to the paper title in the bibliography entry, or wherever else is convenient.
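Written by hand for a single entry, the result I am after would look roughly like the sketch below. It only assumes that hyperref is loaded; the percent signs are written as \% so that TeX does not treat them as comment characters.

```latex
\documentclass{article}
\usepackage{hyperref}
\begin{document}
% Hand-written link for the "Rocket Science" example; the goal is to have
% biblatex generate this automatically for every entry without a url field.
\href{http://scholar.google.com/scholar?q=\%22Rocket+Science\%22+author\%3Adoe&as_ylo=1999&as_yhi=1999}{Rocket Science}
\end{document}
```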

I think the exact title, year and author should almost always be available and unambiguous. This is how the cryptic query above would look when entered:

screen capture of the result of the search query

So if somebody liked the idea and had the biblatex-fu to implement it, I'd be totally jazzed :)

lockstep
Christian
  • Looks doable, but it would be handy to know the exact format you want the query in. For example, we could just dump say author, title, year and journal into the q= part, or do something more complex. Which fields do you want, and how 'sophisticated' are you after? – Joseph Wright Aug 30 '12 at 16:50
  • Part of the answer (inserting the hyperlink once you have generated the url) may build on this thread: http://tex.stackexchange.com/questions/3802/how-to-get-doi-links-in-bibliography/64695#64695 – Corentin Aug 30 '12 at 16:50
  • This is really something the backend should do. It could be part of the biber sourcemapping feature and probably not that hard to implement. – PLK Aug 30 '12 at 17:45
  • Ok sorry, maybe that was not human-readable enough and too compressed. The link encodes author, year and title. These fields are almost always available (unlike journal or something) except for things like collections, where you only have editors. The search string encoded in the URL above is "Rocket Science" author:doe and you will get a line that says that you're searching in the timeframe 1999--1999. – Christian Aug 31 '12 at 06:24
  • I added a screen shot to illustrate. – Christian Aug 31 '12 at 06:32
  • @Christian OK, that's a slightly different requirement to the one I tried first :-) It may take a little while, but I'll see what I can do (I have a feeling I should be able to avoid using expl3). – Joseph Wright Aug 31 '12 at 06:37
  • Because of the consensus that biber would be the perfect place to implement this, I wrote a feature request for biber: https://sourceforge.net/tracker/?func=detail&aid=3563625&group_id=228270&atid=1073795 – Christian Aug 31 '12 at 06:40
  • @JosephWright Great, thanks for trying this :) And sorry that I didn't express myself clearly enough in the first place :/ – Christian Aug 31 '12 at 06:40
  • Nice idea, I can imagine this saving some time. – Harold Cavendish Aug 31 '12 at 21:50

2 Answers

Here is a solution which requires biblatex 2.3 and biber 1.3 (both in DEV on SF). First, let's allow a new "AUTOURL" field in all entries so that we can populate it; we probably don't want to use the URL field itself, as that can be printed in the bibliography. We can then change our driver to test for the AUTOURL field and add a hyperlink on the title or wherever. Here I'm concentrating just on generating the URL data.

Add this to your biblatex-dm.cfg:

\DeclareDatamodelEntryfields{autourl}
\DeclareDatamodelFields[type=field, datatype=uri]{autourl}

Now we do the real work with the biber sourcemap feature which is better than hard-coding all this as we can then create arbitrary URLs:

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite]{
      \step[fieldset=autourl, fieldvalue={http://scholar.google.com/scholar?q="}]
      \step[fieldsource=title]
      \step[fieldset=autourl, origfieldval, append]
      \step[fieldset=autourl, fieldvalue={"+author:}, append]
      \step[fieldsource=author, match=\regexp{\A([^,]+)\s*,}]
      \step[fieldset=autourl, fieldvalue={$1}, append]
      \step[fieldset=autourl, fieldvalue={&as_ylo=}, append]
      \step[fieldsource=year]
      \step[fieldset=autourl, origfieldval, append]
      \step[fieldset=autourl, fieldvalue={&as_yhi=}, append]
      \step[fieldset=autourl, origfieldval, append]
    }
  }
}
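To make the steps concrete, here is a small sketch that reuses the test entry from the other answer below. Note that the regexp in the map assumes the author is given in "Last, First" form. The string shown in the comment is my reading of what the steps concatenate before any URL escaping, not output copied from a .bbl file.

```latex
% Test entry with the author in "Last, First" form.
\begin{filecontents}{\jobname.bib}
@article{test,
  author = {Doe, J. and Other, Arthur N.},
  title  = {Rocket Science},
  year   = {1999},
}
\end{filecontents}
% The regexp \A([^,]+)\s*, captures everything before the first comma,
% here "Doe", so before escaping the steps should concatenate to:
%   http://scholar.google.com/scholar?q="Rocket Science"+author:Doe&as_ylo=1999&as_yhi=1999
```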

Biber will also URL-escape any UTF-8 characters or LaTeX character macros that end up in the URL when parts of other fields are spliced in, so you don't have to worry about that.

This results in a field in the .bbl like this:

[Image: the generated autourl field in the .bbl file]

You can reference this "AUTOURL" field in some logic, such as the TITLE field format, to add it as a hyperref link. To take a simple example:

\DeclareFieldFormat{title}{\href{\thefield{autourl}}{#1}}

Here is a more sophisticated example that works for all entry types in their default configuration without changing it and uses the auto-generated URL only when no custom one is available:

\DeclareFieldFormat{title}{%
  \iffieldundef{url}
    {\href{\thefield{autourl}}{\mkbibemph{#1}}}
    {\href{\thefield{url}}{\mkbibemph{#1}}}}
\DeclareFieldFormat[article,inbook,incollection,inproceedings,patent,thesis,unpublished]{title}{%
  \iffieldundef{url}
    {\href{\thefield{autourl}}{\mkbibquote{#1\isdot}}}
    {\href{\thefield{url}}{\mkbibquote{#1\isdot}}}}
\DeclareFieldFormat[suppbook,suppcollection,suppperiodical]{title}{%
  \iffieldundef{url}
    {\href{\thefield{autourl}}{#1}}
    {\href{\thefield{url}}{#1}}}

Only the url field is used to replace autourl, not URLs generated from doi or eprint. A production version should also respect \ifhyperref and should be made more robust against missing fields.
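As a rough sketch of what such a more defensive version might look like: the helper below falls back to the plain title when hyperref support is off or when neither url nor autourl is set. The name \linktitle is made up for this example and is not part of biblatex. \ifhyperref and \iffieldundef are the standard biblatex tests.

```latex
% \linktitle is a hypothetical helper name, not part of biblatex.
\newcommand*{\linktitle}[1]{%
  \ifhyperref
    {\iffieldundef{url}
       {\iffieldundef{autourl}
          {#1}% neither field is set: print the plain title
          {\href{\thefield{autourl}}{#1}}}
       {\href{\thefield{url}}{#1}}}
    {#1}}% hyperref support is off: print the plain title

\DeclareFieldFormat{title}{\linktitle{\mkbibemph{#1}}}
\DeclareFieldFormat[article,inbook,incollection,inproceedings,patent,thesis,unpublished]{title}{%
  \linktitle{\mkbibquote{#1\isdot}}}
\DeclareFieldFormat[suppbook,suppcollection,suppperiodical]{title}{\linktitle{#1}}
```

Keeping the link logic in one helper means the three type-specific formats differ only in how they wrap the title.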

Christian
PLK
  • Sounds great. What does "some logic" mean exactly? How would I make every title actually be something like \href{AUTOURL}{ORIGINALTITLE}? – Christian Aug 31 '12 at 22:46
  • I updated the example with an example field format for TITLE which uses the URL generated. – PLK Sep 01 '12 at 07:59
  • @PLK: What does append do? Is this a method to add information to an existing key (e.g. keywords)? – Marco Daniel Sep 01 '12 at 08:02
  • It just appends the fieldvalue to the current value instead of replacing it. New in 1.3/2.3 – PLK Sep 01 '12 at 08:44
  • @PLK: Great: This is a feature I missed. – Marco Daniel Sep 01 '12 at 09:04
  • Looks perfect! I am going to install the biber and biblatex development trees and will then give this a try. – Christian Sep 01 '12 at 09:33
  • I'm looking into making it a bit easier to extract parts of information from a field for later insertion into another field. I'll update the example if necessary. – PLK Sep 01 '12 at 10:23
  • I have changed the example slightly after improving the functionality of the biber code for this. Now you can grab parts of fields using the usual regexp parentheses and refer to these in later fieldvalue lines. So now you can directly pull out just the lastname of the author to put in the URL (assuming, in the example, a "Last, First" format). I have put this example in the biblatex 2.3 PDF doc as it's quite useful. I have updated the biber/biblatex dev versions with this. – PLK Sep 01 '12 at 16:28
  • The part in the .bbl gets generated correctly but somehow the \DeclareFieldFormat I put in the header is ignored as the title is no link in the generated PDF. There are no warnings or errors :/ – Christian Sep 02 '12 at 14:34
  • You might need an explicit \usepackage{hyperref}? – PLK Sep 02 '12 at 16:47
  • No, forgetting that would have gotten me an error I guess. But I just realized that I have to be more precise: it seems to work for certain entry types but not for others. online, techreport, book, and misc work; article, conference, and thesis don't. – Christian Sep 02 '12 at 16:55
  • That's normal. If an entrytype has a type-specific format, it takes precedence and the ones you say don't work probably have that. For example, to make ARTICLEs work, do: \DeclareFieldFormat[article]{title}{\href{\thefield{autourl}}{#1}} – PLK Sep 02 '12 at 17:44
  • Great, thanks! Now everything seems to be working perfectly :) I added the three \DeclareFieldFormat commands I ended up using to your answer, based on the default configuration of biblatex. – Christian Sep 02 '12 at 22:14
Second version

Most of the work can be done using biblatex if we assume that the only tricky characters are spaces. To do the conversion, I've used a special name format which simply saves surnames to a temporary variable, using + to separate them. The title is surrounded by \%22 (the URL encoding of a double quote), and the first and last years are set to the same value (there is no check on the validity of the year).

\begin{filecontents}{\jobname.bib}
@article{test,
  author = {Doe, J. and Other, Arthur N.},
  title  = {Rocket Science},
  year   = {1999},
}
\end{filecontents}
\documentclass{article}
\usepackage{expl3}
\usepackage[backend=bibtex]{biblatex}
\bibliography{\jobname}

\ExplSyntaxOn
\char_set_catcode_space:N \ %
\cs_new_protected:Npn\spacetoplus#1%
  {\tl_greplace_all:Nnn#1{ }{+}}
\ExplSyntaxOff
\makeatletter
\DeclareNameFormat{searchurl}{%
  \ifnumequal{\value{listcount}}{1}
    {}
    {\gappto{\bbx@gtempa}{+}}%
  \xdef\bbx@gtempa{%
    \unexpanded\expandafter{\bbx@gtempa}%
    author\@percentchar 3A%
    \unexpanded{#1}%
  }%
}

\newbibmacro*{url+urldate}{%
  \iffieldundef{url}
    {%
      \savefield{title}{\bbx@gtempa}%
      \xdef\bbx@gtempa{%
        http://scholar.google.com/scholar?q=%
        \@percentchar
        22%
        \unexpanded\expandafter{\bbx@gtempa}%
      }%
      \xdef\bbx@gtempa{%
        \unexpanded\expandafter{\bbx@gtempa}%
        \@percentchar 22+%
      }%
      \printnames[searchurl]{author}%
      \edef\bbx@tempa{&as_ylo=\thefield{year}&as_yhi=\thefield{year}}%
      \xdef\bbx@gtempa{%
        \unexpanded\expandafter{\bbx@gtempa}%
        \unexpanded\expandafter{\bbx@tempa}%
      }%
      \spacetoplus{\bbx@gtempa}%
      \restorefield{url}{\bbx@gtempa}%
    }
    {}%
  \printfield{url}%
  \iffieldundef{urlyear}
    {}
    {\setunit*{\addspace}%
     \printurldate}}
\makeatother
\begin{document}
\cite{test}
\printbibliography
\end{document}

I've loaded expl3 for a prebuilt 'replace all' command, but this could be recoded without expl3. As that's not the key point here I've not bothered!
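For anyone who wants to avoid expl3, a replacement for \spacetoplus might look like the sketch below. It is untested against the full example and relies on \appto and \ifblank from etoolbox, which biblatex already loads. The internal names \stp@acc, \stp@split and \stp@end are made up for this sketch. It splits the macro's contents at each space with a delimited argument and rebuilds them with + in between; a trailing space is dropped rather than converted.

```latex
\makeatletter
% \spacetoplus{<macro>}: globally redefine <macro> so that the spaces in
% its replacement text become plus signs (a trailing space is dropped).
\newcommand*{\spacetoplus}[1]{%
  \begingroup
    \def\stp@acc{}% accumulator for the converted text
    \expandafter\stp@split#1 \stp@end% the added space acts as a sentinel
    \global\let#1\stp@acc
  \endgroup}
% #1 is the text up to the next space, #2 is whatever follows it.
\def\stp@split#1 #2\stp@end{%
  \appto\stp@acc{#1}%
  \ifblank{#2}
    {}% nothing left: stop
    {\appto\stp@acc{+}\stp@split#2\stp@end}}
\makeatother
```

With this definition in place of the \ExplSyntaxOn ... \ExplSyntaxOff block, the \spacetoplus{\bbx@gtempa} call in the bibmacro should work unchanged, and \usepackage{expl3} could be dropped.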

First version

Most of the work required here is to get the data out of biblatex's internal format and escape it correctly into a URL string. That's particularly true for the author part, which is tricky as there are various braces to strip. I've decided to tackle this using the experimental LaTeX3 l3str module (edit: in January 2013 the encoding functions moved to l3str-convert), which includes code for the URL encoding, along with the general LaTeX3 programming support system to do all of the construction. (You have to do the encoding in bits so that + is left unencoded between the fields you are passing.)

\begin{filecontents}{\jobname.bib}
@article{test,
  author = {Doe, J. and Other, Arthur N.},
  title  = {Rocket Science},
  year   = {1999},
}
\end{filecontents}
\documentclass{article}
\usepackage{expl3,l3str-convert}
\usepackage[backend=bibtex]{biblatex}
\bibliography{\jobname}
\ExplSyntaxOn
\str_new:N \__searchurl_search_str
\str_new:N \__searchurl_tmp_str
\tl_new:N \__searchurl_tmp_tl
\cs_new_protected_nopar:Npn \createsearchurl
  {
    \str_set:Nn \__searchurl_search_str
      { http://scholar.google.com/scholar?q= }
    \savefield* { year } { \__searchurl_tmp_tl }
    \cs_if_exist:NT \__searchurl_tmp_tl
      { \str_put_right:NV \__searchurl_search_str \__searchurl_tmp_tl }
    \clist_map_function:nN { title , journal } \__searchurl_add_field:n
    \savename* { author } { \__searchurl_tmp_tl }
    \cs_if_exist:NT \__searchurl_tmp_tl
      { \__searchurl_convert_authors: }
    \restorefield { url } { \__searchurl_search_str }
  }
\cs_new_protected:Npn \__searchurl_add_field:n #1
  {
    \savefield* {#1} { \__searchurl_tmp_tl }
    \cs_if_exist:NT \__searchurl_tmp_tl
      {
        \str_set_convert:NVnn \__searchurl_tmp_str
          \__searchurl_tmp_tl { } { latin1 / url }  
        \str_put_left:Nn \__searchurl_tmp_str { + }
        \str_put_right:NV \__searchurl_search_str \__searchurl_tmp_str   
      }
  }
\cs_new_protected_nopar:Npn \__searchurl_convert_authors:
  {
    \exp_after:wN \__searchurl_convert_authors:nn
      \__searchurl_tmp_tl
  }
\cs_new_protected_nopar:Npn \__searchurl_convert_authors:nn #1#2
  {
    \tl_map_inline:nn {#2} 
      { \__searchurl_convert_authors:nnnnnnnnn ##1 }
  }
\group_begin:
  \char_set_catcode_active:N \~
  \char_set_catcode_space:N \ %
  \cs_new_protected_nopar:Npn\__searchurl_convert_authors:nnnnnnnnn%
    #1#2#3#4#5#6#7#8#9%
    {%
      \tl_set:Nn\__searchurl_tmp_tl{#2}%
      \tl_replace_all:Nnn\__searchurl_tmp_tl{~}{ }% 
      \str_set_convert:NVnn\__searchurl_tmp_str
        \__searchurl_tmp_tl{}{latin1/url}%
      \str_put_left:Nn\__searchurl_tmp_str{+}%
      \str_put_right:NV\__searchurl_search_str\__searchurl_tmp_str
    }%
\group_end:
\cs_generate_variant:Nn \str_set_convert:Nnnn { NV }
\cs_generate_variant:Nn \str_put_right:Nn { NV }
\ExplSyntaxOff

\newbibmacro*{url+urldate}{%
  \iffieldundef{url}
    {\createsearchurl}
    {}%
  \printfield{url}%
  \iffieldundef{urlyear}
    {}
    {\setunit*{\addspace}%
     \printurldate}}

\begin{document}
\cite{test}
\printbibliography
\end{document}

I've built the search to use author surnames only, with any non-breaking spaces converted to normal ones before encoding.

As PLK notes, this could probably be done rather more easily using biber at an earlier stage!

Joseph Wright
  • Will this strip out control sequences like the ones you find in names with accented characters? – Audrey Aug 30 '12 at 18:54
  • @Audrey I've told the l3str module to use latin1 encoding for the output, so it will escape accented characters, but not control sequence versions (e.g. é is fine, \'{e} will give odd results). – Joseph Wright Aug 30 '12 at 18:57
  • OK. I suspect a comparable solution could be obtained without l3 using \thefield and friends. We can access unformatted name list elements with an indexing directive. For example: \DeclareIndexNameFormat{getname}{\gdef\thelastname{#1}} and \indexnames[getname][-1]{author}. – Audrey Aug 30 '12 at 19:43
  • @Audrey Yes, quite true. I mainly went for expl3 here as I know that there is a URL-encoding function available. I suspect Heiko Oberdiek has one too, for hyperref, etc., but don't know the interface and wanted to get some ideas down. It's not clear from the question exactly how to handle names, so I went for a simple loop to be going on with. – Joseph Wright Aug 30 '12 at 19:45
  • Ah, OK. Is there anything wrong with just using the percent encoding: \def\scholarurl{http://scholar.google.com/scholar?q=\%22\strfield{title}\%22} and \restorefield{url}{\scholarurl}? – Audrey Aug 30 '12 at 19:57
  • @Audrey My understanding is that valid URLs should not contain spaces at all (they should always be encoded). Of course, it may well work (certainly does here with a few quick tests), but I was aiming for reasonably 'robust'. Why not post a solution working this way and the OP can then try both out :-) – Joseph Wright Aug 30 '12 at 20:02
  • Spaces are replaced by +. – Christian Aug 31 '12 at 06:24
  • Wow, this code looks like it took a while. Thank you :) Unfortunately I get a 'LaTeX error: "str/unknown-enc" Encoding scheme 'latin1' (filtered: 'latin1') unknown.' Am I doing something wrong? – Christian Aug 31 '12 at 06:26
  • @Christian Probably a version issue with l3str. However, the update avoids all of that, so is probably a lot less risky! – Joseph Wright Aug 31 '12 at 06:55
  • Sorry for the haiku; I added a missing ampersand. The new code works great already and I'm totally amazed. It doesn't handle the two authors well though. The second should have an author: in front of it, too, yielding the string author%3Adoe+author%3Aother. I'll try myself if I can get the url to '\href' the title instead of creating a new url field. This will enhance my understanding of the code I guess. – Christian Aug 31 '12 at 09:44
  • I forgot to explain the haiku thing: http://meta.tex.stackexchange.com/questions/2730/edits-must-be-at-least-6-characters-wtf – Christian Aug 31 '12 at 09:45
  • Wow, this is weirder and harder to tinker with than I thought. Maybe I can do some more tinkering this afternoon. But feel free to just implement it if for you it's a piece of cake ;) – Christian Aug 31 '12 at 09:57
  • @Christian I've adjusted the code to handle each author separately, and to escape the : as requested. – Joseph Wright Aug 31 '12 at 14:54
  • Sorry, totally forgot about the bounty once I activated it. There should be a reminder when you can actually use it ... I don't understand the time limit anyway for bounties that are marked as being for existing answers. Anyway, here you go :) – Christian Sep 06 '12 at 21:12