I am writing a thesis and I have made extensive use of hyperlinking through \url, or \url within a \footnote. Now I would like to add a list (kind of like a simplified bibliography where each link appears only once) of all hyperlinks within the document. Is there a way to have that automatically generated for me?
2 Answers
The following example uses hyperref (the question mentions "hyperlinking") and hooks into \hyper@linkurl to get the URLs. The caught URLs are written into an index file \jobname-url.idx:

\urlentry{<hex coded URL>}{<page number>}

The URLs are hex encoded to avoid trouble with special characters.

Package filecontents helps to create a style file \jobname-url.mst for makeindex. Makeindex automatically looks for a file with the same name as the input file, but with the extension .mst, as its style file. Then only the .idx file needs to be given as the argument for makeindex. Makeindex generates the file \jobname-url.ind:

\begin{theurls}
\urlitem{<hex coded URL>}{<page list>}
...
\end{theurls}

Environment theurls and macro \urlitem are defined appropriately to print the list of URLs. \listurlname contains the title of the section.
Remarks:
- Makeindex takes care of the sorting and removes duplicates.
- Hooking into \hyper@linkurl has the advantage that the URL is normalized (e.g., % and \% end up the same: a % with catcode 12/other).
- Hex encoding has the advantage that special characters such as percent, hash, or characters with special meaning for makeindex (at sign, ...) do not need special treatment.
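The hex encoding can be tried in isolation. The following minimal sketch (my own illustration of pdfescape usage, not part of the example below; the macro names \myhex and \myurl are made up) shows \EdefEscapeHex turning a URL into plain hex digits, which are safe to write to the .idx file, and \EdefUnescapeHex restoring it:

```latex
\documentclass{article}
\usepackage{pdfescape}
\usepackage{url}
\begin{document}
% Escape: afterwards \myhex contains only the characters 0-9 and A-F,
% so makeindex never sees %, #, @, or other special characters.
\EdefEscapeHex\myhex{http://www.ctan.org/}
\texttt{\myhex}

% Unescape: \myurl holds the original URL again; pass it to \url
% via \expandafter, as in the \urlitem definition below.
\EdefUnescapeHex\myurl{\myhex}
\expandafter\url\expandafter{\myurl}
\end{document}
```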
Example file:
\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\\end{theurls}\n"
group_skip ""
headings_flag 0
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}
\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}
\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax
\newcommand*{\write@url}[1]{%
\begingroup
\EdefEscapeHex\@tmp{#1}%
\protected@write\file@url{}{%
\protect\urlentry{\@tmp}{\thepage}%
}%
\endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
\write@url{#2}%
\saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
\InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
\section*{\listurlname}%
\@mkboth{\listurlname}{\listurlname}%
\let\write@url\@gobble
\ttfamily
\raggedright
}{%
\par
}
\newcommand*{\urlitem}[2]{%
\hangindent=1em
\hangafter=1
\begingroup
\EdefUnescapeHex\@tmp{#1}%
\expandafter\url\expandafter{\@tmp}%
\endgroup
\par
}
\makeatother
\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}
\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.
Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:me@example.org/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html}
\printurls
\end{document}
The following commands generate the result (linux/bash):
$ pdflatex test
Generates test-url.mst and test-url.idx.
$ makeindex test-url
Generates test-url.ind.
$ pdflatex test
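If the build is driven by latexmk (as a comment below suggests, e.g. on Overleaf), a custom dependency can run makeindex automatically. This is an untested sketch of a .latexmkrc fragment, under the assumption that latexmk notices the test-url.idx file (for instance via the -recorder option or the \wlog trick from the comments):

```perl
# .latexmkrc sketch: when a .ind file is out of date, regenerate it
# from the matching .idx with makeindex; the .mst style file is
# picked up automatically because it shares the base name.
add_cus_dep('idx', 'ind', 0, 'run_makeindex');
sub run_makeindex {
    return system("makeindex \"$_[0].idx\"");
}
```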
Update for page numbers
There are many ways to format the page numbers. The following example uses dots to separate the URL from the page numbers, which appear at the end of the line (similar to the index of package doc). As requested in the comments, the page numbers are prefixed with p. if only one page number follows, and pp. otherwise. This is implemented with the help of package xstring by testing whether the page number list contains a comma separator or a hyphen from a range specifier.
\RequirePackage{filecontents}
\begin{filecontents*}{\jobname-url.mst}
% Input style specifiers
keyword "\\urlentry"
% Output style specifiers
preamble "\\begin{theurls}"
postamble "\n\\end{theurls}\n"
group_skip ""
headings_flag 0
item_0 "\n\\urlitem{"
delim_0 "}{"
delim_t "}"
line_max 500
\end{filecontents*}
\documentclass{article}
\usepackage[colorlinks]{hyperref}
\usepackage{pdfescape}
\usepackage{xstring}
\makeatletter
\newwrite\file@url
\openout\file@url=\jobname-url.idx\relax
\newcommand*{\write@url}[1]{%
\begingroup
\EdefEscapeHex\@tmp{#1}%
\protected@write\file@url{}{%
\protect\urlentry{\@tmp}{\thepage}%
}%
\endgroup
}
\let\saved@hyper@linkurl\hyper@linkurl
\renewcommand*{\hyper@linkurl}[2]{%
\write@url{#2}%
\saved@hyper@linkurl{#1}{#2}%
}
\newcommand*{\listurlname}{List of URLs}
\newcommand*{\printurls}{%
\InputIfFileExists{\jobname-url.ind}{}{}%
}
\newenvironment{theurls}{%
\section*{\listurlname}%
\@mkboth{\listurlname}{\listurlname}%
\let\write@url\@gobble
\ttfamily
\raggedright
\setlength{\parfillskip}{0pt}%
}{%
\par
}
\newcommand*{\urlitem}[2]{%
\hangindent=1em
\hangafter=1
\begingroup
\EdefUnescapeHex\@tmp{#1}%
\expandafter\url\expandafter{\@tmp}%
\endgroup
\urlindex@pfill
\IfSubStr{#2}{,}{pp}{%
\IfSubStr{#2}{-}{pp}{p}%
}.\@\space\ignorespaces
#2%
\par
}
\newcommand*{\urlindex@pfill}{% from \pfill of package `doc'
\unskip~\urlindex@dotfill
\penalty500\strut\nobreak
\urlindex@dotfil~\ignorespaces
}
\newcommand*{\urlindex@dotfill}{% from \dotfill of package `doc'
\leaders\hbox to.6em{\hss .\hss}\hskip\z@ plus 1fill\relax
}
\newcommand*{\urlindex@dotfil}{% from \dotfil of package `doc'
\leaders\hbox to.6em{\hss .\hss}\hfil
}
\makeatother
\usepackage[T1]{fontenc}
\usepackage[variablett]{lmodern}
\begin{document}
This file answers the
\href{http://tex.stackexchange.com/q/121977/16967}{question}
on \href{http://tex.stackexchange.com/}{\TeX.SE}.
Further examples for URLs:
\url{http://www.dante.de/}\\
\url{http://www.ctan.org/}\\
\url{mailto:me@example.org/}\\
\url{ftp://ftp.dante.de/pub/tex/}\\
\url{http://www.example.com/\%7efoo/index.html}\\
\url{http://www.example.com/%7efoo/index.html}
% further pages to generate more page numbers for testing the url index
\newpage
\url{http://www.ctan.org}
\newpage
\url{http://www.ctan.org}
\url{http://tex.stackexchange.com/}
\newpage
\printurls
\end{document}
- I guess adding a \space (p. #2) to \urlitem is a minimal idea. I'd like to have "pp." when there are multiple pages though... seems much more complicated. Hm... not so bad after all: http://tex.stackexchange.com/questions/26870/check-if-a-string-contains-a-given-character – Joe Corneli Jul 03 '13 at 14:34
- @JoeCorneli: I have updated the answer and added an example with page numbers that are preceded by p. or pp. – Heiko Oberdiek Jul 03 '13 at 20:18
- Heiko: Would you be willing to comment on a related but more complex example? http://tex.stackexchange.com/questions/146954/makeindex-print-links-sorted-by-page – Joe Corneli Nov 26 '13 at 14:34
- @wilx In theory yes, it could become a new package, but currently I do not even have enough time for important updates of existing packages. At least I have put it on the ToDo list. – Heiko Oberdiek Dec 29 '15 at 05:18
- For those who want to have the index made and added automatically via latexmk (e.g., for sites like Overleaf), add the following to your code after the \InputIfFileExists line: \wlog{} \wlog{Writing index file \jobname-url.idx} – Chris Gregg Dec 10 '17 at 19:11
- @HeikoOberdiek I have to agree with @wilx. Having this as a full-blown package would be amazing, especially if there's stuff that autodetects the right type of heading (like \chapter for memoir instead of \section for most classes). So is there any update related to that? – BrainStone Jul 12 '18 at 14:15
- This is super cool! For me it is picking up bibliography URLs, which is undesired; is it possible to limit the portion of the document it lists? – Andrew Hundt Oct 24 '21 at 14:33
Attention: the following code works only for simple URLs, that is, URLs that do not contain special characters such as %. For a complete solution, please refer to Heiko's answer.
As Nicola mentioned in the comments, redefining \url might be an interesting idea, but some characters in the URL might cause problems. Sadly my TeX-fu isn't good enough to overcome this issue, but here's a preliminary start:
\documentclass{article}
\usepackage{url}
\usepackage{imakeidx}
\let\originalurl\url
\makeindex[name=urls, title={Links found in this document}, columns=1]
\renewcommand{\url}[1]{\originalurl{#1}\index[urls]{\protect\originalurl{#1}}}
\begin{document}
Hello, make sure to visit \url{http://www.google.com} and,
of course, our own place \url{http://tex.stackexchange.com}.
By the way, \url{http://tex.stackexchange.com} is awesome!
\printindex[urls]
\end{document}
The list is then generated:

Hope it helps. :)
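One partial workaround for the % limitation discussed in the comments (my own untested sketch, not part of the answer; the macro name \myfoourl is made up) is to freeze each problematic URL with \urldef, so that url.sty's catcodes are fixed before \index ever sees the argument:

```latex
\documentclass{article}
\usepackage{url}
\usepackage{imakeidx}
\makeindex[name=urls, title={Links found in this document}, columns=1]
% \urldef stores the URL with url.sty's catcodes already applied,
% so the literal % survives; the index entry is the macro call itself.
\urldef\myfoourl\url{http://www.example.com/%7efoo/index.html}
\begin{document}
Visit \myfoourl\index[urls]{\myfoourl}.
\printindex[urls]
\end{document}
```

Note that makeindex then sorts by the macro name rather than by the URL text, so this is a per-URL patch rather than a general solution.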
- This will have problems with special characters in the URL, particularly %. But +1 anyway. – egreg Jul 01 '13 at 17:46
- @egreg: I know, but I have no idea on how to solve it. :) Is "Make sure to use URL shortening services for all URLs." a good excuse? :) – Paulo Cereda Jul 01 '13 at 18:00
- Dear downvoter, I really would like your help to improve my answer. Could you write down a few suggestions on what I should do in order to make my answer more adherent to a valid solution? Thanks. :) – Paulo Cereda Jul 02 '13 at 12:26
- +1 for the idea and the elegant, clear, simple approach (with the mentioned restriction). – Heiko Oberdiek Jul 03 '13 at 19:27


- …\urls in your document? – Werner Jul 01 '13 at 16:53
- Perhaps redefine \url to include a \index entry? That way, we can have a list of URLs followed by their occurrences. :) – Paulo Cereda Jul 01 '13 at 17:01
- \newcommand{\url}{\url \index}? – typR Jul 01 '13 at 17:09
- \let\oldurl\url \renewcommand{\url}[1]{\oldurl{#1}\index{#1}} but if any of your urls contain characters such as ~ or @ extra coding is required. – Nicola Talbot Jul 01 '13 at 17:23