Smart URL Shortening

Question

I am writing text that primarily describes a lot of URLs. The problem is that many of these URLs can be very long.

Fortunately, more than 95% of my readers will use an on-screen PDF reader, not a printout. So I want a solution that makes it clear to print readers that my URL is incomplete (on paper they cannot click it anyway), and that lets on-screen readers simply click to go to the deep page.

The best simple solution I can think of is that \http{www.example.com/morelevel/andmorelevel/andyetmorelevel/andmore/andmore/andmore} should produce something like \href{www.example.com/morelevel/andmorelevel/andyetmorelevel/andmore/andmore/andmore}{\uline{www.example.com/+}}. If there is no deeper path, the '+' would be omitted. (Would one use xstring.sty to implement this?)
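A minimal pdfLaTeX sketch of this simple variant, using the xstring package, might look as follows. The macro name \http is taken from the question; the helper macros \httpTmp, \httpHost, \httpPath and \httpShown are hypothetical names chosen for this illustration. The sketch assumes the URL contains no TeX-special characters (such as %, #, _, & or ~); those would need \urldef-style verbatim handling, as in the LuaLaTeX answer. It also does not build any endnote list.

```latex
% !TEX TS-program = pdflatex
\documentclass{article}
\usepackage{xstring}         % \IfBeginWith, \IfSubStr, \IfStrEq, \StrBehind, \StrBefore
\usepackage[normalem]{ulem}  % \uline (normalem keeps \emph as italics)
\usepackage{hyperref}
\hypersetup{colorlinks,urlcolor=blue}

% \http{<url>}: link to the full URL, but display only the host part,
% followed by "/+" when the URL has a non-empty path.
% Sketch only: assumes the URL contains no TeX-special characters
% such as %, #, _, & or ~.
\newcommand\http[1]{%
  % 1. Strip a leading "https://" or "http://" (if any) into \httpTmp
  \IfBeginWith{#1}{https://}%
    {\StrBehind{#1}{https://}[\httpTmp]}%
    {\IfBeginWith{#1}{http://}%
       {\StrBehind{#1}{http://}[\httpTmp]}%
       {\def\httpTmp{#1}}}%
  % 2. Split at the first "/" into host (\httpHost) and path (\httpPath)
  \IfSubStr{\httpTmp}{/}%
    {\StrBefore{\httpTmp}{/}[\httpHost]%
     \StrBehind{\httpTmp}{/}[\httpPath]}%
    {\let\httpHost\httpTmp
     \def\httpPath{}}%
  % 3. Build the displayed text: append "/+" only if there is a deeper path
  \IfStrEq{\httpPath}{}%
    {\let\httpShown\httpHost}%
    {\edef\httpShown{\httpHost/+}}%
  % 4. The link target is always the complete URL
  \href{#1}{\uline{\texttt{\httpShown}}}%
}

\begin{document}
\http{https://www.example.com/morelevel/andmorelevel/andyetmorelevel}

\http{https://www.example.com/}
\end{document}
```

The first call should display the host followed by "/+", the second only the host. Computing the displayed text into a macro before calling \uline keeps the xstring tests out of ulem's argument, which ulem processes token by token.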

A more sophisticated solution would endnote each unique URL (i.e., mark it with an increasing superscript number), as in \href{www.example.com/morelevel/andmorelevel/andyetmorelevel/andmore/andmore/andmore}{\uline{www.example.com}$^\arabic{urlcounter}$} or something like it, and would produce an index-like list, sorted alphabetically or by page, at the end of the document for print readers.

I may or may not also want line breaks inside URLs and/or display of the http:// prefix.

These are not BibTeX questions but main-text questions. I am also probably not the only one who has encountered this need.

May I ask how others have solved this?

Mico · Answer 1 · 2021-04-25T14:00:40.173

Here's a LuaLaTeX-based solution to the questions you ask in both the "The best simple solution..." paragraph and the "A more sophisticated solution would" paragraph. The main user macro is called \hx; it invokes a Lua function called short_href that does most of the work. (I deliberately didn't include the \uline part of your request. If you want the URL strings to be typeset in the main document font rather than in a monospaced font, remove both \ttfamily directives and add the instruction \urlstyle{same} in the preamble.)

The truncated URLs will look like this:

and the endnotes page with the list of non-truncated URLs will look like this:

The red numerals at the start of each line are back-references to the corresponding \hx directives.

% !TEX TS-program = lualatex
\documentclass{article}
\usepackage{xurl} % or 'url', if arbitrary line breaks in URL strings are not wanted
\usepackage{enotez}
%% Set suitable package options:
\setenotez{list-name = {List of complete URL strings}, backref = true}
\usepackage{hyperref}
\hypersetup{colorlinks,urlcolor=blue} % select suitable options
\newcounter{mycounter}
\renewcommand{\themycounter}{\romannumeral\value{mycounter}}
\usepackage{luacode} % for 'luacode' environment and '\luastringN' macro
\begin{luacode*}

local t,u
function short_href ( s )
   t = s 
   if t:find ( "^http[s]?://" ) then -- Remove prefix substring from 't'
      u = t:sub ( 1 , t:find ( "/" ) ) 
      t = t:sub ( string.len ( u ) + 2 ) 
   end 
   u = t:sub ( 1 , t:find ( "/" ) ) -- Retrieve main part of URL
   if string.len ( u ) == string.len (t) then
      -- No truncation necessary; hence, generate a simple \href directive
      tex.sprint ( "\\href{" .. s .. "}{\\ttfamily " .. u .. "}" )
   else -- Truncation of displayed URL is necessary
      tex.sprint ( "\\stepcounter{mycounter}" ) 
      -- See https://tex.stackexchange.com/a/594361/5001 for further details.
      -- A. Generate an \urldef command (for use with \endnote, see next command):
      tex.sprint ( "\\expandafter\\urldef\\csname zz\\themycounter\\endcsname\\url{".. s .."}" )
      -- B. Generate \href _and_ \endnote commands:
      tex.sprint ( "\\href{" .. s .. "}{\\ttfamily " .. u .. "+}" .. 
                   "\\expandafter\\endnote\\expandafter{\\csname zz\\themycounter\\endcsname}" )
   end
end

\end{luacode*}

%% LaTeX utility macro:
\newcommand\hx[1]{\directlua{short_href(\luastringN{#1})}}


\begin{document}
\hx{https://tex.stackexchange.com/questions/594302/smart-url-shortening}

\hx{https://www.google.com/}

\hx{https://www.sciencedirect.com/science/article/abs/pii/S0304405X00000763}

\newpage
\printendnotes
\end{document}

LuaLaTeX is nice and well-suited here, but it can slow down compilation of some documents quite significantly. https://tex.stackexchange.com/questions/261472/lualatex-vs-pdflatex-speed-in-texlive-2019/261475#261475 — ivo Welch, Apr 27 '21 at 04:40
@ivoWelch - For sure, if a document compiles fine under pdfLaTeX, there's little to no need to switch to LuaLaTeX. What's crucial here is the if-clause. I've been using LaTeX for about 30 years now to write my own papers, and I daresay that I'm quite good at it. However, I've never truly gotten the hang of TeX's macro expansion stuff. When LuaTeX arrived on the scene, I happily started using Lua's string functions to handle string manipulation work. Compilation under LuaLaTeX is a tad slow, but that's far outweighed by my ability to actually get real stuff done in the first place. — Mico, Apr 27 '21 at 09:19
LuaTeX also includes the LuaSocket library which provides url.parse. That'll be more robust than str:find and also correctly unescapes things like %20. — Henri Menke, Apr 30 '21 at 10:48
@HenriMenke - If you have something constructive and/or useful to say, post a separate answer. If you don't, stop posting comments. — Mico, May 01 '21 at 08:15
