
It is common for French URLs to contain accented characters (because French words have accents). Probably the best-known site that takes advantage of the possibility of not restricting URLs to ASCII characters is Wikipedia, but French online dictionaries use it too (as can be seen in the examples given in the question "url hyperref does not work with French accent characters").

Here is an MWE with the examples used in the question linked above (they worked in 2013, but no longer do).

\documentclass{article}
\usepackage{xcolor}
\usepackage[colorlinks,allcolors=blue]{hyperref}

\begin{document}
URL 1: \href{http://www.larousse.fr/dictionnaires/francais-anglais/%C3%A9cr%C3%A9mer/27576?q=%C3%A9cr%C3%A9m%C3%A9}{%
  \nolinkurl{http://www.larousse.fr/dictionnaires/francais-anglais/écrémer/27576?q=écrémé}}

URL 2: \href{https://fi.wikipedia.org/wiki/Andr%C3%A9_Weil}
  {\nolinkurl{https://fi.wikipedia.org/wiki/André_Weil}}
\end{document}

We obtain:

[screenshot of the output]

All the "é" characters have disappeared from the displayed URLs.

Expected:

http://www.larousse.fr/dictionnaires/francais-anglais/écrémer/27576?q=écrémé

and

https://fi.wikipedia.org/wiki/André_Weil.

Moreover, if I add \usepackage[T1]{fontenc} to the preamble, as here:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xcolor}
\usepackage[colorlinks,allcolors=blue]{hyperref}

\begin{document}
URL 1: \href{http://www.larousse.fr/dictionnaires/francais-anglais/%C3%A9cr%C3%A9mer/27576?q=%C3%A9cr%C3%A9m%C3%A9}{%
  \nolinkurl{http://www.larousse.fr/dictionnaires/francais-anglais/écrémer/27576?q=écrémé}}

URL 2: \href{https://fi.wikipedia.org/wiki/Andr%C3%A9_Weil}
  {\nolinkurl{https://fi.wikipedia.org/wiki/André_Weil}}
\end{document}

I obtain this weird output:

[screenshot of the output]

How can I obtain:

http://www.larousse.fr/dictionnaires/francais-anglais/écrémer/27576?q=écrémé

and

https://fi.wikipedia.org/wiki/André_Weil

also with \usepackage[T1]{fontenc} in the preamble?

(An old document compiled in summer 2021 does not show this broken output.)

Update:

Ulrike Fischer says in her answer that "url's with non-ascii-chars never worked properly with pdflatex".

But here, on another computer that has not been updated, with TeX Live 2021 (in fact, TeX Live 2022/dev), it works:

[screenshot of the output]

When I click on the links, the browser opens the right pages.

Proof that the other computer runs TeX Live 2021 (from the TeX Live Utility app): [screenshot]

quark67
  • It is some bug-feature of 2022, because with TeX Live 2021 the result is also correct. – Fran Jan 06 '23 at 09:09
  • @PaulGaborit if you don't use \nolinkurl in the second argument you get a better output, but you lose the option to (line)break the URL in various places. – Ulrike Fischer Jan 06 '23 at 11:47
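
The suggestion in the last comment could look roughly like the sketch below. It is only an illustration under some assumptions: the file is UTF-8 encoded and compiled with pdflatex, and \texttt is used here to imitate the look of \nolinkurl (that choice is not from the comment). The link target stays percent-encoded ASCII, while the link text is ordinary text, so the "é" goes through the normal text machinery instead of the url package. As the comment warns, the displayed URL can no longer be broken across lines.

```latex
% utf8 encoded file, intended for pdflatex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xcolor}
\usepackage[colorlinks,allcolors=blue]{hyperref}

\begin{document}
% First argument: percent-encoded target, pure ASCII.
% Second argument: plain text instead of \nolinkurl, so "é" is typeset
% as a normal text character; "_" must be written \_ in text mode.
URL 2: \href{https://fi.wikipedia.org/wiki/Andr%C3%A9_Weil}
  {\texttt{https://fi.wikipedia.org/wiki/André\_Weil}}
\end{document}
```

Only the second example URL is shown; the long larousse.fr URL would be written the same way, but without break points it is likely to overflow the line.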

1 Answer


url's with non-ascii-chars never worked properly with pdflatex. The underlying url package was written at a time when non-ASCII chars in URLs and file names were something that "you shouldn't do", and when using only the url package you always got your current output. hyperref did quite a good job of getting some chars, like your é, to work better, but the German ß, for example, failed too. In a current LaTeX, non-ASCII chars are protected, so with hyperref you now also get what you get with the original url command. One can improve the url command a bit, but again this doesn't work for the German ß.

% utf8 encoded file!
\documentclass{article}
\usepackage{iftex}
\ifluatex \else 
\usepackage[T1]{fontenc}
\fi
\usepackage{hyperref}

\begin{document}

\makeatletter
Original url from url package: \HyOrg@url{grüße} \HyOrg@url{André_Weil}
\makeatother

Hyperref url: \url{grüße} \url{André_Weil}

\makeatletter
\def\Url@FormatString{%
  \UrlFont
  \Url@MathSetup
  \mathcode"C3="8000 %more needed for other chars ...
  $\fam\z@ \textfont\z@\font
  \expandafter\UrlLeft\Url@String\UrlRight
  \m@th$%
}%
\makeatother

Improved url: \url{grüße} \url{André_Weil}

\end{document}

Output in texlive 2021 with pdflatex:

[screenshot]

Output in texlive 2022 with pdflatex:

[screenshot]

Output with lualatex in texlive 2022:

[screenshot]
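
A note on the \mathcode"C3="8000 line, as a hedged illustration rather than part of the original answer: the url package typesets its argument in math mode (the $...$ in \Url@FormatString), and a mathcode of "8000 makes a character behave in math mode as if it were active. Byte "C3 is the first byte of the UTF-8 encoding of é (the C3 A9 visible in %C3%A9), and in pdflatex that byte already has an active definition that decodes the UTF-8 sequence. The sketch below shows the same mechanism on an unrelated character; the choice of "!" and its definition as \neq are purely hypothetical, picked only to make the effect visible.

```latex
\documentclass{article}
\begin{document}
% Give "!" an active definition without making it active in normal text:
% the catcode change is local to the group, \gdef makes the definition global.
\begingroup
  \catcode`\!=\active
  \gdef!{\mathrel{\neq}}
\endgroup
% Mathcode "8000: in math mode, "!" now behaves as if it were active.
\mathcode`\!="8000

Text mode is unaffected: wow!

Math mode runs the active definition: $a ! b$
\end{document}
```

In the answer's code no new definition is needed, because the active definition of byte "C3 is already provided by LaTeX's UTF-8 support; only the mathcode has to be set, and, as the code comment says, other lead bytes would need the same treatment.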

Ulrike Fischer
  • Please see my edit. It really worked on older versions of TeX Live (version 2021, for example). – quark67 Jan 06 '23 at 10:38
  • As I wrote (and as my screenshots show): it worked for some chars like your é, it didn't work for my ß (or for Cyrillic or for Greek ...). – Ulrike Fischer Jan 06 '23 at 10:43
  • No, I'm afraid that it doesn't work with "é" in my TeX Live 2022, as you can see in the first screenshot in my question (but I would be happy if it worked). I'm staying with pdflatex. In the 2021 version, it worked. How can we fix this regression? – quark67 Jan 06 '23 at 10:49
  • Sorry, but did you really read my answer? I explicitly wrote "it worked" (past tense). My screenshots show the changes between texlive 2021 and 2022, and my code also shows how you can improve that. – Ulrike Fischer Jan 06 '23 at 10:57
  • Sorry for the mistake, I'm not a native English speaker. But why this regression? And also, what does "C3="8000 do in this code? Thanks for your patience. – quark67 Jan 06 '23 at 11:09
  • Non-ASCII chars are now much safer: you can use them in labels and file names and many other places without problems, but the price is that hyperref cannot expand them anymore. Use lualatex if you don't want to struggle with this. – Ulrike Fischer Jan 06 '23 at 11:18
  • I accept this answer because you have taken the time to help me. I have understood some things. I will probably write an answer with all I have learned in the next few days. – quark67 Jan 09 '23 at 14:50