
Here is the input:

\documentclass[12pt]{article}

\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc} % my emacs saves this file with latin1 encoding
\usepackage{url}

\begin{document}

\urldef{\urlex}\url{https://e.g/page#§}

\noindent The § sign appears as \u{g} in the URL:
\begin{itemize}
\item The URL: \urlex
\item A § sign in tt font: \texttt{§}
\end{itemize}

\end{document}

And here is the rather surprising result from a pdflatex run:

[Image: screenshot of the pdflatex output]

Note that Emacs properly saves the file latin1 encoded. It does not seem to be a problem with inputenc as the paragraph sign is properly set in the cm font and also outside of the URL in tt font. Does url fiddle around with the input encoding?

Any idea?

ThomasS
  • Welcome to TeX SX! Do you know it's been almost three years that LaTeX expects utf8 input encoding by default? – Bernard Dec 19 '21 at 14:51
  • Workaround: use percent encoding for the URL https://tex.stackexchange.com/questions/87205/special-character-in-url-link – user202729 Dec 19 '21 at 15:00
  • Thanks!
    (1) utf8 does not solve the problem but introduces new issues; (2) I don't want hyperref because it also produces links from other references.
    – ThomasS Dec 19 '21 at 15:10
  • Looks like a duplicate of https://tex.stackexchange.com/questions/406762/pdflatex-breakurl-and-unicode-characters. I haven't tried it, but you could try it out and report whether it solves your problem. – user202729 Dec 19 '21 at 15:11
  • @Bernard: in principle, I agree. But with UTF-8 encoding, I also break my German Umlaute in \url{}. That's why I stick with ISO-Latin-1. – ThomasS Dec 20 '21 at 05:34

1 Answer


The url package was written at a time when URLs contained only ASCII characters. It doesn't handle input outside the ASCII range; such input is simply passed through unchanged. § has (in latin1) the code "A7, and at that position the T1 encoding has ğ.
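A minimal sketch to check the two slots directly, assuming the default T1-encoded fonts that come with \usepackage[T1]{fontenc}. It uses \char to address font positions by number, so it does not depend on the input encoding of the file:

```latex
\documentclass[12pt]{article}
\usepackage[T1]{fontenc}

\begin{document}

% \char"NN typesets the glyph at hexadecimal slot NN of the current font.
\noindent Slot "A7 in T1: \texttt{\char"A7} % g with breve, what the URL shows

\noindent Slot "9F in T1: \texttt{\char"9F} % section sign

\end{document}
```

Since url passes the raw byte through, whatever byte sits in the .tex file selects the glyph at the same slot of the T1 font.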

You get § if you use as input Ÿ (as long as you use latin1 as input encoding, with utf8 it won't work):

\documentclass[12pt]{article}

\usepackage[T1]{fontenc}
\usepackage[latin1]{inputenc} % my emacs saves this file with latin1 encoding
\usepackage{url} % needed for \urldef and \url

\begin{document}

\urldef{\urlex}\url{https://e.g/page#Ÿ}

\noindent The § sign appears as \u{g} in the URL:
\begin{itemize}
\item The URL: \urlex
\item A § sign in tt font: \texttt{§}
\end{itemize}

\end{document}

Ulrike Fischer
  • Shouldn't proper urlencoding work, though? Meaning, use %C2%A7 in place of the section sign. – Ingmar Dec 19 '21 at 15:48
  • @Ingmar well the OP doesn't want a link, but to print, so how to get the output is the main question. – Ulrike Fischer Dec 19 '21 at 15:56
  • +1: Would \usepackage{xurl} not also work? – Dr. Manuel Kuehner Dec 19 '21 at 16:31
  • @DrManuelKuehner look at the xurl source (on CTAN, for example); it only adds extra line break points – daleif Dec 19 '21 at 19:05
  • @UlrikeFischer: thanks. That's exactly the pragmatic solution that I need. The Cork encoding has the paragraph sign at "9F, and Emacs with ISO-Latin-1 encoding leaves this character alone; it doesn't re-encode it. So it appears as "9F in the .tex file and LaTeX converts it into the paragraph sign :-). – ThomasS Dec 20 '21 at 05:37