
Biber converts ASCII-TeX accents into their respective Unicode characters. This works very well for most letters and accents, but it is a problem for accents on the "dotless i".

The problem lies in the fact that popular fonts like Times New Roman have coverage for the Unicode equivalents of

\u{i}
\"{i}

but do not have coverage for the Unicode equivalents of the "dotless" forms

\u{\i}
\"{\i}

even though under visual inspection they look exactly the same, as can be seen in this MWE:

\documentclass{report}

\RequirePackage{fontspec}
\setmainfont{Times New Roman}

\begin{document}

Accent on dotless with ASCII TeX: Ole\u\i nik,  Ha\"\i ssinsky

Accent on dotted i: Oleĭnik, Haïssinsky

Accent on dotless i: Oleı̆nik, Haı̈ssinsky

\end{document}

Some of this is covered in egreg's answer here. The same problem happens with several other accents on the dotless i but, surprisingly, not with the acute accent (\'{\i}).

I have a few questions on the subject:

How come the ASCII-TeX input works for \u{\i} if the font has no coverage for this character? Is TeX making a substitution and using the dotted character in the final output?

Observe that converting the sources to accents on the dotted i will obviously work for LaTeX and produce the right look in the PDF, but it is the wrong thing to do: the sources may be used by programs other than TeX, or may come from places that are not willing to make an incorrect change.

For an example of how Biber deals with it, use:

\documentclass{report}
\RequirePackage{fontspec}
\usepackage{filecontents}
\setmainfont{Times New Roman}

\usepackage{biblatex}
\addbibresource{\jobname.bib}

\begin{filecontents}{\jobname.bib}
@unpublished{a,
    author = {Ole\u\i nik and  Ha\"\i ssinsky and Sina\^\i},
    title = {These display fine (dotted i):  Ole\u{i}nik and  Ha\"i{}ssinsky and Sina\^i},
}
\end{filecontents}

\begin{document}

\nocite{*}
\printbibliography

\end{document}
Paulo Ney
  • Your descriptions are a bit misleading: "Accent on dotted i" is a single character, U+00EF LATIN SMALL LETTER I WITH DIAERESIS, and "Accent on dotless i" is two characters, U+0131 LATIN SMALL LETTER DOTLESS I and U+0308 COMBINING DIAERESIS. – David Carlisle Apr 22 '18 at 20:17
  • Note that the two Unicode forms are not wrong or right; they are canonically equivalent. The single-character version is in normal form "C" (NFC) and the two-character version is in NFD. Generally speaking, the NFC form is easier to handle, as it does not require correct use of combining characters, but a full Unicode renderer will handle either form. – David Carlisle Apr 22 '18 at 20:24
  • An example with a bib entry for a document using biblatex would make it possible to diagnose the issue, which seems to depend on how Biber deals with \u{\i}. – egreg Apr 22 '18 at 20:54
  • @egreg, I edited to include a full biber example. – Paulo Ney Apr 22 '18 at 21:17
  • I get the expected output with Biber version 2.10 and biblatex version 3.10. – egreg Apr 22 '18 at 21:21
  • I generally find that Biber only works properly if I use Unicode characters in the .bib rather than macros. However, that's only for pdfTeX, which you're not using here, and I'm not sure it would apply to a Unicode engine. – cfr Apr 22 '18 at 21:40
  • There are way too many questions here for one question! – cfr Apr 22 '18 at 21:41
  • @egreg, indeed I am using TL 2017, which contains Biber 2.7 and biblatex 3.7. Will upgrade ... Any hunches on questions 4 and 5? – Paulo Ney Apr 22 '18 at 21:48
  • @PauloNey Biber only deals with particularly formatted files. – egreg Apr 22 '18 at 21:49
  • @egreg Yeahhh! It would not be good to jam an entire TeX file in as the title of a paper, but depending on the routine it could be used independently. I have tried placing some rather complex material in a .bib file and it seems to tackle it very well. – Paulo Ney Apr 22 '18 at 21:55
  • @egreg I just upgraded to biber 2.10 and biblatex 3.10 and I still see the same boxes in the place of the accented characters. I am using Times New Roman from the MS TTF Core Fonts packages. Could that be the reason? – Paulo Ney Apr 23 '18 at 00:16
  • @egreg I also installed "Times New Roman MT Std" from Adobe, and with that one I get 4 characters missing (one more), so I would be deeply interested in exactly which Times Roman you are using. – Paulo Ney Apr 23 '18 at 01:01
  • Here is what I get: https://i.imgur.com/P3TbM0z.png – Paulo Ney Apr 23 '18 at 01:16
  • On my Windows 10 system the Times New Roman font (times.ttf, Monotype Co., version 6.98) displays all involved characters as expected. Have a look at https://tex.stackexchange.com/q/251261/35864 and all the linked questions for Biber and the dotless i. – moewe Apr 23 '18 at 07:31
  • I absolutely agree with cfr here: You are asking way too many things in one question. – moewe Apr 23 '18 at 07:34
  • @PauloNey Do you get any “missing character” warning in the log file? – egreg Apr 23 '18 at 08:16
  • @egreg, Yes, I do get the report for the missing characters - like in: Missing character: There is no ̆ in font [times.ttf]/OT:mapping=tex-text;! – Paulo Ney Apr 23 '18 at 15:42
  • I guess @moewe gave the hint that solved the problem. It is tied to the version of the Times NR font from Monotype, and users of MS Windows will never see the problem which is Linux bound. I'll write a full answer. – Paulo Ney Apr 23 '18 at 16:11
  • @cfr I'll edit the question and move some of them out to new questions. – Paulo Ney Apr 23 '18 at 16:46
  • @PauloNey How can you typeset the name if the font doesn’t support the necessary glyphs? – egreg Apr 23 '18 at 16:52
  • @egreg - Well ... LaTeX typesets \u{\i} just fine, even with fonts that do not have this glyph. – Paulo Ney Apr 23 '18 at 17:00
  • @PauloNey I'd say this is not true: if a font doesn't have the precomposed ĭ or the breve accent (and the dotless i), the character will not be typeset. – egreg Apr 23 '18 at 17:17
  • @egreg The first file above (in the question) together with TimesNR from the MSTTCoreFonts package is an exact example of this. \"{\i} gets typeset fine, but not ı̈. – Paulo Ney Apr 23 '18 at 17:22
  • @PauloNey Blame the CoreFonts package and use TeX Gyre Termes. – egreg Apr 23 '18 at 17:25
  • @egreg That is problematic ... especially when you have other fonts that are made to blend with Times NR and do not blend well with Termes. The solution is simply to get the new fonts, as I mentioned in the answer, but I am still puzzled by WHY LaTeX typesets one string and not the other, since they are exactly the same glyph in the font. – Paulo Ney Apr 23 '18 at 17:31
  • My hunch is that in the case of \"\i the font has the dotless-i and has the umlaut, and TeX puts the two of them together, while in the case of ı̈ the font does not have the particular glyph and TeX gives up. If this is indeed the case, it could be fixed. TeX knows how to put the characters together even if the ready-made glyph is not there. – Paulo Ney Apr 23 '18 at 17:41

1 Answer


This is a partial answer on why some accents are not showing, but not a full answer to the question brought up here. I still do not know why \u{\i} is typeset correctly even though the font is missing the glyph.

Glyphs entered in Unicode are not typeset properly because the Microsoft TrueType Core Fonts package distributes an old version of Times New Roman. The version of Times NR installed by that package is 2.95, which is too old and lacks coverage for the accents on dotless-i.
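
One way to check what a given copy of the font actually covers is to ask the engine directly. The following is a minimal sketch, assuming XeLaTeX or LuaLaTeX with fontspec; \checkglyph is a hypothetical helper defined here for illustration, not a fontspec command. It uses the e-TeX primitive \iffontchar to report whether the current font has a glyph for each code point involved:

\documentclass{article}
\usepackage{fontspec}
\setmainfont{Times New Roman}

% Hypothetical helper: #1 = code point in uppercase hex, #2 = description.
% \iffontchar\font"#1 is true if the current font has a glyph for that code point.
\newcommand*{\checkglyph}[2]{%
  U+#1 (#2): \iffontchar\font"#1 present\else missing\fi\par}

\begin{document}
\checkglyph{0131}{dotless i}
\checkglyph{0306}{combining breve}
\checkglyph{0308}{combining diaeresis}
\checkglyph{00EF}{i with diaeresis}
\checkglyph{012D}{i with breve}
\end{document}

Any line reporting "missing" points to a code point that the installed font version cannot render on its own.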

From what I gather, these old versions of the fonts were distributed under a loose (or lousy) license, and Microsoft has tightened it since.

What can you do to fix the problem?

  1. MS Windows: Users of newer MS Windows installations have access to a fairly recent version of the font that does not show the problem.

  2. Linux dual-boot: You have the new fonts sitting in the Windows partition; just run locate times.ttf and you will see them. Before copying them over to the Linux partition or using them within Linux, you should consult the license.

  3. Linux by itself: Buy a recent version of the fonts from the foundry or a distributor like fonts.com.
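
For the dual-boot case, fontspec can load the font by file name instead of relying on the system font database. This is a sketch only: the Path value is a placeholder for wherever the files were copied (license permitting), and timesbd.ttf, timesi.ttf and timesbi.ttf are assumed to be the names of the bold, italic and bold-italic files that accompany times.ttf on a Windows installation:

\documentclass{report}
\usepackage{fontspec}
% Path is a placeholder and must end with a slash.
\setmainfont{times.ttf}[
  Path           = /path/to/copied/fonts/,
  BoldFont       = timesbd.ttf,
  ItalicFont     = timesi.ttf,
  BoldItalicFont = timesbi.ttf,
]
\begin{document}
Ole\u\i nik, Ha\"\i ssinsky
\end{document}

Loading by file name makes it explicit which copy of the font is in use, which helps when an older version is also installed system-wide.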

Paulo Ney