
Based on this question, I've managed to colour different Hebrew diacritics (niqqud and teamim) separately. However, I've recently discovered that this only works with vowels and cantillation marks, not with the dagesh or the shin/sin dot. Am I making a mistake?

Please see my result and sample code below (the dot inside the letter should be yellow):


\documentclass{minimal}
\usepackage{luacolor}
\usepackage[nil,bidi=basic]{babel}
\babelprovide[import=he,main]{hebrew}
\babelfont[hebrew]{rm}[Renderer=Harfbuzz,Script=Hebrew,Language=Default]{Times New Roman}

\begin{document}

\textcolor{blue}{ב}\textcolor{yellow}{ּ}\textcolor{red}{ָ}\textcolor{green}{֑}

\end{document}

  • Looks like it is a ligature, in which case image clipping is a solution. Do \displayfonttable{Times New Roman} with unicodefonttable package, look under Alphabetic Presentation Forms in the output. – Cicada Oct 18 '23 at 06:05

2 Answers


As Cicada pointed out in a comment, this happens because the font shaper replaces the sequence of Unicode characters 0x05D1 (bet) and 0x05BC (dagesh) with the single glyph 0xFB31, so the color attribute of the dagesh is lost. You cannot color part of a glyph with a simple PDF color specification; you need some clipping, or you need to redraw part of the glyph (say, only the base letter).
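Whether a given font contains the precomposed glyph at all can be tested with the `\iffontchar` primitive. This is only a minimal sketch, assuming the same font setup as in the examples below; it reports whether the slot U+FB31 exists in the current font, not whether the shaper actually substitutes it.

```latex
\documentclass{article}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

\begin{document}
% \font is the currently selected font; "FB31 is bet with dagesh.
\iffontchar\font"FB31
  \typeout{U+FB31 is available in the current font}
\else
  \typeout{U+FB31 is missing from the current font}
\fi
\end{document}
```

The result is written to the log file and the terminal, so no page is produced.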

One idea is to have two ב: the first one is colored yellow and forms the ligature, and the second one is blue and overlaps the first.

To my knowledge, the ligatures involving diacritics change neither the positioning nor the width of the character, so it should work.

Here is a simple example (I've reduced some of the preamble, babel can take care of things).

\documentclass{article}
\usepackage{luacolor}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

\newcommand*\dagesh[1]{\leavevmode\rlap{\textcolor{yellow}{#1^^^^05bc}}\textcolor{blue}{#1}}

\begin{document}

\dagesh ב\textcolor{red}{ָ}\textcolor{green}{֢}

\end{document}


If we compare that to a glyph with no colors, we see that they overlap perfectly:

\documentclass{article}
\usepackage{luacolor}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

\newcommand*\dagesh[1]{\leavevmode\rlap{\textcolor{yellow}{#1^^^^05bc}}\textcolor{blue}{#1}}

\begin{document}

\dagesh ב\textcolor{red}{ָ}\textcolor{green}{֢}\llap{בָּ֢}

\end{document}
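The assumption that the ligature keeps the advance width of the base letter can also be checked numerically. The following is a minimal sketch with the same font setup; the length names `\plainwd` and `\dageshwd` are hypothetical and exist only for this illustration.

```latex
\documentclass{article}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

% Hypothetical helper lengths, used only for this check.
\newlength\plainwd
\newlength\dageshwd

\begin{document}
\settowidth\plainwd{^^^^05d1}% bet alone
\settowidth\dageshwd{^^^^05d1^^^^05bc}% bet followed by dagesh
\typeout{bet: \the\plainwd, bet+dagesh: \the\dageshwd}
\end{document}
```

If the two widths printed in the log agree, the overlapped base letter should end at the same place as the ligature. Equal widths do not prove that the outlines coincide, so the visual comparison is still worth doing.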


There are two main problems with this approach. First, the horizontal box we are using can break correct kerning, and second, the source is mostly unreadable. So let's try to improve both.

Instead of manually coloring each glyph, we can use Lua callbacks to assign color attributes to the characters we choose. For this example, my aim is to color all base letters in blue, all nikud, maqaf and sof passuq in red, teamim in green and finally dagesh, shin dot and sin dot in yellow.

To do that, base letters are colored blue, ligatures are colored according to the diacritic they contain, teamim are colored green and nikud red.

The following function will insert after each ligature a negative kern equal to its width and the base letter of that ligature, and assign an appropriate color attribute so that luacolor will later color it.

The \pagecolor and the twocolumn option are just for a better picture.

\documentclass[twocolumn]{article}
\usepackage{luacolor}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

\directlua{

local traverse = node.traverse
local glyph = node.id('glyph')
local set_attribute = node.set_attribute
local has_attribute = node.has_attribute
local insert_after = node.insert_after
local node_new = node.new
local getvalue = oberdiek.luacolor.getvalue
local red = "1 0 0 rg 1 0 0 RG"
local green = "0 1 0 rg 0 1 0 RG"
local blue = "0 0 1 rg 0 0 1 RG"
local yellow = "1 1 0 rg 1 1 0 RG"

local chars = { }

for char = 0x5d0, 0x05f2 do % base letters
    chars[char] = {color = blue}
end
for char = 0x0591, 0x05af do % teamim
    chars[char] = {color = green}
end
for char = 0x05b0, 0x05c7 do % nikud
    chars[char] = {color = red}
end
for char = 0xfb30, 0xfb4a do % dagesh ligatures
    chars[char] = {color = yellow, base = char - 0xf560}
end
for char = 0xfb2a, 0xfb2d do % shin dot ligatures
    chars[char] = {color = yellow, base = 0x05e9}
end

chars[0x05bc] = {color = yellow} % dagesh
chars[0x05f3] = {color = blue} % geresh
chars[0x05f4] = {color = blue} % gershayim
chars[0xfb1d] = {color = red, base = 0x05d9} % yod + hiriq
chars[0xfb1f] = {color = red, base = 0x05f2} % yod yod + patah
chars[0xfb20] = {color = blue} % alt ayin
chars[0xfb2e] = {color = red, base = 0x05d0} % alef patah
chars[0xfb2f] = {color = red, base = 0x05d0} % alef qamats
chars[0xfb4b] = {color = red, base = 0x05d5} % vav holam
chars[0xfb4c] = {color = red, base = 0x05d1} % bet rafa
chars[0xfb4d] = {color = red, base = 0x05db} % kaf rafa
chars[0xfb4e] = {color = red, base = 0x05e4} % pe rafa
chars[0xfb4f] = {color = blue} % alef + lamed

chars[1180354] = {color = red, base = 0x05da} % ???
chars[1180355] = {color = red, base = 0x05da} % ???
chars[1180356] = {color = red, base = 0x05dc} % ???

local function color_char(hlist)
    for n in traverse(hlist) do
        local id = n.id
        if id == glyph then
            local char =  n.char
            local data = chars[char]
            if data then
                local color = data.color
                if color then
                    local color_val = getvalue(color)
                    local color_attr = luatexbase.attributes["LuaCol@Attribute"]
                    set_attribute(n, color_attr, color_val)
                end
                local base = data.base
                if base then
                    local newn = node_new("glyph")
                    newn.font = font.current()
                    newn.char = base
                    %
                    local startactual = node_new("whatsit", "pdf_literal")
                    startactual.data = "/Span<</ActualText<>>>BDC"
                    startactual.mode = 1
                    %
                    local endactual = node_new("whatsit", "pdf_literal")
                    endactual.data = "EMC"
                    endactual.mode = 1
                    %
                    local newk = node_new("kern")
                    newk.kern = -newn.width
                    %
                    insert_after(hlist,n,newk)
                    insert_after(hlist,newk,startactual)
                    insert_after(hlist,startactual,newn)
                    insert_after(hlist,newn,endactual)
                end
            end
        end
    end
end

luatexbase.add_to_callback('pre_linebreak_filter',
    function(h)
        color_char(h)
        return true
    end,
'color_char')
luatexbase.add_to_callback('hpack_filter',
    function(h)
        color_char(h)
        return true
    end,
'color_char')
}

\begin{document}
\pagecolor{black}\noindent
וּֽבְקֻצְרְכֶם֙ אֶת־קְצִ֣יר אַרְצְכֶ֔ם לֹ֧א תְכַלֶּ֛ה פְּאַ֥ת שָׂדְךָ֖ לִקְצֹ֑ר וְלֶ֥קֶט קְצִֽירְךָ֖ לֹ֥א תְלַקֵּֽט׃
וְכַרְמְךָ֙ לֹ֣א תְעוֹלֵ֔ל וּפֶ֥רֶט כַּרְמְךָ֖ לֹ֣א תְלַקֵּ֑ט לֶֽעָנִ֤י וְלַגֵּר֙ תַּעֲזֹ֣ב אֹתָ֔ם אֲנִ֖י יְהֹוָ֥ה אֱלֹהֵיכֶֽם׃
וְלֹֽא־תִשָּׁבְע֥וּ בִשְׁמִ֖י לַשָּׁ֑קֶר וְחִלַּלְתָּ֛ אֶת־שֵׁ֥ם אֱלֹהֶ֖יךָ אֲנִ֥י יְהֹוָֽה׃
לֹֽא־תַעֲשֹׁ֥ק אֶת־רֵֽעֲךָ֖ וְלֹ֣א תִגְזֹ֑ל לֹֽא־תָלִ֞ין פְּעֻלַּ֥ת שָׂכִ֛יר אִתְּךָ֖ עַד־בֹּֽקֶר׃
לֹא־תְקַלֵּ֣ל חֵרֵ֔שׁ וְלִפְנֵ֣י עִוֵּ֔ר לֹ֥א תִתֵּ֖ן מִכְשֹׁ֑ל וְיָרֵ֥אתָ מֵּאֱלֹהֶ֖יךָ אֲנִ֥י יְהֹוָֽה׃
לֹא־תַעֲשׂ֥וּ עָ֙וֶל֙ בַּמִּשְׁפָּ֔ט לֹא־תִשָּׂ֣א פְנֵי־דָ֔ל וְלֹ֥א תֶהְדַּ֖ר פְּנֵ֣י גָד֑וֹל בְּצֶ֖דֶק תִּשְׁפֹּ֥ט עֲמִיתֶֽךָ׃
לֹא־תֵלֵ֤ךְ רָכִיל֙ בְּעַמֶּ֔יךָ לֹ֥א תַעֲמֹ֖ד עַל־דַּ֣ם רֵעֶ֑ךָ אֲנִ֖י יְהֹוָֽה׃
לֹֽא־תִשְׂנָ֥א אֶת־אָחִ֖יךָ בִּלְבָבֶ֑ךָ הוֹכֵ֤חַ תּוֹכִ֙יחַ֙ אֶת־עֲמִיתֶ֔ךָ וְלֹא־תִשָּׂ֥א עָלָ֖יו חֵֽטְא׃
לֹֽא־תִקֹּ֤ם וְלֹֽא־תִטֹּר֙ אֶת־בְּנֵ֣י עַמֶּ֔ךָ וְאָֽהַבְתָּ֥ לְרֵעֲךָ֖ כָּמ֑וֹךָ אֲנִ֖י יְהֹוָֽה׃
\end{document}


A few notes.

  • The function is called after the font shaper, which is usually not recommended, but in this case we need to assign colors to the ligatures, which exist only after that stage.

  • Each base letter inserted after a ligature is wrapped in an /ActualText span so that it is not counted when extracting text from the PDF.

  • Some glyphs at that stage are not Unicode characters anymore. I'm not sure what their character codes mean.

  • The example text was taken from this answer, which is on the same topic.

  • If you want the shin dot and the dagesh in different colors, you will have to overlap three glyphs, one glyph for each color.
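The last note can be sketched with the manual overlapping approach from the beginning of this answer. This is an untested illustration: the macro name `\shindagesh` is hypothetical, and it assumes that the font shapes shin + dagesh + shin dot and shin + shin dot without changing the advance width of the base letter.

```latex
\documentclass{article}
\usepackage{luacolor}
\usepackage[provide=*,bidi=basic,hebrew]{babel}
\babelfont{rm}[Renderer=Harfbuzz]{Times New Roman}

% Hypothetical helper: three overlapping layers, painted bottom to top.
\newcommand*\shindagesh{\leavevmode
  % layer 1: shin + dagesh + shin dot, everything yellow
  \rlap{\textcolor{yellow}{^^^^05e9^^^^05bc^^^^05c1}}%
  % layer 2: shin + shin dot in magenta, leaving only the dagesh yellow
  \rlap{\textcolor{magenta}{^^^^05e9^^^^05c1}}%
  % layer 3: bare shin in blue, leaving the shin dot magenta
  \textcolor{blue}{^^^^05e9}}

\begin{document}
\shindagesh
\end{document}
```

Each layer paints over the one below it, so only the part that the upper layers lack keeps its color. The same caveats about kerning apply as for \dagesh.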

Udi Fogiel
  • A question that I've also posted here: I don't know the precise mechanics, and hence whether this matters, but if I copy the letter with dagesh from the PDF and check its Unicode value, it returns a value from the 05DX-05EX block and not the ligatures with dagesh in the FB3X-FB4X block. – Kazi bácsi Oct 22 '23 at 10:29
  • As you've written, the first approach slightly influences kerning, but in my case the result is satisfactory. I need time to understand your second solution, as I'm not familiar with Lua programming. – Kazi bácsi Oct 22 '23 at 10:48
  • @Kazibácsi, as for the first comment, I really don't think this is related. There are quite a few questions on this site about how copy-paste from PDF works. If you don't find something that answers your question, you can always ask a new one. – Udi Fogiel Oct 22 '23 at 22:06
  • @Kazibácsi Personally I would go with the second solution, just so that the source of the document would be readable. Adding three commands per letter can be quite annoying, unless you have a short text. – Udi Fogiel Oct 22 '23 at 22:08
  • @Kazibácsi I've edited the second approach so that the base letters of the ligatures won't be accounted when extracting text from the PDF. – Udi Fogiel Nov 07 '23 at 12:34

You can add a kern to stop harfbuzz combining the glyphs in the blue area, but that messes up the accent positioning for the accents that go below so you need another kern to compensate.


\documentclass{minimal}
\usepackage{luacolor}
\usepackage[nil,bidi=basic]{babel}
\babelprovide[import=he,main]{hebrew}
\babelfont[hebrew]{rm}[Renderer=Harfbuzz,Script=Hebrew,Language=Default]{Times New Roman}

\begin{document}

\textcolor{blue}{ב}\kern 0pt\textcolor{yellow}{ּ}\kern-2pt\textcolor{red}{ָ}\textcolor{green}{֑}

\end{document}

David Carlisle
  • Although I appreciate your effort, sometimes you also need to vertically adjust the positioning, which adds even more to the cumbersomeness of your workaround. Maybe if you enlarge this, you can see the difference: בּ vs. שּׁ. – Kazi bácsi Oct 17 '23 at 16:45
  • @Kazibácsi yes sorry I meant to add that I can't read Hebrew and I have no cultural appreciation of where the things should go. I just moved it 2pt to the left as that looked plausible to my English reading eyes. It hopefully might form the basis of a more complete solution but I don't think you should trust me to do any finer adjustments. – David Carlisle Oct 17 '23 at 16:47
  • @Kazibácsi if no one else answers with a better solution and you do extend this with finer adjustments please post and accept a more complete answer, I really don't need the green tick – David Carlisle Oct 17 '23 at 16:49