Make icon fonts in PDF searchable by name

Question

Nowadays, icon fonts are quite popular on the web. LaTeX bindings to these icon fonts are available through packages such as academicons, fontawesome5 and fontmfizz.

This opens up new possibilities, such as using them in documents like resumes to indicate computing-related skills. While they are visually pleasing, there are two issues with them:

The words extracted from the PDF by an Applicant Tracking System (ATS), e.g. by using pdftotext, are not parseable.
More importantly, even if a human recruiter manually reads the PDF of the CV, they won't be able to search it for keywords from the job descriptions they are currently handling.

This renders the entire skills section nearly useless and runs the risk of a suitable applicant not making it to the interview stage, diminishing the value of icon fonts to mere eye candy.

I think embedding an alt-text is a great workaround for such cases, so I tried the solutions proposed here.

However, weirdly, the first solution there, using the accsupp package, only works with Adobe Reader, while the second solution works only with SumatraPDF or other MuPDF-based readers. Moreover, both methods highlight a far larger rectangular area than the one corresponding to the searched icon.
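
For reference, the accsupp-only approach can be sketched roughly as follows. This is a minimal sketch assuming an up-to-date accsupp and pdfLaTeX; it is not copied from the linked answer, and the option values are simply the ones that appear in the accepted answer below.

\documentclass{article}
\usepackage{fontmfizz}
\usepackage{accsupp}
% Attach "Linux" as ActualText to the icon glyph only (no hidden tikz node).
\newcommand{\archlinux}{%
  \BeginAccSupp{method=plain,unicode=false,ActualText=Linux}%
  \mfArchlinux
  \EndAccSupp{}%
}
\begin{document}
I know \archlinux.
\end{document}

Whether the replacement text is actually found by a search depends on the viewer, as described above.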

My question is: is there a way to combine both these methods into a single solution? I am happy to accept a LuaTeX solution if needed. I know there is no perfect solution, but is there any alternative to this approach that is viewer-agnostic?

Here is an MWE:

\documentclass{article}
\usepackage{fontmfizz}
\usepackage{tikz}

\newcommand{\archlinux}{\ooalign{\hidewidth\tikz\node[inner sep=0pt,opacity=0]{Linux};\cr \mfArchlinux \cr}}

\begin{document}
I know \Huge \archlinux.
\end{document}

The solution needs to be case-insensitive, e.g. here, when searching for either linux or Linux, we always need to find the match and highlight the icon in the PDF.

PS:

Another useful reference question for this situation.

Update:

The tikz-based solution in the original link works in MuPDF-based readers, but weirdly, in Adobe Reader only the single letter l or L is matched. Upon entering the next character (li), the search stops matching. Totally weird.

With your code I can find linux using Okular. Perhaps this is viewer dependent... The case sensitiveness is. Okular has a toggle to switch that on and off. — Phelype Oleinik, Feb 13 '19 at 21:38
@PhelypeOleinik I know there is no perfect solution, but is there an alternative to this approach that is viewer agnostic? — Dr Krishnakumar Gopalakrishnan, Feb 13 '19 at 21:39
@PhelypeOleinik It works on sumatrapdf (a mupdf reader) for me, but does not work in adobereader which is the reference reader for the pdf standard (and the most widely used one, particularly by people outside academia). It is most likely that a recruiter is on Windows/AdobeReader combo just searching a PDF for some keywords. — Dr Krishnakumar Gopalakrishnan, Feb 13 '19 at 21:41
Sorry, I really don't know. But usually things seem to work better/be implemented first for Adobe Reader. Have you tried the accsupp package, using ActualText or E? — Phelype Oleinik, Feb 13 '19 at 21:43
@PhelypeOleinik I am unable to understand that ActualText answer. I shall much appreciate it if you can try it out and post an answer if it works for you? — Dr Krishnakumar Gopalakrishnan, Feb 13 '19 at 21:44
ActualText (with accsupp or tagpdf) is the method recommended in the pdf reference, but as you discovered not every viewer handles this correctly. — Ulrike Fischer, Feb 13 '19 at 21:54
@UlrikeFischer Can we try to combine these two solutions somehow? The actualtext solution works flawlessly with adobereader while the tikz solution works only with mupdf-based readers. I am convinced that combining the two is the key! — Dr Krishnakumar Gopalakrishnan, Feb 13 '19 at 21:56

Ulrike Fischer · Accepted Answer · 2019-02-13T22:12:52.730

You can combine both answers like this:

\documentclass{article}
\usepackage{fontmfizz}
\usepackage{tikz}
\usepackage{accsupp}
\newcommand{\archlinux}{%
 \BeginAccSupp{
    method=plain,
    unicode=false,
    ActualText=Linux,
  }%
  \ooalign{\hidewidth\tikz\node[inner sep=0pt,opacity=0]{Linux};\cr \mfArchlinux \cr}%
  \EndAccSupp{}%
  }
\begin{document}

I know \Huge \archlinux.

\end{document}

An alternative is

\newcommand{\archlinux}{%
  \ooalign{\hidewidth\tikz\node[inner sep=0pt,opacity=0]{Linux};\cr 
   \BeginAccSupp{
    method=plain,
    unicode=false,
    ActualText=Linux,
  }%
\mfArchlinux\EndAccSupp{}\cr}%
  %
  }

But copy and paste is a bit curious:

ILknionw ux Linux. 1
1

Imho it is better to make the word so small that Adobe Reader doesn't get confused about the reading order. The following works with both Sumatra and Adobe Reader:

\documentclass{article}
\usepackage{fontmfizz}
\usepackage{tikz}
\newcommand{\archlinux}{%
 \tikz[overlay]\node[opacity=0,font=\tiny]{Linux};\mfArchlinux}%
\begin{document}

I know \Huge \archlinux.


\end{document}

And it copies and pastes better: I know Linux �.
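
If several icons need the same treatment, the overlay trick above could be wrapped in a small helper. The following is only a sketch: \searchableicon is a hypothetical name introduced here for illustration (it is not part of fontmfizz or tikz), and the same viewer-dependent caveats apply.

\documentclass{article}
\usepackage{fontmfizz}
\usepackage{tikz}
% \searchableicon{<search text>}{<icon command>} is a hypothetical helper.
% It places a tiny, fully transparent copy of the search text as an
% overlay (taking up no space) and then typesets the icon itself.
\newcommand{\searchableicon}[2]{%
  \tikz[overlay]\node[opacity=0,font=\tiny]{#1};#2}
\newcommand{\archlinux}{\searchableicon{Linux}{\mfArchlinux}}
\begin{document}

I know \Huge \archlinux.

\end{document}

Each further icon then needs only one more \newcommand line pairing the keyword with its icon command.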

edited Feb 13 '19 at 22:12

answered Feb 13 '19 at 22:00


That gives an error: \ACCSUPP@bdc ->\pdfliteral \ACCSUPP@pdfliteral {\ACCSUPP@span BDC} l.16 I know \Huge \archlinux. The control sequence at the end of the top line of your error message was never \def'ed. If you have misspelled it (e.g., `\hobx'), type `I' and the correct spelling (e.g., `I\hbox'). Otherwise just continue, and I'll forget about whatever was undefined. ! Undefined control sequence. \ACCSUPP@emc ->\pdfliteral \ACCSUPP@pdfliteral {EMC} l.16 I know \Huge \archlinux – Dr Krishnakumar Gopalakrishnan Feb 13 '19 at 22:02
Then your accsupp is probably not up-to-date. – Ulrike Fischer Feb 13 '19 at 22:05
@UrikeFischer got it. Overleaf is the culprit. They are still on TL2016. I shall try it out tomorrow and report back. – Dr Krishnakumar Gopalakrishnan Feb 13 '19 at 22:07
I added a third possibility without accsupp. – Ulrike Fischer Feb 13 '19 at 22:13
The third possibility is 99% there. But may I push a little bit further, if that's okay with you? When running pdftotext.exe -layout main.pdf, the resulting text file is a bit mangled (I know . Linux 1), with some crazy series of line breaks, incorrect ordering and weird characters (none of which are rendered in this comment, unfortunately), but please try for yourself. Why is this happening? What can be done to improve the situation? – Dr Krishnakumar Gopalakrishnan Feb 14 '19 at 14:44
The reading order in a PDF is not well defined if the PDF has not been tagged: the viewers guess the order of characters when copying. And if the PDF has been tagged, you need a viewer which understands tagging to benefit from it. If you want a really accessible format, don't use PDF. Send an HTML, a docx or an odt file. – Ulrike Fischer Feb 14 '19 at 15:03
alright. No problems. So this is the best we can do, right? You have an experimental tagging package, right? Any chance we could apply that? – Dr Krishnakumar Gopalakrishnan Feb 14 '19 at 15:07
You can try. tagpdf is on CTAN. But first, tagging is not local code: if you tag, you must tag the whole document. And second, you need a PDF viewer which understands and uses the tagging code. – Ulrike Fischer Feb 14 '19 at 15:11
alright. I give up. I don't want to do global tagging. Just want to tag the local icon font's overlay. Would you consider a feature request in the package repo for local tagging (in the future, of course)? – Dr Krishnakumar Gopalakrishnan Feb 14 '19 at 15:12
That's not possible; that's a restriction of the PDF format. – Ulrike Fischer Feb 14 '19 at 15:14
