
I manage style files for an academic journal. A typical article will contain authors' email addresses, which we attempt to obfuscate in order to give some measure of protection against e-mail harvesters.

Our current strategy: We replace the @ and . in e-mail addresses with bitmapped images of these symbols. (Specifically, we define new commands \imageat and \imagedot which print .pdf images of their respective characters; then an email address like me@place.com is typeset as me{\imageat}place{\imagedot}com.) This has some problems:

  1. The images don't reflect the font or size of the surrounding text.
  2. With this solution, our LaTeX distribution must include the .pdf of these images, which can lead to errors and confusion.

What I would like: I would like (you to tell me how) to define two commands \crazyat and \crazydot which have the effect of typesetting @ and . in the current typeface, but appear as non-standard characters in the generated .pdf file. Specifically, I would like to temporarily populate a little-used part of the current font with the @ and . so that they appear correctly, but make no sense to anyone else. (Other suggestions are very welcome.)

A few notes about other postings on this (and closely related topics):

  1. I am aware of the accsupp package. It seems very appealing, but only Adobe Acrobat seems to play along. Specifically, the LaTeX line My email address is \BeginAccSupp{ActualText={email address}}me@place.com\EndAccSupp{} produces output that copies and pastes (in)correctly with Adobe Acrobat (giving the intended behavior) but misbehaves (so that copy/paste gives the e-mail address) on other .pdf readers. Anyway, I guess this will not fool an e-mail harvester. (See What can cause generated PDF document whose text are not correctly copyable?.)
  2. I do not want to, e.g., simply replace the @ symbol with the text [AT]. I am dead set on this symbol actually appearing correctly in the .pdf document. (See How to redefine @ and . to obfuscate email addresses?.)
  3. There seems to be a way to blow away the "cmap," which I do not understand. However, I would only like to be "locally" destructive--I would like the rest of the document to be well-formed. (See Is it possible to produce a PDF with un-copyable text?.)
acr
  • 2
    What do you think about drawing the @ with a few commands using TikZ, for example? Then you can use it any time. You can also put it in a box so that it can be resized together with the text. – Sigur Jan 24 '13 at 01:19
  • 6
    My favorite obfuscation: john.doemy@pantsfoo.com – to e-mail me, remove my pants. From a related question on [su], but with a focus on HTML/web sites: Does e-mail address obfuscation actually work? – doncherry Jan 24 '13 at 02:12
  • @Sigur Hm, well, there’s the PGF/TikZ library shapes.letters at launchpad:tex-sx that transforms letters into shapes. I have never tested the library but my guess would be that the actual text representation is lost in the final output. – Qrrbrbirlbel Jan 24 '13 at 02:20
  • I don't exactly see the problem with having the @ and the . as vector (not bitmap, please) PDFs alongside your LaTeX source. You have to do the same with your images anyway, don't you? In any case, you can adapt the size of the symbols automatically by using something along the lines of \includegraphics[width=1em]{atsign} as your \crazyat macro. – Christian Jan 24 '13 at 02:49
  • @Christian That solution would be (much) better than what we have. Incidentally, if I snip a @ out of a LaTeX-generated .pdf, how would I get the position just right using includegraphics when I include it in another document? Anyway, my principal complaint is that this method won't reflect the current font (not a big deal in my setting, since we always use Times). – acr Jan 24 '13 at 03:00
  • @Werner Wow--I didn't know about randtext. How does it work? If I could do the same thing, but simply replace the @ with an A, I would be delighted. (Anyway, this does more or less solve my problem.) – acr Jan 24 '13 at 03:07
  • @Werner: randtext's usefulness for obfuscation is limited, as some PDF viewers routinely unobfuscate without you even being able to tell it was obfuscated in the first place. Therefore you can assume it's transparent for spammers too. See e.g. randtext not working – cyberSingularity Jan 24 '13 at 07:58
  • Ok, if having to have two PDF files alongside your source is ok with you after all, I made an answer out of this, explaining how to get the position and size right in this case. There might be times when you also need \raisebox but here it works fine without. – Christian Jan 24 '13 at 09:55
  • What do you think about changing the font for e-mail addresses, for example using \texttt? Then you could draw the @ symbol and convert it to curves, using TikZ or Inkscape, and include it while controlling the height according to the current font size. – Sigur Jan 24 '13 at 10:48
  • @Werner I experimented a bit with randtext, and (as also noted above) this seems to suffer from the same problems as accsup. Not all PDF viewers appear to be confounded by the randomization. – acr Jan 24 '13 at 12:23
  • @Sigur In principle, it seems like a good solution. In our setting, however, we typeset email addresses in a serif font (Times), a choice which I can't change until the next volume, roughly a year from now. Mocking up a Times @ with tikz seems really challenging! – acr Jan 24 '13 at 12:26
  • @Christian Thanks very much--I did not know about standalone. – acr Jan 24 '13 at 12:27
  • You're welcome. You should probably switch off microtype locally when pulling this trick since the images won't be stretched which could look ugly. – Christian Jan 24 '13 at 12:30
  • Wouldn't it be easiest to map some unused part of ASCII (say an old control character that isn't used anymore) to @ and . in the font files, then let them copy and paste the wrong thing? – Canageek Jan 26 '13 at 03:33
  • @Canageek That would be awesome; how do I do it? – acr Jan 27 '13 at 21:33
  • I have no idea; Ask someone who understands LaTeX map files and such. I'm sure it can be done, I just don't know HOW or I would have made it into an answer. – Canageek Jan 28 '13 at 16:00

2 Answers


To provide PDF files containing the dot and the at sign in the right font, put this in at.tex:

\documentclass{standalone}
\usepackage{mathptmx}
\usepackage[T1]{fontenc}
\begin{document}
@
\end{document}

and likewise this in dot.tex

\documentclass{standalone}
\usepackage{mathptmx}
\usepackage[T1]{fontenc}
\begin{document}
.
\end{document}

Edit: This solution was actually wrong as I first wrote it. I assumed you can just use the generated PDFs as a neutral vector graphic. You can't; the PDFs still contain the characters as text, so the mail addresses are still easily copy-and-pasted. You can, however, use some software like Inkscape to convert the text to a "real" vector graphic (outlines) and save it as a PDF again. You can then proceed as before. [End of Edit]

\documentclass[a5paper]{article}
\usepackage{mathptmx}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
\newcommand{\crazyat}{\includegraphics[width=.9em]{at}}
\newcommand{\crazydot}{\includegraphics[width=.25em]{dot}}
\begin{document}
\noindent foobar@example.com\\
foobar\crazyat{}example\crazydot{}com\\
\Large foobar@example.com\\
foobar\crazyat{}example\crazydot{}com
\end{document}

Which looks good to my eye:

[Image: comparison between obfuscated and normal mail address]

David Carlisle
  • 757,742
Christian
  • 19,238
  • It looks like the link to inkscape is missing the i. – acr Jan 24 '13 at 12:33
  • Thanks for working this up. Is there a principled way to ensure that I place, for example, the @ character at exactly the right height? – acr Jan 24 '13 at 12:43
  • @acr I cannot think of an automatic way to determine the width in em. As you can see, using the font size-dependent unit em ensures that the whole thing scales well once you have figured out the right number (this works less well with fonts like Computer Modern that look different at different sizes). You could use GIMP to produce arbitrary-sized renderings and then overlay the two lines with 50% transparency. – Christian Jan 24 '13 at 13:17
  • 3
    \includegraphics[height=\fontcharwidth\font`A]{at} should make the height equal to that of an A. Similarly for the period: height=\fontcharheight\font`. – egreg Jan 24 '13 at 13:17
  • @egreg Excellent! (It should be \fontcharheight in the first example, right?) Also, wouldn't we want the height to match that of an @? Finally, I guess that both the @ and the . rest precisely on the baseline, so there is no need for a \raisebox? Is there a reason that we couldn't use this same idea to set the width to the correct value (based on the current font)? (I understand that one would probably want to set only one of these to keep the aspect ratio correct.) – acr Jan 24 '13 at 14:18
  • @acr Of course I was thinking to \fontcharheight, but width slipped in. :( – egreg Jan 24 '13 at 14:21
  • Hmm. I'm having a hard time losslessly converting this .pdf file of a single character (say, "@") to a vectorized .pdf file. I've tried Inkscape, but it complains about a missing font and doesn't even show that "@" correctly. Is there an idiot-proof solution? – acr Feb 05 '13 at 19:05
  • @acr Well, this works for me http://stackoverflow.com/a/10290006/1050373 but if it doesn't for you, try http://stackoverflow.com/a/4502030/1050373. You can then convert the svg back to pdf using inkscape. It's complicated but for two characters it should be doable and for more you can automate it using a makefile or a shell script since both tools provide a command line interface. If nothing works, I'm out of ideas and can only suggest posting a new question on stackoverflow or graphicdesign.SE, sorry :/ – Christian Feb 06 '13 at 09:04
  • 1
    The right names are \fontcharwd and \fontcharht, sorry for the typos. – egreg Aug 23 '22 at 16:18
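
Putting the corrected primitive names from the comments together with the macros from the answer gives roughly the following. This is only a sketch: it assumes at.pdf and dot.pdf are the outlined (non-text) glyph images described in the answer, that they are cropped tightly to the glyph, and that matching the image height to the height of the corresponding character in the current font is good enough. If the images carry a margin, or the glyph descends below the baseline, a small manual correction (for example with \raisebox) may still be needed.

```latex
\documentclass[a5paper]{article}
\usepackage{mathptmx}
\usepackage[T1]{fontenc}
\usepackage{graphicx}
% Assumption: at.pdf and dot.pdf are outlined glyph images next to this file.
% \fontcharht\font`\@ is the height of the character "@" in the current font
% (an e-TeX primitive), so the image follows the surrounding font size.
\newcommand{\crazyat}{\includegraphics[height=\fontcharht\font`\@]{at}}
\newcommand{\crazydot}{\includegraphics[height=\fontcharht\font`\.]{dot}}
\begin{document}
\noindent foobar@example.com\\
foobar\crazyat{}example\crazydot{}com\\
\Large foobar@example.com\\
foobar\crazyat{}example\crazydot{}com
\end{document}
```

Only the height is set, so \includegraphics keeps the aspect ratio of each image and the width follows automatically.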

The letters can be exported from the font editor FontForge as SVG graphics. Then, the path descriptions in the SVG files (attribute d in element path) can be used to fill the path in TikZ.

\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{svg.path}

\newcommand*{\svgat}{%
  \leavevmode
  \tikz[baseline=0pt, x=1pt, y=1pt, scale=1em/1000]\fill
    svg {
      M588 457v-241c0 -15 0 -66 35 -66c66 0 73 90 73 182c0 241 -171 351 -308
      351c-164 0 -307 -145 -307 -336c0 -179 130 -336 312 -336c94 0 187 24
      272 64c5 3 7 3 23 3h9c16 0 23 0 23 -10c0 -16 -120 -51 -146 -57c-64 -15
      -128 -22 -180 -22 c-200 0 -338 170 -338 358c0 199 150 358 333 358c161
      0 332 -133 332 -367c0 -107 -14 -210 -102 -210c-38 0 -90 19 -100 71c-30
      -42 -78 -71 -132 -71c-103 0 -198 93 -198 219s95 219 198 219c39 0 91
      -14 137 -77c6 -7 7 -8 23 -8h17c23 0 24 -1 24 -24zM519 262v170 c0 18 0
      21 -13 40c-36 56 -84 72 -116 72c-73 0 -132 -86 -132 -197s60 -197 132
      -197c20 0 71 6 115 69c14 21 14 25 14 43z
    }
    (current bounding box.west) ++(-56, 0) % left side bearing
    (current bounding box.east) ++(56, 0) % right side bearing
  ;%
}

\newcommand*{\svgperiod}{%
  \leavevmode
  \tikz[baseline=0pt, x=1pt, y=1pt, scale=1em/1000]\fill
    svg {
      M192 53c0 -29 -24 -53 -53 -53s-53 24 -53 53s24 53 53 53s53 -24 53 -53z
    }
    (current bounding box.west) ++(-86, 0) % left side bearing
    (current bounding box.east) ++(85, 0) % right side bearing
  ;%
}

\begin{document}
  john.doe@example.org

  john\svgperiod doe\svgat example\svgperiod org

  {\Large john\svgperiod doe\svgat example\svgperiod org}

  {\scriptsize john\svgperiod doe\svgat example\svgperiod org}
\end{document}

Result

Copying does not see the letters drawn by TikZ. The selected email addresses in Evince:

[Image: the email addresses selected in Evince]

Remarks:

  • The SVG paths are taken from font file "cmr10.pfb".

  • If TeX switches fonts in different font sizes, here "cmr12.pfb" for \Large and "cmr7.pfb" for \scriptsize, then different macros could be defined to get the perfect glyphs for the size. But for this purpose, a scaled normal size version should do. The scaling is done automatically, because option scale depends on the current size of unit em.

  • Library svg.path uses 1pt as unit, therefore the TikZ glyphs are scaled to the correct size by scale. The font uses 1000 glyph units for 1em. The TFM file for cmr10 stores the width of 1em: QUAD R 1.000003 (small rounding glitch). Therefore, the scale factor is 1em/1000.

  • The values for the side bearings are taken from the glyph views in FontForge.

  • Of course, this kind of glyph obfuscation can be circumvented by using OCR.
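
The two macros above differ only in the path data and the side bearings, so the shared part could be factored out. The following is a sketch, not part of the original answer: \svgglyph is a hypothetical helper name, and its three arguments (left side bearing, right side bearing, SVG path data) are an assumed interface. As a check on the scale factor: with cmr10 at 10pt, 1em is about 10pt, so scale is about 0.01 and 1000 glyph units span about 10pt.

```latex
\documentclass{article}
\usepackage{tikz}
\usetikzlibrary{svg.path}

% Hypothetical helper:
% \svgglyph{<left side bearing>}{<right side bearing>}{<SVG path data>}
% Bearings and path data are in glyph units (1000 units per em).
\newcommand*{\svgglyph}[3]{%
  \leavevmode
  \tikz[baseline=0pt, x=1pt, y=1pt, scale=1em/1000]\fill
    svg {#3}
    (current bounding box.west) ++(-#1, 0) % left side bearing
    (current bounding box.east) ++(#2, 0)  % right side bearing
  ;%
}

% The period from cmr10, with the path and bearings given in the answer.
\newcommand*{\svgperiod}{%
  \svgglyph{86}{85}{%
    M192 53c0 -29 -24 -53 -53 -53s-53 24 -53 53s24 53 53 53s53 -24 53 -53z
  }%
}

\begin{document}
  john\svgperiod doe

  {\Large john\svgperiod doe}
\end{document}
```

The @ glyph could be defined the same way by passing its path data with bearings of 56 on both sides. Glyphs exported from a different font would need their own bearings read off in FontForge.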

Heiko Oberdiek