16

As I'm new to LaTeX and still learning it, I often come across

\usepackage[utf8]{inputenc}

and would like to know what this command does and why we should use it.

moewe
  • 175,683
  • 7
    If your LaTeX is reasonably recent (newer than April 2018, I believe), it will already behave as if \usepackage[utf8]{inputenc} were loaded, so you don't need to add this line any more. – moewe Apr 18 '22 at 13:50
  • 2
    inputenc can be used to tell LaTeX about the input encoding of your document (the input encoding controls how characters are saved by your computer). In the olden days LaTeX only really supported ASCII input, then support for 8bit encodings was added via inputenc. Nowadays UTF-8 is the de facto standard for most documents, so many people used \usepackage[utf8]{inputenc}. Because UTF-8 is a de facto standard, the LaTeX team decided to make it the LaTeX default encoding. You can read more about encodings at https://tex.stackexchange.com/q/6448/35864 – moewe Apr 18 '22 at 13:52
  • 2
    See also https://tex.stackexchange.com/q/370278/35864 – moewe Apr 18 '22 at 13:53
  • @moewe: Surely those comments would make a good answer? – Peter LeFanu Lumsdaine Apr 18 '22 at 13:56
  • I think we should just duplicate-close against the linked question though. Reading through the answers there, they explain quite clearly the difference already. – user202729 Apr 18 '22 at 14:53

2 Answers

18

The inputenc package allows you to tell LaTeX about the input (file) encoding of your TeX document. The input encoding controls how the various characters you type are stored in your document (basically which sequence of zeroes and ones maps to which character). There are various input encodings (in the olden days, memory was precious and so you would only have characters in your encoding table that you actually used), so we needed a way to tell LaTeX how to interpret the file it reads. This is done by loading inputenc and passing the appropriate file encoding as an option.
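To make the "pass the file encoding as an option" part concrete, here is a minimal sketch for a legacy file. It assumes the source file really is saved as ISO-8859-1 (Latin-1) rather than UTF-8; the choice of latin1 and the sample words are only an illustration.

```latex
% Minimal sketch: this file is ASSUMED to be saved as ISO-8859-1 (Latin-1).
% If your editor saves it as UTF-8 instead, the option below is wrong.
\documentclass{article}
\usepackage[T1]{fontenc}      % output font encoding with accented glyphs
\usepackage[latin1]{inputenc} % tell LaTeX the bytes in this file are Latin-1
\begin{document}
Café, naïve, Straße
\end{document}
```

The option has to match how the editor actually saved the file; a mismatch typically shows up as wrong characters or "invalid UTF-8" style errors.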

Over time UTF-8 emerged as the de facto standard encoding for text files. Most editors use it as default when you write and save a new file. It is well supported across different operating systems and programs. That's why you will often see \usepackage[utf8]{inputenc} in TeX documents. People use the de facto standard and tell TeX about it.

In fact UTF-8 is so popular that the LaTeX team decided to make it the default input encoding. Since the April 2018 release, LaTeX has behaved as if \usepackage[utf8]{inputenc} were already loaded. That means that you no longer need to specify \usepackage[utf8]{inputenc} in order to use UTF-8 input. Some people still do, maybe out of force of habit, because they like to be explicit about the input encoding, or because their documents may need to run on outdated systems.
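As a rough illustration, a document like the following should compile with pdfLaTeX without loading inputenc at all. It assumes a LaTeX release from April 2018 or later and a file saved as UTF-8; the sample text is made up.

```latex
% Assumes: LaTeX from April 2018 or newer, file saved as UTF-8.
\documentclass{article}
\usepackage[T1]{fontenc}
% \usepackage[utf8]{inputenc} % harmless if added, but no longer required
\begin{document}
Café, naïve, Straße
\end{document}
```

On an older installation the same file would need the commented-out line to be active.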

Some more details on input encodings and UTF-8 can be found at "What is the difference between font encoding and input encoding?" and "Is there any reason to use inputenc?".

Do note that while LaTeX now sets UTF-8 as the default input encoding, full UTF-8/Unicode support is not something you can achieve with pdf(La)TeX due to technical limitations (I'm thinking of combining accents, which come after their base character in the input). If you need full Unicode support, you may want to look into Lua(La)TeX or Xe(La)TeX.
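For the Unicode engines, a minimal sketch could look like this. It assumes you compile with lualatex or xelatex and that the default font loaded by fontspec has the glyphs you type; other scripts would need a suitable font selected with \setmainfont, which is not shown here.

```latex
% Compile with lualatex or xelatex, NOT pdflatex.
\documentclass{article}
\usepackage{fontspec} % OpenType/TrueType font support for the Unicode engines
% No inputenc here: these engines read UTF-8 input natively.
\begin{document}
Grüße, café, naïve
\end{document}
```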

moewe
  • 175,683
  • Last part is a bit confusing. I think you want to say "PDFLaTeX cannot pick Unicode character from Unicode (opentype/truetype) font" instead – user202729 Apr 18 '22 at 14:51
  • 3
    @user202729 no he means that pdflatex can't handle combining accents in the input (as they are after the base char), that has nothing to do with fonts. – Ulrike Fischer Apr 18 '22 at 15:13
  • @UlrikeFischer Ah okay (although for one whole-file regex-replace is possible. Not sure tweaking font file to define ligature is possible) – user202729 Apr 18 '22 at 15:32
3

A text file, like any computer file, is really just a sequence of numbers, each from zero through 255; each such number is called a byte. You need a set of rules (called an encoding) that associates each character with one or more of these numbers. UTF-8 is one such encoding; its rules allow the characters of many languages and notations (including mathematics, hieroglyphics, etc.) to be represented in a text file. When (La)TeX reads the file it has to "know" how to interpret the input bytes, so it needs to be told what encoding is being used. Recent versions of LaTeX default to UTF-8 (in practice, only the subset of characters that the font being used can represent).
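To see the "characters are really bytes" idea from inside TeX, here is a small sketch. It assumes pdfLaTeX with the UTF-8 default (April 2018 or newer); under LuaLaTeX or XeLaTeX the ^^ notation denotes characters rather than bytes, so the result differs. In UTF-8 the letter é is stored as the two bytes C3 A9 (hexadecimal), whereas in Latin-1 it would be the single byte E9.

```latex
% Assumes pdfLaTeX with UTF-8 as the default input encoding.
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% ^^c3 and ^^a9 are TeX's notation for the raw bytes 0xC3 and 0xA9.
% Read together as UTF-8, these two bytes form the single character é.
caf^^c3^^a9
\end{document}
```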

Herb Schulz
  • 3,410