8

I am looking for a table that maps German diacritics to their non-diacritic character combinations. For instance, such a table would indicate that the umlaut ü may be converted to ue.

Does such a table exist, and if so, could someone share a link to it?

RegDwight
Max
  • 6
    Note: "it may be converted", but it's definitely wrong (and ugly) to do it in German. – splattne Aug 07 '12 at 10:02
  • 1
    @splattne: it's for use in URLs, see http://webmasters.stackexchange.com/questions/33032/how-to-handle-urls-with-diacritic-characters#comment32233_33032. In that question I've given an example showing the Bundesliga using that approach, so I'm thinking it might not be that bad if such big companies are doing it. – Max Aug 07 '12 at 11:46
  • 1
    You're right, in URLs and Internet domain names it's absolutely okay. – splattne Aug 07 '12 at 17:49
  • 2
    See also http://german.stackexchange.com/questions/4994/is-the-eszett-ss-used-in-last-names (German ID cards use converted names in the machine-readable zone). – knut Aug 07 '12 at 18:10

4 Answers

17
  • ä → ae
  • ö → oe
  • ü → ue
  • Ä → Ae
  • Ö → Oe
  • Ü → Ue
  • ß → ss (or SZ)

SZ is only used for words in capitals (and, I think, only in the old spelling).

For LaTeX users there is also the notation "a, "o, "u and "A, "O, "U (or \"a, \"o, \"u and \"A, \"O, \"U).


You are also asking for a kind of official link.

Maybe DIN 5007 helps a bit. It is a standard for sorting. An "ä" is treated either like an "a" (variant 1 of DIN 5007) or like an "ae" (variant 2).
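The two variants can be pictured as two different sort keys. The following is only a minimal JavaScript sketch, not an implementation of the full standard: the names `din5007Key` and `sortGerman` are made up for this illustration, and keying ß as "ss" in both variants is my assumption rather than something stated above.

```javascript
// Hypothetical helper: builds a sort key in the spirit of DIN 5007.
// Variant 1 treats ä/ö/ü like a/o/u, variant 2 treats them like ae/oe/ue.
// Assumption: ß is keyed as "ss" in both variants.
function din5007Key(word, variant) {
  const umlautKeys = variant === 2
    ? { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue' }
    : { 'ä': 'a', 'ö': 'o', 'ü': 'u' };
  return word
    .toLowerCase()
    .replace(/[äöü]/g, (ch) => umlautKeys[ch])
    .replace(/ß/g, 'ss');
}

// Returns a sorted copy of the list, ordered by the key of the chosen variant.
function sortGerman(words, variant) {
  const keyed = words.map((w) => ({ w, key: din5007Key(w, variant) }));
  keyed.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return keyed.map((entry) => entry.w);
}

const names = ['Müller', 'Mueller', 'Muller', 'Mutter'];
console.log(sortGerman(names, 1)); // Mueller, Müller, Muller, Mutter
console.log(sortGerman(names, 2)); // Müller, Mueller, Muller, Mutter
```

With variant 1, "Müller" lands next to "Muller"; with variant 2, it lands next to "Mueller". Words with identical keys keep their input order, since the sketch does not define a tie-breaker.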

RegDwight
knut
  • 2
    We should add that if the whole word is in capitals, we may convert to AE, UE, ... – Takkat Aug 06 '12 at 21:37
  • 2
    "SZ" for "ß" is old and obsolete. Always replace "ß" with "SS" if the word must be written in capitals. In Switzerland, "ß" is not used. The Swiss name for the letter "ß" is "Doppel-s". In Germany "ß" is called "Eszett" and in Austria "scharfes s". – Hubert Schölnast Aug 09 '12 at 06:24
  • 3
    In the meantime there is also a capital ß. It is Unicode U+1E9E. – knut Aug 10 '12 at 19:04
  • 1
    The only occurrence of SZ for ß I ever heard of was on teletypes for military communications. – starblue Aug 10 '12 at 19:54
  • 4
    @HubertSchölnast: In Northern Germany it is called "eszett". In Southern Germany, as in Austria, it's "scharfes s". – celtschk Aug 13 '12 at 13:52
  • 2
    @starblue: AFAIK before the spelling reform, you were supposed to use SZ in cases where there's another word which is written with ss. For example, you'd capitalize "in Maßen" as "IN MASZEN" because there's also "in Massen" which gets capitalized as "IN MASSEN". – celtschk Aug 13 '12 at 13:56
  • As an aside, the conversion ß -> ss is (unfortunately) lossy. My understanding is that ß makes the preceding vowel long, but ss does not. – Att Righ Aug 05 '17 at 13:59
  • @celtschk In parts of Southern Germany we also call it Buckel-s. – Christian Geiselmann Feb 05 '19 at 14:30
3

If you want to replace the German umlauts while respecting case, you can use this JavaScript (open source, happy to share, all written by me):

const umlautMap = {
  '\u00dc': 'UE', // Ü
  '\u00c4': 'AE', // Ä
  '\u00d6': 'OE', // Ö
  '\u00fc': 'ue', // ü
  '\u00e4': 'ae', // ä
  '\u00f6': 'oe', // ö
  '\u00df': 'ss', // ß
};

function replaceUmlaute(str) {
  return str
    // An uppercase umlaut followed by a lowercase letter becomes Ue, Ae, Oe.
    .replace(/[\u00dc\u00c4\u00d6][a-z]/g, (a) => {
      const big = umlautMap[a.slice(0, 1)];
      return big.charAt(0) + big.charAt(1).toLowerCase() + a.slice(1);
    })
    // Every remaining mapped character is replaced directly from the table.
    .replace(new RegExp('[' + Object.keys(umlautMap).join('') + ']', 'g'),
      (a) => umlautMap[a]
    );
}

It converts:

  • Übung -> Uebung
  • ÜBUNG -> UEBUNG
  • üben -> ueben
  • einüben -> einueben
  • EINÜBEN -> EINUEBEN
  • and the same for Ä, Ö
  • and simple ß -> ss
  • 2
    It would help if you added which language that's in – PiedPiper Jan 24 '19 at 13:30
  • Thank you for the nice code! A couple of adjustments I made: the pipe (|) inside the [] character class is unnecessary, and it would erroneously match |u in Üb|ung. Also, there is now a capital ẞ. – Gman Jan 23 '23 at 08:53
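Picking up the last comment, a variant that uses plain character classes and also handles the capital ẞ (U+1E9E) might look like the sketch below. The names `germanMap`, `isLowercaseLetter` and `transliterateGerman` are made up for this illustration, and mapping ẞ to SS is my own assumption (capital ẞ normally appears only in all-caps text), not part of the answer.

```javascript
// Lowercase replacements; the uppercase forms are derived from these.
const germanMap = { 'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss' };

// Hypothetical helper: true if ch is a letter that changes when upper-cased.
function isLowercaseLetter(ch) {
  return ch !== undefined && ch !== ch.toUpperCase();
}

function transliterateGerman(input) {
  // NFC turns "u" + combining diaeresis (U+0308) into the single letter "ü".
  const str = input.normalize('NFC');
  return str.replace(/[äöüßÄÖÜẞ]/g, (ch, offset) => {
    const lower = germanMap[ch.toLowerCase()];
    if (ch === ch.toLowerCase()) return lower; // ä, ö, ü, ß
    if (ch === 'ẞ') return 'SS';               // assumption: ẞ only occurs in all-caps text
    // Ä, Ö, Ü: "Ue" before a lowercase letter, "UE" otherwise.
    return isLowercaseLetter(str[offset + 1])
      ? lower.charAt(0).toUpperCase() + lower.charAt(1)
      : lower.toUpperCase();
  });
}

console.log(transliterateGerman('Übung'));   // Uebung
console.log(transliterateGerman('EINÜBEN')); // EINUEBEN
console.log(transliterateGerman('STRAẞE'));  // STRASSE
```

Like the original, this leaves a lowercase ß inside an all-caps word as "ss"; whether that case deserves "SS" depends on the input you expect.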
3

For another official link I recommend the ICU project (International Components for Unicode). It is basically a database covering all(?) languages and scripts that describes how to convert, sort and compare words, for use by computer programs.

They have an ICU Transform Demonstration which demonstrates the transform rules. For German, you can start with "Latin" as "Source 1" and "ASCII" as "Target 1". The example "Names (Variant)" is also very impressive, as it can transliterate Korean(?), Cyrillic and Greek to Latin.

Shi
  • 1
    I think the Latin->ASCII transform you describe will convert "ä", "ö", and "ü" to "a", "o", and "u", respectively, not "ae", "oe", and "ue". – Rob Dec 20 '14 at 19:55
  • 2
    This was fixed in the latest release, ICU 60. You can now use the de-ASCII transform to convert German text into ASCII. – ausi Nov 07 '17 at 13:21
1

In addition to knut's answer:

In July 2017 the "Rat für deutsche Rechtschreibung" declared that the capital ẞ may be used as well as SS when capitalizing ß.

Therefore a capitalized ß results in SS or ẞ.

Please note that many fonts do not contain this letter.
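For code, this means a capitalization routine has to pick one of the two forms. As far as I know, JavaScript's built-in `toUpperCase()` follows the Unicode default and yields SS. The helper `capitalizeGerman` and its `useCapitalEszett` option below are made-up names for a small sketch that offers both results.

```javascript
// Hypothetical helper: upper-cases German text.
// useCapitalEszett = true  -> ß becomes ẞ (U+1E9E)
// useCapitalEszett = false -> ß becomes SS (the built-in behaviour)
function capitalizeGerman(text, useCapitalEszett = false) {
  // Swap ß for ẞ first, because toUpperCase() would otherwise expand it to SS.
  const prepared = useCapitalEszett ? text.replace(/ß/g, '\u1E9E') : text;
  return prepared.toUpperCase();
}

console.log(capitalizeGerman('Straße'));       // STRASSE
console.log(capitalizeGerman('Straße', true)); // STRAẞE
```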

mtwde