Should users be allowed to use any special character they want when creating a password?

Question

I came across a number of login configuration settings where there is a list of allowable special characters and was wondering:

Does this limitation cater for a specific security or usability need?

Example: a list of special characters supported by Oracle Identity Manager and Microsoft Active Directory for the password field:

[image: list of supported special characters]

Update:

Thanks everyone for the generous response!

Every time I have asked a question that involves security and usability, there seems to be a clear divide between proponents on each side. However, this need not be the case, as this is an area that requires a lot of compromises and trade-offs… UX depends on it!

There is a good thread about this topic over here:
http://security.stackexchange.com/questions/17192/why-disallow-special-characters-in-a-password — Tyrus, Jan 26 '15 at 17:53
Consider masked password entry. Since it is a password, it will go through rigorous testing. By limiting it to certain characters, the number of test cases is more manageable. Clearly you don't want to allow beep or tab down. There are a number of control-type characters that don't belong in passwords. — paparazzo, Jan 26 '15 at 18:08
The limitation generally caters for implementer-laziness, because they're worried that allowing arbitrary characters might break something elsewhere (which usually means they have bigger problems). Limiting passwords to specific characters drives me crazy, and there's no (good) reason for it ever (unless you're worried that not doing it will lead to injection attacks, in which case, fix the code you're worried about injection attacks in, and then fix your password validation). — neminem, Jan 26 '15 at 19:17
UX people, developers are not the enemy (nor lazy). For something like a password, I would wire up test cases for all possible combinations. There are only so many CPU cycles in a day. @neminem What would be the purpose of allowing a beep or tab down? What would be the purpose of a character not on the keyboard? — paparazzo, Jan 26 '15 at 19:26
@Blam The purpose would be not limiting your password, without having to hardcode anything. Who's going to put a non-keyboard symbol in, and if they tried, so what? By contrast, if I were, say, Chinese, and had a Chinese keyboard (or for that matter, French with a French one), I'd be sort of annoyed if my password had to be in English (because Chinese symbols are "not on the [standard American] keyboard", either.) I would argue you really don't have to test every possible password, just test "it works in Unicode", done. (And make sure you can't get sql injection attacked.) — neminem, Jan 26 '15 at 19:50
p.s. I'm not a UX person, I'm a developer. :p But more importantly, I'm a user. It drives me crazy when sites do this - just that, as a developer, I know why they're usually doing it. :p — neminem, Jan 26 '15 at 19:51
@neminem Just test in unicode? Unicode has 60 some thousand characters. Really you know why Microsoft Active Directory is limited. You know for a fact it is because they were lazy. And if you were a developer you would know you don't protect from SQL injection attacks by limiting characters - you do it with parameterized queries. And a hash is binary - it is not even character aware. — paparazzo, Jan 26 '15 at 20:02
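
The point about parameterized queries can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table layout, the `store_user` helper, and the sample password are assumptions made for illustration; nothing in the thread specifies them. Because the values travel as parameters and the stored password is a hash (plain bytes), quotes and semicolons in the password never become part of the SQL text.

```python
import hashlib
import os
import sqlite3

def store_user(conn, username, password):
    """Store a salted hash; values are bound as parameters, never spliced into SQL."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    conn.execute(
        "INSERT INTO users (name, salt, pw_hash) VALUES (?, ?, ?)",
        (username, salt, digest),
    )

# Hypothetical in-memory table, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, salt BLOB, pw_hash BLOB)")

# A password full of SQL metacharacters is stored like any other.
store_user(conn, "alice", "x'; DROP TABLE users; --")
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```
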
@Blam Right, obviously you don't have to test every possible unicode character, you test a random sampling of them, same as you don't have to test every string length, you test 0, 1, a few, and a random large number. And yes, smart people don't protect from injection attacks by limiting characters... but that doesn't mean that people don't still try protecting from injection attacks by limiting characters (usually, as Nathan Rabe points out, the same people also most likely to end up on plaintextoffenders.) — neminem, Jan 26 '15 at 21:13
@neminem First it was test unicode and now it is test a sample. Really you would allow control characters like beep and down tab. You would allow smart quotes when most people don't know what they are. You would allow visual characters that have more than one unicode. You would allow ½ when some normalization would break that out. Maybe YOU don't fully test what goes in a password but I do. — paparazzo, Jan 26 '15 at 21:34
@neminem: Unicode has a lot of tricky rules and edge cases. In the old days, code could be character-set agnostic for any sequence of bytes that didn't contain nulls, but that's no longer the case. For usages other than passwords, the fact that a database sometimes represents "mañana" as six code points and sometimes represents it as seven may be an annoyance, but it can be resolved by having a search for "mañana" look for both forms. Even if a string has many characters that could appear in multiple forms, it may be possible to search for all forms simultaneously. — supercat, Jan 27 '15 at 00:40
@neminem: Unfortunately, such an approach doesn't really work for passwords. If a password contains eight characters, each of which might appear in two different forms, the only way to find out if it is equivalent to a particular hashed password is to try hashing all 256 combinations of characters to see if any of them work. Nasty. — supercat, Jan 27 '15 at 00:41
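
The two figures in these comments (six versus seven code points, and 256 combinations) can be checked with a short sketch. It assumes each of the eight characters has exactly two forms, a precomposed one and a base letter followed by a combining mark; the sample string and the `spellings` helper are made up for the demonstration.

```python
import itertools
import unicodedata

word = "mañana"
composed = unicodedata.normalize("NFC", word)    # ñ as a single code point
decomposed = unicodedata.normalize("NFD", word)  # n followed by a combining tilde
lengths = (len(composed), len(decomposed))

def spellings(text):
    """Yield every mix of composed and decomposed forms, character by character."""
    options = []
    for ch in unicodedata.normalize("NFC", text):
        forms = {ch, unicodedata.normalize("NFD", ch)}
        options.append(sorted(forms))
    for combo in itertools.product(*options):
        yield "".join(combo)

# Eight accented letters, each with two possible encodings: 2**8 candidates to hash.
variant_count = sum(1 for _ in spellings("éèêëáàâä"))
```

Normalizing once before hashing avoids having to try every candidate.
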
"The grave accent cannot be reproduced in this document" --- seriously? What half-baked markup language are they writing it in? — Federico Poloni, Jan 27 '15 at 17:21
If there is a password character restriction, there is a very good chance that they are storing your password unhashed. — AKS, Jan 27 '15 at 18:03
One VERY important thing to keep in mind when allowing arbitrary characters is NORMALIZATION. If two strings contain the same characters, but are normalized differently, their hashes will be different. — Cole Tobin, Jan 27 '15 at 20:48
@FedericoPoloni: Actually, that makes sense. TRWTF is the fact that they chose a grave accent in the first place, not to mention the following statement that it's "also known as the backquote character". What nonsense. A backtick and an accent on a character are two very different things. They should have written "backtick" in the first place and stuck with it... and then they'd be able to reproduce it. :) — Lightness Races in Orbit, Jan 27 '15 at 23:21
@Blam Unicode is a 21-bit character set. It has far more than "60 some thousand characters", although most of the 2^21 code points are not allocated, and may never be. — David Conrad, Jan 28 '15 at 18:54
@DavidConrad Yes, so. It is bigger than 128. A lot of the lookups are like in the 60000 — paparazzo, Jan 28 '15 at 19:07
From my school: "The dis-allowed characters are mostly to prevent cleverly-constructed passwords from causing harm to Unix-based systems. [...] There are, of course, other ways to guard against those sorts of problems, but keeping certain characters out of passwords in the first place is the simplest, and can be controlled centrally. That's in stark contrast to the way Unix/Linux systems that use netid authentication are managed - by numerous schools, departments or individuals. It would be prohibitively difficult to ensure that each application & server had all the right safeguards in place." — Nick T, Jan 28 '15 at 19:08
The more characters available to a user, the harder it is to guess their password. If you were trying to brute force a password and knew that the user couldn't use the @ symbol, you could automatically rule out every string that contains an @ symbol, whereas with the @ symbol accepted there's a lot more combinations the brute forcer would have to go through. — Pharap, Jan 29 '15 at 16:23
If we are following the principle that four dictionary words is enough entropy to create a strong password, then all you need is lowercase letters (maybe not even space). Tell everyone: "type four uncommon words without spaces, caps, digits, special characters, etc and it will be good enough." That solves all the problems. — , Mar 14 '17 at 20:06
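
The last two comments can be put into numbers with a rough sketch. It assumes passwords are chosen uniformly at random, which real users rarely do, so the figures are upper bounds. The 94-character alphabet is printable ASCII without the space, and the 7776-word list size is an assumption borrowed from Diceware-style word lists.

```python
import math

def search_space_bits(alphabet_size, length):
    """Bits of entropy for a uniformly random string over the given alphabet."""
    return length * math.log2(alphabet_size)

with_at = search_space_bits(94, 8)        # 8 characters, '@' allowed
without_at = search_space_bits(93, 8)     # 8 characters, '@' forbidden
four_words = search_space_bits(7776, 4)   # four words from a 7776-word list
```

Under these assumptions, banning a single character removes only a fraction of a bit, while four random words land in the same range as eight random printable characters.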

score 121 · Accepted Answer · edited Nov 05 '18 at 06:37

If the user can type it then it should be allowed in their password.

Telling someone what they can and can't use in their password always feels wrong to the user. Passwords are currently the most universal way to authenticate. Preventing users from entering anything is, in essence, telling them who they can or can't be.

1. Any printable character that a user inputs should be allowed.

The following characters are okay...

'A', 'a', 'á', 'Æ', 'æ', 'Ñ', 'ñ', '-', '_', ' ' (space), '\t' (tab), '\n' (newline), ...

Just because I don't know how to submit a TAB or ENTER character as part of my password doesn't give me the right to prevent others from doing so.
(Don't worry, few people will try to submit an ENTER character as part of their password but allowing the few that do will earn their respect.)

2. Keys that don't display a printable character should not be allowed.

The reasons for this should be obvious but for completeness I will mention the following keys which can be detected but are reserved for other actions. For example, a password input shouldn't record that the shift was hit multiple times...

[ctrl], [alt], [shift], [arrow keys], [apple key], [windows key], etc.

3. Not allowing certain characters makes users question your security.

When you prevent users from putting certain characters in their password it not only annoys people but causes many of them to question what else you are doing that isn't secure.

You may as well be saying...

"Hey we don't want to fix our application to properly deal with special characters so would you mind helping us out by making your password less secure?"

The rules below will allow for secure input while preventing a user from ever getting stuck:

Don't show the characters that the user is typing in password fields (there are some exceptions on mobile)
Having the user type in their password twice is usually sufficient in letting them know that they got it right (i.e. didn't accidentally add unintended white space etc.)
Having a password reset mechanism is important to handle any cases of accidental lockout.

4. Encouraging a user to add more isn't the same as prohibiting characters.

One way to help a user come up with a secure password is to make a game out of it...

[image: password strength indicator]

5. The future of authentication

"The Tech That Will Kill Passwords Dead" is a pretty good gizmodo article discussing the problems we all face with passwords. It also talks about some new patterns that could possibly replace passwords one day.

Many mobile applications are starting to allow users to show or hide passwords in plain text in order to increase ease of use and remove one barrier to entry. I would still avoid this because the problem it creates is worse than the problem it solves.

Even with a very intuitive mechanism for showing/hiding a password 60% of users still say it feels wrong to see passwords in clear text.

According to that same article it appears that Touch ID is on the right track and easily wins as far as usability is concerned. Touch ID still has some major problems that make it impractical. The biggest being that it only works on select devices and has issues with one person controlling multiple accounts.

Facial recognition is another contender as an increasing number of high pixel density cameras make their way into the world but this approach often leaves people worrying about privacy.

The one problem shared by all of these new authentication attempts is this: You are the password.

It's actually a lot easier to fake who you are than what you know. In addition, once you've been compromised, it's nearly impossible to change who you are (your fingerprints, for example).

Passwords will be the authentication mechanism of choice for a good long while, so...

Please don't place arbitrary character limitations on my password. Thanks!

edited Nov 05 '18 at 06:37 by Adriano

answered Jan 26 '15 at 17:44 by DaveAlger

Sites that only allow certain characters always make me worry that they are storing/transmitting the text of my actual password instead of a hash. – Nathan Rabe Jan 26 '15 at 17:50
@NathanRabe I would think the opposite. If they are confining it I would think more controls in place. – paparazzo Jan 26 '15 at 18:10
@Blam If a site says I can't use a character like @, %, $, or ; in my password, to me that means some script will be reading my password and they know those characters will mess something up. Even if the only time they parse the characters is to generate a hash, it seems like bad coding if it was possible to inject code via a password and all they did to fix it was disallow certain characters. – Nathan Rabe Jan 26 '15 at 18:45
@NathanRabe It was just a comment and my opinion. I am they - I am a developer. – paparazzo Jan 26 '15 at 18:56
@NathanRabe Once there was a site that didn't allow '&' because the password was sent via GET, and it fucked up the URL string. It was surprising when the 'dev' told me he didn't know of encodeURIComponent... even though the password was stored hashed. – rev Jan 26 '15 at 20:56
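
The fix mentioned here is percent-encoding. As a sketch, the Python counterpart of JavaScript's encodeURIComponent is `urllib.parse.quote` with an empty `safe` set; the password and the `example.invalid` URL are made up for illustration. Sending a password in a query string remains a poor idea for other reasons, since URLs tend to end up in logs.

```python
from urllib.parse import quote, unquote

password = "p&ss=w0rd?#"

# safe="" makes quote() escape every reserved character, including '&' and '='.
encoded = quote(password, safe="")
url = "https://example.invalid/login?pw=" + encoded

# Decoding on the receiving side restores the original password.
round_trip = unquote(encoded)
```
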
Even tabs or other non printable characters should be allowed; most users can't type them and so won't use them, but if someone is copy pasting or it's an automated script which uses those then let them do so. – Jan 26 '15 at 21:09
Just because it's a key doesn't mean it's a character (ctrl, shift, arrows, etc.); but if it's a character it should generally be allowable (including tab). If I want §™╔┌çÅ╡Θ as my password, I should be permitted to do so. I would probably make exception for non-printing characters such as ␀, ␇, ␌ and such (and the backspace character, because we want people to be able to edit a mistyped password!), but "non-printing" does not include the likes of ␉! (horizontal tab) – Brian S Jan 26 '15 at 21:23
Updated: I agree that even tab shouldn't be prevented if you can figure out how to get that character in there then go for it! – DaveAlger Jan 26 '15 at 21:26
@AndréDaniel What should we do with this password: 12345\b67890 where \b is the backspace character? What about this one: 12345\067890 where \0 is the null character, which is used to terminate strings? Control characters are intended to do weird things with text that are not appropriate in a password. If a user copy/pastes something that contains strange non-printable characters, they may not even be aware that they're present. Best to exclude non-printing characters. Then the user knows exactly what's in the password, and whoever wrote that bogus nonsense can fix their stuff. – jpmc26 Jan 26 '15 at 22:37
@DaveAlger It's fine to exclude characters that are used for behavior on a web page, such as tab which will move focus to another field. I'd exclude newline characters for this reason, too, since Enter should usually submit a form. Excluding characters that will actively detract from a normal user experience (by making it overly difficult to type the password) is fine. At a minimum, I'd say warn the user that it could cause problems. – jpmc26 Jan 26 '15 at 22:38
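
The "warn the user" option could look roughly like the sketch below. It treats the Unicode general categories Cc (control) and Cf (format) as invisible; that choice, and the `invisible_characters` helper name, are assumptions rather than anything the thread settles on. Note that tab and newline are also in category Cc, so a caller who wants to allow them would have to exempt them explicitly.

```python
import unicodedata

def invisible_characters(password):
    """Return (index, code point) pairs for control and format characters."""
    found = []
    for index, ch in enumerate(password):
        if unicodedata.category(ch) in ("Cc", "Cf"):
            found.append((index, "U+%04X" % ord(ch)))
    return found

# The backspace example from the comment above: '\b' is U+0008.
warnings = invisible_characters("12345\b67890")
```

A form could use a non-empty result to show an overridable warning instead of rejecting the password outright.
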
Really anything I can type? How about duplicates? These are not the same character - ☉⊙. How about diacritics (which can be broken out)? Characters not supported in HTML? How about normalization in some environments that breaks out ½ into 1/2? A password from a character set of 20 is not less secure than a password from a character set of 2000 - it just might need to use more characters. Numbered Swiss bank accounts are numbered. A 32 character hash of a password is represented as 0-9 A-F. Actually the hash is binary - 0 or 1. Allowing characters that are problematic is not good UX. – paparazzo Jan 26 '15 at 23:44
@jpmc26 it doesn't matter whether the characters can be typed or produce weird stuff with text. If the user submitted them successfully then it means he is able to type/enter them and the system should accept that password just fine. The system should accept any block of data as the password, the only constraint must be length so that attackers can't upload 1GB files and DoS the server by forcing it to hash that enormous file. – Jan 27 '15 at 01:13
@AndréDaniel Yeah, until they go to type the password without the control characters and can't log in because they didn't know the invisible characters were there. That sounds like great UX, doesn't it? You're not talking about arbitrary data in a file. You're talking about designing your system for typing the password in. Control characters are simply not helpful for that. Sure, if you wanna use key files that they upload or something, go for it, but as long as you're talking typing, control characters are not helpful. – jpmc26 Jan 27 '15 at 02:12
@DaveAlger agree with your response, except that having a function to unmask the typed password, instead of asking users to confirm their password, would be particularly helpful in this case, as users make more errors when they can't see what they're typing. Ultimately this helps in avoiding login failures. – Okavango Jan 27 '15 at 06:58
@DaveAlger would you know if smartphone soft-keyboards have all the characters that exist on the desktop keyboards? If they are not the same, then it might be useful to exclude those characters that are not available in the soft-keyboard on the smartphones. – Ades Jan 27 '15 at 07:06
@jpmc26: At most, you might warn the user that their password could be hard to type. But I, for example, use a password manager. I don't type passwords! If I have to type a password, something has already gone wrong in UX. The sorts of power users that copy-paste generated passwords are often those that then use the same automated tools to enter those passwords; forbidding weird characters is a misguided attempt at coddling those who are deliberately taking security and usability into their own hands. – Nathan Tuggy Jan 27 '15 at 07:13
@jpmc26 great UX is not when the system makes me lose time just to accommodate some idiot who has weird characters in his password (and I really don't see how this can happen; if he's typing the pass each time then he has no way of entering those chars when setting the password, and if he has a "passwords.xls" file then he's copy/pasting and even if there are weird chars they'll still get copied just fine). The system should consider whatever was submitted when setting the pass as a key, and require that exact same key when logging in; even binary data should be accepted. – Jan 27 '15 at 13:00
+1 except for limiting password length to 64 characters. As the owner of the server, there is really only one concern which should limit password length: the time & memory it takes to process. Using modern computers and hashing algorithms means limiting the password size to 1000 characters or even more should not have any recognizable impact on your service. – l0b0 Jan 27 '15 at 14:48
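
A length cap that protects the server without restricting characters might be sketched as follows. The 4096-byte limit, the iteration count, and the `hash_password` helper are placeholders chosen for illustration, not recommendations from the thread. The cap is applied to the encoded bytes, since that is what the hash function actually processes.

```python
import hashlib
import os

MAX_PASSWORD_BYTES = 4096  # hypothetical cap; choose one from measured hashing cost

def hash_password(password, salt=None, iterations=100_000):
    """Hash any Unicode password, rejecting only inputs that are unreasonably large."""
    data = password.encode("utf-8")
    if len(data) > MAX_PASSWORD_BYTES:
        raise ValueError("password too long")
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", data, salt, iterations)
    return salt, digest

# Arbitrary characters are fine; only the size is checked.
salt, digest = hash_password("§™╔┌çÅ╡Θ")
```
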
I'm open to updating the max password length with some data backing it. Limiting the length is certainly not done to help the user; it is done to protect yourself. A lot of attacking bits can fit in 1000 characters. Microsoft says that 16 characters has been the limit for years, though I think this is stupidly short. -- http://thenextweb.com/microsoft/2012/09/21/this-ridiculous-microsoft-longer-accepts-long-passwords-shortens/ – DaveAlger Jan 27 '15 at 14:59
Thanks for the help fleshing out this answer everyone. I'm always looking for a more user friendly way of identifying you as you. Passwords are what we have right now. Limiting passwords without good reason is like telling someone who they can or can't be. – DaveAlger Jan 27 '15 at 15:00
@AndréDaniel Never let your inability to envision a user doing something fool you. All that would be required is that they copy/paste the password from one place and then try to type it in when the original source is unavailable, thinking they know what they typed in. In today's multi-device world, that could easily happen. I like this quote: "So why would a user do that? Because the user could do that." -Why Would a Baseball Player Do That? Also, if you have binary, you can just convert it to hex for the password. – jpmc26 Jan 27 '15 at 22:48
What's a standard keyboard? - I don't have most of the characters you mentioned on mine (e.g. Ñ). – Danny Varod Jan 27 '15 at 23:17
@daveAlger Just read the article in your response, really interesting! Guess the devil is in the detail! so it all boils down to how password masking is implemented. – Okavango Jan 28 '15 at 12:07
@NathanTuggy I would never set a password that wasn't typeable on both a PC and a phone. There are always edge cases like websites that won't let you download pdf etickets on their mobile site (and the button is off-screen if you force the desktop version) so you have to log on to someone else's machine and print. If you use e.g. KeePass you can easily display the password on your phone and retype on the borrowed box. You've entered a password on an untrusted box, but not a master password of any kind. (I have seen all the fails I mention, luckily not all at once). – Chris H Jan 28 '15 at 20:46
"Chances are anyone doing this knows more than you" This is an assumption, and a bad one at that. I've seen plenty of ways for a user to accidentally/unintentionally include whitespace characters when setting up a password, and thus later when they try to enter the password they don't know why it doesn't work. I certainly agree with the spirit of your answer in terms of not wanting to get in the way of what the user wants to do, but there are certain limitations in passwords intended to avoid common problems. – AaronLS Jan 29 '15 at 01:58
"Passwords are currently the most universal way for people to say who they are." wat – bjb568 Jan 29 '15 at 05:00
"Passwords are currently the most universal way to authenticate" edit – DaveAlger Jan 29 '15 at 05:21
Still, Google didn't allow me to use the special chars of my German keyboard for my password (didn't check for 1 year though, I'm relying on 2FA). I'm not sure whether this is because I probably have to type it on a smartphone, but this annoys me. – Sebb Jan 29 '15 at 10:59
tab and return should NOT be allowed, else how do you mark the end of a password? – JamesRyan Jan 29 '15 at 13:04
I like this answer, but... One time I signed up for something with my_email+tag@gmail.com so I could filter information from them (school district). No problems, email worked as desired, filtered as desired, etc. Then, when I no longer needed emails from that district I COULD NOT UNSUBSCRIBE because their email parser for unsubscribe was more rigorous and didn't like + symbols. Moral of the story is be consistent with your filtering. :-) – kmort Jan 29 '15 at 16:46
When I type in passwords in my mobile phone, it is very difficult to get them right, as my fingers often hit the wrong key. Not seeing what I type makes it almost impossible to type in correct passwords. One further problem is that my phone automatically adds a space after a comma or full stop (thinking a sentence has ended, so I get a space that I cannot see and correct). Invisible passwords are a pain and idiotic. – Jun 09 '15 at 10:11

score 35 · Answer 2 · answered Jan 26 '15 at 19:40

If a site requires that passwords only contain certain character codes, then a user will be able to enter the password into almost any device which is capable of producing those characters. If the password contains character codes which may be entered on some devices but not on others, then a user who creates a password on a device which could enter the codes contained therein but then later needs to log in with a device that can't, would be effectively locked out of his account.

On almost any reasonable platform, the 94 printable ASCII characters will be clearly distinct. Even if a font annoyingly uses identical glyphs for I and l, or for 0 and O, people who enter such characters will generally have no doubt about which they entered. By contrast, on some platforms a user might think he's entering a character like ɸ when he's actually entering a φ; if such a user moves to a machine where characters are entered differently, he may be unable to access his account unless or until he can figure out what characters he might have used in entering his password.

Things get further complicated if one factors in things like combining diacritical marks. Some characters like ë have two legitimate representations--either a single "Latin Small Letter E With Diaeresis" [code 0x000EB] or a "Combining Diaeresis" [code 0x00308] followed by "Latin Small Letter E" [0x00065]. Some devices may not allow the user to control which form is entered. Ideally the password would be converted to a known normalized form prior to hashing, but it's far from certain that all code which tries to "normalize" Unicode strings will always work the same way (even if it's specified to, that doesn't mean it actually will).
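
The effect of the two representations on a hash, and of normalizing first, can be shown with a small sketch. In Unicode the combining mark follows its base letter, so the decomposed form here is "e" followed by U+0308. SHA-256 stands in only to keep the example short; a real system would use a slow, salted password hash. The `digest` helper is made up for the demonstration.

```python
import hashlib
import unicodedata

precomposed = "\u00eb"   # LATIN SMALL LETTER E WITH DIAERESIS
combining = "e\u0308"    # LATIN SMALL LETTER E + COMBINING DIAERESIS

def digest(password, normalize=True):
    """Hash the UTF-8 bytes, optionally normalizing to NFC first."""
    if normalize:
        password = unicodedata.normalize("NFC", password)
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Without normalization the two spellings of the same visible letter hash differently.
raw_match = digest(precomposed, normalize=False) == digest(combining, normalize=False)

# After NFC normalization both collapse to the same code point and the same hash.
normalized_match = digest(precomposed) == digest(combining)
```
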

answered Jan 26 '15 at 19:40 by supercat

I understand this point, however, just because you allow all printable characters doesn't mean people will do it. Preventing the few who try isn't a very good user experience... "Sorry that character can't be typed on other devices and is not allowed in your password" - Dang it I don't have any other device! – DaveAlger Jan 26 '15 at 20:41
@DaveAlger: Although code could accept most Unicode characters without difficulty, some really shouldn't be, and there's no easy way to determine which ones those are. Even if code had a list of 70,000 characters which were known to be problem-free, a list of 94 permissible characters will probably be easier for users to work with than would be a list of 70,000. – supercat Jan 26 '15 at 21:05
Actually, the Unicode combining diacritic marks have to be entered after the characters they are supposed to decorate. – O. R. Mapper Jan 26 '15 at 21:43
If the user chooses to pick those characters, you can trust them to know how to enter them. By far the most common use case for such users will be copy-pasting from a password manager, anyway, where this won't be an issue. – sapi Jan 26 '15 at 22:04
@sapi Many users don't realize certain characters can't be typed on different devices, so it can depend on your demographic. For example, on desktop computers | will be entered, even if the keyboard shows ¦. However, on older Android devices, ¦ would be entered instead. So, sometimes it is necessary to protect your users from themselves, although an overridable warning is better than a restriction. – 0b10011 Jan 26 '15 at 22:16
@sapi: If one wishes to safely allow non-ASCII characters, one approach would be to say that if a password contains any non-ASCII characters or unbalanced ASCII-brace characters, all non-ASCII characters and any ASCII brace characters will be replaced by their character code, enclosed in braces before hashing. Such behavior would generally be transparent to the user, but would provide a means by which a user whose password contained e.g. the codepoint sequence 0030B+00065 could enter it, even if his system would normally replace it with 000EB. – supercat Jan 26 '15 at 22:29
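
One possible reading of this proposal is sketched below. The five-digit hexadecimal format, the choice of curly braces, and the helper names are assumptions; the comment does not pin them down. The idea is that a user who cannot type a character can spell out its code point in ASCII and still arrive at the same string before hashing.

```python
def braces_balanced(text):
    """True if every '{' has a later matching '}' and none closes early."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

def escape_for_hashing(password):
    """Replace non-ASCII and brace characters with {XXXXX} codes when escaping is needed."""
    if password.isascii() and braces_balanced(password):
        return password  # plain ASCII with balanced braces is hashed as typed
    pieces = []
    for ch in password:
        if ord(ch) > 127 or ch in "{}":
            pieces.append("{%05X}" % ord(ch))
        else:
            pieces.append(ch)
    return "".join(pieces)

typed_directly = escape_for_hashing("e\u0308")    # 'e' plus a combining diaeresis
typed_as_ascii = escape_for_hashing("e{00308}")   # the same password spelled in ASCII
```
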
@sapi Not necessarily. Depending on the OS you're using, what looks like the same character to them may result in different codepoints. One example is the Won (Korean currency) sign: ₩ has the Unicode codepoint U+20A9, but on a Korean Windows, the codepoint U+005C, which usually stands for a backslash, renders as the won sign. The combining characters supercat mentioned are another potential issue, albeit one that's easier to fix via Unicode normalization (I recommend normalizing to Form C). – CodesInChaos Jan 27 '15 at 17:07
@CodesInChaos: I find it somewhat surprising that international versions of Windows would change the meanings of code points, especially 32-126; a more interesting question would relate to the behavior of files which would appear like they could either be ASCII or UTF-8. If 0x5C is a Won symbol, is there any character which is semantically equivalent to a 0x5C backslash? As for normalization, that would suffice if one could guarantee that anything that normalizes the password prior to hashing will always treat any given string the same way, but I don't think it's easy to guarantee that. – supercat Jan 27 '15 at 18:01
I think the gap in historical renderings of ASCII character 0x7C is there to ensure that when rendered with a 7-pin printer it would not be confusable with "I", "l", or "1". Are you saying that Android devices don't have a key for code point 0x7C but instead have one for some other code point that renders as "¦", or that Android's fonts include a gap in their rendering of code point 0x7C? If the former, what do programmers on Android use for the "or" operator? – supercat Jan 27 '15 at 18:06
@supercat Korean versions of Windows still use 0x5C as path separator, it just looks different. Something like C:₩Windows₩System32₩. I guess the reasons for this date back to code pages under DOS. I vaguely remember that a few codepoints below 128 were not fixed, \ being one of them. In some northern European countries brackets []{} were used as letters, which lives on in IRC case insensitivity rules treating some of those symbols as equivalent. – CodesInChaos Jan 27 '15 at 18:51
On a German keyboard, Y and Z are swapped (compared to an English keyboard), and most special characters are moved around. A keyboard is a device, so following your arguments, you should forbid Y, Z, and special characters as well. As well as A and digits, since they are different on French keyboards... – oefe Jan 27 '15 at 20:38
@oefe: When keyboards are set to input Latin alphabet (as opposed to Greek, Cyrillic, etc.), knowing that a key generates a character that looks like Y is a pretty strong indication that the key is generating code point 0x00059, even if the key is positioned to the left of "X" on the bottom row and is labeled "Z". It's far less clear whether e.g. a mathematician's keyboard layout which includes a "א" key should have it generate a single character, or have it also generate marks to control writing direction (having aleph-1234 appear as "א1234" may seem normal to someone writing Hebrew... – supercat Jan 27 '15 at 21:20
...but would seem very odd to a mathematician.) – supercat Jan 27 '15 at 21:20
@supercat in the password dialog, they all look the same. – oefe Jan 27 '15 at 22:15
Unicode normalization forms are very well specified, so there shouldn't be a problem with diacritics as long as they are normalized before hashing. – David Conrad Jan 28 '15 at 19:32
@DavidConrad: Is the equivalence mapping among code-point sequences locked in stone for all time, such that code which correctly implements today's rules will know how to correctly handle all codepoints that will ever be defined? If so, when was that set of rules established? – supercat Jan 28 '15 at 19:59
@supercat No, your implementation of the normalization algorithm would have to be updated from time to time as new code points are allocated for diacritics, but 1) that's going to be pretty uncommon going forward, as Unicode already has a very rich set of diacritics, 2) it's all table driven, so you only have to incorporate the new Unicode data, not change logic, and 3) it's probably done by your language runtime, so you get it "for free" when you upgrade. – David Conrad Jan 28 '15 at 23:43
@DavidConrad: And what happens if a user with a newer Unicode implementation sets his password on a system with an older implementation that doesn't know about a certain combining sequence, but the system then gets upgraded so it does know about it? – supercat Jan 29 '15 at 00:37
@DaveAlger 1) They install a character set on the current computer 2) change domain account password using language-specific characters 3) User gets a new machine, can't log in because they don't have that character set installed currently. Admin has to install it for them. The fact that someone uses unicode characters doesn't somehow make them a computer genius such that they "know what they're doing". Many lay people use languages that they use with friends/family, and wouldn't think twice about using unicode characters in a password, not realizing the potential consequences. – AaronLS Jan 29 '15 at 01:53
I agree @AaronLS - password reset will always be needed for this and other such cases... – DaveAlger Jan 29 '15 at 02:05
@supercat In that incredibly unlikely scenario, they would simply reset their password, the same as if they had forgotten it. – David Conrad Jan 29 '15 at 04:02
@supercat Normalization is stable for any assigned codepoint. So if you want to be on the safe side, reject unassigned codepoints. http://unicode.org/policies/stability_policy.html – CodesInChaos Jan 29 '15 at 13:42
@CodesInChaos: How should one convey to users what codepoints they're allowed to use? – supercat Jan 29 '15 at 14:11
@DavidConrad: Not all systems accommodate a "forget password" feature--certainly not for all accounts. – supercat Jan 29 '15 at 14:12
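
The suggestion from the comments above (normalize before hashing, and refuse unassigned code points so that a later Unicode upgrade cannot change the result) could be sketched roughly like this in Python. The function name `normalize_password` is made up for illustration, and the choice of NFC is an assumption; which code points count as "assigned" depends on the Unicode tables bundled with the runtime.

```python
import unicodedata


def normalize_password(password: str) -> str:
    """Return the NFC form of the password, refusing unassigned code points."""
    for ch in password:
        # Category "Cn" means the code point is unassigned in the Unicode
        # version shipped with this Python build (unicodedata.unidata_version).
        if unicodedata.category(ch) == "Cn":
            raise ValueError(f"unassigned code point U+{ord(ch):04X}")
    return unicodedata.normalize("NFC", password)
```

The same function would have to run both when the password is set and on every login attempt, before the result is handed to the hashing step.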

Mayo · Answer 3 · 2015-01-26T19:19:56.827

I would like to add to DaveAlger's point. I, like many people, create algorithms in order to better remember passwords. I've spoken to many people (in an informal manner) about passwords and I have heard a lot of objections:

why can't I use a part of my email or my username in my password?
why is there a character limit? (affects my algorithm)
why can't I use special characters?
why must I use upper case, or numbers, or special characters?

Just about everyone is frustrated when they're forced to change their password. If the submitted password is considered insecure (for example 1234, 1111 or qwerty), then tell the user that it was rejected because it is considered insecure. But make certain that the password really is insecure, rather than merely failing some individual requirement.

Password blueOrangeMetsWasSheaNowCiti is safer than #$78rt even though it doesn't use numbers or special characters.
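
A rough way to see why is to compare the size of the search space for each password. The sketch below assumes every character is picked uniformly at random from its alphabet, which overstates the strength of a passphrase built from real words, so treat the first figure as an upper bound. The helper name `keyspace_bits` is made up for illustration.

```python
import math


def keyspace_bits(alphabet_size: int, length: int) -> float:
    """Bits needed to index every string of `length` symbols from the alphabet."""
    return length * math.log2(alphabet_size)


# 28 characters drawn from upper- and lower-case letters only (52 symbols).
passphrase_bits = keyspace_bits(52, len("blueOrangeMetsWasSheaNowCiti"))

# 6 characters drawn from all 95 printable ASCII characters.
short_bits = keyspace_bits(95, len("#$78rt"))

print(f"passphrase: about {passphrase_bits:.0f} bits")
print(f"short password: about {short_bits:.0f} bits")
```

Even with the caveat about dictionary words, length dominates: each extra character multiplies the search space, while a bigger alphabet only raises the base.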

Limiting does not help usability as it can only frustrate users and, while I'm not a security expert, I cannot see how limiting a character set can, in any way, aid in securing a system.

good point mayo! saying "Be sure to use a mix of characters that is hard for others to guess" is different than saying, "Don't use special characters in your password!" — DaveAlger, Jan 26 '15 at 19:22

Andrew Hoffman · Answer 4 · 2015-01-26T20:58:34.633

3

It can make sense from a usability and support perspective, if the character can't be typed on a keyboard or phone without using alt codes or copy-pasting.

Keep in mind that the most active internet enabled devices have touch screens. Your user could create the account from their laptop, then try to access the account with their phone, which isn't capable of entering in the Æ character.

All whitespace characters should be cleaned up into a generic space: trim them from the front and back, and treat multiple whitespace characters in a row as a single space. Why? Because lots of things have issues with whitespace, especially leading, trailing, and multiple whitespace characters in a row.

While you should probably allow whitespace (people like to use phrases now), you might want to inform your user how you tidied up their password, but that they don't need to worry about it if that is how they are used to entering it.
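
The cleanup described above can be sketched in a few lines of Python. The function name `tidy_whitespace` is made up for illustration; the point is only that the same function is applied when the password is created and again on every login.

```python
def tidy_whitespace(password: str) -> str:
    """Trim the ends and collapse each run of whitespace to one ASCII space."""
    # str.split() with no arguments splits on any Unicode whitespace and
    # drops leading and trailing runs, so joining with " " does all three steps.
    return " ".join(password.split())
```

Because creation and authentication both go through the same cleaning, a user who types an exotic space character still ends up with the same stored value each time.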

Also allowing unicode characters that then need to be piped over HTTP is another potential support ordeal.

edited Jan 26 '15 at 20:58

answered Jan 26 '15 at 20:52 by Andrew Hoffman

"you might want to inform your user how you tidied up their password" is a terrible concept. Looking at the Unicode character database, there are 11 white-space characters. By changing "Medium Mathematical Space" to "SPACE", you've effectively broken the ability for the user to enter their password. On a similar note, as a website designer, you shouldn't discourage users from copy-pasting their passwords; utilities such as KeePass lead to more user-unique passwords, which is better for our entire industry. – AWinkle Jan 26 '15 at 21:56
If my computer is set up to easily input Æ,ß,ž or 말, then nowadays you can be pretty sure that my phone can also do the same because I'll be using those characters in my everyday communication including the phone. – Peteris Jan 26 '15 at 22:03
@AWinkle Perhaps I've miscommunicated. You're still able to enter your special whitespace character and log in. It's just that, when your password is created, it is created where your special whitespace simply becomes a space. The same cleaning should be applied during creation as well as during authentication. Regardless of what kind of whitespace character it is, it just becomes a space. Unicode can cause problems well beyond the keyboard interface, just like with special whitespace characters and leading and trailing whitespace. That part isn't really about discouraging special characters. – Andrew Hoffman Jan 26 '15 at 22:27
Ultimately my answer isn't about discouraging the user or trying to protect them from themselves. It's about limiting potential support problems and real-world UI and transport limitations. And I only mentioned web-interface technology; there are others. It's possible that credentials won't be going directly from the user to you. That isn't typical for most shops, but Microsoft certainly has to deal with that. Ultimately it is easy to condemn such limitations strictly from a security perspective. When you get neck deep in the weeds of UI and customer support, it becomes much harder. – Andrew Hoffman Jan 26 '15 at 22:40
@Peteris Pretty sure your phone can do it? So you have not even tested your phone. – paparazzo Jan 27 '15 at 00:40
@Blam Are you kidding me? Of course my phone can easily enter all the non-ASCII symbols of my native language, and Korean users can easily enter the Korean symbols on any non-ancient phone, etc. It's not something that needs to be specifically tested - almost every person with a smartphone outside of USA/UK has been successfully entering non-ASCII symbols in their web browser, SMS, and Google queries every day, everywhere, for many years now. In fact, in some environments it's harder to enter ASCII symbols than non-ASCII (a reason why numeric web domain names with no English letters are popular in China) – Peteris Jan 27 '15 at 01:03
@Blam my point is that if a person has chosen to have a password in, for example, Russian cyrillic or Greek or Korean or some other alphabet, then it's a reasonable assumption that they will be able to use this alphabet in all their input devices and many of them will, in fact, strongly prefer to use that alphabet instead of the latin alphabet. – Peteris Jan 27 '15 at 01:09
@Peteris On a logon maybe 'pretty sure', 'almost every', and 'reasonable assumption' is good enough for you but it is not for me. Statistically you can create a valid password with the 128 ASCII characters. – paparazzo Jan 27 '15 at 01:19
@Blam You can't assume that all people can use latin letters. There are millions of users online who don't know any languages with latin alphabet, and they don't use those letters - i.e., they authorize with a facebook account (linked to phone number instead of an email that they don't have), and a password in their native language. For them, remembering any password made up from latin letters would be as easy as remembering a random string in chinese for me - bad UX. People can and do use online services without knowing that in some strange faraway countries 'a' and 'A' are the same letter. – Peteris Jan 27 '15 at 01:46
@AndrewHoffman If you don't like the character a user is entering, don't allow it either through character input restriction or validation. Altering the user input without the knowledge/consent of the user leads to worse UX (mainly through bugs). – AWinkle Jan 27 '15 at 01:52
@Peteris I am not assuming anything. And I seriously doubt a Chinese web site would use Latin characters. It is called a code page, and a code page supports 256 characters, of which about 30 are trash / control. Point is you don't need to support Wingdings to have a valid password. You pick a control set you can support. A control set of 128 is significantly sufficient. I am not just speculating here. I support an application where we get some foreign language and we need to deal with word recognition. Unicode is not a walk in the park even if UX thinks it is. – paparazzo Jan 27 '15 at 01:54
@DavidConrad How are you so sure you are not doing that today? What is the purpose of that? – paparazzo Jan 28 '15 at 19:46

score 3 · Answer 5 · answered Jan 26 '15 at 23:43

It depends.

If you've got reasonably strong control over the password input mechanism (keyboard layouts, software stacks, etc.), then letting users freely input anything they want is a good idea, because it maximizes the available password space. Someone attacking an English-language site probably won't try even obvious things like "كلمة المرور" (which Google Translate assures me is Arabic for "password"). In such a case, encoding mix-ups don't really matter, since any mix-up will be the same across all systems, canceling itself out.

On the other hand, if you're trying to support as broad a range of systems as possible, you should restrict passwords to the 95 printable ASCII characters, in order to keep programming and support nightmares to a minimum. Supporting everything means dealing with homoglyphs (Α, А, and A look identical to a human, but have different byte values), duplicate characters (the "micro sign" µ and the "Greek lowercase mu" μ represent the same character, but are encoded with different byte values), different composition forms (ñ and ñ look the same, but the first is an "n" followed by a combining tilde, while the second is a single precomposed character), and different ordering of combining characters (ệ can be expressed as either "e + circumflex accent + dot below" or "e + dot below + circumflex accent", which a computer sees as different) -- and that's just within Unicode. Garbled encoding transformations can mean that somebody's attempt to enter "Pässword" on an ISO 8859-1 system gets interpreted as "Pδssword" by an ISO 8859-7 system.
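
To make those cases concrete, here is a small Python check using the standard `unicodedata` module. It is only a demonstration of how the listed problems behave under normalization, not a recommendation of a particular form. The combining-order check uses a circumflex and a dot below, which belong to different canonical combining classes; normalization does not reorder two marks of the same class.

```python
import unicodedata

# Composition forms: "n" + combining tilde versus the precomposed U+00F1.
decomposed = "n\u0303"
precomposed = "\u00f1"
assert decomposed != precomposed
assert unicodedata.normalize("NFC", decomposed) == precomposed

# Combining order: circumflex (U+0302) and dot below (U+0323) in either order.
order_a = "e\u0302\u0323"
order_b = "e\u0323\u0302"
assert order_a != order_b
assert unicodedata.normalize("NFC", order_a) == unicodedata.normalize("NFC", order_b)

# Duplicate characters: micro sign versus Greek small mu.
micro, mu = "\u00b5", "\u03bc"
assert unicodedata.normalize("NFC", micro) != mu   # canonical form keeps them apart
assert unicodedata.normalize("NFKC", micro) == mu  # compatibility form folds them

# Homoglyphs: Greek Alpha, Cyrillic A and Latin A stay distinct under normalization.
homoglyphs = ["\u0391", "\u0410", "A"]
assert len({unicodedata.normalize("NFKC", c) for c in homoglyphs}) == 3

# Garbled encoding: bytes written as ISO 8859-1 but read back as ISO 8859-7.
garbled = "P\u00e4ssword".encode("iso-8859-1").decode("iso-8859-7")
assert garbled == "P\u03b4ssword"
```

So normalization takes care of composition forms and reorderable marks, a compatibility form is needed for the duplicate characters, and nothing in normalization helps with homoglyphs or mismatched legacy encodings.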

Unicode normalization available in all non-broken implementations should ensure that strings obtained from user devices with varying composition forms and different ordering of combining characters in the end are equal and hash to the same value. Non-Unicode encodings don't have any good solutions, but are they still a problem nowadays? 10 years ago it was a serious issue, but in the last few years generally even the oldest obsolete living systems seem to interface with the outside world by 100% Unicode or ASCII. — Peteris, Jan 27 '15 at 00:56
@Peteris, key word there: should. I've been in software development too long to trust "should" until it's been thoroughly tested and subjected to a few years of hammering in a large-scale deployment. As for encodings, ISO-8859-1 and Windows-1252 are still common enough to worry about -- but since they look like ASCII as commonly used, most people can get away with assuming they are. — Mark, Jan 27 '15 at 02:01

score 3 · Answer 6 · answered Jan 27 '15 at 08:16

One problem that I've seen with non-ASCII passwords is that some systems deal with characters and others deal with encoding-specific code points. The "א" character might be represented in different ways depending on encoding, and sometimes the same character may be represented in any of multiple ways ("é" could equally be U+00E9 or U+0065 U+0301).
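
A short Python illustration of the first point: the same character turns into different bytes under different encodings, so anything that hashes bytes gets different results. SHA-256 is used here only to show that the digests differ; it is an assumption for the demo and not a suggestion for password storage, which calls for a dedicated password hashing function.

```python
import hashlib

alef = "\u05d0"  # Hebrew letter alef

utf8_bytes = alef.encode("utf-8")         # two bytes: 0xD7 0x90
hebrew_bytes = alef.encode("iso-8859-8")  # one byte: 0xE0

# Same character, different byte sequences.
assert utf8_bytes == b"\xd7\x90"
assert hebrew_bytes == b"\xe0"

# A hash works on bytes, so the two encodings give unrelated digests.
digest_utf8 = hashlib.sha256(utf8_bytes).hexdigest()
digest_hebrew = hashlib.sha256(hebrew_bytes).hexdigest()
assert digest_utf8 != digest_hebrew
```

This is why the encoding has to be pinned down (and the string normalized) at one well-defined point before hashing.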

Consider the Python 2 -> Python 3 transition. Serializing "שלום" in Python 2 and then comparing it to the same value serialized with Python 3 will result in False, due to the changes in the way strings are represented internally in Python 3.

PHP has a similar mishap: it deals with bytes, not characters. So a user who has input "שלום" on a page with CP-1252 encoding may not be able to log in on a page with UTF-8 encoding. Combine this with the host of encoding issues that were inherent in using the mysql_* drivers (less so in the PDO drivers, which make it easier) and the problem is compounded.

In PHP I've dealt with the issue on non-English websites by ensuring that I'm properly connecting to the database via UTF-8 (Much easier since PHP 5.3.3 but problematic before that, even with PDO, so much so that I still remember the critical version number), using prepared queries and proper hashing, and that I'm always serving the page as UTF-8. However I've seen no end to the problems that my less pedantic colleagues face with the issue.

@JanDvorak: Nice! Unfortunately PHP was only the example. Perl had it even worse. — dotancohen, Jan 28 '15 at 14:22
Perl wasn't meant for web development, was it? TBH, it looks a bit like an esoteric language that got a bit too popular to me. I do agree that poor unicode support in a language meant primarily for text processing is unfortunate. Reference? — John Dvorak, Jan 28 '15 at 14:28
The database part shouldn't cause problems with passwords, since they get hashed before touching the database in most applications. — CodesInChaos, Jan 29 '15 at 14:50
@JanDvorak Do you mean don't use Wikipedia or Facebook or a particular hobby community's forum? I imagine few people are likely to accept those "solutions". — Damian Yerrick, Mar 08 '16 at 20:16
@DamianYerrick I meant "don't use PHP", not "don't use web pages made in PHP". Wikipedia seems to cope pretty well with unicode support, and I'm well okay with recommending to ditch Facebook :-D — John Dvorak, Mar 08 '16 at 20:20

score 2 · Answer 7 · answered Jan 29 '15 at 07:11

Yes, unfortunately, restrictions on the characters the user can use in a password make sense from the UX perspective.

If you're operating a worldwide service, you'd like your users to be able to log in from anywhere in the world. If that's the case, you must take into account that, unfortunately, keyboards are not standardized.

Even the layout of the basic characters is variable. For example, in Germany, gods know why, they've exchanged 'Z' with 'Y' and they've done a complete mish-mash with other characters.

Even if you, as a user, manage to find a given key on the keyboard, it's still most likely you won't be able to type the 'national' characters of your choice, because every country has its own keyboard layout (virtual, or in extreme cases, like in Germany, even physical) and allowing any user to choose any keyboard layout is not an option, at least not on Windows machines.

Please note that most users are not aware of these issues.

So there are practical reasons for limiting the choice of the characters one can have in the password to the set that can be (probably) typed by any user from any machine.

How common is it for a given user to need to log in using a device with a different layout than one they are used to? Anyone who travels with a laptop/tablet/smartphone will not need to worry about this, nor anyone who doesn't travel internationally. (Nor, obviously, anyone using a password manager.) — Nathan Tuggy, Jan 14 '16 at 02:38

score 1 · Answer 8 · edited Jan 28 '15 at 19:16

Authentication and security are critical. A security breach will kill you.

Do you have any idea what goes on between a keyboard and server? You have normalization, encoding, ambiguities in Unicode, serialization, NAT, man-in-the-middle, and other measures. That is a secure end-to-end transaction that is used for the entire session. Bad guys want to take advantage of any of that stuff.

A common security practice is to limit the attack surface and protect it like a soldier.

This is not programmers being lazy: it is about protecting a very sensitive and critical function that lots of bad guys are trying to exploit.

To say “I want to use any character, but protect me” is like saying “accept any form of ID but assure me that they are who they say they are”. I cannot get on a plane with my school ID for a reason: it is not as secure.

The bottom line is that a functional secure password can be created from 128 characters. There is no security reason to support more. To support all Unicode is a needless security vulnerability.

edited Jan 28 '15 at 19:16 by TRiG

answered Jan 27 '15 at 03:51 by paparazzo

"Reducing attack surface" and "maximize available keyspace" are here directly opposed. Why do you consider that the former takes such precedence? – Nathan Tuggy Jan 27 '15 at 07:13
@NathanTuggy No they are not. More unique characters is more attack surface. Why do you think you need more than 128 characters to create a valid password? – paparazzo Jan 27 '15 at 14:00
You're just reiterating what you already said, without explaining why they aren't opposed. The keyspace with two valid characters is adequate, in a technical sense, since you can simply construct enormously long passwords; that doesn't mean that's a secure choice, though. – Nathan Tuggy Jan 27 '15 at 19:48
@NathanTuggy Yes I am reiterating what I said, and my language is clear and used accurately. Attack surface is the type of stuff in paragraph two. Attack surface is not password complexity. A sufficiently complex password can be generated with 128 characters - that is not attack surface. Shutting down port 1433 is reducing attack surface. Only allowing certain verbs in HTTP is reducing attack surface. Not allowing characters that can be exploited is reducing attack surface. A limited set of characters, so you can perform extensive testing, is reducing attack surface. – paparazzo Jan 27 '15 at 20:43
The former takes precedence because that is standard security practice. I don't do security full time now but I used to. – paparazzo Jan 27 '15 at 20:46
Characters 0-32 and 127 are problematic too - that leaves less than 128. Regarding security breaches - injection is an issue if the code is bad, however, basic characters e.g. ' which should be OK for passwords falls under this condition. If you save the encrypted password and not-plain text (which you should be doing anyway (including salt)), then SQL injection is not an issue. – Danny Varod Jan 27 '15 at 23:33
@DannyVarod OK then like 94 good soldiers (characters). With some languages would need to go into the 256 character code page and get like 200. At 90 characters a password of length 6 has 531,441,000,000 combinations. In milliseconds that is 17 years. That is a valid password. – paparazzo Jan 27 '15 at 23:51
Honestly, NAT? What possible relevance does that have? Telling your Chinese, Arabic, Korean, Russian, Hebrew, Greek, et cetera ad nauseam users that they must use Latin letters in their password is completely unacceptable in the modern age, and is not necessary for security. – David Conrad Jan 28 '15 at 19:48
@DavidConrad Honestly, can we all agree that you have trouble reading English. I said you can create a functional password from 128 characters - I did not say Latin chars. If limiting password and userid to a subset of the local code page is such a stupid idea that you would not buy that product, then stay off Microsoft products, as that is how Active Directory works. Troll. – paparazzo Jan 28 '15 at 20:07

score 1 · Answer 9 · edited Apr 13 '17 at 12:32

A few guidelines:

Let users enter any character in the ASCII range of 32 (space) to 126 (~) - these should be the same in any character encoding.
Limiting your users to fewer characters will only frustrate them and force them to choose passwords that are less secure or harder for them to remember.
Characters below ASCII 32 (and ASCII 127 = Delete) have special meanings, e.g. ESC, Enter (submit form), Tab (jump field) and other characters that are not meant to be typed, and therefore should not be accepted.
Characters above ASCII 127 may not be typable on some keyboards or devices (which may prevent login via phone or via other PCs while abroad) and require Unicode storage (less of an issue, however, you need to be aware of it).
Don't limit the length too much - let users enter tens of characters, e.g. up to 50 or 100.
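
As a rough sketch, those guidelines translate into a check like the following. The function name `check_password`, the constant names and the cap of 100 are assumptions picked for illustration; only the 32 to 126 range and the special handling of 127 come from the list above.

```python
MIN_CODE, MAX_CODE = 32, 126  # space through "~", as in the guidelines
MAX_LENGTH = 100              # hypothetical cap, following "up to 50 or 100"


def check_password(password: str) -> list:
    """Return a list of problems; an empty list means the password is accepted."""
    problems = []
    if len(password) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    for ch in password:
        code = ord(ch)
        if code < MIN_CODE or code == 127:
            # ESC, Enter, Tab, Delete and the other control characters.
            problems.append(f"control character (code {code})")
        elif code > MAX_CODE:
            # Anything beyond "~" may not be typable on every device.
            problems.append(f"non-ASCII character {ch!r}")
    return problems
```

Returning the list of problems, rather than a bare yes or no, makes it easier to tell the user exactly which character was refused.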

More details on passwords in my answer here.

Ian · Answer 10 · 2015-01-27T22:54:29.403

0

What if it takes two weeks to get the password reset by a letter sent to my home address, and I can’t access my bank on holiday due to there being a character in my password that I can’t type on my iPhone?

Will I blame the bank, will I consider changing banks……

What if my bank decides at some later stage they wish to use drop down lists to input a password rather than a text field….

What if the keyboard I am using today outputs a different character when I press £?

Given the risk in resetting passwords, are we just creating a different risk by allowing users to have password that are more likely to need resetting?

However if the password can be reset just by the website sending me an email, it should allow anything in the password.

edited Jan 27 '15 at 22:54

answered Jan 27 '15 at 11:02 by Ian

If a password reset requires a physical letter sent to your home then they are still doing it wrong and you should absolutely blame the bank. Ditto for using anything non-standard like drop down lists. – NotMe Jan 28 '15 at 15:20
@ChrisLively, drop down lists make it a LOT harder for some password-stealing virus. Doing the reset by a "back channel" makes it a lot harder for a phishing site to trick someone into resetting their password. Remember that it may be my mother-in-law who is being targeted, not one of us... – Ian Jan 28 '15 at 17:05
I hope you're kidding, because you've now moved into "seriously bad info" territory. Drop down lists don't make it harder on anyone other than the person trying to enter their password. – NotMe Jan 28 '15 at 17:49
@ChrisLively, they do, for example they stop keyboard loggers from working. – Ian Jan 28 '15 at 18:07
There is a very simple answer to your last question. Simply allow users to use a wide variety of characters, and see whether the number of password reset requests actually goes up by a statistically significant amount. If not, don't borrow trouble. – David Conrad Jan 28 '15 at 19:52

score 0 · Answer 11 · answered Jan 31 '15 at 09:44

Limiting allowed characters in passwords to a sane subset of printable characters is a good idea. More flexibility is not always better. That's why we have speed limits on roads. Frequently, usability is about protecting users from their natural aptitude for shooting themselves in the foot.

From a server-side security standpoint, there is no problem in supporting full Unicode passwords. Even if you don't want to handle crazy characters on the server side, a little bit of JavaScript coding can easily preconvert all passwords to hexadecimal notation before they are sent and handled by the server. But, for the reasons given above, one should do this -and- also limit to a sane subset of printable characters.
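
The hexadecimal preconversion mentioned above is a simple transformation. The answer has it running as JavaScript in the browser; the sketch below uses Python purely to show what the conversion does, with made-up function names `to_hex` and `from_hex` and UTF-8 assumed as the encoding.

```python
def to_hex(password: str) -> str:
    """Encode the password as UTF-8 and return it as lowercase hex digits."""
    return password.encode("utf-8").hex()


def from_hex(hex_text: str) -> str:
    """Inverse of to_hex: turn the hex digits back into the original string."""
    return bytes.fromhex(hex_text).decode("utf-8")
```

The output contains only the characters 0-9 and a-f, so everything downstream of the conversion handles plain ASCII regardless of what the user typed.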