There are a number of questions about problems with the utf8 input encoding. The closest I found to my problem is, I think, "What characters in a normal text document will screw up LaTeX?".
A user of my censor package asked if I can get the package's \blackout macro to work with utf8 umlauts. So far, the answer is "no". I narrowed the problem down to the macro \bl@t, which censors argument #2 and then continues the recursion via argument #1. The problem, as best I can tell, is that utf8 encoding needs more than one byte for things like umlauted characters. Since \bl@t grabs a single token as #2, it receives only the first byte of the character, and \censor chokes on that half character.
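To make the "half a character" point concrete, here is a minimal sketch that counts the tokens in an argument. The helper \counttokens and the counter \tokcount are hypothetical names made up for this illustration; they are not part of censor. It assumes pdfLaTeX with inputenc's utf8 option, where each byte of the input becomes its own token.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\makeatletter
\newcount\tokcount
% \counttokens is a hypothetical helper for illustration only.
% \@tfor walks over the argument one token (or brace group) at a time.
\newcommand\counttokens[1]{%
  \tokcount=0
  \@tfor\next@tok:=#1\do{\advance\tokcount 1 }%
  \the\tokcount}
\makeatother
\begin{document}
\counttokens{abc}  % three ASCII characters, three tokens

\counttokens{äöüß} % under pdfLaTeX: two tokens per precomposed character
\end{document}
```

Under XeLaTeX or LuaLaTeX each umlaut is read as a single token, so the two counts would differ from the pdfLaTeX run; this is why a macro that grabs one token at a time only misbehaves with the byte-based engines.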
Here is an MWE that, if you uncomment either of the two commented lines, will break the code:
\documentclass{article}
\usepackage[ngerman]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{censor}
\makeatletter
\def\stringend{$}
\long\def\blackout#1{\def~{-}\censor@Block#1\stringend\let~\sv@tilde}
\long\def\censor@Block{\IfNextToken\stringend{\@gobble}%
{\IfNextToken\@sptoken{ \bl@t{\censor@Block}}%
{\bl@t{\censor@Block}}}}
\long\def\bl@t#1#2{\if\bpar#2\par\else\if.#2\censordot\else\censor{#2}\fi\fi#1}
\makeatother
\begin{document}
äöüß, \censor{äöüß}\par
\blackout{ab\par cd}\par
%\blackout{ä}\par
\makeatletter
%\bl@t xä
\end{document}
The package's \censor macro works fine on the umlauted text, but \blackout, and more specifically its \bl@t service routine, does not. For a simpler picture, you can think of \bl@t as \def\bl@t#1#2{\censor{#2}#1} (though this version will not work with \pars in the input stream). The #1 is always a reinvocation of \censor@Block on the remaining input string.
EDIT: It would seem that, if a two-byte input character (such as an umlaut) is next in the input stream, then this three-argument definition, which grabs both bytes as #2 and #3,
\long\def\bl@t#1#2#3{\if\bpar#2#3\par\else\if.#2#3\censordot\else
\censor{#2#3}\fi\fi#1}
can absorb it properly. Thus, reversing which invocations are commented:
%\blackout{ab\par cd}\par
\blackout{äöüß}\par
\makeatletter
\bl@t xä
works fine. So the key will be to be able to determine in advance which type of character lies next in the input stream and choose the appropriate parsing method.
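One possible way to make that choice is sketched below. It is meant to replace the single \bl@t definition in the MWE and relies on the observation from the comments that, under inputenc's utf8 option, the lead byte of a two-byte character is an active character whose expansion starts with \UTFviii@two@octets. The names \bl@t@one, \bl@t@two, \bl@ifleadbyte and \bl@peek are hypothetical helpers invented for this sketch. It handles only two-byte characters; three- and four-byte sequences would need analogous branches testing \UTFviii@three@octets and \UTFviii@four@octets, and the test assumes #2 is not a macro that requires arguments.

```latex
\makeatletter
% Dispatcher: #1 = continuation (\censor@Block), #2 = next token.
\long\def\bl@t#1#2{%
  \bl@ifleadbyte{#2}%
    {\bl@t@two{#1}{#2}}% lead byte: one more byte is grabbed as #3 below
    {\bl@t@one{#1}{#2}}}% ordinary token: original behaviour
% Original single-token routine from the MWE, unchanged apart from its name.
\long\def\bl@t@one#1#2{%
  \if\bpar#2\par\else\if.#2\censordot\else\censor{#2}\fi\fi#1}
% Two-byte routine: #2#3 together form one character.
\long\def\bl@t@two#1#2#3{\censor{#2#3}#1}
% Expand #1 once and compare the first resulting token
% with \UTFviii@two@octets; \relax guards against an empty argument.
\long\def\bl@ifleadbyte#1{\expandafter\bl@peek#1\relax\@nil}
\long\def\bl@peek#1#2\@nil{%
  \ifx#1\UTFviii@two@octets
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi}
\makeatother
```

With this in place, both \blackout{ab\par cd} and \blackout{äöüß} should go through the same \bl@t entry point, so neither invocation needs to be commented out. I have not checked it against every censor version, so treat it as a starting point rather than a finished fix.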

\censor{#2}, but set aside that byte, and wait for the next one in the input stream and then recombine them. – Steven B. Segletes Dec 03 '13 at 20:27
\show ö will show > �=macro: ->\UTFviii@two@octets �. which tells you the first byte of o umlaut is the first byte of a two octet sequence (you could get three or four, depending on the character) – David Carlisle Dec 03 '13 at 21:09
\if\bpar#2\par? \bpar seems to be \endgraf (i.e. \let to the primitive \par) so the effect is that any non expandable token other than a character is replaced by \par. – David Carlisle Dec 03 '13 at 22:11
\bpar is a deprecated feature retained for backward compatibility. If it shows in the user's input stream, it should be treated as a \par. As far as \if vs. \ifx, I trust your interpretation better than my own limited understanding. The purpose of \blackout is to censor the input stream one character at a time, making allowances for space tokens, \par tokens, and periods. – Steven B. Segletes Dec 03 '13 at 23:12
\if considers all (non expandable) commands equal: \if\par\vskip is true. But even as edited it is not testing for \bpar in the input stream: \par, \bpar, \endgraf etc. would all test as true with \ifx\bpar#2 – David Carlisle Dec 03 '13 at 23:14
\vskip into a \par would not be ideal, I'm sure that \censor{\vskip{...}} would likewise not turn out well. – Steven B. Segletes Dec 03 '13 at 23:18
\bpar and \par to both test as true in that test, so what you have is correct. – Steven B. Segletes Dec 03 '13 at 23:21