I am trying to output several megabytes of UTF8 text into a printable format. As enscript and friends don't like Unicode, this Unix SE question soon turned to LaTeX.
Summary of requirements: processing of several MB of Chinese, Latin and special chars (i.e. invalid LaTeX code), consisting of long and short lines, into a tightly packed, multi-column format. I know LaTeX has the ability to control font size, spacings, margins and so on, as well as headings and page numbers, so I am not too worried about that.
The problem I am having so far is getting both wrappable and UTF8-capable verbatim input of the file.
Is there a way to get both?
With the fancyvrb package, for example, I get a good rendition of the Chinese text, but no line breaking:
\documentclass{article}
\usepackage{CJKutf8}
\usepackage{ucs}
\usepackage{multicol}
\usepackage{fancyvrb} % provides \VerbatimInput
\usepackage[encapsulated]{CJK}
\newcommand{\myfont}{bsmi} % or {stheiti}, etc
\begin{document}
\begin{CJK}{UTF8}{\myfont}
\begin{multicols}{2}
\VerbatimInput[fontfamily=cmr]{file.txt}
\end{multicols}
\end{CJK}
\end{document}
I have so far been unable to get a listings-type environment to deal with the Unicode in the file.
Here is an example of the kind of input I'm feeding it. The goal is to format multilingual chat logs, basically 100,000+ lines of this kind of thing:
###### 2013-10-26.223938+0000GMT.html ######
**** user@example.com/ (jabber) ****
(00:00:00) Lorem ipsum
(00:00:01) 車檢作畫病得星定局而是作的所由次園又此對這一馬的生故他試……外由懷黃客建時常嚴在位以說其配場戲回有部結一自法就生機,定的被各世皮全空!也地傳現他重城,書照展商直起眾家不思政國林年八計出地口早體故失離際們層氣,簡他廣集義,四便入的只了極。
(00:00:02) odesset ullamcorper quo. Cu adipisci assentior eam, debet definiebas eos ad. Te eos nihil populo vivendum, vix iusto noster peri
(00:00:02) odesset ullamcorper quo. Cu adip
(02:00:02) odesset ullamcorper quo. Cu adipisci assentior eam, debet d
(02:00:01) 車檢作畫病得星定局而是作的所由次園又此對這一馬的生故他試……外由懷黃客建時常嚴在位以說其配場戲回有部結一自法就生機,定的被各
(02:00:02) ok
(00:01:02) 病
(00:00:02) ok
(00:01:02)
###### 2013-10-26.223938+0000GMT.html ######
**** user@example.com/ (jabber) ****
(00:00:01) 車檢作畫病得星定局而是作展商直起眾家不思政國林年八計出地口早體故失離際們層氣,簡他廣集義,四便入的只了極。
(00:00:02) odesset ullamcorper quo. Cu adipisci assentior eam, debet definiebas eos ad. Te eos nihil populo vivendum, vix iusto noster peri
(00:00:02) odesset ullamcorper quo. Cu adip
(02:00:02) odesset ullamcorper quo.
I'm ideally looking for the most general solution possible, as the input text is not necessarily in a well-defined machine-readable format.
Edit: I also tried the following, using the plain verbatim package, in the preamble:
\usepackage{verbatim}
\makeatletter
\def\@xobeysp{ }
\makeatother
with this in the document body:
\verbatiminput{file.txt}
This works for normal text and Chinese, but fails to break lines containing very long words, URLs or long strings of letters, all of which occur in the file.
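For reference, the fragments of this second attempt can be assembled into one complete document. This is only a sketch: it assumes the same CJK setup and the bsmi font from the first example, trimmed to the packages this attempt actually uses, and a file.txt in the working directory.

```latex
\documentclass{article}
\usepackage{CJKutf8}   % CJK support for UTF-8 input under pdfLaTeX
\usepackage{multicol}
\usepackage{verbatim}  % provides \verbatiminput

% Inside verbatim, spaces normally expand to a non-breaking space.
% Redefining \@xobeysp as an ordinary space lets TeX break lines there.
\makeatletter
\def\@xobeysp{ }
\makeatother

\begin{document}
\begin{CJK}{UTF8}{bsmi}   % bsmi is the font assumed from the first example
\begin{multicols}{2}
\verbatiminput{file.txt}
\end{multicols}
\end{CJK}
\end{document}
```

Because the only added break opportunity is at spaces, a long URL or an unbroken run of Latin letters still overflows the column, which is the failure described above.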



… file.txt file? – egreg, Nov 20 '13 at 16:19

… fancyvrb, try listings. See "Verbatim environment and line-breaking". Don't forget columns=fullflexible. – Gilles 'SO- stop being evil', Nov 20 '13 at 16:49

… listings to work with CJK. Workarounds like this one don't scale to several tens of thousands of Chinese characters! – diwhyyyyy, Nov 20 '13 at 16:59

… xeCJK package (since v3.2.3) does support listings packages well, if you need it. – Leo Liu, Nov 21 '13 at 11:18