How to create multilingual (English, Japanese) bibliographies with biblatex, biber and polyglossia

Question

I would like to update my BibTeX file so that I can create multilingual bibliographies. Since I publish in English and Japanese, this needs to work as follows. In an English paper:

Japanese entries in the bibliography will give the authors' names both in kanji and in romanization; titles will also be duplicated in this way, and since I sometimes also translate the title, there may be three versions of it.
English entries will appear as usual.

In a Japanese paper:

Japanese entries will give just the author, title, etc. in kanji, without additions.
English entries might have a translation of the title and give the author's name in katakana.

To enable this, I thought of extending the BibTeX entries as follows:

@collection{yanagida_zengaku_sosho_1975,
    Address = {京都},
    Address_Romaji = {Kyōto},
    Editor = {柳田聖山},
    Editor_Romaji = {Yanagida, Seizan},
    Publisher = {中文出版社},
    Publisher_Romaji = {Chūbun shuppansha},
    Title = {禪學叢書},
    Title_Romaji = {Zengaku sōsho},
    Title_en = {Collected Materials for the Study of Zen},
    Volumes = {10},
    Year = {1974-1977}}

I then hope to be able to pull the necessary pieces out of this and process them with biber and biblatex, but I have no idea how to go about it.

The functionality to do this is in biber/biblatex in experimental form using a different data source format. Let me see if I can get something working with your example. — PLK, Sep 12 '11 at 18:43
I have this working now - I'll add a real answer when biber 0.9.6/biblatex 1.7 is released soon as you'll need them. — PLK, Sep 14 '11 at 18:25
This sounds great, I am really looking forward to it. Does it work with the data structure outlined above? Even if it can't be used today, I'd still be glad to know more, so that I can start working on the bibliography! — Chris, Sep 19 '11 at 06:00

score 15 · Answer 1 · edited Jan 12 '12 at 07:36

I've discussed this with the biblatex maintainer and we will probably aim for a style implementation of this with biblatex 3.x. With 1.7/0.9.6, the following will be possible. You will have to use the experimental biblatexml datasource format for such entries (you can still have all of your normal entries in bibtex format).

<?xml version="1.0" encoding="UTF-8"?>
<bib:entries xmlns:bib="http://biblatex-biber.sourceforge.net/biblatexml">
  <bib:entry id="yanagida_zengaku_sosho_1975" entrytype="collection">
    <bib:editor>
      <bib:person gender="sm">柳田聖山</bib:person>
    </bib:editor>
    <bib:editor mode="romanised">
      <bib:person>
        <bib:first>
          <bib:namepart initial="S">Seizan</bib:namepart>
        </bib:first>
        <bib:last>Yanagida</bib:last>
      </bib:person>
    </bib:editor>
    <bib:title>禪學叢書</bib:title>
    <bib:title mode="romanised">Zengaku sōsho</bib:title>
    <bib:title mode="translated" xml:lang="en">Collected Materials for the Study of Zen</bib:title>
    <bib:location>京都</bib:location>
    <bib:location mode="romanised">Kyōto</bib:location>
    <bib:location mode="translated" xml:lang="en">Kyoto</bib:location>
    <bib:publisher>中文出版社</bib:publisher>
    <bib:publisher mode="romanised">Chūbun shuppansha</bib:publisher>
    <bib:date>
      <bib:start>1974</bib:start>
      <bib:end>1977</bib:end>
    </bib:date>
  </bib:entry>
</bib:entries>

There is no way to do this in bibtex format, but this is no problem for biber, since you can have many datasources in different formats. With the above example, you could, for instance, choose the display mode "romanised", and biber would construct the .bbl using only the romanised fields. There will, however, be no way to mix modes within the same entry, as this would need a radically enhanced .bbl format and massive changes to biblatex internals, which are planned for version 3.x.

The above example uses the global displaymode setting (which will be in biblatex 1.7). You will also be able to set per-entry modes with an attribute on the entry, for example:

<bib:entry id="yanagida_zengaku_sosho_1975" entrytype="collection" mode="translated">

The default mode is "original", which also matches fields with no mode specified.

Edit on release of biber 0.9.6/biblatex 1.7: This is now implemented as mentioned. The default global setting is:

\DeclareDisplaymode{original,romanised,uniform,translated}

This biblatex macro is undocumented at the moment, but you should be able to use it to change the global order in which display modes are chosen. You can also set the display mode per entry, as shown above. Let me know on the biber SourceForge forum if you have problems.
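Putting the pieces of this answer together, a preamble for biblatex 1.7/biber 0.9.6 might look like the following sketch. The file names normal.bib and multilingual.bltxml are hypothetical. The datatype=biblatexml option of \addbibresource should select the biblatexml format for that resource. The reordered \DeclareDisplaymode call assumes the macro takes the same comma-separated list as the default above, with earlier modes preferred; since the macro is undocumented and experimental, treat this as an untested assumption.

```latex
% Sketch for biblatex 1.7 / biber 0.9.6 (experimental features).
% File names are hypothetical.
\usepackage[backend=biber]{biblatex}

% Ordinary entries can stay in bibtex format ...
\addbibresource{normal.bib}
% ... while entries with mode variants live in a biblatexml file.
\addbibresource[datatype=biblatexml]{multilingual.bltxml}

% Assumed: moving "romanised" before "original" makes biber prefer
% romanised fields wherever an entry provides them.
\DeclareDisplaymode{romanised,original,uniform,translated}
```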

The problem is, the biblatexml format isn't documented yet and probably isn't completely stable so you'll be partly testing it ... — PLK, Sep 19 '11 at 18:47

score 12 · Answer 2 · answered Feb 18 '14 at 21:08


I'm adding another answer as biblatex 3.0+biber 2.0 are now in experimental release and have a different solution to this. You can now make a test.bib file like this:

@COLLECTION{yanagida_zengaku_sosho_1975,
  LANGID = {japanese},
  EDITOR = {柳田聖山},
  EDITOR_romanised = {Yanagida, Seizan},
  TITLE = {禪學叢書},
  TITLE_romanised = {Zengaku sōsho},
  TITLE_translated_english = {Collected Materials for the Study of Zen},
  LOCATION = {京都},
  LOCATION_romanised = {Kyōto},
  LOCATION_translated_english = {Kyoto},
  PUBLISHER = {中文出版社},
  PUBLISHER_romanised = {Chūbun shuppansha},
  DATE = {1974/1977}
}

with this document and XeLaTeX:

\documentclass{article}
\usepackage{fontspec} 
\usepackage{polyglossia} 
\setdefaultlanguage{english}
\usepackage{xeCJK}
\setCJKmainfont{Hiragino Mincho Pro}

\usepackage[style=authoryear,%
            language=auto,%
            autolang=langname,%
           ]{biblatex}
\addbibresource{test.bib}

\begin{document}
\nocite{*}

\printbibliography
\end{document}

You will get:

[Image: the resulting bibliography entry]

Adding the new 3.0 biblatex option vform=romanised (either globally or to the options field of the .bib entry), you get:

[Image: the bibliography entry with vform=romanised]
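As a sketch of the global variant, here is the document's biblatex call with the option added. This assumes the experimental biblatex 3.0/biber 2.0 branch described in this answer (which, per the later comments, was put on hold and is not part of released biblatex). Per entry, you would instead add OPTIONS = {vform=romanised} to the entry in test.bib. The commented vtranslang line follows PLK's explanation of vtranslang in the comments and assumes the language suffix english used in test.bib.

```latex
% Global: prefer romanised variants (e.g. TITLE_romanised) wherever present.
% vform exists only in the experimental biblatex 3.0 / biber 2.0 branch.
\usepackage[style=authoryear,%
            language=auto,%
            autolang=langname,%
            vform=romanised,%
           ]{biblatex}

% Alternative: prefer English translations (e.g. LOCATION_translated_english)
%\usepackage[style=authoryear,vform=translated,vtranslang=english]{biblatex}
```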

More importantly, all biblatex internals and externals now understand these field variants and so you could unset vform and redefine, for example:

\renewbibmacro*{editor+others}{%
  \ifboolexpr{
    test \ifuseeditor
    and
    not test {\ifnameundef{editor}}
  }
    {\printnames{editor}%
     \addspace\mkbibparens{\printnames[][][form=romanised]{editor}}%
     \setunit{\addcomma\space}%
     \usebibmacro{editor+othersstrg}%
     \clearname{editor}}
    {}}

(note the form=romanised option in the second \printnames call). Now you get:

[Image: the bibliography with the editor printed in both original and romanised form]

You can mix any variants within the same bibliography by altering your style to reference the variants, without abusing the orig* fields. There is a lot more you can do with this; see the biblatex 3.0 documentation. biblatex 3.0 is in the "experimental" folder on SourceForge, and you will need to use biber 2.0.

Answer by PLK

This is very cool! So the idea is I could have location = {Munich}, location_ngerman = {München}, location_latin = {Monacum} (etc., etc.) and then output whichever form based on the appropriate vform option? (I had been meaning to make a feature request along these lines for some time now...) – jon Feb 18 '14 at 22:11
The extended .bib format is FIELD_FORM_LANG (where the divider is configurable, underscore by default). So you could do: location_translated_ngerman = {München} and then set vform=translated, vtranslang=ngerman to default to a specific form/lang. – PLK Feb 19 '14 at 12:43
Ah, OK. I look forward to having this functionality. – jon Feb 19 '14 at 15:38
You can try it now by getting the biblatex and biber experimental versions from SF. – PLK Feb 19 '14 at 16:58
I will, hopefully soon. I'm nearing the end of two projects and am in the do-not-mess-with-things phase. – jon Feb 19 '14 at 17:08
I was asked whether the system will also be able to handle transcriptions other than "romanized" (e.g. from one Japanese script to another). By the way: I downloaded the experimental biblatex and it contains two manuals, one for 3.0 and one for 2.9. – Ulrike Fischer Apr 14 '14 at 15:47
Not at the moment. Is the list of such scripts manageable? What sort of thing were you thinking of? – PLK Apr 14 '14 at 18:24
Well, I normally try to avoid making things too complicated ;-) But I got a request for fields like AUTHOR_transcribed_hepburn, AUTHOR_transcribed_katakana etc. If I added "normal" fields with such transcriptions to the data model, could I map the wanted transcription to author_romanized with DeclareSourceMap before biber handles the data variants? If yes: what characters are actually allowed in field names? "field-new" seems to work, but I can't find an exact description anywhere of what makes a valid field name (and a valid key) in biber/biblatex. – Ulrike Fischer Apr 15 '14 at 07:47
Fields should allow whatever bibtex (more accurately, the btparse C library) allows. Those requests are tricky because the experimental biber/biblatex branch needs FIELD_form_lang (by the way, the field/form/lang separator is configurable with biber). I need to see if the form part is customisable in the current code. – PLK Apr 15 '14 at 08:08
According to http://search.cpan.org/~gward/btparse-0.34/doc/btparse.pod, strasse_neu should be a valid name, but biber 1.8 gives me a warning: WARN - The field 'strasse' in entry 'max' cannot be null, deleting it. Perhaps a catcode problem? But don't bother, I can use dots or hyphens instead. Are you also planning to add something to handle the "name order" problem with Asian languages? See e.g. http://tex.stackexchange.com/questions/66741/chicago-style-citations-of-cjk-documents-e-g-american-oriental-society-name. – Ulrike Fischer Apr 15 '14 at 08:46
That error occurs because 1.8 already contains code to handle the new field name form/lang formats. In the 1.x branch you shouldn't use an underscore in field names, or else use the --mssplit option to change the separator to something that never appears in your field names. I'll look at that TeX.SE question about name order. – PLK Apr 15 '14 at 08:50

I think the name order issue is really a style issue: entries should include a langid field, and the style should then make the name format conditional on the language. That is, parse the name normally as last, first but print it differently depending on the language of the entry? – PLK Apr 15 '14 at 08:59
I'm not sure that langid is the right field. After all, a Chinese author could write a French book. I think an additional field would be better. I will discuss this a bit with my customer; she knows more about the naming traditions than I do ;-). And one would also need a way to handle a book by two authors with different name structures. – Ulrike Fischer Apr 15 '14 at 09:19

You are right about the names - in fact in the new "multiscript" branch, the point of being able to specify a language as part of the field name is that you can then conditionalise on that language. That is, your style knows which language rules to use to render the name. The new biblatex/biber functionality simply provides the necessary information - the style has to use it ... I'd be glad to hear your requirements and experiences with the new experimental version - there is a "Multiscript testing for biblatex 3.0" enhancement on github you can use for this. – PLK Apr 15 '14 at 14:33
I saw that this has been put "on hold" in the biblatex issue tracker. Are there any plans to extend biblatex to allow multiscript "variants"? @moewe: Would you mind contacting me by mail about this question? – Ulrike Fischer Apr 25 '16 at 09:12
Good question. A lot of work was done on this and a few solutions were tried, but the complications for biblatex internals were considerable and I found a few areas that were really tricky. I put it on hold to concentrate on other issues. At the moment, the experimental branch isn't really usable. I would appreciate any general thoughts on what the functionality would need to be. I opened a general enhancement issue for discussion: https://github.com/plk/biblatex/issues/416 – PLK Apr 26 '16 at 08:47

score 2 · Answer 3 · answered Mar 25 '18 at 19:49

I am using a custom style to address this problem. For an MWE, let's use a test.bib file like the one PLK prepared:

@COLLECTION{yanagida_zengaku_sosho_1975,
  LANGID = {japanese},
  EDITOR = {柳田聖山},
  editor_romanised = {Yanagida, Seizan},
  TITLE = {禪學叢書},
  TITLE_romanised = {Zengaku sōsho},
  TITLE_translated_english = {Collected Materials for the Study of Zen},
  LOCATION = {京都},
  LOCATION_romanised = {Kyōto},
  LOCATION_translated_english = {Kyoto},
  PUBLISHER = {中文出版社},
  PUBLISHER_romanised = {Chūbun shuppansha},
  DATE = {1974/1977}
}

First, we need to extend biblatex's data model. Add a file named biblatex-dm.cfg containing something like:

% Declare transliterated/translated fields
% For the type of each base field, see https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/blx-dm.def

\DeclareDatamodelFields[type=field, datatype=literal]{title_romanised}
\DeclareDatamodelFields[type=list, datatype=literal]{location_romanised}
\DeclareDatamodelFields[type=list, datatype=name]{editor_romanised, author_romanised}

% In principle the new fields should also be assigned to the relevant entry types, but in practice this does not seem to be needed:
%\DeclareDatamodelEntryfields{title_romanised, editor_romanised, author_romanised,...}
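The declarations above do not cover publisher_romanised or the _translated_english fields that also appear in test.bib; a style can presumably only print fields that are in the data model. A sketch of the missing lines for biblatex-dm.cfg, assuming each variant should have the same type as its base field (publisher and location are literal lists, title is a literal field):

```latex
% Remaining variant fields from test.bib (types copied from the base fields)
\DeclareDatamodelFields[type=list, datatype=literal]{publisher_romanised, location_translated_english}
\DeclareDatamodelFields[type=field, datatype=literal]{title_translated_english}
```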

Next, we have to define a style that uses the fields as desired. For example, to print the romanised title and editor when available, create a file named addromanised.bbx with:

\ProvidesFile{addromanised.bbx}

%A base style
\RequireBibliographyStyle{numeric}

% Check which bibmacros need to be redefined. Look at the base style you are using to find them; they usually come from:
%https://mirror.hmc.edu/ctan/macros/latex/exptl/biblatex/latex/bbx/standard.bbx
%https://github.com/plk/biblatex/blob/dev/tex/latex/biblatex/biblatex.def

\renewbibmacro*{title}{%
  \printfield{title}%
  % Append the romanised title in parentheses if the entry has one
  \iffieldundef{title_romanised}
    {}
    {\setunit{\addspace}%
     \printtext[parens]{\printfield{title_romanised}}}%
}%

\renewbibmacro*{editor+others}{%
  \ifboolexpr{
    test \ifuseeditor
    and
    not test {\ifnameundef{editor}}
  }
    {\printnames{editor}%
     % Append the romanised name in parentheses if the entry has one
     \ifnameundef{editor_romanised}
       {}
       {\setunit{\addspace}%
        \printtext[parens]{\printnames{editor_romanised}}}%
     \setunit{\printdelim{editortypedelim}}%
     \usebibmacro{editor+othersstrg}%
     \clearname{editor}}
    {}}

This file can easily be extended to cover more fields, to add a translation as well, or to replace the original fields with their romanised versions.
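For example, to add the English translation as well, the title macro could be extended as in the following sketch. It assumes title_translated_english has been declared in biblatex-dm.cfg like the other variant fields, and it uses biblatex's parens and brackets field formats:

```latex
% Extended title macro for addromanised.bbx (sketch). Assumes biblatex-dm.cfg
% also contains:
%   \DeclareDatamodelFields[type=field, datatype=literal]{title_translated_english}
\renewbibmacro*{title}{%
  \printfield{title}%
  % Romanised title in parentheses, if present
  \iffieldundef{title_romanised}
    {}
    {\setunit{\addspace}%
     \printtext[parens]{\printfield{title_romanised}}}%
  % English translation in square brackets, if present
  \iffieldundef{title_translated_english}
    {}
    {\setunit{\addspace}%
     \printtext[brackets]{\printfield{title_translated_english}}}%
}
```

For the test.bib entry, this should print the kanji title followed by the romanised title in parentheses and the English translation in square brackets.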

The following code shows how the style is used with LuaLaTeX:

\documentclass{article}

\usepackage{fontspec}
\usepackage{luatexja-fontspec}

\usepackage{polyglossia} 
\setdefaultlanguage{english}

\usepackage[backend=biber,bibstyle=addromanised]{biblatex}
\addbibresource{test.bib}

\begin{document}
\nocite{*}

\printbibliography
\end{document}

Creating an additional style which adds katakana fields instead will solve the other half of the problem.
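A minimal sketch of such a style, modelled on addromanised.bbx. The field name author_katakana and the file name addkatakana.bbx are hypothetical; the field has to be declared in biblatex-dm.cfg like the romanised fields. The author macro below assumes the structure of the one in biblatex's standard.bbx, so compare it with your biblatex version. How the parts of a katakana name are joined is left to the default name format, which this sketch does not change.

```latex
\ProvidesFile{addkatakana.bbx}

% A base style, as in addromanised.bbx
\RequireBibliographyStyle{numeric}

% Hypothetical field; declare it in biblatex-dm.cfg, e.g.
%   \DeclareDatamodelFields[type=list, datatype=name]{author_katakana}
% Print the katakana name instead of the original when it is available.
\renewbibmacro*{author}{%
  \ifboolexpr{
    test \ifuseauthor
    and
    not test {\ifnameundef{author}}
  }
    {\ifnameundef{author_katakana}
       {\printnames{author}}
       {\printnames{author_katakana}}%
     \iffieldundef{authortype}
       {}
       {\setunit{\printdelim{authortypedelim}}%
        \usebibmacro{authorstrg}}}
    {}}
```

It would be loaded with bibstyle=addkatakana in place of bibstyle=addromanised.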

As a final note, if you are sorting by any of these fields, you may want to use the romanisations when they are available. For example, to adapt the nyvt sorting scheme, you could add the following code to your style:

\DeclareSortingScheme{romanisednyvt}{
  \sort{
    \field{presort}
  }
  \sort[final]{
    \field{sortkey}
  }
  \sort{
    \field{sortname}
    \field{author_romanised}
    \field{author}
    \field{editor_romanised}
    \field{editor}
    \field{translator}
    \field{sorttitle}
    \field{title_romanised}
    \field{title}
  }
  \sort{
    \field{sortyear}
    \field{year}
  }
  \sort{
    \field{volume}
    \literal{0}
  }
  \sort{
    \field{sorttitle}
    \field{title}
  }
}

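And load it with \usepackage[backend=biber,bibstyle=addromanised,sorting=romanisednyvt]{biblatex}. Note that since biblatex 3.8, \DeclareSortingScheme has been renamed \DeclareSortingTemplate; the old name is deprecated.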
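A different technique that avoids a custom sorting scheme is to copy the romanised values into sortname and sorttitle with a source map in the document preamble; the standard nyt/nyvt schemes already check those fields first. This is a sketch assuming the romanised fields are present in the .bib file as in test.bib. overwrite=false keeps any sortname or sorttitle given explicitly, and because maps are applied in order, a romanised author takes precedence over a romanised editor:

```latex
% Preamble: copy romanised values into the fields the stock schemes sort on
\DeclareSourcemap{
  \maps[datatype=bibtex]{
    % Runs first; overwrite=false stops later maps from replacing sortname
    \map[overwrite=false]{
      \step[fieldsource=author_romanised, final]
      \step[fieldset=sortname, origfieldval]
    }
    \map[overwrite=false]{
      \step[fieldsource=editor_romanised, final]
      \step[fieldset=sortname, origfieldval]
    }
    \map[overwrite=false]{
      \step[fieldsource=title_romanised, final]
      \step[fieldset=sorttitle, origfieldval]
    }
  }
}
```

With this in place, a stock scheme such as sorting=nyvt should already order entries by their romanisations.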

Welcome to TeX! Nice answer style. – Bobyandbob Mar 25 '18 at 20:00

score 1 · Answer 4 · answered Aug 03 '23 at 09:42

I had trouble with Japanese + English text as well. Using the CJKutf8 package and wrapping Japanese text in \begin{CJK*}{UTF8}{MS Mincho}...\end{CJK*} (even in the .bib file!) worked for me. Example:

document.tex

\documentclass{article} % document class - adjust to your needs
\usepackage[square,numbers,sort&compress]{natbib} % citation style - adjust to your needs
\usepackage{CJKutf8} % package needed for Chinese, Japanese, Korean and Thai characters
\begin{document}
Hello World. Citation example\cite{mynaviEneplat2022}. Japanese font in text:
% in the 'MS Mincho' font; for other languages you might need other fonts
\begin{CJK*}{UTF8}{MS Mincho} % {goth} or {min} didn't work for me, I don't know why
    こんにちは世界
\end{CJK*}

% Japanese text in the bibliography (wrapped in CJK* inside literature.bib):
\bibliographystyle{unsrtnat}
\bibliography{literature}

\end{document}

literature.bib

@misc{mynaviEneplat2022,
    title = {\begin{CJK*}{UTF8}{MS Mincho}自家発電・自家消費を促進するパナソニックの{V2H蓄電システム}「eneplat」\end{CJK*}},
    howpublished = {\url{https://news.mynavi.jp/article/20221205-panasonic/}},
    language = {ja},
    urldate = {2023-08-02},
    note = {Accessed: 2023-08-02},
    journal = {\begin{CJK*}{UTF8}{MS Mincho}マイナビニュース\end{CJK*}},
    author = {\begin{CJK*}{UTF8}{MS Mincho}{諸山泰三}\end{CJK*}},
    month = dec,
    year = {2022},
}

Delete the temporary .aux and .bbl files before you try this, so that pdfLaTeX and BibTeX regenerate them properly. Also make sure your .tex and .bib files are saved as UTF-8 (you can check this e.g. in Notepad++); usually this should already be the case.

Result: [image of the compiled output not preserved]
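If wrapping every Japanese field in the .bib file becomes tedious, a common CJKutf8 pattern is to open a single CJK* environment around the whole document body, including \bibliography, so that Japanese text coming from the .bbl is covered too. This is a sketch rather than a tested recipe: it reuses the font family name reported above, and with it the fields in literature.bib should presumably work without their own CJK* wrappers.

```latex
\documentclass{article}
\usepackage[square,numbers,sort&compress]{natbib}
\usepackage{CJKutf8}
\begin{document}
% One CJK* environment around the whole body, including the bibliography,
% so Japanese text read from the .bbl needs no per-field wrapping.
% 'MS Mincho' is the font family reported to work above; adjust as needed.
\begin{CJK*}{UTF8}{MS Mincho}
Hello World. Citation example\cite{mynaviEneplat2022}.
Japanese text: こんにちは世界

\bibliographystyle{unsrtnat}
\bibliography{literature}
\end{CJK*}
\end{document}
```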

It is a nice trick that works with bibtex. – Doug Liu Jan 02 '24 at 17:07
