
This question highlights the problem of correctly displaying mixed English-RTL text (in particular, Hebrew) in different Windows editors. To illustrate the issue, this is the LaTeX compilation result:

enter image description here

This is how the text is displayed in TeXstudio, TeXworks and Notepad++ (the three editors I have checked recently).

enter image description here

This is the result of starting a new line.

enter image description here

The preferable single-line text presentation (and the input) is:

.בדיקה $x^2$ שלום

So far, the only editor I have verified to support it correctly is MS Word, which is not useful for LaTeX.

Moshe Gueta
  • You've not said exactly what the input is here, particularly in terms of RTL/LTR markers. I'm surprised you are getting the two $ next to each other without some invisible markup. – Joseph Wright May 29 '19 at 06:37
  • I have edited the question. Yes, editor presents the two $ next to each other without any invisible markup. – Moshe Gueta May 29 '19 at 07:46
  • you could try winedt, it has bidi support. But I have no idea if it works as expected (I do find direction changes rather confusing and never really understood how the input and output should relate). – Ulrike Fischer May 29 '19 at 08:29
  • In case somebody knows Persian, I have described the issue and a solution here – Hosein Rahnama Aug 23 '23 at 10:03
  • @MosheGueta: This mostly seems like a statement, rather than a question. So, what is the question here? Are you looking for editors that correctly support RTL writing with LaTeX? Please clarify. – Werner Aug 31 '23 at 15:55
  • The support for bi-directionality in most editors is far from good. When it comes to editing Hebrew in LaTeX I usually prefer writing in LyX. – Guy Jun 01 '19 at 13:51
  • @Werner: Looking for the correct RTL presentation with LaTeX. – Moshe Gueta Sep 04 '23 at 09:49

2 Answers


I don't think it's so clear that what you have seen is incorrect, even though it is certainly not ideal when working on a LaTeX file.

I'm giving examples from Emacs (available on "all" operating systems, certainly on MS Windows) and showing how you can change the behaviour there.

(BTW, I wonder if the input you give really has turned out correct here, because there is a period first (!). I've used that as it appears when cut-and-pasting from the question, in spite of my doubts.)

When I cut and paste the part "The preferable single-line text ..." from your question into Emacs it is shown as

enter image description here

with the same dollars-shown-next-to-each-other as you had observed. We know that $x^2$ here is one unit, but I don't think the Unicode Bidirectional Algorithm (UBA) (which for example Emacs implements) is supposed to see that.

The "$" sign has the bidi class "European number terminator" (ET), which is a "weak" type. Conforming programs take it into account when parsing the text for numbers. So the text seems to consist of (normal Hebrew text) + (space) + (a number) + (normal left-to-right text) + (a number) + (space) + (normal Hebrew text). I haven't followed all steps of the algorithm to their conclusion, but I think it is correct that only the "number" that immediately follows the obvious left-to-right text is also included in that left-to-right part of this rtl text.
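The bidi class of each character can be checked directly. As a minimal sketch (it uses Python's standard `unicodedata` module, which is my choice for illustration and not something the question or Emacs relies on), this prints the class of a Hebrew letter, a space, and each character of `$x^2$`:

```python
import unicodedata


def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hebrew letter shin, a space, then the inline formula from the question.
sample = "\u05e9 $x^2$"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

The dollar sign is reported as ET (a weak type), `x` as L (strong left-to-right), `^` as ON (neutral), `2` as EN (European number) and the Hebrew letter as R (strong right-to-left). Nothing in these classes tells the algorithm that the two dollars belong together.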

You are better off using \( \) instead of $ $ in LaTeX anyway, and that will not look as strange. For me in Emacs it looks like this:

enter image description here

There is nothing inherently ltr about the parens, so \( here is written right-to-left with a mirrored "(". Only the inside, x^2, is in ltr mode. Having "\" to the right of the command is what one would expect here, as in this example where another command is added:

enter image description here

Normally the directionality of each paragraph is determined by how it starts. That can give strange results in a Hebrew LaTeX file where some lines start with (Hebrew) text and some start with (ltr) commands. So if \emph{} had been used for the first word instead that would change everything!
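To make the "determined by how it starts" rule concrete, here is a simplified sketch of first-strong detection in Python. It is only an approximation of the UBA paragraph rules (it ignores explicit embedding and isolate characters), the function name is made up for this example, and it is not how Emacs itself is implemented:

```python
import unicodedata


def first_strong_direction(paragraph, default="ltr"):
    """Guess the base direction from the first strong character.

    Simplified: explicit bidi control characters are not treated specially.
    """
    for ch in paragraph:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default  # no strong character found


# A line that starts with Hebrew text ...
print(first_strong_direction("\u05e9\u05dc\u05d5\u05dd \\emph{x}"))
# ... and a line that starts with a LaTeX command wrapping the same word.
print(first_strong_direction("\\emph{\u05e9\u05dc\u05d5\u05dd}"))
```

The first line is detected as rtl because its first strong character is Hebrew. The second is detected as ltr because the backslash is neutral and the `e` of `\emph` is the first strong character.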

That can be fixed by inserting bidi control characters at the beginning of paragraphs to set their direction explicitly. In Emacs you can also specify once and for all which directionality the file should use by setting bidi-paragraph-direction. So I end by showing one short complete document as Emacs displays it without setting that variable, and when setting it explicitly to show the text ltr or rtl.

enter image description here enter image description here enter image description here

pst

This is mainly related to how an editor uses the Left-to-Right Mark (LRM), the non-printing Unicode character U+200E. In case somebody knows Persian, I have described the issue and a solution in full detail here. I will try to summarize that solution, which is for TeXstudio. The main idea is to use a Trigger to call macros. Triggers are regular expressions.

Make sure your Bi-Di setting in TeXstudio is as follows. You can access this setting through Options -> Configure TeXstudio -> Adv. Editor -> Bi-Di.

enter image description here

Create the following two macros via Macros -> Edit Macros. The first one handles inline equations, as shown below. To write an inline equation, type $.$; the macro is then called automatically, replaces that text with LRM$$LRM, and places the cursor between the dollar signs. Inline equations will now look as you expect. Here, the Trigger is

\$\.\$

and the corresponding script is

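%SCRIPT
// Insert an LRM (U+200E) on each side of an empty pair of dollar signs.
editor.write("\u200E$$\u200E");
// Step back over the trailing LRM and one "$" so the cursor sits between the dollars.
cursor.shift(-2);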

enter image description here

The second one handles inline commands like \lr and \text.... As we have disabled the automatic insertion of the LRM character by TeXstudio, we must also make sure that inline commands are displayed correctly. Here, I have set the Trigger to

\\((text.+)|(lr))\{.*\}

and the corresponding script is

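%SCRIPT
// Write the text matched by the trigger, wrapped in LRM (U+200E) on both sides.
editor.write("\u200E");
editor.write(triggerMatches[0]);
editor.write("\u200E");
// Select and delete the single character just after the cursor.
var pos = cursor.columnNumber();
cursor.selectColumns(pos, pos + 1);
cursor.removeSelectedText();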

enter image description here

You may find a better regular expression that covers more cases, but this does the job for me. Now, as you type the closing curly bracket in, for example, \lr{RTL-text}, the macro is called and encloses the whole inline command with LRM, so the result is LRM\lr{RTL-text}LRM. Notice that if you use the auto-completer, you should still type } to trigger the macro. If you instead use a key press to skip over the } inserted by the auto-completer, the macro is not triggered.
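As a rough way to experiment with the pattern outside the editor, the sketch below applies the same trigger to a whole line in Python. Python's `re` only stands in for TeXstudio's regex engine, and the trigger in TeXstudio is evaluated while typing, not on finished lines, so this illustrates the pattern itself and not the macro. The `TIGHTER` pattern and the `wrap_with_lrm` helper are hypothetical names introduced for this example.

```python
import re

LRM = "\u200e"  # Left-to-Right Mark

# The trigger from above, and a hypothetical tighter variant that
# does not allow nested braces and therefore stops at the first "}".
ORIGINAL = re.compile(r"\\((text.+)|(lr))\{.*\}")
TIGHTER = re.compile(r"\\(text[a-zA-Z]*|lr)\{[^{}]*\}")


def wrap_with_lrm(line, pattern=TIGHTER):
    """Surround every match of pattern in line with LRM characters."""
    return pattern.sub(lambda m: LRM + m.group(0) + LRM, line)


line = r"a \lr{abc} b \textbf{def} c"
print([m.group(0) for m in ORIGINAL.finditer(line)])
print([m.group(0) for m in TIGHTER.finditer(line)])
print(ascii(wrap_with_lrm(line)))
```

On a finished line the original pattern is greedy and spans from `\lr{` to the last `}`, while the tighter variant matches `\lr{abc}` and `\textbf{def}` separately. The price is that the tighter variant does not match commands whose argument contains nested braces.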

Now, everything should work fine as the following video shows. Here, I am typing in Persian.

enter image description here