Learn how computers handle mixed RTL and LTR text using the Unicode Bidirectional Algorithm.

Karim Benali
Senior frontend developer with 10+ years building RTL-first applications.
What happens when Arabic text contains English words? Or when a Hebrew sentence includes numbers? When left-to-right and right-to-left text appear together, things get complicated fast.
Consider this mixed text:
The word "مرحبا" means hello.
The Arabic word "مرحبا" (marhaba) needs to be rendered right-to-left, while the surrounding English is left-to-right. How does a computer know which direction each character should flow?
The answer is the Unicode Bidirectional Algorithm (BiDi)—a sophisticated set of rules that governs how mixed-direction text is displayed. Understanding BiDi is essential for any developer working with multilingual text.
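Writing systems have different inherent directions: Latin-script languages such as English run left-to-right, while Arabic and Hebrew run right-to-left.
When text from different directional systems appears together, the rendering engine must determine the base direction of the paragraph, the direction of each run of characters, and the order in which those runs appear on screen.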
Consider an Arabic sentence with an English brand name:
Right-to-left base direction:
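← أنا أستخدم Microsoft Word يومياً ←
This sentence should be read as "أنا أستخدم" first (right-to-left), then "Microsoft Word" (left-to-right), and finally "يومياً" (right-to-left).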
The challenge is displaying this correctly while maintaining logical character order in memory.
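To make "logical order in memory" concrete, here is a small JavaScript sketch. It uses only standard string methods, and the variable names are illustrative. It suggests that the string is stored in the order the words are read, regardless of where they end up on screen.

```javascript
// The words of the example sentence, in reading (logical) order.
const words = ['أنا', 'أستخدم', 'Microsoft', 'Word', 'يومياً'];
const sentence = words.join(' ');

// Positions in memory follow reading order, not screen position.
const firstArabic = sentence.indexOf(words[0]);
const brand = sentence.indexOf(words[2]);
const lastArabic = sentence.indexOf(words[4]);

console.log(firstArabic < brand && brand < lastArabic); // true

// Splitting on spaces returns the words in the same logical order.
console.log(sentence.split(' ')[2]); // Microsoft
```

The renderer works from this logical sequence and decides the visual placement at display time.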
The Unicode BiDi Algorithm (UBA), defined in Unicode Standard Annex #9, specifies exactly how to determine and render text direction. It's implemented in every modern browser, operating system, and text rendering engine.
Every Unicode character has a bidirectional class property. Main categories:
| Class | Name | Examples |
|---|---|---|
| L | Left-to-Right | A-Z, Latin letters |
| R | Right-to-Left | Hebrew letters |
| AL | Arabic Letter | Arabic letters |
| EN | European Number | 0-9 |
| AN | Arabic Number | ٠-٩ (Arabic-Indic) |
| ET | European Number Terminator | # $ % |
| ES | European Number Separator | + - |
| CS | Common Number Separator | , . : |
| NSM | Nonspacing Mark | Combining diacritics |
| BN | Boundary Neutral | Formatting characters |
| B | Paragraph Separator | Line breaks |
| S | Segment Separator | Tab |
| WS | Whitespace | Space |
| ON | Other Neutral | Most punctuation |
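As a rough illustration of the table, the sketch below maps a single character to one of these classes. JavaScript has no built-in accessor for the bidirectional class, so the function name `bidiClass` and the code point ranges are assumptions chosen for brevity; a real implementation would read the Bidi_Class property from the Unicode Character Database.

```javascript
// Coarse approximation of bidi classes for a single character.
// The ranges cover only the examples from the table, not all of Unicode.
function bidiClass(ch) {
  const cp = ch.codePointAt(0);
  if (cp >= 0x30 && cp <= 0x39) return 'EN';      // ASCII digits 0-9
  if (cp >= 0x0660 && cp <= 0x0669) return 'AN';  // Arabic-Indic digits
  if (cp === 0x09) return 'S';                    // tab
  if (cp === 0x0a || cp === 0x0d) return 'B';     // line breaks
  if (cp === 0x20) return 'WS';                   // space
  if ('#$%'.includes(ch)) return 'ET';            // number terminators
  if ('+-'.includes(ch)) return 'ES';             // number separators
  if (',.:'.includes(ch)) return 'CS';            // common separators
  if (/\p{Mn}/u.test(ch)) return 'NSM';           // nonspacing marks
  if (cp >= 0x05d0 && cp <= 0x05ea) return 'R';   // Hebrew letters
  if (cp >= 0x0620 && cp <= 0x064a) return 'AL';  // core Arabic letters
  if (/\p{L}/u.test(ch)) return 'L';              // any other letter (approximation)
  return 'ON';                                    // everything else
}

console.log([...'A5 س٤'].map(bidiClass).join(' ')); // L EN WS AL AN
```

Treating every remaining letter as L is wrong for other right-to-left scripts, so this is only suitable for experimenting with the examples in this article.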
The algorithm assigns an embedding level to each character: even levels (0, 2, 4, ...) are left-to-right and odd levels (1, 3, 5, ...) are right-to-left.
The base paragraph level is typically determined by the first strong directional character (L, R, or AL).
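The first-strong rule can be sketched in a few lines. The function name `paragraphLevel`, the `defaultLevel` parameter, and the range table are assumptions for illustration. The sketch also ignores isolate controls, which the full algorithm skips over when searching.

```javascript
// Code point ranges treated as strong right-to-left in this sketch.
// A full implementation would look up Bidi_Class R and AL instead.
const RTL_RANGES = [
  [0x0590, 0x08ff], // Hebrew, Arabic, Syriac, Thaana and neighbouring blocks
  [0xfb1d, 0xfdff], // Hebrew and Arabic presentation forms
  [0xfe70, 0xfefc], // Arabic presentation forms B
];

function isStrongRtl(cp) {
  return RTL_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Returns 0 (LTR) or 1 (RTL) based on the first letter found,
// or defaultLevel when the text contains no letters at all.
function paragraphLevel(text, defaultLevel = 0) {
  for (const ch of text) {
    if (!/\p{L}/u.test(ch)) continue; // digits, marks, punctuation are not strong
    return isStrongRtl(ch.codePointAt(0)) ? 1 : 0;
  }
  return defaultLevel;
}

console.log(paragraphLevel('car means سيارة')); // 0
console.log(paragraphLevel('42 שלום'));         // 1
```

Note that the leading digits in the second call do not set the direction, because numbers are weak rather than strong.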
The BiDi algorithm consists of several phases:
Find the base direction by scanning for the first strong character:
P1. Split text into paragraphs
P2. Find first strong character (L, R, or AL)
P3. If L: paragraph level = 0 (LTR)
If R or AL: paragraph level = 1 (RTL)
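If none found: use default paragraph direction
Process explicit directional formatting codes: the embeddings LRE and RLE, the overrides LRO and RLO, and their terminator PDF.
And the newer isolate controls: LRI, RLI, and FSI, with their terminator PDI.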
Handle characters whose direction depends on context:
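W1. Give each NSM (combining mark) the type of the preceding character
W2. Change EN (European number) to AN (Arabic number) when the nearest preceding strong character is AL
W3. Change AL to R
W4. Change a single separator between two European numbers to EN, and a single common separator between two Arabic numbers to AN
W5. Change a run of terminators adjacent to EN into EN
W6. Change remaining separators and terminators to ON
W7. Change EN to L when the nearest preceding strong character is L
Handle spaces, punctuation, and other neutral characters: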
N1. Neutrals between same-direction characters take that direction
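N2. Remaining neutrals take the embedding direction
Adjust levels based on character types:
I1. For even (LTR) levels: R → level+1, AN/EN → level+2
I2. For odd (RTL) levels: L/EN/AN → level+1
Finally, reverse runs at each level:
L1. Reset separators and trailing whitespace to the paragraph level
L2. From the highest level down to the lowest odd level, reverse each contiguous run of characters at that level or higher
L3. Move combining marks back after their base character if the renderer expects that order
L4. Use mirrored glyphs, such as parentheses, for characters resolved as right-to-left
The result is the visual order.
Let's trace through: car means سيارة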
Characters and types:
c-a-r- -m-e-a-n-s- -س-ي-ا-ر-ة
L L L WS L L L L L WS AL AL AL AL AL
Paragraph direction: First strong is c (L), so LTR (level 0)
Resolve levels:
c a r ␣ m e a n s ␣ س ي ا ر ة
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
Reorder: Reverse the RTL run (odd level):
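Display: the Arabic run is laid out in reverse memory order, so its first letter (س) sits at the far right and the screen shows: car means سيارة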
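The last two steps of this trace can be reproduced with a short sketch. It assumes the weak and neutral rules have already reduced every character to L, R, EN, or AN, and the names `implicitLevel` and `reorderVisual` are hypothetical. It covers only the implicit level rules and the run reversal, not mirroring or line breaking.

```javascript
// Implicit level rules: derive a level from the current level and a resolved type.
function implicitLevel(level, type) {
  if (level % 2 === 0) {
    if (type === 'R') return level + 1;
    if (type === 'EN' || type === 'AN') return level + 2;
    return level; // L stays on the even level
  }
  // Odd (RTL) level: L, EN, and AN move up one; R stays.
  return type === 'R' ? level : level + 1;
}

// Run reversal: from the highest level down to the lowest odd level,
// reverse every contiguous run of characters at that level or higher.
function reorderVisual(chars, levels) {
  const out = [...chars];
  const oddLevels = levels.filter((l) => l % 2 === 1);
  if (oddLevels.length === 0) return out; // pure LTR: nothing to reverse
  const highest = Math.max(...levels);
  const lowestOdd = Math.min(...oddLevels);
  for (let level = highest; level >= lowestOdd; level--) {
    let i = 0;
    while (i < out.length) {
      if (levels[i] < level) { i++; continue; }
      let j = i;
      while (j < out.length && levels[j] >= level) j++;
      const run = out.slice(i, j).reverse();
      out.splice(i, run.length, ...run); // write the reversed run back
      i = j;
    }
  }
  return out;
}

const chars = [...'car means سيارة'];
// Both spaces resolve to L here, so the first ten characters are L.
const types = chars.map((_, i) => (i < 10 ? 'L' : 'R'));
const levels = types.map((t) => implicitLevel(0, t));
console.log(levels.join(' ')); // 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1

const visual = reorderVisual(chars, levels);
console.log(visual[14] === 'س'); // true: the first Arabic letter is rightmost
```

The `visual` array lists characters by screen position from left to right, which is why it is inspected by index rather than printed as a string that a terminal might reorder again.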
Reversing the stored order is what puts the Arabic letters in their correct right-to-left reading order.
Use these attributes and characters:
<!-- Set base direction -->
<p dir="rtl">Arabic paragraph with English here</p>
<!-- Isolate embedded text -->
<p>The word <bdi>مرحبا</bdi> means hello.</p>
<!-- Set the direction of an inline span -->
<span dir="ltr">Force LTR direction</span>
Insert these characters to control bidi behavior:
const LRM = '\u200E'; // Left-to-Right Mark
const RLM = '\u200F'; // Right-to-Left Mark
const LRE = '\u202A'; // Left-to-Right Embedding
const RLE = '\u202B'; // Right-to-Left Embedding
const PDF = '\u202C'; // Pop Directional Formatting
const LRO = '\u202D'; // Left-to-Right Override
const RLO = '\u202E'; // Right-to-Left Override
const LRI = '\u2066'; // Left-to-Right Isolate
const RLI = '\u2067'; // Right-to-Left Isolate
const FSI = '\u2068'; // First Strong Isolate
const PDI = '\u2069'; // Pop Directional Isolate
// Example: Ensure English in RTL context
const text = `مرحباً ${LRI}Microsoft${PDI} العالم`;
<!-- Product name should stay LTR -->
<p dir="rtl">
  أنا أستخدم <span dir="ltr">iPhone 15 Pro</span> كل يوم
</p>
Email addresses should always be LTR:
<p dir="rtl">
  البريد الإلكتروني: <bdi dir="ltr">user@example.com</bdi>
</p>
<p dir="rtl">
  السرعة: <bdi dir="ltr">120 km/h</bdi>
</p>
/* Unicode-bidi property */
.isolate {
  unicode-bidi: isolate;
}
.embed {
  unicode-bidi: embed;
}
.override {
  unicode-bidi: bidi-override;
}
.plaintext {
  unicode-bidi: plaintext; /* Base direction comes from rules P2 and P3, not the direction property */
}
Punctuation is "neutral," taking direction from context:
Problem: "Hello, world" in RTL context
Wrong: "Hello, world" ← punctuation moves
Right: "Hello, world" ← with proper isolation
Solution: Use <bdi> or isolate controls.
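Where markup is not available, for example in a plain-text string, the isolate controls can be applied with a small helper. The function name `isolate` and its `dir` parameter are assumptions for this sketch; in HTML the `<bdi>` element remains the better choice.

```javascript
const LRI = '\u2066'; // Left-to-Right Isolate
const RLI = '\u2067'; // Right-to-Left Isolate
const FSI = '\u2068'; // First Strong Isolate
const PDI = '\u2069'; // Pop Directional Isolate

// Wrap a fragment in isolate controls so neutral characters at its edges
// cannot attach to the surrounding text. dir is 'ltr', 'rtl', or 'auto'.
function isolate(fragment, dir = 'auto') {
  const opener = dir === 'ltr' ? LRI : dir === 'rtl' ? RLI : FSI;
  return `${opener}${fragment}${PDI}`;
}

// An email address kept left-to-right inside Arabic text.
const message = `البريد الإلكتروني: ${isolate('user@example.com', 'ltr')}`;
console.log(message.endsWith(PDI)); // true
```

The `'auto'` default uses FSI, which lets the fragment pick its own direction from its first strong character. That suits user-supplied text whose language is not known in advance.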
Numbers read LTR even in RTL text:
Arabic: الرقم ٤٢ صحيح (The number 42 is correct)
Reading order: right-to-left, but 42 stays as "42" not "24"
But be careful with ranges:
Problem: 10-20 in RTL
Wrong: 20-10 ← hyphen moves, numbers reversed
Right: 10-20 ← preserve with LTR isolation
Maximum embedding depth is 125 levels. Explicit controls that would push the level past this limit do not raise it, so deeply nested markup can silently lose its effect.
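To get a feel for how quickly nesting consumes levels, the sketch below walks a string and tracks the level each explicit control would produce. It is a simplification and every name in it is hypothetical: FSI is treated like LRI, PDF and PDI are treated as interchangeable closers, and overflow handling is not modelled.

```javascript
const MAX_DEPTH = 125;

const LTR_INITIATORS = new Set(['\u202A', '\u202D', '\u2066', '\u2068']); // LRE, LRO, LRI, FSI (assumed LTR)
const RTL_INITIATORS = new Set(['\u202B', '\u202E', '\u2067']);           // RLE, RLO, RLI
const TERMINATORS = new Set(['\u202C', '\u2069']);                        // PDF, PDI

// Returns the highest embedding level reached by explicit controls in text.
function maxEmbeddingLevel(text, baseLevel = 0) {
  const stack = [baseLevel];
  let max = baseLevel;
  for (const ch of text) {
    const current = stack[stack.length - 1];
    if (LTR_INITIATORS.has(ch)) {
      // Next even level above the current one.
      stack.push(current % 2 === 0 ? current + 2 : current + 1);
    } else if (RTL_INITIATORS.has(ch)) {
      // Next odd level above the current one.
      stack.push(current % 2 === 0 ? current + 1 : current + 2);
    } else if (TERMINATORS.has(ch)) {
      if (stack.length > 1) stack.pop(); // never pop the paragraph level
    } else {
      continue;
    }
    max = Math.max(max, stack[stack.length - 1]);
  }
  return max;
}

// An LTR isolate inside an RTL isolate, in an LTR paragraph.
console.log(maxEmbeddingLevel('a\u2067b\u2066c\u2069d\u2069')); // 2
// Two nested LTR isolates cost two levels each.
console.log(maxEmbeddingLevel('\u2066\u2066x\u2069\u2069')); // 4
```

Comparing the result against `MAX_DEPTH` gives an early warning for generated text that nests controls programmatically.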
<!-- Don't nest too deeply! -->
<p dir="rtl">
  <span dir="ltr">
    <span dir="rtl">
      <!-- Keep it simple -->
    </span>
  </span>
</p>
Without isolation, directional text can "leak":
<!-- Problem -->
<p>User: مستخدم (3 new messages)</p>
<!-- The "(" may move next to "مستخدم" -->
<!-- Solution -->
<p>User: <bdi>مستخدم</bdi> (3 new messages)</p>
Use these strings to test BiDi handling:
const testStrings = [
  // Simple mixed
  'Hello مرحبا World',
  // Numbers
  'The price is ٤٢ dollars',
  // Punctuation
  'Quote: "مرحبا بك"',
  // Nested
  'English (عربي (nested) نص) more',
  // Challenging punctuation
  'Item #123 - خاص (special)',
];
Rendering should be consistent across the browsers, operating systems, and text rendering engines you support.
BiDi is automatic but imperfect: The Unicode BiDi algorithm handles most cases but needs help with edge cases.
Use HTML controls: The dir attribute and the <bdi> element provide semantic, accessible solutions.
Isolate, don't override: <bdi> and isolate controls are safer than embeddings or overrides.
Test with real content: Synthetic tests miss real-world complexity.
Neutral characters need context: Punctuation, spaces, and numbers behave based on surrounding text direction.