Line Break Transformation Rules v1

Line Break Transformation Rules

When line breaks are collapsible, they are transformed into a space (U+0020), a zero-width space (U+200B), or no character, depending on the script of the first character on each side of the line break. The script of each character is determined by Unicode [[UNICODE]]. Characters such as punctuation that belong to the COMMON and INHERITED scripts are ignored; the next character is examined instead.

If a character on either side of the line break belongs to a script in which the space character (U+0020) is used as a word separator, then the line break is converted to a space (U+0020). Examples of such scripts include Latin, Arabic, and Hangul.
Otherwise, if a character on either side of the line break belongs to a script (other than Han, Hiragana, and Katakana) in which there is no visible word separator, then the line break is converted to a zero-width space (U+200B). Examples of such scripts include Thai and Khmer.
Otherwise, if a character on either side of the line break belongs to the Han, Hiragana, or Katakana scripts, in which there is no word separator, then the line break is removed.
Otherwise, the line break is converted to a space (U+0020).
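
The rules above can be sketched in Python as an ordered series of checks. This is an illustrative approximation, not a conforming implementation: Python's standard library does not expose the Unicode Script property, so the hypothetical helper script_class guesses the script from the prefix of the character's Unicode name, and the three script tables cover only the scripts named as examples in this section. The function names (script_class, first_script, transform_line_break) are assumptions introduced for this sketch.

```python
import unicodedata

# Hypothetical, deliberately small tables keyed by Unicode character-name
# prefix. A real implementation would use the Unicode Script property.
SPACE_SEPARATED = ("LATIN", "ARABIC", "HANGUL")   # U+0020 separates words
NO_VISIBLE_SEPARATOR = ("THAI", "KHMER")          # no visible word separator
CJK = ("CJK", "HIRAGANA", "KATAKANA")             # Han, Hiragana, Katakana


def script_class(ch):
    """Classify one character, or return None if it is treated as
    COMMON/INHERITED (or is simply unknown to this sketch)."""
    name = unicodedata.name(ch, "")
    if name.startswith(SPACE_SEPARATED):
        return "space"
    if name.startswith(NO_VISIBLE_SEPARATOR):
        return "zwsp"
    if name.startswith(CJK):
        return "cjk"
    return None


def first_script(chars):
    """Return the class of the first character that is not ignored."""
    for ch in chars:
        cls = script_class(ch)
        if cls is not None:
            return cls
    return None


def transform_line_break(before, after):
    """Return the replacement for a collapsible line break that sits
    between the text `before` and the text `after`."""
    # Scan outward from the break: backwards on the left, forwards on the right.
    classes = {first_script(reversed(before)), first_script(after)}
    if "space" in classes:
        return " "          # rule 1: a space-separated script on either side
    if "zwsp" in classes:
        return "\u200b"     # rule 2: a script with no visible word separator
    if "cjk" in classes:
        return ""           # rule 3: Han, Hiragana, or Katakana
    return " "              # rule 4: fallback


if __name__ == "__main__":
    print(repr(transform_line_break("hello", "world")))
    print(repr(transform_line_break("漢字。", "かな")))
```

Because the checks run in the order the rules are stated, a break between Han text and Latin text becomes a space, while a break between Han text and Thai text becomes a zero-width space. Trailing punctuation such as U+3002 is skipped, so the character before it decides the outcome.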