Whitespace Handling

This concept page describes how Fonto Editor handles whitespace characters in XML content.

Historically, Fonto Editor made no assumptions about the relevance of whitespace characters in XML documents. When XML content is loaded in Fonto Editor, all newlines, tabs and spaces are visualized as such. We recognize that this can make interoperability with other desktop XML editors more difficult, especially with source-level editors that pretty-print the XML source for readability. In Fonto Editor 7.10 we aim to improve this situation.

While there is no fundamental specification that governs how whitespace in XML should work, there are multiple specifications with similar (but not identical) semantics. The de facto standards include the HTML whitespace collapsing rules, which are themselves not unambiguously defined across standards and implementations, and partial definitions such as the behavior imposed by the xml:space attribute. Taking these into account, we have defined a notion of "dangerous whitespace": whitespace that is at risk of "disappearing" or being considered insignificant under one of these common specifications.

The basic idea is that the CMS should take care never to send content containing such dangerous whitespace to Fonto Editor. In turn, Fonto Editor ensures that the user can never insert such whitespace.

Dangerous whitespace

Fonto defines whitespace as tab, line feed, carriage return and space (code points 0x9, 0xa, 0xd and 0x20 respectively). This definition is based on the one used by XML for normalizing attribute values, and is also used when validating XML Schema simple types. Note that some standards define additional whitespace characters. For example, HTML also considers the form feed (0xc) character to be whitespace. If you depend on such cases, please let us know. We may consider adding the ability to extend this set in the future.

Dangerous whitespace is any whitespace that is not explicitly significant and that is:

  • at the start of an element

  • at the end of an element

  • part of a sequence of 2 or more whitespace characters anywhere else (i.e., between non-whitespace characters and child elements)
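The three conditions above can be sketched as a single check. This is a minimal illustration, not Fonto's implementation: the helper name hasDangerousWhitespace is hypothetical, and it assumes the string is the complete content of a text-only element whose whitespace is not explicitly significant.

```javascript
// Fonto's whitespace set as listed above: tab, line feed, carriage return, space.
// JavaScript's \s matches additional characters (e.g. form feed, no-break space),
// so an explicit character class is used instead.
const DANGEROUS = /^[\t\n\r ]|[\t\n\r ]$|[\t\n\r ]{2,}/;

// Hypothetical helper: `text` is assumed to be the full content of a
// text-only element (no child elements).
function hasDangerousWhitespace(text) {
  // Matches whitespace at the start, at the end, or any run of 2 or more.
  return DANGEROUS.test(text);
}

console.log(hasDangerousWhitespace('Hello world'));  // false
console.log(hasDangerousWhitespace(' Hello world')); // true (start of element)
console.log(hasDangerousWhitespace('Hello  world')); // true (run of 2)
```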

Normalization rules

Fonto defines three modes for handling whitespace in XML content:

  • Ignore - all whitespace is removed. In Fonto, this always happens for all whitespace text nodes in positions where the schema does not allow any text. It does not make sense to allow whitespace to be entered in those positions, so Fonto Editor will never allow the cursor to be placed in such positions.

  • Normalize - only non-dangerous whitespace is allowed (as single spaces between either non-whitespace characters or child elements). Any spaces at the start and end of each element are removed. Consecutive spaces and sequences of one or more other whitespace characters are replaced by a single space. We expect the CMS to apply this normalization before the document is sent to Fonto Editor. When editing, Fonto will ensure this invariant is maintained.

  • Preserve - all whitespace is kept. All whitespace is inserted as-is. This was the editing behavior in Fonto Editor prior to version 7.10, and will still be used for appropriate elements (see below).
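As a rough illustration of the Normalize mode, the sketch below applies the two replacement steps to a plain string. It assumes the string is the entire content of a text-only element; the function name normalizeText is hypothetical and is not part of any Fonto API.

```javascript
// Hypothetical helper illustrating the Normalize mode on a text-only element.
function normalizeText(text) {
  return text
    // Consecutive spaces and runs of tab / line feed / carriage return
    // become a single space.
    .replace(/[\t\n\r ]+/g, ' ')
    // Whatever single space remains at the start or end is removed.
    .replace(/^ | $/g, '');
}

console.log(normalizeText('  Hello \n\t world  ')); // "Hello world"
console.log(normalizeText('a\tb'));                 // "a b"
```

The explicit character class is deliberate: String.prototype.trim() and \s also treat characters such as the no-break space as whitespace, which falls outside the four-character set defined above.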

In Fonto Editor, the whitespace mode only affects editing. Fonto Editor will continue to render any whitespace in the XML document (independent of the mode). This means that, if the CMS fails to remove dangerous whitespace, Fonto Editor will only do so when the content in question is touched by the author.

The whitespace mode for a given text node in the document is determined by the first of the following rules that applies:

  • If the parent element does not support text, whitespace is ignored.

  • If the whitespace CVK property was configured for a selector matching the parent element, return that value.

  • If the whiteSpace facet for the simple type (or a base type) of the parent element, if any, is set to “preserve”, return preserve. Please note that, in accordance with the XML Schema specification, the whiteSpace facet defaults to "preserve" for the xs:string simple type.

  • Find the closest inclusive ancestor of the parent element that either has a configured whitespace CVK property, or otherwise provides a value for the xml:space attribute (either set in the current document, or a default value provided by the schema). If there is such an ancestor, use (as applicable) the overridden value, normalize if xml:space="default", or preserve if xml:space="preserve".

  • Otherwise, normalize all whitespace in the element.
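The resolution order above can be sketched as follows. The element model is hypothetical and exists only for illustration: the fields supportsText, cvkWhitespace, whiteSpaceFacet, xmlSpace and parent are assumptions, not Fonto APIs. The sketch also assumes that a CVK value of "auto" counts as "not configured".

```javascript
// Returns the configured CVK override for an element, or null if none.
// Assumption: "auto" means no explicit override was configured.
function configuredMode(el) {
  return el.cvkWhitespace && el.cvkWhitespace !== 'auto' ? el.cvkWhitespace : null;
}

// `parent` is the parent element of the text node in question.
function resolveWhitespaceMode(parent) {
  // 1. The schema allows no text here: whitespace is ignored.
  if (!parent.supportsText) return 'ignore';

  // 2. A CVK property configured for the parent element itself.
  const own = configuredMode(parent);
  if (own) return own;

  // 3. The whiteSpace facet of the parent's simple type (or a base type).
  if (parent.whiteSpaceFacet === 'preserve') return 'preserve';

  // 4. Closest inclusive ancestor with a CVK override or an xml:space value
  //    (set in the document or defaulted by the schema).
  for (let el = parent; el; el = el.parent) {
    const inherited = configuredMode(el);
    if (inherited) return inherited;
    if (el.xmlSpace === 'preserve') return 'preserve';
    if (el.xmlSpace === 'default') return 'normalize';
  }

  // 5. Nothing applies: normalize.
  return 'normalize';
}

const pre = { supportsText: true, xmlSpace: 'preserve' };
const span = { supportsText: true, parent: pre };
console.log(resolveWhitespaceMode(span)); // "preserve"
```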

Families supporting the newline break token automatically set the whitespace CVK property to “preserve” when break tokens are used.

Editing experience

To allow uninterrupted typing at the end of elements, Fonto Editor uses a “virtual space” that is only inserted when typing another character makes it possible to do so without breaking the invariant. It is not currently possible to enter tabs, line feeds or carriage return characters in Fonto Editor using the keyboard.

As the boundaries of elements configured using the inline formatting family are not visible, nor very significant, Fonto Editor can move whitespace into and out of such elements as necessary to maintain the whitespace invariant. This includes merging adjacent elements only separated by whitespace.

Configuring whitespace handling

By default, Fonto Editor will automatically start normalizing whitespace in content that is edited in order to not insert any whitespace that is considered dangerous.

In certain elements, such as code blocks, all whitespace is significant, and should therefore not be removed. Whether to apply whitespace normalization is determined automatically based on the document and schema information. In particular, we will consider the whiteSpace facet of the schema’s simple types, as well as the xml:space attribute (either the actual value in the XML or the default value as defined by the schema).

We provide a number of new APIs for more control over this behavior:

  • A new CVK property called whitespace, for use in configureProperties etc. This can be set to "auto" (the default), "normalize" or "preserve", and manually overrides the whitespace mode for text in a specific element and its descendants if the schema does not provide enough information for those elements. In the case of descendants, the closest ancestor which provides any value (configuration, content or schema) wins. If possible, we recommend expressing significant whitespace in the schema (e.g., using a default value for xml:space) over the use of this property, as that will allow other XML tooling to benefit from the same information. Families that support newline break tokens automatically set whitespace to “preserve” when configured to use newline break tokens. Code example:

    configureProperties(sxModule, 'self::p', { whitespace: 'normalize' });

  • An operation called normalize-whitespace. This can be used to manually clean up whitespace in a given subtree of the DOM when the CMS does not apply filtering. This operation accepts only a contextNodeId for the root of the subtree, and follows the same logic as the automatic cleanup, except that it applies to all nodes rather than only changed ones. If possible, we recommend performing whitespace filtering in the CMS, in order to avoid unnecessary changes being introduced that would show up in environments like Fonto Document History.

Filtering whitespace in the CMS

If you are not using external tools and only use Fonto Editor to edit XML, then the CMS does not have to be configured to apply the normalization rules. As the document has only been modified or viewed inside the editor, the document will already follow the whitespace normalization rules.

If you are using external tools that modify the XML document, then the CMS has to apply the normalization rules before sending the document to Fonto Editor, Fonto Document History or Fonto Feedback. If the CMS does not apply the normalization rules, these products will register changes to the document, because the editor applies the normalization rules when content is edited.

For the following API endpoints, always apply the normalization rules to the document content before sending it to Fonto products:

GET /document

GET /document/revision

GET /document/compare
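A CMS-side filter could look roughly like the sketch below, which applies the Normalize rules to a whole subtree before a document is returned. It is one possible reading of the rules and makes several assumptions: the tree model ({ name, children } with strings as text nodes), the function normalizeElement and the isPreserved predicate are all hypothetical, and mode resolution is reduced to that single predicate. Unlike the editor, it drops whitespace at element boundaries rather than moving it out of inline elements, so a real filter would need to take care not to join adjacent words.

```javascript
const WS_RUN = /[\t\n\r ]+/g;

// Hypothetical model: an element is { name, children }, where each child is
// either a string (text node) or another element.
function normalizeElement(element, isPreserved = () => false) {
  // Elements in preserve mode are returned untouched, including descendants.
  if (isPreserved(element)) return element;

  const children = [];
  for (const child of element.children) {
    if (typeof child !== 'string') {
      children.push(normalizeElement(child, isPreserved));
      continue;
    }
    const last = children.length - 1;
    if (typeof children[last] === 'string') {
      // Merge adjacent text nodes so runs split across nodes collapse too.
      children[last] = (children[last] + child).replace(WS_RUN, ' ');
    } else {
      children.push(child.replace(WS_RUN, ' '));
    }
  }

  // Remove the remaining single space at the start and end of the element.
  const end = children.length - 1;
  if (typeof children[0] === 'string') children[0] = children[0].replace(/^ /, '');
  if (typeof children[end] === 'string') children[end] = children[end].replace(/ $/, '');

  // Drop text nodes that became empty.
  return { ...element, children: children.filter((c) => c !== '') };
}

const p = { name: 'p', children: ['  Hello \n  ', { name: 'b', children: [' world '] }, '\t!  '] };
console.log(JSON.stringify(normalizeElement(p).children));
// ["Hello ",{"name":"b","children":["world"]}," !"]
```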