Configure clipboard behavior



This guide describes how to configure the different translations used when pasting into FontoXML from different sources.

FontoXML supports pasting content from various sources, including FontoXML documents, plain text and HTML.

Introduction

Fonto can interact with the browser's clipboard to read pasted content from it and to write copied content to it. The data (or content) found on the clipboard can be represented in a few different formats. The simplest format, and the one that is present in virtually every paste action, is the text/plain format. This format contains a plain text representation of the copied content.

A somewhat more advanced format is the text/html format. This format contains the copied content, represented as HTML. This format allows for copying and pasting content containing structure and formatting from one application to another. The HTML representation is only available when content is copied from an application that is capable of converting its content to an HTML representation.

Then there is the application/xml format. This format is used by Fonto when you are copying and pasting content within Fonto itself.

Copying and pasting plain text

Some applications only export plain text to the clipboard. Pasting plain text is the fallback when pasting XML and HTML fails.

Naturally, plain text does not contain any markup like bold text formatting or structures such as tables. The paste pipeline can be configured to extract as much markup as it possibly can.

A common source of plain text is PDF documents. These cannot be imported directly, because soft wraps are made hard: each paragraph is split into multiple lines separated by newlines, carriage returns and the like. Unfortunately, commonly used PDF viewers do not preserve markup and layout information when content is copied to the clipboard.

Using the default

By default, we use a very basic configuration, which fits copying from most PDF documents. This configuration is able to parse the soft-wrapped lines exported by most PDF renderers into single paragraphs, and paste them.

This default configuration does the following:

  • Lines containing words are given the Paragraph flag.
  • Lines not ending with a ‘.’ are joined with the following line, to remove soft wraps.
  • Paragraphs are output as the configurable paragraph element.
  • If that fails, they are output as plain text.
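
The effect of these defaults can be approximated outside Fonto in a few lines of plain JavaScript. The sketch below is not the platform's implementation: the function name joinSoftWrappedLines is hypothetical, and it works on plain strings rather than on the chunks used by the paste pipeline.

```javascript
// Hypothetical stand-alone approximation of the default behaviour:
// keep lines that contain words, and join a line with the next one
// until a line ends with a full stop.
function joinSoftWrappedLines(text) {
	var paragraphs = [];
	var current = [];

	text.split(/\r\n|\r|\n/).forEach(function (line) {
		var trimmed = line.trim();
		// Lines without any word characters are not paragraphs; skip them
		if (!/\w/.test(trimmed)) {
			return;
		}
		current.push(trimmed);
		// A trailing full stop ends the paragraph
		if (/\.$/.test(trimmed)) {
			paragraphs.push(current.join(' '));
			current = [];
		}
	});

	// Flush a last paragraph that did not end with a full stop
	if (current.length > 0) {
		paragraphs.push(current.join(' '));
	}
	return paragraphs;
}

var pasted = 'This sentence was wrapped\nby the PDF viewer.\nSecond paragraph.';
console.log(joinSoftWrappedLines(pasted));
```

The sample input yields two paragraphs, because only the second and third lines end with a full stop.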

For DITA editors, the default configuration is completed by configuring the name of elements representing paragraphs. For other editors, this can be configured like this inside config/configuration.js:

configuration.js
import configurationManager from 'fontoxml-configuration/src/configurationManager.js';

configurationManager.set(
    'paragraph-node-name-for-pasting' /*Your paragraph counterpart element name here*/
);

Configuring for your source documents

The default is very basic and makes close to no assumptions about the pasted content. Most real-world PDFs contain more information, like recognizable lists, titles, and word wrap using hyphenation. The paste pipeline is very powerful and can use this information to further annotate pasted content, which can save an author time.

Basic structure

The paste pipeline consists of two parts: importing content to an annotated list of chunks, and outputting it again.

The intermediate structure is annotated using flags. Each flag describes a possible trait of the chunk. Examples are (ordered) list items, titles and paragraphs. A single chunk can have multiple flags, for example to signal that a list item can be converted back to a paragraph if a list is not allowed in the current context.
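
As an illustration of how multiple flags allow a fallback, consider the following sketch. The chunk shape (a flags array next to the text), the chooseOutput function and the 'PlainText' label are assumptions made for this example only; they are not part of the platform API.

```javascript
// Hypothetical chunk shape: the text of one pasted line plus the flags set by input processors
var chunk = { text: 'First item', flags: ['NumericalListItem', 'Paragraph'] };

// Hypothetical output step: try each candidate flag in order and use the first one
// that is both present on the chunk and allowed in the current context
function chooseOutput(chunk, candidates, allowedInContext) {
	for (var i = 0; i < candidates.length; i++) {
		var flag = candidates[i];
		if (chunk.flags.indexOf(flag) !== -1 && allowedInContext.indexOf(flag) !== -1) {
			return flag;
		}
	}
	// Nothing fits: fall back to plain text
	return 'PlainText';
}

var preferred = ['NumericalListItem', 'Paragraph'];
console.log(chooseOutput(chunk, preferred, ['NumericalListItem', 'Paragraph']));
console.log(chooseOutput(chunk, preferred, ['Paragraph']));
console.log(chooseOutput(chunk, preferred, []));
```

When lists are allowed the chunk becomes a list item, when only paragraphs are allowed it becomes a paragraph, and otherwise it is pasted as plain text.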

Example configuration

The default configuration is a good starting point. It looks like this:

import namespaceManager from 'fontoxml-dom-namespaces/src/namespaceManager.js';
import ImportStream from 'fontoxml-pipelined-importer/src/ImportStream.js';
import createChunkOutputProcessor from 'fontoxml-pipelined-importer/src/createChunkOutputProcessor.js';
import createPlainTextOutputProcessor from 'fontoxml-pipelined-importer/src/createPlainTextOutputProcessor.js';
import createRegexpAnnotateInputProcessor from 'fontoxml-pipelined-importer/src/createRegexpAnnotateInputProcessor.js';
import createRemoveWordWrapInputProcessor from 'fontoxml-pipelined-importer/src/createRemoveWordWrapInputProcessor.js';

function createParagraph(blueprint, positionInParagraph, document) {
    var paragraphElement = namespaceManager.createElement(
        document /* Your paragraph counterpart element name here */
    );
    positionInParagraph.setAtBegin(paragraphElement);

    return paragraphElement;
}

var inputProcessors = [
    // Everything is a paragraph
    createRegexpAnnotateInputProcessor(/\w/, 'Paragraph'),
    // Remove line wraps until we see a '.', join using a ' ' character
    createRemoveWordWrapInputProcessor(null, /\.$/, ' ')
];

var outputProcessors = [
    // Prefer to make paragraphs
    createChunkOutputProcessor('Paragraph', createParagraph),
    // If it fails, plain text is better than nothing
    createPlainTextOutputProcessor()
];

export default new ImportStream(inputProcessors, outputProcessors);

Wiring it all together

  1. Open install.js.
  2. Import the new pasteImportStream.
  3. Import fontoxml-clipboard/src/pasteImportStreamManager.js.
  4. Call pasteImportStreamManager.setPasteImportStream with the custom pasteImportStream.
  5. Test the configuration by pasting some text, from a source that only supplies plain text (a terminal or notepad for instance).
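
Steps 1 to 4 could look roughly like the fragment below. It only runs inside a Fonto application. The relative path './pasteImportStream.js' is hypothetical, and the fragment assumes that install.js exports an install function and that pasteImportStreamManager is the default export of its module.

```javascript
// install.js
// Assumption: './pasteImportStream.js' is the module that default-exports your ImportStream
import pasteImportStreamManager from 'fontoxml-clipboard/src/pasteImportStreamManager.js';
import pasteImportStream from './pasteImportStream.js';

export default function install() {
	// Replace the default paste pipeline for plain text with the custom one
	pasteImportStreamManager.setPasteImportStream(pasteImportStream);
}
```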

Example configuration: APA style

Most PDF documents adhere to a style guide, defining for instance how lists are written, what kind of bullets are used, etc. This information can be used to provide an optimum fit for the pipeline. For example, the APA style guide can be leveraged like this:

Lists

The APA style guide defines the following rules for numbered lists:

  • A number, indicating the index.
  • A full stop and two spaces, delimiting the index.
  • Some text, wrapping into multiple lines if needed.
  • A full stop, to end the list item.

We can use this information to recognize APA lists like this:

var inputProcessors = [
	// Annotate lines that start with digits, a full stop and two spaces, such as "1.  Some text"
	createRegexpAnnotateInputProcessor(/^\s*\d+\.\s\s\w/, 'NumericalListItem'),
	createCleanChunksInputProcessor(['NumericalListItem'], function cleanListNumber(chunk) {
		// (optional indenting cruft)(digits)(.  )(actual contents)
		var match = chunk.text.match(/^\s*(\d+)\.\s\s(.*)$/);
		if (!match) {
			return;
		}
		// Keep only the actual contents of the list item
		chunk.text = match[2];
	}),
	// Join list items until we see a line ending with a full stop
	createRemoveWordWrapInputProcessor(['NumericalListItem'], /\.$/, ' ')
];
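
The APA numbering rules can be tried out on sample lines without running the pipeline. This is a minimal sketch: the helper stripListNumber is hypothetical and only illustrates a regular expression for "digits, full stop, two spaces, text".

```javascript
// Recognises "<digits>.  <text>" with optional leading whitespace
var listItemPattern = /^\s*(\d+)\.\s\s(.*)$/;

// Hypothetical helper: returns the item text without its number, or null for other lines
function stripListNumber(line) {
	var match = line.match(listItemPattern);
	return match ? match[2] : null;
}

console.log(stripListNumber('  1.  Collect the samples.'));
console.log(stripListNumber('12.  Label each sample.'));
console.log(stripListNumber('In 2019. the study'));
```

The first two lines are recognised as list items and lose their number; the third line does not start with a number and yields null.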

Titles

The APA style guide specifies that titles should be cased as follows:

  • Capitalize the first word of the title/heading and of any subtitle/subheading
  • Capitalize all “major” words (nouns, verbs, adjectives, adverbs, and pronouns) in the title/heading, including the second part of hyphenated major words (e.g., Self-Report not Self-report)
  • Capitalize all words of four letters or more.

Because the risk of normal paragraphs looking similar to titles is quite small, we can use this information to our advantage:

var inputProcessors = [
	createRegexpAnnotateInputProcessor(/[A-Z][a-z\-]*\s(([A-Z][a-z\-]+)|[a-z]{0,3})/, 'Title'),
];
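
A stricter, anchored variant of such a title check can be tried on sample lines first. The pattern and the looksLikeTitle helper below are assumptions for illustration: they require every word on the line to be either capitalized or a lowercase word of at most three letters.

```javascript
// Every word is either capitalized, or a lowercase word of one to three letters
var titlePattern = /^[A-Z][A-Za-z\-]*(\s+([A-Z][A-Za-z\-]*|[a-z]{1,3}))*$/;

// Hypothetical helper that applies the pattern to one line of pasted text
function looksLikeTitle(line) {
	return titlePattern.test(line.trim());
}

console.log(looksLikeTitle('Effects of Sleep on Memory'));
console.log(looksLikeTitle('Self-Report Measures'));
console.log(looksLikeTitle('The results were collected over two years.'));
```

The first two lines are accepted and the third is rejected. Short sentences that consist only of short words, such as "We ran it", still match, so a check like this is a heuristic rather than a guarantee.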

Of course, PDF is not the only source of plain text. For example, terminal outputs adhere to some implicit style. Be creative and even these things can be pasted with minimal loss.

Processors

Input processors

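The platform supplies several input processors, including the ones used in the examples above: createRegexpAnnotateInputProcessor, createRemoveWordWrapInputProcessor and createCleanChunksInputProcessor.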

By adhering to the PipelinedImporterInputProcessor interface, it is possible to write these processors on the application level.

Output processors

The platform supplies several output processors, including the ones used in the examples above: createChunkOutputProcessor and createPlainTextOutputProcessor.

By adhering to the PipelinedImporterOutputProcessor interface, it is possible to write these processors on the application level.

Copying and pasting HTML

Copying and pasting HTML is a bit more complicated than copying and pasting plain text. HTML cannot be pasted directly into the XML DOM. Therefore, pivot models are used to translate HTML to and from XML. These work in the same fashion as a pivot language used in translating human languages: instead of writing conversions between all possible pairs of languages, only a single two-way translation to and from the pivot model is needed.

Some applications (such as Word and Excel) place HTML on the clipboard when content is copied from them. When an author pastes this content into Fonto, the following configuration is used to convert it to a structure that can be pasted.

Pivot nodes

The different types of pivot nodes are heavily inspired by our CVK families:

Block    Can contain text and inlines, like HTML p.
Frame    Contains blocks and groups, like HTML div or section.
Group    Contains a collection of frames, like HTML ul or ol.
Inline   Contains other inlines and text nodes, analogous to HTML b or strong.
Table    A table. Set the isTable property to true in your configuration to use a table-flow add-on to translate between table formats.
Text     Leaf node, contains text directly.

Flags

Flags can be used to retain more information from pasted HTML fragments. Using these flags, bold text pasted from most other editors will remain bold. The following flags are available:

Flag name        Present on pivot nodes of type
underline        inline
bold             inline
italic           inline
small-caps       inline
strikethrough    inline
subscript        inline
superscript      inline
unordered-list   group
ordered-list     group

Types of pivot transformers

Pivot nodes are created from the given DOM (both HTML and XML) using a set of configurable transformers. These transformers are highly configurable, because not all pivot nodes have a counterpart in every schema and because pivot node trees often do not align perfectly with the DOM tree.

These transformers handle both transformation from DOM to pivot nodes and vice versa. The DOM to pivot nodes transformation is used to transform HTML DOMs, but will be used to transform between XML DOMs in the future.

A transformer is configured using a JavaScript object. Refer to the API documentation of TransformerConfiguration for more information on this object. These configuration objects should refer to elements using the qualifiedName property. The NamespaceManager will be used to resolve these qualified names. See Configure namespaces for more information on this subject.

Examples

Matching inlines

In this example, any strong XML element will be converted to an inline pivot node. When a pivot model is being transformed to XML, any occurrence of an inline pivot node will be converted to a strong XML element.

{ qualifiedName: 'strong', type: 'inline' }

Matching inlines using flags

In this example, any i XML element will be converted to an inline pivot node with the italic flag set. When a pivot model is being transformed to XML, only those inline pivot nodes with the italic flag set will be converted to an i XML element. Any other inline pivot node will be ignored by this particular transformer.

{ qualifiedName: 'i', type: 'inline', 'flags': ['italic'] }

This example can be extended to convert multiple types of inline formatting to a pivot node. In the extended example, the i and u XML elements will be transformed in the same fashion as in the original example. The b XML element in this example will be converted to an inline pivot node without any flags. This configuration also has the effect that any inline pivot node, regardless of its flags, will be converted to a b XML element.

{ qualifiedName: 'i', type: 'inline', 'flags': ['italic'] },
{ qualifiedName: 'u', type: 'inline', 'flags': ['underline']},
{ qualifiedName: 'b', type: 'inline'}
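
The matching rules described above can be mimicked in plain JavaScript. This is a sketch under assumptions, not the platform's implementation: findElementName is a hypothetical function, pivot nodes are modelled as plain objects with a type and a flags array, and the first applicable transformer is assumed to win.

```javascript
var transformers = [
	{ qualifiedName: 'i', type: 'inline', flags: ['italic'] },
	{ qualifiedName: 'u', type: 'inline', flags: ['underline'] },
	{ qualifiedName: 'b', type: 'inline' }
];

// Hypothetical matcher: a transformer applies when the type is equal and every flag
// it lists is set on the pivot node. The first applicable transformer wins.
function findElementName(pivotNode, transformers) {
	for (var i = 0; i < transformers.length; i++) {
		var transformer = transformers[i];
		var requiredFlags = transformer.flags || [];
		var hasAllFlags = requiredFlags.every(function (flag) {
			return pivotNode.flags.indexOf(flag) !== -1;
		});
		if (transformer.type === pivotNode.type && hasAllFlags) {
			return transformer.qualifiedName;
		}
	}
	// No transformer is configured for this pivot node
	return null;
}

console.log(findElementName({ type: 'inline', flags: ['italic'] }, transformers));
console.log(findElementName({ type: 'inline', flags: ['bold'] }, transformers));
console.log(findElementName({ type: 'block', flags: [] }, transformers));
```

An italic inline maps to i, any other inline falls through to the flagless b transformer, and a block has no match in this list.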

Matching definition list-like structures

In this example, a dl XML element will be converted to a group pivot node. This particular configuration expects the dl XML element to have one or more dd and/or dt XML child elements. Both these elements are converted to frame pivot nodes. When the qualifiedName property of a transformer is set to null, the transformer is ignored when converting XML to pivot models. In the other direction, it produces no element for the matching pivot node (here, a block). This can be used to skip over unwanted pivot nodes.

{
	qualifiedName: 'dl',
	type: 'group',
	contents: [
		{ qualifiedName: 'dd', type: 'frame', contents: [{ qualifiedName: null, type: 'block' }] },
		{ qualifiedName: 'dt', type: 'frame', contents: [{ qualifiedName: null, type: 'block' }] }
	]
}

In this example, the contents property defines which nodes may be found as child nodes of a dl XML element. When the contents property is not defined, the full list of configured transformers is used.

Matching tables

Matching tables is relatively easy. In this example, a table XML element will be converted to a special table pivot node. This transformer requires a cellQualifiedName and an isTable property to be set. For tables that use a "table figure" element, such as the CALS <table> element, we recommend using a selector for this element rather than for the descendant (<tgroup> in CALS) that contains the actual tabular structure:

{
	qualifiedName: 'table',
	cellQualifiedName: 'td',
	isTable: true
}

Contextual Transformers

To further configure transformers, contextual ones can be used. These transformers only trigger in a specified context. They can be used to, for instance, paste list items as answers in an educational context.

Usage

Use case: do not attempt to insert paragraphs under codeblocks. They won't fit because a codeblock only allows plain text.

import pivotModelTransformerManager from 'fontoxml-pivot-model/src/pivotModelTransformerManager.js';

export default function configureSxModule(sxModule) {
    // Ignore blocks under code blocks, so that we do not insert paragraphs under <codeblock>
    pivotModelTransformerManager.registerContextualTransformer(
        sxModule,
        'ancestor-or-self::codeblock',
        { qualifiedName: null, type: 'block' }
    );
}

The selector providing the context targets a node, so the ancestor-or-self axis is needed to be able to provide this transformer if the cursor is, for instance, in a text node in the codeblock.

Building your own pivot model translation

The pivot model architecture and the HTML configuration are included in the platform. Follow these steps to configure pivot model translations for your schema.

  1. Open a configureSxModule file.
  2. Import the pivotModelTransformerManager.
  3. Create a pivot model translation configuration using the examples above.
  4. Call pivotModelTransformerManager.registerTransformers with the sxModule and the pivot model configuration.

The following is the configuration used by the DITA sandbox editor. Though it is DITA, it should be easily adaptable to other schemas:

configureSxModule.js
import pivotModelTransformerManager from 'fontoxml-pivot-model/src/pivotModelTransformerManager.js';

export default function configureSxModule(sxModule) {
	sxModule.markAsAddon();

	pivotModelTransformerManager.registerTransformers(sxModule, [
		{
			qualifiedName: 'table',
			isTable: true,
			cellQualifiedName: 'entry'
		},
		{
			qualifiedName: 'b',
			flags: ['bold'],
			type: 'inline'
		},
		{
			qualifiedName: 'u',
			flags: ['underline'],
			type: 'inline'
		},
		{
			qualifiedName: 'i',
			flags: ['italic'],
			type: 'inline'
		},
		// Bold is the preferred inline
		{
			qualifiedName: 'b',
			type: 'inline'
		},
		{
			qualifiedName: 'p',
			type: 'block'
		},
		{
			qualifiedName: null,
			type: 'frame'
		},
		{
			qualifiedName: 'ul',
			type: 'group',
			flags: ['unordered-list'],
			contents: [
				{
					qualifiedName: 'li',
					type: 'frame'
				}
			]
		},
		{
			qualifiedName: 'ol',
			type: 'group',
			flags: ['ordered-list'],
			contents: [
				{
					qualifiedName: 'li',
					type: 'frame'
				}
			]
		},
		{
			qualifiedName: 'ul',
			type: 'group',
			contents: [
				{
					qualifiedName: 'li',
					type: 'frame'
				}
			]
		}
	]);

	// Ignore blocks under codeBlocks
	pivotModelTransformerManager.registerContextualTransformer(
		sxModule,
		'ancestor-or-self::codeblock',
		{
			qualifiedName: null,
			type: 'block'
		}
	);
}

Pasting from Fonto

Internal clipboard

Besides pasting plain text or HTML, it is also possible to paste XML fragments across and inside FontoXML instances. This attempts to insert the XML on the clipboard at the current cursor position, manipulating the XML to make it fit.

When pasting XML, attributes and their values are preserved. This may break referential integrity, or introduce duplicate ids across different documents. If this creates unwanted situations, additional configuration is necessary to edit the pasted attributes. This can be configured using the fontoxml-clipboard/registerPastedNodesFilter API. See registerPastedNodesFilter for more information on this functionality.

This method accepts a callback which is executed during a paste action from the internal clipboard. The callback is passed the nodes which were copied and a blueprint. The API allows you to edit the nodes, using the blueprint, and expects you to return the new top-level nodes. By removing a node you prevent it from being pasted. Take care not to create a schema-invalid construction: this would make the paste fail and fall back to pasting plain text, losing a lot of information.

Example

The id attribute is referenced using an idRef attribute on other nodes. Blindly pasting nodes with these attributes can introduce unexpected references, even more so if the id attribute is not globally unique. Pasted content should be provided with new ids. References to these nodes should be repaired.

import registerPastedNodesFilter from 'fontoxml-clipboard/src/registerPastedNodesFilter.js';
import evaluateXPathToNodes from 'fontoxml-selectors/src/evaluateXPathToNodes.js';

registerPastedNodesFilter(function(topLevelNodes, blueprint) {
	topLevelNodes.forEach(function(node) {
		// Give every pasted element that has an id a new one
		evaluateXPathToNodes('.//*[@id]', node, blueprint).forEach(function(nodeWithId) {
			var oldId = blueprint.getAttribute(nodeWithId, 'id');
			var newId = ''; /* generate a new, unique Id */

			blueprint.setAttribute(nodeWithId, 'id', newId);

			// Repair references to the old id within the same pasted node
			evaluateXPathToNodes('.//*[@idRef = $id]', node, blueprint, { id: oldId }).forEach(function(
				nodeWithRef
			) {
				blueprint.setAttribute(nodeWithRef, 'idRef', newId);
			});
		});
	});

	return topLevelNodes;
});

Controlling what is copied

Fonto only includes the content within the current selection when copying or cutting content. The nearest block element or the closest node to contain the entire selection, whichever is the outermost element, will be included.

This behaviour can be controlled by configuring the addParentOnCopy and addDescendantsOnCopy properties for an element. These properties are configured automatically for lists and tables. You can use addParentOnCopy to ensure that the parent element of a given node is included on copy. You can use addDescendantsOnCopy to ensure all descendants of a given node are included on copy.

Example: Including a parent element

Given the following DITA snippet:

<fig>
    <image />
</fig>

Assume that the schema does not allow an <image> element to exist on its own; it always needs to be wrapped in a <fig> element. Without additional configuration, Fonto allows for copying only the <image> element, but a user won't be able to paste it anywhere due to schema limitations.

Using the addParentOnCopy property, you can force Fonto to always include the <image> element's parent node:

configureProperties(sxModule, 'self::image[parent::fig]', {
    addParentOnCopy: true
});

The selector used in the example above is more specific than only selecting any <image> element. This is done to ensure that the parent of an <image> element is only included when the <image> element has a <fig> element parent.

Example: Including all descendants

Given the following TEI snippet:

<person><firstname>Foo</firstname> <lastname>Bar</lastname></person>

Assume that the schema does not allow either the <firstname> or the <lastname> to exist on their own. Also, the <person> element must contain both these elements. Without additional configuration, Fonto allows for copying only the <firstname> or the <lastname> elements, but the user won't be able to paste them anywhere due to schema limitations.

Using both addParentOnCopy and addDescendantsOnCopy, you can force Fonto to always include the <person> parent element and all its descendants when copying only the <firstname> or <lastname> elements. This example also shows how these properties "chain" together: when copying the <firstname> element, it will include its parent <person> element, which in turn will include all of its descendants, including the <lastname> element.

configureProperties(sxModule, 'self::person', {
    addDescendantsOnCopy: true
});
configureProperties(sxModule, 'self::firstname[parent::person]', {
    addParentOnCopy: true
});
configureProperties(sxModule, 'self::lastname[parent::person]', {
    addParentOnCopy: true
});
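
The chaining described above can be illustrated with a small stand-alone model. Everything in this sketch is hypothetical: the in-memory tree, the property table keyed by element name (instead of by selector) and the namesIncludedOnCopy function only mimic the documented behaviour and are not how Fonto implements it.

```javascript
// Hypothetical in-memory tree: each node has a name, a parent and children
function createNode(name, children) {
	var node = { name: name, parent: null, children: children || [] };
	node.children.forEach(function (child) {
		child.parent = node;
	});
	return node;
}

// Hypothetical property table, keyed by element name instead of by selector
var properties = {
	person: { addDescendantsOnCopy: true },
	firstname: { addParentOnCopy: true },
	lastname: { addParentOnCopy: true }
};

function getProperties(node) {
	return properties[node.name] || {};
}

function collectDescendants(node) {
	return node.children.reduce(function (all, child) {
		return all.concat([child], collectDescendants(child));
	}, []);
}

// Returns the names of the elements that end up on the clipboard
function namesIncludedOnCopy(selected) {
	var top = selected;
	var included = [selected];

	// addParentOnCopy: also include the parent element
	if (getProperties(selected).addParentOnCopy && selected.parent) {
		top = selected.parent;
		included = [top, selected];
	}

	// addDescendantsOnCopy on the outermost included element: include everything below it
	if (getProperties(top).addDescendantsOnCopy) {
		included = [top].concat(collectDescendants(top));
	}

	return included.map(function (node) {
		return node.name;
	});
}

var firstname = createNode('firstname');
var lastname = createNode('lastname');
var person = createNode('person', [firstname, lastname]);

console.log(namesIncludedOnCopy(firstname));
```

Copying only firstname yields person, firstname and lastname: the parent is pulled in first, and its own property then pulls in the remaining descendants.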