RegexAnnotator

The RegexAnnotator can be used to perform analysis based on a .NET regular expression.

Configuration

| Attribute | Description | Required | Default |
| --- | --- | --- | --- |
| annotationTypeId | The type identifier to set on the added annotations. The attribute is not required, because you can map capture groups to annotations instead of annotating the complete match. | No | N/A |
| pattern | The .NET regular expression that is used as the search pattern. | Yes | N/A |
| ignoreCase | Whether upper case and lower case text are treated as equivalent (case-insensitive matching). | No | False |
| multiline | Enables multiline mode. This configures the regular expression engine to let ^ and $ match the beginning and end of each line, instead of the beginning and end of the input string. Note that $ matches only before a line feed (\n); it does not treat the carriage return/line feed combination (\r\n) as a line ending. To match lines that end in \r\n, use the subexpression \r?$ instead of just $. | No | False |

Make sure to register the given annotationTypeId as a custom annotation inside the editor.
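
As a rough illustration of the attributes above, the following sketch annotates lines that consist solely of an identifier such as abc-123 or ABC-123. It assumes that the boolean attributes accept true and false as values (the table only states their default), and ticket-id is a hypothetical annotationTypeId that would still need to be registered in the editor.

```xml
<!-- Sketch only: "ticket-id" is a hypothetical annotation type, and the
     "true" values for ignoreCase and multiline are an assumed spelling. -->
<regexAnnotator annotationTypeId="ticket-id"
	pattern="^[a-z]+-\d+\r?$"
	ignoreCase="true"
	multiline="true"/>
```

With \r?$ at the end of the pattern, lines ending in \r\n are matched as well; the trailing carriage return, if present, becomes part of the match.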

Capture groups

You can use matched subexpressions or named matched subexpressions to capture the contents of a subexpression. The captured content can be found in the metadata object or can be mapped to its own annotation.

Matched Subexpressions

When using matched subexpressions, each group is numbered from left to right, starting at 1. That number is used as a key in the captures object.

Expression: ([a-zA-Z0-9_.+-]+)@(fontoxml\.com)

Metadata:

JSON

{
    "match": "contact@fontoxml.com",
    "captures": {
        "1": [
            {
                "value": "contact"
            }
        ],
        "2": [
            {
                "value": "fontoxml.com"
            }
        ]
    }
}

Named Matched Subexpressions

When using named matched subexpressions, the given group name is used as a key in the captures object.

Expression: (?<username>[a-zA-Z0-9_.+-]+)@(?<domain>fontoxml\.com)

Metadata:

JSON

{
    "match": "contact@fontoxml.com",
    "captures": {
        "username": [
            {
                "value": "contact"
            }
        ],
        "domain": [
            {
                "value": "fontoxml.com"
            }
        ]
    }
}

Map a Subexpression to its own annotation

You can map a subexpression to its own annotation by adding a mapCaptureGroup element inside the regexAnnotator.

| Attribute | Description | Required | Default |
| --- | --- | --- | --- |
| name | The capture group name. | Yes | N/A |
| annotationTypeId | The type identifier to set on the added annotations. | Yes | N/A |
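
Because annotationTypeId is optional on the regexAnnotator itself, one possible setup maps the capture groups to annotations instead of annotating the complete match. The sketch below reuses the named matched subexpressions shown earlier, XML encoded; email-username and email-domain are hypothetical type identifiers chosen for illustration.

```xml
<!-- Sketch only: both annotationTypeId values are hypothetical. No
     annotationTypeId is set on the regexAnnotator, so the intent is to
     annotate the capture groups rather than the complete match. -->
<regexAnnotator pattern="(?&lt;username&gt;[a-zA-Z0-9_.+-]+)@(?&lt;domain&gt;fontoxml\.com)">
	<mapCaptureGroup name="username" annotationTypeId="email-username"/>
	<mapCaptureGroup name="domain" annotationTypeId="email-domain"/>
</regexAnnotator>
```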

Produces

Annotation types

The RegexAnnotator produces annotations of the types configured with the annotationTypeId attribute on the regexAnnotator itself and/or on its mapCaptureGroup elements.

Metadata

JSON

{
	"match": "[The matched substring]",
	"captures": {}
}

Example configuration

The following example shows a RegexAnnotator that annotates fontoxml.com email addresses. It also captures the local part of the email address and annotates it separately.

Note that the pattern is XML encoded.

XML

<regexAnnotator annotationTypeId="fontoxml-email" pattern="(?&lt;localname&gt;[a-zA-Z0-9_.+-]+)@fontoxml\.com">
	<mapCaptureGroup name="localname" annotationTypeId="fontoxml-localname"/>
</regexAnnotator>