Dictionary Annotator
The <dictionaryAnnotator> is an annotator that can be used for word matching. It loads a dictionary in one of the supported formats, either from a file or from an HTTP resource, and annotates all occurrences of its patterns. Fonto Content Quality is case sensitive when searching for dictionary occurrences.
Patterns are strings of one or more characters. Wildcards and regular expressions are NOT supported by this annotator. Matching is done on whole words: a word consists of letters and digits, such as [A-Za-z0-9], or more precisely any character for which the .NET Char.IsLetterOrDigit() method returns true. Any other character marks a word boundary.
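The whole-word rule can be illustrated with a small Python sketch. It is not the Content Quality implementation: it approximates .NET Char.IsLetterOrDigit() by checking for the Unicode letter categories and the decimal digit category, and the function names are hypothetical.

```python
import unicodedata

# Unicode categories counted as letters or decimal digits, approximating
# .NET Char.IsLetterOrDigit (Lu, Ll, Lt, Lm, Lo, and Nd).
WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd"}


def is_word_char(ch: str) -> bool:
    """Return True if ch is a letter or decimal digit."""
    return unicodedata.category(ch) in WORD_CATEGORIES


def find_whole_word_matches(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) offsets of case-sensitive, whole-word occurrences of pattern."""
    matches = []
    start = text.find(pattern)  # str.find is case sensitive
    while start != -1:
        end = start + len(pattern)
        # The occurrence only counts when neither neighbour is a letter or digit.
        before_ok = start == 0 or not is_word_char(text[start - 1])
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            matches.append((start, end))
        start = text.find(pattern, start + 1)
    return matches


print(find_whole_word_matches("foo food Foo foo-bar", "foo"))  # [(0, 3), (13, 16)]
```

In this example, foo is found at the start and before the hyphen, while food (followed by the letter d) and Foo (different case) are skipped. Patterns containing spaces, such as "Fonto Content Quality", are handled the same way, since only the characters just outside the occurrence are checked.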
Configuration
The <dictionaryAnnotator> must be configured with a source and a format.
Source
The dictionary source can be one of the following. Each source has its own configuration attributes.
fileSource
The <fileSource> loads the dictionary from a file on disk. The file is monitored for changes automatically.
| Attribute | Description | Required | Default |
|---|---|---|---|
| relativePath | The path to the dictionary file, relative to the … | Yes | N/A |
Note that when you use FDT to start Fonto Content Quality, your dictionary file is copied into the Docker image, and that copy is watched instead of your local file. Changes to your local dictionary file are therefore not detected while Content Quality is running.
httpSource
The <httpSource> loads the dictionary using an HTTP request and periodically checks for changes.
| Attribute | Description | Required | Default |
|---|---|---|---|
| url | The URL from which to request the dictionary. See the HTTP source API below. This attribute also supports environment variables. | Yes | N/A |
Format
The dictionary format can be one of the following. Each format has its own configuration attributes. Dictionaries are case sensitive.
wordListFormat
The word list format uses a plain text file which consists of one pattern per line. A line is defined as one or more characters followed by a line end ("\n", "\r", or "\r\n").
| Attribute | Description | Required | Default |
|---|---|---|---|
| annotationTypeId | The type identifier to set on the added annotations. | Yes | N/A |
Word list file example:
Other
foo
bar
baz
Annotation metadata format:
JavaScript
{
"pattern": "The pattern as found in the dictionary file, e.g. foo"
}
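As an illustration, the following Python sketch turns a word list file into the metadata shown above. The function name is hypothetical, and it assumes that empty lines are skipped and that a last line without a line end is still read; the source does not specify either case.

```python
import re

# Line ends accepted by the word list format; "\r\n" is tried first so it counts as one.
LINE_END = re.compile(r"\r\n|\n|\r")


def parse_word_list(text: str) -> list[dict[str, str]]:
    """Return one annotation metadata entry per pattern in a word list file."""
    entries = []
    # Assumption: a last line without a line end is still read as a pattern.
    for line in LINE_END.split(text):
        # Assumption: empty lines carry no pattern and are skipped.
        if line:
            entries.append({"pattern": line})
    return entries


print(parse_word_list("foo\nbar\r\nbaz\r"))
# [{'pattern': 'foo'}, {'pattern': 'bar'}, {'pattern': 'baz'}]
```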
solrSynonymFormat
The Solr synonym format uses a Solr synonyms file which contains one or more synonyms per line. A line is defined as one or more characters followed by a line end ("\n", "\r", or "\r\n").
Only the explicit mapping syntax is supported; such lines can be recognized by the "=>" they contain. All other lines are ignored.
A comma that is part of a matching pattern or replacement can be escaped with a backslash.
| Attribute | Description | Required | Default |
|---|---|---|---|
| annotationTypeId | The type identifier to set on the added annotations. | Yes | N/A |
Solr synonym file example:
Other
# All occurrences of foo should be replaced by bar.
foo => bar
# All occurrences of foo should be replaced by bar or baz.
foo => bar
foo => baz
# All occurrences of foo and bar should be replaced by baz.
foo, bar => baz
# All occurrences of "foo, bar" should be replaced by baz.
foo\, bar => baz
# All occurrences of foo and bar should be replaced by baz or foobar.
foo, bar => baz, foobar
# All occurrences of foo and bar should be replaced by "baz, foobar".
foo, bar => baz\, foobar
# Not supported: implicit mappings (lines without =>)
foo, bar, baz
Annotation metadata format:
JavaScript
{
"synonyms": [
"The synonym(s) as found in the dictionary on the right-hand side of the =>, e.g. bar"
],
"pattern": "The pattern as found in the Solr synonym file, e.g. foo"
}
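The following Python sketch shows one way to read the explicit mappings, including the \, escape, into the metadata format above. It is a hypothetical helper, not the annotator's own parser. Skipping lines that start with # and merging repeated patterns (such as foo => bar and foo => baz) into a single entry are assumptions based on the comments in the example file.

```python
import re

LINE_END = re.compile(r"\r\n|\n|\r")
# A comma that is not preceded by a backslash separates terms.
UNESCAPED_COMMA = re.compile(r"(?<!\\),")


def split_terms(side: str) -> list[str]:
    """Split one side of a mapping into terms and unescape "\\," to ","."""
    terms = []
    for raw in UNESCAPED_COMMA.split(side):
        term = raw.strip().replace("\\,", ",")
        if term:
            terms.append(term)
    return terms


def parse_solr_synonyms(text: str) -> list[dict]:
    """Build annotation metadata from the explicit (=>) mappings in a Solr synonyms file."""
    synonyms_by_pattern: dict[str, list[str]] = {}
    for line in LINE_END.split(text):
        # Comment lines and lines without an explicit mapping are ignored.
        if line.lstrip().startswith("#") or "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        replacements = split_terms(right)
        for pattern in split_terms(left):
            # Assumption: repeated patterns are merged into one entry.
            merged = synonyms_by_pattern.setdefault(pattern, [])
            for replacement in replacements:
                if replacement not in merged:
                    merged.append(replacement)
    return [
        {"synonyms": synonyms, "pattern": pattern}
        for pattern, synonyms in synonyms_by_pattern.items()
    ]
```

For example, the line foo, bar => baz\, foobar yields two entries, one for foo and one for bar, each with the single synonym "baz, foobar".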
xmlFormat
Use this format to create a dictionary based on any XML file, using XPath expressions to select the items, patterns, and metadata.
Fonto Content Quality supports W3C XPath 1.0 expressions. See https://www.w3.org/TR/1999/REC-xpath-19991116/ for details.
| Attribute | Description | Required | Default |
|---|---|---|---|
| itemsQuery | The XPath expression to select the items from which to create the dictionary. | Yes | N/A |
| patternQuery | The XPath expression to select the pattern for each item. | Yes | N/A |
| annotationTypeId | The type identifier to set on the added annotations. | Yes | N/A |
Mapping metadata:
Within the <xmlFormat> element, you can configure <mapping> elements that add data to the annotation metadata:
| Attribute | Description | Required | Default |
|---|---|---|---|
| valueQuery | The XPath expression to select the metadata value for each item. | Yes | N/A |
| metadataName | The metadata property name. | Yes | N/A |
XML file example:
Other
<xml>
<products>
<product name="Fonto Content Quality" url="https://www.fontoxml.com/fonto-content-quality/">
<owner>Fonto</owner>
</product>
<product name="Fonto Review" url="https://www.fontoxml.com/fonto-review/">
<owner>Fonto</owner>
</product>
</products>
</xml>
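To show which values the expressions in the Example configuration below select from this file, here is a Python sketch using the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath, so this approximates the XPath 1.0 evaluation that Content Quality performs; the function name and the returned dictionary shape are hypothetical.

```python
import xml.etree.ElementTree as ET

DICTIONARY_XML = """<xml>
  <products>
    <product name="Fonto Content Quality" url="https://www.fontoxml.com/fonto-content-quality/">
      <owner>Fonto</owner>
    </product>
    <product name="Fonto Review" url="https://www.fontoxml.com/fonto-review/">
      <owner>Fonto</owner>
    </product>
  </products>
</xml>"""


def build_xml_dictionary(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each pattern to its metadata, mirroring the example <xmlFormat> configuration."""
    root = ET.fromstring(xml_text)
    dictionary = {}
    # itemsQuery="//products/product", written relative to the root element
    # because ElementTree only supports a subset of XPath.
    for item in root.findall(".//products/product"):
        pattern = item.get("name")  # patternQuery="@name"
        dictionary[pattern] = {
            "owner": item.findtext("owner"),  # valueQuery="./owner", metadataName="owner"
            "website": item.get("url"),  # valueQuery="@url", metadataName="website"
        }
    return dictionary


print(build_xml_dictionary(DICTIONARY_XML)["Fonto Review"])
# {'owner': 'Fonto', 'website': 'https://www.fontoxml.com/fonto-review/'}
```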
jsonFormat
Use this format to create a dictionary based on any JSON file, using JSONPath expressions to select the items, patterns, and metadata.
See https://goessner.net/articles/JsonPath/ for details on JSONPath.
| Attribute | Description | Required | Default |
|---|---|---|---|
| itemsQuery | The JSONPath expression to select the items from which to create the dictionary. | Yes | N/A |
| patternQuery | The JSONPath expression to select the pattern for each item. | Yes | N/A |
| annotationTypeId | The type identifier to set on the added annotations. | Yes | N/A |
Mapping metadata:
Within the <jsonFormat> element, you can configure <mapping> elements that add data to the annotation metadata:
| Attribute | Description | Required | Default |
|---|---|---|---|
| valueQuery | The JSONPath expression to select the metadata value for each item. | Yes | N/A |
| metadataName | The metadata property name. | Yes | N/A |
JSON file example:
Other
{
"abbreviations": [
{
"pattern": "API",
"shortened-for": "Application programming interface",
"description": "In computer programming, an application programming interface (API) is a set of subroutine definitions, communication protocols, and tools for building software."
}
]
}
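Similarly, this Python sketch shows what the JSONPath expressions in the Example configuration below select from this file. It uses plain dictionary access in place of a JSONPath engine, and the function name and return shape are hypothetical.

```python
import json

DICTIONARY_JSON = """{
  "abbreviations": [
    {
      "pattern": "API",
      "shortened-for": "Application programming interface",
      "description": "In computer programming, an application programming interface (API) is a set of subroutine definitions, communication protocols, and tools for building software."
    }
  ]
}"""


def build_json_dictionary(json_text: str) -> dict[str, dict[str, str]]:
    """Map each pattern to its metadata, mirroring the example <jsonFormat> configuration."""
    data = json.loads(json_text)
    dictionary = {}
    # itemsQuery="$.abbreviations[*]": every element of the top-level "abbreviations" array.
    for item in data["abbreviations"]:
        # patternQuery="pattern" and each valueQuery are evaluated relative to the item.
        dictionary[item["pattern"]] = {
            "fully-written": item["shortened-for"],  # valueQuery="shortened-for"
            "explanation": item["description"],  # valueQuery="description"
        }
    return dictionary


print(build_json_dictionary(DICTIONARY_JSON)["API"]["fully-written"])
# Application programming interface
```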
Example configuration
XML
<!-- Dictionary sequence -->
<sequential>
<parallel>
<dictionaryAnnotator>
<httpSource url="http://my-server/dictionaries/solr/synonym.txt"/>
<solrSynonymFormat annotationTypeId="demo:part" />
</dictionaryAnnotator>
<dictionaryAnnotator>
<fileSource relativePath="wordlistdictionary.txt"/>
<wordListFormat annotationTypeId="demo:product" />
</dictionaryAnnotator>
</parallel>
<dictionaryAnnotator>
<fileSource relativePath="dictionary.xml" />
<xmlFormat itemsQuery="//products/product" patternQuery="@name" annotationTypeId="dictionary-item">
<mapping valueQuery="./owner" metadataName="owner"/>
<mapping valueQuery="@url" metadataName="website"/>
</xmlFormat>
</dictionaryAnnotator>
<dictionaryAnnotator>
<fileSource relativePath="dictionary.json" />
<jsonFormat itemsQuery="$.abbreviations[*]" patternQuery="pattern" annotationTypeId="dictionary-item">
<mapping valueQuery="shortened-for" metadataName="fully-written"/>
<mapping valueQuery="description" metadataName="explanation"/>
</jsonFormat>
</dictionaryAnnotator>
</sequential>
Editor
Make sure to register the returned annotation types as custom annotations inside the editor.
HTTP source API
When using the <httpSource>, your endpoint must handle the following request and respond accordingly.
GET {endpoint}
Serves a dictionary file. This request is made by Content Quality, not by the Fonto editor.
Parameters
Request
| Header | Required | Description |
|---|---|---|
| If-None-Match | Optional | Contains the entity-tag value received from a previous request to this endpoint, if any. This header is not set on the first request after Content Quality starts. |
Response
| Status | Reason and model |
|---|---|
| 200 | The dictionary file is returned as is, in the encoding and content type expected by the configured dictionary format. |
| 304 | The dictionary has not changed, based on the If-None-Match request header and the current entity-tag value of the dictionary file. |
| 400 | Bad Request. |
| 500 | Any error in the 5xx range indicates a problem with the dictionary endpoint. |
When the request fails or times out, a new request is made after the delay determined by max-age (or its default value). Between the failed request and the next successful request, the annotator does not work and annotating fails.
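As a sketch, assuming the delay comes from the max-age of the most recent response, the hypothetical Python helper below extracts max-age from a Cache-Control value and falls back to a default. The actual default used by Content Quality is not stated here, so it is passed in as a parameter.

```python
import re
from typing import Optional

# Finds the max-age directive in a Cache-Control value such as "public, max-age=300".
MAX_AGE = re.compile(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)"?\s*(?:,|$)', re.IGNORECASE)


def retry_delay_seconds(cache_control: Optional[str], default_seconds: int) -> int:
    """Return the delay before the next dictionary request, in seconds."""
    if cache_control:
        match = MAX_AGE.search(cache_control)
        if match:
            return int(match.group(1))
    # No usable max-age directive: fall back to the (assumed) default.
    return default_seconds
```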
See the documentation for If-None-Match, ETag, and Cache-Control max-age. Most notably, the entity-tag values in the ETag and If-None-Match headers are surrounded by double quotes, as required by the specification. Depending on your web server framework, you may have to handle these quotes yourself.
Examples
Request
Headers:
Other
If-None-Match: "x234dff"
Response
Headers:
Other
Cache-Control: max-age=300
ETag: "x234dff"
Body:
Other
foo
bar
baz
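The endpoint behaviour described above can be sketched as a framework-agnostic Python function that returns a status code, headers, and body. Deriving the entity-tag from a SHA-256 hash of the file and the 300-second max-age (taken from the example response) are assumptions; any opaque, quoted value that changes whenever the dictionary changes would serve. Error responses (400, 5xx) are left out.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted entity-tag that changes whenever the dictionary changes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_dictionary_request(
    if_none_match: Optional[str], body: bytes, max_age: int = 300
) -> tuple[int, dict[str, str], bytes]:
    """Answer GET {endpoint} with 304 if the client's copy is current, else 200 and the file."""
    etag = make_etag(body)
    headers = {"Cache-Control": f"max-age={max_age}", "ETag": etag}
    if if_none_match is not None:
        # If-None-Match may hold several comma-separated tags or "*". Tags are compared
        # with their double quotes intact; a weak "W/" prefix is ignored.
        candidates = {tag.strip().removeprefix("W/") for tag in if_none_match.split(",")}
        if "*" in candidates or etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

A web framework handler would pass the raw If-None-Match header value and the current dictionary bytes, then copy the returned status, headers, and body into its response.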