Compositors are analytics that are composed of other analytics. They differ from each other in the way they execute the analytics they contain.
The sequential analytic executes the analytics it contains in order.
The following example shows a Sequential compositor containing a RegexAnnotator that annotates FontoXML email addresses, followed by a RemoveTextAnnotationsIntersectingXmlElements filter that removes the annotations that overlap with an Anchor element.
<sequential>
  <regexAnnotator annotationTypeId="fontoxml-email" pattern="[a-zA-Z0-9_.+-]+@fontoxml\.com"/>
  <removeTextAnnotationsIntersectingXmlElements elements="a"/>
</sequential>
The parallel analytic executes the analytics it contains in parallel. It is recommended for analytics that are not CPU-bound and that do not depend on each other. "No dependency" means that no analytic inside the parallel analytic requires the annotations produced by another annotator inside that same parallel analytic.
The following example shows a Parallel compositor containing two HttpApiAnnotators. Placing them in a Parallel compositor causes the HTTP requests to be executed in parallel, which improves performance.
<parallel>
  <httpApiAnnotator endpoint="https://my-custom-annotator/annotate"/>
  <httpApiAnnotator endpoint="https://my-other-custom-annotator/annotate"/>
</parallel>
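The dependency rule can be illustrated with a minimal sketch that reuses only the elements shown above. It assumes that the RemoveTextAnnotationsIntersectingXmlElements filter should act on the annotations produced by both HttpApiAnnotators. Under that assumption the filter depends on them, so it is placed after the Parallel compositor, inside a surrounding Sequential compositor, rather than inside the Parallel compositor itself. The endpoints are the placeholder URLs from the previous example.

```xml
<sequential>
  <!-- Independent, non-CPU-bound annotators: safe to run in parallel. -->
  <parallel>
    <httpApiAnnotator endpoint="https://my-custom-annotator/annotate"/>
    <httpApiAnnotator endpoint="https://my-other-custom-annotator/annotate"/>
  </parallel>
  <!-- Assumed to depend on the annotations created above, so it runs afterwards. -->
  <removeTextAnnotationsIntersectingXmlElements elements="a"/>
</sequential>
```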
The partitioner analytic partitions the fragments that are being analysed based on a given set of annotations and exposes those new fragments within the scoped context of the analytic. Annotators within a partitioner will only be able to analyse fragments that overlap with the given set of annotations. It is useful for annotating only the fragments that overlap with a certain annotation, for example executing a RegexAnnotator only on the text that overlaps with the result of an XPathAnnotator.
The annotation that is used to partition the fragments will not be included within the scope of the analytic.
Only newly added annotations are made available after the partitioner has executed. Any preexisting annotations that are removed within the partitioner will still be present after the partitioner has finished executing.
The following example shows an XPathAnnotator that annotates <h1> elements, followed by a Partitioner that partitions based on those annotated headings. Within the partitioner is a RegexAnnotator that annotates numbers. The result is that only the numbers within an <h1> are annotated.
<sequential>
  <xpathAnnotator annotationTypeId="heading" test="self::h1"/>
  <partitioner annotationTypeIds="heading">
    <regexAnnotator annotationTypeId="heading-number" pattern="[0-9]+"/>
  </partitioner>
</sequential>
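As a further sketch of the same pattern, a Partitioner could restrict an HTTP-based annotator to paragraph text only. This is an illustration under stated assumptions: the "paragraph" annotation type ID, the self::p test (which assumes the documents use <p> elements), and the endpoint URL are hypothetical. The element and attribute names are the ones used in the examples above.

```xml
<sequential>
  <!-- Hypothetical: annotate every <p> element so it can be used for partitioning. -->
  <xpathAnnotator annotationTypeId="paragraph" test="self::p"/>
  <partitioner annotationTypeIds="paragraph">
    <!-- Only sees the fragments that overlap with "paragraph" annotations. -->
    <httpApiAnnotator endpoint="https://my-custom-annotator/annotate"/>
  </partitioner>
</sequential>
```

With this layout, text outside paragraphs should not reach the annotator. The "paragraph" annotation itself is not visible inside the partitioner, because the annotation used for partitioning is excluded from its scope.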
Compositors can also be combined. In the following example, a Parallel compositor executes two branches in parallel: the Sequential compositor (containing the RegexAnnotators and the filter) and the LanguageToolAnnotator.
<parallel>
  <sequential>
    <regexAnnotator annotationTypeId="fontoxml-email" pattern="[a-zA-Z0-9_.+-]+@fontoxml\.com"/>
    <regexAnnotator annotationTypeId="google-email" pattern="[a-zA-Z0-9_.+-]+@google\.com"/>
    <removeTextAnnotationsIntersectingXmlElements elements="a"/>
  </sequential>
  <languageToolAnnotator baseUrl="https://my-languagetool-instance:8010/v2/">
    <spellingErrorMapping categories="TYPOS"/>
  </languageToolAnnotator>
</parallel>
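All three compositors can be nested in the same way. The following sketch is built only from elements used in the earlier examples. It assumes that the heading-number analysis and the LanguageToolAnnotator are independent of each other, so they are placed in separate branches of a Parallel compositor.

```xml
<parallel>
  <!-- Branch 1: the heading annotation must exist before partitioning, hence sequential. -->
  <sequential>
    <xpathAnnotator annotationTypeId="heading" test="self::h1"/>
    <partitioner annotationTypeIds="heading">
      <regexAnnotator annotationTypeId="heading-number" pattern="[0-9]+"/>
    </partitioner>
  </sequential>
  <!-- Branch 2: assumed to be independent of branch 1. -->
  <languageToolAnnotator baseUrl="https://my-languagetool-instance:8010/v2/">
    <spellingErrorMapping categories="TYPOS"/>
  </languageToolAnnotator>
</parallel>
```

If the LanguageToolAnnotator needed the heading annotations, this layout would violate the dependency rule for parallel analytics, and the two branches would have to run in a Sequential compositor instead.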