What is it?
Segmentation Rules eXchange. An open XML-based standard that describes how translation and other language-processing tools should split text fragments for processing.
Why is it important?
SRX helps translation memory (TM) engines achieve better sentence-level reuse of existing translations.
Why does a technical communicator need to know this?
Translation memory (TM) is a technology that reduces localization costs by reusing previous translations. TM benefits are usually larger when working with sentences instead of paragraphs. Different tools split paragraphs into sentences in different ways when using their default settings. SRX was created by LISA (Localization Industry Standards Association) to standardize the rules used to segment text.
Understanding how tools break large texts into smaller pieces is important for maximizing the reuse of TM assets when using multiple translation tools.
SRX knowledge is useful when working with text that includes specialized abbreviations that localization tools don’t account for in their default configurations. Adjusting the rules to fit the text produces more readable segments and improves TM reuse at a later stage.
SRX rules use regular expressions to indicate where to break text and which exceptions apply. The regular expressions used in SRX are defined in the specification document, which includes an appendix that contains examples [SRX spec] [SRX editor].
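The following is a minimal sketch of how break rules and exceptions interact, not a reference implementation. It assumes a small, illustrative SRX 2.0 rule set (the abbreviation list and rule contents are made up for this example), and it uses Python's `re` module as a stand-in for the regular expression syntax defined in the specification, so some patterns written for real SRX files may behave differently. The helper names `load_rules` and `segment` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative SRX 2.0 document. The rule contents are assumptions made for
# this example, not a recommended rule set.
SRX_SAMPLE = r"""<srx xmlns="http://www.lisa.org/srx20" version="2.0">
  <header segmentsubflows="yes" cascade="no"/>
  <body>
    <languagerules>
      <languagerule languagerulename="Default">
        <rule break="no">
          <beforebreak>\b(?:Dr|Mr|Mrs|e\.g|i\.e)\.</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
        <rule break="yes">
          <beforebreak>[.?!]+</beforebreak>
          <afterbreak>\s</afterbreak>
        </rule>
      </languagerule>
    </languagerules>
    <maprules>
      <languagemap languagepattern=".*" languagerulename="Default"/>
    </maprules>
  </body>
</srx>"""

NS = {"srx": "http://www.lisa.org/srx20"}


def load_rules(srx_text, rule_name="Default"):
    """Return (is_break, before_regex, after_regex) tuples in document order."""
    root = ET.fromstring(srx_text)
    rules = []
    for lang_rule in root.iterfind(".//srx:languagerule", NS):
        if lang_rule.get("languagerulename") != rule_name:
            continue
        for rule in lang_rule.iterfind("srx:rule", NS):
            before = rule.findtext("srx:beforebreak", default="", namespaces=NS)
            after = rule.findtext("srx:afterbreak", default="", namespaces=NS)
            is_break = rule.get("break", "yes") == "yes"
            # Anchor the "before" pattern so it must end at the candidate position.
            before_re = re.compile("(?:" + before + r")\Z")
            rules.append((is_break, before_re, re.compile(after)))
    return rules


def segment(text, rules):
    """Split text at every position where the first matching rule is a break rule."""
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before_re, after_re in rules:
            # endpos=pos makes \Z match at the candidate break position.
            if before_re.search(text, 0, pos) and after_re.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # the first rule that matches decides; later rules are ignored
    segments.append(text[start:])
    # Trimming whitespace is a choice made for this sketch only.
    return [s.strip() for s in segments if s.strip()]


rules = load_rules(SRX_SAMPLE)
print(segment("Dr. Smith arrived. He was late! Why?", rules))
# Dropping the exception rule shows the effect of an unhandled abbreviation.
print(segment("Dr. Smith arrived. He was late! Why?", rules[1:]))
```

With the exception rule in place, "Dr. Smith arrived." stays in one segment; without it, "Dr." becomes a segment of its own. Rule order matters in this sketch because the first rule that matches at a position wins, which is why the exception is listed before the general break rule.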
The SRX standard definition and XML schema are currently available at the Globalization and Localization Association (GALA) web site [SRX schema].
References
- [SRX spec] SRX 2.0 Specification
- [SRX schema] SRX 2.0 XML Schema
- [SRX editor] SRXEditor: a free cross-platform editor of segmentation rules. It includes a sample file in SRX 2.0 format with a default set of segmentation rules that covers most standard cases, as well as segmentation rules specific to 16 languages.