Article last updated on 17 April 2023 (version 32.23)
The Controlled vocabulary lets you create word lists with preferred terms (keywords). Terms are organised hierarchically, like a tree, with broader and narrower terms. Your lists can be used in different ways while keywording your files online, but also to automatically translate words, to remove or replace words, or to add synonyms while metadata is being processed by the background data processing software on the server, regardless of how your files enter the system.
A controlled vocabulary is an established list of standardized terminology for use in indexing and retrieval of information. It ensures that a subject is described using the same preferred term each time it is indexed, which makes it easier to find all information about a specific topic when searching. The Infradox XS Controlled vocabulary, however, is used for several other purposes as well, as described in this article. Controlled vocabularies typically include preferred and variant terms for a defined scope or a specific domain.
You can build your word lists as a taxonomy. A taxonomy is an orderly classification for a defined domain, also known as a faceted vocabulary. It comprises controlled vocabulary terms (generally only preferred terms) organised into a hierarchical structure. Each term in a taxonomy is in one or more parent/child (broader/narrower) relationships to other terms in the taxonomy. There can be different types of parent/child relationships, such as whole/part, genus/species, or instance relationships. In good practice, all children of a given parent share the same type of relationship.
It is recommended that you read this entire article before you start entering terms in your Controlled vocabulary.
Building the Controlled vocabulary
You can organise your word lists five levels deep. The Controlled vocabulary view shows five boxes. The leftmost box represents the so-called Root, and it holds the broadest (parent) terms. The Root terms are very important because they define the scope or domain of the terms that are created under them. You can add a virtually unlimited number of Root terms. An example of a four-level hierarchy is Food > Fruit > Tropical fruit > Banana, where Food is the broadest (or Root) term and Banana is the narrowest term. Another well-known example of a Root term is Animal kingdom, a container for all animals, with Vertebrates and Invertebrates as its immediate narrower terms.
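The hierarchy described above can be pictured as a small tree structure. The following is only an illustrative sketch in Python, not how Infradox XS stores its data; the names `Term`, `add_path` and `MAX_DEPTH` are made up for this example.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 5  # the article describes five levels (five boxes)


@dataclass
class Term:
    """One vocabulary term (hypothetical model, for illustration only)."""
    name: str
    synonyms: list = field(default_factory=list)
    narrower: dict = field(default_factory=dict)  # name -> Term


def add_path(roots, path):
    """Create any missing terms along path (broadest first); return the narrowest."""
    if not 1 <= len(path) <= MAX_DEPTH:
        raise ValueError(f"a path must have 1 to {MAX_DEPTH} terms")
    level, term = roots, None
    for name in path:
        term = level.setdefault(name, Term(name))
        level = term.narrower
    return term


roots = {}
banana = add_path(roots, ["Food", "Fruit", "Tropical fruit", "Banana"])
add_path(roots, ["Animal kingdom", "Vertebrates"])
add_path(roots, ["Animal kingdom", "Invertebrates"])
```

After running this, `roots` holds the two Root terms Food and Animal kingdom, and each term carries its own narrower terms.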
If you want to use the CV to find and replace wrongly spelled words, then you should create a separate root node for it and make it hidden from the user interface – so that such terms will not appear in e.g. the keyword tree dialog.
To be able to make changes to the vocabulary, you’ll have to lock it for editing. Click the Lock button at the top. The page will reload and it will show your name. When you are done editing, click the button again (which is now labeled Unlock).
Once you have added a Root term (click the + icon at the top of the box), you can start adding narrower terms for it in the 2nd box. First click the Root term for which you want to add narrower terms, then click the + icon at the top of the 2nd box to add a single term, or click the double ++ icon to add multiple terms at once. If you add a term with the single + icon, you can also add synonyms for the new term right away. If you add many terms at once, you can add synonyms later by selecting a term and then clicking the edit icon that appears to the right of the selected term.
Synonyms and related terms
Synonyms are terms that have the same meaning as the terms that you have defined as your preferred terms. For example, you may want to use the term Conflict as opposed to War. Or think of Holland as another name for the official term The Netherlands. The background processing software can be configured to automatically replace found words with your preferred terms. So if the word Holland is found in the keywords, it can be automatically replaced with The Netherlands. But it is also possible to add synonyms or preferred terms. E.g. if the term War is found, the preferred term Conflict can be added and if the preferred term Conflict is found, the synonym term War can be added.
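The two behaviours described above (replacing a synonym with its preferred term, or adding the related terms) can be sketched as follows. This is a minimal illustration in Python; the `VOCABULARY` table and the function names are hypothetical, and case-insensitive matching is an assumption, not documented behaviour of the background processing software.

```python
# Hypothetical vocabulary: preferred term -> list of synonyms.
VOCABULARY = {"The Netherlands": ["Holland"], "Conflict": ["War"]}


def build_synonym_index(vocabulary):
    """Map each synonym (case-folded) to its preferred term."""
    return {s.casefold(): p for p, syns in vocabulary.items() for s in syns}


def replace_synonyms(keywords, vocabulary):
    """Replace synonyms with preferred terms, without creating duplicates."""
    index = build_synonym_index(vocabulary)
    result = []
    for keyword in keywords:
        term = index.get(keyword.casefold(), keyword)
        if term not in result:
            result.append(term)
    return result


def add_related_terms(keywords, vocabulary):
    """Keep every keyword; add the preferred term for a synonym and vice versa."""
    synonym_index = build_synonym_index(vocabulary)
    preferred_index = {p.casefold(): p for p in vocabulary}
    result = list(keywords)
    for keyword in keywords:
        key = keyword.casefold()
        if key in synonym_index:
            additions = [synonym_index[key]]
        elif key in preferred_index:
            additions = vocabulary[preferred_index[key]]
        else:
            additions = []
        for term in additions:
            if term not in result:
                result.append(term)
    return result


replaced = replace_synonyms(["Holland", "tulips"], VOCABULARY)
enriched = add_related_terms(["War"], VOCABULARY)
```

Here `replaced` contains The Netherlands instead of Holland, while `enriched` keeps War and adds Conflict.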
Other uses of the synonyms field
You can also use the synonyms function to add commonly misspelled words, shorthand codes and related terms, so that these can be added automatically or used to find preferred terms. It is also possible to add synonyms to fields other than the field for which a rule is configured. E.g. you can use one of the metadata fields to store terms that you don’t want to see on your website, but that you do want indexed/searchable, so that a file is found when the preferred term is used in a search, but also when one of its synonyms is used instead. This way the Controlled vocabulary can accommodate two categories of information: information intended for display to end users and information intended for retrieval. You can also use this mechanism to automatically add terms for which shorthand codes are found. E.g. if your keyworders use ISO country codes as opposed to full country names, then you can maintain a list of all official country names and add the ISO codes to the synonyms.
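The split between display terms and retrieval-only terms can be illustrated with a short Python sketch. The field names `keywords` and `hidden_search_terms` and the function name are assumptions made for this example only; in practice the target field is whichever metadata field you configure.

```python
# Hypothetical vocabulary: preferred term -> synonyms, ISO codes, misspellings.
COUNTRIES = {"The Netherlands": ["Holland", "NL", "NLD"]}


def split_display_and_search(keywords, vocabulary):
    """Leave the visible keywords untouched; collect synonyms in a hidden field."""
    hidden = []
    for keyword in keywords:
        for synonym in vocabulary.get(keyword, []):
            if synonym not in keywords and synonym not in hidden:
                hidden.append(synonym)
    return {"keywords": list(keywords), "hidden_search_terms": hidden}


record = split_display_and_search(["The Netherlands", "tulips"], COUNTRIES)
```

The visible keywords stay as entered, while Holland, NL and NLD end up in the hidden field so that a search on any of them can still find the file.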
Organising your word list hierarchy
Before you start adding terms, it is important to think carefully about how you are going to organise and group your terms. One of the reasons is that you can connect any level of your terms to fields in the metadata repository. This way you can use the Controlled vocabulary to select properties describing a photo, video or other file type. For example, you can configure one of the metadata fields to allow for photographic terms only. It is recommended to create one Root term as a container for such use. So for example, create a Root term called Fields and then add a narrower term called Photographic terms. Then add the terms that may be selected for the field that you have connected in the metadata editing dialog or the upload pages. For example, Abstract, Blurred, Close-up and so on. You can configure another metadata field specifically for Composition. Create a narrower term for the Root term Fields (which is actually just a container in this example) and call it Composition. Next you can add narrower terms for Composition such as Full length, Waist up, Mid section, Head and shoulders, Face only and so on. Note that this is just an example of how you can use the Controlled vocabulary. You don’t have to configure metadata fields to hold specific information; all the terms can go into the keywords field just as well.
You can link a level in the Controlled vocabulary to a field in the metadata repository so that users can select a value from a drop down box. In 31.3 or later, you can also use “flat value” drop downs. For example, if you have created a “Film” top level with a child node “Color”, and Color has the child node “Kodak”, which has the child nodes “Portra 160, Portra 400, Portra 800”, then the linked field drop down will show the following options:
Color, Kodak, Portra 160
Color, Kodak, Portra 400
Color, Kodak, Portra 800
This is of course just a very simple example, but it should make clear how you can use this function, see screen shot for an example:
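The flat value options listed above can be derived from the hierarchy by walking every branch below the linked level. The following Python sketch only illustrates the idea; the nested-dictionary layout and the function name `flat_values` are assumptions, as is the choice to list only the narrowest terms of each branch.

```python
# Narrower terms of the linked "Film" level, as nested dictionaries.
FILM = {"Color": {"Kodak": {"Portra 160": {}, "Portra 400": {}, "Portra 800": {}}}}


def flat_values(children, prefix=()):
    """Return one comma-separated option per narrowest term below a level."""
    options = []
    for name, narrower in children.items():
        path = prefix + (name,)
        if narrower:
            options.extend(flat_values(narrower, path))
        else:
            options.append(", ".join(path))
    return options


options = flat_values(FILM)
```

For the Film example, `options` holds the same three entries as the list above.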
Terms may appear in different hierarchies. For example, the term Woman may appear in Business as Businesswoman, in Gender as Female with Woman added as a synonym, and so on. You can search for terms by entering a word in the box at the top. Any matches are displayed with their broader term / context. You can then click on any of the terms to go to it in the tree view. All of the term’s broader terms will load and expand automatically.
Importing terms from a CSV file
Version 32.23 or later.
You can import terms from a CSV (comma separated values) file that you can prepare with software such as Excel. When you are done, save your xls file as a quote delimited, comma separated CSV file. The column titles are:
term1, syn1, term2, syn2, term3, syn3, term4, syn4, term5, syn5
Column term1 must always exist; all other columns are optional. The columns that start with “syn” are for synonyms (e.g. syn1 has the synonyms for term1). Multiple synonyms in one column must always be entered as comma separated values.
Example CSV file:
The process will always add the root terms (column term1) first, before processing narrower terms (columns term2, term3, term4 and term5). It will not add a term if it already exists.
The example above will be processed as follows (assuming that none of the terms already exist):
A root term “Film” will be created first, next the narrower term “Colour” is created as a narrower term of “Film”. The narrower term “Kodak” will be added as a narrower term to “Colour” – and then 3 narrower terms are added to the “Film\Colour\Kodak” node. No synonyms are added for these terms because the file has no such data.
For the last row, a root term “Countries” will be added, with a narrower term “Europe” and the narrower term “The Netherlands” is added with the synonyms “Holland,Nederland” to the term Europe.
Note that the columns syn1,syn2,syn3,syn4,syn5 and term5 could have been left out.
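A file like the one described above can also be generated with a script instead of Excel. The sketch below uses Python's standard `csv` module to write a quote delimited, comma separated file with the documented column titles. The three narrower terms under Kodak are an assumption (borrowed from the Portra example earlier in this article), and whether the importer expects the column titles themselves to be quoted is not stated, so check a small test file first.

```python
import csv
import io

COLUMNS = ["term1", "syn1", "term2", "syn2", "term3", "syn3",
           "term4", "syn4", "term5", "syn5"]

# Rows modelled on the example described above; Portra names are assumed.
ROWS = [
    {"term1": "Film", "term2": "Colour", "term3": "Kodak", "term4": "Portra 160"},
    {"term1": "Film", "term2": "Colour", "term3": "Kodak", "term4": "Portra 400"},
    {"term1": "Film", "term2": "Colour", "term3": "Kodak", "term4": "Portra 800"},
    {"term1": "Countries", "term2": "Europe", "term3": "The Netherlands",
     "syn3": "Holland,Nederland"},
]


def to_csv(rows):
    """Return quote delimited, comma separated text; missing columns stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv(ROWS)
```

Because every value is quoted, the comma inside “Holland,Nederland” stays within the syn3 column instead of being read as a column separator.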
Upload your CSV file with File manager to the “\vocabulary\import” folder. To process a file that you have uploaded, go to the CV in Back office and click on the “import” button in the toolbar.
It is recommended to process files with no more than 5,000 rows. If you need to import more rows then create and import several smaller files.
CSV import FAQ
- If a term is already found, will the synonyms be updated with the synonyms in the import file?
No, synonyms are added to new terms only.
- Do I need to add columns for synonyms or for narrower terms that I’m not using?
No, only “term1” is required.
- The import dialog reports 0 rows imported, what’s wrong?
Your file is using incorrect column names/titles, it doesn’t have column names/titles, or the data is not quote delimited and comma separated.
- What happens with duplicate terms in my import file?
Only the first unique term will be processed, any other rows with the same term are ignored.
- I want to import terms as narrower terms of terms that already exist, how do I do this?
Prepare your CSV file as described above. You can use the import function to add narrower terms to already existing nodes (terms).
- Can I use the import function to update existing terms?
No, the import function is to add terms only.
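The import behaviour described in this section (broader terms first, existing terms skipped, synonyms stored for new terms only, duplicate rows ignored) can be summarised in a short simulation. This Python sketch is not the actual import code; the function `import_terms` and the representation of the vocabulary as a dictionary of paths are assumptions made for illustration.

```python
import csv
import io


def import_terms(csv_text, vocabulary):
    """Add new terms to vocabulary (path tuple -> synonyms); return the count added."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    added = 0
    for level in range(1, 6):  # term1 (root terms) first, then term2 .. term5
        for row in rows:
            path = tuple((row.get(f"term{i}") or "").strip()
                         for i in range(1, level + 1))
            if not all(path) or path in vocabulary:
                continue  # incomplete path, existing term, or duplicate row
            raw = row.get(f"syn{level}") or ""
            vocabulary[path] = [s.strip() for s in raw.split(",") if s.strip()]
            added += 1
    return added


existing = {("Countries",): ["Nations"]}
sample = (
    '"term1","syn1","term2","syn2"\n'
    '"Countries","States","Europe","EU"\n'
    '"Countries","","Europe","Old World"\n'
)
count = import_terms(sample, existing)
```

In this run only Countries > Europe is added, with the synonym EU from the first row. The existing term Countries keeps its original synonyms and the duplicate second row is ignored.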
Translating your vocabulary
The Controlled vocabulary can be translated for all the languages (locales) that you have enabled in Site configuration. By default, when you add terms in the primary locale, copies are added to all other locales. You can then edit these copies by switching to a different locale. To switch to a different locale, simply select one in the dropdown box on the top right. The CV is now in Translation mode. You cannot add terms when in Translation mode.
If you want to translate your lists off-line, then you can schedule an export job (using job server in Back office). When you are done translating the exported CSV file, you can upload your translated file again with job server. It will automatically apply your updates to the selected locale.
You can create a data processing rule to automatically translate terms found in one or more source fields, by use of your CV. The translated terms can be added to the source field or to a separate field.
Background processing of metadata
From here on the Controlled vocabulary is referred to as the CV.
When a file enters the system or when a file’s metadata is updated, the metadata is processed on the server. Background processing involves many different steps. E.g. your metadata is analysed to automatically create search filters, to create galleries, to extract keywords for the live suggestions functions, for the similar files function and so on.
You can also configure background processing to – among other things – enrich your data by use of the CV and/or to translate terms (e.g. keywords). To set this up, go to Site configuration and then click Metadata processing in the sidebar. You can create as many rules as you need. The process of adding and changing such rules is described in the article Data processing rules. This paragraph explains how you can create a processing rule using your CV to replace or to add preferred and synonym terms. Note that you can create several rules, e.g. one for each field that you want to process.
Processing rules for vocabulary processing
To add a processing rule, click on New in the toolbar. In the properties dialog, it is recommended to create a new group for rules that you configure for CV processing. You can do this by simply entering a title in the input box “enter new group name”. When you save your processing rule, the group will be automatically created and when you add additional rules you’ll be able to select this group in the dropdown box. Also enter a short description that reminds you what this processing rule is for. In this example, we’ll be processing the keywords so select Keywords for the content field property. You can use this rule for new files only, for updates only or for both. Generally speaking you should check both boxes (Inserts and Updates). Leave the box Before supplier matching unchecked. Leave all other settings on the first tabsheet unchecked and don’t enter anything in the input boxes. If you want to limit this rule so that it processes files from certain suppliers or groups only, then you can configure this on the supplier conditions tabsheet. The same is true for the conditions that you can configure on the processing conditions tabsheet. You can find more information about this in the article Data processing rules.
Click on the Vocabulary tab to start configuring your new processing rule. Tick the box Enable vocabulary processing. Below is an explanation of the settings that you can use:
- Add synonyms
If you enable this function, then each word that is found in the content field (which you have selected on the first tabsheet) is looked up in the CV. The other settings on this tabsheet are used to configure what has to be done if there’s a match and how to look for terms in the CV.
- Match preferred terms only
As described in the paragraphs above, the terms that you enter in the CV are considered preferred terms, i.e. the terms that you want your keyworders to use. If you untick this setting, a term can also be found by its synonyms. For example, if you have The Netherlands as a preferred term and the word Holland is found in the keywords (or another field for which you are creating a processing rule), then the term will be found if Holland exists in the synonyms field for The Netherlands. Normally the setting Match preferred terms only should be off.
- Replace found synonym with preferred term in the input source
As described above, if a term is found in the synonyms that you’ve added to your preferred term, then this synonym will be replaced with the preferred term. Using the above example, Holland will be removed and The Netherlands will be added. Note that if the preferred term exists in the content field also, then the synonym will be removed without adding the preferred term again.
- Add terms from vocabulary to…
If the option Add synonyms is enabled and there’s a match in your CV, then you can have the terms added to the same field, or to a different field that you select in the dropdown box. This makes it possible to add synonyms, commonly misspelled words and/or related terms to a separate field that is searchable but not visible. Note that you can make fields invisible to users in the Metadata repository.
- Add parent/broader term
If you select this option, then the immediate broader term of a term that is found in the CV will be added if it doesn’t already exist in the content field. For example, if banana is found and it’s defined as a narrower term of tropical fruit, then tropical fruit will be added.
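How the settings above interact can be sketched in a few lines of Python. This is a simplified model, not the server's implementation: the class `VocabularyRule`, the function `apply_rule` and the layout of `CV` are hypothetical, matching is assumed to be case-insensitive, and when replacement is switched off the sketch assumes the preferred term is added next to the synonym. Writing to a separate target field is left out.

```python
from dataclasses import dataclass


@dataclass
class VocabularyRule:
    """Hypothetical stand-in for the settings on the Vocabulary tab."""
    match_preferred_only: bool = False
    replace_synonym_with_preferred: bool = True
    add_broader_term: bool = False


# Hypothetical CV: preferred term -> (immediate broader term, synonyms).
CV = {
    "The Netherlands": ("Europe", ["Holland"]),
    "banana": ("tropical fruit", []),
}


def apply_rule(keywords, rule, cv):
    preferred_index = {p.casefold(): p for p in cv}
    synonym_index = {s.casefold(): p for p, (_, syns) in cv.items() for s in syns}
    result = []

    def add(term):
        # Never add a term that is already present in the field.
        if all(term.casefold() != t.casefold() for t in result):
            result.append(term)

    for keyword in keywords:
        key = keyword.casefold()
        match = preferred_index.get(key)
        by_synonym = False
        if match is None and not rule.match_preferred_only:
            match = synonym_index.get(key)
            by_synonym = match is not None
        if not (by_synonym and rule.replace_synonym_with_preferred):
            add(keyword)  # keep the word as entered
        if match is not None:
            add(match)
            broader = cv[match][0]
            if rule.add_broader_term and broader:
                add(broader)
    return result


rule = VocabularyRule(add_broader_term=True)
processed = apply_rule(["Holland", "The Netherlands", "banana"], rule, CV)
```

With this rule, Holland is replaced by The Netherlands (which is not added a second time), and the broader terms Europe and tropical fruit are added.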
Processing rules for vocabulary translation
To configure automatic translation of words by use of the CV, add a new processing rule (as described above), select a source field on the first tabsheet, and then click on the tabsheet “Vocabulary translation”. You can now select the source language and the target language (e.g. from English to French). The translated terms can be added to the source field, or you can output the translations to a different field. If you use the latter option, then you can clear the target field first by selecting this option next to the target field dropdown box. The translation function only looks for exact matches in the preferred terms; it doesn’t scan for words in the synonyms. You can however choose to have any synonyms that are entered for the found term added to the output field.
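The translation lookup described above can be illustrated as follows. This Python sketch is a simplified model: the `TRANSLATIONS` table, the function `translate` and its parameters are invented for the example, and "exact match" is interpreted here as plain string equality.

```python
# Hypothetical translated vocabulary: per entry, locale -> (preferred term, synonyms).
TRANSLATIONS = [
    {"en": ("The Netherlands", ["Holland"]), "fr": ("Pays-Bas", ["Hollande"])},
    {"en": ("Banana", []), "fr": ("Banane", [])},
]


def translate(terms, source, target, entries, include_synonyms=False):
    """Translate terms that exactly match a preferred term in the source locale."""
    lookup = {entry[source][0]: entry[target] for entry in entries}
    output = []
    for term in terms:
        if term not in lookup:
            continue  # synonyms such as "Holland" are not scanned
        preferred, synonyms = lookup[term]
        output.append(preferred)
        if include_synonyms:
            output.extend(synonyms)
    return output


french = translate(["Holland", "Banana", "The Netherlands"], "en", "fr", TRANSLATIONS)
```

Holland is skipped because it is only a synonym in the source locale; Banana and The Netherlands are translated. Passing `include_synonyms=True` would also add Hollande to the output.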
Limiting data processing scope
As described in the previous paragraphs, the so-called Root terms can be used to provide scope. You can limit searching for terms to specific levels in your CV’s hierarchy as opposed to searching the entire CV. So if you are creating a processing rule for a metadata field that you are using for a specific purpose, then you can configure the processing rule to look for terms in a selected level only. E.g. if you have a field in which you want country names only, and there’s a Root term called World which has all countries as narrower terms, then limit searching to the level World. Note that you always select a parent term, because the data processing function will search within this term only, i.e. among the narrower terms created for the selected broader term.
E.g. World (root term providing scope) > Western Hemisphere (narrower term) > Europe (narrower term) > The Netherlands (narrower term).
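Limiting the scope to a parent term amounts to collecting every narrower term below it and matching against that set only. The Python sketch below illustrates this with nested dictionaries; the names `terms_within` and `matches_in_scope` and the second Root term Food are assumptions added for the example.

```python
# Hypothetical CV with two root terms.
CV = {
    "World": {"Western Hemisphere": {"Europe": {"The Netherlands": {}}}},
    "Food": {"Fruit": {"Banana": {}}},
}


def terms_within(cv, parent_path):
    """Return all narrower terms (at any depth) below the selected parent."""
    node = cv
    for name in parent_path:
        node = node[name]
    found, stack = set(), [node]
    while stack:
        for name, narrower in stack.pop().items():
            found.add(name)
            stack.append(narrower)
    return found


def matches_in_scope(keywords, cv, parent_path):
    """Keep only the keywords that match a term below the selected parent."""
    allowed = terms_within(cv, parent_path)
    return [keyword for keyword in keywords if keyword in allowed]


in_scope = matches_in_scope(["The Netherlands", "Banana"], CV, ["World"])
```

With the scope set to World, The Netherlands is matched but Banana is not, because Banana only exists below the Root term Food.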
Controlled vocabulary data processing scenarios
Below are some examples of configuring data processing rules that work with the Controlled vocabulary.