Read Me Page
Validate WCAG, Section 508, HTML, CSS, Links, and Spelling

Introduction

Selecting the spell check option on the Main tab turns on Total Validator's spell checking system. By default Total Validator detects the language used for every word on every web page that is validated, and then uses a dictionary for that language, together with some rules, to spell check it.

If a word isn't found in the matching dictionary then a spelling mistake is displayed in the results. A list of suggested corrections may also be displayed.

The following page details how to get the best out of the spell checking system, using the many options available on the Spell check tab. These have changed considerably since v10, so users of older versions of Total Validator may also need to check the Migration section.


Internal dictionaries

Six internal dictionaries are built into Total Validator to cover five West European languages: English, French, German, Italian and Spanish. They are named after the language codes they apply to: en-US, en-GB, fr, de, it, es. Each is also used to check words for more specific language codes, such as fr-CA and fr-FR. English is a special case: the en-US dictionary is used for all en languages such as en-CA, except for en-GB, which always uses the en-GB dictionary. You can, however, switch to using the en-GB dictionary for all en languages (except for en-US, of course).
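The fallback described above can be sketched in Python as follows. This is only an illustrative model of the documented behaviour, not Total Validator's own code; the function name `internal_dictionary` and the `prefer_en_gb` flag (standing in for the option to switch to the en-GB dictionary) are hypothetical.

```python
# Hypothetical sketch: which internal dictionary would cover a language code.
# The five top-level languages come from the text above; names are assumptions.
TOP_LEVEL = {"fr", "de", "it", "es"}


def internal_dictionary(lang_code, prefer_en_gb=False):
    """Return the name of the internal dictionary for lang_code, or None."""
    code = lang_code.strip().lower()
    primary = code.split("-")[0]
    if primary == "en":
        # en-GB and en-US always use their own dictionaries.
        if code == "en-gb":
            return "en-GB"
        if code == "en-us":
            return "en-US"
        # Every other en code uses en-US unless the user switched to en-GB.
        return "en-GB" if prefer_en_gb else "en-US"
    if primary in TOP_LEVEL:
        # fr covers fr-CA, fr-FR and so on.
        return primary
    return None


print(internal_dictionary("en-CA"))
print(internal_dictionary("fr-CA"))
```

With the default setting, en-CA resolves to the en-US dictionary and fr-CA to the fr dictionary; a code such as zh-Hans-CN has no internal dictionary and yields `None`.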

All dictionaries used for spell checking are just plain, UTF-8 encoded, text files consisting of one word per line. To save having to list every possible variation of each word for each language, rules specific to each supported language are used to detect plurals, apostrophes, and other similar features, so that a single word in the dictionary may match many actual words that are found on web pages. For example, 'address' will match 'addresses'.
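To make the file format and the idea of matching rules concrete, here is a minimal Python sketch. The real rules used by Total Validator are not documented here, so the suffix handling below is a deliberately simplified assumption (English-style plurals and possessives only), and all function names are hypothetical. Lower-casing every word is also an assumption made to keep the example short.

```python
# Toy model of a one-word-per-line dictionary plus simple matching rules.
# The suffix rules are an illustrative assumption, not the product's rules.


def parse_dictionary(text):
    """Return the set of words in a dictionary file's contents."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_dictionary(path):
    """Read a plain-text dictionary; the file must be UTF-8 encoded."""
    with open(path, encoding="utf-8") as handle:
        return parse_dictionary(handle.read())


def candidate_stems(word):
    """List the forms of word that are looked up in the dictionary."""
    w = word.lower()
    stems = [w]
    for suffix in ("'s", "\u2019s"):  # straight and curly apostrophes
        if w.endswith(suffix):
            w = w[: -len(suffix)]
            stems.append(w)
    if w.endswith("es"):
        stems.append(w[:-2])
    if w.endswith("s"):
        stems.append(w[:-1])
    return stems


def is_known(word, dictionary):
    """True if the word, or a simpler form of it, is in the dictionary."""
    return any(stem in dictionary for stem in candidate_stems(word))


words = parse_dictionary("address\nvalidator\n")
print(is_known("addresses", words))
print(is_known("adress", words))
```

Here the single entry 'address' is enough to accept 'addresses', while the misspelling 'adress' is still rejected.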


Language detection

In order to determine which dictionary to use for a word, the system looks at the language code in the 'lang' attribute of the element containing that word. For example:

<p lang='en-CA'>This will be detected as Canadian English</p>

If there is no 'lang' attribute, it looks at the parent element, and then its parent, right up to the <html> element:

<div lang='en-CA'><p><span>This will still be detected as Canadian English</span></p></div>

If there are no 'lang' attributes, then the system looks for a <meta> tag specifying the language: <meta http-equiv='Content-Language' content='en'>. If that doesn't exist, it then looks for a Content-Language HTTP header in the page response. If still nothing is found then, if it has been set, it will use the Default language option. If it fails to find any matching language code then words in the element will not be spell checked.

Note that if the detected language code is blank (''), malformed (!rubbish!), or one for which there is no dictionary (zh-Hans-CN), then the words will not be checked. Special words such as upper-case words, words in attributes, and words containing digits will also be ignored, unless you set the appropriate What to check option to include them.
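The lookup order described in this section can be summarised with a short Python sketch. It is a simplified model under stated assumptions, not the validator's implementation: the function `detect_language` and its parameters are hypothetical, and the caller is assumed to have already collected the 'lang' values from the element up to <html>, using `None` where an element has no 'lang' attribute.

```python
# Hypothetical model of the language lookup order described above.


def detect_language(ancestor_langs, meta_language=None,
                    header_language=None, default_language=None):
    """Return the first language code found, or None if there is none.

    ancestor_langs lists the 'lang' values for the element itself, then
    its parent, and so on up to <html>; None means the attribute is absent.
    """
    # 1. The nearest 'lang' attribute wins, even if it is blank.
    for value in ancestor_langs:
        if value is not None:
            return value
    # 2. Then the <meta> tag, the HTTP header, and the Default language option.
    for fallback in (meta_language, header_language, default_language):
        if fallback is not None:
            return fallback
    # 3. Nothing found: the words will not be spell checked.
    return None


# <div lang='en-CA'><p><span>...</span></p></div>
print(detect_language([None, None, "en-CA"]))
# No 'lang' attributes, but a Content-Language header and a default are set.
print(detect_language([None, None], header_language="fr", default_language="de"))
```

Note that a blank attribute (lang='') is returned as the detected code rather than skipped, which matches the note above that a blank code means the words are not checked.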

As mentioned above, dictionaries for the top-level language are special in that they will match any subset of that language, so the French fr dictionary will match fr-CA and fr-FR as well as fr. But the Ignore languages option can be used to prevent specific language codes, such as fr-CA, from being checked. Better still, if you provide your Own dictionary for fr-CA, this will be used in place of the top-level one for that specific code. Beware that if you provide your Own dictionary for fr it will be used for all subsets such as fr-CA and fr-FR.
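As a rough illustration of how these precedence rules interact, the Python sketch below picks a dictionary for a detected code. It is an assumption-laden model rather than the product's logic: `choose_dictionary` is a hypothetical name, `available` stands for the lower-cased codes of all dictionaries (internal and Own), and the English special case is left out for brevity.

```python
# Hypothetical sketch of dictionary precedence for a detected language code.


def choose_dictionary(lang_code, available, ignored=()):
    """Return the code of the dictionary to use, or None to skip checking."""
    code = lang_code.lower()
    primary = code.split("-")[0]
    skip = {entry.lower() for entry in ignored}
    # Ignoring a code, or its top-level language, disables checking.
    if code in skip or primary in skip:
        return None
    # A dictionary for the specific code takes priority.
    if code in available:
        return code
    # Otherwise fall back to the top-level dictionary, if there is one.
    if primary in available:
        return primary
    return None


print(choose_dictionary("fr-CA", {"fr"}))
print(choose_dictionary("fr-CA", {"fr", "fr-ca"}))
print(choose_dictionary("fr-CA", {"fr"}, ignored=["fr-CA"]))
```

The three calls show the top-level fallback, the specific dictionary taking over once it exists, and the Ignore languages option switching checking off for that code only.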


Results

When a word is checked against a dictionary and no match is found, it will be marked as 'misspelt' on the results pages. But if you think that the word is correct, you can click on it to add it to a correction dictionary in the Corrections folder. Future validations will then use these correction dictionaries, so these words will no longer be marked as misspelt.

As you view each results page any words that you have already clicked on will not be marked as misspelt on any page you view, making it quicker to correct mistakes.

If you have a lot of specialist words on your website then, to save clicking on every misspelt word on every page, you could (just once) use the Save misspelt words option to save every misspelt word to a correction dictionary. You should then check the dictionaries that are created to ensure that all the words in them really are correctly spelt for future validations.

Correction dictionaries are named (in lower-case) after the language code used to check the words within them, together with a .dic suffix. For example, fr-CA words will be saved in a file called fr-ca.dic (even if the fr dictionary was used to check them). The Show language code option may be used to display the language code (and hence the file a word will be saved to) on the results page. This is helpful if you have pages with lots of different languages on them and you wish to know which files words will be saved to, for the reasons described in the next section.


Correction dictionaries

As described above, correction dictionaries may be created by Total Validator when you click on 'misspelt' words on the results pages, or use the Save misspelt words option. You can also add words to these files, or create your own files. But please note that these must be plain text files consisting of one word per line, ideally with no duplicates, and must be saved using UTF-8 encoding, otherwise some of the words within them may be ignored.

When a spell check runs, the words in any file in the Corrections folder prefixed with a matching language code, and with the suffix .dic, will be used. This means that words in fr-ca.mydictionary.dic, fr-ca.dic, and fr.dic, will be used to check fr-ca words.

If a file is found in the Corrections folder which ends with the suffix .dic, but has no prefix, or uses an invalid code, then its words are added to all of the dictionaries. This is so you can create correction dictionaries for global words like "Google".
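The file-matching rules in this section can be modelled with a few lines of Python. This is a sketch under assumptions: the function `applies_to` is hypothetical, and the pattern used to decide whether a prefix is a valid language code is a simplification (a two or three letter language followed by optional subtags), not the exact test that Total Validator applies.

```python
import re

# Simplified, assumed pattern for a language code such as fr or fr-ca.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$")


def applies_to(filename, lang_code):
    """True if a file in the Corrections folder is used for lang_code."""
    name = filename.lower()
    if not name.endswith(".dic"):
        return False
    # The prefix is everything before the first "." in the file name.
    prefix = name[: -len(".dic")].split(".")[0]
    if not LANGUAGE_CODE.match(prefix):
        # No prefix or an invalid code: treated as a global dictionary.
        return True
    code = lang_code.lower()
    # fr-ca.dic matches fr-ca, and fr.dic matches every fr-* code too.
    return code == prefix or code.startswith(prefix + "-")


for name in ("fr-ca.mydictionary.dic", "fr.dic", "fr-fr.dic", "global.dic"):
    print(name, applies_to(name, "fr-CA"))
```

For fr-CA words this accepts fr-ca.mydictionary.dic and fr.dic, rejects fr-fr.dic, and treats global.dic as applying to every language because "global" is not a valid code under the assumed pattern.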


Creating your own dictionaries

You can also add your own dictionaries for languages not built into Total Validator, for specific country codes such as fr-CA, or even replace ours with your own. To do this, add a list of paths to your dictionary files using the Own dictionaries option.

As with all dictionaries these must be plain text files consisting of one word per line, ideally with no duplicates, and must be saved using UTF-8 encoding. Dictionary file names must also start with a valid language code prefix ending with a ".", and the whole file name must end with the suffix .dic, otherwise they will be ignored. For example, fr-ca.dic and pt-PT.mydictionary.dic are okay, but not fr-CA, nor fr-CA-dic, nor fr-CA-mydictionary.dic. Any files whose language code also appears in the Ignore languages option will be completely ignored.

Dictionary file names are also case-insensitive, so if you list fr-ca.dic,fr-CA.dic only one of these will be used (with no guarantee which one). Also if there is a naming conflict such as fr.mydic.dic,fr.mydic2.dic, only one of these will be used. Beware that any dictionaries with names which match the internal ones will be used instead of them, such as fr.dic and en-GB.dic.
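Before listing files in the Own dictionaries option, it may help to check their names for the problems described above. The Python sketch below is one possible check, not a tool shipped with Total Validator: the function names are hypothetical, the language code pattern is a simplified assumption, and a "naming conflict" is interpreted here as two files that resolve to the same lower-cased language code.

```python
import re
from collections import defaultdict

# Simplified, assumed pattern for a language code such as pt or pt-br.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$")


def language_code_of(filename):
    """Return the lower-cased language code of a dictionary file, or None."""
    name = filename.lower()
    if not name.endswith(".dic"):
        return None
    prefix = name[: -len(".dic")].split(".")[0]
    return prefix if LANGUAGE_CODE.match(prefix) else None


def find_conflicts(filenames):
    """Group valid files by language code and keep codes listed twice."""
    by_code = defaultdict(list)
    for filename in filenames:
        code = language_code_of(filename)
        if code is not None:
            by_code[code].append(filename)
    return {code: files for code, files in by_code.items() if len(files) > 1}


print(language_code_of("pt-PT.mydictionary.dic"))
print(language_code_of("fr-CA-mydictionary.dic"))
print(find_conflicts(["fr-ca.dic", "fr-CA.dic", "pt.dic"]))
```

Any file for which `language_code_of` returns `None` would be ignored, and any code reported by `find_conflicts` has more than one candidate file, of which only one would be used.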

Just like the internal dictionaries, any dictionary with a language code which is just the name of the language, such as pt.dic, will be used to check words in any country or region specific language codes such as pt-BR as well as pt, unless you supply a dictionary for the specific code, like pt-br.dic.

If you supply a dictionary for a language matching one of the five West European languages, such as fr-CA, then the special language specific rules we have, such as detecting plurals, will also be applied. But for other languages you will have to list every variation of every word. Also, we have only tested the system with Western languages, so there is no guarantee our system will work with languages which are significantly different (please let us know if it doesn't and we will try to fix things).

Fast creation

A quick way of creating your own dictionary is to use the Save foreign words option. With this option, all of the words which are marked with a valid language code, but for which no matching dictionary exists, will be saved to the Corrections folder. These are saved in files named after the matching language code with a .dic suffix. For example pt-br.dic.

You can also do this for country specific or regional language codes for which there is a top-level dictionary, by listing the code in the Ignore languages option. For example, to create your own dictionary for fr-CA, use fr-CA in the Ignore languages option and then use the Save foreign words option. Then the top-level fr dictionary will ignore fr-CA words, and they will be saved to the fr-ca.dic dictionary file.

Note that for some glyph-based languages like Chinese, all the words are upper case, so you may need to set the "UPPER CASE words" option to ensure that words are saved, and use the same option when spell checking them.

As these dictionaries are stored in the Corrections folder, it may be better to move them somewhere else where they are less likely to be overwritten. Also you will still need to list them in the Own dictionaries option for them to be used, and remember to remove the language code from the Ignore languages option, so they are no longer ignored.


Ignoring words

You can skip spell checking for any words where there is a matching dictionary, using one of three methods:

  • Add the matching language code to the Ignore languages option. Note that adding a top-level language code such as fr will skip all country and region specific codes as well
  • Mark the section with the HTML5 spellcheck="false" attribute
  • Mark the section using the -tv-ignore:E031 or -tv-ignore-spellcheck class attributes as described in Ignoring issues


Migrating from pre v11 versions of Total Validator

The internal dictionaries are now stored in the dics folder, rather than the dicts folder in the application folder, so if you've changed any of these you may need to move them into the new folder and rename them to match the new names.

Similarly corrections are now stored by default in the dics folder, rather than the dicts folder in the results folder, so you may need to move these and rename them using the new filename format in order for them to be used.

The Command line options have changed, so that using some of the old ones will throw an error and stop the validation from running. This has been done so that confusing results are not produced, so you will need to change any existing scripts that use spell checking. This also applies to the Embedded version.
