Read Me Page
Validate WCAG, Section 508, HTML, CSS, Links, and Spelling

Introduction

Selecting the spell check option on the Main tab turns on Total Validator's spell checking system. By default Total Validator detects the language used for every word on every web page that is validated, and then uses a dictionary for that language, together with some rules, to spell check it.

If a word isn't found in the matching dictionary then a spelling mistake is displayed in the results. A list of suggested corrections may also be displayed.

This page explains how to get the best out of the spell checking system using the many options available on the Spell check tab. These have changed considerably since v10, so users of older versions of Total Validator may also need to check the Migration section.


Internal dictionaries

Six internal dictionaries are built into Total Validator to cover five West European languages: English, French, German, Italian and Spanish. They are named after the language codes they apply to: en-US, en-GB, fr, de, it, es. The top-level dictionaries are also used to check words for any country or region specific language codes, such as fr-CA and fr-FR. English is a special case in that the en-US dictionary is used for en words (except for en-GB). But you can switch to using the en-GB dictionary for en words instead (except for en-US of course).

All dictionaries used for spell checking are just plain, UTF-8 encoded, text files consisting of one word per line. But to save having to list every possible variation of each word, rules specific to each supported language are used to detect plurals, apostrophes, and other similar features, so that a single word in the dictionary may match many actual words found on web pages. For example, 'address' will match 'addresses'.
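As a rough illustration of the idea, the following Python sketch loads a one-word-per-line dictionary and applies a single, deliberately naive plural rule. The function names and the rule itself are assumptions made for this example; Total Validator's real rules are not documented here and are certainly more elaborate.

```python
import io


def load_dictionary(stream):
    """Read one word per line from a text stream, skipping blank lines."""
    return {line.strip() for line in stream if line.strip()}


def matches(word, dictionary):
    """Return True if the word, or a naive singular form of it, is listed."""
    if word in dictionary:
        return True
    # Hypothetical plural rule: strip a trailing 'es' or 's' and retry.
    for suffix in ("es", "s"):
        if word.endswith(suffix) and word[: -len(suffix)] in dictionary:
            return True
    return False


# A real dictionary file would be opened with open(path, encoding="utf-8").
words = load_dictionary(io.StringIO("address\nvalidator\n"))
print(matches("addresses", words))  # the plural is matched by the rule
print(matches("adress", words))     # a genuine misspelling is not
```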


Language detection

In order to work out which dictionary to use to spell check a word, the system looks at the language code in the 'lang' attribute of the element containing it. For example:

<p lang='en-CA'>This will be detected as Canadian English</p>

If there is no 'lang' attribute, it looks at the parent element, and then its parent, right up to the <html> element:

<div lang='en-CA'><p><span>This will still be detected as Canadian English</span></p></div>

If none of the parent elements right up to <html> has a 'lang' attribute, then the system looks for a <meta> tag specifying the language: <meta http-equiv='Content-Language' content='en'>. If that doesn't exist, it then looks for a Content-Language HTTP header in the page response. If still nothing is found then it will use the Default language option, if this has been set. If it fails to find any language code then the word will not be spell checked.
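The lookup order described above can be sketched in Python as follows. The element representation (a dict with 'attrs' and 'parent' keys) and all of the names are assumptions chosen for this example, not part of Total Validator.

```python
def detect_language(element, meta_language=None, header_language=None,
                    default_language=None):
    """Return the first language code found in the fallback order, or None."""
    # 1. Walk from the element up through its parents looking for 'lang'.
    node = element
    while node is not None:
        if "lang" in node["attrs"]:
            return node["attrs"]["lang"]
        node = node["parent"]
    # 2. Then the <meta> tag, the HTTP header, and the Default language option.
    for candidate in (meta_language, header_language, default_language):
        if candidate is not None:
            return candidate
    # 3. Nothing found: the word would not be spell checked.
    return None


html = {"attrs": {}, "parent": None}
div = {"attrs": {"lang": "en-CA"}, "parent": html}
p = {"attrs": {}, "parent": div}
span = {"attrs": {}, "parent": p}

print(detect_language(span))                      # inherited from the <div>
print(detect_language(html, meta_language="en"))  # falls back to the <meta> tag
print(detect_language(html))                      # no language found at all
```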

Note that if the detected language code is blank (''), malformed (!rubbish!), or one for which there is no dictionary (zh-Hans-CN), then the word will also be ignored. Special words, such as upper-case words, words in attributes, and words containing digits, will also be ignored unless you set the appropriate What to check option to include them.
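A minimal Python sketch of how two of these special cases might be filtered is shown below. The parameter names are hypothetical stand-ins for the What to check options, and the tests used (str.isupper and a digit scan) are assumptions about what counts as 'upper-case' and 'with digits'.

```python
def should_skip(word, check_upper_case=False, check_digits=False):
    """Return True if the word would be left out of the spell check."""
    # Upper-case words are skipped unless the option to check them is set.
    if word.isupper() and not check_upper_case:
        return True
    # Words containing digits are skipped unless the option is set.
    if any(ch.isdigit() for ch in word) and not check_digits:
        return True
    return False


print(should_skip("HTML"))                         # skipped by default
print(should_skip("HTML", check_upper_case=True))  # checked when the option is set
print(should_skip("page2"))                        # skipped because of the digit
print(should_skip("page"))                         # an ordinary word is checked
```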

As mentioned above, dictionaries for the top-level language are special in that they will match any subset of that language. So the French fr dictionary will match fr-CA and fr-FR as well as fr. But the Ignore languages option can be used to prevent specific language codes, such as fr-CA, being checked. Better still, if you provide your Own dictionary for fr-CA, this will be used in place of the top-level one for that specific code. But if you provide your Own dictionary for fr it will be used for all subsets such as fr-CA and fr-FR.
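The selection rules above can be approximated in Python like this. The function, its parameters, and the way the English default is modelled are assumptions for illustration only.

```python
def pick_dictionary(code, available, ignored=(), english_default="en-US"):
    """Return the name of the dictionary to use for a language code, or None."""
    code = code.lower()
    names = {name.lower(): name for name in available}
    ignored = {c.lower() for c in ignored}
    top = code.split("-")[0]
    # Ignored codes are never checked; ignoring 'fr' also ignores 'fr-CA'.
    if code in ignored or top in ignored:
        return None
    # A dictionary for the exact code wins over the top-level one.
    if code in names:
        return names[code]
    if top in names:
        return names[top]
    # English has no plain 'en' dictionary, so use the chosen default.
    if top == "en" and english_default.lower() in names:
        return names[english_default.lower()]
    return None


internal = ["en-US", "en-GB", "fr", "de", "it", "es"]
print(pick_dictionary("fr-CA", internal))                     # top-level fr
print(pick_dictionary("fr-CA", internal + ["fr-CA"]))         # own fr-CA wins
print(pick_dictionary("en-CA", internal))                     # English default
print(pick_dictionary("fr-CA", internal, ignored=["fr-CA"]))  # not checked
```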


Results

When a word is checked against a dictionary and no match is found, it will be marked as 'misspelt' on the results pages. But if you think that the word is correct, you can click on it to add it to a correction dictionary in the Corrections folder. Future validations will then use these correction dictionaries, so these words will no longer be marked as misspelt.

Also, as you view each results page, any words that you have already clicked on will no longer be marked as misspelt on any page you view, making it quicker to correct mistakes.

If you have a lot of specialist words on your website then, to save clicking on every misspelt word on every page, you could use the Save misspelt words option as a one-off to automatically save every misspelt word to a correction dictionary. In this case your validation results will show no spelling errors at all. You should then check these dictionaries to ensure that all the words in them really are correctly spelt.

Correction dictionaries are named (in lower-case) after the language code used to check the words within them, together with a .dic suffix. For example, fr-CA words will be saved in a file called fr-ca.dic (even if the fr dictionary was actually used to check them). The Show language code option may be used to display the language code (and hence the file a word will be saved to) on the results page. This may be useful if you have pages with lots of different languages on them and you wish to know which files words will be saved to, which matters for the reasons described in the next section.
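To make the naming rule concrete, here is a small Python sketch that derives the file name from a language code and appends a word to it. The helper names are hypothetical and a temporary folder stands in for the Corrections folder; this is not how Total Validator itself is implemented.

```python
import tempfile
from pathlib import Path


def correction_file(corrections_folder, language_code):
    """Build the lower-case '<code>.dic' path for a language code."""
    return Path(corrections_folder) / (language_code.lower() + ".dic")


def save_correction(corrections_folder, language_code, word):
    """Append a word to the matching correction file, avoiding duplicates."""
    path = correction_file(corrections_folder, language_code)
    existing = set()
    if path.exists():
        existing = set(path.read_text(encoding="utf-8").split())
    if word not in existing:
        with path.open("a", encoding="utf-8") as handle:
            handle.write(word + "\n")
    return path


folder = tempfile.mkdtemp()  # stand-in for the Corrections folder
saved = save_correction(folder, "fr-CA", "courriel")
save_correction(folder, "fr-CA", "courriel")  # second call adds nothing
print(saved.name)
print(saved.read_text(encoding="utf-8").splitlines())
```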


Correction dictionaries

As described above, correction dictionaries will be automatically created as you click on 'misspelt' words on the results pages, or use the Save misspelt words option. You may also manually add words to these files, or create your own files. But please note that these must be plain text files consisting of one word per line, ideally with no duplicates, and must be saved using UTF-8 encoding, otherwise some of the words within them may be ignored.

When a spell check is run, the words in any file in the Corrections folder that is prefixed with a valid language code and has the suffix .dic, such as fr-ca.dic, will be added to the matching dictionary. So words in fr-ca.dic will be added to the fr-ca dictionary, assuming there is a fr-ca dictionary. Note that the physical dictionaries themselves are never actually changed. The correction words are added in memory each time a spell check is run.

For top-level languages, such as the internal ones en, fr, de, es, it, things work slightly differently because these are used to check all the country and region specific language codes as well. For these, all the words from country and region specific correction dictionaries are added to the top-level dictionary. So words in fr.dic, fr-fr.dic, and fr-ca.dic will be added to the fr dictionary, unless you've provided your own specific fr-fr dictionary, in which case words from fr-fr.dic will just be added to that one instead.

If a file is found in the Corrections folder which ends with the suffix .dic, but has no prefix, or uses an invalid code, then it is added to all of the dictionaries. This is so you can create one correction dictionary for global words like "Google". If a correction dictionary has a valid code, but there is no matching dictionary, then it is simply ignored. Any files which are on the list of Own dictionaries are also ignored, so they are not added twice.
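The routing of correction files described in this section might be sketched in Python as below. The language code pattern is a simplified assumption, own dictionaries are compared by file name only, and the English special case (en words using en-US or en-GB) is left out to keep the example short.

```python
import re

# Simplified, assumed pattern for a language code such as 'fr' or 'fr-ca'.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$")


def targets_for(file_name, dictionaries, own_files=()):
    """Return the dictionary codes that receive a correction file's words."""
    name = file_name.lower()
    # Files without the suffix, or listed as Own dictionaries, are ignored.
    if not name.endswith(".dic") or name in {f.lower() for f in own_files}:
        return set()
    prefix = name.split(".")[0]
    known = {d.lower() for d in dictionaries}
    # No prefix or an invalid code: the words go to every dictionary.
    if not LANGUAGE_CODE.match(prefix):
        return known
    # A dictionary for the exact code takes the words.
    if prefix in known:
        return {prefix}
    # Otherwise they go to the top-level dictionary, if there is one.
    top = prefix.split("-")[0]
    return {top} if top in known else set()


dictionaries = ["fr", "fr-FR", "de"]
print(targets_for("fr-ca.dic", dictionaries))   # added to the top-level fr
print(targets_for("fr-fr.dic", dictionaries))   # added to the specific fr-fr
print(targets_for("global.dic", dictionaries))  # added to all dictionaries
print(targets_for("pt-br.dic", dictionaries))   # no matching dictionary
```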


Supplying your own dictionaries

As well as using the dictionaries supplied with Total Validator you can also add your own dictionaries for other languages, for specific country codes such as fr-CA, or even replace the internal dictionaries with your own. To do this add a list of paths to your own dictionaries using the Own dictionaries option.

As with all dictionaries, these must be plain text files consisting of one word per line, ideally with no duplicates, and must be saved using UTF-8 encoding. Dictionary file names must also start with a valid language code prefix ending with a ".", and the whole file name must end with the suffix .dic, otherwise they will be ignored. For example, fr-ca.dic and pt-PT.mydictionary.dic are valid, but fr-CA, fr-CA-dic, and fr-CA-mydictionary.dic are not. Any dictionaries whose language codes also appear in the Ignore languages option will be completely ignored.
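If you want to check a set of file names before listing them, a Python sketch along these lines may help. The language code pattern is a simplified assumption (two or three letters followed by optional subtags of two to eight characters) and may not match Total Validator's own validation exactly.

```python
import re

# Simplified, assumed pattern for a language code such as 'fr' or 'pt-PT'.
LANGUAGE_CODE = re.compile(r"^[a-z]{2,3}(-[a-z0-9]{2,8})*$", re.IGNORECASE)


def is_usable_dictionary_name(file_name):
    """Check for a language code prefix ending with '.' and a '.dic' suffix."""
    if not file_name.lower().endswith(".dic"):
        return False
    prefix = file_name.split(".")[0]
    return LANGUAGE_CODE.match(prefix) is not None


for name in ["fr-ca.dic", "pt-PT.mydictionary.dic", "fr-CA",
             "fr-CA-dic", "fr-CA-mydictionary.dic"]:
    print(name, is_usable_dictionary_name(name))
```

Under this pattern the first two names are accepted and the last three are rejected, which agrees with the examples given above.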

Dictionary file names are also case-insensitive. So if you list fr-ca.dic,fr-CA.dic only one of these will be used (with no guarantee which one). Also if there is a naming conflict such as fr.mydic.dic,fr.mydic2.dic, only one of these will be used. Also any dictionaries with names which match the internal ones will be used in place of them, such as fr.dic and en-GB.dic.
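Because only one file per language code is used, it can be worth checking a list of own dictionaries for clashes before a run. The following Python sketch groups paths by their lower-cased language code prefix; the function name is hypothetical and the check is not something Total Validator provides.

```python
from collections import defaultdict
from pathlib import PurePath


def find_conflicts(paths):
    """Group dictionary paths by language code and return any clashes."""
    by_code = defaultdict(list)
    for path in paths:
        name = PurePath(path).name
        # The language code is the part of the file name before the first '.'.
        by_code[name.split(".")[0].lower()].append(path)
    return {code: files for code, files in by_code.items() if len(files) > 1}


own = ["fr-ca.dic", "fr-CA.dic", "pt.dic", "fr.mydic.dic", "fr.mydic2.dic"]
for code, files in find_conflicts(own).items():
    print(code, files)
```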

Just like the internal dictionaries, any dictionary with a language code which is just the name of the language, such as pt.dic, will be used to check words in any country or region specific language codes, such as pt-BR, as well as pt, unless you supply a dictionary for the specific code, like pt-br.dic.

If you supply a dictionary for a language matching one of the five West European languages, such as fr-CA, then the special language specific rules we have, such as detecting plurals, will also be applied. But for other languages you will have to list every variation of every word. Also, we have only tested the system with Western languages, so there is no guarantee our system will work with languages which are significantly different (please let us know if it doesn't and we will try to fix things).

Creating your own dictionaries

A quick way of creating your own dictionary is to use the Save foreign words option. With this option, all of the words which are marked with a valid language code, but for which no matching dictionary exists, will be saved to the Corrections folder. Just like corrections, these are saved in files named after the matching language code with a .dic suffix. For example pt-br.dic.

You can also do this for country specific or regional language codes for which there is a top-level dictionary by listing the code in the Ignore languages option. For example, to create your own dictionary for fr-CA, list fr-CA in the Ignore languages option and then use the Save foreign words option. Then the top-level fr dictionary will ignore fr-CA words, and they will be saved to a fr-ca.dic dictionary file.

Note that for some glyph-based languages like Chinese, all the words are upper case. So you may need to set the "UPPER CASE words" option to ensure that words are saved, and use the same option when spell checking them.

As these dictionaries are stored in the Corrections folder, it may be better to move them somewhere else where they are less likely to be overwritten. Also you will still need to list them in the Own dictionaries option for them to be used, and remember to remove the language code from the Ignore languages option, so they are no longer ignored.


Ignoring words

You can skip spell checking for any words where there is a matching dictionary, using one of three methods:

  • Add the matching language code to the Ignore languages option. Note that adding a top-level language code such as fr will skip all country and region specific codes as well
  • Mark the section with the HTML5 spellcheck="false" attribute
  • Mark the section using the -tv-ignore:E031 or -tv-ignore-spellcheck class attributes as described in Ignoring issues


Migrating from pre v11 versions of Total Validator

The internal dictionaries are now stored in the dics folder, rather than the dicts folder in the application folder. So if you've changed any of these you may need to move them into the new folder and rename them to match the new names.

Similarly corrections are now stored by default in the dics folder, rather than the dicts folder in the results folder. So you may need to move these and rename them using the new filename format in order for them to be used.

The Command line options have changed so that using some of the old ones will throw an error and stop the validation from running. This has been done so that confusing results are not produced. So you will need to change any existing scripts that use spell checking. This also applies to the Embedded version.
