Note for Spellings - S by Placement Factory

spelling*

Stephan Hennig†

25th May 2013

* This document describes the spelling package v0.41.
† sh2d@arcor.de

Abstract

This package supports spell-checking of TeX documents compiled with the LuaTeX engine. It can give visual feedback in PDF output similar to WYSIWYG word processors. The package relies on an external spell-checker application that can check a plain text file and output a list of bad spellings. The package should work with most spell-checkers, even dumb, TeX-unaware ones.

Warning! This package is in a very early state. Everything may change!

Contents

1 Introduction
2 Usage
  2.1 Work-flow
  2.2 Word lists
  2.3 Match rules
  2.4 Highlighting spellling mistakes
  2.5 Text output
  2.6 Text extraction
  2.7 Code point mapping
  2.8 Tables
3 LanguageTool support
  3.1 Installation
  3.2 Usage
4 Bugs

1 Introduction

Ther¹ are three main approaches to spell-checking TeX documents:

1. checking spelling in the .tex source file,
2. converting a .tex file to another format, for which a proved spell-checking solution exists,
3. checking spelling after a .tex file has been processed by TeX.

¹ A footnote containing mispellings.

All of these approaches have their strengths and weaknesses. This package follows the third approach, providing some unique features:

• In traditional solutions, text is extracted from typeset DVI, PS or PDF files, including hyphenated words. To avoid (lots of) false positives being reported by the spell-checker, hyphenation needs to be switched off during the TeX run. That is, one doesn’t work on the original document any more. In contrast to that, the spelling package works transparently on the original .tex source file. Text is extracted during typesetting, after LuaTeX has applied its catcode and macro machinery, but before hyphenation takes place.
• The spelling package can highlight words with known incorrect spelling in PDF output, giving visual feedback similar to WYSIWYG word processors.²

² Currently, only colouring words is implemented.

2 Usage

The spelling package requires the LuaTeX engine. All functionality of the package is implemented in Lua. The LaTeX interface, which is described below, is effectively a wrapper around the Lua interface. Implementing such wrappers for other formats shouldn’t be too difficult. The author is a LaTeX-only user, though, and therefore grateful for contributions. By the way, the LaTeX package needs some polishing, too, e.g., a key-value interface is desirable. Patches welcome!

2.1 Work-flow

Here’s a short outline of how using the spelling package fits into the general process of compiling a document with LuaTeX:

1. After loading the package in the preamble of a .tex source file, a list of bad spellings is read from a file (if that file exists).

2. During the LuaTeX run, text is extracted from pages and all words are checked against the list of bad spellings. Words with a known incorrect spelling are highlighted in PDF output.

3. At the end of the LuaTeX run, in addition to the PDF file, a text file is written, containing most of the text of the typeset document.

4. The text file is then checked by your favourite external spell-checker application, e.g., Aspell or Hunspell. The spell-checker should be able to write a list of bad spellings to a file. Otherwise, visual feedback in PDF output won’t work.

5. Visually minded people may now compile their document a second time. This time, the new list of bad spellings is read in, and words with incorrect spelling found by the spell-checker should now be highlighted in PDF output. Users can then apply the necessary corrections to the .tex source file.

Users not interested in visual feedback (because their spell-checker has an interactive mode only, or because they prefer grabbing bad spellings from a file directly) can also benefit from this package, whatever way spell-checker output is employed. Using it, LuaTeX writes a pure text file that is particularly well suited as spell-checker input, because it contains no hyphenated words (and neither macros nor active characters). That way, any spell-checker application, even a TeX-unaware one, can be used to check the spelling of TeX documents.

2.2 Word lists

As described above, after loading the spelling package, a list of bad spellings is read from a file 〈jobname〉.spell.bad, if that file exists. Words found in this file are stored in an internal list of bad spellings and are later used for highlighting spelling mistakes in PDF output. Additionally, a list of good spellings is read from a file 〈jobname〉.spell.good, if that file exists. Words found in the latter file are stored in an internal list of good spellings. The file format for both files is one word per line. Files must be in the UTF-8 encoding.
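
The word-list format is simple enough to sketch in a few lines of Lua. The following is only an illustration, not the package’s own code: the names parse_word_list, read_word_list and highlight_by_lists are hypothetical, and the lookup assumes the rule given in this section that a word is highlighted only if it is in the list of bad spellings and not in the list of good spellings.

```lua
-- Illustrative sketch only; these helpers are hypothetical and not
-- part of the spelling package's API.

-- Turn word-list text (one word per line) into a set-like table.
local function parse_word_list(text)
  local words = {}
  for line in text:gmatch("[^\r\n]+") do
    words[line] = true  -- letter case is kept as-is
  end
  return words
end

-- Read a UTF-8 word-list file; a missing file yields an empty set.
local function read_word_list(filename)
  local file = io.open(filename, "r")
  if not file then
    return {}
  end
  local text = file:read("*a")
  file:close()
  return parse_word_list(text)
end

-- A word is highlighted if it is a known bad spelling and not a
-- known good spelling.
local function highlight_by_lists(word, bad, good)
  return bad[word] == true and good[word] ~= true
end

local bad = parse_word_list("teh\nrecieve\nLuaTeX\n")
local good = parse_word_list("LuaTeX\n")
print(highlight_by_lists("teh", bad, good))     -- known bad spelling
print(highlight_by_lists("LuaTeX", bad, good))  -- good list takes precedence
```

Storing words as table keys keeps each lookup a single table access, and because keys are compared byte for byte, "Teh" and "teh" are treated as different words.
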
Letter case is significant. A word in the document is highlighted if it occurs in the internal list of bad spellings, but not in the internal list of good spellings. That is, known good spellings take precedence over known bad spellings.

Users can load additional files containing lists of bad or good spellings with the macros \spellingreadbad and \spellingreadgood. The argument to both macros is a file name. If a file cannot be found, a warning is written to the console and log file and compilation continues. As an example, the command

  \spellingreadgood{myproject.whitelist}

reads words from a file myproject.whitelist and adds them to the list of good spellings.

Known good spellings can be used to deal with words wrongly reported as bad spellings by the spell-checker (false positives). But note, most spell-checkers also provide means to deal with unknown words via additional dictionaries. It is recommended to configure your spell-checker to report as few false positives as possible.

2.3 Match rules

This section describes an advanced feature. You may safely skip this section upon first reading.

The spelling package provides an additional way to deal with bad and good spellings: match rules. Match rules can be used to exploit regular patterns within certain ‘words’. Typical examples are bibliographic references like Lin86, which are often flagged by spell-checkers, but need not be highlighted as they are generated by TeX.

There are two kinds of rules, bad and good rules. A rule is a Lua function whose boolean return value indicates whether a word matches the rule. A bad rule should return a true value for all strings identified as bad spellings, otherwise a false value. A good rule should return a true value for all strings identified as good spellings, otherwise a false value. A word in the document is highlighted if it matches any bad rule, but no good rule.

Function arguments are a raw string and a stripped string. The raw string is a string representing a word as it is found in the document, possibly surrounded by punctuation characters. The stripped string is the same string with surrounding punctuation already stripped.

As an example, the rule in Listing 1 matches all words consisting of exactly three letters.
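
A rule of this kind might look like the following sketch. It is an illustration rather than the package’s listing reproduced verbatim: the function name three_letter_word is hypothetical, and the fallback to string.find is an assumption added so that the snippet also runs in a plain Lua interpreter, where it is reliable for ASCII words only.

```lua
-- Use the UTF-8 aware find when it is available (as in LuaTeX);
-- otherwise fall back to Lua's byte-oriented string.find, which is
-- only reliable for ASCII words. The fallback is an assumption made
-- for this sketch.
local find = (unicode and unicode.utf8 and unicode.utf8.find) or string.find

-- Match rule: returns a true value for words of exactly three letters.
-- 'raw' is the word as found in the document, 'stripped' is the same
-- word without surrounding punctuation.
local function three_letter_word(raw, stripped)
  -- The parentheses keep only the first return value of find.
  return (find(stripped, "^%a%a%a$"))
end

print(three_letter_word("(cat)", "cat"))   -- a number, i.e. a true value
print(three_letter_word("lion.", "lion"))  -- nil, i.e. a false value
```
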
The function matches the stripped string against the Lua string pattern ^%a%a%a$ via the function unicode.utf8.find from the Selene Unicode library. The latter is a UTF-8 capable version of Lua’s built-in function string.find. It returns nil (a false value) if there has been no match and a number (a true value) if there has been a match. The pattern %a represents a character class matching a single letter. The characters ^ and $ are anchors for the beginning and the end of the string in question.
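
How bad and good rules combine can be sketched along the same lines. The snippet below is an illustration under stated assumptions, not the package’s implementation: both example rules, the helper matches_any and the function highlight_by_rules are hypothetical names, and the pattern used for labels like Lin86 is only a guess at what such a good rule could check. string.find stands in for unicode.utf8.find when the snippet runs outside LuaTeX.

```lua
-- Prefer the UTF-8 aware find if present; plain string.find is an
-- ASCII-only stand-in assumed for this sketch.
local find = (unicode and unicode.utf8 and unicode.utf8.find) or string.find

-- Hypothetical bad rule: flag any word that contains a digit.
local function contains_digit(raw, stripped)
  return find(stripped, "%d") ~= nil
end

-- Hypothetical good rule: accept labels shaped like Lin86
-- (three letters followed by two digits).
local function looks_like_citation_label(raw, stripped)
  return find(stripped, "^%a%a%a%d%d$") ~= nil
end

-- True if at least one rule in the list matches the word.
local function matches_any(rules, raw, stripped)
  for _, rule in ipairs(rules) do
    if rule(raw, stripped) then
      return true
    end
  end
  return false
end

-- A word is highlighted if it matches any bad rule, but no good rule.
local function highlight_by_rules(raw, stripped, bad_rules, good_rules)
  return matches_any(bad_rules, raw, stripped)
    and not matches_any(good_rules, raw, stripped)
end

local bad_rules = { contains_digit }
local good_rules = { looks_like_citation_label }
print(highlight_by_rules("abc1,", "abc1", bad_rules, good_rules))     -- bad rule only
print(highlight_by_rules("[Lin86]", "Lin86", bad_rules, good_rules))  -- good rule wins
```

Keeping rules in plain Lua tables makes it easy to add or remove a rule without touching the decision function.
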
