Blame: findaffix(1)
Annotated edit history of findaffix(1) version 1, including all changes.
Rev Author # Line
1 perry 1 ISPELL
2 !!!ISPELL
3 NAME
4 SYNOPSIS
5 DESCRIPTION
6 ENVIRONMENT
7 FILES
8 SEE ALSO
9 BUGS
10 AUTHOR
11 VERSION
12 ----
13 !!NAME
14
15
16 ispell, buildhash, munchlist, findaffix, tryaffix, icombine, ijoin - Interactive spelling checking
17 !!SYNOPSIS
18
19
20 __ispell__ [[''common-flags''] [[__-M__|__-N__]
21 [[__-L__''context''] [[__-V__] ''files''
22 __ispell__ [[''common-flags''] __-l__
23 __ispell__ [[''common-flags''] [[__-f__ ''file'']
24 [[__-s__] {__-a__|__-A__}
25 __ispell__ [[__-d__ ''file''] [[__-w__ ''chars'']
26 __-c__
27 __ispell__ [[__-d__ ''file''] [[__-w__ ''chars'']
28 __-e__[[__e__]
29 __ispell__ [[__-d__ ''file''] __-D__
30 __ispell -v__[[__v__]
31
32
33 ''common-flags'':
34
35
36 [[__-t__] [[__-n__] [[__-h__] [[__-b__] [[__-x__]
37 [[__-B__] [[__-C__] [[__-P__] [[__-m__] [[__-S__]
38 [[__-d__ ''file''] [[__-p__ ''file''] [[__-w__
39 ''chars''] [[__-W__ ''n''] [[__-T__
40 ''type'']
41
42
43 __buildhash__ [[__-s__] ''dict-file affix-file
44 hash-file''
45 __buildhash -s__ ''count affix-file''
46
47
48 __munchlist__ [[__-l__ ''aff-file''] [[__-c__
49 ''conv-file''] [[__-T__ ''suffix''] [[__-s__
50 ''hash-file''] [[__-D__] [[__-v__] [[__-w__
51 ''chars''] [[''files'']
52
53
54 __findaffix__ [[__-p__|__-s__] [[__-f__]
55 [[__-c__] [[__-m__ ''min''] [[__-M__ ''max'']
56 [[__-e__ ''elim''] [[__-t__ ''tabchar'']
57 [[__-l__ ''low''] [[''files'']
58
59
60 __tryaffix__ [[__-p__|__-s__] [[__-c__]
61 ''expanded-file affix''[[''+addition'']
62
63
64 __icombine__ [[__-T__ ''type'']
65 [[''aff-file'']
66
67
68 __ijoin__ [[__-s__|__-u__] ''join-options file1
69 file2''
70 !!DESCRIPTION
71
72
73 ''Ispell'' is fashioned after the ''spell'' program
74 from ITS (called ''ispell'' on Twenex systems.) The most
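75 common usage is "ispell filename". In this case,
76 ''ispell'' will display each word which does not appear
77 in the dictionary at the top of the screen and allow you to
78 change it. If there are "near misses" in the dictionary (words which differ by only a single letter, a missing or extra letter, a pair of transposed letters, or a missing space or hyphen), they are also displayed on the following lines. ''Ispell'' may also display other guesses at ways to make the word from a known root, each preceded by question marks.
79 Commands are single characters, as follows (case is ignored):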
80
81
82 R
83
84
85 Replace the misspelled word completely.
86
87
88 Space
89
90
91 Accept the word this time only.
92
93
94 A
95
96
97 Accept the word for the rest of this ''ispell''
98 session.
99
100
101 I
102
103
104 Accept the word, capitalized as it is in the file, and
105 update private dictionary.
106
107
108 U
109
110
111 Accept the word, and add an uncapitalized (actually, all
112 lower-case) version to the private dictionary.
113
114
115 0-''n''
116
117
118 Replace with one of the suggested words.
119
120
121 L
122
123
124 Look up words in system dictionary (controlled by the WORDS
125 compilation option).
126
127
128 X
129
130
131 Write the rest of this file, ignoring misspellings, and
132 start next file.
133
134
135 Q
136
137
138 Exit immediately and leave the file unchanged.
139
140
141 !
142
143
144 Shell escape.
145
146
147 ^L
148
149
150 Redraw screen.
151
152
153 ^Z
154
155
156 Suspend ispell.
157
158
159 ?
160
161
162 Give help screen.
163
164
165 If the __-M__ switch is specified, a one-line mini-menu
166 at the bottom of the screen will summarize these options.
167 Conversely, the __-N__ switch may be used to suppress the
168 mini-menu. (The minimenu is displayed by default if
169 ''ispell'' was compiled with the MINIMENU option, but
170 these two switches will always override the
171 default).
172
173
174 If the __-L__ flag is given, the specified number is used
175 as the number of lines of context to be shown at the bottom
176 of the screen. (The default is to calculate the amount of
177 context as a certain percentage of the screen size.) The
178 amount of context is subject to a system-imposed
179 limit.
180
181
182 If the __-V__ flag is given, characters that are not in
183 the 7-bit ANSI printable character set will always be
184 displayed in the style of "M-^C", even if
185 ''ispell'' thinks that these characters are legal ISO
186 Latin-1 on your system. This is useful when working with
187 older terminals. Without this switch, ''ispell'' will
188 display 8-bit characters "as is" if they have been defined as
189 string characters for the chosen file type.
190
191
192 "Normal" mode, as well as the __-l__,
193 __-a__, and __-A__ options (see below), also accepts
194 the following "common" flags on the command line:
195
196
197
198 __-t__
199
200
201 The input file is in TeX or LaTeX format.
202
203
204 __-n__
205
206
207 The input file is in nroff/troff format.
208
209
210 __-h__
211
212
213 The input file is in html format. (This works well for XML
214 and SGML format, too.)
215
216
217 __-g__
218
219
220 The input file is in Debian control file format. Ispell will
221 ignore everything outside the Description(s).
222
223
224 __-b__
225
226
227 Create a backup file by appending ".bak" to the name of the input file.
228
229
230 __-x__
231
232
233 Don't create a backup file.
234
235
236 __-B__
237
238
239 Report run-together words with missing blanks as spelling
240 errors.
241
242
243 __-C__
244
245
246 Consider run-together words as legal compounds.
247
248
249 __-P__
250
251
252 Don't generate extra root/affix combinations.
253
254
255 __-m__
256
257
258 Make possible root/affix combinations that aren't in the
259 dictionary.
260
261
262 __-S__
263
264
265 Sort the list of guesses by probable
266 correctness.
267
268
269 __-d__ file
270
271
272 Specify an alternate dictionary file. For example, use __-d
273 deutsch__ to choose a German dictionary in a German
274 installation.
275
276
277 __-p__ file
278
279
280 Specify an alternate personal dictionary.
281
282
283 __-w__ chars
284
285
286 Specify additional characters that can be part of a
287 word.
288
289
290 __-W__ n
291
292
293 Specify length of words that are always legal.
294
295
296 __-T__ type
297
298
299 Assume a given formatter type for all files.
300
301
302 The __-n__ and __-t__ options select whether
303 ''ispell'' runs in nroff/troff (__-n__) or TeX/LaTeX
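304 (__-t__) input mode. (This does not work for html
305 (__-h__) mode. However, html mode is assumed for any files
306 with a ".html" or ".htm" extension.) TeX/LaTeX mode is also automatically selected if an input file has the extension ".tex", unless overridden by the
307 __-n__ switch.
308 In TeX/LaTeX mode, whenever a backslash ("\") is found,
309 ''ispell'' will skip to the next whitespace or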
310 TeX/LaTeX delimiter. Certain commands contain arguments
311 which should not be checked, such as labels and reference
312 keys as are found in the cite command, since they contain
313 arbitrary, non-word arguments. Spell checking is also
314 suppressed when in math mode. Thus, for example,
315 given
316
317
318 \chapter {This is a Ckapter} \cite{SCH86}
319
320
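321 ''ispell'' will find "Ckapter" but not "SCH86". The
322 __-t__ option does not recognize the
323 TeX comment character "%", so comments are also spell-checked. It also assumes correct LaTeX syntax. Arguments to infrequently used commands and some optional arguments are sometimes checked unnecessarily.
324 The bibliography will not be checked if ''ispell'' was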
325 compiled with __IGNOREBIB__ defined. Otherwise, the
326 bibliography will be checked but the reference key will
327 not.
328
329
330 References for the tib(1) bibliography system, that
331 is, text between a "[[." or "<." and ".]" or ".>",
332 will always be ignored in TeX/LaTeX mode.
333
334
335 The __-b__ and __-x__ options control whether
336 ''ispell'' leaves a backup (.bak) file for each input
337 file. The .bak file contains the pre-corrected text. If
338 there are file opening / writing errors, the .bak file may
339 be left for recovery purposes even with the __-x__
340 option. The default for this option is controlled by the
341 DEFNOBACKUPFLAG installation option.
342
343
344 The __-B__ and __-C__ options control how
345 ''ispell'' handles run-together words, such as
346 "notthe" for "not the". If __-B__ is
347 specified, such words will be considered as errors, and
348 ''ispell'' will list variations with an inserted blank or
349 hyphen as possible replacements. If __-C__ is specified,
350 run-together words will be considered to be legal compounds,
351 so long as both components are in the dictionary, and each
352 component is at least as long as a language-dependent
353 minimum (3 characters, by default). This is useful for
354 languages such as German and Norwegian, where many compound
355 words are formed by concatenation. (Note that compounds
356 formed from three or more root words will still be
357 considered errors). The default for this option is
358 language-dependent; in a multi-lingual installation the
359 default may vary depending on which dictionary you
360 choose.
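
The __-C__ rule described above can be sketched in a few lines of Python. This is an illustration only, not ispell's implementation: the function name, the toy word set, and the case-sensitive lookup are assumptions, and real dictionaries also involve affix expansion.

```python
def is_legal_compound(word, dictionary, min_len=3):
    """Return True if `word` splits into exactly two dictionary words,
    each at least `min_len` characters long."""
    # Try every split point that leaves both halves long enough.
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        if head in dictionary and tail in dictionary:
            return True
    return False


# Hypothetical toy dictionary, for illustration only.
words = {"haus", "tuer", "schluessel", "not", "the"}
print(is_legal_compound("haustuer", words))            # two roots
print(is_legal_compound("haustuerschluessel", words))  # three roots
```

A word built from three roots fails the check because neither half of any two-way split is itself in the dictionary, which matches the note that such compounds are still reported as errors.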
361
362
363 The __-P__ and __-m__ options control when
364 ''ispell'' automatically generates suggested root/affix
365 combinations for possible addition to your personal
366 dictionary. (These are the entries in the "guess" list,
367 which are preceded by question marks.) If __-P__ is
368 specified, such guesses are displayed only if ''ispell''
369 cannot generate any possibilities that match the current
370 dictionary. If __-m__ is specified, such guesses are
371 always displayed. This can be useful if the dictionary has a
372 limited word list, or a word list with few suffixes.
373 However, you should be careful when using this option, as it
374 can generate guesses that produce illegal words. The default
375 for this option is controlled by the dictionary file
376 used.
377
378
379 The __-S__ option suppresses ''ispell'''s normal
380 behavior of sorting the list of possible replacement words.
381 Some people may prefer this, since it somewhat enhances the
382 probability that the correct word will be
383 low-numbered.
384
385
386 The __-d__ option is used to specify an alternate hashed
387 dictionary file, other than the default. If the filename
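388 does not contain a "/", the library directory for the default dictionary file is prefixed; thus, to use a dictionary in the local directory, "-d ./xxx.hash" must be used. This is useful to allow dictionaries for alternate languages.
389 Unlike previous versions of ''ispell'', a dictionary of ''/dev/null'' is illegal,
390 because the dictionary contains the affix table. If you need
391 an effectively empty dictionary, create a one-entry list
392 with an unlikely string (e.g., "qqqqq").
393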
394
395
396 The __-p__ option is used to specify an alternate
397 personal dictionary file. If the file name does not begin
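398 with "/", $HOME is prefixed. Also, the shell variable WORDLIST may be set, which renames the personal dictionary in the same manner. The command line overrides any WORDLIST setting. If neither the
399 __-p__ switch nor
400 the WORDLIST environment variable is given, ''ispell''
401 will search for a personal dictionary in both the current
402 directory and $HOME, creating one in $HOME if none is found.
403 The preferred name is constructed by appending ".ispell_" to the base name of the hash file. For example, if you use the English dictionary, your personal dictionary would be named ".ispell_english". However, if the file ".ispell_words" exists, it will be used as the personal dictionary regardless of the language hash file chosen. This feature is included primarily for backwards compatibility.
404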
405
406
407 If the __-p__ option is ''not'' specified,
408 ''ispell'' will look for personal dictionaries in both
409 the current directory and the home directory. If
410 dictionaries exist in both places, they will be merged. If
411 any words are added to the personal dictionary, they will be
412 written to the current directory if a dictionary already
413 existed in that place; otherwise they will be written to the
414 dictionary in the home directory.
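
The write-back rule in the paragraph above can be sketched as follows. This is a minimal illustration, assuming the personal dictionary is named `.ispell_` plus the hash file name (as listed under FILES); the function names and the `english` default are hypothetical, and merging of the two dictionaries is not modeled.

```python
from pathlib import Path


def personal_dict_paths(cwd, home, hashfile="english"):
    """Return the (current-directory, home-directory) candidate paths."""
    name = f".ispell_{hashfile}"  # assumed naming, see FILES
    return Path(cwd) / name, Path(home) / name


def write_target(cwd, home, hashfile="english"):
    """Pick the file that newly added words would be written to."""
    local, in_home = personal_dict_paths(cwd, home, hashfile)
    # A dictionary already present in the current directory wins;
    # otherwise additions go to the one in the home directory.
    return local if local.exists() else in_home
```

For example, `write_target(".", Path.home())` returns the home-directory path unless a `.ispell_english` file already exists in the current directory.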
415
416
417 The __-w__ option may be used to specify characters other
418 than alphabetics which may also appear in words. For
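419 instance, __-w__ "&" will allow "AT&T" to be picked up. Underscores are useful in many technical documents. There is an admittedly crude provision in this option for 8-bit international characters. Non-printing characters may be specified in the usual way by inserting a backslash followed by the octal character code; e.g., "\014" for a form feed. Alternatively, if "n" appears in the character string, the (up to) three characters following are a DECIMAL code 0 - 255, for the character. For example, to include bells and form feeds in your words (an admittedly silly thing to do, but aren't most pedagogical examples):
420
421
422
423 n007n012
424
425
426 Numeric digits other than the three following "n" are simply numeric characters. Use of "n" does not conflict with anything because actual alphabetics have no meaning here; alphabetics are already accepted.
427 ''Ispell''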
428 will typically be used with input from a file, meaning that
429 preserving parity for possible 8 bit characters from the
430 input text is OK. If you specify the -l option, and actually
431 type text from the terminal, this may create problems if
432 your stty settings preserve parity.
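
As a rough illustration of how a ''chars'' string such as n007n012 might be decoded, the sketch below treats "n" followed by up to three decimal digits as a character code and everything else as a literal character. This reading of the __-w__ syntax is an assumption, the function name is hypothetical, and the backslash-octal form is deliberately not handled.

```python
def parse_word_chars(spec):
    """Decode a -w style string into a list of single characters.

    Assumed rule: 'n' followed by one to three decimal digits names the
    character with that decimal code (0-255); anything else is literal.
    """
    chars = []
    i = 0
    while i < len(spec):
        if spec[i] == "n":
            j = i + 1
            while j < len(spec) and j < i + 4 and spec[j] in "0123456789":
                j += 1
            if j > i + 1:
                code = int(spec[i + 1:j])
                if code > 255:
                    raise ValueError(f"code out of range: {code}")
                chars.append(chr(code))
            # A bare 'n' adds nothing: alphabetics are accepted anyway.
            i = j
        else:
            chars.append(spec[i])
            i += 1
    return chars


print([hex(ord(c)) for c in parse_word_chars("n007n012")])
```

Under this reading, n007n012 yields decimal codes 7 (bell) and 12 (form feed).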
433
434
435 The __-W__ option may be used to change the length of
436 words that ''ispell'' always accepts as legal. Normally,
437 ''ispell'' will accept all 1-character words as legal,
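438 which is equivalent to specifying __-W 1__. (The default for this switch is actually controlled by the MINWORD installation option, so it may vary at your installation.) If you want all words to be checked, regardless of length, you might want to specify
439 __-W 0__. On the other hand, if your document specifies a lot of three-letter acronyms, you would specify
440 __-W 3__ to accept all words of three letters or less. Regardless of the setting of this option,
441 ''ispell'' will only generate words that are in the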
442 dictionary as suggested replacements for words; this
443 prevents the list from becoming too long. Obviously, this
444 option can be very dangerous, since short misspellings may
445 be missed. If you use this option a lot, you should probably
446 make a last pass without it before you publish your
447 document, to protect yourself against errors.
448
449
450 The __-T__ option is used to specify a default formatter
451 type for use in generating string characters. This switch
452 overrides the default type determined from the file name.
453 The ''type'' argument may be either one of the unique
454 names defined in the language affix file (e.g.,
455 __nroff__) or a file suffix including the dot (e.g.,
456 __.tex__). If no __-T__ option appears and no type can
457 be determined from the file name, the default string
458 character type declared in the language affix file will be
459 used.
460
461
462 The __-l__ or "list" option to ''ispell'' is
463 used to produce a list of misspelled words from the standard
464 input.
465
466
467 The __-a__ option is intended to be used from other
468 programs through a pipe. In this mode, ''ispell'' prints
469 a one-line version identification message, and then begins
470 reading lines of input. For each input line, a single line
471 is written to the standard output for each word checked for
472 spelling on the line. If the word was found in the main
473 dictionary, or your personal dictionary, then the line
474 contains only a '*'. If the word was found through affix
475 removal, then the line contains a '+', a space, and the root
476 word. If the word was found through compound formation
477 (concatenation of two words, controlled by the __-C__
478 option), then the line contains only a '-'.
479
480
481 If the word is not in the dictionary, but there are near
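482 misses, then the line contains an '&', a space, the misspelled word, a space, the number of near misses, the number of characters between the beginning of the line and the beginning of the misspelled word, a colon, another space, and a list of the near misses separated by commas and spaces. Following the near misses (and identified only by the count of near misses), if the word could be formed by adding (illegal) affixes to a known root, is a list of suggested derivations, again separated by commas and spaces. If there are no near misses at all, the line format is the same, except that the '&' is replaced by '?' (and the near-miss count is always zero). The suggested derivations following the near misses are in the form:
483
484
485 [[prefix+] root [[-prefix] [[-suffix] [[+suffix]
486
487
488 (e.g., "re+fry-y+ies" to get "refries") where each optional
489 ''pfx'' and ''sfx'' is a string.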
490 Also, each near miss or guess is capitalized the same as the
491 input word unless such capitalization is illegal; in the
492 latter case each near miss is capitalized correctly
493 according to the dictionary.
494
495
496 Finally, if the word does not appear in the dictionary, and
497 there are no near misses, then the line contains a '#', a
498 space, the misspelled word, a space, and the character
499 offset from the beginning of the line. Each sentence of text
500 input is terminated with an additional blank line,
501 indicating that ''ispell'' has completed processing the
502 input line.
503
504
505 These output lines can be summarized as
506 follows:
507
508
509 OK:
510
511
512 *
513
514
515 Root:
516
517
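518 + <root>
519
520
521 Compound:
522
523
524 -
525
526
527 Miss:
528
529
530 & <original> <count> <offset>: <miss>, <miss>, ..., <guess>, ...
531
532 Guess:
533
534
535 ? <original> 0 <offset>: <guess>, <guess>, ...
536
537
538 None:
539
540
541 # <original> <offset>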
542
543
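544 For example, a dummy dictionary containing the words "fray", "Frey", "fry", and "refried" might produce the following response to the command "echo 'frqy refries' | ispell -a -m -d ./test.hash":
545
546 (#) International Ispell Version 3.0.05 (beta), 08/10/91
547 & frqy 3 0: fray, Frey, fry
548 & refries 1 5: refried, re+fry-y+ies
549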
550 This mode is also suitable for interactive use when you want
551 to figure out the spelling of a single word.
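
A driving program has to classify each line that ''ispell'' writes in __-a__ mode. The sketch below shows one way to do that in Python. The function name and the returned dictionary keys are hypothetical, and the layout assumed for '&' and '?' lines (word, near-miss count, offset, a colon, then a comma-separated list) should be checked against the ispell version in use.

```python
def parse_ispell_line(line):
    """Classify one line of `ispell -a` output into a small dict."""
    line = line.rstrip("\n")
    if not line:
        return {"kind": "end"}  # blank line: one input line is done
    if line.startswith(("@(#)", "(#)")):
        return {"kind": "version", "text": line}
    flag, rest = line[0], line[2:]
    if flag == "*":
        return {"kind": "ok"}
    if flag == "+":
        return {"kind": "root", "root": rest}
    if flag == "-":
        return {"kind": "compound"}
    if flag == "#":
        word, offset = rest.split()
        return {"kind": "none", "word": word, "offset": int(offset)}
    if flag in ("&", "?"):
        head, _, tail = rest.partition(": ")
        word, count, offset = head.split()
        items = tail.split(", ") if tail else []
        n = int(count)
        return {
            "kind": "miss" if flag == "&" else "guess",
            "word": word,
            "offset": int(offset),
            "misses": items[:n],   # the first <count> items are near misses
            "guesses": items[n:],  # the rest are root/affix guesses
        }
    raise ValueError(f"unrecognized ispell output: {line!r}")


print(parse_ispell_line("# xyzzy 12"))
```

Returning plain dictionaries keeps the sketch short; a real client would probably want a small class per line type.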
552
553
554 The __-A__ option works just like __-a__, except that
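555 if a line begins with the string "&Include_File&", the rest of the line is taken as the name of a file to read for further words. Input returns to the original file when the include file is exhausted. Inclusion may be nested up to five deep. The key string may be changed with the environment variable
556 __INCLUDE_STRING__ (the ampersands, if any, must be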
557 included).
558
559
560 When in the __-a__ mode, ''ispell'' will also accept
561 lines of single words prefixed with any of '*', '&', '@', '+', '-', '~', '#', '!', '%', or '^'. A line starting with '*' tells
562 ''ispell'' to insert the word into the
563 user's dictionary (similar to the I command). A line
564 starting with '&' tells ''ispell'' to insert an
565 all-lowercase version of the word into the user's dictionary
566 (similar to the U command). A line starting with '@' causes
567 ''ispell'' to accept this word in the future (similar to
568 the A command). A line starting with '+', followed
569 immediately by __tex__ or __nroff__ will cause
570 ''ispell'' to parse future input according to the syntax of
571 that formatter. A line consisting solely of a '+' will place
572 ''ispell'' in TeX/LaTeX mode (similar to the __-t__
573 option) and '-' returns ''ispell'' to nroff/troff mode
574 (but these commands are obsolete). However, string character
575 type is ''not'' changed; the '~' command must be used to
576 do this. A line starting with '~' causes ''ispell'' to
577 set internal parameters (in particular, the default string
578 character type) based on the filename given in the rest of
579 the line. (A file suffix is sufficient, but the period must
580 be included. Instead of a file name or suffix, a unique
581 name, as listed in the language affix file, may be
582 specified.) However, the formatter parsing is ''not''
583 changed; the '+' command must be used to change the
584 formatter. A line prefixed with '#' will cause the personal
585 dictionary to be saved. A line prefixed with '!' will turn
586 on ''terse'' mode (see below), and a line prefixed with
587 '%' will return ''ispell'' to normal (non-terse) mode.
588 Any input following the prefix characters '+', '-', '#',
589 '!', or '%' is ignored, as is any input following the
590 filename on a '~' line. To allow spell-checking of lines
591 beginning with these characters, a line starting with '^'
592 has that character removed before it is passed to the
593 spell-checking code. It is recommended that programmatic
594 interfaces prefix every data line with an uparrow to protect
595 themselves against future changes in
596 ''ispell''.
597
598
599 To summarize these:
600
601
602 *
603
604
605 Add to personal dictionary
606
607
608 @
609
610
611 Accept word, but leave out of dictionary
612
613
614 #
615
616
617 Save current personal dictionary
618
619
620 ~
621
622
623 Set parameters based on filename
624
625
626 +
627
628
629 Enter TeX mode
630
631
632 -
633
634
635 Exit TeX mode
636
637
638 !
639
640
641 Enter terse mode
642
643
644 %
645
646
647 Exit terse mode
648
649
650 ^
651
652
653 Spell-check rest of line
654
655
656 In ''terse'' mode, ''ispell'' will not print lines
657 beginning with '*', '+', or '-', all of which indicate
658 correct words. This significantly improves running speed
659 when the driving program is going to ignore correct words
660 anyway.
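
The advice to prefix every data line with an uparrow, together with terse mode, suggests a small helper for programs that feed text to ''ispell'' in __-a__ mode. The following Python sketch only builds the input text; the function name is hypothetical and no ''ispell'' process is started.

```python
def build_pipe_input(lines, terse=True):
    """Build the text to send to `ispell -a` for the given data lines."""
    out = []
    if terse:
        out.append("!")  # enter terse mode: correct words are not reported
    for line in lines:
        # '^' protects lines that begin with a command character.
        out.append("^" + line)
    return "\n".join(out) + "\n"


print(build_pipe_input(["*not a command", "plain text"]), end="")
```

Because the uparrow is removed before the line is checked, a line such as "*not a command" is spell-checked rather than being treated as a request to add a word to the personal dictionary.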
661
662
663 The __-s__ option is only valid in conjunction with the
664 __-a__ or __-A__ options, and only on BSD-derived
665 systems. If specified, ''ispell'' will stop itself with a
666 __SIGTSTP__ signal after each line of input. It will not
667 read more input until it receives a __SIGCONT__ signal.
668 This may be useful for handshaking with certain text
669 editors.
670
671
672 The __-f__ option is only valid in conjunction with the
673 __-a__ or __-A__ options. If __-f__ is specified,
674 ''ispell'' will write its results to the given file,
675 rather than to standard output.
676
677
678 The __-v__ option causes ''ispell'' to print its
679 current version identification on the standard output and
680 exit. If the switch is doubled, ''ispell'' will also
681 print the options that it was compiled with.
682
683
684 The __-c__, __-e__[[__1-4__], and __-D__ options
685 of ''ispell'', are primarily intended for use by the
686 ''munchlist'' shell script. The __-c__ switch causes a
687 list of words to be read from the standard input. For each
688 word, a list of possible root words and affixes will be
689 written to the standard output. Some of the root words will
690 be illegal and must be filtered from the output by other
691 means; the ''munchlist'' script does this. As an example,
692 the command:
693
694
695 echo BOTHER | ispell -c
696
697
698 produces:
699
700
701 BOTHER BOTHE/R BOTH/R
702
703
704 The __-e__ switch is the reverse of __-c__; it expands
705 affix flags to produce a list of words. For example, the
706 command:
707
708
709 echo BOTH/R | ispell -e
710
711
712 produces:
713
714
715 BOTH BOTHER
716
717
718 An optional expansion level can also be specified. A level
719 of 1 (__-e1__) is the same as __-e__ alone. A level of
720 2 causes the original root/affix combination to be prepended
721 to the line:
722
723
724 BOTH/R BOTH BOTHER
725
726
727 A level of 3 causes multiple lines to be output, one for
728 each generated word, with the original root/affix
729 combination followed by the word it creates:
730
731
732 BOTH/R BOTH
733 BOTH/R BOTHER
734
735
736 A level of 4 causes a floating-point number to be appended
737 to each of the level-3 lines, giving the ratio between the
738 length of the root and the total length of all generated
739 words including the root:
740
741
742 BOTH/R BOTH 2.500000
743 BOTH/R BOTHER 2.500000
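
The four expansion levels differ only in how the same words are laid out. The Python sketch below reproduces the layouts shown above from an already-expanded word list. It does not perform affix expansion, the function name is hypothetical, and the ratio is computed as total generated length divided by root length, which is the reading consistent with the 2.500000 shown for BOTH/R.

```python
def format_expansion(entry, words, level=1):
    """Lay out an expanded entry the way -e1 through -e4 are described."""
    if level == 1:
        return [" ".join(words)]
    if level == 2:
        return [" ".join([entry] + words)]
    lines = [f"{entry} {w}" for w in words]
    if level == 3:
        return lines
    if level == 4:
        root = entry.split("/")[0]
        # Total length of all generated words, relative to the root length.
        ratio = sum(len(w) for w in words) / len(root)
        return [f"{line} {ratio:f}" for line in lines]
    raise ValueError("level must be between 1 and 4")


for line in format_expansion("BOTH/R", ["BOTH", "BOTHER"], level=4):
    print(line)
```

For BOTH/R the generated words BOTH and BOTHER total 10 characters against a 4-character root, giving 2.5.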
744
745
746 Finally, the __-D__ flag causes the affix tables from the
747 dictionary file to be dumped to standard
748 output.
749
750
751 Unless your system administrator has suppressed the feature
752 to save space, ''ispell'' is aware of the correct
753 capitalizations of words in the dictionary and in your
754 personal dictionary. As well as recognizing words that must
755 be capitalized (e.g., George) and words that must be
756 all-capitals (e.g., NASA), it can also handle words with
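757 "unusual" capitalization (e.g., "ITCorp" or "TeX"). If a word is capitalized incorrectly, the list of possibilities will include all acceptable capitalizations. (More than one capitalization may be acceptable; for example, my dictionary lists both "ITCorp" and "ITcorp".)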
758
759
760 Normally, this feature will not cause you surprises, but
761 there is one circumstance you need to be aware of. If you
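762 use "I" to add a word to your dictionary that is at the beginning of a sentence (e.g., the first word of this paragraph if "normally" were not in the dictionary), it will be marked as "capitalization required". A subsequent usage of this word without capitalization (e.g., the quoted word in the previous sentence) will be considered a misspelling by
763 ''ispell'',
764 and it will suggest the capitalized version. You must then
765 compare the actual spellings by eye, and then type
766 "I" to add the uncapitalized variant to your personal dictionary. You can avoid this problem by using "U" to add the original word, rather than "I".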
767
768
769 The rules for capitalization are as follows:
770
771
772 (1)
773
774
775 Any word may appear in all capitals, as in
776 headings.
777
778
779 (2)
780
781
782 Any word that is in the dictionary in all-lowercase form may
783 appear either in lowercase or capitalized (as at the
784 beginning of a sentence).
785
786
787 (3)
788
789
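790 Any word that has "funny" capitalization (i.e., it contains both cases and there is an uppercase character besides the first) must appear exactly as in the dictionary, except as permitted by rule (1). If the word is acceptable in all-lowercase, it must appear thus in a dictionary entry.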
791
792
793 __buildhash__
794
795
796 The ''buildhash'' program builds hashed dictionary files
797 for later use by ''ispell''. The raw word list (with
798 affix flags) is given in ''dict-file'', and the affix
799 flags are defined by ''affix-file''. The hashed output is
800 written to ''hash-file''. The formats of the two input
801 files are described in ispell(5). The __-s__
802 (silent) option suppresses the usual status messages that
803 are written to the standard error device.
804
805
806 __munchlist__
807
808
809 The ''munchlist'' shell script is used to reduce the size
810 of dictionary files, primarily personal dictionary files. It
811 is also capable of combining dictionaries from various
812 sources. The given ''files'' are read (standard input if
813 no arguments are given), reduced to a minimal set of roots
814 and affixes that will match the same list of words, and
815 written to standard output.
816
817
818 Input for munchlist consists of raw words (e.g., from your
819 personal dictionary files) or root and affix combinations
820 (probably generated in earlier munchlist runs). Each word or
821 root/affix combination must be on a separate
822 line.
823
824
825 The __-D__ (debug) option leaves temporary files around
826 under standard names instead of deleting them, so that the
827 script can be debugged. Warning: this option can eat up an
828 enormous amount of temporary file space.
829
830
831 The __-v__ (verbose) option causes progress messages to
832 be reported to stderr so you won't get nervous that
833 ''munchlist'' has hung.
834
835
836 If the __-s__ (strip) option is specified, words that are
837 in the specified ''hash-file'' are removed from the word
838 list. This can be useful with personal
839 dictionaries.
840
841
842 The __-l__ option can be used to specify an alternate
843 ''affix-file'' for munching dictionaries in languages
844 other than English.
845
846
847 The __-c__ option can be used to convert dictionaries
848 that were built with an older affix file, without risk of
849 accidentally introducing unintended affix combinations into
850 the dictionary.
851
852
853 The __-T__ option allows dictionaries to be converted to
854 a canonical string-character format. The suffix specified is
855 looked up in the affix file (__-l__ switch) to determine
856 the string-character format used for the input file; the
857 output always uses the canonical string-character format.
858 For example, a dictionary collected from TeX source files
859 might be converted to canonical format by specifying __-T
860 tex__.
861
862
863 The __-w__ option is passed on to
864 ''ispell''.
865
866
867 __findaffix__
868
869
870 The ''findaffix'' shell script is an aid to writers of
871 new language descriptions in choosing affixes. The given
872 dictionary ''files'' (standard input if none are given)
873 are examined for possible prefixes (__-p__ switch) or
874 suffixes (__-s__ switch, the default). Each
875 commonly-occurring affix is presented along with a count of
876 the number of times it appears and an estimate of the number
877 of bytes that would be saved in a dictionary hash file if it
878 were added to the language table. Only affixes that generate
879 legal roots (found in the original input) are
880 listed.
881
882
883 If the __-c__ option is not given, the output lines are in the following format:
884
885
886 strip/add/count/bytes
887
888
889 where ''strip'' is the string that should be stripped
890 from a root word before adding the affix, ''add'' is the
891 affix to be added, ''count'' is a count of the number of
892 times that this ''strip''/''add'' combination appears,
893 and ''bytes'' is an estimate of the number of bytes that
894 might be saved in the raw dictionary file if this
895 combination is added to the affix file. The field separator
896 in the output will be the tab character specified by the
897 __-t__ switch; the default is a slash
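898 ("/").
899
900
901 If the __-c__ ("clean output") option is given, the appearance of the output is made visually cleaner (but harder to post-process) by changing it to:
902
903
904
905 -strip+add<tab>count<tab>bytes
906
907
908 where ''strip'', ''add'', ''count'', and
909 ''bytes'' are as before, and ''<tab>''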
910 represents the ASCII tab character.
911
912
913 The method used to generate possible affixes will also
914 generate longer affixes which have common headers or
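915 trailers. For example, the two words "moth" and "mother" will generate not only the obvious substitution "+er" but also "-h+her" and "-th+ther" (and possibly even longer ones, depending on the value of
916 ''min''). To prevent cluttering
917 the output with such affixes, any affix pair that shares a
918 common header (or, for prefixes, trailer) string longer than
919 ''elim'' characters (default 1) will be suppressed. You
920 may want to set "elim" to a value greater than 1 if your language has string characters; usually the need for this parameter will become obvious when you examine the output of your
921 ''findaffix'' run.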
922
923
924 Normally, the affixes are sorted according to the estimate
925 of bytes saved. The __-f__ switch may be used to cause
926 the affixes to be sorted by frequency of
927 appearance.
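
Output in the strip/add/count/bytes format is easy to post-process. The Python sketch below parses such lines and ranks them either by estimated bytes saved or, as with __-f__, by frequency. The sample lines are invented for illustration, the names are hypothetical, and sorting largest-first is an assumption about the ordering.

```python
from typing import NamedTuple


class AffixCandidate(NamedTuple):
    strip: str
    add: str
    count: int
    bytes_saved: int


def parse_findaffix(lines, sep="/"):
    """Parse strip/add/count/bytes lines; `sep` mirrors the -t switch."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        strip, add, count, nbytes = line.split(sep)
        result.append(AffixCandidate(strip, add, int(count), int(nbytes)))
    return result


def rank(candidates, by_frequency=False):
    """Sort by bytes saved (default) or by count, largest first."""
    if by_frequency:
        return sorted(candidates, key=lambda c: c.count, reverse=True)
    return sorted(candidates, key=lambda c: c.bytes_saved, reverse=True)


# Invented sample data, not real findaffix output.
sample = ["/er/120/300", "e/ing/200/250"]
for cand in rank(parse_findaffix(sample)):
    print(cand)
```

An empty ''strip'' field, as in the first sample line, simply parses as an empty string.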
928
929
930 To save output file space, affixes which occur fewer than 10
931 times are eliminated; this limit may be changed with the
932 __-l__ switch. The __-M__ switch specifies a maximum
933 affix length (default 8). Affixes longer than this will not
934 be reported. (This saves on temporary disk space and makes
935 the script run faster.)
936
937
938 Affixes which generate stems shorter than 3 characters are
939 suppressed. (A stem is the word after the ''strip''
940 string has been removed, and before the ''add'' string
941 has been added.) This reduces both the running time and the
942 size of the output file. This limit may be changed with the
943 __-m__ switch. The minimum stem length should only be set
944 to 1 if you have a ''lot'' of free time and disk space
945 (in the range of many days and hundreds of
946 megabytes).
947
948
949 The ''findaffix'' script requires a non-blank
950 field-separator character for internal use. Normally, this
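951 character is a slash ("/"), but if the slash appears as a character in the input word list, a different character can be specified with the
952 __-t__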
953 switch.
954
955
956 Ispell dictionaries should be expanded before being fed to
957 ''findaffix''; in addition, characters that are not in
958 the English alphabet (if any) should be translated to
959 lowercase.
960
961
962 __tryaffix__
963
964
965 The ''tryaffix'' shell script is used to estimate the
966 effectiveness of a proposed prefix (__-p__ switch) or
967 suffix (__-s__ switch, the default) with a given
968 ''expanded-file''. Only one affix can be tried with each
969 execution of ''tryaffix'', although multiple arguments
970 can be used to describe varying forms of the same affix flag
971 (e.g., the __D__ flag for English can add either ''D''
972 or ''ED'' depending on whether a trailing E is already
973 present). Each word in the expanded dictionary that ends (or
974 begins) with the chosen suffix (or prefix) has that suffix
975 (prefix) removed; the dictionary is then searched for root
976 words that match the stripped word. Normally, all matching
977 roots are written to standard output, but if the __-c__
978 (count) flag is given, only a statistical summary of the
979 results is written. The statistics given are a count of
980 words the affix potentially applies to and an estimate of
981 the number of dictionary bytes that a flag using the affix
982 would save. The estimate will be high if the flag generates
983 words that are currently generated by other affix flags
984 (e.g., in English, ''bathers'' can be generated by either
985 ''bath/X'' or ''bather/S'').
986
987
988 The dictionary file, ''expanded-file'', must already be
989 expanded (using the __-e__ switch of ''ispell'') and
990 sorted, and things will usually work best if uppercase has
991 been folded to lower with 'tr'.
992
993
994 The ''affix'' arguments are things to be stripped from
995 the dictionary file to produce trial roots: for English,
996 ''con'' (prefix) and ''ing'' (suffix) are examples.
997 The ''addition'' parts of the argument are letters that
998 would have been stripped off the root before adding the
999 affix. For example, in English the affix ''ing'' normally
1000 strips ''e'' for words ending in that letter (e.g.,
1001 ''like'' becomes ''liking'') so we might
1002 run:
1003
1004
1005 tryaffix ing ing+e
1006
1007
1008 to cover both cases.
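
The stripping step that ''tryaffix'' performs for suffixes can be approximated as follows. This is a simplified Python sketch under stated assumptions: the function name and word list are hypothetical, only suffixes are handled, and the real script's counting and byte estimates are omitted.

```python
def try_suffix(expanded_words, specs):
    """Return roots found by stripping each suffix spec from the words.

    Each spec is `affix` or `affix+addition`; the addition is appended
    to the stripped word before it is looked up.
    """
    words = set(expanded_words)
    roots = set()
    for spec in specs:
        affix, _, addition = spec.partition("+")
        for word in words:
            if word.endswith(affix) and len(word) > len(affix):
                candidate = word[: -len(affix)] + addition
                if candidate in words:
                    roots.add(candidate)
    return sorted(roots)


# Hypothetical expanded dictionary, already folded to lowercase.
expanded = ["like", "liking", "sing", "walk", "walking"]
print(try_suffix(expanded, ["ing", "ing+e"]))
```

With this word list, "walking" yields the root "walk" through the plain "ing" form and "liking" yields "like" through "ing+e", while "sing" produces no root.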
1009
1010
1011 All of the shell scripts contain documentation as commentary
1012 at the beginning; sometimes these comments contain useful
1013 information beyond the scope of this manual
1014 page.
1015
1016
1017 It is possible to install ''ispell'' in such a way as to
1018 only support ASCII range text if desired.
1019
1020
1021 __icombine__
1022
1023
1024 The ''icombine'' program is a helper for
1025 ''munchlist''. It reads a list of words in dictionary
1026 format (roots plus flags) from the standard input, and
1027 produces a reduced list on standard output which combines
1028 common roots found on adjacent entries. Identical roots
1029 which have differing flags will have their flags combined,
1030 and roots which have differing capitalizations will be
1031 combined in a way which only preserves important
1032 capitalization information. The optional ''aff-file''
1033 specifies a language file which defines the character sets
1034 used and the meanings of the various flags. The __-T__
1035 switch can be used to select among alternative string
1036 character types by giving a dummy suffix that can be found
1037 in an __altstringtype__ statement.
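
The flag-merging part of what ''icombine'' does can be illustrated with a short Python sketch. The function name is hypothetical, emitting the merged flags in sorted order is an assumption, and the handling of differing capitalizations described above is not modeled.

```python
from itertools import groupby


def combine_adjacent(entries):
    """Merge the flags of identical roots found on adjacent entries."""
    def split(entry):
        root, _, flags = entry.partition("/")
        return root, flags

    combined = []
    for root, group in groupby(map(split, entries), key=lambda pair: pair[0]):
        # Union of all flag letters seen for this run of identical roots.
        flags = sorted(set("".join(f for _, f in group)))
        combined.append(root + ("/" + "".join(flags) if flags else ""))
    return combined


print(combine_adjacent(["BOTH/R", "BOTH/S", "CAT", "DOG/S", "DOG/S"]))
```

Only adjacent entries are merged, so the input is expected to be sorted first.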
1038
1039
1040 __ijoin__
1041
1042
1043 The ''ijoin'' program is a re-implementation of
1044 join(1) which handles long lines and 8-bit characters
1045 correctly. The __-s__ switch specifies that the
1046 sort(1) program used to prepare the input to
1047 ''ijoin'' uses signed comparisons on 8-bit characters;
1048 the __-u__ switch specifies that sort(1) uses
1049 unsigned comparisons. All other options and behaviors of
1050 join(1) are duplicated as exactly as possible based
1051 on the manual page, except that ''ijoin'' will not handle
1052 newline as a field separator. See the join(1) manual
1053 page for more information.
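
The difference between signed and unsigned comparisons only shows up for bytes above 127. The Python sketch below makes the two orderings visible; the key functions and sample byte strings are illustrative and not part of ''ijoin''.

```python
def unsigned_key(data):
    """Compare bytes as values 0..255."""
    return list(data)


def signed_key(data):
    """Compare bytes as values -128..127, as a signed char would."""
    return [b - 256 if b > 127 else b for b in data]


names = [b"zebra", b"\xe9clair"]  # 0xE9 is e-acute in ISO Latin-1
print(sorted(names, key=unsigned_key))
print(sorted(names, key=signed_key))
```

With unsigned comparison the 8-bit word sorts after "zebra"; with signed comparison it sorts first. Input prepared with one convention and joined with the other would therefore appear out of order.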
1054 !!ENVIRONMENT
1055
1056
1057 DICTIONARY
1058
1059
1060 Default dictionary to use, if no __-d__ flag is
1061 given.
1062
1063
1064 WORDLIST
1065
1066
1067 Personal dictionary file name
1068
1069
1070 INCLUDE_STRING
1071
1072
1073 Code for file inclusion under the __-A__
1074 option
1075
1076
1077 TMPDIR
1078
1079
1080 Directory used for some of munchlist's temporary
1081 files
1082 !!FILES
1083
1084
1085 /usr/lib/ispell/default.hash
1086
1087
1088 Hashed dictionary (may be found in some other local
1089 directory, depending on the system).
1090
1091
1092 /usr/lib/ispell/default.aff
1093
1094
1095 Affix-definition file for ''munchlist''
1096
1097
1098 /usr/dict/web2 or /usr/share/dict/words
1099
1100
1101 For the Lookup function (depending on the WORDS compilation
1102 option).
1103
1104
1105 $HOME/.ispell_''hashfile''
1106
1107
1108 User's private dictionary
1109
1110
1111 .ispell_''hashfile''
1112
1113
1114 Directory-specific private dictionary
1115 !!SEE ALSO
1116
1117
1118 spell(1), egrep(1), look(1),
1119 join(1), sort(1), ''sq''(1L),
1120 ''tib''(1L), ''ispell''(5L),
1121 ''english''(5L)
1122 !!BUGS
1123
1124
1125 It takes several to many seconds for ''ispell'' to read
1126 in the hash table, depending on size.
1127
1128
1129 When all options are enabled, ''ispell'' may take several
1130 seconds to generate all the guesses at corrections for a
1131 misspelled word; on slower machines this time is long enough
1132 to be annoying.
1133
1134
1135 The hash table is stored as a quarter-megabyte (or larger)
1136 array, so a PDP-11 or 286 version does not seem
1137 likely.
1138
1139
1140 ''Ispell'' should understand more ''troff'' syntax,
1141 and deal more intelligently with contractions.
1142
1143
1144 Although small personal dictionaries are sorted before they
1145 are written out, the order of capitalizations of the same
1146 word is somewhat random.
1147
1148
1149 When the __-x__ flag is specified, ''ispell'' will
1150 unlink any existing .bak file.
1151
1152
1153 There are too many flags, and many of them have non-mnemonic
1154 names.
1155
1156
1157 ''Munchlist'' does not deal very gracefully with
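1158 dictionaries which contain "non-word" characters. Such characters ought to be deleted from the dictionary with a warning message.
1159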
1160
1161
1162 ''Findaffix'' and ''munchlist'' require tremendous
1163 amounts of temporary file space for large dictionaries. They
1164 do respect the TMPDIR environment variable, so this space
1165 can be redirected. However, a lot of the temporary space
1166 needed is for sorting, so TMPDIR is only a partial help on
1167 systems with an uncooperative sort(1).
1168 ("Cooperative" is defined as accepting the undocumented -T switch). As a rule of thumb,
1169 ''munchlist''
1170 takes 10 to 40 times the original dictionary's size in Kb.
1171 (The larger ratio is for dictionaries that already have
1172 heavy affix use, such as the one distributed with
1173 ''ispell''). ''Munchlist'' is also very slow; munching
1174 a normal-sized dictionary (15K roots, 45K expanded words)
1175 takes around an hour on a small workstation. (Most of this
1176 time is spent in sort(1), and ''munchlist'' can
1177 run much faster on machines that have a more modern
1178 ''sort'' that makes better use of the memory available to
1179 it.) ''Findaffix'' is even worse; the smallest English
1180 dictionary cannot be processed with this script in a mere
1181 50Kb of free space, and even after specifying switches to
1182 reduce the temporary space required, the script will run for
1183 over 24 hours on a small workstation.
1184 !!AUTHOR
1185
1186
1187 Pace Willisson (pace@mit-vax), 1983, based on the PDP-10
1188 assembly version. That version was written by R. E. Gorin in
1189 1971, and later revised by W. E. Matson (1974) and W. B.
1190 Ackerman (1978).
1191
1192
1193 Collected, revised, and enhanced for the Usenet by Walt
1194 Buehring, 1987.
1195
1196
1197 Table-driven multi-lingual version by Geoff Kuenning,
1198 1987-88.
1199
1200
1201 Large dictionaries provided by Bob Devine
1202 (vianet!devine).
1203
1204
1205 A complete list of contributors is too large to list here,
1206 but is distributed with the ispell sources in the file "Contributors".
1207 !!VERSION
1208
1209
1210 The version of ispell described by this manual page is
1211 International Ispell Version 3.1.00, 10/08/93.
1212 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.