Annotated edit history of file(1) version 1, including all changes.
Rev Author # Line
1 perry 1 FILE
2 !!!FILE
3 NAME
4 SYNOPSIS
5 DESCRIPTION
6 OPTIONS
7 FILES
8 ENVIRONMENT
9 SEE ALSO
10 STANDARDS CONFORMANCE
11 MAGIC DIRECTORY
12 EXAMPLES
13 HISTORY
14 LEGAL NOTICE
15 BUGS
16 AVAILABILITY
17 ----
18 !!NAME
19
20
21 file - determine file type
22 !!SYNOPSIS
23
24
25 __file__ [[ __-bciknsvzL__ ] [[ __-f__
26 ''namefile'' ] [[ __-m__ ''magicfiles'' ]
27 ''file'' ...
28 __file -C__ [[ __-m__ ''magicfile'' ]
29 !!DESCRIPTION
30
31
32 This manual page documents version 3.37-3.1 of the
33 __file__ command.
34
35
36 __File__ tests each argument in an attempt to classify
37 it. There are three sets of tests, performed in this order:
38 filesystem tests, magic number tests, and language tests.
39 The ''first'' test that succeeds causes the file type to
40 be printed.
41
42
43 The type printed will usually contain one of the words
44 __text__ (the file contains only printing characters and
45 a few common control characters and is probably safe to read
46 on an ASCII terminal), __executable__ (the
47 file contains the result of compiling a program in a form
48 understandable to some UNIX kernel or
49 another), or __data__ meaning anything else (data is
50 usually `binary' or non-printable). Exceptions are
51 well-known file formats (core files, tar archives) that are
52 known to contain binary data. When adding local definitions
53 to ''/etc/magic'', __preserve these keywords__. People
54 depend on knowing that all the readable files in a directory
55 have the word ``text'' printed. Don't do as Berkeley did and
56 change ``shell commands text'' to ``shell script''. Note
57 that the file ''/usr/share/misc/magic'' is built
58 mechanically from a large number of small files in the
59 subdirectory ''Magdir'' in the source distribution of
60 this program.
61
62
63 The filesystem tests are based on examining the return from
64 a stat(2) system call. The program checks to see if
65 the file is empty, or if it's some sort of special file. Any
66 known file types appropriate to the system you are running
67 on (sockets, symbolic links, or named pipes (FIFOs) on those
68 systems that implement them) are intuited if they are
69 defined in the system header file
70 ''''.
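
The filesystem tests described above can be sketched in a few lines. This is only an illustrative sketch, not the program's actual code: the function name `filesystem_test` and most of the returned strings are assumptions (only "empty" and "block special" appear in this page's examples), and it uses Python's `os.lstat` in place of a direct stat(2) call.

```python
import os
import stat


def filesystem_test(path):
    """Return a description for empty or special files, else None.

    Hypothetical helper: a None result means the caller should go on
    to the magic number tests.
    """
    st = os.lstat(path)  # lstat does not follow symlinks (compare -L)
    mode = st.st_mode
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None
```

A regular, non-empty file yields `None` here, which corresponds to the filesystem tests not succeeding and the next set of tests being tried.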
71
72
73 The magic number tests are used to check for files with data
74 in particular fixed formats. The canonical example of this
75 is a binary executable (compiled program) ''a.out'' file,
76 whose format is defined in ''a.out.h'' and possibly
77 ''exec.h'' in the standard include directory. These files
78 have a `magic number' stored in a particular place near the
79 beginning of the file that tells the UNIX
80 operating system that the file is a binary executable, and
81 which of several types thereof. The concept of `magic
82 number' has been applied by extension to data files. Any
83 file with some invariant identifier at a small fixed offset
84 into the file can usually be described in this way. The
85 information identifying these files is read from
86 ''/etc/magic'' and the compiled magic file
87 ''/usr/share/misc/magic.mgc'', or
88 ''/usr/share/misc/magic'' if the compiled file does not
89 exist.
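
As a rough illustration of the idea of an invariant identifier at a small fixed offset, the sketch below checks a buffer against a tiny hand-written table. The table `MAGIC_TABLE`, the function `magic_test`, and the description strings are hypothetical; the real program reads its entries from the magic files named above, and their format (see magic(5)) is much richer than this.

```python
# Hypothetical miniature magic table: (offset, magic bytes, description).
# The byte values are the well-known signatures of these formats.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]


def magic_test(data):
    """Return the description of the first matching entry, else None."""
    for offset, magic, description in MAGIC_TABLE:
        # Compare the bytes at the fixed offset with the expected value.
        if data[offset:offset + len(magic)] == magic:
            return description
    return None
```

Entries are tried in table order and the first match wins, which is one reason the order of entries in a magic file matters.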
90
91
92 If a file does not match any of the entries in the magic
93 file, it is examined to see if it seems to be a text file.
94 ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character
95 sets (such as those used on Macintosh and IBM PC systems),
96 UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
97 character sets can be distinguished by the different ranges
98 and sequences of bytes that constitute printable text in
99 each set. If a file passes any of these tests, its character
100 set is reported. ASCII, ISO-8859-x, UTF-8, and
101 extended-ASCII files are identified as ``text'' because they
102 will be mostly readable on nearly any terminal; UTF-16 and
103 EBCDIC are only ``character data'' because, while they
104 contain text, it is text that will require translation
105 before it can be read. In addition, __file__ will attempt
106 to determine other characteristics of text-type files. If
107 the lines of a file are terminated by CR, CRLF, or NEL,
108 instead of the Unix-standard LF, this will be reported.
109 Files that contain embedded escape sequences or overstriking
110 will also be identified.
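
The byte-range reasoning above can be sketched as follows. This is a simplified approximation under stated assumptions, not the program's algorithm: the set of tolerated control characters, the function names, and the returned labels are all illustrative, and UTF-16, EBCDIC, and NEL are left out for brevity.

```python
# Control bytes assumed to be acceptable in text: BEL, BS, TAB, LF, FF, CR, ESC.
ALLOWED_CONTROLS = {0x07, 0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def guess_charset(data):
    """Return a rough character-set label for bytes, or None if not text-like."""
    if 0x7F in data or any(b < 0x20 and b not in ALLOWED_CONTROLS for b in data):
        return None
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        pass
    # ISO-8859-x reserves 0x80-0x9F for control codes, so printable
    # characters in that range suggest some other 8-bit character set.
    if any(0x80 <= b <= 0x9F for b in data):
        return "non-ISO extended-ASCII"
    return "ISO-8859"


def line_terminators(data):
    """List the line terminators present (CRLF, CR, LF)."""
    found = []
    if b"\r\n" in data:
        found.append("CRLF")
    rest = data.replace(b"\r\n", b"")
    if b"\r" in rest:
        found.append("CR")
    if b"\n" in rest:
        found.append("LF")
    return found
```

A caller would report the character set first and then mention any terminators other than plain LF.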
111
112
113 Once __file__ has determined the character set used in a
114 text-type file, it will attempt to determine in what
115 language the file is written. The language tests look for
116 particular strings (cf. ''names.h'') that can appear
117 anywhere in the first few blocks of a file. For example, the
118 keyword __.br__ indicates that the file is most likely a
119 troff(1) input file, just as the keyword
120 __struct__ indicates a C program. These tests are less
121 reliable than the previous two groups, so they are performed
122 last. The language test routines also test for some
123 miscellany (such as tar(1) archives).
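
A minimal sketch of such a keyword scan is shown below. The table `KEYWORDS`, the function `language_test`, the 4096-byte limit, and the troff label are assumptions made for illustration; the real keyword list lives in ''names.h''.

```python
# Hypothetical keyword table: (keyword, description).
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]


def language_test(data, limit=4096):
    """Look for whole-word keywords in the first `limit` bytes of data."""
    # Splitting on whitespace avoids matching "struct" inside "instruction".
    tokens = data[:limit].split()
    for keyword, description in KEYWORDS:
        if keyword in tokens:
            return description
    return None
```

Because a single keyword decides the result, a scan like this is easily misled, which fits the observation that these tests are the least reliable of the three groups.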
124
125
126 Any file that cannot be identified as having been written in
127 any of the character sets listed above is simply said to be
128 ``data''.
129 !!OPTIONS
130
131
132 ''-b, --brief''
133
134
135 Do not prepend filenames to output lines.
136
137
138 ''-c, --checking-printout''
139
140
141 Cause a checking printout of the parsed form of the magic
142 file. This is usually used in conjunction with __-m__ to
143 debug a new magic file before installing it.
144
145
146 ''-C, --compile''
147
148
149 Write a magic.mgc output file that contains a pre-parsed
150 version of the magic file.
151
152
153 ''-f, --files-from namefile''
154
155
156 Read the names of the files to be examined from
157 ''namefile'' (one per line) before the argument list.
158 Either ''namefile'' or at least one filename argument
159 must be present; to test the standard input, use ``-'' as a
160 filename argument.
161
162
163 ''-i, --mime''
164
165
166 Causes the file command to output mime type strings rather
167 than the more traditional human readable ones. Thus it may
168 say ``text/plain; charset=us-ascii'' rather than ``ASCII
169 text''. In order for this option to work, file changes the
170 way it handles files recognised by the command itself (such
171 as many of the text file types, directories etc), and makes
172 use of an alternative ``magic'' file. (See ``FILES''
173 section, below).
174
175
176 ''-k, --keep-going''
177
178
179 Don't stop at the first match; keep going.
180
181
182 ''-m, --magic-file list''
183
184
185 Specify an alternate list of files containing magic numbers.
186 This can be a single file, or a colon-separated list of
187 files.
188
189
190 ''-n, --no-buffer''
191
192
193 Force stdout to be flushed after checking each file. This is
194 only useful if checking a list of files. It is intended to
195 be used by programs that want filetype output from a
196 pipe.
197
198
199 __-v__
200
201
202 Print the version of the program and exit.
203
204
205 ''-z, --uncompress''
206
207
208 Try to look inside compressed files.
209
210
211 ''-L, --dereference''
212
213
214 This option causes symlinks to be followed, like the
215 like-named option in ls(1) (on systems that support
216 symbolic links).
217
218
219 ''-s, --special-files''
220
221
222 Normally, __file__ only attempts to read and determine
223 the type of argument files which stat(2) reports are
224 ordinary files. This prevents problems, because reading
225 special files may have peculiar consequences. Specifying the
226 __-s__ option causes __file__ to also read argument
227 files which are block or character special files. This is
228 useful for determining the filesystem types of the data in
229 raw disk partitions, which are block special files. This
230 option also causes __file__ to disregard the file size as
231 reported by stat(2) since on some systems it reports
232 a zero size for raw disk partitions.
233
234
235 ''--help''
236
237
238 Print a help message and exit.
239
240
241 ''--version''
242
243
244 Print version information and exit.
245 !!FILES
246
247
248 ''/usr/share/misc/magic.mgc''
249
250
251 Default compiled list of magic numbers
252
253
254 ''/usr/share/misc/magic''
255
256
257 Default list of magic numbers
258
259
260 ''/usr/share/misc/magic.mime''
261
262
263 Default list of magic numbers, used to output mime types
264 when the -i option is specified.
265
266
267 ''/etc/magic''
268
269
270 Local additions to magic wisdom.
271 !!ENVIRONMENT
272
273
274 The environment variable __MAGIC__ can be used to set the
275 default magic number files.
276 !!SEE ALSO
277
278
279 magic(5) - description of magic file format.
280 strings(1), od(1), hexdump(1) - tools for
281 examining non-text files.
282 !!STANDARDS CONFORMANCE
283
284
285 This program is believed to exceed the System V Interface
286 Definition of FILE(CMD), as near as one can determine from
287 the vague language contained therein. Its behaviour is
288 mostly compatible with the System V program of the same
289 name. This version knows more magic, however, so it will
290 produce different (albeit more accurate) output in many
291 cases.
292
293
294 The one significant difference between this version and
295 System V is that this version treats any white space as a
296 delimiter, so that spaces in pattern strings must be
297 escaped. For example,
298 ``>10 string language impress (imPRESS data)'' in an existing magic file would have to be changed to
299 ``>10 string language\ impress (imPRESS data)''. In addition, in this version, if a pattern string contains a
300 backslash, it must be escaped. For example
301 0 string \begindata Andrew Toolkit document
302 in an existing magic file would have to be changed to
303 0 string \\begindata Andrew Toolkit document
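
When converting System V magic entries by hand, the two escaping rules above can be applied mechanically. The helper below is a hypothetical sketch (the name `escape_pattern` is not part of the program); it handles only backslashes and space characters, so other white space such as tabs would still need attention.

```python
def escape_pattern(pattern):
    """Escape a System V pattern string for this version's magic file."""
    # Escape backslashes first, so that the backslashes added in front
    # of spaces are not themselves doubled.
    return pattern.replace("\\", "\\\\").replace(" ", "\\ ")
```

For instance, `escape_pattern("\\begindata")` returns the string `\\begindata` (two backslashes), matching the Andrew Toolkit example above.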
304
305
306 SunOS releases 3.2 and later from Sun Microsystems include a
307 file(1) command derived from the System V one, but
308 with some extensions. My version differs from Sun's only in
309 minor ways. It includes the extension of the `&' operator,
310 used as, for example,
311 >16 long&0x7fffffff >0 not stripped
312 !!MAGIC DIRECTORY
313
314
315 The magic file entries have been collected from various
316 sources, mainly USENET, and contributed by various authors.
317 Christos Zoulas (address below) will collect additional or
318 corrected magic file entries. A consolidation of magic file
319 entries will be distributed periodically.
320
321
322 The order of entries in the magic file is significant.
323 Depending on what system you are using, the order that they
324 are put together may be incorrect.
325 !!EXAMPLES
326
327
328 $ file file.c file /dev/hda
329 file.c: C program text
330 file: ELF 32-bit LSB executable, Intel 80386, version 1,
331 dynamically linked, not stripped
332 /dev/hda: block special
333 $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
334 /dev/hda: x86 boot sector
335 /dev/hda1: Linux/i386 ext2 filesystem
336 /dev/hda2: x86 boot sector
337 /dev/hda3: x86 boot sector, extended partition table
338 /dev/hda4: Linux/i386 ext2 filesystem
339 /dev/hda5: Linux/i386 swap file
340 /dev/hda6: Linux/i386 swap file
341 /dev/hda7: Linux/i386 swap file
342 /dev/hda8: Linux/i386 swap file
343 /dev/hda9: empty
344 /dev/hda10: empty
345 $ file -i file.c file /dev/hda
346 file.c: text/x-c
347 file: application/x-executable, dynamically linked (uses shared libs), not stripped
348 /dev/hda: application/x-not-regular-file
349 !!HISTORY
350
351
352 There has been a __file__ command in every
353 UNIX since at least Research Version 6 (man
354 page dated January 16, 1975). The System V version
355 introduced one significant change: the external list
356 of magic number types. This slowed the program down slightly
357 but made it a lot more flexible.
358
359
360 This program, based on the System V version, was written by
361 Ian Darwin
362
363
364 John Gilmore revised the code extensively, making it better
365 than the first version. Geoff Collyer found several
366 inadequacies and provided some magic file entries.
367 Contributions by the `
368
369
370 Guy Harris, guy@netapp.com, made many changes from 1993 to
371 the present.
372
373
374 Primary development and maintenance from 1990 to the present
375 by Christos Zoulas (christos@astron.com).
376
377
378 Altered by Chris Lowth, chris@lowth.com, 2000: Handle the
379 ``-i'' option to output mime type strings and using an
380 alternative magic file and internal logic.
381
382
383 Altered by Eric Fischer (enf@pobox.com), July, 2000, to
384 identify character codes and attempt to identify the
385 languages of non-ASCII files.
386
387
388 The list of contributors to the
389 !!LEGAL NOTICE
390
391
392 Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
393 Covered by the standard Berkeley Software Distribution
394 copyright; see the file LEGAL.NOTICE in the source
395 distribution.
396
397
398 The files ''tar.h'' and ''is_tar.c'' were written by
399 John Gilmore from his public-domain __tar__ program, and
400 are not covered by the above license.
401 !!BUGS
402
403
404 There must be a better way to automate the construction of
405 the Magic file from all the glop in Magdir. What is it?
406 Better yet, the magic file should be compiled into binary
407 (say, ndbm(3) or, better yet, fixed-length
408 ASCII strings for use in heterogeneous network
409 environments) for faster startup. Then the program would run
410 as fast as the Version 7 program of the same name, with the
411 flexibility of the System V version.
412
413
414 __File__ uses several algorithms that favor speed over
415 accuracy, thus it can be misled about the contents of text
416 files.
417
418
419 The support for text files (primarily for programming
420 languages) is simplistic, inefficient and requires
421 recompilation to update.
422
423
424 There should be an ``else'' clause to follow a series of
425 continuation lines.
426
427
428 The magic file and keywords should have regular expression
429 support. Their use of ASCII TAB as a field
430 delimiter is ugly and makes it hard to edit the files, but
431 is entrenched.
432
433
434 It might be advisable to allow upper-case letters in
435 keywords for e.g., troff(1) commands vs man page
436 macros. Regular expression support would make this
437 easy.
438
439
440 The program doesn't grok FORTRAN. It should
441 be able to figure FORTRAN by seeing some
442 keywords which appear indented at the start of line. Regular
443 expression support would make this easy.
444
445
446 The list of keywords in ''ascmagic'' probably belongs in
447 the Magic file. This could be done by using some keyword
448 like `*' for the offset value.
449
450
451 Another optimisation would be to sort the magic file so that
452 we can just run down all the tests for the first byte, first
453 word, first long, etc, once we have fetched it. Complain
454 about conflicts in the magic file entries. Make a rule that
455 the magic entries sort based on file offset rather than
456 position within the magic file?
457
458
459 The program should provide a way to give an estimate of
460 ``how good'' a guess is. We end up removing guesses (e.g.
461 ``From '' as first 5 chars of file) because they are not as
462 good as other guesses (e.g. ``Newsgroups:'' versus
463 ``Return-Path:''). Still, if the others don't pan out, it
464 should be possible to use the first guess.
465
466
467 This program is slower than some vendors' file commands. The
468 new support for multiple character codes makes it even
469 slower.
470
471
472 This manual page, and particularly this section, is too
473 long.
474 !!AVAILABILITY
475
476
477 You can obtain the original author's latest version by
478 anonymous FTP on __ftp.astron.com__ in the directory
479 ''/pub/file/file-X.YY.tar.gz''
480
481
482 This __Debian__ version adds long options and corrects
483 some bugs. It can be obtained from every site carrying a
484 __Debian__ distribution (ftp.debian.org and
485 mirrors).
486 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.