version 1 showing authors affecting page license.
Author: perry

!!!FILE

NAME
SYNOPSIS
DESCRIPTION
OPTIONS
FILES
ENVIRONMENT
SEE ALSO
STANDARDS CONFORMANCE
MAGIC DIRECTORY
EXAMPLES
HISTORY
LEGAL NOTICE
BUGS
AVAILABILITY

----

!!NAME

file - determine file type

!!SYNOPSIS

__file__ [[ __-bciknsvzL__ ] [[ __-f__ ''namefile'' ] [[ __-m__ ''magicfiles'' ] ''file'' ...

__file__ __-C__ [[ __-m__ ''magicfile'' ]

!!DESCRIPTION

This manual page documents version 3.37-3.1 of the __file__ command.

__File__ tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic number tests, and language tests. The ''first'' test that succeeds causes the file type to be printed.

The type printed will usually contain one of the words __text__ (the file contains only printing characters and a few common control characters and is probably safe to read on an ASCII terminal), __executable__ (the file contains the result of compiling a program in a form understandable to some UNIX kernel or another), or __data__, meaning anything else (data is usually `binary' or non-printable). Exceptions are well-known file formats (core files, tar archives) that are known to contain binary data. When adding local definitions to ''/etc/magic'', __preserve these keywords__. People depend on knowing that all the readable files in a directory have the word ``text'' printed. Don't do as Berkeley did and change ``shell commands text'' to ``shell script''. Note that the file ''/usr/share/misc/magic'' is built mechanically from a large number of small files in the subdirectory ''Magdir'' in the source distribution of this program.

The filesystem tests are based on examining the return from a stat(2) system call. The program checks to see if the file is empty, or if it's some sort of special file. Any known file types appropriate to the system you are running on (sockets, symbolic links, or named pipes (FIFOs) on those systems that implement them) are intuited if they are defined in the system header file ''''.
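
Below is a minimal C sketch of these filesystem tests, assuming a POSIX system. The function ''fs_test'' and the ''enum fs_type'' codes are hypothetical and are not taken from the __file__ sources. The sketch calls lstat(2) by default and stat(2) when asked to follow symbolic links (compare the __-L__ option). It guards the optional file types with #ifdef so it still compiles on systems that do not define them. A non-empty ordinary file is reported as FS_REGULAR, meaning the magic number tests would run next.

```c
#define _POSIX_C_SOURCE 200809L  /* for lstat(2), S_ISLNK and S_ISSOCK */
#include <sys/stat.h>

/* Hypothetical result codes for the filesystem tests. */
enum fs_type {
    FS_REGULAR,  /* non-empty ordinary file: go on to the magic tests */
    FS_EMPTY,    /* ordinary file of size zero */
    FS_DIR,
    FS_CHR,      /* character special */
    FS_BLK,      /* block special */
    FS_FIFO,     /* named pipe */
    FS_LINK,     /* symbolic link (only seen when not following links) */
    FS_SOCK,
    FS_OTHER,
    FS_ERROR     /* stat(2)/lstat(2) failed */
};

/* Classify path from the stat information alone. follow_links selects
 * stat(2), which follows symbolic links, instead of lstat(2). */
enum fs_type fs_test(const char *path, int follow_links)
{
    struct stat st;
    int rc = follow_links ? stat(path, &st) : lstat(path, &st);

    if (rc != 0)
        return FS_ERROR;
    if (S_ISDIR(st.st_mode))
        return FS_DIR;
    if (S_ISCHR(st.st_mode))
        return FS_CHR;
    if (S_ISBLK(st.st_mode))
        return FS_BLK;
#ifdef S_ISFIFO
    if (S_ISFIFO(st.st_mode))
        return FS_FIFO;
#endif
#ifdef S_ISLNK
    if (S_ISLNK(st.st_mode))
        return FS_LINK;
#endif
#ifdef S_ISSOCK
    if (S_ISSOCK(st.st_mode))
        return FS_SOCK;
#endif
    if (!S_ISREG(st.st_mode))
        return FS_OTHER;
    return st.st_size == 0 ? FS_EMPTY : FS_REGULAR;
}
```

On a typical Unix system, ''fs_test("/dev/null", 1)'' reports FS_CHR and ''fs_test("/", 1)'' reports FS_DIR.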

The magic number tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable (compiled program) ''a.out'' file, whose format is defined in ''a.out.h'' and possibly ''exec.h'' in the standard include directory. These files have a `magic number' stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of `magic number' has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. The information identifying these files is read from ''/etc/magic'' and the compiled magic file ''/usr/share/misc/magic.mgc'', or from ''/usr/share/misc/magic'' if the compiled file does not exist.
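
The core of a magic number test can be sketched in C as a table of (offset, expected bytes, description) entries. The entries are tried in order and the first match wins. The table and the ''magic_test'' function below are illustrative assumptions. They do not reflect the format of ''/etc/magic'' or the program's own parser. The byte values are the well-known ELF, gzip and PNG signatures.

```c
#include <stddef.h>
#include <string.h>

/* One hypothetical magic entry: at offset, expect len bytes equal to bytes. */
struct magic_entry {
    size_t offset;
    const char *bytes;
    size_t len;
    const char *desc;
};

/* Order matters: entries are tried top to bottom and the first match wins. */
static const struct magic_entry magic_table[] = {
    { 0, "\x7f" "ELF",           4, "ELF" },
    { 0, "\x1f\x8b",             2, "gzip compressed data" },
    { 0, "\x89" "PNG\r\n\x1a\n", 8, "PNG image data" },
};

/* Return the description of the first entry whose bytes appear at its
 * offset in buf, or NULL so the caller can go on to the text tests. */
const char *magic_test(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < sizeof magic_table / sizeof magic_table[0]; i++) {
        const struct magic_entry *m = &magic_table[i];
        /* The identifier must lie entirely inside the bytes we have. */
        if (m->offset + m->len <= n &&
            memcmp(buf + m->offset, m->bytes, m->len) == 0)
            return m->desc;
    }
    return NULL;
}
```

The ELF entry is split into "\x7f" "ELF" because a hexadecimal escape would otherwise absorb the following E as part of the escape.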

If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets (such as those used on Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character sets can be distinguished by the different ranges and sequences of bytes that constitute printable text in each set. If a file passes any of these tests, its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as ``text'' because they will be mostly readable on nearly any terminal; UTF-16 and EBCDIC are only ``character data'' because, while they contain text, it is text that will require translation before it can be read. In addition, __file__ will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.
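
The C sketch below shows two of these checks: telling ASCII, UTF-8 and non-text bytes apart, and reporting the line terminator. It is a simplified illustration. The set of accepted control characters, the rejection of a truncated final sequence, and the lack of overlong-form checks are all assumptions. ISO-8859-x, UTF-16, EBCDIC and NEL handling are left out. The enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result codes; the real program distinguishes many more sets. */
enum charset { CS_DATA, CS_ASCII, CS_UTF8 };
enum eol { EOL_NONE, EOL_LF, EOL_CRLF, EOL_CR, EOL_MIXED };

/* Printable ASCII plus a few common control characters
 * (tab, newline, CR, form feed, backspace, escape). */
static int ascii_text_byte(unsigned char c)
{
    return (c >= 0x20 && c < 0x7f) || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\b' || c == 0x1b;
}

/* ASCII is a subset of UTF-8, so one pass checks both. Simplified:
 * overlong forms and surrogates are not rejected. */
enum charset text_charset(const unsigned char *b, size_t n)
{
    int multibyte = 0;

    for (size_t i = 0; i < n; ) {
        unsigned char c = b[i];
        size_t extra;

        if (c < 0x80) {
            if (!ascii_text_byte(c))
                return CS_DATA;
            i++;
            continue;
        }
        if ((c & 0xE0) == 0xC0)      extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else                         return CS_DATA;  /* invalid lead byte */
        if (i + extra >= n)
            return CS_DATA;                           /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((b[i + k] & 0xC0) != 0x80)
                return CS_DATA;                       /* bad continuation */
        multibyte = 1;
        i += extra + 1;
    }
    return multibyte ? CS_UTF8 : CS_ASCII;
}

/* Report which line terminator the buffer uses (NEL is not handled). */
enum eol line_endings(const unsigned char *b, size_t n)
{
    size_t lf = 0, crlf = 0, cr = 0;

    for (size_t i = 0; i < n; i++) {
        if (b[i] == '\r' && i + 1 < n && b[i + 1] == '\n') {
            crlf++;
            i++;                  /* skip the LF of the CRLF pair */
        } else if (b[i] == '\r') {
            cr++;
        } else if (b[i] == '\n') {
            lf++;
        }
    }
    int kinds = (lf > 0) + (crlf > 0) + (cr > 0);
    if (kinds == 0)
        return EOL_NONE;
    if (kinds > 1)
        return EOL_MIXED;
    return lf ? EOL_LF : (crlf ? EOL_CRLF : EOL_CR);
}
```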

Once __file__ has determined the character set used in a text-type file, it will attempt to determine in what language the file is written. The language tests look for particular strings (cf. ''names.h'') that can appear anywhere in the first few blocks of a file. For example, the keyword __.br__ indicates that the file is most likely a troff(1) input file, just as the keyword __struct__ indicates a C program. These tests are less reliable than the previous two groups, so they are performed last. The language test routines also test for some miscellany (such as tar(1) archives).
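
A minimal C sketch of a keyword-based language test follows. Several parts of it are assumptions made for illustration: the keyword table, the SCAN_LIMIT of 4096 bytes that stands in for ``the first few blocks'', and the whole-word matching on white-space-separated tokens. None of these come from ''names.h''. The language returned is that of the first listed keyword encountered in the file.

```c
#include <stddef.h>
#include <string.h>

#define SCAN_LIMIT 4096  /* assumption standing in for "the first few blocks" */

/* Hypothetical keyword table in the spirit of names.h. */
struct lang_hint {
    const char *word;
    const char *lang;
};

static const struct lang_hint hints[] = {
    { ".br",      "troff input text" },
    { ".sp",      "troff input text" },
    { "struct",   "C program text" },
    { "#include", "C program text" },
};

/* Split the start of buf into white-space-separated words and return the
 * language of the first word that appears in the table, or NULL. */
const char *language_test(const char *buf, size_t n)
{
    static const char delims[] = " \t\n\r\f";
    char copy[SCAN_LIMIT + 1];

    if (n > SCAN_LIMIT)
        n = SCAN_LIMIT;
    memcpy(copy, buf, n);
    copy[n] = '\0';            /* strtok needs a writable, terminated string */

    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims))
        for (size_t i = 0; i < sizeof hints / sizeof hints[0]; i++)
            if (strcmp(tok, hints[i].word) == 0)
                return hints[i].lang;
    return NULL;
}
```

Because any single keyword decides the outcome, a C file that merely mentions ``.br'' in a comment before its first __struct__ would be misclassified. This illustrates why these tests are considered less reliable and are run last.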

Any file that cannot be identified as having been written in any of the character sets listed above is simply said to be ``data''.

!!OPTIONS

''-b, --brief''

Do not prepend filenames to output lines.

''-c, --checking-printout''

Cause a checking printout of the parsed form of the magic file. This is usually used in conjunction with __-m__ to debug a new magic file before installing it.

''-C, --compile''

Write a magic.mgc output file that contains a pre-parsed version of the magic file.

''-f, --files-from namefile''

Read the names of the files to be examined from ''namefile'' (one per line) before the argument list. Either ''namefile'' or at least one filename argument must be present; to test the standard input, use ``-'' as a filename argument.
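
Reading ''namefile'' can be sketched in C as follows. The sketch assumes that each name fits in a 4096-byte line buffer and that blank lines are skipped. ''read_names'' and ''name_visitor'' are hypothetical names and are not part of __file__.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: receives each file name to examine. */
typedef void (*name_visitor)(const char *name, void *ctx);

/* Read file names from fp, one per line. The trailing newline is stripped
 * and empty lines are skipped (an assumption). visit may be NULL, in which
 * case names are only counted. Lines longer than the buffer are split,
 * a limitation of this sketch. Returns the number of names seen. */
size_t read_names(FILE *fp, name_visitor visit, void *ctx)
{
    char line[4096];
    size_t count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* drop the newline, if any */
        if (line[0] == '\0')
            continue;
        if (visit != NULL)
            visit(line, ctx);
        count++;
    }
    return count;
}
```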

''-i, --mime''

Causes __file__ to output mime type strings rather than the more traditional human-readable ones. Thus it may say ``text/plain; charset=us-ascii'' rather than ``ASCII text''. In order for this option to work, __file__ changes the way it handles files recognised by the command itself (such as many of the text file types, directories, etc.), and makes use of an alternative ``magic'' file (see the ``FILES'' section below).

''-k, --keep-going''

Don't stop at the first match; keep going.

''-m, --magic-file list''

Specify an alternate list of files containing magic numbers. This can be a single file, or a colon-separated list of files.
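
Walking a colon-separated __-m__ list can be sketched in C as shown below. ''next_magic_file'' is a hypothetical helper. Skipping empty elements and truncating over-long names to fit the output buffer are assumptions of this sketch, not documented behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Copy the next colon-separated element of *list into out (capacity cap),
 * advance *list past it, and return 1; return 0 when no elements remain.
 * Empty elements such as the middle of "a::b" are skipped. */
int next_magic_file(const char **list, char *out, size_t cap)
{
    const char *p = *list;

    while (*p == ':')
        p++;                                /* skip empty elements */
    if (*p == '\0' || cap == 0) {
        *list = p;
        return 0;
    }
    size_t full = strcspn(p, ":");          /* length of this element */
    size_t len = full < cap ? full : cap - 1;
    memcpy(out, p, len);
    out[len] = '\0';
    *list = p + full;                       /* advance by the full length */
    return 1;
}
```

When called repeatedly on ''/etc/magic:/usr/share/misc/magic'', it yields ''/etc/magic'' and then ''/usr/share/misc/magic''. It returns 0 after that.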

''-n, --no-buffer''

Force stdout to be flushed after checking each file. This is only useful if checking a list of files. It is intended to be used by programs that want file type output from a pipe.

''-v''

Print the version of the program and exit.

''-z, --uncompress''

Try to look inside compressed files.

''-L, --dereference''

Follow symbolic links, as the like-named option of ls(1) does (on systems that support symbolic links).

''-s, --special-files''

Normally, __file__ only attempts to read and determine the type of argument files which stat(2) reports are ordinary files. This prevents problems, because reading special files may have peculiar consequences. Specifying the __-s__ option causes __file__ to also read argument files which are block or character special files. This is useful for determining the filesystem types of the data in raw disk partitions, which are block special files. This option also causes __file__ to disregard the file size as reported by stat(2), since on some systems it reports a zero size for raw disk partitions.

''--help''

Print a help message and exit.

''--version''

Print version information and exit.

!!FILES

''/usr/share/misc/magic.mgc''

Default compiled list of magic numbers.

''/usr/share/misc/magic''

Default list of magic numbers.

''/usr/share/misc/magic.mime''

Default list of magic numbers, used to output mime types when the __-i__ option is specified.

''/etc/magic''

Local additions to magic wisdom.

!!ENVIRONMENT

The environment variable __MAGIC__ can be used to set the default magic number files.

!!SEE ALSO

magic(5) - description of the magic file format.

strings(1), od(1), hexdump(1) - tools for examining non-text files.

!!STANDARDS CONFORMANCE

This program is believed to exceed the System V Interface Definition of FILE(CMD), as near as one can determine from the vague language contained therein. Its behaviour is mostly compatible with the System V program of the same name. This version knows more magic, however, so it will produce different (albeit more accurate) output in many cases.

The one significant difference between this version and System V is that this version treats any white space as a delimiter, so that spaces in pattern strings must be escaped. For example,
in an existing magic file would have to be changed to

In addition, in this version, if a pattern string contains a backslash, it must be escaped. For example

0 string \begindata Andrew Toolkit document

in an existing magic file would have to be changed to

0 string \\begindata Andrew Toolkit document
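
When converting an old magic file, both escaping rules can be applied mechanically. The following is a minimal C sketch using the hypothetical helper ''escape_pattern''. It assumes that the space character is the only white space that needs escaping; tabs and other white space are left untouched here.

```c
#include <stddef.h>

/* Copy in to out, putting a backslash before every backslash and every
 * space. Returns 0 on success, or -1 if out (of size cap) is too small. */
int escape_pattern(const char *in, char *out, size_t cap)
{
    size_t j = 0;

    if (cap == 0)
        return -1;
    for (const char *p = in; *p != '\0'; p++) {
        int escape = (*p == '\\' || *p == ' ');
        size_t need = escape ? 2 : 1;
        if (j + need >= cap)          /* keep one byte for the terminator */
            return -1;
        if (escape)
            out[j++] = '\\';
        out[j++] = *p;
    }
    out[j] = '\0';
    return 0;
}
```

For example, the pattern ''language impress'' becomes ''language\ impress''. A pattern that starts with a backslash comes out with that backslash doubled.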

SunOS releases 3.2 and later from Sun Microsystems include a file(1) command derived from the System V one, but with some extensions. My version differs from Sun's only in minor ways. It includes the extension of the `

!!MAGIC DIRECTORY

The magic file entries have been collected from various sources, mainly USENET, and contributed by various authors. Christos Zoulas (address below) will collect additional or corrected magic file entries. A consolidation of magic file entries will be distributed periodically.

The order of entries in the magic file is significant. Depending on what system you are using, the order in which they are put together may be incorrect.

!!EXAMPLES

$ file file.c file /dev/hda
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1, dynamically linked, not stripped
/dev/hda: block special
$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty
$ file -i file.c file /dev/hda
file.c: text/x-c
file: application/x-executable, dynamically linked (uses shared libs), not stripped
/dev/hda: application/x-not-regular-file

!!HISTORY

There has been a __file__ command in every UNIX since at least Research Version 6 (man page dated January 16, 1975). The System V version introduced one significant change: the external list of magic number types. This slowed the program down slightly but made it a lot more flexible.

This program, based on the System V version, was written by Ian Darwin.

John Gilmore revised the code extensively, making it better than the first version. Geoff Collyer found several inadequacies and provided some magic file entries. Contributions by the `

Guy Harris, guy@netapp.com, made many changes from 1993 to the present.

Primary development and maintenance from 1990 to the present by Christos Zoulas (christos@astron.com).

Altered by Chris Lowth, chris@lowth.com, 2000: Handle the ``-i'' option to output mime type strings, using an alternative magic file and internal logic.

Altered by Eric Fischer (enf@pobox.com), July, 2000, to identify character codes and attempt to identify the languages of non-ASCII files.

The list of contributors to the

!!LEGAL NOTICE

Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999. Covered by the standard Berkeley Software Distribution copyright; see the file LEGAL.NOTICE in the source distribution.

The files ''tar.h'' and ''is_tar.c'' were written by John Gilmore from his public-domain __tar__ program, and are not covered by the above license.

!!BUGS

There must be a better way to automate the construction of the magic file from all the glop in Magdir. What is it? Better yet, the magic file should be compiled into binary (say, ndbm(3) or, better yet, fixed-length ASCII strings for use in heterogeneous network environments) for faster startup. Then the program would run as fast as the Version 7 program of the same name, with the flexibility of the System V version.

__File__ uses several algorithms that favor speed over accuracy, so it can be misled about the contents of text files.

The support for text files (primarily for programming languages) is simplistic, inefficient and requires recompilation to update.

There should be an ``else'' clause to follow a series of continuation lines.

The magic file and keywords should have regular expression support. Their use of ASCII TAB as a field delimiter is ugly and makes it hard to edit the files, but is entrenched.

It might be advisable to allow upper-case letters in keywords, e.g., to distinguish troff(1) commands from man page macros. Regular expression support would make this easy.

The program doesn't grok FORTRAN. It should be able to figure out FORTRAN by seeing some keywords which appear indented at the start of a line. Regular expression support would make this easy.

The list of keywords in ''ascmagic'' probably belongs in the magic file. This could be done by using some keyword like `*' for the offset value.

Another optimisation would be to sort the magic file so that we can just run down all the tests for the first byte, first word, first long, etc., once we have fetched it. Complain about conflicts in the magic file entries. Make a rule that the magic entries sort based on file offset rather than position within the magic file?

The program should provide a way to give an estimate of ``how good'' a guess is. We end up removing guesses (e.g. ``From '' as the first 5 characters of a file) because they are not as good as other guesses (e.g. ``Newsgroups:'' versus ``Return-Path:''). Still, if the others don't pan out, it should be possible to use the first guess.

This program is slower than some vendors' file commands. The new support for multiple character codes makes it even slower.

This manual page, and particularly this section, is too long.

!!AVAILABILITY

You can obtain the original author's latest version by anonymous FTP from __ftp.astron.com__, in the file ''/pub/file/file-X.YY.tar.gz''.

This __Debian__ version adds long options and corrects some bugs. It can be obtained from every site carrying a __Debian__ distribution (ftp.debian.org and mirrors).

----