Annotated edit history of url(7) version 2, including all changes.
Rev Author # Line
1 perry 1 URI
2 !!!URI
3 NAME
4 SYNOPSIS
5 DESCRIPTION
6 USAGE
7 CHARACTER ENCODING
8 WRITING A URI
9 NOTES
10 SECURITY
11 CONFORMING TO
12 BUGS
13 AUTHOR
14 SEE ALSO
15 ----
16 !!NAME
17
18
19 uri, url, urn - uniform resource identifier (URI), including a URL or URN
20 !!SYNOPSIS
21
22
23 URI = [[ absoluteURI | relativeURI ] [[ "#" fragment ]
24 !!DESCRIPTION
25
26
27 A Uniform Resource Identifier (URI) is a short string of
28 characters identifying an abstract or physical resource (for
29 example, a web page). A Uniform Resource Locator (URL) is a
30 URI that identifies a resource through its primary access
31 mechanism (e.g., its network
32
33
34 URIs are the standard way to name hypertext link
35 destinations for tools such as web browsers. The string
36
37
38 URIs can be absolute or relative. An absolute identifier
39 refers to a resource independent of context, while a
40 relative identifier refers to a resource by describing the
41 difference from the current context. Within a relative path
42 reference, the complete path segments
43
44
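As a rough illustration of how a relative identifier is resolved against the current context, the sketch below uses urljoin from Python's standard urllib.parse module. The base URI and the file names are hypothetical and chosen only for this example.

```python
from urllib.parse import urljoin

# Hypothetical base URI standing in for "the current context".
BASE = "http://example.com/docs/guide/intro.html"


def resolve(reference: str, base: str = BASE) -> str:
    """Resolve a URI reference (absolute or relative) against a base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    for ref in ("setup.html", "../index.html", "/top.html",
                "ftp://example.org/file"):
        print(ref, "->", resolve(ref))
```

An absolute reference such as the ftp: one is returned unchanged, since it does not depend on the context.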
45 A fragment identifier, if included, refers to a particular
46 named portion (fragment) of a resource; text after a '#'
47 identifies the fragment. A URI beginning with '#' refers to
48 that fragment in the current resource.
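The fragment rules above can be sketched with Python's standard urllib.parse module. The URIs and fragment names here are made up for illustration.

```python
from urllib.parse import urldefrag, urljoin


def split_fragment(uri: str) -> tuple:
    """Return (uri_without_fragment, fragment) for the given URI."""
    url, fragment = urldefrag(uri)
    return url, fragment


# Hypothetical document and fragment names.
CURRENT = "http://example.com/manual.html#install"

# A reference that starts with '#' points into the current resource.
SAME_DOCUMENT = urljoin(CURRENT, "#usage")

if __name__ == "__main__":
    print(split_fragment(CURRENT))
    print(SAME_DOCUMENT)
```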
49 !!USAGE
50
51
52 There are many different URI schemes, each with specific
53 additional rules and meanings, but they are intentionally
54 made to be as similar as possible. For example, many URL
55 schemes permit the authority to take the following format,
56 called here an ''ip_server'' (square brackets show what's
57 optional):
58
59
60 ''ip_server ='' [[''user'' [[ : ''password'' ] @ ]
61 ''host'' [[ : ''port'']
62
63
64 This format allows you to optionally insert a user name, a
65 user plus password, and/or a port number. The ''host'' is
66 the name of the host computer, either its name as determined
67 by DNS or an IP address (numbers separated by periods). Thus
68 the URI
69 ''
70
71
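To make the ''ip_server'' format concrete, here is a minimal sketch that splits the authority into its optional parts with urlsplit from Python's standard library. The user name, password, host, and port in the usage are hypothetical; the function name split_ip_server is an assumption of this example, not part of any tool.

```python
from urllib.parse import urlsplit


def split_ip_server(url: str) -> dict:
    """Split the authority of a URL into user, password, host and port.

    Parts that are absent from the URL come back as None.
    """
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    # Hypothetical URLs; a real URL should not carry a secret password.
    print(split_ip_server("ftp://fred:secret@example.com:2121/pub"))
    print(split_ip_server("http://example.com/"))
```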
72 Here are some of the most common schemes in use on Unix-like
73 systems that are understood by many tools. Note that many
74 tools using URIs also have internal schemes or specialized
75 schemes; see those tools' documentation for information on
76 those schemes.
77
78
79 __http - Web (HTTP) server__
80
81
82 http://''ip_server''/''path''
83 http://''ip_server''/''path''?''query''
84
85
86 This is a URL accessing a web (HTTP) server. The default
87 port is 80. If the path refers to a directory, the web
88 server will choose what to return; usually if there is a
89 file named
90
91
92 A query can be given in the archaic
93 key''=''value'' separated by the
94 ampersand character (''key'' can be
95 repeated more than once, though it's up to the web server
96 and its application programs to determine if there's any
97 meaning to that. There is an unfortunate interaction with
98 HTML/XML/SGML and the GET query format; when such URIs with
99 more than one key are embedded in SGML/XML documents
100 (including HTML), the ampersand (
101 ''
102
103
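A small sketch of the key=value query format follows, using Python's standard library. It assumes the interaction mentioned above is the usual one: a bare ampersand begins an entity reference in SGML/XML/HTML, so it is written as an entity when the URI is embedded in such a document. The URL and key names are hypothetical.

```python
from html import escape
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL whose query repeats the key "key".
URL = "http://example.com/search?key=a&key=b&lang=en"


def query_entries(url: str) -> dict:
    """Return the query as a dict mapping each key to a list of values."""
    return parse_qs(urlsplit(url).query)


def for_html_attribute(url: str) -> str:
    """Escape a URL so it can be placed inside an HTML attribute value."""
    return escape(url, quote=True)


if __name__ == "__main__":
    print(query_entries(URL))
    print(for_html_attribute(URL))
```

What a repeated key means is still up to the server; the sketch only shows that both values are preserved.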
104 __ftp - File Transfer Protocol (FTP)__
105
106
107 ftp://''ip_server''/''path''
108
109
110 This is a URL accessing a file through the file transfer
111 protocol (FTP). The default port (for control) is 21. If no
112 username is included, the user name
113
114
115 __gopher - Gopher server__
116
117
118 gopher://''ip_server''/''gophertype selector''
119 gopher://''ip_server''/''gophertype
120 selector''%09''search''
121 gopher://''ip_server''/''gophertype
122 selector''%09''search''%09''gopher+_string''
123
124
125 The default gopher port is 70. ''gophertype'' is a
126 single-character field to denote the Gopher type of the
127 resource to which the URL refers. The entire path may also
128 be empty, in which case the delimiting
129 ''
130
131
132 ''selector'' is the Gopher selector string. In the Gopher
133 protocol, Gopher selector strings are a sequence of octets
134 which may contain any octets except 09 hexadecimal (US-ASCII
135 HT or tab), 0A hexadecimal (US-ASCII character LF), and 0D
136 hexadecimal (US-ASCII character CR).
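The octet restriction on selector strings can be checked in a few lines. This is only a sketch of the rule as stated above; the function name and sample selectors are assumptions of the example.

```python
# Octets that may not appear in a Gopher selector: HT, LF and CR.
FORBIDDEN_OCTETS = frozenset({0x09, 0x0A, 0x0D})


def is_valid_selector(selector: bytes) -> bool:
    """Return True if the selector contains none of the forbidden octets."""
    return not any(octet in FORBIDDEN_OCTETS for octet in selector)


if __name__ == "__main__":
    print(is_valid_selector(b"1/docs/readme"))   # hypothetical selector
    print(is_valid_selector(b"bad\tselector"))   # contains a tab (09)
```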
137
138
139 __mailto - Email address__
140
141
142 mailto:''email-address''
143
144
145 This is an email address, usually of the form
146 ''name''@''hostname''. See mailaddr(7) for more
147 information on the correct format of an email address. Note
148 that any % character must be rewritten as %25. An example is
149 __
150
151
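As a hedged sketch of the %25 rule, the helper below builds a mailto: URI from an address with quote from Python's urllib.parse. The helper name and the address are hypothetical; keeping '@' unescaped is a choice made for this example.

```python
from urllib.parse import quote


def mailto_uri(address: str) -> str:
    """Build a mailto: URI, escaping '%' and other unsafe characters."""
    # '@' is kept literal; a '%' in the address becomes '%25'.
    return "mailto:" + quote(address, safe="@")


if __name__ == "__main__":
    print(mailto_uri("user%relay@example.com"))
```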
152 __news - Newsgroup or News message__
153
154
155 news:''newsgroup-name''
156 news:''message-id''
157
158
159 A ''newsgroup-name'' is a period-delimited hierarchical
160 name, such as
161 ''
162
163
164 A ''message-id'' corresponds to the Message-ID of IETF
165 RFC 1036, without the enclosing
166 ''unique''@''full_domain_name''. A message identifier
167 may be distinguished from a news group name by the presence
168 of the ''
169
170
171 __telnet - Telnet login__
172
173
174 telnet://''ip_server''/
175
176
177 The Telnet URL scheme is used to designate interactive text
178 services that may be accessed by the Telnet protocol. The
179 final
180
181
182 __file - Normal file__
183
184
185 file://''ip_server''/''path_segments''
186 file:''path_segments''
187
188
189 This represents a file or directory accessible locally. As a
190 special case, ''host'' can be the string
191 ''glob__(7) and glob(3)).
192
193
194 The second format (e.g.,
195
196
197 __man - Man page documentation__
198
199
200 man:''command-name''
201 man:''command-name''(''section'')
202
203
204 This refers to local online manual (man) reference pages.
205 The command name can optionally be followed by a parenthesis
206 and section number; see man(7) for more information
207 on the meaning of the section numbers. This URI scheme is
208 unique to Unix-like systems (such as Linux) and is not
209 currently registered by the IETF. An example is
210 __
211
212
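A tool that wants to hand man: URIs to another program first has to separate the command name from the optional section. The following is a minimal sketch under the two forms listed above; the regular expression and function name are assumptions of this example, not part of any existing tool.

```python
import re

# man:command-name or man:command-name(section)
MAN_URI = re.compile(r"^man:(?P<name>[^()]+)(?:\((?P<section>[^()]+)\))?$")


def parse_man_uri(uri: str) -> tuple:
    """Return (command_name, section); section is None when omitted."""
    match = MAN_URI.match(uri)
    if match is None:
        raise ValueError(f"not a man: URI: {uri!r}")
    return match.group("name"), match.group("section")


if __name__ == "__main__":
    print(parse_man_uri("man:ls(1)"))
    print(parse_man_uri("man:ls"))
```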
213 __info - Info page documentation__
214
215
216 info:''virtual-filename''
217 info:''virtual-filename''#''nodename''
218 info:(''virtual-filename'')
219 info:(''virtual-filename'')''nodename''
220
221
222 This scheme refers to online info reference pages (generated
223 from texinfo files), a documentation format used by programs
224 such as the GNU tools. This URI scheme is unique to
225 Unix-like systems (such as Linux) and is not currently
226 registered by the IETF. As of this writing, GNOME and KDE
227 differ in their URI syntax and do not accept the other's
228 syntax. The first two formats are the GNOME format; in
229 nodenames all spaces are written as underscores. The second
230 two formats are the KDE format; spaces in nodenames must be
231 written as spaces, even though this is forbidden by the URI
232 standards. It's hoped that in the future most tools will
233 understand all of these formats and will always accept
234 underscores for spaces in nodenames. In both GNOME and KDE,
235 if the form without the nodename is used the nodename is
236 assumed to be
237
238
239 __whatis - Documentation search__
240
241
242 whatis:''string''
243
244
245 This scheme searches the database of short (one-line)
246 descriptions of commands and returns a list of descriptions
247 containing that string. Only complete word matches are
248 returned. See whatis(1). This URI scheme is unique to
249 Unix-like systems (such as Linux) and is not currently
250 registered by the IETF.
251
252
253 __ghelp - GNOME help documentation__
254
255
256 ghelp:''name-of-application''
257
258
259 This loads GNOME help for the given application. Note that
260 not much documentation currently exists in this
261 format.
262
263
264 __ldap - Lightweight Directory Access
265 Protocol__
266
267
268 ldap://''hostport''
269 ldap://''hostport''/
270 ldap://''hostport''/''dn''
271 ldap://''hostport''/''dn''?''attributes''
272 ldap://''hostport''/''dn''?''attributes''?''scope''
273 ldap://''hostport''/''dn''?''attributes''?''scope''?''filter''
274 ldap://''hostport''/''dn''?''attributes''?''scope''?''filter''?''extensions''
275
276
277 This scheme supports queries to the Lightweight Directory
278 Access Protocol (LDAP), a protocol for querying a set of
279 servers for hierarchically-organized information (such as
280 people and computing resources). More information on the
281 LDAP URL scheme is available in RFC 2255. The components of
282 this URL are:
283
284
285 hostport
286
287
288 the LDAP server to query, written as a hostname optionally
289 followed by a colon and the port number. The default LDAP
290 port is TCP port 389. If empty, the client determines which
291 LDAP server to use.
292
293
294 dn the LDAP Distinguished Name, which identifies the base
295 object of the LDAP search (see RFC 2253 section
296 3).
297
298
299 attributes
300
301
302 a comma-separated list of attributes to be returned; see RFC
303 2251 section 4.1.5. If omitted, all attributes should be
304 returned.
305
306
307 scope
308
309
310 specifies the scope of the search, which can be one of
311
312
313 filter
314
315
316 specifies the search filter (subset of entries to return).
317 If omitted, all entries should be returned. See RFC 2254
318 section 4.
319
320
321 extensions
322
323
324 a comma-separated list of type=value pairs, where the =value
325 portion may be omitted for options not requiring it. An
326 extension prefixed with a '!' is critical (must be supported
327 to be valid), otherwise it's non-critical
328 (optional).
329
330
331 LDAP queries are easiest to explain by example. Here's a
332 query that asks ldap.itd.umich.edu for information about the
333 University of Michigan in the U.S.:
334
335
336 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
337
338
339 To just get its postal address attribute,
340 request:
341
342
343 ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
344
345
346 To ask host.com at port 6666 for information about the
347 person with common name (cn) "Babs Jensen" at the University of Michigan, request:
348
349
350 ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
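The component layout described above can be sketched as a small parser built on Python's urllib.parse. It is an illustration under simplifying assumptions (no validation of scope values or filter syntax), and the function name parse_ldap_url is hypothetical.

```python
from urllib.parse import unquote, urlsplit


def parse_ldap_url(url: str) -> dict:
    """Split an ldap: URL into hostport, dn, attributes, scope,
    filter and extensions. Omitted components come back empty."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError(f"not an ldap: URL: {url!r}")
    # Everything after the first '?' holds up to four '?'-separated fields.
    fields = parts.query.split("?") if parts.query else []
    fields += [""] * (4 - len(fields))
    attributes, scope, search_filter, extensions = fields[:4]
    return {
        "hostport": parts.netloc,
        "dn": unquote(parts.path.lstrip("/")),
        "attributes": attributes.split(",") if attributes else [],
        "scope": scope,
        "filter": unquote(search_filter),
        "extensions": extensions.split(",") if extensions else [],
    }


if __name__ == "__main__":
    print(parse_ldap_url(
        "ldap://host.com:6666/o=University%20of%20Michigan,c=US"
        "??sub?(cn=Babs%20Jensen)"))
```

In the last example the attributes field is empty, so the two consecutive question marks leave the attribute list unset while still supplying a scope and a filter.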
351
352
353 __wais - Wide Area Information Servers__
354
355
356 wais://''hostport''/''database''
357 wais://''hostport''/''database''?''search''
358 wais://''hostport''/''database''/''wtype''/''wpath''
359
360
361 This scheme designates a WAIS database, search, or document
362 (see IETF RFC 1625 for more information on WAIS). Hostport
363 is the hostname, optionally followed by a colon and port
364 number (the default port number is 210).
365
366
367 The first form designates a WAIS database for searching. The
368 second form designates a particular search of the WAIS
369 database ''database''. The third form designates a
370 particular document within a WAIS database to be retrieved.
371 ''wtype'' is the WAIS designation of the type of the
372 object and ''wpath'' is the WAIS
373 document-id.
374
375
376 __other schemes__
377
378
379 There are many other URI schemes. Most tools that accept
380 URIs support a set of internal URIs (e.g., Mozilla has the
381 about: scheme for internal information, and the GNOME help
382 browser has the toc: scheme for various starting locations).
383 There are many schemes that have been defined but are not as
384 widely used at the current time (e.g., prospero). The nntp:
385 scheme is deprecated in favor of the news: scheme. URNs are
386 to be supported by the urn: scheme, with a hierarchical name
387 space (e.g., urn:ietf:... would identify IETF documents); at
388 this time URNs are not widely implemented. Not all tools
389 support all schemes.
390 !!CHARACTER ENCODING
391
392
393 URIs use a limited number of characters so that they can be
394 typed in and used in a variety of situations.
395
396
397 The following characters are reserved, that is, they may
398 appear in a URI but their use is limited to their reserved
399 purpose (conflicting data must be escaped before forming the
400 URI):
401
402
403 ; / ? : @ & = + $ ,
404
405
406 Unreserved characters may be included in a URI. Unreserved
407 characters include upper and lower case English
408 letters, decimal digits, and the following limited set of
409 punctuation marks and symbols:
410
411
412 - _ . ! ~ * ' ( )
413
414
415 All other characters must be escaped. An escaped octet is
416 encoded as a character triplet, consisting of the percent
417 character
418
419
420 Unreserved characters can be escaped without changing the
421 semantics of the URI, but this should not be done unless the
422 URI is being used in a context that does not allow the
423 unescaped character to appear. For example,
424
425
426 For URIs which must handle characters outside the US ASCII
427 character set, the HTML 4.01 specification (section B.2) and
428 IETF RFC 2718 (section 2.2.5) recommend the following
429 approach:
430
431
432 1.
433
434
435 translate the character sequences into UTF-8 (IETF RFC 2279)
2 perry 436 - see utf-8(7) - and then
1 perry 437
438
439 2.
440
441
442 use the URI escaping mechanism, that is, use the %HH
443 encoding for unsafe octets.
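The two steps above can be sketched with quote from Python's urllib.parse, which encodes text as UTF-8 and then writes each unsafe octet as %HH. The set of punctuation left unescaped is taken from the unreserved list in this section; the function name and sample strings are assumptions of this example.

```python
from urllib.parse import quote, unquote

# Unreserved punctuation from the list above that quote() would otherwise
# escape ('-', '_', '.' and '~' are already left alone by quote()).
UNRESERVED_MARKS = "!*'()"


def escape_component(text: str) -> str:
    """Step 1: encode as UTF-8. Step 2: write unsafe octets as %HH."""
    return quote(text, safe=UNRESERVED_MARKS, encoding="utf-8")


if __name__ == "__main__":
    escaped = escape_component("naïve file.txt")
    print(escaped)           # na%C3%AFve%20file.txt
    print(unquote(escaped))  # naïve file.txt
```

Reserved characters such as '&' and '=' are escaped as well, which is what is wanted when the text is data rather than URI syntax.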
444 !!WRITING A URI
445
446
447 When written, URIs should be placed inside doublequotes
448 (e.g.,
449 never__ move extraneous punctuation
450 (such as the period ending a sentence or the comma in a
451 list) inside a URI, since this will change the value of the
452 URI. Use angle brackets instead, or switch to a
453 quoting system that never includes extraneous characters
454 inside quotation marks. This latter system, called the 'new'
455 or 'logical' quoting system by
456 __
457
458
459 The URI syntax was designed to be unambiguous. However, as
460 URIs have become commonplace, traditional media (television,
461 radio, newspapers, billboards, etc.) have increasingly used
462 abbreviated URI references consisting of only the authority
463 and path portions of the identified resource (e.g.,
464 !!NOTES
465
466
467 Any tool accepting URIs (e.g., a web browser) on a Linux
468 system should be able to handle (directly or indirectly) all
469 of the schemes described here, including the man: and info:
470 schemes. Handling them by invoking some other program is
471 fine and in fact encouraged.
472
473
474 Technically the fragment isn't part of the URI.
475
476
477 For information on how to embed URIs (including URLs) in a
478 data format, see documentation on that format. HTML uses the
479 format uri''''text''
480 ''uri''}.
481 Man and mdoc have the recently-added UR macro, or just
482 include the URI in the text (viewers should be able to
483 detect :// as part of a URI).
484
485
486 The GNOME and KDE desktop environments currently vary in the
487 URIs they accept, in particular in their respective help
488 browsers. To list man pages, GNOME uses
489 !!SECURITY
490
491
492 A URI does not in itself pose a security threat. There is no
493 general guarantee that a URL, which at one time located a
494 given resource, will continue to do so. Nor is there any
495 guarantee that a URL will not locate a different resource at
496 some later point in time; such a guarantee can only be
497 obtained from the person(s) controlling that namespace and
498 the resource in question.
499
500
501 It is sometimes possible to construct a URL such that an
502 attempt to perform a seemingly harmless operation, such as
503 the retrieval of an entity associated with the resource,
504 will in fact cause a possibly damaging remote operation to
505 occur. The unsafe URL is typically constructed by specifying
506 a port number other than that reserved for the network
507 protocol in question. The client unwittingly contacts a site
508 that is in fact running a different protocol. The content of
509 the URL contains instructions that, when interpreted
510 according to this other protocol, cause an unexpected
511 operation. An example has been the use of a gopher URL to
512 cause an unintended or impersonating message to be sent via
513 an SMTP server.
514
515
516 Caution should be used when using any URL that specifies a
517 port number other than the default for the protocol,
518 especially when it is a number within the reserved
519 space.
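One way a client might apply this caution is to flag URLs whose port differs from the scheme's default. The sketch below uses only the default ports stated earlier in this page; the table and function name are assumptions of the example, and a real client would need a fuller policy.

```python
from urllib.parse import urlsplit

# Default ports as given in the scheme descriptions above.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70, "ldap": 389, "wais": 210}


def has_unusual_port(url: str) -> bool:
    """Return True if the URL names a port other than its scheme's default.

    URLs without an explicit port, or with an unknown scheme, return False.
    """
    parts = urlsplit(url)
    default = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is None or default is None:
        return False
    return parts.port != default


if __name__ == "__main__":
    # Hypothetical gopher URL aimed at the SMTP port.
    print(has_unusual_port("gopher://example.com:25/"))
    print(has_unusual_port("http://example.com:80/"))
```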
520
521
522 Care should be taken when a URI contains escaped delimiters
523 for a given protocol (for example, CR and LF characters for
524 telnet protocols) that these are not unescaped before
525 transmission. This might violate the protocol, but avoids
526 the potential for such characters to be used to simulate an
527 extra operation or parameter in that protocol, which might
528 cause an unexpected and possibly harmful remote
529 operation.
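A tool could, for instance, detect escaped CR and LF before deciding how to handle a URI. This is a minimal sketch of such a check, not a complete defence; the function name and sample URIs are hypothetical.

```python
import re

# %0D (CR) and %0A (LF), in either letter case.
ESCAPED_LINE_BREAK = re.compile(r"%0[ad]", re.IGNORECASE)


def contains_escaped_line_break(uri: str) -> bool:
    """Return True if the URI contains an escaped CR or LF."""
    return ESCAPED_LINE_BREAK.search(uri) is not None


if __name__ == "__main__":
    print(contains_escaped_line_break("gopher://example.com/1abc%0D%0Adef"))
    print(contains_escaped_line_break("gopher://example.com/1abc"))
```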
530
531
532 It is clearly unwise to use a URI that contains a password
533 which is intended to be secret. In particular, the use of a
534 password within the 'userinfo' component of a URI is
535 strongly discouraged except in those rare cases where the
536 'password' parameter is intended to be public.
537 !!CONFORMING TO
538
539
540 IETF RFC 2396, HTML 4.0.
541 !!BUGS
542
543
544 Documentation may be placed in a variety of locations, so
545 there currently isn't a good URI scheme for general online
546 documentation in arbitrary formats. References of the form
547
548
549 Many programs and file formats don't include a way to
550 incorporate or implement links using URIs.
551
552
553 Many programs can't handle all of these different URI
554 formats; there should be a standard mechanism to load an
555 arbitrary URI that automatically detects the user's
556 environment (e.g., text or graphics, desktop environment,
557 local user preferences, and currently-executing tools) and
558 invokes the right tool for any URI.
559 !!AUTHOR
560
561
562 David A. Wheeler (dwheeler@dwheeler.com) wrote this man
563 page.
564 !!SEE ALSO
565
566
2 perry 567 lynx(1), mailaddr(7), utf-8(7),
568 man2html(1), IETF RFC 2255.
1 perry 569 ----
This page is a man page (or other imported legacy content). We are unable to automatically determine the license status of this page.