Chinese XML Forum -- [Original] On URI references, fragment identifiers, and why an RDF namespace URI always ends with a "#"

Terminology:

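URI (Uniform Resource Identifier): first of all it is an identifier, one that identifies a resource, and it is uniform, meaning that everyone follows the same conventions when using it. Note that an identifier does not have to identify a resource by its network address; it can just as well identify the resource uniquely by an internationally agreed name (an ISBN, for example).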

URL (Uniform Resource Locator): a subset of URI. A URL is a URI that identifies a resource through the mechanism used to access it.

The syntax of a URI is

<scheme>:<scheme-specific-part>

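It has two parts. The first names the scheme (the approach the identification mechanism takes); the rest depends on that scheme (ftp, http, mailto and so on are all existing URI schemes). Different URI schemes therefore have different URI syntax. Some examples: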

{{ // extracted from section 1.3 of RFC 2396

ftp://ftp.is.co.za/rfc/rfc1808.txt
-- ftp scheme for File Transfer Protocol services

gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
-- gopher scheme for Gopher and Gopher+ Protocol services

http://www.math.uio.no/faq/compression-faq/part1.html
-- http scheme for Hypertext Transfer Protocol services

mailto:mduerst@ifi.unizh.ch
-- mailto scheme for electronic mail addresses

news:comp.infosystems.www.servers.unix
-- news scheme for USENET news groups and articles

telnet://melvyl.ucop.edu/
-- telnet scheme for interactive services via the TELNET Protocol
}}
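
As a rough illustration of the <scheme>:<scheme-specific-part> split, the Python sketch below cuts a few of the RFC examples at the first colon. The helper name split_scheme is made up for this post; it does no validation and is not a full URI parser.

```python
def split_scheme(uri):
    """Split a URI at the first ':' into (scheme, scheme-specific-part).

    Hypothetical helper for illustration only; it does not validate the URI.
    """
    scheme, _, rest = uri.partition(":")
    return scheme, rest


examples = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
]

for uri in examples:
    scheme, rest = split_scheme(uri)
    print(f"{scheme:7} {rest}")
```

The part after the colon looks quite different for ftp, mailto and news, which is the point: only the scheme prefix is common to every URI.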

generic URI: although URIs have no single syntax, a subset of them share a common one; a URI with that syntax is called a "generic URI".

generic URI = <scheme>://<authority><path>?<query>
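
To see the four generic components on a concrete URI, here is a minimal sketch using urlsplit from Python's standard library. The URI is one of the RFC 2396 samples; the query string "topic=zip" is invented for this example so that every component is non-empty.

```python
from urllib.parse import urlsplit

# "?topic=zip" is an invented query; the rest is an RFC 2396 sample URI.
parts = urlsplit("http://www.math.uio.no/faq/compression-faq/part1.html?topic=zip")

print(parts.scheme)  # http
print(parts.netloc)  # www.math.uio.no  (the <authority>)
print(parts.path)    # /faq/compression-faq/part1.html
print(parts.query)   # topic=zip
```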

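absolute URI: all the URIs shown above are absolute; each is written out starting from <scheme>.

relative URI: a URI that omits <scheme>, and possibly <authority> and part of <path> as well. A relative URI is meaningful only within the context in which it appears.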

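Base URI: the contextual information (usually itself a URI) that gives a relative URI its universal meaning. When resolving a relative URI to an absolute URI (roughly speaking, by combining the base URI with the relative URI), the key step is determining what the base URI is. The base URI is determined as follows, in order of precedence:

1. The base URI is specified explicitly in the document content (for example through the xml:base attribute in XML, or the <BASE> element in HTML).

2. The base URI is that of the encapsulating entity (for example, a message or another document that encloses the document containing the relative URI).

3. The base URI is the URI used to retrieve the current entity (for example the URL typed into the browser's address bar).

4. The base URI is defined by the specific application.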
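
The precedence order above can be sketched as a simple "first one that is present wins" lookup. This is only an illustration: pick_base_uri and its four parameter names are hypothetical labels for rules 1 to 4, not part of any library.

```python
def pick_base_uri(embedded=None, encapsulating=None, retrieval=None,
                  app_default=None):
    """Return the first available base URI, in the precedence order above.

    Hypothetical helper: the parameters stand for rules 1, 2, 3 and 4.
    """
    for candidate in (embedded, encapsulating, retrieval, app_default):
        if candidate:
            return candidate
    raise ValueError("no base URI available")


# Only the retrieval URI is known, so rule 3 applies.
print(pick_base_uri(retrieval="http://www.w3china.org/demo/a.htm"))

# An embedded base (rule 1) takes precedence over the retrieval URI.
print(pick_base_uri(embedded="http://www.xml.org.cn/test/",
                    retrieval="http://www.w3china.org/demo/b.htm"))
```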

URI reference (URIref for short): a URI that may ("MAY" in the sense of [RFC2119], i.e. optionally) be followed by a "#" and a string (possibly empty).

URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

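fragment identifier: the string that follows the "#" above. The fragment identifier is used only by the user agent (a browser, for example), as a parameter that decides how to present the content retrieved through the URI; the server does not use it to decide what content to return. In other words, the fragment identifier is not part of the URI; how it is used is defined by each MIME type. Its role differs from one MIME type to another (HTML, XML, RDF and so on).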
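
A small sketch of that split, using urldefrag from Python's standard library: the URI part is what a user agent would request from the server, and the fragment stays on the client side. The sample URIref is the one used later in the c.htm case study.

```python
from urllib.parse import urldefrag

uriref = "http://www.w3china.org/demo/c.htm#section2"
uri, fragment = urldefrag(uriref)

print(uri)       # http://www.w3china.org/demo/c.htm  (sent to the server)
print(fragment)  # section2  (interpreted by the user agent)
```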

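same-document reference: a URIref that contains no URI, i.e. one that consists only of a fragment identifier, such as "#section2".

Note how [RFC2396] says such a URIref is to be handled: "#section2" inside a.htm resolves to the absolute form "a.htm#section2", without taking the base URI into account. (Many browsers, IE among them, nevertheless treat a same-document reference like any other URIref in their implementation, that is, they resolve it to an absolute URI against the base URI as described above.)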

{{ // 4.2. Same-document References, RFC 2396

   A URI reference that does not contain a URI is a reference to the
   current document.  In other words, an empty URI reference within a
   document is interpreted as a reference to the start of that document,
   and a reference containing only a fragment identifier is a reference
   to the identified fragment of that document.  Traversal of such a
   reference should not result in an additional retrieval action.
   However, if the URI reference occurs in a context that is always
   intended to result in a new request, as in the case of HTML's FORM
   element, then an empty URI reference represents the base URI of the
   current document and should be replaced by that URI when transformed
   into a request.

}}

============================================

Case Study One: Put the a.htm below under http://www.w3china.org/demo/ and open it through the URI http://www.w3china.org/demo/a.htm. Clicking the "Homepage" link takes the browser to http://www.w3china.org/demo/homepage.htm. Here the base URI is obtained by rule 3 above.

a.htm
---------
<HTML>
<HEAD>
</HEAD>
<BODY>
<A href="homepage.htm">Homepage</A>
</BODY>
</HTML>

Case Study Two: Put the b.htm below under http://www.w3china.org/demo/ and open it through the URI http://www.w3china.org/demo/b.htm. Clicking the "Homepage" link takes the browser to http://www.xml.org.cn/test/homepage.htm. Here the base URI is obtained by rule 1 above.

b.htm
---------
<HTML>
<HEAD>
<BASE href="http://www.xml.org.cn/test/">
</HEAD>
<BODY>
<A href="homepage.htm">Homepage</A>
</BODY>
</HTML>
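
A quick way to check how a relative reference such as homepage.htm combines with a base is urljoin from Python's standard library (a sketch, not a browser). Resolution replaces everything after the last "/" of the base path, so whether the base ends with a slash changes the result.

```python
from urllib.parse import urljoin

ref = "homepage.htm"

# Base taken from the retrieval URI (rule 3): the last segment is replaced.
print(urljoin("http://www.w3china.org/demo/b.htm", ref))
# http://www.w3china.org/demo/homepage.htm

# Explicit base with a trailing slash (rule 1): "test/" is kept.
print(urljoin("http://www.xml.org.cn/test/", ref))
# http://www.xml.org.cn/test/homepage.htm

# Same base without the trailing slash: "test" is treated as the last
# segment and replaced.
print(urljoin("http://www.xml.org.cn/test", ref))
# http://www.xml.org.cn/homepage.htm
```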

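Case Study Three: Put the a.htm below under http://www.w3china.org/demo/, but instead of opening it through the URI http://www.w3china.org/demo/a.htm, configure the subdomain test.w3china.org as a masked (hidden) URL redirect whose target is http://www.w3china.org/demo/a.htm (this requires configuring the DNS server for w3china.org), and open http://test.w3china.org/ in the browser. Clicking the "Homepage" link then takes the browser to http://test.w3china.org/homepage.htm. Here the base URI is obtained by rule 3 above.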

a.htm
---------
<HTML>
<HEAD>
</HEAD>
<BODY>
<A href="homepage.htm">Homepage</A>
</BODY>
</HTML>

Case Study Four: Replace a.htm in Case Study Three with b.htm and click "Homepage" again; the browser goes to http://www.xml.org.cn/test/homepage.htm. Here the base URI is obtained by rule 1 above.

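Case Study Five: (Note: this example only follows the reading of RFC 2396 and does not represent what happens in practice.) Put the c.htm below under http://www.w3china.org/demo/ and open it in a browser that conforms to RFC 2396. Clicking the "Section 2" link goes to http://www.w3china.org/demo/c.htm#section2, not to a URI under http://www.xml.org.cn/ built from the BASE href.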

c.htm
---------
<HTML>
<HEAD>
<BASE href="http://www.xml.org.cn/test/">
</HEAD>
<BODY>

.........

<A href="#section2">Section 2</A>

.........

<A name="section2">Section 2</A>
...........

</BODY>
</HTML>
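
The RFC 2396 reading of Case Study Five can be sketched as follows. resolve_rfc2396 is a hypothetical helper written for this post: an empty reference or a bare fragment refers to the current document and ignores the base URI, while anything else is resolved against the base with urljoin. The base is assumed here to end with a slash.

```python
from urllib.parse import urljoin


def resolve_rfc2396(ref, base_uri, document_uri):
    """Resolve `ref` roughly as RFC 2396 section 4.2 describes.

    Hypothetical helper: `document_uri` is the URI of the current document
    (without a fragment); `base_uri` is used only for other references.
    """
    if ref == "" or ref.startswith("#"):
        # Same-document reference: the base URI plays no role.
        return document_uri + ref
    return urljoin(base_uri, ref)


doc = "http://www.w3china.org/demo/c.htm"
base = "http://www.xml.org.cn/test/"  # assumed BASE href, with trailing slash

print(resolve_rfc2396("#section2", base, doc))
print(resolve_rfc2396("homepage.htm", base, doc))

# For comparison, urljoin alone resolves the bare fragment against the
# base, like the browsers mentioned earlier.
print(urljoin(base, "#section2"))
```

The first call stays on c.htm, the second goes through the BASE href, and the last line shows the base-relative behaviour that the RFC 2396 text rules out for same-document references.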

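Case Study Six: Make a small change to c.htm, replacing "#section2" with "./#section2", and put the c.htm below under http://www.w3china.org/demo/. Open it in a browser that conforms to RFC 2396. Because the reference now carries a non-empty path ("./"), it is no longer a same-document reference and is resolved against the base URI given by BASE: clicking the "Section 2" link goes to http://www.xml.org.cn/test/#section2, not to http://www.w3china.org/demo/c.htm#section2.

c.htm
---------
<HTML>
<HEAD>
<BASE href="http://www.xml.org.cn/test/">
</HEAD>
<BODY>

.........

<A href="./#section2">Section 2</A>

.........

<A name="section2">Section 2</A>
...........

</BODY>
</HTML>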
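
As a cross-check of Case Study Six, the sketch below feeds "./#section2" to Python's urljoin, which applies the generic relative-resolution rules. Since the reference has a path ("./"), it is resolved against the base like any other relative URI. Both spellings of the base are shown because the trailing slash changes the outcome.

```python
from urllib.parse import urljoin

ref = "./#section2"

# Base with a trailing slash: "./" resolves to the test/ directory.
with_slash = urljoin("http://www.xml.org.cn/test/", ref)
print(with_slash)     # http://www.xml.org.cn/test/#section2

# Base without a trailing slash: "test" is the last segment and is dropped.
without_slash = urljoin("http://www.xml.org.cn/test", ref)
print(without_slash)  # http://www.xml.org.cn/#section2
```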

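Case Study Seven: Put the a.rdf below under http://www.w3china.org/demo/. If another RDF document contains <rdf:Description rdf:about="http://www.w3china.org/demo/a.rdf#item10245">, then http://www.w3china.org/demo/a.rdf#item10245 identifies the thing itself, the product item10245, rather than a document or a part of a document (such as the content of the element <rdf:Description rdf:ID="item10245">). This differs from the meaning of a fragment identifier in HTML. See the references at the end for details.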

a.rdf
---------

..........

<rdf:Description rdf:ID="item10245">
<exterms:model rdf:datatype="&xsd;string">Overnighter</exterms:model>
<exterms:sleeps rdf:datatype="&xsd;integer">2</exterms:sleeps>
</rdf:Description>
.............
<rdf:Description rdf:about="#item10245">
<exterms:weight rdf:datatype="&xsd;decimal">2.4</exterms:weight>
<exterms:packedSize rdf:datatype="&xsd;integer">784</exterms:packedSize>
</rdf:Description>
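
Inside a.rdf, rdf:ID="item10245" and rdf:about="#item10245" end up naming the same URIref, because rdf:ID="x" is treated as the relative reference "#x" resolved against the document's base URI. A minimal sketch with urljoin, assuming a.rdf has no xml:base so that its base URI is the URI it is retrieved from:

```python
from urllib.parse import urljoin

# Assumed base URI of a.rdf: its retrieval URI (no xml:base in the file).
base = "http://www.w3china.org/demo/a.rdf"

from_id = urljoin(base, "#" + "item10245")  # rdf:ID="item10245"
from_about = urljoin(base, "#item10245")    # rdf:about="#item10245"

print(from_id)                # http://www.w3china.org/demo/a.rdf#item10245
print(from_id == from_about)  # True
```

This is why the two rdf:Description elements in a.rdf describe one and the same resource, and why another document can refer to it with the full URIref.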

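Case Study Eight: Look at the two files below. Why does every namespace URI in foo.rdf end with a "#", while the one in bar.xml does not?

As mentioned earlier, how "#" is interpreted depends on the MIME type. In RDF, the "#" indicates that the URIref identifies a thing rather than a web document. The "#" is added to state explicitly that the URIref identifies a thing, whether or not that thing can be retrieved over the network.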

foo.rdf
---------------
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:contact="http://www.w3.org/2000/10/swap/pim/contact#">

  <contact:Person rdf:about="http://www.w3.org/People/EM/contact#me">
    <contact:fullName>Eric Miller</contact:fullName>
    <contact:mailbox rdf:resource="mailto:em@w3.org"/>
    <contact:personalTitle>Dr.</contact:personalTitle>
  </contact:Person>

</rdf:RDF>

bar.xml
---------------------
<?xml version="1.0"?>

<html:html xmlns:html='http://www.w3.org/TR/REC-html40'>
  <html:head><html:title>Frobnostication</html:title></html:head>
  <html:body><html:p>Moved to
    <html:a href='http://frob.com'>here.</html:a></html:p></html:body>
</html:html>
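
One practical reason for the trailing "#" shows up when a prefixed name such as contact:fullName is expanded: RDF forms the full URIref by simply concatenating the namespace URI and the local name. The sketch below does that by hand; expand is a hypothetical helper, and the second namespace string is the same one with the "#" left off for comparison.

```python
from urllib.parse import urldefrag

CONTACT_NS = "http://www.w3.org/2000/10/swap/pim/contact#"
CONTACT_NS_NO_HASH = "http://www.w3.org/2000/10/swap/pim/contact"


def expand(namespace, local_name):
    """Expand a prefixed RDF name by plain string concatenation."""
    return namespace + local_name


full_name = expand(CONTACT_NS, "fullName")
print(full_name)
# http://www.w3.org/2000/10/swap/pim/contact#fullName

# The part before "#" is the vocabulary document; the fragment names the term.
print(urldefrag(full_name).url)
# http://www.w3.org/2000/10/swap/pim/contact

# Without the "#", the local name runs straight into the path.
print(expand(CONTACT_NS_NO_HASH, "fullName"))
# http://www.w3.org/2000/10/swap/pim/contactfullName
```

In bar.xml the namespace URI is never concatenated with anything; the element name stays a (namespace, local name) pair, so no separator is needed there.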

==================================

References:

How We Identify Things (on the Semantic Web)? (http://www.w3.org/2001/03/identification-problem/)
URI References: Fragment Identifiers on URIs (http://www.w3.org/DesignIssues/Fragment.html)
RFC 2396 (http://www.ietf.org/rfc/rfc2396.txt?number=2396)

{{ // extracted from RFC 2396

4. URI References

   The term "URI-reference" is used here to denote the common usage of a
   resource identifier.  A URI reference may be absolute or relative,
   and may have additional information attached in the form of a
   fragment identifier.  However, "the URI" that results from such a
   reference includes only the absolute URI after the fragment
   identifier (if any) is removed and after any relative URI is resolved
   to its absolute form.  Although it is possible to limit the
   discussion of URI syntax and semantics to that of the absolute
   result, most usage of URI is within general URI references, and it is
   impossible to obtain the URI from such a reference without also
   parsing the fragment and resolving the relative form.

URI-reference = [ absoluteURI | relativeURI ] [ "#" fragment ]

   The syntax for relative URI is a shortened form of that for absolute
   URI, where some prefix of the URI is missing and certain path
   components ("." and "..") have a special meaning when, and only when,
   interpreting a relative path.  The relative URI syntax is defined in
   Section 5.

4.1. Fragment Identifier

   When a URI reference is used to perform a retrieval action on the
   identified resource, the optional fragment identifier, separated from
   the URI by a crosshatch ("#") character, consists of additional
   reference information to be interpreted by the user agent after the
   retrieval action has been successfully completed.  As such, it is not
   part of a URI, but is often used in conjunction with a URI.

fragment = *uric

   The semantics of a fragment identifier is a property of the data
   resulting from a retrieval action, regardless of the type of URI used
   in the reference.  Therefore, the format and interpretation of
   fragment identifiers is dependent on the media type [RFC2046] of the
   retrieval result.  The character restrictions described in Section 2
   for URI also apply to the fragment in a URI-reference.  Individual
   media types may define additional restrictions or structure within
   the fragment for specifying different types of "partial views" that
   can be identified within that media type.

   A fragment identifier is only meaningful when a URI reference is
   intended for retrieval and the result of that retrieval is a document
   for which the identified fragment is consistently defined.

}}

Written in a hurry at orangebench's encouragement; errors and omissions are inevitable, and corrections are welcome.

