     * Topic: XML Parser Benchmarks (Part 2) - Parsers tested: LIBXML2, Java 1.5, Apache AXIOM, DOM4J, JDOM, Oracle XDK
       Topic category: XML tools and development environments
     admin (W3China webmaster)


    By Matthias Farwick, Michael Hafner
    May 16, 2007
    In [URL=http://www.xml.com/pub/a/2007/05/09/xml-parser-benchmarks-part-1.html]part 1[/URL] of this series we showed you the results of our event-driven parser benchmarks. Those benchmarks showed that the LIBXML2 SAX-like parser in C was superior to the other tested parsers. The two Java pull-parser implementations, Javolution and Woodstox, followed in second place.

    In this part of the series we show how the object model parsers performed in our tests. Object model parsers read in the data by using event parsers. The object model parser benchmarks were of special interest for our high-performance web service security gateway, because most web services security operations require that at least the header of a SOAP message be read and altered. This in-memory altering can only be done by object model parsers such as DOM implementations. The results for the AXIOM implementations are also very interesting in this context. They use a pull-parser to build the in-memory representation of an XML document only up to the last node that needs to be read or altered. This has the advantage that the whole document does not have to be read into memory.
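
    The deferred-building idea described above can be sketched with Python's standard xml.dom.pulldom module. This is only a stand-in for illustration: pulldom is not one of the tested parsers and is not AXIOM's API. The sample message, its element names, and the read_header helper are all hypothetical.

```python
from xml.dom import pulldom

# Hypothetical SOAP-like message; the element names are illustrative only.
MESSAGE = (
    "<Envelope>"
    "<Header><Security><Token>abc123</Token></Security></Header>"
    "<Body><Payload>large content would follow here</Payload></Body>"
    "</Envelope>"
)

def read_header(xml_text):
    """Return a DOM subtree for <Header>, or None if no such element exists."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "Header":
            # Pull just enough events to complete this one subtree.
            events.expandNode(node)
            return node
    return None

header = read_header(MESSAGE)
token = header.getElementsByTagName("Token")[0].firstChild.data
print(token)
```

    The helper returns as soon as the header subtree is complete, so no tree is ever built for the body. A DOM parser would have built nodes for the whole message before the header could be inspected.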

    The test setup is the same as in Part 1 of this series, except that the AXIOM benchmark in C was compiled with the Microsoft C/C++ compiler. For each parser, the throughput in documents per second is measured.
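
    The article does not show the measurement harness. As a rough sketch of what "documents per second" means, assuming a simple timed loop, the following uses Python's xml.dom.minidom as a stand-in parser. The function name documents_per_second, the min_seconds parameter, and the sample document are assumptions made for this example.

```python
import time
from xml.dom import minidom

def documents_per_second(parse, document, min_seconds=0.2):
    """Parse `document` repeatedly for at least `min_seconds`; return docs/s."""
    count = 0
    start = time.perf_counter()
    while True:
        parse(document)
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return count / elapsed

# Hypothetical test document, not one of the files used in the article.
SAMPLE = "<root>" + "<item>x</item>" * 100 + "</root>"
rate = documents_per_second(minidom.parseString, SAMPLE)
print(f"{rate:.0f} documents/s")
```

    A real benchmark would also need warm-up runs and repeated measurements, which this sketch leaves out.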

    The following list shows all tested object model parsers.

    The Tested Object Model Parsers
    [URL=http://www.xmlsoft.org/]LIBXML2[/URL] Tree 2.6.27 (C)
    LIBXML2 Tree is a DOM-like XML parser. It uses the LIBXML2 SAX-like implementation to read in the XML data.
    [URL=http://java.sun.com/j2se/1.5.0/docs/guide/xml/jaxp/index.html]Java 1.5[/URL] Default DOM (Java)
    The default DOM implementation in Java 1.5. Uses the default SAX implementation to read in the documents.
    [URL=http://ws.apache.org/commons/axiom/]Apache AXIOM[/URL] Java 1.1.2, C 0.96 (Java and C)
    AXIOM is an XML object model by Apache. It was developed for Apache's web service engine AXIS2, but it is being continued as a separate project. There are currently a Java and a C version of the parser. The Java version uses the Woodstox StAX parser to read in the documents. The C version uses the LIBXML2 stream pull-parser. As already mentioned, AXIOM has the advantage of building the document tree in memory only up to the last node whose data is needed. The whole tree therefore only has to be built when data at the end of the document must be read or altered. The C implementation is currently at version 0.96 and therefore cannot be considered fully stable.
    [URL=http://www.dom4j.org/]DOM4J[/URL] 1.6.1 (Java)
    DOM4J is an object model parser whose API was specially built for convenient use in the object oriented context of Java.
    [URL=http://www.jdom.org/]JDOM[/URL] 1.0 (Java)
    Like DOM4J, this parser was built out of the need for an API that is more convenient to use in an object-oriented context than the W3C DOM specification.
    [URL=http://www.oracle.com/technology/tech/xml/xdkhome.html]Oracle XDK[/URL] DOM implementation (C)
    This parser, part of the XDK (XML Development Kit) by Oracle, implements the W3C DOM specification.
    Object Model Parser Benchmarks
    The following benchmarks show the results for the tested parsers that build a document model in memory. In these benchmarks AXIOM cannot exploit its advantage, because in all tests the whole document is processed.
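
    To make "the whole document is processed" concrete, here is a minimal sketch of a complete walk over an in-memory tree, again using Python's xml.dom.minidom as a stand-in rather than any of the tested parsers. The count_nodes helper and the tiny sample document are hypothetical.

```python
from xml.dom import minidom

def count_nodes(node):
    """Visit every node below `node` (elements and text) and count them."""
    total = 1
    for child in node.childNodes:
        total += count_nodes(child)
    return total

# Hypothetical sample document.
doc = minidom.parseString("<root><a>1</a><b><c/></b></root>")
total = count_nodes(doc.documentElement)
print(total)
```

    Because the walk touches every node, a parser that builds its tree on demand ends up building all of it, which is why deferred building brings no benefit in this kind of test.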
    Figure 1: Benchmark results for the object model parsers for small documents


    Figure 1 shows that LIBXML2 is much faster than all other implementations for these three small document sizes. The two AXIOM parsers perform well for very small documents, since they do not seem to incur the same overhead as the DOM parsers. The Java 1.5 default DOM parser is the fastest of the three Java DOM parsers, closely followed by JDOM and dom4j. The Oracle DOM parser seems to have a significant overhead for each document it reads, since it shows the worst performance for small documents.


    Figure 2: Benchmark results for object model parsers with medium-sized documents



    In the next benchmark for medium-sized documents (Figure 2) LIBXML2 is still ahead of the others for documents up to 455 KB. The Oracle DOM implementation does better as the documents get bigger and catches up to LIBXML2 for documents around 455 KB in size. Both AXIOM implementations do worse with increasing document size. Of the three Java object model parsers the Java 1.5 default DOM parser is always ahead of dom4j, and dom4j always ahead of JDOM.


    Figure 3: Benchmark results for object model parsers with large documents


    Figure 3 shows that the AXIOM implementations do significantly worse than all other implementations for large documents. For the 4 MB document the C implementation of AXIOM shows a significant performance drop. LIBXML2 loses its leading position for these document sizes and is overtaken by Java 1.5 DOM, the Oracle parser, and dom4j for the 4 MB files.

    Partial Document Parsing Benchmark
    In the previous benchmarks we tested a complete walk through the documents, in which the AXIOM implementations could not exploit their advantage of building the object tree only up to the last requested node. In the following benchmarks we requested only the first 67 elements of each document. This corresponds, for example, to the use case of checking only the header of a SOAP message for its contents.
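
    The two ways of getting at the first 67 elements can be contrasted in a short sketch. It uses Python's standard library as a stand-in: xml.etree.ElementTree.iterparse plays the role of a pull-based reader that stops early, and xml.dom.minidom plays the role of a parser that builds the full tree first. The function names and the sample document are assumptions made for this example, not the article's test code.

```python
import io
import itertools
import xml.etree.ElementTree as ET
from xml.dom import minidom

N = 67  # number of leading elements requested, as in the benchmark

def first_tags_streaming(data, n=N):
    """Pull start-of-element events and stop after the first n elements."""
    events = ET.iterparse(io.BytesIO(data), events=("start",))
    return [elem.tag for _, elem in itertools.islice(events, n)]

def first_tags_full_tree(data, n=N):
    """Build the complete tree first, then look at the first n elements."""
    doc = minidom.parseString(data)
    return [e.tagName for e in doc.getElementsByTagName("*")[:n]]

# Hypothetical sample document with far more than 67 elements.
SAMPLE = b"<root>" + b"<item>x</item>" * 500 + b"</root>"
streamed = first_tags_streaming(SAMPLE)
full = first_tags_full_tree(SAMPLE)
print(len(streamed), streamed == full)
```

    Both functions return the same element names, but the work done by the second one grows with the size of the document while the first stops asking for events once it has what it needs.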

    Figure 4: Benchmark results for the reading of only the first 67 elements in small documents


    In Figure 4 we can see that the AXIOM implementations gain no advantage for small documents up to 5 KB in size. From the 13.5 KB files onward, both implementations beat LIBXML2 and Java DOM.


    Figure 5: Benchmark results for the reading of only the first 67 elements in large documents


    In Figure 5 you can see that the two AXIOM implementations show the same performance for all document sizes, which is expected since they only need to read in the first 67 elements. The other parsers perform worse with growing document size because they need to build the whole document tree before they can walk through the elements.

    Conclusions
    Based on the benchmarks presented above, LIBXML2 can be considered the overall performance winner among the object model parsers. It not only performs much better than all other parsers on documents up to 500 KB in size, but also beats the two AXIOM implementations on documents up to 5 KB when only the first part of the document is read. It does especially well on very small documents of about 1 KB, where it is up to 10 times faster than the other implementations. For really big documents above 500 KB, the default Java 1.5 DOM parser and the Oracle DOM parser in C are alternatives.
    But as the partial document parsing benchmarks show, you should evaluate which XML processing use cases you will perform most often. If in most cases you only need to alter parts at the beginning of an XML document, you should consider using the Java AXIOM implementation. Because the AXIOM implementation in C is still at version 0.96 and shows a significant performance drop for large documents, we recommend waiting for future releases of that parser. dom4j does slightly worse than the Java 1.5 default DOM implementation but has a more convenient API.
    Of course, development time also plays a significant role in deciding which parser to choose. With all tested C parsers you have to be very careful not to produce memory leaks, which slows down development. The JDOM and dom4j APIs, on the other hand, are especially convenient to use.

    Together with other benchmarks we performed on security operations such as encryption and signature, the benchmarks in this article made us confident in using the LIBXML2 parser in C, together with C security libraries, for our high-performance web service security stack. The C libraries also use less memory than a full-fledged JVM, which is an advantage on the small security appliances that we want to use.

    Additional Resources

    [URL=http://www-128.ibm.com/developerworks/java/library/ws-java2/index.html]Java Web services, Part 2: Digging into Axis2: AXIOM[/URL] by Dennis Sosnoski

    Sun's XMLTest XML parser benchmark tool

    [URL=http://xmlbench.sourceforge.net/]xmlbench[/URL], an XML parser benchmark tool for C parsers


    Posted: 2007/5/17 9:57:00
     
     emmali808
    I get it! Thanks!
    Posted: 2008/3/12 21:08:00
     