Libraries Similar to Beautiful Soup

I've found several other parsers, written for various languages, that can handle bad markup, do tree traversal for you, or are otherwise more useful than your average parser. Here is a brief introduction to each; one of them may suit your needs.

  • I've ported Beautiful Soup to Ruby. The result is Rubyful Soup.
  • Hpricot is giving Rubyful Soup a run for its money.
  • ElementTree is a fast Python XML parser with a bad attitude. I love it.
  • Tag Soup is an XML/HTML parser written in Java which rewrites bad HTML into parseable HTML.
  • HtmlPrag is a Scheme library for parsing bad HTML.
  • xmltramp is a nice take on a 'standard' XML/XHTML parser. Like most parsers, it makes you traverse the tree yourself, but it's easy to use.
  • pullparser includes a tree-traversal method.
  • Mike Foord didn't like the way Beautiful Soup can change HTML if you write the tree back out, so he wrote HTML Scraper. It's basically a version of HTMLParser that can handle bad HTML. It might be obsolete with the release of Beautiful Soup 3.0, though; I'm not sure.
  • Ka-Ping Yee's scrape.py combines page scraping with URL opening.
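
To give a rough sense of what tree traversal looks like in one of these libraries, here is a minimal sketch using ElementTree, which ships in the Python standard library as xml.etree.ElementTree. The sample markup and the collect_links helper are invented for illustration and are not taken from any of the projects above. Note that ElementTree expects well-formed XML, so unlike Beautiful Soup it raises an error on bad markup instead of repairing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
# It must be well-formed XML: ElementTree will not repair bad markup.
SAMPLE = (
    "<html><body>"
    "<p>One <a href='/a'>first</a></p>"
    "<p><a href='/b'>second</a></p>"
    "</body></html>"
)


def collect_links(markup):
    """Return (href, text) pairs for every <a> element, in document order."""
    root = ET.fromstring(markup)
    # iter() walks the whole tree depth-first, so no manual recursion is needed.
    return [(a.get("href"), a.text) for a in root.iter("a")]


links = collect_links(SAMPLE)
for href, text in links:
    print(href, text)
```

Feeding the same helper an unclosed tag, such as "<p>unclosed", raises ET.ParseError, which is the practical difference from a parser designed for bad HTML.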