Other Libraries Like Beautiful Soup
I've found several other parsers for various languages that can
handle bad markup, do tree traversal for you, or are otherwise more
useful than your average parser.
Here is a brief rundown of them; one may turn out to be useful to you.
- I've ported Beautiful Soup to Ruby. The result is Rubyful Soup.
- Hpricot is giving Rubyful Soup a run for its money.
- ElementTree is a fast Python XML parser with a bad attitude. I love it.
- Tag Soup is an XML/HTML parser written in Java that rewrites bad
HTML into parseable HTML.
- HtmlPrag is a Scheme library for parsing bad HTML.
- xmltramp is a nice take on a 'standard' XML/XHTML parser. Like most
parsers, it makes you traverse the tree yourself, but it's easy to use.
- pullparser includes a tree-traversal method.
- Mike Foord didn't like the way Beautiful Soup can change HTML if
you write the tree back out, so he wrote HTML Scraper. It's basically
a version of HTMLParser that can handle bad HTML. It might be obsolete
with the release of Beautiful Soup 3.0, though; I'm not sure.
- Ka-Ping Yee's scrape.py combines page scraping with URL opening.
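
To give a feel for what "traversing the tree yourself" means with one of the parsers above, here is a minimal sketch using ElementTree as it ships in the Python standard library (xml.etree.ElementTree). The sample markup and the helper name collect_text_by_tag are made up for illustration. Note that ElementTree expects well-formed XML, so unlike Beautiful Soup it raises an error on bad markup rather than repairing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not from any real page).
SAMPLE = """<html><body>
<p class="intro">Hello, <b>world</b>.</p>
<p>Second paragraph.</p>
</body></html>"""


def collect_text_by_tag(xml_source, tag):
    """Return the text content of every element named `tag`.

    This is an illustrative helper, not part of ElementTree itself.
    """
    root = ET.fromstring(xml_source)
    # root.iter(tag) walks the whole tree in document order;
    # itertext() yields the text of an element and its descendants.
    return ["".join(el.itertext()).strip() for el in root.iter(tag)]


if __name__ == "__main__":
    print(collect_text_by_tag(SAMPLE, "p"))
    try:
        # Bad markup is rejected rather than cleaned up.
        ET.fromstring("<p>unclosed paragraph")
    except ET.ParseError as err:
        print("ElementTree refused the bad markup:", err)
```

If the input is real-world HTML rather than clean XML, a parser that tolerates bad markup (Beautiful Soup, or one of the alternatives listed above) is the safer choice.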