If you have a reference to an element, you can use extract to remove it from the tree.
The following code removes all comments from a document:
from BeautifulSoup import BeautifulSoup, Comment
soup = BeautifulSoup("""1<!--The loneliest number-->
<a>2<!--Can be as bad as one--><b>3""")
# Collect every text node that is a Comment, then detach each one.
comments = soup.findAll(text=lambda text: isinstance(text, Comment))
for comment in comments:
    comment.extract()
print soup
# 1
# <a>2<b>3</b></a>
This code removes an entire subtree from the document:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup("<a1></a1><a><b>Amazing content<c><d></a><a2></a2>")
soup.a1.nextSibling
# <a><b>Amazing content<c><d></d></c></b></a>
soup.a2.previousSibling
# <a><b>Amazing content<c><d></d></c></b></a>

subtree = soup.a
subtree.extract()

print soup
# <a1></a1><a2></a2>
soup.a1.nextSibling
# <a2></a2>
soup.a2.previousSibling
# <a1></a1>
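The extract method splits one parse tree into two disjoint trees. The navigation members are updated as well, so it looks as though the two trees had never been joined: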
soup.a1.nextSibling
# <a2></a2>
soup.a2.previousSibling
# <a1></a1>
subtree.previousSibling is None
# True
subtree.parent is None
# True
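To make the relinking described above concrete, here is a minimal toy model of a tree node with an extract method. It is a sketch only, not Beautiful Soup's actual implementation: the Node class, its append helper, and the contents list are hypothetical names chosen for illustration, and the sketch only models the parent and sibling links shown in the examples above.

```python
class Node(object):
    """Toy tree node for illustration; NOT Beautiful Soup's real classes."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.previousSibling = None
        self.nextSibling = None
        self.contents = []

    def append(self, child):
        """Attach child as the last entry of self.contents."""
        if self.contents:
            last = self.contents[-1]
            last.nextSibling = child
            child.previousSibling = last
        child.parent = self
        self.contents.append(child)

    def extract(self):
        """Detach this node, with its own subtree, from parent and siblings."""
        if self.parent is not None:
            # Remove by identity so equal-looking nodes are left alone.
            self.parent.contents = [
                c for c in self.parent.contents if c is not self
            ]
        # Stitch the former neighbours together across the gap.
        if self.previousSibling is not None:
            self.previousSibling.nextSibling = self.nextSibling
        if self.nextSibling is not None:
            self.nextSibling.previousSibling = self.previousSibling
        # The detached node becomes the root of its own tree.
        self.parent = None
        self.previousSibling = None
        self.nextSibling = None
        return self


# Build root -> [a1, a -> [b], a2], mirroring the example above.
root = Node("root")
a1, a, a2 = Node("a1"), Node("a"), Node("a2")
for child in (a1, a, a2):
    root.append(child)
b = Node("b")
a.append(b)

subtree = a.extract()
```

After the call, a1 and a2 point at each other, while subtree keeps its own child b but no longer has a parent or siblings, which matches the behaviour shown in the examples above.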