Requirement:
Scrape every chapter title and the full chapter text of the novel Romance of the Three Kingdoms (http://mathfunc.com/book/sanguoyanyi.html)

Script:

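import requests
from bs4 import BeautifulSoup

url = "http://mathfunc.com/book/sanguoyanyi.html"
# Send a browser-like User-Agent with every request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36 Edg/88.0.705.74"}

# Fetch the table-of-contents page and parse it
page_text = requests.get(url, headers=headers).text
soup = BeautifulSoup(page_text, "lxml")
# Each <li> under .book-mulu > ul holds the link to one chapter
li_list = soup.select(".book-mulu>ul>li")

# The with block closes the file even if a request or parse step fails
with open("./三国演义.txt", "w", encoding="utf-8") as file:
    for li in li_list:
        title = li.a.string
        # The script assumes href is a site-relative path, so it prepends the host
        detail_url = "http://mathfunc.com" + li.a["href"]
        detail_text = requests.get(detail_url, headers=headers).text
        detail_soup = BeautifulSoup(detail_text, "lxml")
        # The chapter body is read from <div class="chapter_content">
        div_tag = detail_soup.find("div", class_="chapter_content")
        content = div_tag.text
        # One chapter per record: "title:content"
        file.write(title + ":" + content + "\n")
        print(title, "scraped successfully!")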
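
Note on building the chapter URL:
The script builds each chapter URL by concatenating the host with the href, which only works if every href starts with "/". A possible alternative, shown here as a minimal sketch and not as the site's required approach, is urljoin from the standard library, which resolves site-relative, page-relative, and absolute hrefs against the table-of-contents URL. The helper name build_detail_url and the sample href values are hypothetical; the real hrefs on the page may look different.

```python
from urllib.parse import urljoin

# Table-of-contents URL taken from the script above
BASE_URL = "http://mathfunc.com/book/sanguoyanyi.html"


def build_detail_url(href, base_url=BASE_URL):
    """Resolve a chapter href against the table-of-contents URL.

    build_detail_url is a hypothetical helper for illustration only.
    """
    return urljoin(base_url, href)


if __name__ == "__main__":
    # Hypothetical href values, used only to show how each form resolves
    for sample_href in ("/book/sanguoyanyi/1.html", "sanguoyanyi/2.html"):
        print(build_detail_url(sample_href))
```

Inside the loop, this would replace the string concatenation with detail_url = build_detail_url(li.a["href"]).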


Scraping result: