Web Scraping Practice: Romance of the Three Kingdoms

Requirement:
Scrape every chapter title and the full chapter text of the novel Romance of the Three Kingdoms (http://mathfunc.com/book/sanguoyanyi.html).

Script:

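import requests
from bs4 import BeautifulSoup

url = "http://mathfunc.com/book/sanguoyanyi.html"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36 Edg/88.0.705.74"}

# Fetch the table-of-contents page. Using the encoding detected from the
# response body helps avoid garbled Chinese text.
response = requests.get(url, headers=headers, timeout=10)
response.encoding = response.apparent_encoding
soup = BeautifulSoup(response.text, "lxml")
# Each <li> under .book-mulu links to one chapter.
li_list = soup.select(".book-mulu>ul>li")

# The with-block closes the file even if a request fails midway.
with open("./三国演义.txt", "w", encoding="utf-8") as file:
    for li in li_list:
        title = li.a.get_text(strip=True)
        detail_url = "http://mathfunc.com" + li.a["href"]
        detail_response = requests.get(detail_url, headers=headers, timeout=10)
        detail_response.encoding = detail_response.apparent_encoding
        detail_soup = BeautifulSoup(detail_response.text, "lxml")
        div_tag = detail_soup.find("div", class_="chapter_content")
        if div_tag is None:
            # Skip chapters whose page does not have the expected layout.
            print(title, "skipped: chapter content not found")
            continue
        file.write(title + ":" + div_tag.text + "\n")
        print(title, "scraped successfully!")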
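
One fragile spot in the script is building the chapter URL by string concatenation, which only works if every href is a root-relative path. A minimal sketch of a more tolerant approach uses urljoin from the standard library. The helper name chapter_url and the sample href values below are made up for illustration; the actual href format on the site is not confirmed here.

```python
from urllib.parse import urljoin

# Table-of-contents page from the script above.
INDEX_URL = "http://mathfunc.com/book/sanguoyanyi.html"


def chapter_url(href, base=INDEX_URL):
    """Resolve a chapter link against the index page URL.

    Handles root-relative, page-relative, and absolute hrefs alike.
    (chapter_url is a hypothetical helper, not part of the original script.)
    """
    return urljoin(base, href)


if __name__ == "__main__":
    # Hypothetical href values, for illustration only.
    for href in ("/book/sanguoyanyi/1.html",
                 "sanguoyanyi/1.html",
                 "http://mathfunc.com/book/sanguoyanyi/1.html"):
        print(chapter_url(href))
```

In the loop, detail_url could then be computed as chapter_url(li.a["href"]) instead of concatenating strings.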

Result: