Accessing Web Pages with Python

2023-01-31 05:35


Python version: 3

Fetching a page:

import urllib.request

url="https://blog.csdn.net/qq_33160790"
req=urllib.request.Request(url)
resp=urllib.request.urlopen(req)
data=resp.read().decode('utf-8')

print(data)

Result: the HTML source of the page is printed to the console.
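
The fetch above assumes the page is UTF-8 and that the server accepts urllib's default request headers. A slightly more defensive sketch might look like the following. The helper names (build_request, decode_body, fetch), the User-Agent string, and the 10-second timeout are illustrative assumptions and do not come from the original post.

```python
import urllib.request

# Hypothetical User-Agent value; some sites reject urllib's default one.
USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher)"


def build_request(url):
    """Create a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def decode_body(raw, charset=None):
    """Decode bytes with the declared charset, falling back to UTF-8."""
    return raw.decode(charset or "utf-8", errors="replace")


def fetch(url, timeout=10):
    """Return the page text; raises urllib.error.URLError on failure."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # The charset comes from the Content-Type header, if the server sent one.
        charset = resp.headers.get_content_charset()
        return decode_body(resp.read(), charset)
```

Calling print(fetch("https://blog.csdn.net/qq_33160790")) would then behave like the snippet above, but with a timeout and without hard-coding the encoding.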

Extracting the article links from a CSDN page:
For an introduction to XPath syntax, see this article:
http://www.w3school.com.cn/xpath/xpath_syntax.asp

from lxml import etree
import requests

url = 'https://blog.csdn.net/qq_33160790'
resp = requests.get(url)
if resp.status_code == requests.codes.ok:
    # Parse the HTML and select the href of every article title link
    html = etree.HTML(resp.text)
    hrefs = html.xpath('//span[@class="link_title"]/a/@href')
    for href in hrefs:
        print(href)

Result: each article link found on the page is printed on its own line.
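
To see what the XPath expression selects without touching the network, it can be run against a small inline document. This is only a sketch: SAMPLE_HTML is made-up markup that mimics the span/a structure the expression expects, and extract_links is a hypothetical helper name. The real CSDN markup may differ.

```python
from lxml import etree

# Made-up HTML that mimics the structure the XPath expression expects.
SAMPLE_HTML = """
<html><body>
  <span class="link_title"><a href="/qq_33160790/article/details/1">First</a></span>
  <span class="link_title"><a href="/qq_33160790/article/details/2">Second</a></span>
  <span class="other"><a href="/ignored">Ignored</a></span>
</body></html>
"""


def extract_links(page_source):
    """Return the href of every <a> directly inside span.link_title."""
    html = etree.HTML(page_source)
    hrefs = html.xpath('//span[@class="link_title"]/a/@href')
    return [str(href) for href in hrefs]


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link)
```

Only the two links inside spans with class "link_title" are selected; the third link is skipped because its parent span has a different class.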

Printing the URLs of all articles:

from lxml import etree
import requests

# range(1, 23) visits list pages 1 to 22; the upper bound is the number of list pages + 1
for i in range(1, 23):
    url = 'https://blog.csdn.net/qq_33160790/article/list/' + str(i)
    resp = requests.get(url)
    if resp.status_code == requests.codes.ok:
        html = etree.HTML(resp.text)
        hrefs = html.xpath('//span[@class="link_title"]/a/@href')
        for href in hrefs:
            print(href)

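
The loop above hard-codes the number of list pages. One possible alternative is to keep requesting pages until one of them yields no article links. The sketch below rests on an unverified assumption: that a list page past the last one returns no matching links. The names page_url, collect_links, fetch_links, and FAKE_PAGES are hypothetical, and the page fetcher is passed in as a function so the logic can be tried offline.

```python
def page_url(base, page):
    """Build the URL of one article list page."""
    return base.rstrip("/") + "/article/list/" + str(page)


def collect_links(base, fetch_links, max_pages=100):
    """Walk list pages 1, 2, 3, ... until a page yields no links.

    fetch_links is any callable that takes a URL and returns a list of hrefs.
    max_pages is a safety cap so the loop always terminates.
    """
    seen = []
    for page in range(1, max_pages + 1):
        links = fetch_links(page_url(base, page))
        if not links:
            break  # assumption: an empty page means we are past the last one
        for link in links:
            if link not in seen:  # skip duplicates, keep first-seen order
                seen.append(link)
    return seen


# Offline stand-in for real HTTP requests: two pages, one duplicated link.
FAKE_PAGES = {
    "https://blog.csdn.net/qq_33160790/article/list/1": ["/a/1", "/a/2"],
    "https://blog.csdn.net/qq_33160790/article/list/2": ["/a/2", "/a/3"],
}

all_links = collect_links(
    "https://blog.csdn.net/qq_33160790",
    lambda url: FAKE_PAGES.get(url, []),
)
print(all_links)
```

In real use, fetch_links would wrap the requests.get and XPath steps shown earlier.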

Script for boosting CSDN view counts:
Note: adjust the url and the 23 to match your own blog.

from lxml import etree
import requests
import urllib.request

# range(1, 23) visits list pages 1 to 22; the upper bound is the number of list pages + 1
for i in range(1, 23):
    url = 'https://blog.csdn.net/qq_33160790/article/list/' + str(i)
    resp = requests.get(url)
    if resp.status_code == requests.codes.ok:
        html = etree.HTML(resp.text)
        hrefs = html.xpath('//span[@class="link_title"]/a/@href')
        for href in hrefs:
            print(href)
            # Request each article once; the response body is read and discarded
            req = urllib.request.Request(href)
            data = urllib.request.urlopen(req).read()
