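Requests is an HTTP client library written in Python. It serves the same purpose as urllib and urllib2, but it is built on urllib3 and is far more convenient to use, which saves a great deal of work. It covers typical HTTP testing needs and is commonly used when writing crawlers and testing server responses.
Requests was developed around the idioms of PEP 20, so it is more Pythonic than urllib. Just as importantly, it supports Python 3.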
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Readability counts.
Requests on PyPI:
https://pypi.python.org/pypi/requests
Installing Requests
Option 1) Install with pip
pip install requests # Python 2.7
pip3 install requests # Python 3.6
Option 2) Install from source
Download requests-2.18.2.tar.gz
Extract and install:
tar zxvf requests-2.18.2.tar.gz
cd requests-2.18.2
python setup.py install
Verify the installation:
$ python
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 12:39:47)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> requests
<module 'requests' from '/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/__init__.pyc'>
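The same check can be scripted instead of typed into the interpreter. A minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of Requests:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("requests"):
    import requests
    # Report which copy of the library the interpreter picked up.
    print("requests " + requests.__version__ + " at " + requests.__file__)
else:
    print("requests is not installed; run: pip install requests")
```

This is handy at the top of a crawler script, where a clear message is friendlier than a bare ImportError.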
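Using Requests
requests wraps Python's lower-level HTTP machinery (it is built on urllib3, not on urllib or urllib2), so fetching a web page takes very little code
1. Fetching a web page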
import requests
# Fetch the Mimvp home page
req = requests.get(url = 'http://mimvp.com')
print("status_code : " + str(req.status_code))
print("mimvp text : " + req.text)
# Fetch the Mimvp Proxy page (with query parameters)
req = requests.get(url='http://proxy.mimvp.com/free.php', params={'proxy':'out_tp','sort':'p_ping'})
print("status_code : " + str(req.status_code))
print("mimvp text : " + req.text)
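To see how `params` ends up in the request, the URL can be built without sending anything. A sketch using `requests.Request(...).prepare()`, which produces the prepared request that would be transmitted; no network access is needed:

```python
import requests

# Build (but do not send) the same GET request as above.
request = requests.Request(
    "GET",
    "http://proxy.mimvp.com/free.php",
    params={"proxy": "out_tp", "sort": "p_ping"},
)
prepared = request.prepare()

# The params dict is URL-encoded and appended as the query string.
print(prepared.method)
print(prepared.url)
```

Inspecting the prepared URL is a quick way to debug a crawler that is sending unexpected query strings.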
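Fetching a page is very concise: the explicit urllib2 form takes three lines, while requests does it in one. For comparison, the Python 2 equivalents (urllib.urlopen and urllib2 do not exist in Python 3):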
urllib : urllib.urlopen('http://mimvp.com').read()
urllib2: urllib2.urlopen('http://mimvp.com').read()
or
import urllib2
req = urllib2.Request('http://mimvp.com')
res = urllib2.urlopen(req)
page = res.read()
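The urllib and urllib2 snippets above run only on Python 2; in Python 3, `urllib2` was merged into `urllib.request`. A rough Python 3 equivalent of the three-line form, for comparison. The `build_request` and `fetch` helper names are illustrative:

```python
import urllib.parse
import urllib.request


def build_request(url, params=None):
    """Create a GET request, appending URL-encoded params if given."""
    if params:
        url = url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url)


def fetch(url, params=None, timeout=30):
    """Send the request and return the response body as bytes."""
    req = build_request(url, params)
    with urllib.request.urlopen(req, timeout=timeout) as res:
        return res.read()


# Building the request does not touch the network; fetch() does.
req = build_request("http://mimvp.com", {"love": "mimvp"})
print(req.full_url)
```

Note that the query string has to be encoded by hand here, which is one of the chores requests takes care of through its `params` argument.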
requests API:
requests.get('https://mimvp.com/timeline.json') # GET request
requests.post('http://mimvp.com/post') # POST request
requests.put('http://mimvp.com/put') # PUT request
requests.delete('http://mimvp.com/delete') # DELETE request
requests.head('http://mimvp.com/get') # HEAD request
requests.options('http://mimvp.com/get') # OPTIONS request
requests usage examples:
import requests
requests.get('http://mimvp.com', params={'love': 'mimvp'}) # GET with query parameters
requests.post('http://mimvp.com', data={'love': 'mimvp'}) # POST with form data
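The difference between `params` and `data` is where the values travel: `params` go into the query string, while `data` is form-encoded into the request body. A sketch that prepares both requests without sending them:

```python
import requests

get_req = requests.Request("GET", "http://mimvp.com", params={"love": "mimvp"}).prepare()
post_req = requests.Request("POST", "http://mimvp.com", data={"love": "mimvp"}).prepare()

# GET: the values are in the URL, and there is no body.
print(get_req.url)
print(get_req.body)

# POST: the values are form-encoded in the body, not in the URL.
print(post_req.url)
print(post_req.body)
print(post_req.headers["Content-Type"])
```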
Setting proxies in Requests
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
#
# Python requests supports http, https, socks4, and socks5 proxies
#
# Mimvp Proxy example:
# http://proxy.mimvp.com/demo2.php
#
# Buy Mimvp Proxy:
# http://proxy.mimvp.com
#
# mimvp.com
# 2016-09-16
import requests
import ssl
import socks, socket # requires the socks.py module (available from Mimvp Proxy; the PySocks package on PyPI also provides it)
mimvp_url = "http://proxy.mimvp.com/exist.php"
mimvp_url2 = "https://proxy.mimvp.com/exist.php"
mimvp_url3 = "https://apps.bdimg.com/libs/jquery-i18n/1.1.1/jquery.i18n.min.js"
# Use an HTTP/HTTPS proxy
proxies = {
    "http" : "http://120.77.155.249:8888",
    "https" : "http://54.255.211.38:80",
}
req = requests.get(mimvp_url2, proxies=proxies, timeout=30, verify=False)
print("mimvp text : " + req.text)
# Use a SOCKS4 proxy
proxies = {
    'socks4' : '163.121.188.2:4000',
}
socks4_ip = proxies['socks4'].split(":")[0]
socks4_port = int(proxies['socks4'].split(":")[1])
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS4, socks4_ip, socks4_port)
socket.socket = socks.socksocket
req = requests.get(mimvp_url2, timeout=30, verify=False)
print("mimvp text : " + req.text)
# Use a SOCKS5 proxy
proxies = {
    'socks5' : '190.9.58.211:45454',
}
socks5_ip = proxies['socks5'].split(":")[0]
socks5_port = int(proxies['socks5'].split(":")[1])
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, socks5_ip, socks5_port)
socket.socket = socks.socksocket
req = requests.get(mimvp_url2, timeout=30, verify=False)
print("mimvp text : " + req.text)
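Replacing `socket.socket` as above reroutes every connection in the process. Since version 2.10, Requests can also take SOCKS proxies directly in the `proxies` dict when installed with the socks extra (`pip install requests[socks]`). A minimal sketch under that assumption; the helper names `socks_proxies` and `fetch_via_socks` are hypothetical, and the proxy address from the example may no longer be live:

```python
import requests


def socks_proxies(host, port, scheme="socks5"):
    """Build a proxies dict that routes both http and https via SOCKS."""
    if scheme not in ("socks4", "socks4a", "socks5", "socks5h"):
        raise ValueError("unsupported SOCKS scheme: " + scheme)
    proxy_url = "{}://{}:{}".format(scheme, host, port)
    return {"http": proxy_url, "https": proxy_url}


def fetch_via_socks(url, host, port, scheme="socks5", timeout=30):
    """Fetch a URL through the given SOCKS proxy (needs requests[socks])."""
    resp = requests.get(url, proxies=socks_proxies(host, port, scheme), timeout=timeout)
    resp.raise_for_status()
    return resp.text


# Building the dict does not touch the network; fetch_via_socks() does.
print(socks_proxies("190.9.58.211", 45454))
```

Passing the proxy per request keeps the rest of the program on direct connections, which makes it easier to rotate proxies inside a crawler.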
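This example uses Mimvp Proxy, which supports http, https, socks4, socks5, and other protocols, and covers more than 120 countries worldwide and 34 provinces and municipalities in China
Recommended: Mimvp Proxy: http://proxy.mimvp.com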