
requests redirect bug


Fixing a requests redirect bug

Cause

When I requested a certain URL, requests raised the following exception (I am on Python 3):

Traceback (most recent call last):
  File "F:\python\python36\lib\site-packages\IPython\core\interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-36f4e5f02f7e>", line 1, in <module>
    resp = requests.get('http://xxx.com/xxx')
  File "F:\python\python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "F:\python\python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "F:\python\python36\lib\site-packages\requests\sessions.py", line 512, in request
    resp = self.send(prep, **send_kwargs)
  File "F:\python\python36\lib\site-packages\requests\sessions.py", line 644, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "F:\python\python36\lib\site-packages\requests\sessions.py", line 644, in <listcomp>
    history = [resp for resp in gen] if allow_redirects else []
  File "F:\python\python36\lib\site-packages\requests\sessions.py", line 124, in resolve_redirects
    url = self.get_redirect_target(resp)
  File "F:\python\python36\lib\site-packages\requests\sessions.py", line 115, in get_redirect_target
    return to_native_string(location, 'utf8')
  File "F:\python\python36\lib\site-packages\requests\_internal_utils.py", line 25, in to_native_string
    out = string.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 45: invalid continuation byte

Judging from the error, the request was redirected and requests could not decode the Location header.

A packet capture of the request showed that the redirect URL contains non-ASCII characters and that they are encoded as GBK.
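The failure can be reproduced without any network access. The sketch below is only an illustration: the URL is made up (it is not the real link), and the two decoding steps mimic what http.client and requests do to the header on Python 3.

```python
# Hypothetical redirect target containing Chinese characters (not the real URL).
target = "http://example.com/下载/file.html"

# A server that writes the header in GBK puts these raw bytes on the wire.
raw_header = target.encode("gbk")

# On Python 3 the header bytes are decoded as latin1, so this is roughly
# what resp.headers['location'] holds inside requests.
location = raw_header.decode("latin1")

# requests re-encodes as latin1 (recovering the raw bytes), then assumes UTF-8.
try:
    location.encode("latin1").decode("utf8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print(exc)
```

With GBK bytes in the header, the final decode raises the same kind of UnicodeDecodeError as in the traceback above.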

Analysis

Following the traceback, look at the decoding part of get_redirect_target in requests\sessions.py:

if resp.is_redirect:
    location = resp.headers['location']
    # Currently the underlying http module on py3 decode headers
    # in latin1, but empirical evidence suggests that latin1 is very
    # rarely used with non-ASCII characters in HTTP headers.
    # It is more likely to get UTF8 header rather than latin1.
    # This causes incorrect handling of UTF8 encoded location headers.
    # To solve this, we re-encode the location in latin1.
    if is_py3:
        location = location.encode('latin1')
    return to_native_string(location, 'utf8')
return None

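On Python 3, the location is first re-encoded with latin1 and then passed to to_native_string with the encoding hard-coded to 'utf8', which is exactly where the traceback ends: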

if isinstance(string, builtin_str):
    out = string
else:
    if is_py2:
        out = string.encode(encoding)
    else:
        out = string.decode(encoding)

return out

On Python 3 the bytes are decoded, and the encoding here is 'utf8'.

However, my redirect URL is encoded in GBK rather than UTF-8, so it cannot be decoded.
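The latin1 re-encoding step itself is harmless, because latin1 maps each byte value to the code point with the same number. Only the choice of codec for the final decode matters. A small sketch, again assuming a made-up URL:

```python
# latin1 maps every byte value 0-255 to the code point with the same number,
# so decoding and re-encoding is lossless for arbitrary bytes.
all_bytes = bytes(range(256))
round_trip_ok = all_bytes.decode("latin1").encode("latin1") == all_bytes

# Hypothetical GBK-encoded Location value as it appears after latin1 decoding.
target = "http://example.com/下载/file.html"
header_as_seen = target.encode("gbk").decode("latin1")

# Re-encoding yields the original wire bytes; the right codec recovers the URL.
recovered = header_as_seen.encode("latin1").decode("gbk")

print(round_trip_ok, recovered == target)
```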

Solution

My fix is to patch the Session's get_redirect_target method:

import requests
from requests._internal_utils import to_native_string
from requests.compat import is_py3


def get_redirect_target(self, resp):
    """Replacement for requests.Session.get_redirect_target."""
    if resp.is_redirect:
        location = resp.headers['location']
        if is_py3:
            # Recover the raw header bytes that were decoded as latin1.
            location = location.encode('latin1')
        # Decode with the response's declared encoding, falling back to UTF-8.
        encoding = resp.encoding if resp.encoding else 'utf-8'
        return to_native_string(location, encoding)
    return None


def patch():
    """Install the replacement on requests.Session."""
    requests.Session.get_redirect_target = get_redirect_target

Anywhere else that needs the patch, just call patch() before making any requests.
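One caveat: resp.encoding is derived from the response's Content-Type header, so it may be missing or may not match the encoding of the Location header, and decoding with the wrong codec can still raise UnicodeDecodeError. A more defensive variant could try several candidates in turn. The helper below is a hypothetical sketch (decode_location and its candidates parameter are not part of requests), and the candidate list is an assumption that suits this GBK case:

```python
def decode_location(raw, candidates=("utf-8", "gbk")):
    """Decode raw Location bytes, trying each candidate encoding in order.

    Hypothetical helper for illustration; not part of requests.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin1 accepts any byte sequence, so this last resort cannot raise.
    return raw.decode("latin1")


# Made-up URLs encoded two different ways.
gbk_bytes = "http://example.com/下载".encode("gbk")
utf8_bytes = "http://example.com/下载".encode("utf-8")
print(decode_location(gbk_bytes))
print(decode_location(utf8_bytes))
```

UTF-8 is tried first because its decoder is strict, whereas a GBK decoder will accept many UTF-8 byte sequences and silently produce garbled text. Inside the patched get_redirect_target on Python 3, location is already bytes after the latin1 re-encoding, so a call to decode_location(location) could stand in for the to_native_string call.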


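Title: requests redirect bug

Author: Longofo

Published: 2019-01-06 14:01

Last updated: 2019-01-06 15:01

Original link: http://longofo.cc/requests redirect bug.html

License: Attribution-NonCommercial-NoDerivatives 4.0 International. Please retain the original link and author when reposting.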
