无名 posted on 2022-5-8 18:54:32

[Code Share] Scraping Data with Python Regular Expressions

Required packages: requests and re (re is part of the standard library)

Scraping Douban's top-rated movies list

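import requests
import re

headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; U; Android 8.1.0; zh-cn; BLA-AL00 Build/HUAWEIBLA-AL00) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.132 MQQBrowser/8.9 Mobile Safari/537.36'
}

url = 'https://m.douban.com/doulist/240962/'

# Pass the headers so the request is sent with the mobile User-Agent defined above
response = requests.get(url, headers=headers)
# Raw string keeps \s intact; re.S lets "." match newlines as well.
# The pattern depends on the page's HTML markup; adjust it if the page structure differs.
results = re.findall(r'.*?href="(.*?)".*?cover.*?src="(.*?)"\salt="(.*?)">.*?(.*?).*?meta.*?>(.*?).*?recommend.*?>(.*?)', response.text, re.S)
for result in results:
    #print(result)
    # Each result is a tuple with one entry per capture group; print every field
    print(*result)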
http://cdn.u1.huluxia.com/g3/M01/BB/B1/wKgBOV2DQy-AUpldAANF7MAZM-o353.jpg
http://cdn.u1.huluxia.com/g3/M01/BB/B1/wKgBOV2DQzCAfr0tAAHz3uKA3Kk241.jpg
http://cdn.u1.huluxia.com/g3/M01/BB/B1/wKgBOV2DQzKAHyNTAAJwm7g5g1o156.jpg
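
If you want to try the regex part without hitting the network, here is a minimal sketch that runs re.findall with several capture groups over a small HTML snippet. Note that the markup in SAMPLE_HTML, the example.com URLs, and the names ITEM_PATTERN and extract_items are all made up for illustration; the real Douban page is structured differently, so the pattern would need to be adapted to it.

```python
import re

# Hypothetical markup for illustration only; the real Douban page differs.
SAMPLE_HTML = """
<li class="item">
  <a href="https://example.com/movie/1">
    <img class="cover" src="https://example.com/img/1.jpg" alt="First Movie">
  </a>
  <span class="meta">1994 / Drama</span>
</li>
<li class="item">
  <a href="https://example.com/movie/2">
    <img class="cover" src="https://example.com/img/2.jpg" alt="Second Movie">
  </a>
  <span class="meta">2001 / Animation</span>
</li>
"""

# Raw strings so that \s reaches the regex engine unchanged.
# re.S lets "." match newlines, so one match can span several lines.
ITEM_PATTERN = re.compile(
    r'href="(.*?)".*?class="cover"\s+src="(.*?)"\s+alt="(.*?)">'
    r'.*?class="meta">(.*?)</span>',
    re.S,
)


def extract_items(html):
    """Return a list of (link, cover_url, title, meta) tuples."""
    # With more than one capture group, findall returns one tuple per match.
    return ITEM_PATTERN.findall(html)


for link, cover_url, title, meta in extract_items(SAMPLE_HTML):
    print(title, meta, link, cover_url)
```

Because findall returns one tuple per match, unpacking it in the for loop gives each field a name instead of indexing into result. Anchoring each group on a nearby attribute such as class="cover" keeps the non-greedy groups from matching empty strings.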