news 2025/10/14 5:08:03

Let's go over the most common ways to get around anti-scraping measures. First, look at the headers that requests sends by default:

    import requests

    link = ""  # the page URL goes here
    r = requests.get(link)
    print(r.request.headers)

1. The simple approach: make the request headers look like a real browser

    import requests

    link = ""  # the page URL goes here
    headers = {"User-Agent": ""}  # fill in a real browser's User-Agent string
    r = requests.get(link, headers=headers)
    print(r.request.headers)

Python's fake-useragent package makes it easy to switch the User-Agent. Install it with:

    pip install fake-useragent

    from fake_useragent import UserAgent
    import requests

    link = ""  # the page URL goes here
    ua = UserAgent()
    headers = {"User-Agent": ua.random}  # a different User-Agent on each access
    response = requests.get(url=link, headers=headers)
    print(response.status_code)
    print(response.request.headers)

Using ua.random here changes the headers at random: each access returns a different disguised User-Agent string. We should also include Host and Referer in the headers.

2. Pause between requests while scraping

    import time

    t1 = time.time()
    time.sleep(2)
    t2 = time.time()
    total_time = t2 - t1
    print(total_time)

The pause should not be a fixed value. The random module lets us randomize it:

    import random
    import time

    # integer part 0 to 2 plus a random fraction, so the pause is in [0, 3) seconds
    sleep_time = random.randint(0, 2) + random.random()
    print(sleep_time)
    time.sleep(sleep_time)

Now we can combine the scraper with the random pause:

    import requests
    from bs4 import BeautifulSoup
    import time
    import random

    link = ""  # the URL of the listing page goes here

    def scrap(link):
        headers = {"User-Agent": ""}  # fill in a real browser's User-Agent string
        r = requests.get(link, headers=headers)
        html = r.text
        soup = BeautifulSoup(html, "lxml")
        return soup

    soup = scrap(link)
    title_list = soup.find_all("h1", class_="post-title")
    for eachone in title_list:
        url = eachone.a["href"]
        print("Start scraping", url)
        soup_art = scrap(url)
        title = soup_art.find("h1", class_="view-title").text.strip()
        print("Title", title)
        sleep_time = random.randint(0, 2) + random.random()
        print("Sleeping for", sleep_time, "seconds")
        time.sleep(sleep_time)

The scraped results can then be written to a file.
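The post mentions adding Host and Referer to the headers and saving the scraped results to a file, but shows no code for either. The following is a minimal sketch of how that might look, not the author's own code. The helper names build_headers, random_delay and save_titles are hypothetical, and so are the example.com URL and the output file name. Using the site's root as the default Referer is an assumption.

```python
import random
import time
from urllib.parse import urlparse


def build_headers(url, user_agent, referer=None):
    """Build headers with User-Agent, Host and Referer (hypothetical helper)."""
    parts = urlparse(url)
    return {
        "User-Agent": user_agent,
        # Host is the domain part of the target URL.
        "Host": parts.netloc,
        # Assumption: default the Referer to the site's root page.
        "Referer": referer or f"{parts.scheme}://{parts.netloc}/",
    }


def random_delay(low=0, high=2):
    """Return a random pause in seconds: integer part plus a random fraction."""
    return random.randint(low, high) + random.random()


def save_titles(titles, path, mode="a"):
    """Write one title per line to a UTF-8 text file (appends by default)."""
    with open(path, mode, encoding="utf-8") as f:
        for title in titles:
            f.write(title + "\n")


if __name__ == "__main__":
    # example.com and titles.txt are placeholders, not values from the post.
    headers = build_headers("https://example.com/posts/1", "Mozilla/5.0")
    print(headers)
    pause = random_delay()
    print("Sleeping for", pause, "seconds")
    time.sleep(pause)
    save_titles(["first title", "second title"], "titles.txt")
```

The resulting dict can be passed as headers=headers to requests.get, and save_titles can be called inside the scraping loop with each title as it is collected.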
