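Cookie: you can think of it as the record of your login state. If you log in to a site and then get a 404 when you try to fetch data, the usual reason is that the Cookie was not added to the request headers.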
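The screenshots below compare a request with the cookie and one without it (I logged in automatically with Selenium):
Below is the request with the cookie added.
This one got in.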
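To make the idea of "the cookie in the header" concrete, here is a minimal sketch that splits a Cookie header string into name/value pairs. The function name parse_cookie_header and the sample string are made up for illustration; they are not part of the original script.

```python
def parse_cookie_header(header):
    """Split a Cookie header string such as 'a=1; b=2' into a dict."""
    pairs = {}
    for part in header.split(";"):
        part = part.strip()
        # Skip empty fragments and fragments without a name=value shape.
        if not part or "=" not in part:
            continue
        name, value = part.split("=", 1)
        pairs[name.strip()] = value.strip()
    return pairs


if __name__ == "__main__":
    # Hypothetical cookie string, only to show the shape of the data.
    print(parse_cookie_header("login=1; ASP.NET_SessionId=abc123"))
```

A dict like this can be passed to requests through the cookies= argument instead of writing the whole string into the 'Cookie' header by hand.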
Below is the login and data fetch for gushiwen.cn (古诗文网).
import time
import ddddocr
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
zhanghao = "19894604325"
mima = "lxh258258"
url = 'https://so.gushiwen.cn/user/login.aspx?from=http://so.gushiwen.cn/user/collect.aspx'
wd = webdriver.Edge()
wd.implicitly_wait(10)
wd.get(url)
time.sleep(1)
wd.find_element(By.CSS_SELECTOR,'#email').send_keys(f'{zhanghao}')
wd.find_element(By.CSS_SELECTOR,'#pwd').send_keys(f'{mima}')
# Captcha
img = wd.find_element(By.CSS_SELECTOR,'#imgCode')
#img.screenshot_as_png
with open('gushiwen.png',mode='wb') as f:
    f.write(img.screenshot_as_png)
# Recognize the captcha text
ocr = ddddocr.DdddOcr()  # OCR instance
code_text = ocr.classification(img.screenshot_as_png)
wd.find_element(By.CSS_SELECTOR,'#code').send_keys(f"{code_text}")
wd.find_element(By.CSS_SELECTOR,'#denglu').click()
time.sleep(6)
# Fetch the page
url2 = 'https://so.gushiwen.cn/user/collect.aspx'
headers = {
'User-Agent':
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36 Edg/121.0.0.0',
'Cookie':
'login=flase; ticketStr=202945020%7cgQEZ8TwAAAAAAAAAAS5odHRwOi8vd2VpeGluLnFxLmNvbS9xLzAyUFlublFubGVkN2kxLTlpUE5CMTMAAgSJRcxlAwQAjScA; ASP.NET_SessionId=o42gt4a0v1fyinfcy1y4w0k5; codeyzgswso=6501cb6880be4877; gsw2017user=5612449%7cCB284E64EB2D7536EBF09392CC9AE0CF%7c2000%2f1%2f1%7c2000%2f1%2f1; login=flase; wxopenid=defoaltid; gswZhanghao=19894604325; gswPhone=19894604325; idsShiwen2017=%2c102533%2c109816%2c53504%2c12578%2c'
}
response = requests.get(url2,headers=headers).text
with open('guhsi.html','w',encoding='utf-8') as fp:
    fp.write(response)
elem = wd.find_element(By.CSS_SELECTOR,'a[style=" float:left;"]')
print(f"\n{elem.text}")
wd.quit()
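The Cookie string above was pasted in by hand, so it stops working once that session expires. Since the script has already logged in with Selenium, one possible alternative is to build the header from the browser's own cookies. This is only a sketch: cookies_to_header is a hypothetical helper, and it assumes the input has the shape returned by Selenium's wd.get_cookies(), a list of dicts with 'name' and 'value' keys.

```python
def cookies_to_header(selenium_cookies):
    """Join Selenium-style cookie dicts into one Cookie header string."""
    return "; ".join(
        f"{cookie['name']}={cookie['value']}" for cookie in selenium_cookies
    )


if __name__ == "__main__":
    # Made-up sample in the same shape as wd.get_cookies().
    sample = [
        {"name": "ASP.NET_SessionId", "value": "abc123", "domain": "example.com"},
        {"name": "login", "value": "1"},
    ]
    print(cookies_to_header(sample))
```

In the script this would be called before wd.quit(), for example headers['Cookie'] = cookies_to_header(wd.get_cookies()), so that the requests call reuses the session Selenium just created.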
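One thing worth mentioning here is how the file is written.
1. I used to write files in the form below. The code above uses with open(...) instead. In short, the with form is easier to remember and closes the file for you; the form below writes the same data but needs an explicit close().
f = open('filename.txt', mode='a', encoding='utf-8')
f.write(Mcontent)
f.close()  # required here; the with form closes the file automatically
2. The captcha part uses ddddocr.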
# Captcha
img = wd.find_element(By.CSS_SELECTOR,'#imgCode')
#img.screenshot_as_png
with open('gushiwen.png',mode='wb') as f:
    f.write(img.screenshot_as_png)
# Recognize the captcha text
ocr = ddddocr.DdddOcr()  # OCR instance
code_text = ocr.classification(img.screenshot_as_png)
Save the image, then run recognition on it.
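The two file-writing forms from point 1 can be compared side by side. This is a small sketch using only the standard library; the function names and the temporary file are assumptions for illustration. Both forms append the same text; the difference is that the with form closes the file on its own, even if the write raises an exception.

```python
import os
import tempfile


def append_with_context_manager(path, text):
    # The file is closed automatically when the with block ends.
    with open(path, mode="a", encoding="utf-8") as f:
        f.write(text)


def append_with_manual_close(path, text):
    # Without with, close() must be called explicitly;
    # try/finally makes sure it happens even on error.
    f = open(path, mode="a", encoding="utf-8")
    try:
        f.write(text)
    finally:
        f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "demo.txt")
        append_with_context_manager(target, "first line\n")
        append_with_manual_close(target, "second line\n")
        with open(target, encoding="utf-8") as f:
            print(f.read())
```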
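============QQ login==========incomplete
At the very end there is also an SMS verification code, which threw me off. For captchas I would still go with "云打码" (I don't know how to use it yet, hehe; I'll look into it properly later).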
import time
import ddddocr
from selenium import webdriver
from selenium.webdriver.common.by import By
num = '2488220557'
password = ' '
url = 'https://www.baidu.com/'
wd = webdriver.Edge()
# Bypass automation detection
wd.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
    {"source":"""Object.defineProperty(navigator,'webdriver',{get:()=>undefined})"""})
wd.implicitly_wait(5)
wd.get(url)
mainWindow = wd.current_window_handle
#wd.switch_to.window(mainWindow)
element = wd.find_element(By.CSS_SELECTOR,'#kw')
time.sleep(1)
element.send_keys('qq\n')
element2 = wd.find_element(By.CSS_SELECTOR,'.c-container')
element3 = element2.find_element(By.CSS_SELECTOR,'a[target="_blank"]').click()
time.sleep(2)
for handle in wd.window_handles:
    wd.switch_to.window(handle)
    print(wd.title)
    if '轻松' in wd.title:
        break
elements = wd.find_element(By.NAME,'im.qq.com.login')
elements.click()
time.sleep(1)
wd.switch_to.frame(wd.find_element(By.CSS_SELECTOR,'iframe[name="frame-login"]'))
time.sleep(1)
wd.find_element(By.CSS_SELECTOR,'#switcher_plogin').click()
time.sleep(1)
element4 = wd.find_element(By.CSS_SELECTOR,'#u').send_keys(f'{num}')
element5 = wd.find_element(By.CSS_SELECTOR,'#p').send_keys(f'{password}')
time.sleep(2)
wd.find_element(By.CSS_SELECTOR,'#login_button').click()
time.sleep(100)
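Note that the frame is switched here.
This uses almost everything I learned over the past few days. CSS selectors are powerful, so I used them for every lookup.
1. This snippet bypasses automation detection. It may have no effect and can be left out (the implicit wait still has to be written, though).
# Bypass automation detection
wd.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument",
    {"source":"""Object.defineProperty(navigator,'webdriver',{get:()=>undefined})"""})
wd.implicitly_wait(5)
2. Send the request and store the current window handle (not strictly needed, just to review the concept).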
wd.get(url)
mainWindow = wd.current_window_handle
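3. This one deserves special attention:
When the browser opens a link in a new window, you see the new page on screen, but wd is still attached to the old window, not to the page you are looking at. You have to switch explicitly.
If you are sure your locator is correct but the element still cannot be found, check whether the browser window or the frame has not been switched.
for handle in wd.window_handles:
    wd.switch_to.window(handle)
    print(wd.title)
    if '轻松' in wd.title:
        break
4. Switch into the frame (note this!)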
wd.switch_to.frame(wd.find_element(By.CSS_SELECTOR,'iframe[name="frame-login"]'))
time.sleep(1)
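The window-switching loop from point 3 can be wrapped in a small helper. This is a sketch under assumptions: switch_to_window_by_title is a hypothetical name, and it relies only on the driver attributes the script already uses (window_handles, current_window_handle, switch_to.window and title). Unlike the bare loop, it goes back to the starting window when no title matches, so the driver is not left on whichever window happened to be last.

```python
def switch_to_window_by_title(driver, keyword):
    """Switch to the first window whose title contains keyword.

    Returns the matching handle, or None after restoring the
    original window when nothing matches.
    """
    original = driver.current_window_handle
    for handle in driver.window_handles:
        driver.switch_to.window(handle)
        if keyword in driver.title:
            return handle
    # No match: return to where we started.
    driver.switch_to.window(original)
    return None
```

In the script above the call would look like switch_to_window_by_title(wd, '轻松'), followed by a check that the result is not None before looking for the login button.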
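That's about it; what remains is the captcha. There are plenty of videos on Bilibili, but really you only need to find out how to use the "云打码" site. That platform covers a lot.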