Hi everyone! 魔王呐 here ❤ ~!
Development environment:
- Python 3.8
- PyCharm Professional
Modules used:
- requests >>> third-party library for sending requests (needs to be installed)
- parsel >>> third-party library for extracting data from page source
- csv >>> built-in module, no installation needed
- time >>> built-in module, no installation needed
Module installation:
Press Win + R, type cmd, then run `pip install <module name>`. If downloads are slow, you can switch to a domestic mirror, e.g. `pip install requests -i https://pypi.tuna.tsinghua.edu.cn/simple`.
Code implementation steps
- Send a request (use code to visit the target URL)
- Get the data
- Parse the data (extract the content we need, discard the rest)
- Save the data (to a CSV file)
Code walkthrough
Import the modules
import requests  # sends requests; third-party (needs to be installed)
import parsel    # third-party; extracts data from page source
import csv       # built-in module, no installation needed
import time      # built-in module, no installation needed
Save the data (write the CSV header row first)
with open("jingdong.csv", mode='w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerow(['title', 'price', 'shop', 'detail_url'])
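A quick aside: the csv module handles its own line endings, so opening the file with `newline=''` avoids blank rows between records on Windows. The header row can also be sanity-checked entirely in memory, without touching the disk:

```python
import csv
import io

# Write the same header row into an in-memory buffer, then read it back
# to confirm the column layout (stdlib only).
buf = io.StringIO()
csv.writer(buf).writerow(['title', 'price', 'shop', 'detail_url'])
buf.seek(0)
header = next(csv.reader(buf))
print(header)  # ['title', 'price', 'shop', 'detail_url']
```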
Pretend to be a browser <editable>. A User-Agent alone returns no data, so add more headers, e.g. your own Cookie copied from the browser.
headers = {
    'Cookie': '__jdu=1675327822068798256204; shshshfpa=a8c4d3ab-4de2-1594-07c6-96937703bc48-1675511732; shshshfpx=a8c4d3ab-4de2-1594-07c6-96937703bc48-1675511732; shshshfp=df23b3178a68c52485e728025047439d; _pst=jd_7449b8b770c1a; unick=u_y14qxm7bysay; pin=jd_7449b8b770c1a; _tp=vZPPhy6cqARc6L2%2B3nOzUq3kCs2OWuApKpEwLezV01A%3D; b_dw=1903; b_dh=962; b_dpr=1; b_webp=1; b_avif=1; autoOpenApp_downCloseDate_auto=1698495726388_1800000; unpl=JF8EAMhnNSttW0IBBBhWGRsWHA9QW1pcQx4APWJSUlRbSABVE1dMQBJ7XlVdXxRLFx9sYxRXXFNLVQ4ZCisSEXteXVdZDEsWC2tXVgQFDQ8VXURJQlZAFDNVCV9dSRZRZjJWBFtdT1xWSAYYRRMfDlAKDlhCR1FpMjVkXlh7VAQrAhwUFEleUldeC0oQCmlvDFdZX0hVACsDKxUge21WX14NTh8zblcEZB8MF1cEEgsbGl1LWlJaXwtNHgBsZgJdW1BCVwEcARoXIEptVw; PCSYCityID=CN_430000_430100_0; thor=459E9A0707CDD36020E74D14717A705AD6CEE67A8D55FEDAACBD33B9D31511E639D728DAFB1FF36D36DE627F8F2F79845F92317DEDEAB842A76D839D99A84DA9F0E3F8B9DBF9C66BF47B74F66CCB0051E6C00FDBFD4545AD7396FD35D9DF2D4EE2B81CFD32ED986FBC3547605F3FA2EFD8C022688992015FFC079D1239A9636A3C6747E1A981BB7272167E6708A9D699AE6D7C17170909155A757473FE3F744E; flash=2_kGENmsBsM776mXDUIck9N5Hr7-RR1caB21t4eIczd1sA64NWUiuUcZCTP974PA7P1w5LFIs7Dq1LubLJleTXpdUeaJ6cT2ac-HBgzp8AdSo*; pinId=f_SKjtPUQ3D1_NrwwoSZkrV9-x-f3wj7; __jdv=76161171|baidu-pinzhuan|t_288551095_baidupinzhuan|cpc|0f3d30c8dba7459bb52f2eb5eba8ac7d_0_28d02e387fc546e982c4f7822ea9dfc3|1700914261968; joyytokem=babel_UEro4WAa7vEhMypakgZQQDqfZhUMDFDd2t1VTk5MQ==.ckBbRWxyQ1hEYnRCXQsQLzQuQmEsOgJMK3JbXVlkb0YVRytyCSo0IBQDMTQ0DjIYIQEyR1gcExkmIw0lIiMPAhp7JCkvZBYlCAwPAjYqNBQCCRU=.3511f139; joyya=1700914317.1700914320.22.01c6v1p; jsavif=1; __jda=122270672.1675327822068798256204.1675327822.1699364622.1700914240.14; __jdc=122270672; areaId=18; ipLoc-djd=18-1482-0-0; shshshsID=7f16a7856e408b5462889269ad6e0b3c_10_1700914558416; __jdb=122270672.9.1675327822068798256204|14.1700914240; 3AB9D23F7A4B3CSS=jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMMAZUHL4AAAAAADDF3D4TATRWIPYX; shshshfpb=AAkh2aAaMEsTTq03iFZQHxpaTdwO8SBZ1URcyaAAAAAA; 3AB9D23F7A4B3C9B=BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZU',
    'Origin': 'https://search.jd.com',
    'Referer': 'https://search.jd.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
}
Fetching multiple pages
s = 1
for page in range(1, 121):
    t = int(time.time() * 1000)
    body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"' + str(page) + '","s":"' + str(s) + '","click":"0","log_id":"1697547020245.6899","show_items":""}'
    if page == 2:
        s = 26
        body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"' + str(page) + '","s":"' + str(s) + '","click":"0","log_id":"1697547020245.6899","show_items":""}'
    elif page > 2:
        s += 30
        if page % 2 == 0:
            body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"' + str(page) + '","s":"' + str(s) + '","scrolling":"y","log_id":"1697545127114.3155","tpl":"3_M","isList":0,"show_items":""}'
        else:
            body = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"' + str(page) + '","s":"' + str(s) + '","click":"0","log_id":"1697544397338.9790","show_items":""}'
    params = {
        'appid': 'search-pc-java',
        'functionId': 'pc_search_s_new',
        'client': 'pc',
        'clientVersion': '1.0.0',
        't': str(t),
        'body': body,
        'loginType': '3',
        'uuid': '122270672.1675327822068798256204.1675327822.1699364622.1700914240.14',
        'area': '18_1482_0_0',
        'h5st': '20231125202253488;g5giig9tnm63gij2;f06cc;tk03wcf291d0c18nWWFtx6pen6ynAePyLgl8sAR4d4xue8lLvNwODA2L03Vzrmb5U4cXmr_rv9qY_zudES6M8bCoMT_o;cb9e846a92e17e24d3b20741663decba;4.1;1700914973488;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cb7cfd3202c38e987146e3ec8ef23fa4659401e38bc57b6c5c13359eb13bdf39a81072e7e1f5d36d7268e19d4c84529eae8e660648fc8bd86dcf267343f8398533e0beede6ef3a273f620464f176480ba2dee25aae40d89a8cb2b033d3274cf53eeb92de682f09f23e19412f37fa309bd2',
        'x-api-eid-token': 'jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMMAZUHL4AAAAAADDF3D4TATRWIPYX',
    }
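As a side note, hand-concatenated JSON strings like the `body` above are easy to get wrong. Here is a small sketch of the same idea using `json.dumps`, plus a helper that reproduces the running `s` offset (1 on page 1, 26 on page 2, then +30 per page). It only covers the non-"scrolling" body variant, and the `pvid`/`log_id` values are simply the captured ones:

```python
import json

def s_offset(page):
    # Replicates the loop's running offset: 1 on page 1, 26 on page 2,
    # then +30 for every later page.
    if page == 1:
        return 1
    return 26 + 30 * (page - 2)

def build_body(page):
    # Build the request body as a dict and serialize it, instead of
    # string concatenation. pvid/log_id are the captured values.
    payload = {
        "keyword": "iPhone",
        "qrst": "1",
        "wq": "iPhone",
        "ev": "exbrand_Apple^",
        "pvid": "c2a8f09dbfa044a6a12f860e20edb6c7",
        "isList": 0,
        "page": str(page),
        "s": str(s_offset(page)),
        "click": "0",
        "log_id": "1697547020245.6899",
        "show_items": "",
    }
    return json.dumps(payload, separators=(",", ":"))

print(build_body(3))  # page 3 carries s = 56
```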
The request URL
    url = 'https://api.m.jd.com/'
- Send the request (visit the site)
    response = requests.get(url=url, params=params, headers=headers)
- Extract the data, pulling out the content we need
    html_data = response.text
How do we extract content from the page source?
    select = parsel.Selector(html_data)
    # //ul[@class="gl-warp clearfix"]/li
This grabs the tag that each product sits in.
    lis = select.xpath('//ul[@class="gl-warp clearfix"]/li')
    for li in lis:
        # li.xpath('string(.//div[@class="p-name p-name-type-2"])').get()
        title = li.xpath('string(.//div[@class="p-name p-name-type-2"])').get("").strip()
        price = li.xpath('string(.//div[@class="p-price"])').get("").strip()
        shop = li.xpath('string(.//div[@class="p-shop"])').get("").strip()
        detail_url = "https:" + li.xpath('.//div[@class="p-name p-name-type-2"]/a/@href').get("")
        print(title, price, shop, detail_url)
- Save the data
        with open("jingdong.csv", mode='a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow([title, price, shop, detail_url])
Closing words
Thanks for reading my article! This flight ends here 🛬
I hope this post helped you and that you learned a little something 🎉
Even the stars that hide away 🍥 are working hard to shine, so keep at it too (let's work hard together).