NetEase Cloud Music Comment Visualization Dashboard Built with a Crawler + Flask + ECharts + MySQL
- 1. Introduction
- 2. Implementation
- 2.1 Picking songs to collect comments from
- 2.2 Building the crawler
- 2.2.1 Collecting song comments
- 2.2.2 Cleaning the data and loading it into the database
- 2.3 Setting up the Flask app
- 2.4 Passing data to the charts
- 2.5 Getting the complete code & dataset
1. Introduction
This project is a visualization dashboard for NetEase Cloud Music comments, built on a requests-based crawler, Flask, and ECharts. The main techniques involved are web scraping, database operations, the Flask framework, and ECharts charts.
The final result looks like this:
2. Implementation
2.1 Picking songs to collect comments from
To collect song comments, NetEase Cloud Music was the obvious first choice, so I picked five of its most heavily commented songs:
富士山下
爱情转移
孤独患者
葡萄成熟时
任我行
The NetEase Cloud Music ids of these five songs are:
music_list = {65766: '富士山下', 65536: '爱情转移', 64093: '孤独患者', 66285: '葡萄成熟时', 27483202: '任我行'}
2.2 Building the crawler
2.2.1 Collecting song comments
import datetime
import json

import pandas as pd
import requests

url = 'https://music.163.com/weapi/comment/resource/comments/get?csrf_token='
data = {
    'params': encText,        # AES-encrypted request payload (weapi encryption, generated elsewhere)
    'encSecKey': encSecKey,   # RSA-encrypted secondary key (weapi encryption, generated elsewhere)
}
respond = requests.post(url, headers=headers, data=data)
json_data = json.loads(respond.text)
comments = json_data['data']['comments']
for per_comment in comments:
    comment = per_comment['content']
    user_name = per_comment['user']['nickname']
    user_id = per_comment['user']['userId']
    comment_time = per_comment['time']  # millisecond timestamp
    timestamp = comment_time / 1000
    dt_object = datetime.datetime.fromtimestamp(timestamp)
    phone = per_comment.get('extInfo', {}).get('endpoint', {}).get('CLIENT_TYPE', '')
    ip = per_comment['ipLocation']['location']
    per_line = pd.DataFrame(
        {'user_id': [user_id], 'comment': [comment], 'user_name': [user_name], 'time': [dt_object], 'ip': [ip],
         'phone': [phone], 'music_id': [music_id], 'music_name': [music_list[music_id]]})
    result = pd.concat([result, per_line])
    last_time = comment_time  # time cursor used to request the next page
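The snippet above fetches one page; the comment endpoint pages by a time cursor, so the `last_time` taken from the final comment of one response becomes the cursor for the next request. A minimal sketch of that loop, with `fetch_page` as a hypothetical stand-in for the encrypted POST shown above:

```python
def crawl_comments(music_id, fetch_page, max_pages=50):
    """Page through comments using a time cursor.

    `fetch_page(music_id, cursor)` is a hypothetical helper standing in for
    the encrypted POST above; it must return the parsed `comments` list.
    A cursor of -1 requests the newest page; afterwards we pass the last
    comment's `time` so the next page continues where this one ended.
    """
    all_rows, cursor = [], -1
    for _ in range(max_pages):
        comments = fetch_page(music_id, cursor)
        if not comments:          # empty page: no older comments left
            break
        all_rows.extend(comments)
        cursor = comments[-1]['time']  # plays the role of `last_time` above
    return all_rows
```

Capping the loop with `max_pages` keeps a song with hundreds of thousands of comments from running unbounded.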
2.2.2 Cleaning the data and loading it into the database
import pandas as pd
import pymysql

def process_csv(file_path):
    # Read the CSV file and drop duplicate rows
    df = pd.read_csv(file_path)
    df = df.drop_duplicates()
    # Database connection
    connection = pymysql.connect(
        host='127.0.0.1',
        user='root',
        password='123456',
        database='music_comments'
    )
    try:
        # Create the table if it does not exist
        create_table_if_not_exists(connection)
        insert_query = """
            INSERT INTO comments (user_id, comment, user_name, time, ip, phone, music_id, music_name)
            VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
        """
        with connection.cursor() as cursor:
            for _, row in df.iterrows():
                cursor.execute(insert_query, (
                    row['user_id'] if pd.notna(row['user_id']) else None,
                    row['comment'] if pd.notna(row['comment']) else None,
                    row['user_name'] if pd.notna(row['user_name']) else None,
                    row['time'] if pd.notna(row['time']) else None,
                    row['ip'] if pd.notna(row['ip']) else None,
                    row['phone'] if pd.notna(row['phone']) else None,
                    row['music_id'] if pd.notna(row['music_id']) else None,
                    row['music_name'] if pd.notna(row['music_name']) else None
                ))
        connection.commit()
        print("Data written successfully")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        connection.close()
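`create_table_if_not_exists` is referenced but not shown above. A sketch of what it could look like, with column types inferred from the insert statement (the lengths and charset are assumptions, adjust to your data):

```python
def create_table_if_not_exists(connection):
    """Create the comments table that process_csv inserts into.

    Column types are assumptions inferred from the INSERT statement above;
    utf8mb4 is used so emoji in comments survive the round trip.
    """
    ddl = """
        CREATE TABLE IF NOT EXISTS comments (
            id INT AUTO_INCREMENT PRIMARY KEY,
            user_id BIGINT,
            comment TEXT,
            user_name VARCHAR(255),
            time DATETIME,
            ip VARCHAR(64),
            phone VARCHAR(64),
            music_id BIGINT,
            music_name VARCHAR(255)
        ) CHARACTER SET utf8mb4
    """
    with connection.cursor() as cursor:
        cursor.execute(ddl)
    connection.commit()
```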
The data in the database ends up looking like this:
2.3 Setting up the Flask app
from flask import Flask, render_template

app = Flask(__name__)
app.config['JSON_AS_ASCII'] = False  # serve Chinese text as-is instead of \u escapes

@app.route('/')
def index():
    return render_template('index.html')

if __name__ == '__main__':
    app.run(debug=True)
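`render_template` resolves `index.html` against a `templates/` folder next to the app module (Flask's default `template_folder`), so the project needs a layout roughly like the following; everything except `index.html` is illustrative:

```
project/
├── app.py              # the Flask app above
├── templates/
│   └── index.html      # dashboard page containing the ECharts containers
└── static/             # echarts.min.js, stylesheets, etc.
```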
2.4 数据传值
# 左上图
data_music_name = ['富士山下', '爱情转移', '孤独患者', '葡萄成熟时', '任我行']
data_count = [216651, 142228, 109656, 102065, 26275]
list_1_data = []
for i in range(0, len(data_music_name)):
music_name = data_music_name[i]
count = data_count[i]
list_1_data.append([music_name, count])
# 中上图
data2 = df['phone'].value_counts().reset_index()
data2 = data2[~data2['phone'].isnull()]
data_phone = data2['phone'].tolist()
data_count = data2['count'].tolist()
list_2_name = []
list_2_value = []
for i in range(0, len(data_phone)):
list_2_name.append(data_phone[i])
list_2_value.append(data_count[i])
# 右上图
text_df = df[~df['comment'].isnull()]
text = '。'.join(text_df['comment'].tolist())
words = jieba.cut(text)
word_counts = Counter(words)
filtered_counts = {word: count for word, count in word_counts.items()
if len(word) > 1 and re.match(r'^[\u4e00-\u9fa5]+$', word)}
most_common_words = Counter(filtered_counts).most_common(10)
most_common_words4 = Counter(filtered_counts).most_common(100)
list_3_name = []
list_3_value = []
for per_word in most_common_words:
list_3_name.append(per_word[0])
list_3_value.append(per_word[1])
# 左下图
data4 = df['ip'].value_counts().reset_index()
data4 = data4[~data4['ip'].isnull()]
data_ip = data4['ip'].tolist()
data_count = data4['count'].tolist()
list_4_name = []
list_4_value = []
# 显示全部
# for i in range(0, len(data_ip)):
# 显示前十
for i in range(0, 10):
list_4_name.append(data_ip[i])
list_4_value.append(data_count[i])
# 词云
wordcloud_data = [{"name": word, "value": count} for word, count in most_common_words4]
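The prepared lists still have to reach the ECharts options in `index.html`. One common approach, sketched below, is to serialize each list to a JSON string in the view and drop it into the template with `| safe`; the function and variable names here are illustrative, not taken from the project's template:

```python
import json

def build_chart_context(list_1_data, wordcloud_data):
    """Serialize chart inputs for injection into the dashboard template.

    Each value becomes a JSON string the template can embed directly in an
    ECharts option, e.g. `series: [{ data: {{ list_1_data | safe }} }]`.
    ensure_ascii=False keeps Chinese song titles readable in the page source.
    """
    return {
        'list_1_data': json.dumps(list_1_data, ensure_ascii=False),
        'wordcloud_data': json.dumps(wordcloud_data, ensure_ascii=False),
    }
```

The resulting dict can be splatted into `render_template('index.html', **context)`.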
2.5 Getting the complete code & dataset
The complete code & dataset can be downloaded via the link at the top of this article.