Child Mind Institute - Detect Sleep States(2023年第一次Kaggle拿到了银牌总结)

感谢

感谢艾兄(大佬带队)、rich师弟(师弟通过这次比赛机械转码成功、耐心学习)、张同学(也很有耐心的在学习),感谢开源方案(开源就是银牌),在此基础上一个月不到收获到了很多,运气很好。这个是我们比赛的总结: 

我们队Kaggle CMI银牌方案,欢迎感兴趣的伙伴upvote:https://www.kaggle.com/competitions/child-mind-institute-detect-sleep-states/discussion/459610


计划 (系统>结果,稳健>取巧)

团队计划表,每个人做的那部分工作,避免重复,方便交流,提高效率,这个工作表起了很大的作用。


具体方案 

75th Place Detailed Solution - Spec2DCNN + CenterNet + Transformer + NMS

First of all, I would like to thank @tubotubo for sharing your high-quality code, and also thank my teammates @liruiqi577 @brickcoder @xtzhou for their contributions in the competition. Here, I am going to share our team’s “snore like thunder” solution from the following aspects:

  1. Data preprocessing
  2. Feature Engineering
  3. Model
  4. Post Processing
  5. Model Ensemble

1. Data preprocessing

We made EDA and readed open discussions found that there are 4 types of data anomalies:

  • Some series have a high missing rate and some of them do not even have any event labels;
  • In some series , there are no event annotations in the middle and tail (possibly because the collection activity has stopped);
  • The sleep record is incomplete (a period of sleep is only marked with onset or wakeup).
  • There are outliers in the enmo value.

To this end, we have some attempts, such as:

  • Eliminate series with high missing rates;
  • Cut the tail of the series without event labels;
  • Upper clip enmo to 1.

But the above methods didn't completely work. In the end, our preprocessing method was:

We split the dataset group by series into 5 folds. For each fold, we eliminate series with a label missing rate of 100% in the training dataset while without performing any data preprocessing on the validation set. This is done to avoid introducing noise to the training set, and to ensure that the evaluation results of the validation set are more biased towards the real data distribution, which improve our LB score + 0.006.

Part of our experiments as below:

ExperimentFold0Public (single fold)Private (5-fold)
No preprocess missing data0.7510.7180.744
Eliminate unlabeled data at the end of train_series & series with missing rate >80%0.7390.7090.741
Drop train series which don’t have any event labels0.7520.7240.749

2. Feature Engineering

  • Sensor features: After smoothing the enmo and anglez features, a first-order difference is made to obtain the absolute value. Then replace the original enmo and anglez features with these features, which improve our LB score + 0.01.
train_series['enmo_abs_diff'] = train_series['enmo'].diff().abs()
train_series['enmo'] = train_series['enmo_abs_diff'].rolling(window=5, center=True, min_periods=1).mean()
train_series['anglez_abs_diff'] = train_series['anglez'].diff().abs()
train_series['anglez'] = train_series['anglez_abs_diff'].rolling(window=5, center=True, min_periods=1).mean()
  • Time features: sin and cos hour.

In addition, we also made the following features based on open notebooks and our EDA, such as: differential features with different orders, rolling window statistical features, interactive features of enmo and anglez (such as anglez's differential abs * enmo, etc.), anglez_rad_sin/cos, dayofweek/is_weekend (I find that children have different sleeping habits on weekdays and weekends). But strangely enough, too much feature engineering didn’t bring us much benefit.

ExperimentFold0Public (5-fold)Private (5-fold)
anglez + enmo + hour_sin + hour_cos0.7630.7310.768
anglez_abs_diff + enmo_abs_diff + hour_sin + hour_cos0.7710.7410.781

3. Model

We used 4 models:

  • CNNSpectrogram + Spec2DCNN + UNet1DDecoder;
  • PANNsFeatureExtractor + Spec2DCNN + UNet1DDecoder.
  • PANNsFeatureExtractor + CenterNet + UNet1DDecoder.
  • TransformerAutoModel (xsmall, downsample_rate=8).

Parameter Tunning: Add more kernel_size 8 for CNNSpectrogram can gain +0.002 online.

Multi-Task Learning Objectives: sleep status, onset, wake.

Loss Function: For Spec2DCNN and TransformerAutoModel, we use BCE, but with multi-task target weighting, sleep:onset:wake = 0.5:1:1. The purpose of this is to allow the model to focus on learning the last two columns. We tried to train only for the onset and wake columns, but the score was not good. The reason is speculated that the positive samples in these two columns are sparse, and MTL needs to be used to transfer the information from positive samples in the sleep status to the prediction of sleep activity events. Also, I tried KL Loss but it didn't work that well.

self.loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.5,1.,1.]))

At the same time, we adjusted epoch to 70 and added early stopping with patience=15. The early stopping criterion is the AP of the validation dataset, not the loss of the validation set. batch_size=32.

ExperimentFold0Public (single fold)Private (5-fold)
earlystop by val_loss0.7500.6970.742
earlystop by val_score0.7510.7180.744
loss_wgt = 1:1:10.7520.7240.749
loss_wgt = 0.5:1:10.7550.7230.753

Note: we used the model_weight.pth with the best offline val_score to submit our LB instead of using the best_model.pth with the best offline val_loss。

4. Post Processing

Our post-processing mainly includes:

  • find_peaks(): scipy.signal.find_peaks;
  • NMS: This task can be treated as object detection. [onset, wakeup] is regarded as a bounding boxes, and score is the confident of the box. Therefore, I used a time-series NMS. Using NMS can eliminate redundant boxes with high IOU, which increase our AP.
def apply_nms(dets_arr, thresh):
    x1 = dets_arr[:, 0]
    x2 = dets_arr[:, 1]
    scores = dets_arr[:, 2]

    areas = x2 - x1
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        xx1 = np.maximum(x1[i], x1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= thresh)[0]
        order = order[inds + 1]

    dets_nms_arr = dets_arr[keep,:]
    onset_steps = dets_nms_arr[:, 0].tolist()
    wakeup_steps = dets_nms_arr[:, 1].tolist()
    nms_save_steps = np.unique(onset_steps + wakeup_steps).tolist()
    return nms_save_steps

In addition, we set score_th=0.005 (If it is set too low, a large number of events will be detected and cause online scoring errors, so it is fixed at 0.005 here), and use optuna to simultaneously search the parameter distance in find_peaks and the parameter iou_threshold of NMS. Finally, when distance=72 and iou_threshold=0.995, the best performance is achieved.

import optuna

def objective(trial):
    score_th = 0.005 # trial.suggest_float('score_th', 0.003, 0.006)
    distance = trial.suggest_int('distance', 20, 80)
    thresh = trial.suggest_float('thresh', 0.75, 1.)

    # find peak
    val_pred_df = post_process_for_seg(
        keys=keys,
        preds=preds[:, :, [1, 2]],
        score_th=score_th,
        distance=distance,
    )

    # nms
    val_pred_df = val_pred_df.to_pandas()
    nms_pred_dfs = NMS_prediction(val_pred_df, thresh, verbose=False)
    score = event_detection_ap(valid_event_df.to_pandas(), nms_pred_dfs)
    return -score

study = optuna.create_study()
study.optimize(objective, n_trials=100)
print('Best hyperparameters: ', study.best_params)
print('Best score: ', study.best_value)
ExperimentFold0Pubic (5-fold)Private (5-fold)
find_peak-0.7450.787
find_peak+NMS+optuna-0.7460.789

5. Model Ensemble

Finally, we average the output probabilities of the following models and then feed into the post processing methods to detect events. By the way, I tried post-processing the detection events for each model and then concating them, but this resulted in too many detections. Even with NMS, I didn't get a better score.

The number of ensemble models: 4 (types of models) * 5 (fold number) = 20.

ExperimentFold0Pubic (5-fold)Private (5-fold)
model1: CNNSpectrogram + Spec2DCNN + UNet1DDecoder0.772090.7430.784
model2: PANNsFeatureExtractor + Spec2DCNN + UNet1DDecoder0.7770.7430.782
model3: PANNsFeatureExtractor + CenterNet + UNet1DDecoder0.759680.6340.68
model4: TransformerAutoModel0.74680--
model1 + model2(1:1)-0.7460.789
model1 + model2+model3(1:1:0.4)-0.750.786
model1 + model2+model3+model4(1:1:0.4:0.2)0.7520.787

Unfortunately, we only considered CenterNet and Transformer to model ensemble with a tentative attitude on the last day, but surprisingly found that a low-CV-scoring model still has a probability of improving final performance as long as it is heterogeneous compared with your previous models. But we didn’t have more opportunities to submit more, which was a profound lesson for me.

Thoughts not done:

  • Data Augmentation: Shift the time within the batch to increase more time diversity and reduce dependence on hour features.

  • Model: Try more models. Although we try transformer and it didn’t work for us. I am veryyy looking forward to the solutions from top-ranking players.

Thanks again to Kaggle and all Kaggle players. This was a good competition and we learned a lot from it. If you think our solution is useful for you, welcome to upvote and discuss with us.

In addition, this is my first 🥈 silver medal. Thank you everyone for letting me learn a lot. I will continue to work hard. :)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/230432.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

汽车电子智能保险丝解决方案

一、背景知识 在过去的几十年里&#xff0c;电子在汽车系统创新中发挥了关键作用。新型半导体器件具有新颖的功能&#xff0c;增强了车辆机械系统提供的功能。 虽然半导体解决方案和电子产品将继续在汽车电子产品中发挥关键作用&#xff0c;但展望未来&#xff0c;汽车创新将…

SQL Server 2017数据库window server服务器改名操作

在window服务器修改机器名重新加域后&#xff0c;需要执行下面的SQL语句修改数据库里面记录的机器名字&#xff0c;才能在修改后通过新名字连接数据库。 if serverproperty(servername) <> servername begin declare server sysname set server ser…

【hugging face】bitsandbytes中8 bit量化的理解

8 位量化使数十亿参数规模的模型能够适应更小的硬件&#xff0c;而不会降低性能。 8 位量化的工作原理如下&#xff1a; 1.从输入隐藏状态中按列提取较大值&#xff08;离群值&#xff09;。 2.对 FP16 中的离群值和 int8 中的非离群值执行矩阵乘法。 3.改变非异常值结果以将值…

一文讲解如何从 Clickhouse 迁移数据至 DolphinDB

ClickHouse 是 Yandex 公司于2016年开源的 OLAP 列式数据库管理系统&#xff0c;主要用于 WEB 流量分析。凭借面向列式存储、支持数据压缩、完备的 DBMS 功能、多核心并行处理的特点&#xff0c;ClickHouse 被广泛应用于广告流量、移动分析、网站分析等领域。 DolphinDB 是一款…

Java--1v1双向通信-控制台版

文章目录 前言客户端服务器端输出线程端End 前言 TCP&#xff08;Transmission Control Protocol&#xff09;是一种面向连接的、可靠的网络传输协议&#xff0c;它提供了端到端的数据传输和可靠性保证。 本程序就是基于tcp协议编写而成的。 利用 TCP 协议进行通信的两个应用…

持续集成交付CICD:CentOS 7 安装 Nexus 3.63

目录 一、实验 1.CentOS 7 安装Nexus3.63 二、问题 1.安装Nexus报错 2.Nexus启动停止相关命令 一、实验 1.CentOS 7 安装Nexus3.63 &#xff08;1&#xff09;当前操作系统版本&JDK版本 cat /etc/redhat-releasejava -version&#xff08;2&#xff09;下载Nexus新…

Linux--学习记录(2)

解压命令&#xff1a; gzip命令&#xff1a; 参数&#xff1a; -k&#xff1a;待压缩的文件会保留下来&#xff0c;生成一个新的压缩文件-d&#xff1a;解压压缩文件语法&#xff1a; gzip -k pathname(待压缩的文件夹名)gzip -kd name.gz&#xff08;待解压的压缩包名&#x…

腾讯地图系列(二):微信小程序添加插件(三种方法)以及插件AppId获取

目录 第一章 前言 第二章 添加插件 2.1 微信小程序添加插件方法一&#xff08;微信公众平台添加插件&#xff09; 2.2 微信小程序添加插件方法二&#xff08;通过项目配置添加插件&#xff09; 2.3 微信小程序添加插件方法三&#xff08;微信公众平台服务市场添加插件&…

降维技术——PCA、LCA 和 SVD

一、说明 降维在数据分析和机器学习中发挥着关键作用&#xff0c;为高维数据集带来的挑战提供了战略解决方案。随着数据集规模和复杂性的增长&#xff0c;特征或维度的数量通常变得难以处理&#xff0c;导致计算需求增加、潜在的过度拟合和模型可解释性降低。降维技术通过捕获数…

Android 等待view 加载布局完成 (包括动态生成View)

前言 在实际开发中&#xff0c;有很多组件需要 根据数据&#xff0c;动态生成&#xff0c;或者 追加 / 减少 子view&#xff0c;由于View布局需要时间&#xff0c;此时想要获取父View的最新宽高值&#xff0c;要么手动测量&#xff0c;要么等待布局完成后再获取&#xff1b; …

actitivi自定义属性(二)

声明&#xff1a;此处activiti版本为6.0 此文章介绍后端自定义属性解析&#xff0c;前端添加自定义属性方法连接&#xff1a;activiti自定义属性&#xff08;一&#xff09;_ruoyi activiti自定义标题-CSDN博客 1、涉及到的类如下&#xff1a; 简介&#xff1a;DefaultXmlPar…

【数据结构】C语言实现:栈(Stack)与队列(Queue)

栈与队列 栈栈的概念及其结构栈的顺序结构&#xff08;杯子&#xff09;栈的相关接口 栈的实现顺序栈定义栈结构&#xff1a;头文件 (Stack.h)初始化 (StackInit)销毁 (StackDestroy)压栈 (StackPush)出栈(StackPop)读取栈顶元素 (StackTop)判断空栈(StackEmpty)栈元素个数(Sta…

Github入门教程之高效搜索和查看需要的项目

对咱们新入门的小白来说&#xff0c;前两天手把手注册 Github 账号的任务已经完成&#xff0c;接下来&#xff0c;学习如何高效搜索和查看自己感兴趣的内容。 下面是之前教程传送门 超详细GitHub注册和登录教程-CSDN博客 一. 搜索 可以在页面左上角「Search or jump to ...」…

C语言WFC实现绘制Lagrange插值多项式曲线的函数

前言&#xff08;引用&#xff09;&#xff1a; 拉格朗日多项式插值 插值方法有许多&#xff0c;常用的、基本的有&#xff1a;拉格朗日多项式插值、牛顿插值、分段线插值、Hermite插值和三次样条插值。这里只将一下拉格朗日多项式插值法&#xff1a; 方法应用 通缩点说&…

思维链(CoT)提出者 Jason Wei:关于大语言模型的六个直觉

文章目录 一、前言二、主要内容三、总结 &#x1f349; CSDN 叶庭云&#xff1a;https://yetingyun.blog.csdn.net/ 一、前言 Jason Wei 的主页&#xff1a;https://www.jasonwei.net/ Jason Wei&#xff0c;一位于 2020 年从达特茅斯学院毕业的杰出青年&#xff0c;随后加盟了…

人工智能|深度学习——知识蒸馏

一、引言 1.1 深度学习的优点 特征学习代替特征工程&#xff1a;深度学习通过从数据中自己学习出有效的特征表示&#xff0c;代替以往机器学习中繁琐的人工特征工程过程&#xff0c;举例来说&#xff0c;对于图片的猫狗识别问题&#xff0c;机器学习需要人工的设计、提取出猫的…

STM32F1之CAN介绍

目录 ​编辑 1. CAN 是什么&#xff1f; 2. 总线拓扑图 3. CAN 的特点 4. CAN 协议的基本概念 1. CAN 是什么&#xff1f; CAN 是 Controller Area Network 的缩写&#xff08;以下称为 CAN&#xff09;&#xff0c;是 ISO*1 国际标准化的串行通信协议。 在当前的汽车产…

【LeetCode每日一题合集】2023.11.27-2023.12.3 (⭐)

文章目录 907. 子数组的最小值之和&#xff08;单调栈贡献法&#xff09;1670. 设计前中后队列⭐&#xff08;设计数据结构&#xff09;解法1——双向链表解法2——两个双端队列 2336. 无限集中的最小数字解法1——维护最小变量mn 和 哈希表维护已经去掉的数字解法2——维护原本…

【UE5】监控摄像头效果(上)

目录 效果 步骤 一、视角切换 二、摄像头画面后期处理 三、在场景中显示摄像头画面 效果 步骤 一、视角切换 1. 新建一个Basic关卡&#xff0c;添加第三人称游戏资源到项目浏览器 2. 新建一个Actor蓝图&#xff0c;这里命名为“BP_SecurityCamera” 打开“BP_Securit…

90. 子集 II

90. 子集 II 原题链接&#xff1a;完成情况&#xff1a;解题思路&#xff1a;参考代码&#xff1a;错误经验吸取 原题链接&#xff1a; 90. 子集 II https://leetcode.cn/problems/subsets-ii/description/ 完成情况&#xff1a; 解题思路&#xff1a; /** * 包含重复元素…