CIKM 2025: A Roundup of Dataset-Themed Papers
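Master index: Large Model Security Research Paper Roundup (2026 edition) https://blog.csdn.net/WhiffeYF/article/details/159047894

Source

The CIKM 2025 Resource Papers track has 145 accepted papers in total.

Official accepted papers page: https://cikm2025.org/program/accepted-papers

ACM proceedings: https://dl.acm.org/doi/proceedings/10.1145/3746252

This table covers 39 papers whose core contribution is a released dataset, benchmark dataset, or corpus.

| No. | Paper title | Summary | Other |
| --- | --- | --- | --- |
| 1 | C-FAITH: A Chinese Fine-Grained Benchmark for Automated Hallucination Evaluation | A fine-grained benchmark for automated hallucination evaluation built for Chinese-language settings; it assesses how much large models hallucinate on Chinese knowledge. | rp0221 |
| 2 | Pet-Bench: Benchmarking the Abilities of Large Language Models as E-Pets in Social Network Services | A benchmark built around e-pet social scenarios that measures how well large models can play the role of a pet. | rp0308 |
| 3 | A Large-Scale Dataset of Interactions Between Weibo Users and Platform-Empowered LLM | Releases large-scale real interaction data between Weibo users and an LLM deployed by the platform, for studying human-machine social dialogue. | rp0617 |
| 4 | From Rules to Flexibility: A Resource and Method for SEC Item Extraction in Post-2021 10-K Filings | An item extraction resource and accompanying method for 10-K filings issued after the 2021 rule change. | rp1036 |
| 5 | ReDSM5: A Reddit Dataset for DSM-5 Depression Detection | A depression detection dataset annotated from Reddit according to DSM-5 criteria. | rp1104 |
| 6 | PersonaGen: A Persona-Driven Open-Ended Machine-Generated Text Dataset | An open-ended machine-generated text dataset driven by persona profiles, usable for detection and style research. | rp1109 |
| 7 | ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph | A benchmark that uses an e-commerce knowledge graph to evaluate large models on e-commerce tasks. | rp1261 |
| 8 | S2Cap: A Benchmark and a Baseline for Singing Style Captioning | The first benchmark and baseline model for automatic singing style captioning. | rp1328 |
| 9 | RottenReviews: Benchmarking Review Quality with Human and LLM-Based Judgments | A review quality evaluation dataset that combines human and LLM-based judgments. | rp1343 |
| 10 | The Yelp Collaborative Knowledge Graph | A knowledge graph resource for collaborative filtering, built from Yelp data. | rp1387 |
| 11 | TalkDep: Clinically Grounded LLM Personas for Conversation-Centric Depression Screening | A clinically grounded set of LLM personas for conversation-based depression screening. | rp2011 |
| 12 | E2MoCase: A Dataset for Emotional, Event and Moral Observations in News Articles on High-impact Legal Cases | A dataset annotating emotions, events, and moral observations in news coverage of high-impact legal cases. | rp2174 |
| 13 | IARD: Intruder Activity Recognition Dataset for Threat Detection | An intruder activity recognition dataset for threat detection. | rp2478 |
| 14 | VideoAVE: A Multi-Attribute Video-to-Text Attribute Value Extraction Dataset and Benchmark Models | A multi-attribute video-to-text attribute value extraction dataset with baseline models. | rp3208 |
| 15 | TSD-CT: A Benchmark Dataset for Truthfulness Stance Detection | A benchmark dataset for truthfulness stance detection. | rp3214 |
| 16 | QueryBridge: One Million Annotated Questions with SPARQL Queries - Dataset for Question Answering over Knowledge Graph | A KGQA dataset of about one million natural language questions aligned with SPARQL queries. | rp3279 |
| 17 | RuSemCor: a Word Sense Disambiguation corpus for Russian | A word sense disambiguation corpus for Russian. | rp3369 |
| 18 | NLP-QA: A Large-scale Benchmark for Informative Question Answering over Natural Language Processing Documents | A large-scale benchmark for informative question answering over NLP literature. | rp3566 |
| 19 | Datasets for Supervised Adversarial Attacks on Neural Rankers | A collection of datasets for supervised adversarial attacks on neural ranking models. | rp3872 |
| 20 | Maneno Yetu: Dynamic Corpus Construction and Pretraining for Swahili NLP | Dynamic corpus construction and pretraining resources for Swahili. | rp4643 |
| 21 | YTCommentVerse: A Multi-Category Multi-Lingual YouTube Comment Corpus | A YouTube comment corpus spanning multiple categories and languages. | rp5591 |
| 22 | SeLeRoSa: Sentence-Level Romanian Satire Detection Dataset | A sentence-level satire detection dataset for Romanian. | rp6281 |
| 23 | Portuguese post-OCR Resources for Text Optimisation | Portuguese post-OCR processing resources for text optimisation. | rp6636 |
| 24 | FediData: A Comprehensive Multi-Modal Fediverse Dataset from Mastodon | A comprehensive multimodal Fediverse dataset collected from Mastodon. | rp6823 |
| 25 | Multimodal Banking Dataset: Understanding Client Needs through Event Sequences | A multimodal banking dataset that models client needs through event sequences. | rp6877 |
| 26 | CSMD: Curated Multimodal Dataset for Chinese Stock Analysis | A curated multimodal dataset for Chinese stock analysis. | rp6952 |
| 27 | Real-E: A Foundation Benchmark for Advancing Robust and Generalizable Electricity Forecasting | A foundation benchmark for advancing robust and generalizable electricity forecasting. | rp6954 |
| 28 | HUSK: A Hierarchically Structured Urban Knowledge Graph Dataset for Multi-Level Spatial Tasks | A hierarchically structured urban knowledge graph dataset for multi-level spatial tasks. | rp7386 |
| 29 | RapidDamageNarratives: An Ontology-Aligned Corpus of Commonsense-Tagged Early Damage Reports for Spatio-Temporal Infrastructure Recovery Prioritisation | An ontology-aligned, commonsense-tagged corpus of early damage reports that supports spatio-temporal prioritisation of infrastructure recovery. | rp7421 |
| 30 | A Use-Case Specific Dataset for Measuring Dimensions of Responsible Performance in LLM-generated Text | A use-case specific dataset for measuring multiple dimensions of responsible performance in LLM-generated text. | rp7593 |
| 31 | FinS-Pilot: A Benchmark for Online Financial RAG System | An evaluation benchmark for online financial RAG systems. | rp7632 |
| 32 | When Words Can’t Capture It All: Towards Video-Based User Complaint Text Generation with Multimodal Video Complaint Dataset | A multimodal video complaint dataset for generating user complaint text from video. | rp8315 |
| 33 | ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method | A multimodal climate data benchmark with an accompanying generative method. | rp8367 |
| 34 | When Facts Expire: Benchmarking Temporal Validity in Knowledge Graphs | A benchmark for evaluating the temporal validity of facts in knowledge graphs. | rp8507 |
| 35 | Internet of Things Dataset for Human Operator Activity Recognition in Industrial Environment | An IoT dataset for recognizing human operator activity in industrial environments. | rp8781 |
| 36 | A Large-Scale Web Search Dataset for Federated Online Learning to Rank | A large-scale web search dataset for federated online learning to rank. | rp9028 |
| 37 | Building Safer Sites: A Large-Scale Multi-Level Dataset for Construction Safety Benchmark | A large-scale, multi-level evaluation dataset for construction site safety. | rp9211 |
| 38 | PEQQS: a Dataset for Probing Extractive Quantity-focused Question Answering from Scientific Literature | A dataset for probing extractive, quantity-focused question answering over scientific literature. | rp9516 |
| 39 | A Large-Scale Dataset for Content-Based Short-Video Recommendation | A large-scale dataset for content-based short-video recommendation. | rp9856 |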