Two Papers from Our Lab Accepted at WWW 2025!
2025-01-22 15:41

Warm congratulations: two papers from our lab have been accepted at WWW 2025, a top-tier interdisciplinary conference!

One study proposes an efficient frequency-domain backdoor attack method; the other evaluates the risk of capability degradation caused by exaggerated safety behavior in large language models under jailbreak defenses. Together, the two works offer an important new perspective on model safety and provide practical grounding for future efforts to balance model safety and performance.

Revisiting Backdoor Attacks on Time Series Classification in the Frequency Domain

Authors: 黄元敏 张谧 汪兆祥 李文轩 杨珉

Abstract:

Time series classification (TSC) is a cornerstone of modern web applications, powering tasks such as financial data analysis, network traffic monitoring, and user behavior analysis. In recent years, deep neural networks (DNNs) have greatly enhanced the performance of TSC models in these critical domains. However, DNNs are vulnerable to backdoor attacks, where attackers can covertly implant triggers into models to induce malicious outcomes. Existing backdoor attacks targeting DNN-based TSC models remain elementary. In particular, early methods borrow trigger designs from computer vision, which are ineffective for time series data. More recent approaches utilize generative models for trigger generation, but at the cost of significant computational complexity.

 In this work, we analyze the limitations of existing attacks and introduce an enhanced method, FreqBack. Drawing inspiration from the fact that DNN models inherently capture frequency domain features in time series data, we identify that improper perturbations in the frequency domain are the root cause of ineffective attacks. To address this, we propose to generate triggers both effectively and efficiently, guided by frequency analysis. FreqBack exhibits substantial performance across five models and eight datasets, achieving an impressive attack success rate of over 90%, while maintaining less than a 3% drop in model accuracy on clean data.
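The abstract's core idea, perturbing a time series through its frequency spectrum rather than through individual time steps, can be illustrated with a minimal sketch. This is not the FreqBack algorithm itself: the function name add_frequency_trigger and the chosen bins, amplitude, and phase below are illustrative assumptions rather than values from the paper.

import numpy as np

def add_frequency_trigger(x, target_bins=(5, 6), amplitude=0.05, phase=0.0):
    # Perturb a few chosen bins of the real FFT so the trigger lives in
    # narrow frequency bands instead of in individual time steps.
    # target_bins, amplitude, and phase are hypothetical hyperparameters.
    spectrum = np.fft.rfft(x)
    for k in target_bins:
        spectrum[k] += amplitude * len(x) * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n=len(x))

# Usage: poison a clean series before relabeling it with the attacker's target class.
clean = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
poisoned = add_frequency_trigger(clean)
print(float(np.max(np.abs(poisoned - clean))))  # time-domain perturbation stays small

Concentrating the perturbation in a few frequency bins keeps the time-domain change small while remaining learnable by models that pick up frequency features, which is the intuition the abstract points to.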

You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense

Authors: 买巫予骜 洪赓 陈沛 潘旭东 刘保君 张源 段海新 杨珉

Abstract:

With the rise of generative large language models (LLMs) like LLaMA and ChatGPT, these models have significantly transformed daily life and work by providing advanced insights. However, as jailbreak attacks continue to circumvent built-in safety mechanisms, exploiting carefully crafted scenarios or tokens, the safety risks of LLMs have come into focus. While numerous defense strategies—such as prompt detection, modification, and model fine-tuning—have been proposed to counter these attacks, a critical question arises: do these defenses compromise the utility and usability of LLMs for legitimate users? Existing research predominantly focuses on the effectiveness of defense strategies without thoroughly examining their impact on performance, leaving a gap in understanding the trade-offs between LLM safety and performance.

Our research addresses this gap by conducting a comprehensive study on the utility degradation, safety elevation, and exaggerated-safety escalation of LLMs with jailbreak defense strategies. We propose USEBench, a novel benchmark designed to evaluate these aspects, along with USEIndex, a comprehensive metric for assessing overall model performance. Through experiments on seven state-of-the-art LLMs, we found that mainstream jailbreak defenses fail to ensure both safety and performance simultaneously. Although model fine-tuning performs the best overall, its effectiveness varies across LLMs. Furthermore, vertical comparisons reveal that developers commonly prioritize performance over safety when iterating or fine-tuning their LLMs.
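The abstract names USEIndex as a single metric combining utility, safety, and exaggerated-safety behavior, but does not give its formula. The sketch below is only a stand-in to show how such a composite score could be computed: the function use_index and the harmonic-mean aggregation are assumptions, not the paper's definition.

from statistics import harmonic_mean

def use_index(utility, safety, non_overrefusal):
    # All three scores are assumed to lie in (0, 1], higher is better.
    # non_overrefusal stands in for "1 - exaggerated-safety rate", i.e. how
    # rarely the model refuses benign prompts.
    return harmonic_mean([utility, safety, non_overrefusal])

# Example: a defense that raises safety but costs utility and over-refuses.
print(round(use_index(utility=0.62, safety=0.95, non_overrefusal=0.70), 3))

A harmonic mean is used here only because it penalizes a model that sacrifices any one of the three aspects; the actual trade-off analysis in the paper should be read from USEBench and USEIndex as defined there.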

Contributed by: 黄元敏、买巫予骜

Layout: 沈钰霖

Editor: 邬梦莹

Reviewed by: 张琬琪、洪赓、林楚乔

复旦白泽战队

A security team with passion and ideals

Not following 复旦白泽战队 yet?

You can also find us by searching "复旦白泽战队" on WeChat official accounts, Zhihu, and Weibo~


Source: https://mp.weixin.qq.com/s?__biz=MzU4NzUxOTI0OQ==&mid=2247492834&idx=1&sn=36396f6cc39b8c9b87eead00faf8c111&chksm=fde8609cca9fe98af3fadccc3c11435924e1782ae437379a9e171bf4e73c32a0ab01e9dfec11&scene=58&subscene=0#rd