unSafe.sh - Unsafe
LLM as My Pair Researcher: Prover–Validator Collaboration and the Road to Logit-Gap Steering
The article explores how human-AI collaboration produced a new research direction, logit gap steering. Working with an AI, the author turned an intuition into concrete research and proposes a prover-validator pattern: the AI generates candidate hypotheses, and the human validates and refines them. The approach helped explore how small perturbations can change model behavior...
2026-1-21 00:00:00 | Reads: 0
Toooold - toooold.com
prover
validator
refusal
gap
tony
The Physics of mHC: Why Deep Learning Needs Energy Conservation
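The article discusses how the mHC method takes a physical perspective and stabilizes deep neural networks by constructing a passive system. The author argues that conventional neural networks act as "active amplifiers" that make signal energy explode, and proposes using doubly stochastic matrices and the Sinkhorn-Knopp algorithm to achieve energy conservation and stable gradients. This ensures that signal mass and energy are not arbitrarily amplified or attenuated...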
2026-1-5 00:00:00 | Reads: 0
Toooold - toooold.com
ij
mass
sums
quad
The Mathematics of Baby Shower Games: Solomonoff Inference in Action
Last weekend, I found myself applying data science in an unexpected setting: a...
2024-12-13 00:00:00 | Reads: 1
Toooold - toooold.com
guesses
gap
optimal
inches
territory
The missing knowledge snippets of AI
It is a live blog post of some knowledge snippets of AI to bridge the gap amon...
2024-12-9 08:00:00 | Reads: 1
Toooold - toooold.com
llm
memory
reasoning
lnkd
lora
Building a Lightweight Financial Agent: A Flexible Approach to Tool Use and Orchestration
In the rapidly evolving field of AI agents, there’s a growing trend towards co...
2024-9-15 08:00:00 | Reads: 3
Toooold - toooold.com
income
financials
analysis
horizon
company2
Agentic Systems: Snake Oil or the Future of AI?
The short answer: it’s more nuanced—agentic systems have potential, but only i...
2024-8-21 08:00:00 | Reads: 2
Toooold - toooold.com
agentic
llms
limitations
llm
reasoning
Those Magnificent underdogs competing ChatGPT
Disclaimer: the open source community and the AI community evolve so fast. Thi...
2023-4-8 08:00:00 | Reads: 7
Toooold - toooold.com
alpaca
llama
tuning
gpt
llm
Understand Twitter’s Recommendation system with a diagram using GPT-4
Disclaimer: the following blog post is mostly generated by GPT-4. The image is...
2023-3-31 08:00:00 | Reads: 12
Toooold - toooold.com
tweets
stroke
classdef
network
Collaborating with an AI Assistant to Discover a JPEG2000 Decoder Vulnerability
How I teamed up with ChatGPT, an AI language model, to identify and analyze a...
2023-3-20 08:00:00 | Reads: 9
Toooold - toooold.com
chatgpt
cblk
jpeg2000
log2
tile
Make a CatGPT out of ChatGPT
Since ChatGPT is good at hallucination, why not make a CatGPT out of it? Let’s...
2023-2-2 08:00:00 | Reads: 8
Toooold - toooold.com
laser
feline
manifold
accuracy
chatgpt
Guess the size of an atomic bomb and an iOS supply chain attack
Disclaimer: Nothing in this blog is related to the author’s current day-to-day...
2022-8-24 08:00:00 | Reads: 5
Toooold - toooold.com
xcodeghost
fermi
bias
sampling
estimate
Zero hacking problem: do we really protect the customers?
Disclaimer: Nothing in this blog is related to the author’s day-to-day work. T...
2022-8-17 08:00:00 | Reads: 3
Toooold - toooold.com
jason
probability
security
coin
fair
Linkedin spam: a case study of robust feature engineering
Disclaimer: Nothing in this blog is related to the author’s day-to-day work. T...
2022-8-3 08:00:00 | Reads: 6
Toooold - toooold.com
attackers
adversarial
drift
Measure the unmeasurable: botnet and German tanks
Disclaimer: Nothing in this blog is related to the author’s day-to-day work. T...
2022-7-25 08:00:00 | Reads: 5
Toooold - toooold.com
jason
frac
tanks
network
sampling
Nine cars, twenty-five horses and beyond
TL;DR: “25 horses problem” and “9 cars problem” can have a general solution a...
2022-6-11 08:00:00 | Reads: 5
Toooold - toooold.com
horses
cars
races
tensor
dimension
Cybersecurity problem solving with explainable machine learning
Disclaimer: Nothing in this blog is related to the author’s day-to-day work. T...
2022-2-15 08:00:00 | Reads: 6
Toooold - toooold.com
shap
explainable
machine
median
explanatory
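Explainable machine learning for solving cybersecurity problems
Nothing in this post is related to the author's day-to-day work; the views are the author's own and do not represent the author's employer. Kid, do you have a lot of question marks? Why did the model's prediction performance drop when you came back from the New Year holiday? Why does it always...
2022-2-13 08:00:00 | Reads: 5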
Toooold - toooold.com
model
shap
data
discriminative
network
Test $\LaTeX$ rendering
Sample equation blocks\[\frac{1}{\sqrt{||A||}}\sum_{a'\in A}\frac{1}{\sqrt{q}}\sum_{c=0}^{q-1...
2022-1-17 08:00:00 | Reads: 3
Toooold - toooold.com
frac
sqrt
rangle
ia
equation
Useful useless models: an approach to model complex cybersecurity problem solving
Disclaimer: Nothing in this blog is related to the author’s day-to-day work. T...
2022-1-9 08:00:00 | Reads: 3
Toooold - toooold.com
solving
useless
correlation
induction
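Useful useless models: a modeling approach for complex problems in cybersecurity
Nothing in this post is related to the author's day-to-day work; the views are the author's own and do not represent the author's employer. When building data models to solve cybersecurity problems, it is hard to find a single end-to-end model...
2022-1-4 08:00:00 | Reads: 4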
Toooold - toooold.com
model
network
data
security
useless