Two years ago, I glanced at Matrix’s Olm library and immediately found several side-channel vulnerabilities. After dragging their feet for 90 days, they ended up not bothering to fix any of it.
The Matrix.org security team also failed to notify many of the alternative clients about the impending disclosure, a fact that became more annoying when those clients' developers complained to me about how Matrix.org handled the embargo.
Instead, their response to my disclosure was to slap a deprecated notice on their README (despite allegedly having deprecated it in 2022) and then publicly insist they knew about the side-channel attacks all along, per Matthew Hodgson on Hacker News.
Knowingly shipping vulnerable cryptography to millions of people, while happily insisting your product is better than Signal, is a level of irresponsibility that borders on grifting.
So, at that point, my public stance on Matrix became, simply:
Don’t use Matrix.
And I was perfectly content leaving things at that, until my recent blog post about why there are no good Discord alternatives and what we can do about that summoned yet more annoying Matrix evangelists.
So I decided to finally take a look at Vodozemac, the Rust library that the Matrix team is so proud of.
Since you’re reading this, you already know how that turned out.
I’m putting the disclosure timeline front-and-center so everyone has a chance to see it.
security@matrix.org: “Thank you for your report. We’re looking into it and will get back to you shortly.”
An astute observer will notice that this timeline does not cover 90 days. In fact, it only covers a week. There are a few reasons for that.
Full public disclosure of security vulnerabilities is a damn good idea, as Bruce Schneier argued in 2007.
Coordinated disclosure is the practice of researchers and vendors working together to achieve full disclosure on a timeline that’s minimally disruptive to the vendor’s customers, in a way that maximizes security and safety.
Some fools call that “responsible” disclosure. They are wrong. (Matrix, to their credit, does not use the incorrect term on their security disclosure policy page.)
Google Project Zero established the norm of publishing within 90 days of a vulnerability’s discovery, whether it’s fixed or not. This, however, is a courtesy that independent researchers do not automatically owe the vendor.
I’ve covered a lot of this before in a previous blog post about how many people (including open source software developers) devalue security researchers.
The last time I reported an issue to Matrix, they insisted on the full 90 days and then did jack shit with that time. They didn’t even notify the developers of other Matrix clients that it was coming.
So, in my book, Matrix permanently lost the privilege of having a 90 day courtesy window. I gave them a week.
I can already feel the message board users typing up a storm about this. So, to cut the debate off at the pass, I say: you’re free to disagree with this decision when you disclose your own research findings.
Your bug, your policy.
My bug, my policy.
The Matrix team is already getting free specialized labor out of this deal. They don’t get to also dictate when, where, and how I criticize their software.
You’re not gonna find a fairer fucking deal than that.
These issues were identified in the source code tree at commit hash a4807ce7f8e69e0a512bf6c6904b0d589d06b993 (which was also the HEAD of the main branch at the time).
Not all of these issues are vulnerabilities, per se, but I will front-load this section with the actual vulnerabilities and then cover the remaining issues.
Severity: High
To understand this one, I first need to explain how [Elliptic Curve] Diffie-Hellman works, and the laziest way to do cryptanalysis.
At the risk of oversimplification, you can think of Diffie-Hellman as “just multiplication with exponents”.
(Don’t get intimidated by the math notation; you won’t need advanced math skills for this one.)
I’m omitting a lot of details in this summary (i.e., what arithmetic means in the context of elliptic curve groups), but it’s serviceable for our purposes.
Since we can think of this as just multiplication, I must ask the obvious question that every security engineer asks themselves when they see asymmetric operations in any protocol (whether it’s key agreement, a distributed key generation protocol, or even zero-knowledge proofs):
What happens if you set one of the inputs to zero?
If you can imagine the consequences of this, congratulations, you understand the vodozemac vulnerability.
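To make this concrete, here is a toy Diffie-Hellman over the multiplicative group of integers mod a small prime. This is deliberately not X25519 and not secure for any real use; it only illustrates the failure mode: if one party's public key is zero, the shared secret collapses to zero, no matter what the honest party's secret is.

```rust
// Toy Diffie-Hellman over integers mod a small prime -- NOT real
// cryptography, just an illustration of the zero-input failure mode.
const P: u64 = 2_147_483_647; // the Mersenne prime 2^31 - 1
const G: u64 = 7;

// Square-and-multiply modular exponentiation.
fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

fn main() {
    let alice_secret = 123_456_789u64;

    // Honest run: Bob's public key is g^b mod p, so the shared
    // secret depends on both parties' secrets.
    let bob_public = pow_mod(G, 987_654_321, P);
    println!("honest shared secret: {}", pow_mod(bob_public, alice_secret, P));

    // Malicious run: "Bob" sends 0 as his public key. Whatever
    // Alice's secret is, the shared secret is forced to 0.
    let forced = pow_mod(0, alice_secret, P);
    println!("attacker-forced shared secret: {forced}"); // always 0
}
```

The elliptic-curve version swaps exponentiation for scalar multiplication and zero for the identity element, but the attacker's move is the same.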
Affected code (source):
impl RemoteShared3DHSecret {
    pub(crate) fn new(
        identity_key: &StaticSecret,
        one_time_key: &StaticSecret,
        remote_identity_key: &PublicKey,
        remote_one_time_key: &PublicKey,
    ) -> Self {
        let first_secret = one_time_key.diffie_hellman(remote_identity_key);
        let second_secret = identity_key.diffie_hellman(remote_one_time_key);
        let third_secret = one_time_key.diffie_hellman(remote_one_time_key);

        Self(merge_secrets(first_secret, second_secret, third_secret))
    }

    pub fn expand(self) -> (Box<[u8; 32]>, Box<[u8; 32]>) {
        expand(&self.0)
    }
}

impl Shared3DHSecret {
    pub(crate) fn new(
        identity_key: &StaticSecret,
        one_time_key: &ReusableSecret,
        remote_identity_key: &PublicKey,
        remote_one_time_key: &PublicKey,
    ) -> Self {
        let first_secret = identity_key.diffie_hellman(remote_one_time_key);
        let second_secret = one_time_key.diffie_hellman(&remote_identity_key.inner);
        let third_secret = one_time_key.diffie_hellman(&remote_one_time_key.inner);

        Self(merge_secrets(first_secret, second_secret, third_secret))
    }

    pub fn expand(self) -> (Box<[u8; 32]>, Box<[u8; 32]>) {
        expand(&self.0)
    }
}
An all-zero public key is the encoding of what cryptographers call “the Identity Element” (if you add the identity element to another point, the result is that other point).
The result of multiplying a value by the Identity Element is… *drumroll* the Identity Element. This is the same as multiplying by zero and getting zero. Thus, all of the inputs to merge_secrets() are predictable by the attacker.
A pseudocode example of what’s happening here:
alice_secret := 0x779f5903f59fd3625d0960e84180493a3594404eeecb524c8b03e763bdebc369
alice_pk     := 0xaa4e6fe494d292adb67dab99b40d4dc75f990417eb57570c33baf78646152f5f
bob_pk       := 0x0000000000000000000000000000000000000000000000000000000000000000

shared_secret := scalarmult(alice_secret, bob_pk)
print(to_hex(shared_secret))
// "0000000000000000000000000000000000000000000000000000000000000000"
The issue is that the output of diffie_hellman can be zero if the public key or scalar is zero.
The authors of the X25519 library they’re using added a method called .was_contributory() to reject all-zero shared secrets. Notably, other parts of Vodozemac actually do things right (sas.rs and ecies/mod.rs), but the load-bearing code path for end-to-end encryption does not.
Not checking was_contributory() is a stupid vulnerability to have.
In my initial disclosure, I provided the Matrix team with a patch that adds a test case to src/olm/shared_secret.rs, which demonstrates 3 different ways to trigger this condition (including one accessible to remote attackers).
The Matrix team’s response to my report cited this 2017 mailing list post by Trevor Perrin, discussing whether Diffie-Hellman primitives should abort on invalid outputs (i.e., the identity element), and then tried to apply the same reasoning to the wrong abstraction layer (protocols built atop Diffie-Hellman).
Indeed, the raw X25519 primitive does not need to reject the identity element or error out if the scalar multiplication results in an all-zero output. Part of the contract of X25519 and X448 is that all inputs and outputs are “valid” (i.e., there is no risk of an invalid curve attack leaking your secrets). This is one of the major motivations for these curves in RFC 7748.
However, primitives built using X25519 must reject them. See RFC 9180, for example.
For P-256, P-384, and P-521, senders and recipients MUST perform partial public key validation on all public key inputs, as defined in Section 5.6.2.3.4 of [keyagreement]. This includes checking that the coordinates are in the correct range, that the point is on the curve, and that the point is not the point at infinity. Additionally, senders and recipients MUST ensure the Diffie-Hellman shared secret is not the point at infinity.
For X25519 and X448, public keys and Diffie-Hellman outputs MUST be validated as described in [RFC7748]. In particular, recipients MUST check whether the Diffie-Hellman shared secret is the all-zero value and abort if so.
This is because the initial asynchronous ratchet handshake and KEMs require contributory material.
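The check itself is tiny. Here is a sketch (my own illustrative function, not vodozemac's API) of the test RFC 9180 mandates, which x25519-dalek packages up for you as was_contributory():

```rust
// Sketch of the RFC 9180-mandated check: reject an all-zero X25519
// output. (A hardened implementation would compare in constant time;
// this version is written for clarity.)
fn is_contributory(shared_secret: &[u8; 32]) -> bool {
    shared_secret.iter().any(|&byte| byte != 0)
}
```

A protocol built on X25519 should refuse to derive any keys when this returns false.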
You bet. I sent this [secret, at the time] Gist to the Matrix.org team in my disclosure email.
Not an official one, but here’s how trivial the fix is to implement:
Subject: [PATCH] feat!: fix all-zero public key
---
Index: src/olm/shared_secret.rs
===================================================================
diff --git a/src/olm/shared_secret.rs b/src/olm/shared_secret.rs
--- a/src/olm/shared_secret.rs (revision a4807ce7f8e69e0a512bf6c6904b0d589d06b993)
+++ b/src/olm/shared_secret.rs (date 1770899879617)
@@ -36,7 +36,10 @@
use x25519_dalek::{ReusableSecret, SharedSecret};
use zeroize::{Zeroize, ZeroizeOnDrop};
-use crate::{Curve25519PublicKey as PublicKey, types::Curve25519SecretKey as StaticSecret};
+use crate::{
+ Curve25519PublicKey as PublicKey,
+ types::{Curve25519SecretKey as StaticSecret, KeyError},
+};
#[derive(Zeroize, ZeroizeOnDrop)]
pub struct Shared3DHSecret(Box<[u8; 96]>);
@@ -78,6 +81,26 @@
}
impl RemoteShared3DHSecret {
+ pub(crate) fn try_new(
+ identity_key: &StaticSecret,
+ one_time_key: &StaticSecret,
+ remote_identity_key: &PublicKey,
+ remote_one_time_key: &PublicKey,
+ ) -> Result<Self, KeyError> {
+ let first_secret = one_time_key.diffie_hellman(remote_identity_key);
+ let second_secret = identity_key.diffie_hellman(remote_one_time_key);
+ let third_secret = one_time_key.diffie_hellman(remote_one_time_key);
+
+ if !first_secret.was_contributory()
+ || !second_secret.was_contributory()
+ || !third_secret.was_contributory() {
+ return Err(KeyError::NonContributory)
+ }
+
+ Ok(Self(merge_secrets(first_secret, second_secret, third_secret)))
+ }
+
+ #[deprecated(since = "0.10.0", note = "SECURITY: Does not reject all-zero public keys. Use try_new() instead.")]
pub(crate) fn new(
identity_key: &StaticSecret,
one_time_key: &StaticSecret,
@@ -97,6 +120,26 @@
}
impl Shared3DHSecret {
+ pub(crate) fn try_new(
+ identity_key: &StaticSecret,
+ one_time_key: &ReusableSecret,
+ remote_identity_key: &PublicKey,
+ remote_one_time_key: &PublicKey,
+ ) -> Result<Self, KeyError> {
+ let first_secret = identity_key.diffie_hellman(remote_one_time_key);
+ let second_secret = one_time_key.diffie_hellman(&remote_identity_key.inner);
+ let third_secret = one_time_key.diffie_hellman(&remote_one_time_key.inner);
+
+ if !first_secret.was_contributory()
+ || !second_secret.was_contributory()
+ || !third_secret.was_contributory() {
+ return Err(KeyError::NonContributory)
+ }
+
+ Ok(Self(merge_secrets(first_secret, second_secret, third_secret)))
+ }
+
+ #[deprecated(since = "0.10.0", note = "SECURITY: Does not reject all-zero public keys. Use try_new() instead.")]
pub(crate) fn new(
identity_key: &StaticSecret,
one_time_key: &ReusableSecret,
Notice the only meaningful difference (besides the Result type) is the was_contributory() check.
The callers will need to be updated to use the new try_new() API this patch adds instead of the now-deprecated new(). Those callers’ APIs may also need to be updated to return a Result instead of just a value.
A full changeset that propagates the fix across the entire library, as well as various Matrix clients, will certainly be a bit messier.
But nobody can say I left the Matrix community hanging.
Severity: Low
The Matrix spec currently doesn’t even cover Version 2, but Vodozemac implements something it calls Version 2.
The key difference between them is that V2 uses the full HMAC output for message authentication, while V1 truncates the HMAC to 64 bits. There might be other differences; I didn’t look too deeply at its internals.
Since the intent of this V2 effort is clearly to provide increased security, I am choosing to interpret downgrading a session from V2 to V1 as a security vulnerability.
There are three different flaws in the current vodozemac implementation that meet this criterion:
When a SessionPickle is deserialized without an explicit config field, serde silently defaults to V1. Consequently, whatever benefit the authors were hoping to get out of this “Version 2” protocol, their Rust library basically shrugs and phones it in with 64-bit MAC security (instead of the 256-bit MAC security a reasonable person would expect).
This observation leads to two possible interpretations (both bad):
And dropping bytes from an authentication tag is an extremely lazy attack to succeed with.
Discerning which of the two above is the more likely interpretation is left as an exercise for the reader.
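To illustrate the downgrade mechanics with hypothetical types (this is not vodozemac's actual code), defaulting a missing config field is a one-liner, and so is the safer alternative:

```rust
// Hypothetical types for illustration; not vodozemac's actual API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SessionConfig {
    V1, // 64-bit truncated HMAC tag
    V2, // full 256-bit HMAC tag
}

// What a serde default on the config field effectively does: a missing
// field silently becomes V1, so anyone who can strip the field from a
// pickle downgrades the session's MAC strength.
fn unpickle(config_field: Option<SessionConfig>) -> SessionConfig {
    config_field.unwrap_or(SessionConfig::V1)
}

// A safer alternative: refuse to guess.
fn unpickle_strict(
    config_field: Option<SessionConfig>,
) -> Result<SessionConfig, &'static str> {
    config_field.ok_or("pickle is missing its session config; refusing to assume V1")
}
```

Failing closed on a missing field is mildly annoying for migrations; failing open defeats the point of shipping V2 at all.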
This section is just a list of probable bugs I noticed while skimming through their Rust codebase.
While some of them could have security implications, there wasn’t any obvious exploitation path. With further research, some of these could also turn out to be vulnerabilities, but that seems unlikely.
Since they’re not vulnerabilities, I didn’t include any of them in my initial email to the Matrix.org security team, which means they’re finding out about these bugs the same time everyone else is.
The CheckCode for an ECIES-protected payload is a two-digit decimal number, which provides a 1% success rate per MitM attack attempt.
This rounds down to about 6 bits of security, because you only get one bite at the apple per ECIES packet. Still, a typical laptop can process thousands of ECIES envelopes per second, so an attacker can, on average, get lucky several times per second.
pub const fn to_digit(&self) -> u8 {
    let first = (self.bytes[0] % 10) * 10;
    let second = self.bytes[1] % 10;

    first + second
}
This seems to be a security theater feature, rather than anything important, otherwise I would call it a vulnerability.
To be pedantic: this calculation is also subject to a modulo bias, since it maps two bytes (possible values: 0-255) each to an integer modulo 10 (0-9). Since 256 isn’t evenly divisible by 10, the digits 0-5 will occur more frequently than 6-9 (26 out of 256 versus 25 out of 256). Assuming the input bytes are uniformly random, this makes each of the digits 0-5 about 0.4 percentage points more likely than each of the others.
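You can verify the tally yourself by counting how every possible byte value maps through % 10:

```rust
// Tally how often each decimal digit is produced by `byte % 10`
// across all 256 possible byte values.
fn digit_counts() -> [u32; 10] {
    let mut counts = [0u32; 10];
    for byte in 0u16..=255 {
        counts[(byte % 10) as usize] += 1;
    }
    counts
}

fn main() {
    // Digits 0-5 occur 26 times each; digits 6-9 occur 25 times each.
    println!("{:?}", digit_counts());
}
```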
But there’s no way anything important relies on this, so who cares?
Vodozemac hard-codes a constant, MAX_MESSAGE_KEYS, to equal 40. Once 40 skipped message keys are buffered, each additional key silently evicts the oldest one, making the corresponding message permanently undecryptable.
fn push(&mut self, message_key: RemoteMessageKey) {
    if self.inner.is_full() {
        self.inner.pop_at(0);
    }

    self.inner.push(message_key)
}
A similar thing happens after MAX_MESSAGE_GAP messages (hard-coded to 2000).
These values appear to be arbitrarily chosen and haven’t been revised in years. There is no way for a Matrix client to configure these values.
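Here is a sketch of that eviction behavior, using a plain VecDeque as a stand-in for vodozemac's internal storage (the types here are hypothetical; u64 stands in for RemoteMessageKey):

```rust
use std::collections::VecDeque;

// Hard-coded upper bound, mirroring the behavior described above;
// not configurable by the client.
const MAX_MESSAGE_KEYS: usize = 40;

struct SkippedKeys {
    inner: VecDeque<u64>, // stand-in for RemoteMessageKey
}

impl SkippedKeys {
    fn new() -> Self {
        Self { inner: VecDeque::with_capacity(MAX_MESSAGE_KEYS) }
    }

    fn push(&mut self, message_key: u64) {
        if self.inner.len() == MAX_MESSAGE_KEYS {
            // The oldest key is silently discarded here; its message
            // is now permanently undecryptable.
            self.inner.pop_front();
        }
        self.inner.push_back(message_key);
    }
}
```

Whether 40 is the right bound is debatable; that the caller has no say in it, and gets no signal when a key is dropped, is the sharp edge.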
If you call new_pickle() with the same key twice, then use the results to encrypt two different messages, your application will have encrypted two different messages with the same IV. This happens because the AES key, MAC key, and IV are all derived deterministically from the input key, using the static string “Pickle” as the HKDF info parameter and no random salt.
/// Creates a new [`Cipher`] from the given raw key. The key is expected to
/// be 32 bytes in length, but we expect an unsized slice for
/// compatibility with the libolm API.
///
/// The key is deterministically expanded into a 32-byte AES key, a 32-byte
/// MAC key, and a 16-byte initialization vector (IV) using HKDF, with
/// the byte string "Pickle" used as the info during key derivation.
///
/// This key derivation format is typically used for libolm-compatible
/// encrypted pickle formats.
pub fn new_pickle(key: &[u8]) -> Self {
let keys = CipherKeys::new_pickle(key);
Self { keys }
}
Because it’s CBC mode, not GCM, this doesn’t immediately go thermonuclear on your client’s security. But it still loses semantic security.
This is more of a sharp edge in their library’s API than anything else (and is probably a result of legacy design flaws they must be backwards compatible with in perpetuity), but it shows a consistent lack of care.
#[cfg(fuzzing)] Bypasses MAC and Signature Verification

If you ever accidentally compile vodozemac with the fuzzing cfg flag set, you’ve just disabled all security in your client.
#[cfg(fuzzing)]
pub fn verify_mac(&self, _: &[u8], _: &Mac) -> Result<(), MacError> {
    Ok(())
}
The Matrix team could have added a compile-time deprecation warning or used a Cargo feature flag in Cargo.toml instead of what they did here, but alas.
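For example, here is a sketch of what an explicit opt-in could look like. The feature name and the toy MAC below are entirely hypothetical, purely for illustration: the point is that the bypass only compiles if someone deliberately enables a feature whose name screams "insecure", rather than riding along on an ambient --cfg fuzzing flag.

```rust
// Toy MAC for the sketch: NOT a real MAC, just enough to make the
// verification paths concrete.
fn compute_mac(key: &[u8; 32], message: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, &byte) in message.iter().enumerate() {
        out[i % 32] ^= byte ^ key[i % 32];
    }
    out
}

pub struct MacError;

// Only compiled when a loudly-named, explicitly opt-in Cargo feature
// is enabled, e.g. declared in Cargo.toml as:
//     [features]
//     insecure-disable-verification-for-fuzzing = []
#[cfg(feature = "insecure-disable-verification-for-fuzzing")]
pub fn verify_mac(_key: &[u8; 32], _msg: &[u8], _mac: &[u8; 32]) -> Result<(), MacError> {
    Ok(())
}

// The default path actually verifies. (A real implementation would
// compare in constant time.)
#[cfg(not(feature = "insecure-disable-verification-for-fuzzing"))]
pub fn verify_mac(key: &[u8; 32], msg: &[u8], mac: &[u8; 32]) -> Result<(), MacError> {
    if compute_mac(key, msg) == *mac { Ok(()) } else { Err(MacError) }
}
```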
Hey, remember how I reported this exact issue in 2024 to libolm?
The good news is Vodozemac supports strict Ed25519 verification that prevents malleability.
The bad news is they disabled it by default and require a Cargo feature flag to turn it on.
#[cfg(not(fuzzing))]
pub fn verify(
    &self,
    message: &[u8],
    signature: &Ed25519Signature,
) -> Result<(), SignatureError> {
    if cfg!(feature = "strict-signatures") {
        Ok(self.0.verify_strict(message, &signature.0)?)
    } else {
        Ok(self.0.verify(message, &signature.0)?)
    }
}
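If you depend on vodozemac, you can opt into strict verification today by enabling that feature from your own Cargo.toml (the version number below is illustrative):

```toml
[dependencies]
vodozemac = { version = "0.9", features = ["strict-signatures"] }
```

Of course, a downstream opt-in doesn't fix the insecure default; it only protects the clients that know to ask for it.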
The first vulnerability leads to a complete loss of confidentiality for the Olm protocol, since an all-zero shared secret is extremely easy for an attacker to guess. But what’s the blast radius as far as the ecosystem goes?
In my last blog post disclosing Matrix vulnerabilities, I looked at several Matrix clients to ascertain which actually used vodozemac instead of the “deprecated” libolm.
To estimate the adoption velocity of vodozemac over libolm, let’s re-examine the same sample of Matrix client software today that I looked at in 2024:
| Matrix Client | 2024 Backend | 2026 Backend |
|---|---|---|
| tulir/gomuks | libolm (1, 2) | Other (1, 2) |
| niochat/nio | libolm (1, 2) | Unchanged |
| ulyssa/iamb | vodozemac (1, 2) | Unchanged |
| mirukana/mirage | libolm (1) | Unchanged |
| Pony-House/Client | libolm (1) | Unchanged |
| MTRNord/cetirizine | vodozemac (1) | Unchanged |
| nadams/go-matrixcli | none | Unchanged |
| mustang-im/mustang | libolm (1) | vodozemac (1, 2, 3) |
| marekvospel/libretrix | libolm (1) | Unchanged (Archived) |
| yusdacra/icy_matrix | none | Unchanged |
| ierho/element | libolm (through the python SDK) | Unchanged |
| mtorials/cordless | none | Unchanged |
| hwipl/nuqql-matrixd | libolm (through the python SDK) | Unchanged |
| maxkratz/element-web | vodozemac (1, 2, 3, 4) | Unchanged |
| asozialesnetzwerk/riot | libolm (wasm file) | Unchanged |
| NotAlexNoyle/Versi | libolm (1, 2) | Unchanged |
Only one of the projects I found in my sampling of the matrix-client topic on GitHub in 2024 has since switched to vodozemac. (The only other project that moved off libolm now uses a pure Go implementation.)
Is this a representative sample of the Matrix clients that people are actually using today? No, it’s a heuristic.
So with all that in mind, for fun, I also looked at the deprecated Olm library to see if the same issue was present there.
The highlighted code snippet in the previous link calls out to the curve25519_donna function from curve25519-donna, which neither rejects an all-zero public key nor ensures the result of the calculation was contributory.
So both libolm and vodozemac are affected by the same weakness.
I’ve looked at Matrix’s cryptography twice, and each time, I found evidence that their engineers lack cryptographic expertise.
In 2024, I looked at their old libolm library for less than a minute and found several issues.
In 2026, I looked at their new Rust library for an evening after work, and found the issues above.
In 2024, they insisted they already knew about the issues I reported and that’s why they have the new Rust library, which they believed to be Hot Shit.
In 2026, I conclude that their shit is lukewarm.
I’m not one to extrapolate from a small number of data points, but this has to be embarrassing for Matrix, getting dogged on by a gay furry more than once.
I can only imagine how the European Union feels about this revelation.
I dunno. For me, there isn’t going to be a next time.
Neither my vulnerability disclosure from 2024 nor this one will end up in their “Security Hall of Fame.” That tells me Matrix’s leadership is not humble enough to learn from these experiences. And that’s a damn shame for everyone else in the free and open source software communities that contributed their time, work, and even money to try to make Matrix better.
Matrix had one audit in 2022, from Least Authority. Amusingly, their audit report highlighted and suggested fixes for some of the issues I point out here today.
(Full disclosure: I didn’t even bother to consult the audit report until I was almost finished drafting this blog post.)
As I wrote in Software Assurance & That Warm and Fuzzy Feeling:
Cryptography audits and penetration testing in general are valuable work. Audits are essential in avoiding worse outcomes for security in multiple domains (network, software, hardware, physical, etc.).
However, they’re not a panacea, and an audit status should not be viewed as a simple binary “YES / NO” matter.
Some “audit reports” are utterly worthless.
Unfortunately, you basically have to be qualified to do the same kind of work to accurately distinguish between the trash and treasure.
However, this is usually one of the first questions I get from some technologists: “Was it audited? And is the report public?”
Another important question I omitted from that blog post was, “Did the client even listen to the feedback they were given by the auditors?” In this case, the results speak for themselves.
I don’t think Matrix’s cryptography is nearly as stupid as Twitter’s “X Chat” feature, which famously made “actually verify signatures” a TODO:
But Matrix is definitely a runner-up, in my opinion. Avoid both.
Also, apparently Germany’s BSI funded another audit in 2023. The element.io blog post at the time also alluded to a still-unpublished vulnerability, CVE-2023-49953.
Two fun possibilities come to mind:
- It’s one of the things I already found and disclosed.
- There be yet even more dragons in their source code.
As far as the BSI-funded audit goes: It’s nothing impressive. You’d get basically the same level of detail with virtually any other security-focused static analysis tooling today.
BSI recently also decided that Hybrid Post-Quantum Cryptography should be done with the P-curves, rather than X25519 (i.e., X-Wing), and still insist on standardizing the Brainpool curves in the meantime (section 3.6.2). Therefore, in my opinion, the BSI’s competence is questionable at best.
The silver lining to the worst issue I’ve disclosed here is we finally have a solution for “Unable to decrypt message”: Just set your public key to zero.
Header art also by CMYKat.