Social-media Search Comparo
2023-10-03 03:00:00 · Author: www.tbray.org

Bluesky, Threads, and Mastodon (as of last week) all have a built-in search capability. So now’s a good time for a first-look comparison.

My social-media life is at @timbray@cosocial.ca, a member-owned co-op Mastodon instance, but I regularly visit Bluesky and Threads, hoping to have an educated opinion on what’s up. I continue to believe that the evolution of humanity’s tools of conversation is terribly important. This week, our member-owned co-op instance installed Mastodon 4.2 and turned on search.

Methodology LOL · This mini-study uses a single query: "dark matter", two words enclosed in quotes, which many search engines will interpret as a search for the phrase, as distinct from the two words.

Hardly rigorous, I know, and unquantitative; no numbers were abused in the construction of this essay. But I still think there are useful observations to make.
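For concreteness, here’s a minimal Python sketch of the difference between a phrase search and a plain any-word search. The function names and sample posts are made up for illustration, and none of this is how any of the three services actually implements search.

```python
import re

def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def matches_any_word(query: str, post: str) -> bool:
    """True if the post contains at least one of the query words."""
    post_words = set(tokenize(post))
    return any(word in post_words for word in tokenize(query))

def matches_phrase(query: str, post: str) -> bool:
    """True only if the query words appear adjacent and in order."""
    q, p = tokenize(query), tokenize(post)
    if not q:
        return False
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

# Hypothetical sample posts.
posts = [
    "New paper on dark matter halos",
    "It does not matter how dark the roast is",
    "Dark mode all the things",
]
for post in posts:
    print(matches_phrase("dark matter", post), matches_any_word("dark matter", post), post)
```

The second sample post contains both words but not the phrase, and the third contains only one of them; those are the cases that separate the two interpretations.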

Let’s take them in chronological order of having shipped a search capability: Bluesky, Threads, Mastodon.

Bluesky search · It’s acceptably fast, well under a second.

Bluesky search for “dark matter”

The first observation is that it ignores the quotes and apparently doesn’t even require both words to be present; note the second result.

Off to the side is a list of matching accounts, which are sorted with a bit more attention to match quality. Clicking on the “Users” tab expands the list.

The result-list ranking algorithm, near as I can tell, is reverse-chronological. Easy to understand, at least.

There aren’t that many results; the list only goes back two days.

I’m being a little unfair here, because the AT Protocol behind Bluesky calls for two kinds of services, decoupling the function of storing your data from that of providing cross-federation firehose features. The latter is called the Big Graph Service (BGS) and is said to handle, among other things, search. BGS development is clearly embryonic at the moment.

The design, like much about Bluesky, seems attractive in principle; you could have multiple BGSes competing on the basis of search quality and result ranking. We’ll have to wait to see how well it works out in practice.

I will note there was an entertaining post from “Queen Zii, Hypnotic Vampiress”, featuring a very dark albeit scanty leather outfit.

There don’t seem to be any advanced-search options.

Threads search · Performance is neither better nor worse than Bluesky’s.

Threads search for “dark matter”

It respects the fact that I’m searching for a phrase, not just words. It has a result-ranking algorithm, which shouldn’t be surprising; various Meta properties have been doing search for a long time. I can’t comment on the quality of the results in terms of precision and recall. It’s nice to see results from phys.org at the top of the list; back in the day on Twitter-that-was, you could find actual astrophysicists discussing their work. I only saw a couple of posts like that on Threads, and likewise on Bluesky, in a much smaller result list.

There are a lot of results, presented in a nice snappy infinite flow. So if you needed to go deep into a list to find something you were looking for, it wouldn’t be that unpleasant.

As with Bluesky, there are no advanced-search options. Unlike Bluesky, there’s no obvious way to search for accounts rather than posts.

The first experience feels about OK, exactly what I’d expect from conventional modern search technology applied to a large-ish data set in the control of a single service provider. Which raises a big question: How will it work once Threads goes out and federates with the thousands of Fediverse instances? It’ll be fun to watch.

Mastodon search · First, a few words of background. Mastodon lives in the Fediverse, the universe of instances that communicate using the ActivityPub protocol. There are instances that don’t run the Mastodon software, and Threads will soon, they say, be one of them.

Mastodon’s search capability is based on Elasticsearch.

Even if your server is running Mastodon, you can be running one of many (mobile or Web) clients. The picture below is one called Phanpy, my own daily driver. So your Mastodon search results might look a lot different. But I checked out a couple and the payload seems about the same.

Mastodon search for “dark matter”

Now, about that payload. It’s rich enough in structure that I only had room to show the first post. But you can get all-posts, all-accounts, or all-hashtags.
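As a rough sketch of that payload: Mastodon’s REST API exposes search at /api/v2/search, and the response groups results under accounts, statuses, and hashtags. The hostname mastodon.example and the sample payload below are placeholders, and searching posts may need an access token, which is left out here.

```python
from typing import Optional
from urllib.parse import urlencode

RESULT_TYPES = {"accounts", "hashtags", "statuses"}

def search_url(instance: str, query: str, result_type: Optional[str] = None) -> str:
    """Build a search URL; result_type narrows the results to one kind."""
    params = {"q": query}
    if result_type is not None:
        if result_type not in RESULT_TYPES:
            raise ValueError(f"unknown result type: {result_type}")
        params["type"] = result_type
    return f"https://{instance}/api/v2/search?{urlencode(params)}"

def summarize(payload: dict) -> dict:
    """Count the results in each of the three groups of a search response."""
    return {key: len(payload.get(key, [])) for key in ("accounts", "statuses", "hashtags")}

# Placeholder hostname; the quotes ask for the phrase.
print(search_url("mastodon.example", '"dark matter"', "statuses"))

# Made-up payload with the same top-level shape as a real response.
sample = {"accounts": [{"acct": "astro"}], "statuses": [{"id": "1"}, {"id": "2"}], "hashtags": []}
print(summarize(sample))
```

Because every client talks to the same endpoint, the differences between clients are mostly in how they lay out these three lists.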

The result ranking, as with Threads, feels unsurprising: modern and conventional. It knew I wanted a phrase, and the top results were pretty meaty. It featured a lot more hard scientists than the alternatives, but I suspect that’s a function of there being more of them on Mastodon.

There are a whole lot of advanced-search tools; check out the details in the Mastodon 4.2 announcement. I don’t weigh that very heavily; back in the day, Google proved that a great search engine can figure out what you want from a natural-language query, to the point that advanced search was mostly unnecessary. [Ah, the days when Google was laser-focused on search quality.]
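To give a flavor of those tools, here’s a small sketch that assembles a query string from a quoted phrase plus operators. The operator names used here (has:, language:, after:) are among those described for Mastodon 4.2, but treat their exact spelling, and the build_query helper itself, as assumptions to check against the announcement.

```python
def build_query(phrase: str, operators: dict) -> str:
    """Join a quoted phrase with name:value operators, in the order given."""
    parts = [f'"{phrase}"']
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)

# Hypothetical example: posts with attached media, in English, after a date.
query = build_query("dark matter", {"has": "media", "language": "en", "after": "2023-09-01"})
print(query)
```

The resulting string is what you would type into the search box, or pass as the q parameter of a search request.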

Now we get to the important differences between Mastodon and the rest. First, it was noticeably faster. Second, it had fewer results, only two or three screenfuls.

In today’s Fediverse, when you search, you’re just searching a local Elasticsearch index on your own instance. It should be fast, and it is, and that’s good.
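For a sense of what a phrase search against a local index looks like, here are two Elasticsearch Query DSL bodies built as Python dicts: match_phrase for the phrase and match for the individual words. The field name text is an assumption, and this is not necessarily the query Mastodon itself sends.

```python
import json

def phrase_query(field: str, phrase: str, size: int = 20) -> dict:
    """Request body that matches the words adjacent and in order."""
    return {"size": size, "query": {"match_phrase": {field: phrase}}}

def any_word_query(field: str, words: str, size: int = 20) -> dict:
    """Request body that matches documents containing any of the words."""
    return {"size": size, "query": {"match": {field: {"query": words, "operator": "or"}}}}

# "text" is a hypothetical field name for the body of a post.
print(json.dumps(phrase_query("text", "dark matter"), indent=2))
print(json.dumps(any_word_query("text", "dark matter"), indent=2))
```

Either body would be sent to the index’s _search endpoint; since the index sits on the instance itself, there’s no cross-network fan-out involved in answering.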

Then there’s the fewer-results thing. Your instance can only search the posts it knows about. That means everything that anyone on your instance has posted, plus posts from anyone anywhere who’s followed by anyone on your instance, or that mention anyone there.
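That visibility rule can be written down as a small predicate. This is a sketch of the rule as stated above, not Mastodon’s code; the Post class, the handles, and the local_follows set are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    author: str                                   # full handle, e.g. "astro@big.example"
    text: str
    mentions: set = field(default_factory=set)    # handles mentioned in the post

def instance_of(handle: str) -> str:
    """Return the part after the last '@', e.g. 'big.example'."""
    return handle.rsplit("@", 1)[-1]

def known_to_instance(post: Post, instance: str, local_follows: set) -> bool:
    """local_follows holds every account followed by at least one local user."""
    if instance_of(post.author) == instance:      # posted by someone local
        return True
    if post.author in local_follows:              # author is followed from here
        return True
    return any(instance_of(m) == instance for m in post.mentions)  # mentions a local user

posts = [
    Post("me@small.example", "dark matter talk tonight"),
    Post("astro@big.example", "dark matter preprint"),
    Post("stranger@far.example", "dark matter memes"),
    Post("stranger@far.example", "hey, dark matter?", {"me@small.example"}),
]
local_follows = {"astro@big.example"}
visible = [p.text for p in posts if known_to_instance(p, "small.example", local_follows)]
print(visible)
```

Only the third post is invisible from small.example: nobody there wrote it, follows its author, or is mentioned in it.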

This means that on a bigger instance, you’re going to see more results. How many more? And how much will they matter? We don’t have enough experience yet. The super-popular high-impact stuff that’s gone viral will show up on any instance. But maybe exploring the Long Tail from a small instance could be a real problem. There’s a technology called Mastodon Relays that might help out. We’ll see.

And anyhow, that isn’t even the most important difference between Mastodon and the rest…

Opt-in · Your Mastodon posts don’t get indexed for other people to search unless you explicitly go into your profile settings and enable “Include public posts in search results”, which is disabled by default. That may seem crazy at first glance. But it’s actually a major positive in my opinion, because there are serious ethical issues here, ones that not enough people have thought about. I wrote a big long blog piece on the subject, Private and Public Mastodon, late last year, if you care about these issues.
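In code terms, the opt-in acts as a filter in front of the index. The sketch below assumes a per-account boolean, here called indexable, that is off by default; the class and function names are hypothetical and this is not Mastodon’s implementation.

```python
from dataclasses import dataclass

@dataclass
class Account:
    handle: str
    indexable: bool = False   # opt-in: stays off until the user enables it

def searchable(posts: list, accounts: list) -> list:
    """Keep only (author, text) posts whose author has opted in to indexing."""
    opted_in = {a.handle for a in accounts if a.indexable}
    return [(author, text) for author, text in posts if author in opted_in]

accounts = [
    Account("astro@big.example", indexable=True),
    Account("quiet@big.example"),                  # never touched the setting
]
posts = [
    ("astro@big.example", "dark matter preprint"),
    ("quiet@big.example", "dark matter musings"),
]
print(searchable(posts, accounts))
```

Authors the filter has never heard of are treated the same as those who haven’t opted in, which is the conservative reading of an opt-in default.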

Anyhow, since search only started arriving in Mastodonia last week, there are a lot of instances that haven’t upgraded yet. And really a lot of people who haven’t got around to doing that opt-in. So I’d expect my search result lists to get quite a bit longer (Better? Nobody knows.) organically as those things happen.

Interesting times we live in.



Source: https://www.tbray.org/ongoing/When/202x/2023/10/02/Social-Search