| date | title | weight | bookToC |
| --- | --- | --- | --- |
| 2024-11-15 | bsky.social & data non-privacy | 10 | true |

## tl;dr
atproto, the protocol behind platforms like Bluesky, enables full data transparency, where every user's actions (posts, follows, likes) are publicly accessible across the entire network. This openness makes it easy for anyone, including malicious actors like governments or corporations, to scrape and misuse personal data for profiling, stalking, harassment, or surveillance. Unlike other platforms, atproto does not offer privacy features like private messaging or data control, and users have no way to opt out of this transparency. The lack of user consent and control over personal data raises significant privacy concerns, yet Bluesky and atproto have largely avoided scrutiny by marketing themselves as an open-source, non-corporate alternative to Twitter.
tl;dr generated using ChatGPT 4o mini

Alright, that was a cool two days of self-hosting an atproto server and having a bsky account. Back to mastodon. On the way back, I want to open a discussion.

## how atproto works

atproto is the protocol on top of which apps like bsky and whtwnd are built. It offers the uniquely interesting idea of being the underlying network on which apps are built, like instances, each capable of showing the particular combination of data it was built for.

This works because each user has a single unique identifier across the network, which resides alongside all their other data inside the Personal Data Server (PDS). Since all of a user's data lives inside the PDS, making an account on one of the platforms that sit on top of atproto means that you effectively have an account on the entirety of the atproto network: bluesky, whitewind, an instagram clone. One account, every application[^1].
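To make the "single unique identifier" a bit more concrete, here is a minimal sketch. It assumes the identifier is the account's DID and that it can be looked up with the `com.atproto.identity.resolveHandle` XRPC method described in the atproto documentation. The host `https://bsky.social` and the helper names are my own illustrative choices, not something the rest of this post depends on.

```python
import json
import urllib.parse
import urllib.request

# Assumption: any atproto host that serves XRPC could be used here;
# bsky.social is only an example.
DEFAULT_HOST = "https://bsky.social"


def resolve_handle_url(handle: str, host: str = DEFAULT_HOST) -> str:
    """Build the URL that asks a server which identifier belongs to a handle."""
    query = urllib.parse.urlencode({"handle": handle})
    return f"{host}/xrpc/com.atproto.identity.resolveHandle?{query}"


def parse_did(response_body: str) -> str:
    """Pull the identifier out of a JSON body shaped like {"did": "did:plc:..."}."""
    return json.loads(response_body)["did"]


def resolve_handle(handle: str, host: str = DEFAULT_HOST) -> str:
    """Do the lookup over the network; this sketch sends no credentials."""
    url = resolve_handle_url(handle, host)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_did(resp.read().decode("utf-8"))
```

Calling `resolve_handle("atproto.com")` should return the identifier that every app on the network uses for that account, provided the host is reachable and answers as documented.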

atproto is built to be federated, so that every account from every PDS is able to interact with every other account from every other PDS, some of which (like the one I made for two days) are self-hosted.

Other federated projects, like those built on ActivityPub, ensure this by making the API endpoints of different services identical, so for example a video upload on PeerTube or an image post on PixelFed is treated the same as a post on Mastodon, and can be viewed and interacted with by people on Mastodon. You still need separate accounts for each platform in order to post, but you can interact with posts AND users from everywhere.

## the problem with privacy

atproto achieves this interoperability by making every data point of every user completely public. This means that every post, every interaction and every follow is completely public, not only through the application (the way you can see who your friends follow on instagram), but on the server side too.

To quote this blog post from Feb 2024:

> atproto is for connecting to others, so its focused on social applications. It also is currently 100% public, there are no private messages or similar. The reasons for this is that achieving private things in a federated system is very tricky, and they would rather get it right than ship something with serious caveats. Best for now to only use this stuff for things you want to be public.

To show what I've been saying, let's use the atproto developers' profile: @atproto.com. Obviously, the information here is open and accessible; this is a good thing, people should be able to see what you post publicly, in the 'town square' that bsky wants to promote. Another way to see the data is to use an atproto explorer like the one made by Tom from frontpage.

This, as of the writing of this post, has the following:

PDS Collections
---
app.bsky.actor.profile
app.bsky.feed.generator
app.bsky.feed.like
app.bsky.feed.post
app.bsky.feed.repost
app.bsky.graph.follow
app.bsky.graph.list
app.bsky.graph.listitem
app.bsky.graph.starterpack
chat.bsky.actor.declaration

The names make it fairly clear what each collection holds, and the reader can spend as much time as they like getting familiar with them. Suffice it to say that this makes scraping a lot easier, since everything is available as unencrypted plaintext JSON.
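As a rough illustration of how little effort reading one of these collections takes, here is a hedged sketch. It assumes the `com.atproto.repo.listRecords` XRPC method from the atproto documentation, with its `repo`, `collection`, `limit` and `cursor` parameters, and assumes `HOST` points at the server that actually holds the repository. The function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: this is the PDS (or an entryway in front of it) that holds the repo.
HOST = "https://bsky.social"


def list_records_url(repo, collection, cursor=None, limit=100, host=HOST):
    """Build a listRecords URL for one collection of one repository."""
    params = {"repo": repo, "collection": collection, "limit": limit}
    if cursor:
        params["cursor"] = cursor  # continue where the previous page stopped
    query = urllib.parse.urlencode(params)
    return f"{host}/xrpc/com.atproto.repo.listRecords?{query}"


def parse_page(body):
    """Split one JSON response into (records, next_cursor)."""
    page = json.loads(body)
    return page.get("records", []), page.get("cursor")


def iter_records(repo, collection, host=HOST):
    """Yield every record in a collection, following the cursor page by page."""
    cursor = None
    while True:
        url = list_records_url(repo, collection, cursor, host=host)
        with urllib.request.urlopen(url, timeout=10) as resp:
            records, cursor = parse_page(resp.read().decode("utf-8"))
        yield from records
        if not cursor or not records:
            break
```

Something like `for record in iter_records("atproto.com", "app.bsky.feed.post"): print(record["value"])` would then walk through every post in that collection. No account or API key appears anywhere in the sketch.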

This openness would be called being 'vulnerable to data scraping' if it weren't advertised as a feature, and as a core principle of bsky and atproto in general.

## malicious actors

Let's consider the view of a malicious actor, say a government agency, or a malicious individual or organization. What does atproto offer for them?

  • Full network scraping, since anybody has access to the entire atproto network and can collect data from all PDS servers. This information includes

    • profiling, identifying information,
    • posts,
    • replies,
    • likes,
    • location,
    • interests,
    • social behaviour,
    • daily patterns,
    • much more that people with more time can figure out.
  • Individual tracking and profiling. Since every user's entire, unique history of interactions is stored in an easy to find/easy to read spot, tracing the digital footprint of a specific user across all platforms of the network is trivial.
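To show how cheaply something like "daily patterns" falls out of public records, here is a small offline sketch. It assumes each record carries a `createdAt` timestamp under `value`, as Bluesky post records do. The sample records below are invented for illustration and do not belong to any real account.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_histogram(records):
    """Count records per UTC hour of day, based on each record's createdAt."""
    hours = Counter()
    for record in records:
        created = record.get("value", {}).get("createdAt")
        if not created:
            continue  # skip records without a timestamp
        # Older Python versions reject a trailing "Z" in fromisoformat().
        stamp = datetime.fromisoformat(created.replace("Z", "+00:00"))
        hours[stamp.astimezone(timezone.utc).hour] += 1
    return hours


# Hypothetical sample data, made up for this example.
SAMPLE = [
    {"value": {"createdAt": "2024-11-15T08:30:00.000Z"}},
    {"value": {"createdAt": "2024-11-16T08:05:12.000Z"}},
    {"value": {"createdAt": "2024-11-16T23:59:59.000Z"}},
    {"value": {}},
]

print(hour_histogram(SAMPLE))
```

A dozen lines of standard-library code are enough to see when an account is usually active, and the same counting works for likes, follows or reposts.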

## misuse of data

Given the access that the atproto network offers to anybody, as described above, I want to suggest how the data can be used for wrongdoing. Most people probably have some idea of how it can be misused, but here are some examples, to really drive home the dangers that this openness enables:

  • Stalking and Harassment

    Your abusive ex-anybody has complete and full access to your entire history across all networks. Blocking them? Not while atproto offers full access to your data.

  • Specific people of interest.

    If you are a refugee, or an activist, or a whistleblower, fleeing an oppressive regime, or a genocide, or fighting for your freedom, your rights, or those of others, atproto offers a single point of access to all of your activities, to anyone interested. If you are a public figure, then again, everything you do in the entire atproto network is public.

  • Corporate data collection

    It is well known that corporations try to collect as much data from their users as possible, whether to set dynamic prices in supermarkets, to set dynamic insurance prices based on lifestyle, or to feed even more data to the insatiable AI overlords.

  • Employer Surveillance

    Employers 'might' (almost surely) scrape data to evaluate current or potential employees, raising ethical and privacy concerns about the workplace.

## the core issue

All of the above points to the same problem that all data collection has: the lack of user consent and of user control over the data.

Because of how the atproto network is structured, it inherently exposes and broadcasts data to the entirety of the internet, in a way that users are not able to opt out of.

While other platforms face extreme (and extremely well deserved) scrutiny for data handling and data privacy, bsky uniquely seems to have sidestepped the conversation, and to instead focus on marketing itself as an open source alternative to Twitter/X, hoping to lure users through the common association of open source with non-profit, non-corporate and private, none of which bsky and atproto are.


[^1]: The physicist in me likes to think about this the way particles are excitations of quantum fields: applications like bsky or whtwnd are excitations of the underlying atproto network.