Social network Bluesky recently published a proposal on GitHub that could change how users manage their posts and data. The plan would let users signal whether they want their data scraped for purposes such as training generative AI or public archiving.
CEO Jay Graber drew attention to the proposal during her talk at South by Southwest. But when she posted about it on Bluesky, users raised concerns. Many felt the plan conflicted with Bluesky’s earlier promises never to sell user data or use it to train AI.
One user voiced their frustration, saying, “Oh, hell no! The beauty of this platform was the NOT sharing of information. Especially gen AI.”
In response, Graber noted that generative AI companies are already pulling public data from various places online, including Bluesky, since all posts on the platform are public. She explained that Bluesky aims to establish a new standard that serves as a guideline for data scraping, similar to how the robots.txt file works for websites.
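For readers unfamiliar with robots.txt, the sketch below shows roughly how a well-behaved crawler consults it, using Python's standard-library `urllib.robotparser`. The bot name `ExampleAIBot`, the rules, and the `example.com` URL are made up for illustration and do not come from Bluesky or any real site.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named crawler is asked to stay away,
# every other crawler is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)


url = "https://example.com/profile/alice"
print(may_fetch("ExampleAIBot", url))  # the named bot is disallowed
print(may_fetch("OtherBot", url))      # other bots fall under the wildcard rule
```

Nothing in this check is enforced: the crawler itself has to run it and choose to obey the answer, which is the same limitation the article describes.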
Discussions around AI training and copyright have made tools like robots.txt more relevant, though they lack legal enforceability. Bluesky’s proposed standard wouldn’t be legally binding either, but Graber insists it would carry ethical weight, encouraging “good actors” to respect user preferences.
The proposal would let Bluesky users adjust their settings to indicate whether they want their data used in four areas: generative AI, protocol bridging (which connects different social media ecosystems), bulk datasets, and web archiving, such as the Internet Archive’s Wayback Machine. If a user opts out of having their data used for generative AI, the proposal states that companies and researchers should respect that choice when collecting data.
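As a rough illustration of the idea, per-user preferences across the four categories could be modeled as below. This is a sketch under stated assumptions, not the format in Bluesky's proposal: the class name, field names, the `may_use` helper, and the fallback for users who have stated no preference are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The four categories named in the article. These identifiers are
# illustrative and are not taken from Bluesky's proposal.
CATEGORIES = ("generative_ai", "protocol_bridging", "bulk_datasets", "web_archiving")


@dataclass
class UserPreferences:
    # True = allow, False = disallow, None = no preference stated.
    generative_ai: Optional[bool] = None
    protocol_bridging: Optional[bool] = None
    bulk_datasets: Optional[bool] = None
    web_archiving: Optional[bool] = None


def may_use(prefs: UserPreferences, category: str, default: bool = False) -> bool:
    """Return the user's stated choice, or `default` if none was stated.

    The fallback for unstated preferences is an assumption of this sketch;
    a real consumer of the data would have to decide it for itself.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    choice = getattr(prefs, category)
    return default if choice is None else choice


# A user who opts out of generative AI but allows web archiving.
alice = UserPreferences(generative_ai=False, web_archiving=True)
print(may_use(alice, "generative_ai"))
print(may_use(alice, "web_archiving"))
```

As with robots.txt, a check like this only matters if the party collecting the data runs it and honors the result.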
Molly White, a writer focused on tech, described the proposal as a positive step. She pointed out that rather than welcoming AI scraping, Bluesky is trying to give users a way to express their data preferences given that scraping already happens. However, she also highlighted a potential flaw: the approach relies on scrapers to honor user signals, which isn’t guaranteed. Many companies have ignored established guidelines like robots.txt in the past.
As the conversation around data privacy and AI continues to evolve, experts stress the importance of user transparency and consent. A recent survey revealed that 73% of internet users are concerned about how their data is being used by companies, underscoring the need for platforms like Bluesky to prioritize user control.
In summary, Bluesky’s proposal aims to create clearer guidelines for data usage while addressing user concerns about privacy. As AI technology continues to develop, how platforms handle user data will play a crucial role in shaping trust and engagement in digital spaces. For further insights on data privacy, you can check out reports from the Electronic Frontier Foundation.