Multiple major tech companies appear to be planning to use user data to train their artificial intelligence models.
In early August, video conferencing platform Zoom changed its terms of service to include the following provision:
“You consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data for any purpose, to the extent and in the manner permitted under applicable Law, including for the purpose of product and service development, marketing, analytics, quality assurance, machine learning or artificial intelligence (including for the purposes of training and tuning of algorithms and models)”
After facing public backlash, Zoom reversed course and clarified that it does not plan to use data from user calls to train its AI models. However, other companies appear to be heading in that direction.
Earlier this week, a columnist from The Washington Post discovered that Google uses private data from users' email to train its AI model unless they explicitly opt out.
Similarly, Meta used more than a billion Instagram posts from public user accounts to train an AI model of its own.
Training new AI models requires a large amount of data. Current frontier models like GPT-4 are suspected to have been trained on virtually the entire public internet. To keep building bigger models, companies will need even more data, and they are now turning to user data.
Researchers believe that once an AI model has seen private data during training, it is virtually impossible to make it forget that information. This means that new AI models that have been trained with private data may retain that information forever, and if they aren’t deployed safely, they may inadvertently share confidential information (such as an individual’s private conversations or health data) with other users.