AI models by OpenAI and other tech giants are under attack from bots extracting intelligence.
The Rise of Web Scraper 2.0: Extracting Intelligence from AI Models
In the ever-evolving landscape of artificial intelligence, a new breed of digital bots has emerged, targeting powerful AI models such as OpenAI’s GPT-4. These bots, aptly named “web scraper 2.0,” are hijacking the models to extract valuable intelligence, posing both challenges and opportunities for developers and tech companies alike.
This phenomenon was flagged by Guillermo Rauch, CEO of Vercel, a startup that helps developers build websites integrated with AI models. Rauch discussed the growing issue with venture capitalists Elad Gil and Sarah Guo on the No Priors podcast, and the situation is concerning enough that I reached out to him for deeper insights.
The Insatiable Demand for Quality Data
The boom in generative AI has led to an unprecedented demand for high-quality data. AI models heavily rely on this data for training, and without it, their performance suffers. However, the supply of such data is limited. This scarcity of training data has driven the development of these new bots.
Rauch explains that these bots strategically scrape the outputs of powerful AI models such as GPT-4 and Llama 2 to collect fresh training data for their own models. The technique, known as "model distillation," involves training a new model on a dataset of, say, 100,000 high-quality GPT-4 outputs. Several leading AI companies, including OpenAI, Google, and Anthropic, explicitly prohibit using their models' outputs to train competing models.
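To make the mechanics concrete, here is a minimal sketch of how distillation data is typically structured: each record pairs a prompt with a stronger "teacher" model's response, and those pairs become ordinary supervised fine-tuning examples for a smaller "student" model. The records and the `to_finetune_examples` helper below are invented for illustration, not taken from any real scraped dataset.

```python
# Hypothetical (prompt, response) pairs collected from a "teacher" model.
# These strings are placeholders, not real GPT-4 outputs.
teacher_outputs = [
    {"prompt": "Explain rate limiting in one sentence.",
     "response": "Rate limiting caps how many requests a client may make per time window."},
    {"prompt": "What is model distillation?",
     "response": "Training a smaller model to imitate a larger model's outputs."},
]

def to_finetune_examples(records):
    """Convert (prompt, response) pairs into chat-style fine-tuning examples,
    the common input format for supervised fine-tuning of a student model."""
    return [
        {"messages": [
            {"role": "user", "content": r["prompt"]},
            {"role": "assistant", "content": r["response"]},
        ]}
        for r in records
    ]
```

This is why the scraped outputs are valuable: once reformatted this way, they slot directly into standard fine-tuning pipelines.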
The Costly Consequences of Abusing AI Models
Another motivation behind the rise of these bots is the increasing cost associated with using top-performing AI models. OpenAI and other tech companies have implemented rate limits, restricting the number of questions users can ask per minute or per day. However, nefarious actors are circumventing these limits by creating bots that inundate the models with questions, leaving someone else to foot the bill for the generated answers.
These bots infiltrate applications that hold official accounts and API connections to powerful AI models. By exploiting those connections, the bots reach the models while evading direct billing. Rauch cites the unfortunate example of a developer whose data-science application, which queries a major language model, was exploited by bots, resulting in a staggering $35,000 OpenAI bill. After months of pleading her case, OpenAI eventually refunded the charges.
Furthermore, these bots pose challenges not only for individual developers; they are also used to bypass China's blockade on popular generative AI models such as ChatGPT and GPT-4. By clandestinely collecting the models' best outputs through intermediary applications, users in China can circumvent the country's strict censorship policies.
Safeguarding Against Bot Attacks
With Vercel hosting hundreds of thousands of AI applications each month, the threat of bot attacks looms large. Vercel has proactively implemented technology to help developers protect their applications from such attacks. By integrating rate limits, developers can control the number of queries made to AI models, curtailing the ability of outside bots to steal valuable intelligence.
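The rate-limiting defense described above can be sketched in a few lines. The `RateLimiter` class below is a hypothetical illustration (not Vercel's actual implementation): it tracks recent request timestamps per client key and rejects requests once a client exceeds a quota within a sliding window, so a bot flooding an app with queries is cut off before the app ever calls the paid AI model.

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window rate limiter: allow at most `max_requests` per
    `window_seconds` for each client key (e.g. an IP address or API key).
    A sketch for illustration, not a production implementation."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each client key to the timestamps of its recent requests.
        self.requests = defaultdict(deque)

    def allow(self, key, now=None):
        """Return True if this request is within the quota; False to reject."""
        now = time.monotonic() if now is None else now
        window = self.requests[key]
        # Drop timestamps that have aged out of the window.
        while window and now - window[0] > self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False  # over the limit: reject before querying the AI model
        window.append(now)
        return True
```

An app would check `limiter.allow(client_key)` before forwarding each query to the model, so abusive clients are throttled while legitimate users, tracked under their own keys, are unaffected.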
These attacks not only affect individuals but also pose a significant risk to Software-as-a-Service (SaaS) businesses. SaaS companies often offer per-seat subscriptions, where customers pay a fixed monthly fee for unlimited use. However, when AI models are involved, bots can exploit these services, consuming outputs without benefiting real customers. Rauch suggests that businesses may need to adopt more usage-based charging models, such as combining platform fees with per-token or per-query charges, to counter these threats.
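The hybrid pricing model Rauch suggests can be illustrated with simple arithmetic: a flat platform fee covers baseline access, while a per-token charge makes heavy consumption pay for itself. The function and the sample numbers below are hypothetical, chosen only to show the structure.

```python
def monthly_bill(platform_fee, tokens_used, price_per_1k_tokens):
    """Hybrid SaaS pricing sketch: flat platform fee plus usage-based
    token charges. All parameter values are illustrative, not real prices."""
    usage_charge = (tokens_used / 1000) * price_per_1k_tokens
    return platform_fee + usage_charge

# Example: a $20 platform fee plus 500,000 tokens at $0.03 per 1,000 tokens.
bill = monthly_bill(platform_fee=20.0, tokens_used=500_000,
                    price_per_1k_tokens=0.03)  # 20 + 15 = 35.0
```

Under this scheme, a bot that burns through millions of tokens generates a correspondingly large bill for whoever controls the compromised account, rather than a fixed fee absorbed by the SaaS provider.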
To conclude, the rise of web scraper 2.0 and the exploitation of powerful AI models present new challenges for developers and tech companies. However, by implementing robust security measures and exploring innovative pricing strategies, the industry can navigate these challenges and continue to harness the potential of AI while protecting valuable resources.
Source: Business Insider