By Srini Tummalapenta, Distinguished Engineer, CTO, Security Services, IBM.
Originally published on IBM's Security Intelligence Blog.
As deepfake attacks on businesses dominate news headlines, detection experts are gathering valuable insights into how these attacks are built and the vulnerabilities they exploit.
Between 2023 and 2024, frequent phishing and social engineering campaigns led to account hijacking, theft of assets and data, identity theft, and reputational damage to businesses across industries.
Call centers of major banks and financial institutions are now overwhelmed by an onslaught of deepfake calls that use voice cloning technology to break into customer accounts and initiate fraudulent transactions. Internal help desks and staff have likewise been inundated with social engineering campaigns via calls and messages, often successfully, as in the attack on Retool, a company that builds internal business software, which led to tens of millions of dollars in losses for the company's clients. In another widely reported case, a finance worker was duped into transferring funds to fraudsters after joining a video call populated by deepfaked colleagues. Speaker-based authentication systems, meanwhile, are being probed and circumvented with deepfake audio.
The barrier to entry for bad actors is lower than ever. Tools for creating deepfakes are cheaper and more accessible than before, giving even users with no technical know-how the ability to engineer sophisticated, AI-fueled fraud campaigns.
Given the proliferation of these attacks and the evolving methods used by cyber criminals, real-time detection that leverages AI to catch AI will be essential to protecting the financial and reputational interests of businesses.
Deepfakes across modalities
A deepfake is a piece of synthetic media—an image, video, audio or text—that appears authentic, but has been made or manipulated with generative AI models.
Deepfake audio refers to synthetically generated sound that has been created or altered using deep learning models. A common method behind deepfake audio is voice cloning, in which convincing fake speech is generated from less than a minute of recorded samples of a real person's voice. Voice cloning is a particular concern in industries that use voice biometric verification to access customer accounts. Companies that receive a high volume of phone calls as part of their business report constant deepfake attacks on their infrastructure via voice cloning efforts.
The creation of a deepfake video typically involves training a deep neural network on a large dataset of videos and images featuring the target individual(s). The model learns their facial features, expressions and mannerisms, enabling it to generate new video content that looks authentic. Cyber criminals utilize deepfake videos to impersonate executives, bypass biometric verification and create false advertising, among many other uses. Meanwhile, deepfake images can be used to alter documents and bypass the efforts of Know Your Customer (KYC) and Anti-Money Laundering (AML) teams in curbing the creation of accounts under false identities.
Deepfake text refers to artificially generated content meant to mimic the style, structure and tone of human writing. These models are trained on large datasets of text to learn the patterns and relationships between words, enabling them to generate sentences that appear coherent and contextually relevant. Such text aids cyber criminals in large-scale social engineering and phishing attacks by producing massive volumes of convincing copy, and is just as useful in document forgery.
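To make the detection side concrete, here is a minimal sketch of one classic heuristic for flagging machine-generated text: scoring a passage's perplexity under a pretrained language model, since AI-written text tends to be more statistically predictable than human writing. This is an illustration only (the model choice and any threshold are assumptions); commercial detectors rely on far stronger, ensemble-based signals.

```python
# Minimal sketch of a perplexity-based AI-text heuristic.
# Machine-generated text often scores lower (more "predictable") than
# human writing. Illustrative only; not a production detector.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return the model's perplexity for `text` (lower = more model-like)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

score = perplexity("The quarterly report shows strong growth across all regions.")
print(f"perplexity: {score:.1f}")  # very low values may hint at AI generation
```

In practice, perplexity alone is easy to evade, which is one reason the multi-model approaches described later in this piece matter.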
The impact of deepfakes across industries
Audio deepfakes are one of the biggest risk factors for modern businesses, especially financial institutions. Bank call centers are increasingly inundated with deepfake voice clone calls attempting to access customer accounts, and AI-fueled fraud has become the leading security concern for the majority of banks as fraudsters submit AI-altered documents to open fake accounts. Finance workers have been manipulated into moving tens of millions of dollars in deepfake meetings that clone a CEO's voice and likeness. Following the Retool phishing attack, just one of the company's cryptocurrency clients lost $15 million in assets.
But the damage caused by deepfake cyber crime goes far beyond voice clones and can impact any industry. Insurance companies are facing significant losses as fraudsters submit deepfake evidence for illegitimate claims. Competitors can create fake customer testimonials or deepfake videos and images of a supposedly faulty product to damage a brand. While the average cost of creating a deepfake is just $1.33, the expected global cost of deepfake fraud in 2024 is $1 trillion. Deepfakes are a threat to markets and the economy at large: a deepfake image of an explosion near the Pentagon briefly rattled the stock market before officials could refute it. A more sophisticated attack could easily lead to massive losses in company value and damage to global economies.
For media companies, reputational damage caused by deepfakes can quickly lead to loss of viewers and ad revenue. At a time when audiences are already skeptical of the content they encounter, deepfakes raise the stakes for accurate reporting and fact-checking. If a piece of audiovisual media that serves as the basis or evidence for a news report turns out to be an unverified, unlabeled deepfake, the damage to the newsroom and to the company's relationship with its audience could be irreparable.
Social media platforms are just as vulnerable, especially because they’ve become the leading news source for the majority of Americans. Malicious actors spend a mere 7 cents to reach 100,000 social media users with a weaponized deepfake. Allowing the unchecked spread of AI-manipulated news stories can lead to serious audience and advertiser losses and shareholder unrest, not to mention the corrosive effects on society at large.
Deepfake disinformation campaigns can impact the integrity of elections, causing civic unrest and chaos within government institutions. Such unrest can rattle the markets, weaken the economy, and erode the trust between voters and the electoral system. Over 40,000 voters were affected by the deepfake Biden robocall in New Hampshire. But these campaigns are not limited to elections. State-sponsored actors can create synthetic videos of leaders making false claims to damage diplomatic and trade relations, incite conflict and manipulate stocks. The World Economic Forum’s Global Risks Report 2024 ranks AI-fueled disinformation as the number one threat the world faces in the next two years.
Deepfake detection solutions
How do organizations combat this urgent threat? It all comes down to detection.
The ability to detect AI-generated voices, videos, images and text—accurately, swiftly and at scale—can help organizations stay ahead of the threat actors attempting to use deepfakes to execute their fraud or disinformation campaigns.
Those working to secure call centers, customer-facing teams and internal help desks will want to seek out a solution that can detect AI-generated voices in real time. As these points of contact are highly vulnerable and susceptible to fraud, real-time voice deepfake detection should fit neatly into existing voice authentication or biometric platform workflows, affording companies seamless integration without retraining employees on a wholly new tech stack.
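As a rough illustration of what that integration can look like, the sketch below buffers audio from a live call and submits each chunk to a detection service, escalating the session when the returned score crosses a threshold. The endpoint URL, request fields, response schema and threshold here are hypothetical placeholders, not the API of any particular vendor.

```python
# Hypothetical sketch of real-time voice deepfake screening inside an
# existing call workflow. The URL, field names and "score" response key
# are illustrative assumptions, not a real API.
import requests

DETECTION_URL = "https://detector.example.com/v1/audio"  # hypothetical endpoint
ALERT_THRESHOLD = 0.8  # assumed probability cutoff for escalation

def screen_audio_chunk(chunk: bytes, call_id: str) -> float:
    """Send one buffered audio chunk for scoring; return P(synthetic)."""
    resp = requests.post(
        DETECTION_URL,
        files={"audio": ("chunk.wav", chunk, "audio/wav")},
        data={"call_id": call_id},
        timeout=2,  # keep latency low enough for live calls
    )
    resp.raise_for_status()
    return resp.json()["score"]

def handle_chunk(chunk: bytes, call_id: str) -> None:
    score = screen_audio_chunk(chunk, call_id)
    if score >= ALERT_THRESHOLD:
        # Flag the session for step-up authentication rather than
        # terminating outright; detection scores are probabilities.
        print(f"[{call_id}] possible voice clone (score={score:.2f})")
```

A tight request timeout matters in this setting: a detector that cannot return a verdict within a couple of seconds cannot meaningfully protect a live call.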
One in six banks struggles to identify customers at some stage of the customer journey, and finance workers have cited customer onboarding as the workflow most vulnerable to fraud. Text and image detectors are a powerful deterrent against fake documents, identity theft and phishing efforts. A comprehensive deepfake detection toolset should fortify the onboarding and re-authentication flows of KYC and anti-fraud teams to defend against presentation and injection attacks.
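For a sense of how image-side screening works at its simplest, the sketch below applies Error Level Analysis (ELA), a classic image forensics heuristic: re-save a JPEG at a known quality and measure how strongly the image differs from its recompressed copy, since locally edited regions often recompress unevenly. The file name and quality setting are illustrative assumptions, and modern KYC detectors use learned models rather than this heuristic alone.

```python
# Minimal Error Level Analysis (ELA) sketch for screening a submitted
# document image. Illustrative heuristic only.
from PIL import Image, ImageChops

def ela_score(path: str, quality: int = 90) -> float:
    """Re-save the image at a fixed JPEG quality and return the mean
    per-channel difference; unusually high values can suggest editing."""
    original = Image.open(path).convert("RGB")
    original.save("_resaved.jpg", "JPEG", quality=quality)
    resaved = Image.open("_resaved.jpg")
    diff = ImageChops.difference(original, resaved)
    pixels = list(diff.getdata())  # list of (R, G, B) difference tuples
    return sum(sum(px) for px in pixels) / (len(pixels) * 3)

# "id_document.jpg" is a hypothetical submitted document, not a real file.
print(f"mean error level: {ela_score('id_document.jpg'):.2f}")
```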
Journalists should feel empowered to report the news with confidence that their sources are authentic. Image, video and text detection models help ensure reporters don't build legitimate reports on fake evidence. And with 53% of Americans getting their news from social media, a well-equipped detection solution should also help content moderation teams, who cannot be expected to verify an onslaught of content at scale, protect social media platforms from becoming unwitting channels for fake content.
Sophisticated audio deepfake detection tools are built to flag the newest popular tool of political manipulation: misleading robocalls using voice clones of political candidates. State-sponsored attackers can now easily masquerade as heads of state and other political figures, and today's detection solutions can catch synthesized impersonations in critical moments, ensuring the public can be warned. Text detection helps government institutions catch harmful AI-generated documents and communications, preventing identity theft and fraud before they can impact citizens' lives and livelihoods.
Reality Defender is one such solution, built to detect and protect against advanced deepfakes across all modalities. Its platform-agnostic API allows organizations to upload a firehose of content and scale detection capabilities on demand, using a multi-model approach that examines every uploaded file from multiple angles and with the newest deepfake creation models in mind. This produces a more complete and robust result score, which reflects the probability of AI manipulation. With multiple models across multiple modalities, organizations can take informed, data-driven next steps in protecting their clients, assets and reputations from the complex deepfake attacks of today and tomorrow.
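To illustrate the general idea of a multi-model result score, the sketch below combines several detectors' probability outputs into a single weighted score. The model names, weights and averaging scheme are invented for illustration and do not describe Reality Defender's actual models or scoring.

```python
# Hypothetical illustration of fusing multiple detectors' outputs into
# one result score. Names and weights are assumptions, not a real system.
from dataclasses import dataclass

@dataclass
class ModelResult:
    name: str
    probability: float  # each model's P(manipulated), in [0, 1]
    weight: float       # trust assigned to the model for this modality

def aggregate(results: list[ModelResult]) -> float:
    """Weighted average of per-model probabilities -> one result score."""
    total_weight = sum(r.weight for r in results)
    return sum(r.probability * r.weight for r in results) / total_weight

results = [
    ModelResult("spectral-artifacts", 0.91, weight=1.0),
    ModelResult("vocoder-fingerprint", 0.78, weight=1.2),
    ModelResult("prosody-anomaly", 0.55, weight=0.8),
]
print(f"result score: {aggregate(results):.2f}")  # probability of AI manipulation
```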