AI Ad Verification on Social Media
1.2 Scope
● Scams on social media platforms only
● The solution does not tackle phishing or spam emails
Click Fraud: Click fraud fakes the number of clicks, viewers, and traffic on an online ad platform to deceive and gain a financial advantage. This type of ad fraud affects pay-per-click (PPC) ads. Bots, automated scripts, or hired individuals (called click farms) can generate these fraudulent clicks and fraudulent traffic.

Impression Fraud: Impression fraud, or ad viewability fraud, generates fake ad views or video ad impressions without actual human viewers. This type of ad fraud affects campaigns billed based on the number of times an ad is displayed (CPM = cost per thousand impressions). Common techniques are ad stacking and pixel stuffing. Imagine that you put a poster on a display board, but someone intentionally stacks another one on top of it: the intended audience will only see what's on top and not the one you actually displayed. In another instance, you might put up a huge poster to get the attention of more people, but someone shrinks it to the size of a dot so that nobody can actually see what you posted, and then stuffs the original space with other posters.

Conversion Fraud: Conversion fraud fakes the number of leads or sales to collect a commission or inflate performance metrics. This type of ad fraud uses malicious bots or paid individuals to complete forms, sign up for free trials, or make purchases using stolen credit card information.

Affiliate Ad Fraud: Affiliate ad fraud is specifically related to affiliate marketing programs, where affiliates are paid a commission for directing traffic or sales to a business. Fraudulent affiliates can generate invalid traffic or fake conversions to earn commissions illegitimately. Techniques include cookie stuffing (putting data on someone's computer to make it appear that they visited your site), using bots to complete the actions required for earning commissions, or misrepresenting the source of the traffic to claim that they provided leads.
We observe that ad feature extraction can happen via the approaches elaborated below:
Text Analysis
● Natural Language Processing (NLP): Extract features like sentiment analysis (positive/negative language), named entity recognition (identifying brands, products, people), keyword frequency (excessive use of generic marketing terms), and part-of-speech tagging (unusual grammar patterns).
● Bag-of-Words (BoW): Represent the ad content as a collection of words, capturing the overall vocabulary and potential red flags like excessive repetition.
● TF-IDF (Term Frequency-Inverse Document Frequency): This method goes beyond BoW by assigning weights to words based on their importance within the ad and their rarity across the ad corpus. It helps identify keywords specific to deceptive ads.
● Language style and tone analysis, along with grammar and spelling checks, can serve as additional features extracted from ads.
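A minimal, standard-library sketch of how TF-IDF weighting can surface corpus-specific "scammy" vocabulary; the three toy ads and the smoothing choice are invented for illustration:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for each tokenized document in `docs`."""
    n = len(docs)
    # Document frequency: in how many ads does each word appear?
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        # Smoothed IDF: words present in every ad get weight ~0
        weights.append({w: (c / total) * math.log((1 + n) / (1 + df[w]))
                        for w, c in tf.items()})
    return weights

ads = [
    "limited offer act now free prize".split(),
    "new running shoes free shipping".split(),
    "act now claim your free prize now".split(),
]
w = tfidf(ads)
# "prize" is rarer across the corpus than the ubiquitous "free",
# so it gets a higher weight in the third ad
assert w[2]["prize"] > w[2]["free"]
```

Production systems would normally use a library implementation (e.g. a TF-IDF vectorizer) rather than hand-rolled weighting, but the ranking intuition is the same.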
Visual Analysis
● Image Recognition: Extract features like object detection (identifying logos, people, products), scene understanding (detecting unrealistic or staged settings), and image quality analysis (detecting poor image quality, excessive editing).
● Optical Character Recognition (OCR): Extract text embedded within images, allowing combined analysis of text and visuals (e.g., identifying inconsistencies between the written text and the image).
Link Analysis
● URL Analysis: Extract features like domain name structure (unusual extensions, typos, subdomain inconsistencies), website legitimacy checks (blacklisted domains, domain age and registration information, SSL certificate verification), and website content analysis (looking for known phishing patterns).
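A sketch of lightweight URL-structure heuristics using only `urllib.parse`; the TLD and brand lists are invented examples, and real checks (blacklists, WHOIS age, SSL verification) need external lookups that are omitted here:

```python
from urllib.parse import urlparse

SUSPICIOUS_TLDS = {"xyz", "tk", "top"}      # assumed example list
KNOWN_BRANDS = {"paypal", "amazon", "sbi"}  # assumed example list

def url_risk_signals(url):
    """Return structural red flags found in a URL (illustrative only)."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    signals = []
    if labels and labels[-1] in SUSPICIOUS_TLDS:
        signals.append("unusual_tld")
    if len(labels) > 3:
        signals.append("deep_subdomains")
    # Brand name buried in a subdomain, e.g. paypal.secure-login.xyz,
    # rather than in the registered domain itself
    if any(b in part for b in KNOWN_BRANDS for part in labels[:-2]):
        signals.append("brand_in_subdomain")
    return signals

assert "unusual_tld" in url_risk_signals("http://paypal.secure-login.xyz/verify")
assert url_risk_signals("https://www.amazon.com/deals") == []
```

Signals like these would feed the feature vector alongside text and image features, not act as a verdict on their own.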
Additional Features
● Engagement Metrics: Analyze metrics like likes, shares, comments, and user reviews to identify unusual activity patterns that might suggest inauthentic promotion.
● Temporal Features: Consider the time of ad posting, the frequency of ad changes (rapidly changing content could be suspicious), and ad lifespan (short-lived ads might be riskier).
● Advertiser Profile Analysis: Details like account age and history, user reputation scores, and social media presence verification.
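One simple way to operationalize the engagement and temporal signals above is a z-score test against an ad's recent history; the click counts below are invented, and the threshold of 3 is a common but arbitrary choice:

```python
import statistics

def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's engagement count if it deviates strongly from history."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return today != mean
    return abs(today - mean) / stdev > z_threshold

clicks = [120, 95, 110, 105, 98, 115, 102]   # a week of normal traffic
assert is_anomalous(clicks, 900)             # sudden spike, a classic red flag
assert not is_anomalous(clicks, 108)         # within the normal band
```

Real pipelines would use per-source baselines and seasonality-aware models, but the deviation-from-expectation idea is the same.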
Limiting Broad Targeting on Keywords: Fraudsters usually target high-volume, high-competition keywords. By focusing on niche-specific keywords, the ads are less appealing to scammers. With a specific audience in mind, one can set clearer expectations of user behavior. Unusually high click-through rates or a sudden spike in traffic from a specific source can be red flags for ad fraud, since deviations from these expectations are easier to spot.

Monitoring the Quality of Leads: Focus on quality leads rather than quantity. Monitor the leads generated from online advertising efforts to identify patterns or spot the characteristics of fake or low-quality leads. Monitoring allows early detection of anomalies such as many leads with gibberish information, identical details submitted multiple times, or leads from regions outside the target market.

Adding Extra Safeguards on Website: CAPTCHA challenges prevent bots from submitting fake information. Routine security audits and penetration testing also uncover potential weaknesses and allow timely mitigation before any scammer can exploit them.

Incorporating Artificial Intelligence Ad Tools and Apps: Use machine learning and pattern recognition to detect and mitigate fraudulent activities, such as irregular click patterns, suspiciously high engagement rates from certain sources, or abnormal user behavior. By analyzing vast amounts of data in real time, AI can detect anomalies that would be impossible for humans to identify manually.
Approaches and Methods

Supervised Learning Models (learn from labeled data to classify ads as genuine or fake):
● Random Forests: An ensemble of decision trees that vote on the classification. They are robust to overfitting and handle high-dimensional data well.
● Gradient Boosting Machines: Build trees sequentially, with each tree correcting errors of the previous ones. XGBoost and LightGBM are popular implementations known for their speed and performance.
● Support Vector Machines: Work well for binary classification tasks, especially in high-dimensional spaces. They aim to find the hyperplane that best separates the classes.
● Deep Neural Networks: Can learn complex patterns from large datasets. They are versatile but may require more data and computational resources.

Unsupervised Learning (detects anomalies or patterns without labeled data):
● Clustering (e.g., K-means, DBSCAN): Group similar ads together. Outliers or small clusters might indicate fraudulent activity.
● Autoencoders: Neural networks that compress and then reconstruct data. Ads that don't reconstruct well may be anomalous.

Semi-Supervised Learning (useful with a small amount of labeled data and a large amount of unlabeled data):
● Label Propagation: Spreads labels from labeled to unlabeled data points based on similarity.
● Self-Training: The model iteratively labels unlabeled data and retrains itself.

Ensemble Methods (combine multiple models to improve overall performance):
● Stacking: Train a meta-model to combine predictions from base models.
● Voting: Each model votes on the classification, with majority or weighted voting determining the final output.

Deep Learning Approaches:
● CNNs: Excellent for image analysis, detecting visual patterns indicative of fake ads.
● RNNs/Transformers: Process sequential data like text, understanding context and language patterns.
● Multimodal Learning: Combines different types of data (e.g., text and images) for a more comprehensive analysis.

Graph-based Models (analyze relationships between entities in the ad ecosystem):
● Graph Neural Networks: Can capture complex relationships between advertisers, ads, and user interactions.
● Node2Vec: Creates vector representations of nodes in a graph, useful for detecting suspicious patterns in advertiser networks.

Reinforcement Learning (adapts strategies over time to maximize long-term rewards):
● Multi-armed Bandits: Balance exploration (trying new strategies) and exploitation (using known effective strategies).
● Deep Q-Networks: Combine deep learning with reinforcement learning for more complex decision-making.

Time Series Analysis (detects unusual temporal patterns):
● ARIMA: Models the time dependencies in data.
● LSTM Networks: Can capture long-term dependencies in sequential data.

NLP Models (analyze the text content of ads):
● BERT, GPT: State-of-the-art models for understanding and generating human-like text.
● Named Entity Recognition: Extracts key information like names, locations, and organizations from text.

Computer Vision Models (analyze the visual content of ads):
● Siamese Networks: Compare images to detect duplicates or near-duplicates.

Hybrid Approaches (combine different techniques for improved performance):
● Rule-based + ML: Use expert knowledge to create rules, then ML to refine and adapt these rules.
● Feature Extraction + Classification: Use deep learning for feature extraction, then traditional classifiers for final decision-making.

Online Learning Models (continuously update with new data):
● Incremental Learning: Update the model with each new piece of data.
● Adaptive Boosting: Adjust the importance of data points based on previous errors.
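Among the methods above, weighted voting is the easiest to make concrete. The sketch below uses stand-in lambdas as base models (in practice these would be a trained forest, a boosted model, and a CNN); the ad dict and rules are invented:

```python
from collections import Counter

def vote(classifiers, ad, weights=None):
    """Weighted majority vote over base classifiers."""
    weights = weights or [1] * len(classifiers)
    tally = Counter()
    for clf, w in zip(classifiers, weights):
        tally[clf(ad)] += w
    return tally.most_common(1)[0][0]

# Stand-in base models, each looking at a different feature family
text_model  = lambda ad: "fraud" if "free prize" in ad["text"] else "safe"
link_model  = lambda ad: "fraud" if ad["url"].endswith(".xyz") else "safe"
image_model = lambda ad: "safe"   # imagine a CNN that found nothing

ad = {"text": "claim your free prize", "url": "http://win.example.xyz"}
# Two of three models flag the ad, so the ensemble outvotes the image model
assert vote([text_model, link_model, image_model], ad) == "fraud"
```

Weights would normally be set from each model's validation accuracy, which is what distinguishes weighted voting from simple majority voting.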
Persona 1: 62-year-old retired female, India
About the person: Former bank employee at SBI, now retired. Also worked in SBI's fraud department.
Pain points: She went ahead despite being suspicious because a sense of urgency was created, which made her anxious and question "what if".
Key takeaways:
- The scammers created a sense of urgency.
- They used the tool AnyDesk to ask her to share her mobile screen.
- She had paid all her bills and was very aware of the various scams, but still bought into the narrative "if you don't call and pay as per the updated govt policies, you will lose your electricity".
- Her suspicions and her awareness of AnyDesk could have saved her from the scam, but were not enough to stop her from taking the call in the first place.

Persona 2: 41-year-old female, US
About the person: IT employee with two kids and a husband. Understands phishing and various scams through work experience.
Pain points: She couldn't verify the link in the Google search due to time constraints.
Key takeaways:
- It was the first link in the Google search for "Apple Customer Care".
- Being the first link and a sponsored ad was considered good enough to trust.
- The behavior of the person on the line posing as customer care, and the slip in their Hindi accent, gave away enough hints that something was suspicious and made the user drop the call.

Persona 3: 32-year-old male, India
About the person: IT employee, married. Understands different types of online scams.
Pain points: He could not verify the authenticity of the product, as even the product reviews were seemingly authentic and balanced, with both positives and negatives.
Key takeaways:
- The ad was positioned alongside Instagram Reels, the platform's most used feature.
- The ad looked very genuine, with high-quality images and authentic details matching those provided for watches elsewhere on e-commerce websites.
- The user realized he was scammed when the payment went through and the company ghosted him, without any email or order information.
Figure: reported losses worldwide to fraud that started on social media across recent periods: $1.4B, $1.1B, $729M, $237M.
● In 2023, 51% of reports about fraud starting on social media identified Facebook as the social media platform, and 22% identified Instagram. Of the $1.8 billion reported lost to investment-related fraud in 2023, $707 million was lost using cryptocurrency and $689 million was lost using bank transfers.
● Scammers were most often reaching out by email and phone calls. But people reported that they
lost the most money on scams that started on social media.
● Gift cards were the top reported payment method on several types of scams in 2023, including
romance scams, tech support scams, government impersonation scams, and scams that
impersonate people you know, like your boss or a grandchild.
As per an April 2024 Statista report, there are more than 378 million Facebook users in India alone, making it the leading country in terms of Facebook audience size. With an audience of this scale, it is no surprise that the vast majority of Facebook's revenue is generated through advertising. Around 81.8 percent of Facebook audiences worldwide access the platform only via mobile phone.
As per a December 2023 Statista report, small-scale businesses were likely targets of cybercriminals, given that only 24 percent of all Indian companies were adequately prepared to take on cyber attacks.
As per a March 2024 Statista report, Google India reported an increase in search interest across various categories such as financial security, family, and personal health. Consumers were also more aware of online scams and fraud and laid emphasis on the trustworthiness of brands. Brands continue to treat advertising and marketing as a key avenue for verticals looking to leverage India's growing digital economy. Paid search was among the three most important digital advertising formats, chiefly sustained by the banking and financial services sector in 2021. Interestingly, India also ranked among the leading five countries in the world for ad blocking.
2.5 Pain Points for E-commerce Platforms and Social Media Platforms
E-commerce platforms and social media platforms both face significant pain points related to scams,
which can lead to various negative outcomes such as financial loss, data breaches, potential legal
consequences, and eroded user trust. Here’s a breakdown of these pain points:
Financial Loss
● E-commerce platforms: Chargebacks and Refunds: When scams occur, customers often demand refunds or initiate chargebacks, resulting in financial losses for the platform. Operational Costs: Investigating and resolving scam-related issues incurs additional operational costs.
● Social media platforms: Revenue Impact: Scams can drive away advertisers and users, leading to reduced revenue from ads and premium services. Increased Security Costs: Enhancing security measures to combat scams requires substantial investment in technology and personnel.

Data Breaches
● E-commerce platforms: Personal Information Theft: Scammers may target e-commerce platforms to steal sensitive customer information, such as credit card details, addresses, and phone numbers. Intellectual Property Theft: Cybercriminals can also steal proprietary information, including supplier details and business strategies.
● Social media platforms: User Information Exposure: Scammers often use social media to collect personal data, which can lead to large-scale data breaches. Misuse of Data: Stolen data can be used for further fraudulent activities, compounding the damage.

Potential Legal Consequences
● E-commerce platforms: Regulatory Fines: Failure to protect customer data adequately can result in hefty fines from regulatory bodies (e.g., GDPR, CCPA). Lawsuits: Customers affected by scams may sue the platform for negligence, leading to legal expenses and potential settlements.
● Social media platforms: Compliance Issues: Social media platforms must comply with data protection laws; failing to do so can result in significant legal repercussions. Litigation: Victims of scams may pursue legal action against the platform for failing to prevent fraudulent activities.

Eroded User Trust
● E-commerce platforms: Customer Dissatisfaction: If users fall victim to scams, they may lose trust in the platform, leading to decreased customer loyalty and reduced sales. Reputation Damage: Negative publicity related to scams can tarnish the platform's reputation, making it difficult to attract new customers.
● Social media platforms: Loss of User Base: Repeated scams can drive users away from the platform, reducing engagement and active user numbers. Brand Damage: Public awareness of scams can harm the platform's brand, leading to a loss of credibility and market share.
2.6 Insights
● Roughly 90% of data breaches are caused by phishing scams.
● 30% of respondents in a survey reported falling victim to job scams on social media.
● Additionally, 12% of respondents reported clicking on phishing URLs on social media platforms.
● Of teenagers and young adults, about 85% fell prey to shopping scams.
● Many investment scams (up to 50%) happen via social media platforms like Instagram, Telegram, and Facebook.
● Romance scams (25%) are a popular method scammers utilize on social media.
● According to organization reports, $1.5 billion was lost due to influencer scams.
● Precaution is better than cure: education on fake links and fake profiles has helped many users stay suspicious and avoid clicking and falling prey to such scams.
● Human vulnerabilities are the main culprit: emotions such as greed, loneliness, hopelessness, sadness, and anxiety are the main causes of falling prey to online scams.
3. Framing Hypothesis
3.1 User Personas
One of the biggest impediments in curbing cyber crimes has been the lack of awareness on cyber
hygiene. Even when crimes were reported to authorities, the infrastructure and process to tackle such
cases were largely inefficient.
Another area that could ease cyber crime numbers is the expansion of the cyber security market in the
country. More investments in the sector could combat increased threats that are likely to continue with the
rollout of the 5G network and the establishment of smart cities.
Eroded User Trust: The most critical pain point is the loss of user trust, as it directly impacts the platform's user base and engagement. The negative publicity associated with scams can severely tarnish the platform's brand image, making it difficult to attract and retain users and advertisers.

Data Breaches: Scammers exploiting social media can lead to significant data breaches, exposing sensitive user information. This not only affects the users but also places the platform at risk of legal repercussions and further loss of trust. Once data is compromised, it can be used for additional fraudulent activities, compounding the harm and increasing the platform's liability.

Financial Loss: The departure of users and advertisers due to scams results in reduced revenue. Combating scams requires substantial investment in security technologies and personnel, adding to the financial burden on the platform.

Potential Legal Consequences: Failing to protect user data and prevent scams can lead to significant fines from regulatory bodies, especially with stringent data protection laws like GDPR and CCPA. Affected users might take legal action against the platform, leading to costly legal battles and potential settlements.
4. Framing Solution
BakBak.ai is an advanced ML-driven system that categorizes any advertisement posted on social media platforms into one of three categories: Fraud, Potential Risk, or Safe.
This is a B2B SaaS product that will help enterprises such as social media platforms regulate the content on their platforms and weed out potentially risky ads that might harm their users and create reputation management issues for the organization.
● Integration with Social Platforms: The system integrates seamlessly with social media
platforms’ APIs. It scans new ads, assesses risk, and provides real-time feedback to advertisers.
● Feature Extraction: The system extracts relevant features from ad content, such as text,
images, and metadata. These features serve as input to the ML model for decision-making.
● Continuous Learning & Improvement: BakBak.ai continuously learns and improves by
incorporating feedback from human reviewers (Trust & Safety team of social media platforms).
Reviewers validate the system’s predictions and provide corrective input.
● Enhanced Ad Safety: Integrates robust data collection, processing, and user engagement
mechanisms to create a safer and more trustworthy advertising environment.
Feedback Loop:
1. Ad Crawling: Continuously crawl and collect data on newly posted ads across the platform. Gather features such as ad text, images, links, and metadata.
2. Human Oversight During Initial Launch: For the first few months after launching, a trust and
safety team will manually oversee the system's performance.
○ The Trust and Safety team will verify each ad flagged as a potential risk by the model.
○ Flagged ads will land in the BakBak.ai incident queue. Any analyst can pick incidents (flagged ads) from the queue and resolve them.
○ Genuine ads will be released to the ad servers.
○ The model will be retrained on the new data through a feedback loop, refining the model and moderation processes based on real-world observations and insights.
3. Feedback Integration: The system incorporates feedback from social media users who report
suspicious or misleading ads.
○ Users can report fraudulent ads the model missed using the "Report this ad" option.
○ Once a user reports an ad, the BAKBAK.ai model will be retrained with the addition of this new input: user feedback.
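The incident-queue and retraining loop described above can be sketched as a small data structure; the class name, ad dict shape, and labels are illustrative, not the actual implementation:

```python
from collections import deque

class IncidentQueue:
    """Sketch of the human-review loop: flagged ads wait in a queue,
    analysts resolve them, and resolved labels feed retraining."""
    def __init__(self):
        self.pending = deque()
        self.training_feedback = []   # (ad, verified_label) pairs

    def flag(self, ad):
        """Model flagged an ad as a potential risk."""
        self.pending.append(ad)

    def resolve(self, verified_label):
        """An analyst picks the oldest incident and records a verdict."""
        ad = self.pending.popleft()
        self.training_feedback.append((ad, verified_label))
        return ad

q = IncidentQueue()
q.flag({"id": 1, "text": "win a free prize"})
q.resolve("fraud")                    # analyst confirms the model's flag
assert q.training_feedback[0][1] == "fraud"
assert not q.pending                  # queue drained, labels ready for retraining
```

User reports via "Report this ad" would enter the same queue, so both analyst verdicts and user feedback accumulate in one retraining dataset.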
User Journey
We have divided the user journey into three phases to clearly define the interactions between the different entities involved:
Phase 1: Fraud analysis of the advertisement before making it live on users’ social feed
1. Advertisers post the ad on the social media platform (Facebook, Instagram, etc.).
2. The platform (Facebook, for instance) passes the ad to BAKBAK.ai to evaluate the ad's fraud risk category.
3. BAKBAK.ai, using its pre-trained ML model on a feature set including the text, images, and people or figures used in the advertisement, decides which category the ad belongs to: Fraud, Potential Risk, or Safe.
Phase 2: Users interact with the advertisement live on their social feed
As decided by BAKBAK.ai in Phase 1,
1. If the ad is Fraud, it will be immediately removed from the social media platform, so users will not see it in their social feed.
2. If the ad is a Potential Risk, a high-contrast "Potential Risk" label will be appended to the ad, which is otherwise shown to users in the usual way.
a. If the user clicks on it once displayed in the feed, a warning UI will be shown with "Report this ad" and "Looks Safe" CTAs, alerting them to the potential risks associated with the advertisement.
3. If the ad is Safe, then the platform will show it as is on the users’ social feed.
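The Phase 2 branching above can be sketched as a small routing function; the category names, label, and CTA strings come from this document, while the returned dict shape is an illustrative assumption:

```python
def route_ad(category):
    """Map BAKBAK.ai's verdict to the platform action for the user's feed."""
    if category == "Fraud":
        return {"action": "remove"}                       # never reaches feeds
    if category == "Potential Risk":
        return {"action": "serve",
                "label": "Potential Risk",                # high-contrast badge
                "on_click": ["Report this ad", "Looks Safe"]}
    return {"action": "serve"}                            # Safe: shown as is

assert route_ad("Fraud")["action"] == "remove"
assert "Report this ad" in route_ad("Potential Risk")["on_click"]
assert "label" not in route_ad("Safe")
```

Keeping the routing table this explicit makes it easy for the Trust and Safety team to audit exactly what each verdict does to the feed.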
Phase 3: Users who were affected by fake ads can report them:
(This phase is currently out of scope for implementation during the initial launch, so we are routing affected users to the concerned government departments.)
A detailed, larger view of the flow chart of activities is available here: Week 7 - "Let's tackle the scammers"
Feature Extraction
- Text Analysis: Extract textual components like ad descriptions, titles, and keywords.
- Image Analysis: Analyze graphical elements for scam indicators.
- Metadata Extraction: Collect posting time, advertiser profile, engagement metrics, and linked URLs.
Data Augmentation
Synthetic Data
- Generating Synthetic Ads: Creating synthetic examples based on known scam patterns.
- Balancing the Dataset: Ensuring a balanced dataset to avoid model bias.
Adversarial Examples
- Testing Robustness: Introducing adversarial examples to test and improve the model's robustness against new scam tactics.
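A minimal sketch of the synthetic-ad idea: fill templates distilled from known scam patterns with varied slot values. The templates, slot lists, and URLs below are all invented; a real pipeline would derive them from labeled fraud examples:

```python
import random

TEMPLATES = [
    "Congratulations! You won a {prize}. Click {url} to claim now!",
    "{brand} clearance: 90% off {product}, today only at {url}",
]
SLOTS = {
    "prize": ["iPhone", "gift card"], "brand": ["Rolex", "Nike"],
    "product": ["watches", "sneakers"], "url": ["bit.ly/xyz", "win-now.example"],
}

def synth_scam_ad(rng):
    """Generate one synthetic scam ad by filling a random template."""
    t = rng.choice(TEMPLATES)
    # Only fill the slots that this template actually uses
    return t.format(**{k: rng.choice(v) for k, v in SLOTS.items()
                       if "{" + k + "}" in t})

rng = random.Random(0)                 # seeded for reproducible batches
batch = [synth_scam_ad(rng) for _ in range(4)]
assert len(batch) == 4 and all(isinstance(a, str) for a in batch)
```

Generated examples would be added to the minority (fraud) class to balance the training set, with adversarial variants (typos, homoglyphs, URL shorteners) layered on top to test robustness.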
A larger, more detailed view of the relationships among the various components, ensuring scalability and robustness, is available here: Week 7 - "Let's tackle the scammers"
Performance (Priority: High; impacts user experience, ad delivery, and system efficiency)
- Load testing: Conduct performance tests to identify bottlenecks and optimize system performance.
- Caching: Utilize caching mechanisms to improve response times.
- Asynchronous processing: Offload heavy tasks to background processes.

Maintainability (Priority: Medium; facilitates future enhancements and bug fixes)
- Modularity: The system should be designed with modular components for easy maintenance and updates.
- Testability: The system should be easily testable to identify and fix defects.
- Documentation: Clear and comprehensive documentation should be available for system components and processes.
Based on their experience, we have defined a high-level flow that each of them would follow on the platform.
Each of these user flows is connected directly with the user stories defined above as well.
For a detailed walkthrough, kindly refer to Col. D in the User Stories sheet.
6. Pricing Strategy
To ensure flexibility and cater to a variety of business needs, we propose the following tiered
pricing plan for our AI-driven ad verification system.
Add-ons
- Extra Verifications: $0.01 per additional verification beyond plan limits
- Custom Feature Development: Starting at $500 per feature
- Extended Data Storage: $50/month for an additional 1 TB
- Specialized Reports: Custom pricing based on requirements

Billing
- Billing Cycle: Monthly or annual billing options available
- Discounts: 10% discount for annual upfront payments
- Free Trial: 14-day free trial available for all plans

Support
- Basic Support: Email support with a 24-hour response time (available in Basic Plan)
- Priority Support: Email and phone support with a 12-hour response time (available in Professional Plan)
- Premium Support: 24/7 dedicated support with a 4-hour response time (available in Enterprise Plan)
● Customer Acquisition Cost (CAC) (cost to acquire a new social media platform customer): Total sales and marketing expenses / number of new customers
● Sales Cycle Length (sales efficiency): Average time from initial contact to contract signing
● Fraud Prevention ROI (solution effectiveness): (Total financial loss prevented due to fraud detection - cost of the solution) / cost of the solution
● Average Fraud Loss Prevented per Customer (solution impact): Total financial loss prevented / number of customers
● Customer Acquisition Cost (CAC) ROI (return on investment): (Total revenue generated from prevented fraud - total sales and marketing expenses) / total sales and marketing expenses
● Customer Lifetime Value (CLTV) (customer value): Total revenue generated by a customer over their lifetime, including fraud prevention benefits
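The CAC and Fraud Prevention ROI formulas above translate directly into code; the dollar figures below are invented purely for illustration:

```python
def cac(sales_marketing_spend, new_customers):
    """Customer Acquisition Cost: total spend / new customers acquired."""
    return sales_marketing_spend / new_customers

def fraud_prevention_roi(loss_prevented, solution_cost):
    """(Loss prevented - cost of solution) / cost of solution."""
    return (loss_prevented - solution_cost) / solution_cost

# Hypothetical quarter: $50k of sales/marketing spend signs 5 platform
# customers; the solution costs $200k and prevents $1.2M in fraud losses.
assert cac(50_000, 5) == 10_000
assert fraud_prevention_roi(1_200_000, 200_000) == 5.0
```

An ROI of 5.0 reads as "every dollar spent on the solution prevented five additional dollars of fraud loss", which is the headline number for sales conversations.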
Key Considerations
● Data Collection: Accurate and comprehensive data on fraudulent activities, financial losses, and
customer behavior is essential for calculating these metrics.
● Attribution: Clearly defining how to attribute fraud prevention to the solution can be challenging,
especially in cases where multiple fraud prevention measures are in place.
● Timeframe: Establishing appropriate timeframes for measuring these metrics is crucial for
assessing the solution's long-term impact.
By tracking these metrics, we can demonstrate the financial value of the ad fraud detection solution to
potential customers and measure the overall impact on their business.
● Fraudulent Links Reported (solution effectiveness): Total number of fraudulent links reported by the solution
● Industry Fraud Reduction (solution impact): Estimated reduction in industry-wide ad fraud losses
● Regulatory Compliance (industry adherence): Number of customers achieving regulatory compliance through the solution
● Competitive Advantage (market position): Number of competitive features or advantages over competitors
● Time to Value (product adoption): Average time for customers to realize significant ROI
● Market Penetration (product reach): Percentage of target market using the solution
● Media Coverage (brand visibility): Number of media mentions and articles about the solution
● Partner Ecosystem (industry collaboration): Number of partnerships formed with complementary solutions
7.3 Success metrics
By incorporating these calculations, we can gain a deeper understanding of the system's
performance and effectiveness in combating ad fraud.
● Reduction in Scam Incidents
- System Effectiveness (User Reports, Platform Data): Percentage decrease in reported scams compared to the previous period. Measured monthly, quarterly, annually.
- User Impact (User Surveys): Percentage decrease in the number of users reporting scams. Measured quarterly, annually.
● Platform Trust
- User Sentiment (User Surveys): Average NPS score, customer satisfaction score. Measured quarterly, annually.
- Brand Reputation (Social Media Listening Tools): Sentiment analysis score of social media mentions. Measured monthly, quarterly.
● Time to Detection
- System Performance (System Logs): Average time taken to detect a fraudulent ad from the time it is posted. Measured daily, weekly.
● Cost-Benefit Analysis
- System Efficiency (Financial Data): (Savings from reduced fraud - cost of system) / cost of system. Measured quarterly, annually.
● Model Adaptability
- Model Performance (System Logs): Number of model updates / total time period. Measured monthly, quarterly.
● User Education
- User Awareness (System Logs): Click-through rate on educational content / number of users exposed to content. Measured monthly, quarterly.
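Two of these success metrics, Reduction in Scam Incidents and the Cost-Benefit Analysis, are simple ratios; the report counts and dollar figures below are invented for illustration:

```python
def scam_reduction(prev_reports, current_reports):
    """Percentage decrease in reported scams vs. the previous period."""
    return 100 * (prev_reports - current_reports) / prev_reports

def cost_benefit(savings_from_reduced_fraud, system_cost):
    """(Savings from reduced fraud - cost of system) / cost of system."""
    return (savings_from_reduced_fraud - system_cost) / system_cost

# Hypothetical quarter: scam reports fall from 800 to 600, and the system
# costs $100k while saving an estimated $450k in fraud losses.
assert scam_reduction(800, 600) == 25.0
assert cost_benefit(450_000, 100_000) == 3.5
```

Tracking both together guards against a misleading picture: a large report reduction with a negative cost-benefit ratio would mean the system works but costs more than it saves.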