
Spam Detection For Social Media Posting APIs

A small microservice for detecting possible spam attempts when using social media posting APIs. It uses a rules-based approach that can be configured by editing a JSON config file.

This implementation supports Twitter. The code is modular, and other platforms can be added as independent services. The SocialMediaService is responsible for making third-party API calls to fetch user account and historical data.
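As a rough sketch, a platform adapter might conform to an interface along these lines (the names and shapes here are illustrative assumptions, not the exact types in this repo):

interface UserProfile {
  userID: string;
  description: string;   // user bio, scanned for spam signals
  followerCount: number;
  verified: boolean;
}

interface PostHistory {
  timestamps: number[];  // epoch millis of recent posts, for frequency checks
}

// Hypothetical contract a new platform service would implement.
interface SocialMediaService {
  getUserProfile(userID: string): Promise<UserProfile>;
  getPostHistory(userID: string): Promise<PostHistory>;
}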

Analysis Criteria

  1. The request payload is analyzed for spam keywords, hashtags, and usernames
  2. The request payload is analyzed for excessive URLs
  3. The user's post history is analyzed for posting frequency
  4. User information is analyzed for potential spam in the user's bio and other profile fields
  5. The user's verification status is considered as well

Response Structure:

interface PostAnalysisResult {
  action: SpamDetectionAction;
  confidence: number;
  score: number;
  reasons: string[];
  requestId: string;
  processedAt: number;
}

type SpamDetectionAction = "ALLOW" | "BLOCK" | "FLAG";
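For example, a flagged post might produce a response like this (the values are illustrative):

const example: PostAnalysisResult = {
  action: "FLAG",
  confidence: 0.8,
  score: 62,
  reasons: ["spam keywords detected", "excessive hashtags"],
  requestId: "11111111",
  processedAt: 1700000000000, // epoch millis
};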

How to run

This service uses https://rapidapi.com/davethebeast/api/twitter241 to fetch historical user data. You can use the free tier to test this service.

The API key needs to be passed as an environment variable.

env file

  1. Create a ".env" file in the root directory
  2. Add:

TWITTER_BASE_URL="https://twitter241.p.rapidapi.com"
TWITTER_BEARER_TOKEN="xxxxxxxxxx"
NODE_ENV="development"
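Assuming the service loads these with the standard dotenv package (an assumption; check the repo's bootstrap code), the configuration might be read along these lines:

import dotenv from "dotenv";

dotenv.config(); // loads .env from the project root

const twitterConfig = {
  baseUrl: process.env.TWITTER_BASE_URL,
  bearerToken: process.env.TWITTER_BEARER_TOKEN,
};

// Fail fast if the required variables are missing.
if (!twitterConfig.baseUrl || !twitterConfig.bearerToken) {
  throw new Error("TWITTER_BASE_URL and TWITTER_BEARER_TOKEN must be set");
}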

To run:

docker compose up
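A minimal compose file for this setup might look like the following (a sketch, assuming the repo's Dockerfile exposes port 3000 as the curl example below suggests):

services:
  spam-detector:
    build: .
    ports:
      - "3000:3000"
    env_file:
      - .env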

Then make a request to the /api/v1/detect endpoint.

Curl Request:

curl --request POST http://localhost:3000/api/v1/detect \
  --header "Content-Type: application/json" \
  --data '{
    "userID": "NatGeo",
    "content": "congratulations winner, cash prize, no experience needed, investment opportunity, double your money, risk free, bitcoin opportunity, spamHashtags: #followme, #follow4follow, #followforfollow, #followback",
    "platform": "twitter",
    "requestId": "11111111"
  }'

Endpoint for resetting the rules cache:

curl --request POST http://localhost:3000/admin/refresh-rules

After changing the contents of rules.json, call this endpoint to reset the cache.

Test

npm run test

Architecture

This is a rules-based spam detection program. It analyzes the current post, historical data from the user's post history, and user profile information (see the structure of rules.json below). It follows this general flow:

  1. API endpoint receives a request to analyze a social media post.
  2. SpamDetectionService, based on the platform specified in the payload, calls the platform API (via SocialMediaService) to fetch user details and history.
  3. RulesEngine then takes the data, evaluates it against the rules specified in rules.json, and returns a response that includes an action (ALLOW | FLAG | BLOCK) along with the calculated score and a confidence value indicating how accurate the analysis is likely to be. The sketch after this list illustrates the flow.
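In code, the flow corresponds roughly to the following (method names are illustrative assumptions, not the repo's exact API; DetectRequest mirrors the request payload shown above):

// Simplified sketch of the detection flow.
async function detect(request: DetectRequest): Promise<PostAnalysisResult> {
  // Resolve the platform adapter named in the payload (e.g. "twitter").
  const platform = socialMediaService.forPlatform(request.platform);

  // Fetch profile and post history (served from cache when fresh).
  const profile = await platform.getUserProfile(request.userID);
  const history = await platform.getPostHistory(request.userID);

  // Evaluate rules.json and map the resulting score to an action.
  return rulesEngine.evaluate({ post: request, profile, history });
}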


Cache

User history and rules are cached. The TTL for each can be specified in rules.json.
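The exact keys live in rules.json; as an illustrative assumption (check rules.json for the real names), the TTL settings might look something like:

  "cache": {
    "userHistoryTTLSeconds": 300,
    "rulesTTLSeconds": 600
  }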

Code Architecture

The code uses a container of services and injects these services as dependencies where required. For example, the SpamDetectionService requires:

    private rulesEngine: RulesEngine,
    private cacheService: CacheClient,
    private database: any, // not implemented
    private messageQueue: any, // not implemented
    private logger: Logger,

as dependencies. Since this is a relatively small service, it uses a simple container registry as opposed to tsinject or NestJS. These are the main services:

  1. SpamDetectionService
  2. SocialMedia/twitterService
  3. cacheService
  4. rulesEngine
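A container registry of this kind can be a small typed map of factories. A minimal sketch (not the repo's exact implementation):

// Minimal hand-rolled service container with lazy singletons.
type Factory<T> = (c: Container) => T;

class Container {
  private factories = new Map<string, Factory<unknown>>();
  private instances = new Map<string, unknown>();

  register<T>(name: string, factory: Factory<T>): void {
    this.factories.set(name, factory);
  }

  resolve<T>(name: string): T {
    if (!this.instances.has(name)) {
      const factory = this.factories.get(name);
      if (!factory) throw new Error(`Unknown service: ${name}`);
      this.instances.set(name, factory(this)); // construct once, reuse
    }
    return this.instances.get(name) as T;
  }
}

// Usage: container.register("rulesEngine", (c) => new RulesEngine(...));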

Rules.json

Main configuration for the rules engine

The list of spam words is put in "rules.json":

rules.spamKeywords

rules.spamHashtags

rules.spamUsernames
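Illustratively, those lists might look like the following (the values are examples, not the shipped defaults):

  "spamKeywords": ["cash prize", "double your money", "risk free"],
  "spamHashtags": ["#follow4follow", "#followback"],
  "spamUsernames": ["freecrypto", "getrichquick"]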

The threshold for considering a score as spam is also specified in the JSON, along with the scoring parameters. A score at or above the block threshold results in BLOCK, a score at or above the flag threshold results in FLAG, and anything lower results in ALLOW.

  "thresholds": {
    "block": 80,
    "flag": 50,
    "allow": 49
  }

  "scoring": {
    "weights": {
      "spamKeywords": 8,
      "excessiveUrls": 8,
      "excessiveHashtags": 6,
      "caps": 4.5,
      "userDescriotionSpamKeywords": 6.15,
      "userDescriptionExcessiveHashtag": 5.5,
      "userDescriptionExcessiveUrls": 5.75,
      "userNameSpamyKeywords": 6,
      "isLowFollowers": 25,
      "isUserLowFollowCount": 20,
      "isUserNotVerified": 10,
      "isHistoryPostingOften": 50
    }
  },

The properties with the prefix "is" are added directly to the final score, while the rest are used as multipliers.
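A sketch of how that combination might be computed (illustrative; the actual logic lives in rulesEngine):

// "is*" weights are flat additions when the signal fires;
// other weights are multiplied by a per-signal count.
function computeScore(
  counts: Record<string, number>,  // e.g. { spamKeywords: 4 }
  flags: Record<string, boolean>,  // e.g. { isUserNotVerified: true }
  weights: Record<string, number>,
): number {
  let score = 0;
  for (const [name, weight] of Object.entries(weights)) {
    if (name.startsWith("is")) {
      if (flags[name]) score += weight;       // additive signal
    } else {
      score += (counts[name] ?? 0) * weight;  // multiplied by count
    }
  }
  return score;
}

// Example: 4 spam keywords (4 × 8 = 32) + unverified user (+10) = 42 → ALLOW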

Platform-specific system rules are also kept in this JSON, for example the TTL for storing user post history in the cache.

TODO:

  1. Fine tune rules.json weights
  2. Add more words to the set of spam words
  3. Add messaging queue implementations for deeper analysis of flagged posts
  4. Add database entry to allow manual analysis of flagged/blocked posts
  5. Add more test cases for rulesEngine, as well as integration and E2E tests for the entire service
