Spam Classification and URL Analysis

This document contains a list of URLs that have been categorized as spam or not spam. Some URLs are also categorized by the type of spam, such as scam, stolen content, keyword stuffing, etc. The majority of URLs listed are categorized as not spam, with some categorized as junk, soft spam, or hard spam. A few URLs have other categorizations like "other" or no categorization provided.

Uploaded by

Barbara ribeiro

URL Spam Category (only required for spam)

[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Soft spam Stolen content
[Link] Not spam
[Link] Junk
[Link] Hard spam Scam
[Link] Not spam
[Link] Hard spam Scam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Scam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Other
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Hard spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Scam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Soft spam Keyword stuffing
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Junk
[Link] Junk
[Link] Soft spam Stolen content
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Soft spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Hard spam Scam
[Link] Junk
[Link] Other Other
[Link] Not spam
[Link]
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Junk
[Link] Junk
[Link] Not spam
[Link] Hard spam Scam
[Link] Junk
[Link]
[Link] Soft spam Poor layout / design
[Link] Hard spam Misleading links / misleading site behavior
[Link] Junk
[Link] Junk
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Adult
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Soft spam Poor layout / design
[Link] Junk
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Soft spam Stolen content
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Junk
[Link] Hard spam Misleading links / misleading site behavior
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Junk
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Junk
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Junk
[Link] Hard spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Adult
[Link] spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Soft spam Stolen content
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Hard spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Hard spam Misleading links / misleading site behavior
[Link] Junk
[Link] Not spam
[Link] Soft spam Keyword stuffing
[Link] Hard spam Misleading links / misleading site behavior
[Link] spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Junk
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Not spam
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Not spam
[Link] Hard spam Misleading links / misleading site behavior
[Link] Junk
[Link] Not spam
[Link] Soft spam Cheap or machine generated content / no incremental value
[Link] Junk
[Link] Not spam
Comment (only required for spam)

The page presents cheap and copied content from other websites, but no misleading links or other harmful behavior to the user; therefore, it is considered Soft Spam.

The page is a raffle website that does not present information about other players or winners, and its social network links direct to a user's profile; it also asks for credit card data for participat

The page is a store that does not present references or any social media for contacting the person responsible for the store, which can be considered malicious and Scam behavior; therefore, Hard Spam.

The page claims to be an online store, but there are no references or social media; also, there are other stores with the same name, which can be considered false brand association and a strategy for phishing; therefore, Hard Spam.

Spanish/Foreign language

The page claims to be an online store, but there are no references or social media about it; also, the name is very similar to a famous store in Brazil, Ponto frio, which comes close to false brand association and a strategy for phishing; therefore, Hard Spam.

The page presents cheap and probably machine-generated content, but no misleading links or other harmful behavior to the user; therefore, it is considered Soft Spam.

The page is a website for downloading a specific game that presents misleading links and misleading site behavior; therefore, it is considered Hard Spam.

The content is copied and the layout of the page is poor, but the page does not present harmful behavior; therefore, it can be considered Soft Spam.

The page is a website for downloading a specific movie that presents poor layout and misleading links; therefore, it is considered Hard Spam.

The page is part of a website for downloading games that presents misleading links; therefore, it should be considered Hard Spam.

The page is a website that presents misleading links to other websites that do not work; therefore, it is considered Hard Spam.

The website claims to be an online store, but it shows scammy behavior, such as the social media icon leading to the social media of another famous website; therefore, it is considered harmful behavior, Hard Spam.

Foreign language

The page is part of a website that presents false brand association with a famous website; therefore, it can be considered spammy behavior, Hard Spam.

The website presents funneling to the same website, which can be considered spammy behavior but not very harmful to the user; therefore, Soft Spam.

The page is part of a website for downloading games that presents misleading links; therefore, it should be considered Hard Spam.

The page is part of a website for downloading games that presents misleading links; therefore, it should be considered Hard Spam.

The website is for watching online shows and presents misleading links; therefore, it should be considered Hard Spam.

The page is a blog that offers the download of a specific game and presents misleading links that don't work; therefore, Hard Spam.

The website is for downloading a specific piece of software and presents misleading links; therefore, it should be considered Hard Spam.

The page is part of a blog that presents a poor layout but no harmful behavior; therefore, it is considered Soft Spam.

The page presents copied and stolen content from other websites, but no misleading links or other type of harmful behavior; therefore, it should be considered Soft Spam.

The page is a website that presents misleading links to other websites that don't work; therefore, it is considered Hard Spam.

The page is a website that presents misleading links to other websites that don't work; therefore, it is considered Hard Spam.

The page is a blog that offers the download of music and presents misleading links that don't work; therefore, Hard Spam.

The page is part of a website that presents misleading links; therefore, it should be considered Hard Spam.

The page is part of a website for downloading a specific type of game feature, which presents misleading links; therefore, it should be considered Hard Spam.

The page is part of a blog that presents misleading links; therefore, it is considered Hard Spam.

The page is part of a website for downloading music albums that provides misleading links to websites that do not work; therefore, it should be considered not acceptable, Hard Spam.

The page is part of a website for downloading movies that provides misleading links to websites that do not work; therefore, it should be considered not acceptable, Hard Spam.

The page is a blog that offers the download of a specific movie and presents misleading links that don't work; therefore, Hard Spam.

The page is part of a website for watching movies online that presents misleading links; therefore, it should be considered not acceptable, Hard Spam.

The page is a website for selling steams for games that presents misleading links and malicious behavior; therefore, it can be considered Hard Spam.

The page is part of a blog website that provides the download of a game and presents misleading links; therefore, Hard Spam.

The page is part of a website for downloading movies online that presents misleading links; therefore, it should be considered not acceptable, Hard Spam.

The website provides stolen and copied content from other websites; therefore, it is considered Soft Spam.

The page is part of a website for watching movies that presents misleading links; therefore, Hard Spam.

The page is part of a website for downloading movies that provides misleading links to websites that do not work; therefore, it should be considered not acceptable, Hard Spam.

The page is a website for watching movies online that provides misleading links; therefore, it is considered Hard Spam.

The page presents keyword stuffing and no apparent harmful behavior to the user; therefore, Soft Spam.

The page is part of a website for watching movies that presents misleading links; therefore, Hard Spam.

The page is part of a website for watching movies that provides misleading links to websites that do not work; therefore, it should be considered not acceptable, Hard Spam.

The page is part of a blog website that offers downloading or watching movies online and presents misleading links; therefore, Hard Spam.

The page is a website for watching anime online that provides misleading links; therefore, it is considered Hard Spam.

The page is part of a blog website that provides the download of a specific music album and presents misleading links; therefore, Hard Spam.

The content of the page makes no sense and can be considered MGC; therefore, it is labeled Soft Spam.

Allowed spam labels
Hard spam
Soft spam
Not spam
Junk
Adult
Other

Allowed categories
Other
Doorway
Malware
Phishing
Scam
Thin affiliation
Misleading links / misleading site behavior
Aggressive ads
Stolen content
Poor layout / design
Cheap or machine generated content / no incremental value
Keyword stuffing
Content farm (content not relevant to domain and/or page title)
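
The allowed labels and categories listed above, together with the column notes that a category and a comment are "only required for spam", amount to a small validation rule for each row of the sheet. A minimal Python sketch of that rule follows. The names `Row`, `validate_row`, and `SPAM_LABELS` are hypothetical, and treating "spam" as meaning only "Hard spam" and "Soft spam" is an assumption; the sheet does not define it.

```python
from dataclasses import dataclass
from typing import List, Optional

# Label and category values as listed in the sheet.
ALLOWED_LABELS = {"Hard spam", "Soft spam", "Not spam", "Junk", "Adult", "Other"}
ALLOWED_CATEGORIES = {
    "Other",
    "Doorway",
    "Malware",
    "Phishing",
    "Scam",
    "Thin affiliation",
    "Misleading links / misleading site behavior",
    "Aggressive ads",
    "Stolen content",
    "Poor layout / design",
    "Cheap or machine generated content / no incremental value",
    "Keyword stuffing",
    "Content farm (content not relevant to domain and/or page title)",
}
# Assumption: "only required for spam" refers to these two labels.
SPAM_LABELS = {"Hard spam", "Soft spam"}


@dataclass
class Row:
    """One row of the sheet (hypothetical structure)."""
    url: str
    label: str
    category: Optional[str] = None
    comment: Optional[str] = None


def validate_row(row: Row) -> List[str]:
    """Return a list of problems found in a row; an empty list means it passes."""
    problems = []
    if row.label not in ALLOWED_LABELS:
        problems.append(f"unknown label: {row.label!r}")
    if row.category and row.category not in ALLOWED_CATEGORIES:
        problems.append(f"unknown category: {row.category!r}")
    if row.label in SPAM_LABELS:
        if not row.category:
            problems.append("category missing for spam row")
        if not row.comment:
            problems.append("comment missing for spam row")
    return problems
```

Under these assumptions, a row labeled "Hard spam" with no category is reported as incomplete, and a truncated label such as "spam" is reported as unknown, which matches the gaps visible in the table.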
