Scrape Configuration Major and Minor Data Points

The document outlines the major and minor data points required for scraping product information, including details like brand name, product descriptions, and pricing. It specifies additional restrictions and uses for each data point, as well as configurations needed for different types of retailers such as Softlines, HardLines, and Consumables. The document serves as a comprehensive guide for ensuring accurate and effective data scraping processes.

Uploaded by

tharanmurlee

5/24/22, 10:14 AM Scrape Configuration Major and Minor Data points (Operation.SMT.Scrape.Configuration.Data-points.WebHome) - XWiki
Amazon Confidential

Scrape Configuration Major and Minor Data points


Primary Owner smt-ops (LDAP)
Last modified 8 months ago by vkams.

List of major and minor data points to be scraped.

Major data points:

| SL.No | Major data point | Description | Additional restrictions | Some downstream uses | Useful wiki links |
|---|---|---|---|---|---|
| 1 | brand_name | Brand name of the scraped item. | Should not contain junk keywords that are not valid brands. | Used to decide valid/invalid selection in downstream processes. If brand coverage is low, try implementing BFT (Brand from Title), if applicable. | |
| 2 | bread_crumb1 | First bread crumb for an item. | Should not contain Home, retailer name, title, brand, model, numbers, and should not be a duplicate. | Used in categorization to decide GLs and product family. | |
| 3 | bread_crumb2 | Second bread crumb for an item. | Should not contain Home, retailer name, title, brand, model, numbers, and should not be a duplicate. | Used in categorization to decide GLs and product family. | |
| 4 | bread_crumb3 | Third bread crumb for an item. | Should not contain Home, retailer name, title, brand, model, numbers, and should not be a duplicate. | Used in categorization to decide GLs and product family. | |
| 5 | remaining_bread_crumbs | All remaining bread crumbs for an item. | Should not contain Home, retailer name, title, brand, model, numbers, and should not be a duplicate. | Used in categorization to decide GLs and product family. | |
| 6 | Product_descriptions | Product descriptions of the scraped item. | | Useful in reverse mapping. | |
| 7 | Bullet_points | Bullet points of the scraped item. | | Useful in reverse mapping. | |
| 8 | Additional_product_info | Used to capture any other product content provided. Save all information provided in tables. | | Useful in reverse mapping. | |
| 9 | color_name | Color of the item. | | Useful in reverse mapping. | |
| 10 | size_name | Size of the item. | | Useful in reverse mapping. | |
| 11 | customer_star_ratings | The total rating the item has obtained based on user opinions. | | Used by SMPear in deciding head selection. | https://w.amazon.com/bin/view/SMT_Head_Selection_Parity_Program/ |
| 12 | no_of_customer_reviews | Number of customers who reviewed the item. | Should be a number only. If it is 0, make it blank. Should not contain values like 'NA', 'null' or any other characters. | Used by SMPear in deciding head selection. | |
| 13 | external_id | UPC, EAN, GTIN, GTIN13 or ISBN can be scraped. | Should be 8-14 characters. | Useful in reverse mapping. | |
| 14 | has_retail_offer | 'Y' if the seller/merchant of the item is the retailer; 'N' if it is any 3rd party seller; '' (blank) if seller/merchant info is not available; 'D' if the item is discontinued or is the parent product in case of variants. | | Used in deciding retail selection. | |
| 15 | offering_availability | Tells us whether the item is 'In Stock' or 'Out of Stock'. | | Used to know whether the item is purchasable. | |
| 16 | list_price | When two prices are shown (before discount, after discount), this is the price before discount. | Can be blank if a single price is given without discount. | Used to know the price without discount. | |
| 17 | our_price | When two prices are shown (before discount, after discount), this is the price after discount. When a single price is shown, it contains that single price. | Should not be null for an in-stock product. | Used to know the selling price. | |
| 18 | url | URL of the item. | | Used to identify the product. | |
| 19 | Image_urls | URLs of the item's images and related thumbnails. | | Useful in categorization or reverse mapping. | |
| 20 | title | Short title of the item. | Should not be null. | Used in deciding head selection and exclusion. | |
| 21 | Parent_sku | In case of variants, uniquely identifies the parent product. | Should be unique for parent products and should not be null (for variants selection). | Used by the variant re-grouper. | https://w.amazon.com/bin/view/SMT/SelectionAddition/SGMBusinessLo |
| 22 | website_sku | Used to uniquely identify items scraped from a competitor. | Should be unique and should not be null. | Used to identify the product uniquely. | |
| 23 | model | Model of the scraped item. | | Useful in reverse mapping. | |
| 24 | part_number | Part number of the scraped item. | | Useful in reverse mapping. | |
| 25 | item_number | Model number of the item as per the manufacturer's specification. | | Useful in reverse mapping. | |
| 26 | item_package_quantity | Number of items in the product. | Should be numeric. | Used to know the quantity of the item. | |
| 27 | manufacturer_name | Name of the manufacturer of the item; not the brand. | | Useful for reverse mapping, item reconcile and item information enrichment. | |
| 28 | merchants | Seller name of the item. | | Used to know the seller of the item. | |
| 29 | merchant_url | URL of the merchants available for the product. | | Used to know the seller URL. | |
| 30 | merchant_sku | Unique code of the merchants available for the product. | | Used to identify the seller uniquely. | |
| 31 | has_3p_offer | Used to decide whether the product is sold by a 3rd party seller. Can be 'Y', '', 'N' or 'D'. | | Used for 3P reporting. | https://w.amazon.com/bin/view/SMPEAR_3P_REPORTING/Design/ |
| 32 | item_length_width_height_weight | Used when a combined string is available on the product page. | | Useful for reverse mapping. | |
| 33 | Label | Associated with music, CDs and DVDs. For example: Columbia, Universal. | | Useful for item reconcile, clustering and enrichment for BMVD. | https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE |
| 34 | studio | Associated with movie CDs and DVDs: the production studio, for example Warner Home Video. | | Useful for item reconcile, clustering and enrichment for BMVD. | https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE |
| 35 | publisher | Publisher name of the book. | | Useful for item reconcile, clustering and enrichment for BMVD. | https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE |
| 36 | is_bestseller | Identifies whether the product is a bestseller. | | Used in deciding head selection. | https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D |
| 37 | bestseller_category | Bestseller category of the scraped item. | | Used in deciding head selection. | https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D |
| 38 | bestseller_rank | Bestseller rank of the scraped item. | | Used in deciding head selection. | https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D |
| 39 | shipping_information | Shipping information of the scraped item. | | Used to improve the delivery experience. | |
| 40 | item_sale_status | Sale status of the scraped item. | | Refer to the external wiki link for details. | https://w.amazon.com/bin/view/ObsoleteSelectionIdentification/ |
| 41 | obs_indicator | Identifies whether the product is obsolete. | | Refer to the external wiki link for details. | https://w.amazon.com/bin/view/ObsoleteSelectionIdentification/ |
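
Several of the value rules in the major data-point table can be sketched in Python. This is only an illustrative sketch, not the actual scrape configuration code: the function names, the `seller_kind` labels ("retailer", "third_party") and the choice to let 'D' take precedence over 'Y'/'N' are assumptions made for the example. Non-numeric review counts are blanked rather than repaired, which is also an assumption.

```python
import re
from typing import Optional, Tuple


def normalize_review_count(raw: Optional[str]) -> str:
    """no_of_customer_reviews: digits only; blank for 0, 'NA', 'null', etc."""
    text = (raw or "").strip()
    if not re.fullmatch(r"[0-9]+", text) or int(text) == 0:
        return ""
    return str(int(text))


def is_valid_external_id(value: str) -> bool:
    """external_id (UPC, EAN, GTIN, GTIN13, ISBN) should be 8-14 characters."""
    return 8 <= len(value.strip()) <= 14


def retail_offer_flag(seller_kind: Optional[str],
                      discontinued_or_parent: bool = False) -> str:
    """has_retail_offer: 'Y', 'N', '' (blank) or 'D'.

    Assumption: 'D' wins when the item is discontinued or is a parent product.
    """
    if discontinued_or_parent:
        return "D"
    if seller_kind == "retailer":
        return "Y"
    if seller_kind == "third_party":
        return "N"
    return ""  # seller/merchant info not available


def assign_prices(first_price: str,
                  after_discount: Optional[str] = None) -> Tuple[str, str]:
    """Return (list_price, our_price) from one or two displayed prices."""
    if after_discount is None:
        # Single price: list_price may stay blank, our_price holds the price.
        return "", first_price
    # Two prices: before discount -> list_price, after discount -> our_price.
    return first_price, after_discount
```

For example, `assign_prices("29.99", "19.99")` yields a list_price of "29.99" and an our_price of "19.99", while `assign_prices("19.99")` leaves list_price blank.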

Minor Data points:

| SL.No | Minor data point | Description | Possible value type |
|---|---|---|---|
| 1 | country_of_origin | Country in which the product originated. | Text |
| 2 | ingredients | Ingredient information of the item (ingredients used in manufacturing the item). | Text |
| 3 | material_type | Material information of the item. | Text |
| 4 | fabric_type | Type of fabric used in clothing items. | Text |
| 5 | dial_color | Dial color of watches. | Text |
| 6 | warranty_type | Type of warranty for the item. | Text |
| 7 | warranty_term_length | Number of years/months for which the warranty is valid. | Text |
| 8 | author | Author of the book. | Text |
| 9 | pages | Number of pages in the book. | Number |
| 10 | format | Format of media items. | Text |
| 11 | bindings | Binding information of books. | Text |
| 12 | director | Director name for music/video CDs/DVDs. | Text |
| 13 | actor | Actor name for music/video CDs/DVDs. | Text |
| 14 | release_date | Release date of music/movies/videos. | Date |
| 15 | publication_date | Publication date of books. | Date |
| 16 | language | Language of BMVD items. | Text |
| 17 | shipping_information | Shipping charges, days, etc. of the item. | Text |
| 18 | store_availability | Whether the product is available in the retailer's store. | Text(Y/N) |
| 19 | store_pickup | Whether store pickup is available. | Text(Y/N) |
| 20 | cash_on_delivery | Whether the cash-on-delivery payment option is available. | Text(Y/N) |
| 21 | seller_name | Name of the seller. | Text |
| 22 | seller_url | URL of the seller. | Text |
| 23 | seller_rating | Rating of the seller. | Number |
| 24 | seller_type | Type of the seller. | Text |
| 25 | seller_no_of_reviews | Number of reviews of the seller. | Number |
| 26 | item_depth | Depth of the item. | Text |
| 27 | item_height | Height of the item. | Text |
| 28 | item_length | Length of the item. | Text |
| 29 | item_width | Width of the item. | Text |
| 30 | item_weight | Weight of the item. | Text |
| 31 | express_shipping | Whether the express shipping option is available. | Text(Y/N) |
| 32 | fast_shipping | Whether the fast shipping option is available. | Text(Y/N) |
| 33 | fast_shipping_cost | Cost of fast shipping. | Text |
| 34 | is_free_shipping | Whether free shipping is available. | Text(Y/N) |
| 35 | is_two_day_shipping_eligible | Whether the item is eligible for 2-day shipping. | Text(Y/N) |
| 36 | same_day_shipping_cost | Cost of same-day shipping. | Text(Y/N) |
| 37 | shipping_from | Location from which the item is shipped. | Text |
| 38 | standard_shipping | Whether standard shipping is available. | Text(Y/N) |
| 39 | standard_shipping_cost | Cost of standard shipping. | Text |
| 40 | standard_shipping_date | Date of standard shipping. | Date |
| 41 | super_fast_shipping | Whether super fast shipping is available. | Text(Y/N) |
| 42 | super_fast_shipping_cost | Cost of super fast shipping. | Text |
| 43 | shipping_to | Destination of shipping. | Text |
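
The "Possible value type" column above can be checked mechanically. The following Python sketch covers only a handful of the minor data points and rests on assumptions not stated on this page: the names `MINOR_TYPES` and `matches_type` are hypothetical, blank values are treated as acceptable, and dates are assumed to be in `YYYY-MM-DD` form (the page does not specify a date format).

```python
from datetime import datetime

# A few minor data points copied from the table, with their value types.
MINOR_TYPES = {
    "country_of_origin": "Text",
    "pages": "Number",
    "seller_rating": "Number",
    "release_date": "Date",
    "store_pickup": "Text(Y/N)",
    "is_free_shipping": "Text(Y/N)",
}


def matches_type(field: str, value: str, date_format: str = "%Y-%m-%d") -> bool:
    """Return True if value is compatible with the field's declared type."""
    kind = MINOR_TYPES[field]
    if value == "":
        return True  # assumption: a blank value is allowed for any minor field
    if kind == "Number":
        try:
            float(value)
        except ValueError:
            return False
        return True
    if kind == "Date":
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            return False
        return True
    if kind == "Text(Y/N)":
        return value in ("Y", "N")
    return True  # plain Text accepts any string
```

With these assumptions, `matches_type("store_pickup", "yes")` is rejected because only 'Y' and 'N' are accepted for Text(Y/N) fields.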

Some Important Data-points to be configured for Softlines heavy retailers:

Data points to be configured

brand_name

bread_crumb1

bread_crumb2

bread_crumb3

remaining_bread_crumbs

Product_descriptions

Bullet_points

Additional_product_info

color_name

size_name

customer_star_ratings

no_of_customer_reviews

external_id

has_retail_offer

offering_availability

list_price

our_price

url

Image_urls

title

Parent_sku

website_sku

model

item_package_quantity

manufacturer_name

merchants

merchant_url

merchant_sku

has_3p_offer

variants

is_bestseller

bestseller_category

bestseller_rank

shipping_information

item_sale_status

obs_indicator

shipping_from

material_type

dial_color

fabric_type

warranty_type

warranty_term_length

Some Important data points to be configured for HardLines heavy retailers:

Data-points to be configured

brand_name

bread_crumb1

bread_crumb2

bread_crumb3

remaining_bread_crumbs

Product_descriptions

Bullet_points

Additional_product_info

color_name

size_name

customer_star_ratings

no_of_customer_reviews

external_id

has_retail_offer

offering_availability

list_price

our_price

url

Image_urls

title

Parent_sku

website_sku

model

part_number

item_number

item_package_quantity

manufacturer_name

merchants

merchant_url

merchant_sku

has_3p_offer

item_length_width_height_weight

is_bestseller

bestseller_category

bestseller_rank

shipping_information

item_sale_status

obs_indicator

item_depth

item_height

item_length

item_width

item_weight

material_type

warranty_type

warranty_term_length

Some Important Data-points to be configured for Consumables Heavy retailers:

Data-points to be configured

brand_name

bread_crumb1

bread_crumb2

bread_crumb3

remaining_bread_crumbs

Product_descriptions

Bullet_points

Additional_product_info

color_name

size_name

customer_star_ratings

no_of_customer_reviews

external_id

has_retail_offer

offering_availability

list_price

our_price

url

Image_urls

title

Parent_sku

website_sku

model

part_number

item_number

item_package_quantity

manufacturer_name

merchants

merchant_url

merchant_sku

has_3p_offer

is_bestseller

bestseller_category

bestseller_rank

shipping_information

item_sale_status

obs_indicator

ingredients

warranty_type

warranty_term_length

item_weight

Some Important data-points to be configured for BMVD Heavy retailers:

Data-points to be configured

bread_crumb1

bread_crumb2

bread_crumb3

remaining_bread_crumbs

Product_descriptions

Bullet_points

Additional_product_info

color_name

size_name

customer_star_ratings

no_of_customer_reviews

external_id

has_retail_offer

offering_availability

list_price

our_price

url

Image_urls

title

Parent_sku

website_sku

model

part_number

item_number

item_package_quantity

manufacturer_name

merchants

merchant_url

merchant_sku

has_3p_offer

Label

studio

publisher

is_bestseller

bestseller_category

bestseller_rank

shipping_information

item_sale_status

obs_indicator

warranty_type

warranty_term_length

author

format

binding

director

actor

release_date

publication_date

publisher_label

language

pages
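
The four retailer lists above share most of the major data points and differ mainly in the entries that follow obs_indicator. A minimal Python sketch of a coverage check for those trailing, category-specific entries is shown below. The entries are copied from the lists above; the names `CATEGORY_EXTRAS` and `missing_extras` are hypothetical and not part of any existing tool.

```python
from typing import Dict, Iterable, List

# Entries listed after obs_indicator in each retailer list above.
CATEGORY_EXTRAS: Dict[str, List[str]] = {
    "Softlines": [
        "shipping_from", "material_type", "dial_color", "fabric_type",
        "warranty_type", "warranty_term_length",
    ],
    "HardLines": [
        "item_depth", "item_height", "item_length", "item_width",
        "item_weight", "material_type", "warranty_type",
        "warranty_term_length",
    ],
    "Consumables": [
        "ingredients", "warranty_type", "warranty_term_length", "item_weight",
    ],
    "BMVD": [
        "warranty_type", "warranty_term_length", "author", "format",
        "binding", "director", "actor", "release_date", "publication_date",
        "publisher_label", "language", "pages",
    ],
}


def missing_extras(category: str, configured: Iterable[str]) -> List[str]:
    """List the category-specific data points absent from a configuration."""
    have = set(configured)
    return [dp for dp in CATEGORY_EXTRAS[category] if dp not in have]
```

For a Consumables retailer configured with only ingredients and item_weight, the check reports warranty_type and warranty_term_length as still to be configured.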

Note:

bread_crumb1 and title should not be null.

bread_crumb1, bread_crumb2 and bread_crumb3 should be unique and should not contain brand_name.

list_price should be greater than our_price.

website_sku should not be null and should be unique.

offering_availability should not be null.

Normalize "-" in the brand name when using the brand repo.

Major data points (DPs) such as brand_name, title and website_sku should not contain "|".

Special characters that affect the XML document format (for example ', ", <, >, [, ], {, }, &) should be encoded.

For more details about normalization, refer to: https://w.amazon.com/bin/view/SMT_SOB/Scrape/Beans_for_Scrape_Configuration/

Single scrape configuration: https://w.amazon.com/bin/view/SMT_SOB/Scrape/Types_of_Scrape/Single_Scrape_Configuration/

Multi-scrape configuration: https://w.amazon.com/bin/view/SMT_SOB/Scrape/Types_of_Scrape/Multi_Scrape_Configuration/
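
The rules in the notes above can be expressed as a small record check. This Python sketch is illustrative only and does not reproduce the normalization beans referenced in the links. The function names are hypothetical, prices are assumed to be plain decimal strings, and the brand check is assumed to be a case-insensitive substring match. The notes do not say how brackets and braces should be encoded, so the sketch encodes only &, <, > and the two quote characters and leaves [, ], {, } unchanged.

```python
from collections import Counter
from typing import Dict, Iterable, List
from xml.sax.saxutils import escape


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of rule violations for one scraped record."""
    problems = []
    for field in ("bread_crumb1", "title", "website_sku", "offering_availability"):
        if not record.get(field):
            problems.append(f"{field} is null")
    crumbs = [record[k] for k in ("bread_crumb1", "bread_crumb2", "bread_crumb3")
              if record.get(k)]
    if len(set(crumbs)) != len(crumbs):
        problems.append("bread crumbs are not unique")
    brand = record.get("brand_name")
    if brand and any(brand.lower() in crumb.lower() for crumb in crumbs):
        problems.append("bread crumb contains brand_name")
    list_price, our_price = record.get("list_price"), record.get("our_price")
    # Assumption: both prices are plain decimal strings such as "19.99".
    if list_price and our_price and float(list_price) <= float(our_price):
        problems.append("list_price is not greater than our_price")
    for field in ("brand_name", "title", "website_sku"):
        if "|" in (record.get(field) or ""):
            problems.append(f'{field} contains "|"')
    return problems


def duplicate_skus(records: Iterable[Dict[str, str]]) -> List[str]:
    """website_sku values that appear more than once across records."""
    counts = Counter(r.get("website_sku") for r in records)
    return sorted(sku for sku, n in counts.items() if sku and n > 1)


def xml_encode(value: str) -> str:
    """Encode &, <, > and both quote characters for use in XML."""
    return escape(value, {'"': "&quot;", "'": "&apos;"})
```

The uniqueness rule for website_sku applies across records, which is why it is checked by a separate function rather than inside `check_record`.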

https://w.amazon.com/bin/view/Operation/SMT/Scrape/Configuration/Data-points/
