5/24/22, 10:14 AM Scrape Configuration Major and Minor Data points (Operation.SMT.Scrape.Configuration.Data-points.
WebHome) - XWiki
Amazon Confidential
Scrape Configuration Major and Minor Data points
Primary Owner smt-ops (LDAP)
Last modified 8 months ago by vkams.
List of major and minor data points to be scraped.
Major data points:
Each major data point is listed with its description, any additional restrictions, its downstream uses and, where available, a useful wiki link.
1. brand_name
   Description: Brand name of the scraped item.
   Restrictions: Should not contain junk keywords that are not valid brands.
   Downstream use: Helps decide valid/invalid selection in downstream processes. If brand coverage is low, try implementing BFT (Brand from Title) where applicable.
2. bread_crumb1
   Description: First breadcrumb of the item.
   Restrictions: Should not contain Home, the retailer name, title, brand or model numbers, and should not be duplicated.
   Downstream use: Categorization, to decide GLs and product family.
3. bread_crumb2
   Description: Second breadcrumb of the item.
   Restrictions: Same as bread_crumb1.
   Downstream use: Categorization, to decide GLs and product family.
4. bread_crumb3
   Description: Third breadcrumb of the item.
   Restrictions: Same as bread_crumb1.
   Downstream use: Categorization, to decide GLs and product family.
5. remaining_bread_crumbs
   Description: All remaining breadcrumbs of the item.
   Restrictions: Same as bread_crumb1.
   Downstream use: Categorization, to decide GLs and product family.
6. Product_descriptions
   Description: Product descriptions of the scraped item.
   Downstream use: Reverse mapping.
7. Bullet_points
   Description: Bullet points of the scraped item.
   Downstream use: Reverse mapping.
8. Additional_product_info
   Description: Captures any other product content provided on the page. Save all information provided in tables.
   Downstream use: Reverse mapping.
9. color_name
   Description: Color of the item.
   Downstream use: Reverse mapping.
10. size_name
   Description: Size of the item.
   Downstream use: Reverse mapping.
11. customer_star_ratings
   Description: Total rating the item has received, based on user opinions.
   Downstream use: Deciding head selection by SMPear.
   Wiki: https://w.amazon.com/bin/view/SMT_Head_Selection_Parity_Program/
12. no_of_customer_reviews
   Description: Number of customers who reviewed the item.
   Restrictions: Numbers only. If the count is 0, leave it blank. Must not contain values such as 'NA', 'null' or any other characters.
   Downstream use: Deciding head selection by SMPear.
13. external_id
   Description: UPC, EAN, GTIN, GTIN13 or ISBN can be scraped.
   Restrictions: Should be 8-14 characters long.
   Downstream use: Reverse mapping.
14. has_retail_offer
   Description: 'Y' if the seller/merchant of the item is the retailer, 'N' if it is a third-party seller, '' (blank) if seller/merchant information is not available, and 'D' if the item is discontinued or is the parent product of variants.
   Downstream use: Deciding retail selection.
15. offering_availability
   Description: Whether the item is 'In Stock' or 'Out of Stock'.
   Downstream use: Knowing whether the item is purchasable.
16. list_price
   Description: When two prices are shown (before and after discount), the price before discount.
   Restrictions: Can be blank if a single price is shown without a discount.
   Downstream use: Knowing the price without discount.
17. our_price
   Description: When two prices are shown (before and after discount), the price after discount. When a single price is shown, that price.
   Restrictions: Should not be null for in-stock products.
   Downstream use: Knowing the selling price.
18. url
   Description: URL of the item.
   Downstream use: Identifying the product.
19. Image_urls
   Description: URLs of the item's images and related thumbnails.
   Downstream use: Categorization or reverse mapping.
20. title
   Description: Short title of the item.
   Restrictions: Should not be null.
   Downstream use: Deciding head selection and exclusion.
21. Parent_sku
   Description: For variants, uniquely identifies the parent product (used for variant selection).
   Restrictions: Should be unique for each parent product and should not be null.
   Downstream use: Variant re-grouper.
   Wiki: https://w.amazon.com/bin/view/SMT/SelectionAddition/SGMBusinessLo
22. website_sku
   Description: Uniquely identifies each item scraped from a competitor.
   Restrictions: Should be unique and should not be null.
   Downstream use: Identifying the product uniquely.
23. model
   Description: Model of the scraped item.
   Downstream use: Reverse mapping.
24. part_number
   Description: Part number of the scraped item.
   Downstream use: Reverse mapping.
25. item_number
   Description: Model number of the item as per the manufacturer's specification.
   Downstream use: Reverse mapping.
26. item_package_quantity
   Description: Number of items included in the product.
   Restrictions: Should be numeric.
   Downstream use: Knowing the quantity of the item.
27. manufacturer_name
   Description: Name of the item's manufacturer (not the brand).
   Downstream use: Reverse mapping, item reconciliation and item information enrichment.
28. merchants
   Description: Seller name of the item.
   Downstream use: Knowing the seller of the item.
29. merchant_url
   Description: URLs of the merchants offering the product.
   Downstream use: Knowing the seller URL.
https://w.amazon.com/bin/view/Operation/SMT/Scrape/Configuration/Data-points/ 1/6
30. merchant_sku
   Description: Unique code of the merchants offering the product.
   Downstream use: Identifying the seller uniquely.
31. has_3p_offer
   Description: Indicates whether the product is sold by a third-party seller. Can be 'Y', '', 'N' or 'D'.
   Downstream use: 3P reporting.
   Wiki: https://w.amazon.com/bin/view/SMPEAR_3P_REPORTING/Design/
32. item_length_width_height_weight
   Description: Used when the product page shows dimensions and weight as one combined string.
   Downstream use: Reverse mapping.
33. Label
   Description: Label associated with music CDs and DVDs, for example Columbia or Universal.
   Downstream use: Item reconciliation, clustering and enrichment for BMVD.
   Wiki: https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE
34. studio
   Description: Production studio of movie CDs and DVDs, for example Warner Home Video.
   Downstream use: Item reconciliation, clustering and enrichment for BMVD.
   Wiki: https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE
35. publisher
   Description: Publisher name of the book.
   Downstream use: Item reconciliation, clustering and enrichment for BMVD.
   Wiki: https://w.amazon.com/bin/view/SMT/External_Catalog/EC_IMR/BrandE
36. is_bestseller
   Description: Whether the product is a bestseller.
   Downstream use: Deciding head selection.
   Wiki: https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D
37. bestseller_category
   Description: Bestseller category of the scraped item.
   Downstream use: Deciding head selection.
   Wiki: https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D
38. bestseller_rank
   Description: Bestseller rank of the scraped item.
   Downstream use: Deciding head selection.
   Wiki: https://w.amazon.com/bin/view/SMT/Bestsellers/Bestseller_Operation_D
39. shipping_information
   Description: Shipping information of the scraped item.
   Downstream use: Improving the delivery experience.
40. item_sale_status
   Description: Sale status of the scraped item.
   Downstream use: Refer to the wiki link for details.
   Wiki: https://w.amazon.com/bin/view/ObsoleteSelectionIdentification/
41. obs_indicator
   Description: Whether the product is obsolete.
   Downstream use: Refer to the wiki link for details.
   Wiki: https://w.amazon.com/bin/view/ObsoleteSelectionIdentification/
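The value rules for no_of_customer_reviews, external_id, has_retail_offer and list_price/our_price in the major data-point table can be expressed as small normalization helpers. This is an illustrative sketch only and is not the SMT scrape configuration itself. The function names and inputs are hypothetical, and the helpers assume the scraper has already extracted raw strings from the page.

```python
import re
from typing import Optional, Tuple


def normalize_review_count(raw: Optional[str]) -> str:
    """no_of_customer_reviews: digits only; a count of 0 becomes blank."""
    if raw is None:
        return ""
    # Assumption: thousands separators such as "1,234" may be stripped first.
    value = raw.strip().replace(",", "")
    if re.fullmatch(r"[0-9]+", value) is None:
        # 'NA', 'null', '12 reviews', etc. are not allowed; this sketch blanks them.
        return ""
    count = int(value)
    return "" if count == 0 else str(count)


def is_valid_external_id(raw: str) -> bool:
    """external_id (UPC/EAN/GTIN/GTIN13/ISBN) must be 8-14 characters long."""
    return 8 <= len(raw.strip()) <= 14


def retail_offer_flag(seller_is_retailer: Optional[bool], discontinued_or_parent: bool) -> str:
    """has_retail_offer: 'Y', 'N', '' (seller info unknown) or 'D'."""
    if discontinued_or_parent:  # assumption: 'D' takes precedence over seller info
        return "D"
    if seller_is_retailer is None:  # seller/merchant info not available
        return ""
    return "Y" if seller_is_retailer else "N"


def assign_prices(before_discount: Optional[str], after_discount: Optional[str]) -> Tuple[str, str]:
    """Return (list_price, our_price) from the price(s) shown on the page."""
    if before_discount and after_discount:
        # Two prices: list_price is the pre-discount price, our_price the discounted one.
        return before_discount, after_discount
    # Single price: list_price stays blank and our_price carries the single price.
    return "", before_discount or after_discount or ""
```

In this sketch, invalid review counts are blanked rather than rejected. A stricter pipeline could raise an error instead, so the bad page gets re-scraped.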
Minor Data points:
Each minor data point is listed with its description and possible value type.
1. country_of_origin: Country in which the product originated. Type: Text
2. ingredients: Ingredients used in manufacturing the item. Type: Text
3. material_type: Material of the item. Type: Text
4. fabric_type: Type of fabric used in clothing items. Type: Text
5. dial_color: Dial color of watches. Type: Text
6. warranty_type: Type of warranty for the item. Type: Text
7. warranty_term_length: Number of years/months for which the warranty is valid. Type: Text
8. author: Author of the book. Type: Text
9. pages: Number of pages in the book. Type: Number
10. format: Format of media items. Type: Text
11. bindings: Binding information of books. Type: Text
12. director: Director name of music/video CDs and DVDs. Type: Text
13. actor: Actor name of music/video CDs and DVDs. Type: Text
14. release_date: Release date of music/movies/videos. Type: Date
15. publication_date: Publication date of books. Type: Date
16. language: Language of BMVD items. Type: Text
17. shipping_information: Shipping charges, delivery days, etc. of the item. Type: Text
18. store_availability: Whether the product is available in the retailer's store. Type: Text(Y/N)
19. store_pickup: Whether store pickup is available. Type: Text(Y/N)
20. cash_on_delivery: Whether the cash-on-delivery payment option is available. Type: Text(Y/N)
21. seller_name: Name of the seller. Type: Text
22. seller_url: URL of the seller. Type: Text
23. seller_rating: Rating of the seller. Type: Number
24. seller_type: Type of seller. Type: Text
25. seller_no_of_reviews: Number of reviews of the seller. Type: Number
26. item_depth: Depth of the item. Type: Text
27. item_height: Height of the item. Type: Text
28. item_length: Length of the item. Type: Text
29. item_width: Width of the item. Type: Text
30. item_weight: Weight of the item. Type: Text
31. express_shipping: Whether an express shipping option is available. Type: Text(Y/N)
32. fast_shipping: Whether a fast shipping option is available. Type: Text(Y/N)
33. fast_shipping_cost: Cost of fast shipping. Type: Text
34. is_free_shipping: Whether free shipping is available. Type: Text(Y/N)
35. is_two_day_shipping_eligible: Whether the item is eligible for two-day shipping. Type: Text(Y/N)
36. same_day_shipping_cost: Cost of same-day shipping. Type: Text
37. shipping_from: Source location from which the item is shipped. Type: Text
38. standard_shipping: Whether standard shipping is available. Type: Text(Y/N)
39. standard_shipping_cost: Cost of standard shipping. Type: Text
40. standard_shipping_date: Date of standard shipping. Type: Date
41. super_fast_shipping: Whether super-fast shipping is available. Type: Text(Y/N)
42. super_fast_shipping_cost: Cost of super-fast shipping. Type: Text
43. shipping_to: Destination of shipping. Type: Text
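The possible value types (Text, Number, Date, Text(Y/N)) can drive a simple format check on scraped minor values. This is a hedged sketch with several assumptions:

- Only a subset of the data points is mapped; any unmapped name defaults to Text.
- A blank value is accepted as "not captured".
- Dates are assumed to be ISO 8601 (YYYY-MM-DD), because this page does not specify a date format.

`MINOR_TYPES` and `matches_type` are hypothetical names.

```python
import re
from datetime import date

# Subset of minor data points and their declared value types (from the minor table).
MINOR_TYPES = {
    "pages": "Number",
    "seller_rating": "Number",
    "seller_no_of_reviews": "Number",
    "release_date": "Date",
    "publication_date": "Date",
    "standard_shipping_date": "Date",
    "store_availability": "Text(Y/N)",
    "store_pickup": "Text(Y/N)",
    "cash_on_delivery": "Text(Y/N)",
    "is_free_shipping": "Text(Y/N)",
    "country_of_origin": "Text",
}


def matches_type(name: str, value: str) -> bool:
    """Return True if a scraped value fits the declared type of its data point."""
    if value == "":
        return True  # assumption: blank means the page did not provide the value
    kind = MINOR_TYPES.get(name, "Text")  # unmapped names are treated as free text
    if kind == "Number":
        # Plain integer or decimal, e.g. "320" or "4.5"; units such as "pages" fail.
        return re.fullmatch(r"[0-9]+(\.[0-9]+)?", value) is not None
    if kind == "Date":
        try:
            date.fromisoformat(value)  # assumption: dates are stored as YYYY-MM-DD
            return True
        except ValueError:
            return False
    if kind == "Text(Y/N)":
        return value in ("Y", "N")
    return True
```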
Some important data points to be configured for Softlines-heavy retailers:
Data points to be configured
brand_name
bread_crumb1
bread_crumb2
bread_crumb3
remaining_bread_crumbs
Product_descriptions
Bullet_points
Additional_product_info
color_name
size_name
customer_star_ratings
no_of_customer_reviews
external_id
has_retail_offer
offering_availability
list_price
our_price
url
Image_urls
title
Parent_sku
website_sku
model
item_package_quantity
manufacturer_name
merchants
merchant_url
merchant_sku
has_3p_offer
variants
is_bestseller
bestseller_category
bestseller_rank
shipping_information
item_sale_status
obs_indicator
shipping_from
material_type
dial_color
fabric_type
warranty_type
warranty_term_length
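One way to use a list like this is to check a retailer's scrape configuration for missing data points before it goes live. The sketch assumes the configuration can be reduced to a list of configured data-point names. That format is an assumption and does not reflect the actual scrape-configuration bean format. `SOFTLINES_DATA_POINTS` and `missing_data_points` are hypothetical names.

```python
from typing import Iterable, List, Sequence

# Data points listed above for Softlines-heavy retailers.
SOFTLINES_DATA_POINTS: Sequence[str] = (
    "brand_name", "bread_crumb1", "bread_crumb2", "bread_crumb3",
    "remaining_bread_crumbs", "Product_descriptions", "Bullet_points",
    "Additional_product_info", "color_name", "size_name",
    "customer_star_ratings", "no_of_customer_reviews", "external_id",
    "has_retail_offer", "offering_availability", "list_price", "our_price",
    "url", "Image_urls", "title", "Parent_sku", "website_sku", "model",
    "item_package_quantity", "manufacturer_name", "merchants",
    "merchant_url", "merchant_sku", "has_3p_offer", "variants",
    "is_bestseller", "bestseller_category", "bestseller_rank",
    "shipping_information", "item_sale_status", "obs_indicator",
    "shipping_from", "material_type", "dial_color", "fabric_type",
    "warranty_type", "warranty_term_length",
)


def missing_data_points(configured: Iterable[str],
                        required: Sequence[str] = SOFTLINES_DATA_POINTS) -> List[str]:
    """Return required data points absent from a configuration, in list order."""
    configured_set = set(configured)  # names are case-sensitive, e.g. "Parent_sku"
    return [dp for dp in required if dp not in configured_set]
```

Passing the Hardlines, Consumables or BMVD list as `required` applies the same check to the other retailer types.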
Some important data points to be configured for Hardlines-heavy retailers:
Data-points to be configured
brand_name
bread_crumb1
bread_crumb2
bread_crumb3
remaining_bread_crumbs
Product_descriptions
Bullet_points
Additional_product_info
color_name
size_name
customer_star_ratings
no_of_customer_reviews
external_id
has_retail_offer
offering_availability
list_price
our_price
url
Image_urls
title
Parent_sku
website_sku
model
part_number
item_number
item_package_quantity
manufacturer_name
merchants
merchant_url
merchant_sku
has_3p_offer
item_length_width_height_weight
is_bestseller
bestseller_category
bestseller_rank
shipping_information
item_sale_status
obs_indicator
item_depth
item_height
item_length
item_width
item_weight
material_type
warranty_type
warranty_term_length
Some important data points to be configured for Consumables-heavy retailers:
Data-points to be configured
brand_name
bread_crumb1
bread_crumb2
bread_crumb3
remaining_bread_crumbs
Product_descriptions
Bullet_points
Additional_product_info
color_name
size_name
customer_star_ratings
no_of_customer_reviews
external_id
has_retail_offer
offering_availability
list_price
our_price
url
Image_urls
title
Parent_sku
website_sku
model
part_number
item_number
item_package_quantity
manufacturer_name
merchants
merchant_url
merchant_sku
has_3p_offer
is_bestseller
bestseller_category
bestseller_rank
shipping_information
item_sale_status
obs_indicator
ingredients
warranty_type
warranty_term_length
item_weight
Some important data points to be configured for BMVD-heavy retailers:
Data-points to be configured
bread_crumb1
bread_crumb2
bread_crumb3
remaining_bread_crumbs
Product_descriptions
Bullet_points
Additional_product_info
color_name
size_name
customer_star_ratings
no_of_customer_reviews
external_id
has_retail_offer
offering_availability
list_price
our_price
url
Image_urls
title
Parent_sku
website_sku
model
part_number
item_number
item_package_quantity
manufacturer_name
merchants
merchant_url
merchant_sku
has_3p_offer
Label
studio
publisher
is_bestseller
bestseller_category
bestseller_rank
shipping_information
item_sale_status
obs_indicator
warranty_type
warranty_term_length
author
format
binding
director
actor
release_date
publication_date
publisher_label
language
pages
Note:
bread_crumb1 and title should not be null.
bread_crumb1, bread_crumb2 and bread_crumb3 should be unique and should not contain the brand_name value.
list_price should be greater than our_price (when both are populated).
website_sku should not be null and should be unique.
offering_availability should not be null.
Normalize "-" in the brand name when using the brand repo.
Major data points (DPs) such as brand_name, title and website_sku should not contain "|".
Special characters that affect the XML document format (for example ', ", <, >, [, ], {, }, &) should be encoded.
For more details about normalization, refer to: https://w.amazon.com/bin/view/SMT_SOB/Scrape/Beans_for_Scrape_Configuration/ .
Single scrape configuration : https://w.amazon.com/bin/view/SMT_SOB/Scrape/Types_of_Scrape/Single_Scrape_Configuration/ .
Multi-scrape configuration : https://w.amazon.com/bin/view/SMT_SOB/Scrape/Types_of_Scrape/Multi_Scrape_Configuration/ .
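The rules in the note can be applied as record-level checks before scraped output is published. This is a minimal sketch under a few assumptions:

- Each scraped item is a dict of data-point names to string values.
- Prices are plain decimal strings such as "49.99".
- The brand check is a simple substring match.
- The note does not say how [ ] { } should be encoded, so numeric character references are used for them.

The function and constant names are hypothetical. `xml.sax.saxutils.escape` is from the Python standard library.

```python
from typing import Dict, Iterable, List
from xml.sax.saxutils import escape

BREADCRUMB_FIELDS = ("bread_crumb1", "bread_crumb2", "bread_crumb3")
NOT_NULL_FIELDS = ("bread_crumb1", "title", "website_sku", "offering_availability")
# The note names brand_name, title and website_sku "etc."; extend as needed.
NO_PIPE_FIELDS = ("brand_name", "title", "website_sku")
# escape() always handles &, < and >; these extra entities cover the other characters.
XML_EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;", "[": "&#91;", "]": "&#93;",
                      "{": "&#123;", "}": "&#125;"}


def record_issues(rec: Dict[str, str]) -> List[str]:
    """Return rule violations for one scraped item (empty list if none)."""
    issues = []
    for field in NOT_NULL_FIELDS:
        if not rec.get(field, "").strip():
            issues.append(f"{field} is null")
    crumbs = [rec.get(f, "").strip().lower() for f in BREADCRUMB_FIELDS]
    crumbs = [c for c in crumbs if c]  # ignore breadcrumbs that were not scraped
    if len(set(crumbs)) != len(crumbs):
        issues.append("breadcrumbs are duplicated")
    brand = rec.get("brand_name", "").strip().lower()
    # Simple substring check; a production check may need word boundaries.
    if brand and any(brand in c for c in crumbs):
        issues.append("a breadcrumb contains brand_name")
    list_price = rec.get("list_price", "").strip()
    our_price = rec.get("our_price", "").strip()
    # list_price may be blank (single price), so compare only when both exist.
    if list_price and our_price and float(list_price) <= float(our_price):
        issues.append("list_price is not greater than our_price")
    for field in NO_PIPE_FIELDS:
        if "|" in rec.get(field, ""):
            issues.append(f"{field} contains '|'")
    return issues


def duplicate_website_skus(records: Iterable[Dict[str, str]]) -> List[str]:
    """Return website_sku values that occur more than once (blanks are skipped)."""
    seen, dups = set(), []
    for rec in records:
        sku = rec.get("website_sku", "").strip()
        if not sku:
            continue
        if sku in seen and sku not in dups:
            dups.append(sku)
        seen.add(sku)
    return dups


def encode_for_xml(value: str) -> str:
    """Encode characters that would break the XML output."""
    return escape(value, XML_EXTRA_ENTITIES)
```

Uniqueness of website_sku is a property of the whole batch rather than of a single item. For that reason it is checked separately, by `duplicate_website_skus`.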