
Big Data Analytics Mumbai University

Big Data Analytics book from Mumbai University.

Uploaded by

Gaurav Wankhede
Copyright © All Rights Reserved
Introduction to Big Data
Syllabus: Introduction to Big Data, Big Data characteristics, types of Big Data, Traditional vs. Big Data business approach, Big Data challenges, examples of Big Data in real life, Big Data applications.
Self-learning Topics: Introduction to Big Data applications and ...

1.1 INTRODUCTION TO BIG DATA AND HADOOP
First, we need to know what data is.
(1) Nowadays the amount of data created by various advanced technologies, like social networking sites, e-commerce, etc., is very large. It is really difficult to store such huge data using traditional data storage facilities.
(2) Until 2003, the size of data produced was 5 billion gigabytes. If this data were stored in the form of disks, it could fill an entire football field. In 2011, the same amount of data was created every two days, and in 2013 it was created every ten minutes. This is a tremendous rate of growth.
(3) In this chapter, we will learn and define common concepts related to Big Data. We will also look in depth at some of the processes and technologies currently used in this field.

1.1.1 What is Big Data?
1. Big Data is a collection of data that is huge in volume and keeps growing exponentially with time.
2. It is data of such large size and complexity that traditional data management tools cannot store or process it efficiently.
3. Big Data is still data, but of a huge size.
4. Normally we work with data of size MB (Word documents, Excel sheets) or at most GB (movies, code), but data in petabytes, i.e. 10^15 bytes, is called Big Data.
5. It is estimated that almost 90% of today's data has been generated in the past 3 years.

1.1.2 Sources of Big Data
There are various sources of big data.
Nowadays a number of fields contribute to big data. Following are some of them:
1. Stock Exchange: The data in the share market regarding the prices and various details of shares of thousands of companies is very large.
2. Social Media Data: Social media data contains information about all the account holders, their posts, chats, etc. On platforms like Facebook and WhatsApp, there are literally billions of users.
3. Video Sharing Portals: Video sharing portals like YouTube and Vimeo contain millions of videos, each of which requires a lot of memory to store.
4. Search Engine Data: Search engines like Google and Yahoo hold a lot of metadata regarding various sites.
5. Transport Data: Transport data contains information about the model, capacity, distance and availability of various vehicles.
6. Banking Data: The big giants in the banking domain, like SBI or ICICI, hold large amounts of data regarding the huge number of transactions of account holders.
Figure: Sources of big data

1.2 BIG DATA CHARACTERISTICS
GQ. What are the characteristics of Big Data?
UQ. Describe any five characteristics of Big Data.
UQ. Explain what characteristics of social networks make them Big Data.
UQ. Explain big data along with its 5 Vs.
(1) Volume refers to the volume, i.e. the amount of data, which grows day by day; data volume is measured in petabytes.
(2) Value refers to turning data into value. By turning accessed big data into value, businesses may generate revenue.
(3) Veracity refers to the uncertainty of available data. Veracity arises due to the high volume of data, which brings incompleteness and inconsistency.
(4) Visualization is the process of displaying data in charts, graphs, maps, and other visual forms.
(5) Variety refers to different data types, i.e. various data formats like text, audio, video, etc.
(6) Velocity is the rate at which data grows. Social media plays a major role in the velocity of growing data.
(7) Virality describes how quickly information gets spread across people-to-people (P2P) networks.
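The size units used above (MB, GB, petabytes) can be related with a short calculation. This is a minimal Python sketch that assumes decimal (SI) units, matching the 10^15 bytes per petabyte given in 1.1.1; the `UNITS` table and the `convert` helper are hypothetical names chosen only for illustration.

```python
# Decimal (SI) storage units: each step up is a factor of 1000.
UNITS = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,  # the petabyte scale this chapter associates with Big Data
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size from one decimal unit to another."""
    return value * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # One petabyte expressed in gigabytes and in megabytes.
    print(convert(1, "PB", "GB"))
    print(convert(1, "PB", "MB"))
```

With binary units (1 PiB = 2^50 bytes) the numbers would differ by a few percent; the decimal convention is assumed here only because it matches the text.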
1.2.1 Volume
• As follows from the name, big data is used to refer to enormous amounts of information.
• The IoT (Internet of Things) is creating exponential growth in data.
• The volume of data is projected to change significantly in the coming years.
• Hence, Volume is one characteristic which needs to be considered while dealing with Big Data.
Figure: Volume (Data at Rest): Terabytes, Petabytes, Records/Archives, Tables/Files, Distributed

1.2.2 Variety
• Variety refers to heterogeneous sources and the nature of data, both structured and unstructured.
• Data comes in different formats: from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audios, stock ticker data and financial transactions. This variety of unstructured data poses certain issues for storing, mining and analysing data.
• Organizing the data in a meaningful way is no simple task, especially when the data itself changes rapidly.
• Another challenge of Big Data processing goes beyond the massive volumes and increasing velocities of data: it also lies in manipulating the enormous variety of these data.
Figure: Variety (Data in Many Forms): Structured, Unstructured, Text, Multimedia

1.2.3 Veracity
• Veracity describes whether the data can be trusted. Veracity refers to the uncertainty of available data.
• Veracity arises due to the high volume of data, which brings incompleteness and inconsistency.
• Hygiene of data in analytics is important because otherwise you cannot guarantee the accuracy of your results.
Figure: Veracity (Data in Doubt): Trustworthiness, Authenticity, Origin/Reputation, Availability, Accountability

1.2.4 Velocity
• Processing the data as it is generated reduces storage requirements while providing more responsive, accurate and precise responses.

1.2.5 Value
• It refers to turning data into value. By turning accessed big data into value, businesses may generate revenue.
• Value is the end game. After addressing volume, velocity, variety, variability, veracity, and visualization, which takes a lot of time, effort and resources, you want to be sure your organization is getting value from the data.
• For example, data that can be used to analyze consumer behavior is valuable for your company because you can use the research results to make individualized offers.
Figure: Value (Data into Money): Correlations, Statistics, Events

1.2.6 Visualization
• Big data visualization is the process of displaying data in charts, graphs, maps and other visual formats.
• It is used to help people easily understand and interpret their data at a glance, and to clearly show the trends and patterns that arise from this data.
• Raw data comes in different formats, so creating data visualizations is a process of gathering, managing, and transforming data into a format that is most usable and meaningful.
• Big Data Visualization makes your data as accessible as possible to everyone within your organization, whether they have technical skills or not.
Figure: Visualization (Data Readable): Accessible presentation, Visual forms

1.2.7 Virality
• Virality describes how quickly information gets spread across people-to-people (P2P) networks.
• It measures how quickly data is spread and shared to each unique node.
• Time is a determinant factor, along with the rate of spread.
Figure: Virality (Data Speed)

1.3 TYPES OF BIG DATA
There are three types of Big Data:
1. Unstructured
2. Structured
3. Semi-structured

1.3.1 Type #1: Unstructured
• Any data with unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value out of it.
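Because unstructured data has no schema, a program has to impose its own structure before it can derive any value from it. The sketch below is a minimal, hypothetical Python illustration: it pulls e-mail addresses and dates out of a free-text note with regular expressions. The note, the two patterns and the `extract_fields` name are assumptions made for this example, not part of any real system.

```python
import re

# A hypothetical free-text note: no rows, no columns, no schema.
NOTE = """Met the client on 12/03/2021. Follow up with priya@example.com
and rahul@example.org next week about the pending order."""

# The program decides what "structure" to look for, here via simple patterns.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
DATE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def extract_fields(text: str) -> dict:
    """Pull a few structured fields out of unstructured text."""
    return {"emails": EMAIL.findall(text), "dates": DATE.findall(text)}


if __name__ == "__main__":
    print(extract_fields(NOTE))
```

Patterns like these are brittle (they miss any field the author did not anticipate), which is one concrete reason the text calls unstructured data hard for programs to use.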
• A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc., such as the results of a search on the Google search engine.
• Nowadays organizations have a wealth of data available with them, but unfortunately they don't know how to derive value out of it, since this data is in its raw or unstructured form.
• Unstructured data can be human-generated or machine-generated.
• Unstructured: Example: the output returned by Google Search.

1.3.1(A) Characteristics of Unstructured Data
(1) Data neither conforms to a data model nor has any structure.
(2) Data cannot be stored in the form of rows and columns as in databases.
(3) Data does not follow any semantics or rules.
(4) Data lacks any particular format or sequence.
(5) Data has no easily identifiable structure.
(6) Due to the lack of identifiable structure, it cannot be used by computer programs easily.

1.3.1(B) Sources of Unstructured Data
(1) Images (JPEG, GIF, PNG, etc.)
(2) Memos
(3) Web pages
(4) Videos
(5) Reports
(6) Word documents and PowerPoint presentations
(7) Surveys

1.3.1(C) Advantages and Disadvantages of Unstructured Data
Advantages
1. It supports data which lacks a proper format or sequence.
2. The data is not constrained by a fixed schema.
3. It is very flexible due to the absence of a schema.
4. Data is portable.
5. It is very scalable.
6. It can deal easily with the heterogeneity of sources.
7. These types of data have a variety of business intelligence and analytics applications.
Disadvantages
1. It is difficult to store and manage unstructured data due to the lack of schema and structure.
2. Indexing the data is difficult and error-prone due to its unclear structure and the absence of pre-defined attributes. As a result, search results are not very accurate.
3. Ensuring the security of the data is a difficult task.

1.3.2 Type #2: Structured
• Any data that can be stored, accessed and processed in a fixed format is termed structured data.
• Over time, computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and in deriving value out of it.
• Data stored in a relational database management system is one example of structured data.
• Structured data is data which conforms to a data model, has a well-defined structure, follows a consistent order and can be easily accessed and used by a person or a computer program.
• Structured data is usually stored in well-defined schemas such as databases. It is generally tabular, with columns and rows that clearly define its attributes.
• SQL (Structured Query Language) is often used to manage structured data stored in databases.

1.3.2(A) Characteristics of Structured Data
• Data conforms to a data model and has an easily identifiable structure.
• Data is stored in the form of rows and columns. Example: Database.
• Data is well organised, so the definition, format and meaning of the data are explicitly known.
• Data resides in fixed fields within a record or file.
• Similar entities are grouped together to form relations or classes.
• Entities in the same group have the same attributes.
• It is easy to access and query, so the data can be easily used by other programs.
• Data elements are addressable, so they are efficient to analyse and process.

1.3.2(B) Sources of Structured Data
(1) SQL databases
(2) Spreadsheets such as Excel
(3) OLTP systems
(4) Online forms
(5) Sensors such as GPS or RFID tags
(6) Network and web server logs
(7) Medical devices

1.3.2(C) Advantages of Structured Data
1. Structured data has a well-defined structure that helps in easy storage and access of data.
2. Data can be indexed based on text strings as well as attributes, which makes search operations hassle-free.
3. Data mining is easy, i.e. knowledge can be easily extracted from the data.
4. Operations such as updating and deleting are easy due to the well-structured form of the data.
5. Business intelligence operations such as data warehousing can be easily undertaken.
6. It is easily scalable in case the data increases.
7. Ensuring the security of the data is easy.
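To make the idea of a fixed schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite simply stands in for any relational database; the column names loosely follow the Employees example in the text, but the rows are made-up placeholder values and the helper names are hypothetical.

```python
import sqlite3


def build_employees(conn: sqlite3.Connection) -> None:
    """Create a fixed-schema table and load a few placeholder rows."""
    conn.execute(
        """CREATE TABLE employees (
               employee_id   INTEGER PRIMARY KEY,
               employee_name TEXT NOT NULL,
               gender        TEXT,
               department    TEXT,
               salary        INTEGER
           )"""
    )
    rows = [
        (1, "Employee A", "Male", "Finance", 650000),
        (2, "Employee B", "Female", "Admin", 500000),
        (3, "Employee C", "Male", "Finance", 550000),
    ]
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", rows)


def total_salary_by_department(conn: sqlite3.Connection) -> list:
    """Every row has the same columns, so SQL can aggregate directly."""
    cur = conn.execute(
        "SELECT department, SUM(salary) FROM employees "
        "GROUP BY department ORDER BY department"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    build_employees(conn)
    print(total_salary_by_department(conn))
```

The query works without any parsing step because the schema fixes every attribute in advance, which is exactly the "easy to access and query" property listed above.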
Structured: Example
An Employees table with the columns Employee_ID | Employee_Name | Gender | Department | Salary_In_lacs, with one row per employee.

1.3.3 Type #3: Semi-structured
• Semi-structured is the third type of big data. Semi-structured data can contain both of the forms mentioned above, that is, structured and unstructured data.
• To be precise, it refers to data that, although not classified under a particular repository (database), still contains vital information or tags that segregate individual elements within the data.
• Web application data, which is unstructured, consists of log files, transaction history files, etc.
• Online transaction processing (OLTP) systems are built to work with structured data, wherein data is stored in relations (tables).
• Semi-structured data is data that does not conform to a data model but has some structure. It lacks a fixed or rigid schema. It is data that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing, we can store it in a relational database.

1.3.3(A) Characteristics of Semi-structured Data
1. Data does not conform to a data model but has some structure. Data cannot be stored in the form of rows and columns as in databases.
2. Semi-structured data contains tags and elements (metadata), which are used to group data and describe how the data is stored.
3. Similar entities are grouped together and organized in a hierarchy. Entities in the same group may or may not have the same attributes or properties.
4. It does not contain sufficient metadata, which makes automation and management of the data difficult.
5. The size and type of the same attributes in a group may differ.
6. Due to the lack of a well-defined structure, it cannot be used by computer programs easily.

Semi-structured: Example
Personal data stored in an XML file:
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
<rec><name>Satish Mane</name><sex>Male</sex><age>29</age></rec>

1.3.3(C) Advantages and Disadvantages of Semi-structured Data
Advantages
1. The data is not constrained by a fixed schema.
2. It is flexible, i.e. the schema can be easily changed.
3. Data is portable.
4. It is possible to view structured data as semi-structured data.
5. It supports users who cannot express their needs in SQL.
6. It can deal easily with the heterogeneity of sources.
Disadvantages
1. The lack of a fixed, rigid schema makes it difficult to store the data.
2. Interpreting the relationship between data is difficult, as there is no separation between the schema and the data.
3. Queries are less efficient as compared to structured data.

1.4 DIFFERENCE BETWEEN STRUCTURED, SEMI-STRUCTURED AND UNSTRUCTURED DATA
Properties | Structured data | Semi-structured data | Unstructured data
Technology | Based on relational database tables | Based on XML/RDF | Based on character and binary data
Transaction management | Matured transaction management and various concurrency techniques | Transaction management adapted from DBMS, not matured | No transaction management and no concurrency
Version management | Versioning over tuples, rows and tables | Versioning over tuples or graphs is possible | Versioned as a whole
Flexibility | Schema dependent and less flexible | More flexible than structured data but less flexible than unstructured data | More flexible; there is no schema
Scalability | It is very difficult to scale the DB schema | Scaling is simpler than for structured data | It is more scalable
Robustness | Very robust | New technology, not very widespread | -
Query performance | Structured queries allow complex joins | Queries over anonymous nodes are possible | Only textual queries are possible

Big data refers to data sets with large volumes of structured, semi-structured and unstructured data. Volume, Velocity, Variety, Veracity and Value are the 5 V characteristics of big data. Big data does not only refer to a large amount of data; it refers to extracting meaningful data by analyzing huge amounts of complex datasets.

1.5 TRADITIONAL VS. BIG DATA BUSINESS APPROACH
UQ. Compare big data analytics with traditional analytics.
GQ. Explain in detail the Traditional vs. Big Data business approach. (5 Marks)
• Traditional data is the structured data maintained by businesses. In a traditional database system, a centralized database architecture is used to store and maintain the data in a fixed format or in fields in a file.
• For managing and accessing the data, Structured Query Language (SQL) is used.

Sr. | Traditional Data | Big Data
1 | Traditional data is generated at the enterprise level. | Big data is generated both outside and at the enterprise level.
2 | Its volume ranges from Gigabytes to Terabytes. | Its volume ranges from Petabytes to Zettabytes or Exabytes.
3 | Traditional database systems deal with structured data. | Big data systems deal with structured, semi-structured and unstructured data.
4 | Traditional data is generated per hour, per day or at longer intervals. | Big data is generated far more frequently, mainly every second.
5 | The traditional data source is centralized and is managed in centralized form. | The big data source is distributed and is managed in distributed form.
6 | Data integration is very easy. | Data integration is very difficult.
7 | A normal system configuration is capable of processing traditional data. | A high system configuration is required to process big data.
8 | The size of the data is very small. | The size is much larger than traditional data.
9 | Traditional database tools are sufficient to perform any database operation. | Special kinds of database tools are required to perform any schema-based database operation.
10 | Normal functions can manipulate the data. | Special kinds of functions are needed to manipulate the data.
11 | Its data model is strictly schema based and static. | Its data model is flat-schema based and dynamic.
12 | Traditional data is stable, with known inter-relationships. | Big data is not stable, and its relationships are unknown.
13 | Traditional data is in a manageable volume. | Big data is in a huge volume which becomes unmanageable.
14 | It is easy to manage and manipulate the data. | It is difficult to manage and manipulate the data.
15 | Its data sources include ERP transaction data, CRM transaction data, financial data, organizational data, web transaction data, etc. | Its data sources include social media, device data, sensor data, video, images, audio, etc.

1.6 EXAMPLES OF BIG DATA APPLICATIONS
Figure 1.6.1: Big data applications
1. Fraud detection
• Fraud detection is a Big Data application example for businesses whose operations involve any type of claims or transaction processing.
• Often, fraud is detected long after the fact. At that point the damage has already been done, and all that is left is to minimize the harm and revise policies to prevent it in the future.
• Big Data platforms can analyze the claims and transactions of businesses.
They identify large-scale patterns across many transactions or detect anomalous behaviour of an individual user. This helps to avoid fraud.

2. IT log analytics
• An enormous quantity of logs and trace data is generated in IT solutions and IT departments. Often such data goes unexamined: organizations simply don't have the manpower or resources to go through all the information.
• With a Big Data solution, any organization with a large IT department benefits from the ability to quickly identify large-scale patterns that help in diagnosing and preventing problems.

3. Call center analytics
• Big Data solutions can identify recurring problems and customer and staff behaviour patterns by making sense of time/quality resolution metrics and by capturing and processing the call content itself.

4. Social media analysis
• With the help of social media, we can observe real-time insights into how the market is responding to products and campaigns. With these insights, companies can adjust their pricing, promotion, and campaign placement for optimal results.

1.7 BIG DATA CHALLENGES
1. Storage
• ...
2. ...
• It is necessary for the data to be available in an accurate, complete and timely manner: if the data in a company's information system is to be used to make accurate decisions in time, the data must be available in this manner.

3. Privacy and Security
• This is another of the most important challenges with Big Data. It has sensitive, conceptual, technical as well as legal significance.
• Most organizations are unable to maintain regular checks due to the large amount of data being generated. However, security checks and monitoring should be performed in real time, because that is most beneficial.
• Some information about a person, when combined with large external data, may reveal facts about that person which may be secret and which he might not want the data owner to know.
• Some organizations collect information about people in order to add value to their business. This is done by gaining insights into their lives that they are unaware of.

4. Analytical Challenges
• There are some huge analytical challenges in big data, which raise key questions such as: how do we deal with a problem if the data volume gets too large?
• How do we find the important data points?
• How do we use the data to the best advantage?
• The large amount of data on which this type of analysis is to be done can be structured (organized data), semi-structured (semi-organized data) or unstructured (unorganized data).
• There are two techniques through which decision making can be done:
1. Either incorporate massive data volumes in the analysis,
2. Or determine upfront which big data is relevant.

5. Technical Challenges
(1) Quality of data
• Collecting and storing a large amount of data comes at a cost. Big companies, business leaders and IT leaders always want large data storage.
• For better results and conclusions, Big Data focuses on storing quality data rather than irrelevant data. This raises the question of how to ensure that the data is relevant, how much data would be enough for decision making, and whether the stored data is accurate.
(2) Fault tolerance
• Fault tolerance is another technical challenge, and fault-tolerant computing is extremely hard, involving intricate algorithms.
• New technologies like cloud computing and big data aim that whenever a failure occurs, the damage done stays within an acceptable threshold, that is, the whole task should not have to start again from scratch.
(3) Scalability
• Big data projects can grow and evolve rapidly. The scalability issue of Big Data has led towards cloud computing.
• This leads to challenges such as how to run and execute various jobs so that the goal of each workload can be achieved cost-effectively.
• It also requires dealing with system failures in an efficient manner. This again raises a big question: what kinds of storage devices should be used?

1.8 EXAMPLES OF BIG DATA IN REAL LIFE
(1) In the Education Industry
The University of Alabama has more than 38,000 students and an ocean of data. In the past, when there were no real solutions to analyze that much data, some of it seemed useless.
Now, administrators are able to use analytics and data visualizations on this data to draw out patterns of students, revolutionizing the university's operations, recruitment, and retention efforts.

(2) In the Healthcare Industry
Wearable devices and sensors have been introduced in the healthcare industry which can provide a real-time feed to the electronic health record of a patient. One such technology comes from Apple, which has come up with Apple HealthKit, CareKit and ResearchKit. The main goal is to empower iPhone users to store and access their real-time health records on their phones.

(3) In the Government Sector
The Food and Drug Administration (FDA), which runs under the jurisdiction of the Federal Government of the USA, leverages the analysis of big data to discover patterns and associations in order to identify and examine expected or unexpected occurrences of food-based infections.

(4) In the Media and Entertainment Industry
Spotify, an on-demand music streaming platform, uses Big Data analytics: it collects data from all its users around the globe and then uses the analyzed data to give informed music recommendations and suggestions to every individual user.
Amazon Prime, which offers videos, music, and Kindle books in a one-stop shop, also makes extensive use of big data.

(5) ...
NASA is collecting data from different stations ...

(6) In the Transportation Industry
Data is generated about drivers, their vehicles, locations, and every trip from every vehicle. All this data is analyzed and then used to predict supply, demand, the location of drivers, and the fares that will be set for every trip.

(7) In the Banking Sector
Various anti-money laundering software, such as SAS AML, use data analytics in banking to detect suspicious transactions and analyze customer data. Bank of America has been a SAS AML customer for more than 25 years.

(8) In Marketing
Amazon collects data about the purchases made by millions of people around the world. It analyzes the purchase patterns and payment methods used by customers and uses the results to design new offers and advertisements.

(9) In Business Insights
Netflix uses Big Data to understand user behavior: the type of content users like, popular movies on the website, similar content that can be suggested to each user, and which series or movies to invest in.

Introduction to Big Data Frameworks
Syllabus: Concept of Hadoop, Core Hadoop Components, Hadoop Ecosystem, Limitations of Hadoop, What is NoSQL? NoSQL data architecture patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, ...
Self-learning Topics: HDFS vs GFS, ...

2.1 CONCEPT OF HADOOP
2.1.1 What is Hadoop?
• Hadoop is an open-source software platform for storing massive volumes of data and running applications on clusters (groups) of commodity hardware.
• It gives us massive data storage capability, massive computational power and the ability to handle virtually limitless concurrent jobs or tasks, i.e. running jobs and waiting jobs.
• It is an essential component to support growing big data technologies, and it supports forward-looking analytics such as machine learning and data mining.
• Hadoop is able to handle different types of data, such as structured, unstructured and semi-structured data. It gives us the flexibility to collect, process, and analyze data in a way that the old data warehouse concept could not.

2.1.2 History of Hadoop
• Hadoop was started by Doug Cutting and Mike Cafarella in 2002. Its origin was the Google File System paper, published by Google.
• In 2002, Doug Cutting and Mike Cafarella started to work on a project, Apache Nutch, an open-source web crawler software project.
• While working on Apache Nutch, they faced issues with big data. Storing that data required a lot of money, which became a challenge for the completion of the project.
• Due to this problem, Hadoop came into existence.
• In 2003, Google presented a file system known as GFS (Google File System). It is a proprietary distributed file system developed to provide efficient access to data.
• In 2004, Google released a white paper on MapReduce. This technique simplifies data processing on large clusters (groups of machines).
• In 2005, Doug Cutting and Mike Cafarella introduced a new file system known as NDFS (Nutch Distributed File System). This file system also included MapReduce.
• In 2006, Doug Cutting joined Yahoo. Based on the Nutch project, he introduced a new project, Hadoop, with a file system known as HDFS (Hadoop Distributed File System). Hadoop's first version, 0.1.0, was released, and Doug Cutting named the project after his son's toy elephant.
• In 2007, Yahoo ran two clusters of 1000 machines.
• In 2013, Hadoop 2.2 was released.
• In 2017, Hadoop 3.0 was released.

2.1.3 Features of Hadoop
1. Suitable for Big Data Analysis: As Big Data tends to be distributed and unstructured in nature, Hadoop clusters are best suited for the analysis of Big Data. Since it is the processing logic (not the actual data) that flows to the computing nodes, less network bandwidth is consumed. This is called the data locality concept, and it helps increase the efficiency of Hadoop-based applications.
2. Scalability: Hadoop clusters can easily be scaled to any extent by adding cluster nodes, which allows for the growth of Big Data. Also, scaling does not require modifications to the application logic.
3. Fault Tolerance: The Hadoop ecosystem has a provision to replicate the input data onto other cluster nodes. That way, in the event of a cluster node failure, data processing can still proceed by using the data stored on another cluster node.

2.1.4 Advantages of Hadoop
1. Fast: In HDFS the data is distributed over the cluster and mapped, which helps in faster retrieval. Even the tools to process the data are often on the same servers, which reduces the processing time. Hadoop is able to process terabytes of data in minutes and petabytes in hours.
2. Scalable: A Hadoop cluster can be extended by just adding nodes to the cluster.
3. Cost Effective: Hadoop is open source and uses commodity hardware to store data, so it is much cheaper than a traditional relational database management system.
4. Resilient to failure: HDFS can replicate data over the network, so if one node goes down or some other network failure happens, Hadoop takes another copy of the data and uses it. Normally, data is replicated three times, but the replication factor is configurable.

2.1.5 Challenges of Hadoop
1. Hadoop is a complex distributed system with low-level Application Programming Interfaces (APIs).
2. Specialized skills are required for using Hadoop, which prevents most developers from efficiently building solutions.
3. Business logic and infrastructure APIs have no clear separation, which places the burden on application developers.
4. Automated testing of end-to-end solutions is impractical or impossible.
5. Common data patterns often require data consistency and correctness, which Hadoop does not support.
6. Hadoop is more than just offline storage and batch analytics.
7. Hadoop is a diverse collection of many open-source projects.
8. Understanding multiple technologies and hand-coding the integration between them is difficult.
9. Significant effort is wasted on simple tasks like data ingestion and ETL (Extract, Transform, Load).
10. Real-time and batch ingestion requires deeply integrating several components.
11. Different processing paradigms require data to be stored in specific ways.

2.1.6 Architecture of Hadoop
• Hadoop has a Master-Slave architecture for data storage and for distributed processing of data using HDFS and MapReduce.
• In Hadoop, the master and slave systems can be set up in the cloud or on-premise.
• NameNode: The NameNode represents every file and directory used in the namespace.
• DataNode: The DataNode helps you to manage the state of an HDFS node and allows you to interact with its blocks.
• Master Node: The master node allows you to conduct parallel processing of data using Hadoop MapReduce.
• Slave Node: The slave nodes are the additional machines in the Hadoop cluster which allow you to store data and to conduct complex calculations. In the classic Hadoop 1.x MapReduce design, every slave node runs a TaskTracker and a DataNode, which synchronize its processes with the JobTracker and the NameNode respectively.

2.2 CORE HADOOP COMPONENTS, HADOOP ECOSYSTEM
2.2.1 Core Hadoop Components
• Hadoop is an open-source platform which allows a network of computers to work together on the distributed storage and processing of data.
The core components are described as follows:
1. Distributed File System (HDFS): The most important Hadoop core component is the idea of the distributed file system. It permits the platform to access widely spread storage devices and provides the basic tools to read the available data. HDFS is a unique file system because it runs on top of the individual file systems of the machines in the network. It can perform the required data operations without depending on the operating system of the individual computers, which lets the network deliver greater power and work across different computers. It also allows the connection of other core components, such as MapReduce.
2. Hadoop Common: It is also one of the Hadoop core components: the tool which allows any computer to become part of the Hadoop network, regardless of its operating system or hardware. This module uses Java tools and parts that create a virtual machine and allow the Hadoop platform to store data under its specific file system. The component is named Common because it offers the required common functionality and removes the differences between the different hardware nodes which may be connected to the network at any time.
3. YARN: It is the component which manages the resources that store the data and run the required analysis. It manages the available resources in a network cluster and schedules the processing tasks, so as to arrive at a smarter solution for every big data task.
4. MapReduce: MapReduce is another of Hadoop's core components. It combines two separate functions which are required for performing big data operations: the first reads the data and puts it in a format suitable for analysis, and the second performs the required analysis in the system.

2.2.2 Hadoop Ecosystem Overview
• The Hadoop ecosystem is a platform or framework which assists in solving big data problems.
• It includes different components and services (for ingesting, storing, analysing and maintaining data).
• Most of the services available in the Hadoop ecosystem supplement the four main core components of Hadoop: HDFS, YARN, MapReduce and Common.
• The Hadoop ecosystem includes both Apache open-source projects and a wide variety of commercial tools and solutions. Some of the open-source examples are Spark, Hive, Pig, etc.
• Now that we have an overview of what the Hadoop ecosystem is, what it does, and what its components are, we discuss the Hadoop components individually and their specific roles in big data processing.

Components of Hadoop Ecosystem
1. HDFS: It is the primary storage system of Hadoop. It runs on Java and stores data for Hadoop applications on commodity hardware. There are two components of HDFS: the Name Node and the Data Node.
• The Name Node maintains the records of metadata updates. In case of deletion of data, it automatically records the deletion in its Edit Log file.
• The Data Node requires massive storage space because of the reading and writing operations performed on it. Data Nodes work according to the instructions of the Name Node. The Data Nodes are the hardware in the distributed system.
2. HBase: It is an open-source framework for storing all types of data, and it does not support SQL. It runs on top of HDFS and is written in Java. Many companies use it for features like support for all types of data, high security and the use of HBase tables. It plays a vital role in analytical processing. The two major components of HBase are the HBase Master and the Region Server. The HBase Master is responsible for load balancing in a Hadoop cluster and controls failover. It performs the administration role.
The Region Server acts as a worker node and is responsible for reading and writing data in the cache.
3. YARN: It is an important component in the ecosystem and is known as the operating system of Hadoop, because it provides resource management and job scheduling. Its components are the Resource Manager, Node Manager, Application Manager and Container. They also act as guards across Hadoop clusters. They help in the dynamic allocation of cluster resources and allow multiple access engines.
4. Sqoop: A tool that transfers data between HDFS and MySQL and helps to import and export data. It has connectors for fetching and connecting data.
5. Apache Spark: It is an open-source cluster computing framework for data analytics and an important data processing engine. It is written in Scala and comes with packaged standard libraries. Many companies use it for its high processing speed and stream processing.
6. Apache Pig: It performs data manipulation in Hadoop using the Pig Latin language.
7. Apache Hive: It is an open-source platform for querying large data sets stored in HDFS.
8. ZooKeeper: It is an API that helps in distributed coordination. Here, a node called a Znode is created by an application in the Hadoop cluster. It provides services such as synchronization and configuration, and it sorts out the time-consuming coordination in the Hadoop ecosystem.
9. Oozie: It is a Java web application that maintains many workflows in a Hadoop cluster. Through its web service APIs, jobs can be controlled from anywhere, and it handles multiple jobs effectively.

2.2.3 Examples of Hadoop Ecosystem
Regarding MapReduce, we can see an example and use case in Skybox, which uses Hadoop to analyse a huge volume of data. Hive finds its use at Facebook. Another example is counting the frequency of words in a sentence using MapReduce: Map takes the input and performs functions such as filtering and sorting, and Reduce consolidates the result. A Hive example is retrieving students from different states from student databases using various DML commands.

2.2.4 Limitations of Hadoop
1. Issue with Small Files
2. Slow Processing Speed
3. Support for Batch Processing only
4. No Real-time Data Processing
5. Not Efficient for Iterative Processing
6. Latency
7. Security
8. No Abstraction
9. No Caching
10. Lengthy Line of Code

1. Issue with Small Files
Hadoop is not suitable for small data. The Hadoop Distributed File System (HDFS) lacks the ability to efficiently support random reading of small files because of its high-capacity design. Small files are the main problem in HDFS. A small file is significantly smaller than the HDFS block size (default 128 MB). If we store a huge number of small files, HDFS cannot handle them efficiently, because it is designed to work with a small number of large files that hold large data sets rather than a large number of small files. If there are too many small files, the Name Node will be overburdened, since it stores the namespace of HDFS.
2. Slow Processing Speed
In Hadoop, MapReduce processes large data sets with a parallel and distributed algorithm. Tasks such as Map and Reduce need to be performed, and MapReduce requires a lot of time to perform them, thereby increasing latency. Data is distributed and processed over the cluster in MapReduce, which increases the time taken and reduces the processing speed.
3. Support for Batch Processing only
Hadoop supports batch processing only; it does not process streamed data, and hence overall performance is slower. The MapReduce framework of Hadoop does not leverage the memory of the Hadoop cluster to the maximum.
4. No Real-time Data Processing
Apache Hadoop is designed for batch processing, which means it takes a huge amount of data as input, processes it and produces the result. Although batch processing is very efficient for processing a high volume of data, the output can be delayed significantly, depending on the size of the data being processed and the computational power of the system. Hadoop is therefore not suitable for real-time data processing.
5. Not Efficient for Iterative Processing
Hadoop is not efficient for iterative processing, as it does not support cyclic data flow, i.e., a chain of stages in which the output of each stage is the input to the next stage.
6. Latency
In Hadoop, the MapReduce framework is comparatively slow, since it is designed to support different formats, structures and huge volumes of data. In MapReduce, Map takes a set of data and converts it into another set of data, in which individual elements are broken down into key-value pairs; Reduce takes the output from Map as input and processes it further. MapReduce requires a lot of time to perform these tasks, thereby increasing latency.
7. Security
Managing complex applications in Hadoop is challenging. If the user who manages the platform does not know how to enable its security features, the data can be at huge risk. At the storage and network levels, Hadoop is missing encryption, which is a major point of concern. Hadoop supports Kerberos authentication, which is hard to manage. HDFS supports access control lists (ACLs) and a traditional file permissions model. However, third-party vendors have enabled organizations to leverage Active Directory Kerberos and LDAP for authentication.
8. No Abstraction
Hadoop does not have any kind of abstraction, so MapReduce developers need to hand-code each and every operation, which makes it very difficult to work with.
9. No Caching
Hadoop is not efficient for caching. In Hadoop, MapReduce cannot cache intermediate data in memory for later use, which diminishes the performance of Hadoop.
10. Lengthy Line of Code
Hadoop has about 1,20,000 lines of code. The more lines of code, the more bugs, and the more time it takes to execute the program.

2.2.5 HDFS Goals
Many of Hadoop's goals are met by its distributed file system, HDFS. HDFS and MapReduce are the two major components of Hadoop: HDFS serves the storage point of view, and MapReduce serves the programming concept. To understand how Hadoop scales from a single node to thousands of nodes, knowing HDFS is very useful. The goals of HDFS are as follows:
1. Handling of large datasets: As Hadoop supports distributed storage and processing of large data sets, the HDFS architecture is designed to store and retrieve large data.
2. Fault tolerance and data replication: In HDFS, data files are divided into big blocks of data. For fault tolerance, each block is stored on three nodes, two of which are on the same rack and one on a different rack. A block is the amount of data stored on each data node. This redundancy of data enables robustness, fault detection, quick recovery of data and scalability.
3. Commodity hardware: HDFS assumes that the cluster will run on common hardware, that is, inexpensive, ordinary machines rather than high-availability systems. A great feature of Hadoop is that it can be installed on any average commodity hardware; installing and running Hadoop does not require a supercomputer or high-end hardware. This reduces the overall cost.
4. Data locality concept: Data locality means moving the computation logic to the data instead of moving the data to the computation logic or application space. This reduces bandwidth utilization in the system. HDFS provides interfaces for applications to move themselves closer to the location where the data is present.

The "Digital India" programme was introduced in our country with the vision of transforming India into a digitally empowered society and knowledge economy. This vision is focused on three main areas: (i) digital infrastructure as a basic utility to every citizen, (ii) governance and services on demand, and (iii) digital empowerment of citizens of India. To fulfil this vision, the number of different data sources has increased. In big data, a huge amount of data is collected and stored, whether in structured, unstructured or semi-structured form. This data may contain business records and transactions, text, images, audio, surveillance camera videos, logs, unstructured data from blogs or social media messages, medical data, bank transaction data, e-commerce data, media data, defence-related data, etc. If this data is efficiently cleaned and then analysed, it can support data visualization of business trends for various enterprises and organizations.
This digital technology has made progress easier for enterprises and organizations. Data collected from tweets, forums, blogs and other social networking sites can help an enterprise or organization analyse consumers' views, and so understand the needs and choices of its customers.

Big data is characterised by five V's: Volume, Velocity, Variety, Veracity and Value.
1. Volume: The amount of data. Nowadays an organization in the commercial world may have a huge amount of data stored. To deal with it, big data uses the concept of distributed systems, so that organizations can store their data and analyse it through databases that are scattered around the world and accessed via the internet.
2. Velocity: The speed at which data is transferred across digital platforms such as online systems, sensors, social media and live web capture. Social media messages go viral in seconds; such a scenario represents the velocity of big data. In some cases, the data needs to be analysed without first storing it.
3. Variety: The different types of data from many sources, which may be structured, unstructured or semi-structured. Big data analytics provides the facility to integrate this data.
4. Veracity: The complexity of the data: data sets are transferred via the internet from multiple data sources, such as the cloud and different geographical zones, so managing and testing such big data is a very complex task.
5. Value: A measure of the benefits gained from the data. A huge amount of data is of no use unless it is converted into value.

2.3 WORKING WITH APACHE SPARK
First, Spark is intended to enhance, not replace, the Hadoop stack. From day one, Spark was designed to read and write data from and to HDFS, as well as other storage systems, such as HBase and Amazon's S3.
As such, Hadoop users can enrich their processing capabilities by combining Spark with Hadoop MapReduce, HBase and other big data frameworks.
Second, we have constantly focused on making it as easy as possible for every Hadoop user to take advantage of Spark's capabilities. No matter whether you run Hadoop 1.x or Hadoop 2.0 (YARN), and no matter whether you have administrative privileges to configure the Hadoop cluster or not, there is a way for you to run Spark! In particular, there are three ways to deploy Spark in a Hadoop cluster: standalone, YARN and SIMR.
Fig. 2.3.1: Spark deployment options on Hadoop (standalone, Hadoop YARN, SIMR)
• Standalone deployment: With the standalone deployment, one can statically allocate resources on all or a subset of machines in a Hadoop cluster and run Spark side by side with Hadoop MR. The user can then run arbitrary Spark jobs on her HDFS data. Its simplicity makes this the deployment of choice for many Hadoop 1.x users.
• Hadoop YARN deployment: Hadoop users who have already deployed or are planning to deploy Hadoop YARN can simply run Spark on YARN without any pre-installation or administrative access required. This allows users to easily integrate Spark in their Hadoop stack and take advantage of the full power of Spark, as well as of other components running on top of Spark.
• Spark In MapReduce (SIMR): For Hadoop users who are not running YARN yet, another option, in addition to the standalone deployment, is to use SIMR to launch Spark jobs inside MapReduce. With SIMR, users can start experimenting with Spark and use its shell within a couple of minutes after downloading it! This tremendously lowers the barrier to deployment and lets virtually everyone play with Spark.

2.4 INTRODUCTION TO NOSQL, NOSQL BUSINESS DRIVERS
2.4.1 Introduction to NoSQL
• A database is a systematic collection of data, and a database management system supports the storage and manipulation of data, which makes data management easy.
For example, an online telephone directory uses a database to store data about people, such as phone numbers and other contact details, which service providers can use to manage billing and client-related issues, handle fault data, etc. That is, a database management system provides the mechanism to store and retrieve data.
• There are different kinds of database management systems: RDBMS (Relational Database Management System), OLAP (Online Analytical Processing) and NoSQL (Not only SQL).
• NoSQL, for "not only SQL", refers to non-relational databases. Departing from the principles of relational database management systems (RDBMS), NoSQL is a new set of databases that has emerged in the recent past as an alternative to relational databases.
• Carlo Strozzi introduced the term NoSQL to name his file-based database in 1998.
• NoSQL does not represent a single product or technology; it represents a group of products and various related data concepts for storage and management. NoSQL is an approach to database management that can accommodate a wide variety of data models, including key-value, document, columnar and graph formats.
• A NoSQL database is generally non-relational, distributed, flexible and scalable. So we can sum up: NoSQL is an approach to database design that provides flexible formats for the storage and retrieval of data beyond the traditional table structures found in relational databases. It relates to large dataset sizes and manipulation at web scale.

2.4.2 Brief History of NoSQL Databases
• 1998: Carlo Strozzi uses the term NoSQL for his lightweight, open-source relational database.
• 2000: The graph database Neo4j is launched.
• 2004: Google BigTable is launched.
• 2005: CouchDB is launched.
• 2007: The research paper on Amazon Dynamo is released.
• 2008: Facebook open-sources the Cassandra project.
• 2009: The term NoSQL is reintroduced.

2.4.3 Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google, Facebook and Amazon, which deal with huge volumes of data. System response time becomes slow when an RDBMS is used for massive volumes of data. To resolve this problem, we could "scale up" our systems by upgrading our existing hardware, but this process is expensive.
The alternative is to distribute the database load over multiple hosts whenever the load increases. This method is known as "scaling out".
NoSQL databases are non-relational, so they scale out better than relational databases, as they are designed with web applications in mind. A NoSQL database is exactly the type of database that can handle all sorts of data: structured data, unstructured data, rapidly changing data and big data. So, NoSQL databases have emerged to resolve the problems related to large volumes of semi-structured data.

2.4.4 CAP Theorem
UQ. What is the CAP Theorem? How is it applicable to NoSQL systems?
The CAP theorem is also called Brewer's theorem. It states that it is impossible for a distributed data store to offer more than two of the following three guarantees: Consistency, Availability and Partition tolerance.
Some NoSQL databases therefore offer consistency and partition tolerance, while others offer availability and partition tolerance. Partition tolerance is common to both, as NoSQL databases are distributed in nature. So, based on the requirement, we can choose which NoSQL database to use. Different types of NoSQL databases are available based on data models.
• Consistency: The data in the database remains consistent after the execution of an operation. For example, after an update operation, all clients see the same data.
• Availability: The system is always on (service guaranteed availability), with no downtime.
• Partition tolerance: The system continues to function even if communication among the servers is unreliable, i.e., the servers may be partitioned into multiple groups that cannot communicate with one another.
Fig.: CAP theorem. A distributed system can satisfy only two of the three guarantees at a time, giving three combinations:
• CA: A single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.
• CP: Some data may not be accessible, but the rest is still consistent and accurate.
• AP: The system is still available under partitioning, but some of the data returned may be inaccurate.
The term consistency means different things in CAP and in ACID. In CAP, consistency refers to the consistency of the values in different copies of the same data item in a replicated distributed system. In ACID, it refers to the fact that a transaction will not violate the integrity constraints specified on the database schema.

2.4.5 Characteristics / Features of NoSQL
UQ. Describe the characteristics of a NoSQL database.
1. Non-relational
• NoSQL databases never follow the relational model.
• They never provide tables with flat, fixed-column records.
• They work with self-contained aggregates or BLOBs.
• They do not require object-relational mapping and data normalization.
• They have no complex features such as query languages, query planners, referential integrity, joins or ACID.
2. Cost-effective
• NoSQL databases do not require expensive licensing fees and can run on inexpensive hardware, rendering their deployment cost-effective.
3. Schema-free
• NoSQL databases are either schema-free or have relaxed schemas.
• They do not require any sort of definition of the schema of the data.
• They offer heterogeneous structures of data in the same domain.
4. Simple API
• They offer easy-to-use interfaces for storing and querying data.
• APIs allow low-level data manipulation and selection methods.
• Text-based protocols are mostly used, typically HTTP REST with JSON.
• Mostly, no standards-based NoSQL query language is used.
• Web-enabled databases run as internet-facing services.
5. Distributed
• Multiple NoSQL databases can be executed in a distributed fashion.
• They offer auto-scaling and fail-over capabilities.
• Often the ACID concept can be sacrificed for scalability and throughput.
• Mostly there is no synchronous replication between distributed nodes; asynchronous multi-master replication, peer-to-peer and HDFS replication are used instead.
• Only eventual consistency is provided.
• Shared-nothing architecture: this enables less coordination and higher distribution.

2.4.6 Advantages and Disadvantages of NoSQL
Advantages of NoSQL
1. Scalability: SQL databases are vertically scalable, which means you can increase the load on a single server by increasing things like RAM, CPU or SSD. NoSQL databases, on the other hand, are horizontally scalable, which means you can handle more traffic by sharding or adding more servers to your NoSQL database.
2. Simple data model
3. Reliability
4. Schemaless (no modelling or prototyping)
5. Rapid development
6. Flexible, as it can handle semi-structured, unstructured and structured data
7. Cheaper than relational databases
8. Creates a caching layer
9. Wide data type variety
10. Uses large binary objects for storing large data
11. Bulk upload
12. Lower administration
13. Distributed storage
14. Real-time analysis
Disadvantages of NoSQL
1. Lack of ACID transactions
2. Cannot use SQL
3. Limited search capabilities
4. No referential integrity
5. Lack of availability of expertise

2.4.7 Difference between RDBMS and NoSQL
UQ. Differentiate between an RDBMS and a NoSQL database.

| Sr. No. | RDBMS | NoSQL |
|---|---|---|
| 1 | Has a fixed, static or predefined schema. | Has a dynamic schema. |
| 2 | Not suited for hierarchical data storage. | Best suited for hierarchical data storage. |
| 3 | Vertically scalable. | Horizontally scalable. |
| 4 | Follows ACID properties. | Follows CAP (consistency, availability, partition tolerance). |
| 5 | Supports transactions (with joins). | Does not support full transactions (supports only simple transactions). |
| 6 | Manages only structured data. | Can manage structured, unstructured and semi-structured data. |
| 7 | Has a single point of failure, with failover. | Has no single point of failure. |
| 8 | Supports a powerful query language. | Supports a very simple query language. |
| 9 | Transactions are written in one location. | Transactions are written in many locations. |
| 10 | Used to handle data coming in at low velocity. | Used to handle data coming in at high velocity. |
| 11 | Examples: MySQL, Oracle, SQLite, PostgreSQL, MS SQL, etc. | Examples: HBase, Cassandra, CouchDB, Neo4j, etc. |

2.4.8 NoSQL Business Drivers
The scientist-philosopher Thomas Kuhn coined the term paradigm shift to identify a recurring process he observed in science, where innovative ideas came in bursts and impacted the world in nonlinear ways.
We will use Kuhn's concept of the paradigm shift as a way to think about and explain the NoSQL movement and the changes in thought patterns, architectures and methods emerging today.
Many organizations supporting single-CPU relational systems have come to a crossroads: the needs of their organizations are changing. Businesses have found value in rapidly capturing and analysing large amounts of variable data, and in making immediate changes to their businesses based on the information they receive. As each of these drivers applies pressure on the single-processor relational model, its foundation becomes less stable and, in time, no longer meets the organization's needs.
Fig. 2.4.3: NoSQL business drivers
Fig. 2.4.3 shows how the business drivers volume, velocity, variability and agility apply pressure on the single-CPU system, resulting in cracks. Volume and velocity refer to the ability to handle large datasets that arrive quickly; variability refers to how diverse data types do not fit into structured tables; and agility refers to how quickly an organization responds to business change.
The business drivers for NoSQL are: (1) Volume, (2) Velocity, (3) Variability and (4) Agility.
1. Volume
Undoubtedly, the main factor forcing organizations to look at alternatives to their current RDBMSs is the need to query big data using clusters of commodity processors. Until around 2005, performance problems were solved by purchasing faster processors. Over time, increasing processor speed was no longer an option: as chip density increased, heat could no longer dissipate fast enough without the chip overheating. This phenomenon, known as the power wall, forced system designers to shift their focus from increasing speed on a single chip to using more processors working together. The need to scale out (also known as horizontal scaling), rather than scale up (faster processors), moved organizations from serial to parallel processing, where data problems are split into separate paths and sent to separate processors to divide and conquer the work.
2. Velocity
Though big data problems are a consideration for many organizations moving away from RDBMSs, the ability of a single-processor system to rapidly read and write data is also key. Many single-processor RDBMSs cannot meet the demands of real-time inserts and online queries made by public-facing websites. RDBMSs frequently index multiple columns of each new row, a process that reduces system performance. When single-processor RDBMSs are used as the back end of a web store front, random bursts in web traffic slow down the response for everyone, and tuning the system can be expensive when both high read and high write throughput are required.
3. Variability
Companies that want to capture and report on exception data struggle when attempting to use the rigid database schema structure imposed by RDBMSs. For example, if a business unit wants to capture some custom fields for a specific customer, all customer rows in the table must store this information even though it does not apply to them. Adding new columns to an RDBMS requires shutting down the system and running the ALTER TABLE command. When the database is large, this process can affect the availability of the system, costing time and money.
4. Agility
Agility is the ability to accept change easily and quickly. Among the various agility dimensions, such as model agility (ease and speed of changing data models), operational agility (ease and speed of changing operational aspects) and programming agility (ease and speed of application development), one of the most important is the ability to respond quickly to business growth, for example by easily scaling an application as large numbers of users connect concurrently. The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE and SELECT SQL statements to move object data to and from the RDBMS persistence layer. This process is not simple and is associated with the largest barrier to rapid change when developing new applications or modifying existing ones. Generally, object-relational mapping requires experienced software developers who are familiar with object-relational frameworks. Even with experienced staff, small change requests can cause slowdowns in development and testing schedules.
These issues are solved by NoSQL databases. These databases are schema-less and can be scaled out, and they can accommodate changes easily. This agility has become another business driver for NoSQL.

2.5 NOSQL DATA ARCHITECTURE PATTERNS
UQ. What are the different architectural patterns in NoSQL? Explain Graph data store and Column Family Store patterns with relevant examples.
NoSQL databases were born out of the rigidity of traditional relational or SQL databases, which use tables, columns and rows to establish relationships across data. Developers welcomed NoSQL databases because they do not require an upfront schema design; developers were able to go straight to development. It is this flexibility, this "ad hoc" approach to organizing data, that has arguably been NoSQL's greatest selling point, and it continues to appeal to organizations that need to store, retrieve and analyse either unstructured or rapidly changing data. The data stored in NoSQL follows one of four data architecture patterns:
1. Key-Value Stores
2. Column Family / Bigtable Stores
3. Document Stores
4. Graph Stores

2.5.1 Key-Value Stores
UQ. Identify two applications that can use this pattern.
This is one of the most basic NoSQL database models. As the name implies, the data is stored in the form of key-value pairs. The key is usually a sequence of strings, integers or characters, but it can also be a more advanced data type. The value is linked or correlated to the key. Key-value storage databases generally store data as a hash table in which each key is unique, and the value may be of any form: JavaScript Object Notation (JSON), Binary Large Object (BLOB), strings, etc.
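To make the hash-table idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration under simplifying assumptions: a single process, a Python dict standing in for the hash table, and no persistence, replication or partitioning. The class name KeyValueStore, its methods and the example keys are hypothetical and do not correspond to the API of any particular NoSQL product.

```python
import json
from typing import Any, Dict, Optional


class KeyValueStore:
    """Hypothetical single-node key-value store; a dict plays the role of the hash table."""

    def __init__(self) -> None:
        self._table: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Each key is unique: writing to an existing key replaces its old value.
        self._table[key] = value

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        # Lookup is by key only; the store treats the value as opaque.
        return self._table.get(key, default)

    def delete(self, key: str) -> bool:
        # Returns True if the key was present and has been removed.
        if key in self._table:
            del self._table[key]
            return True
        return False


# Usage: values of different forms stored under unique keys (the keys are made up).
store = KeyValueStore()
store.put("user:101:profile", json.dumps({"name": "Asha", "city": "Mumbai"}))  # JSON text
store.put("user:101:photo", b"\x89PNG\r\n\x1a\n")  # BLOB: raw bytes (PNG signature only)
store.put("greeting", "hello")  # plain string

profile = json.loads(store.get("user:101:profile"))
print(profile["city"])            # Mumbai
print(store.get("missing"))       # None
store.put("greeting", "namaste")  # overwrite: the key stays unique
print(store.get("greeting"))      # namaste
```

Because the store never looks inside a value, interpreting its structure (JSON, binary data or plain text) is left to the application. Distributed key-value databases typically also spread keys across many nodes, which this sketch deliberately omits.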
Applications: This type of pattern is
