Pig Commands

The document provides steps to perform various operations on employee data using Pig commands: loading data from the local file system into HDFS, loading the data into Pig, filtering records by condition, counting records per department, and calculating the total and average salary.

Uploaded by powinik586

Copyright © All Rights Reserved
=======================pig commands 21-03-2024 ===========================

/home/cloudera/Desktop/1a0523/pig_local/input_data:

---------------------------------------------------

001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad

002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata

003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi

004,Preethi,Agarwal,Manager, 78000,9848022330,Pune

005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar

006,Archana,Mishra,Admin, 95000,9848022335,Chennai

Load it into HDFS:

-----------------

[cloudera@quickstart ~]$ hdfs dfs -copyFromLocal /home/cloudera/Desktop/1a0523/pig_local/input_data 1A0523/pig_latin

[cloudera@quickstart ~]$ hdfs dfs -ls -R 1A0523/pig_latin

-rw-r--r-- 1 cloudera cloudera 321 2024-03-20 20:57 1A0523/pig_latin/input_data

[cloudera@quickstart ~]$

1. Now load the data from HDFS into the Grunt shell:

---------------------------------

grunt> data2 = load 'hdfs://quickstart.cloudera:8020/user/cloudera/1A0523/pig_latin/input_data' USING PigStorage(',') as (id, firstname, lastname, designation, salary, phone, city);

2024-03-20 21:19:08,108 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

2024-03-20 21:19:08,108 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
grunt>
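Because the LOAD statement names the fields but gives them no types, Pig treats every field as bytearray, and PigStorage(',') splits on the comma without trimming, so each salary keeps its leading space. The sketch below is a plain-Python model of that split for checking the records without a cluster. It is not how Pig executes the load, and the names SAMPLE, FIELDS and load are illustrative.

```python
# Sample rows copied from input_data above (note the space before each salary).
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""

FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")


def load(text):
    """Model of LOAD ... USING PigStorage(','): split each line on commas and
    keep every value as an untrimmed string (standing in for Pig's bytearray)."""
    return [dict(zip(FIELDS, line.split(","))) for line in text.splitlines()]


data2 = load(SAMPLE)
print(len(data2))                # 6
print(repr(data2[0]["salary"]))  # ' 65000'
```

Declaring types in the AS clause (for example `salary:int`) is the usual way to avoid the untyped fields, but the session above keeps them untyped.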

2. Dump the result

------------------

grunt> dump data2;

2024-03-20 21:19:22,342 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN

2024-03-20 21:19:24,741 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1

2024-03-20 21:19:24,741 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1

2024-03-20 21:19:24,742 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1

Input(s):

Successfully read 6 records (713 bytes) from: "hdfs://quickstart.cloudera:8020/user/cloudera/1A0523/pig_latin/input_data"

Output(s):

Successfully stored 6 records (379 bytes) in: "hdfs://quickstart.cloudera:8020/tmp/temp-2129984429/tmp1634684284"

Counters:

Total records written : 6

Total bytes written : 379

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:

job_1710556796484_0016
output:

-------

(001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad)

(002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata)

(003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi)

(004,Preethi,Agarwal,Manager, 78000,9848022330,Pune)

(005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar)

(006,Archana,Mishra,Admin, 95000,9848022335,Chennai)

3. Display the details of Preethi (FILTER):

---------------------------------------

grunt> preethi_data = FILTER data2 BY firstname=='Preethi';

grunt> dump preethi_data;

2024-03-20 21:27:57,797 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

Input(s):

Successfully read 6 records (713 bytes) from: "hdfs://quickstart.cloudera:8020/user/cloudera/1A0523/pig_latin/input_data"

Output(s):

Successfully stored 1 records (69 bytes) in: "hdfs://quickstart.cloudera:8020/tmp/temp-2129984429/tmp-1605779950"

Counters:

Total records written : 1

Total bytes written : 69

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:
job_1710556796484_0017

output:

-------

(004,Preethi,Agarwal,Manager, 78000,9848022330,Pune)
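FILTER keeps only the tuples for which the condition is true. The IMPLICIT_CAST_TO_CHARARRAY warning most likely appears because firstname is an untyped bytearray being compared with a chararray literal. Below is a plain-Python model of the same filter on the sample rows. It illustrates the result and is not Pig's execution; SAMPLE and FIELDS are illustrative names.

```python
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")

# Untyped load model: comma split, values kept as raw strings.
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# FILTER data2 BY firstname == 'Preethi'
preethi_data = [row for row in data2 if row["firstname"] == "Preethi"]
print(preethi_data[0]["id"], preethi_data[0]["city"])  # 004 Pune
```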

4. Display the details of the employees working in the Admin department:

----------------------------------------------------------------------

grunt> admin_data = FILTER data2 BY designation=='Admin';

2024-03-20 21:30:28,837 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt>

grunt> dump admin_data;

2024-03-20 21:30:36,555 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).

Input(s):

Successfully read 6 records (713 bytes) from: "hdfs://quickstart.cloudera:8020/user/cloudera/1A0523/pig_latin/input_data"

Output(s):

Successfully stored 2 records (125 bytes) in: "hdfs://quickstart.cloudera:8020/tmp/temp-2129984429/tmp-240494981"

Counters:
Total records written : 2

Total bytes written : 125

Spillable Memory Manager spill count : 0

Total bags proactively spilled: 0

Total records proactively spilled: 0

Job DAG:

job_1710556796484_0018

output:

-------

(003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi)

(006,Archana,Mishra,Admin, 95000,9848022335,Chennai)
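The same kind of filter on designation can be modelled in plain Python. This is an illustration under the untyped-load assumption, not Pig's implementation. It also shows that string equality is case-sensitive, so a lowercase 'admin' literal would match nothing in this data.

```python
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# FILTER data2 BY designation == 'Admin'  (exact, case-sensitive match)
admin_data = [row for row in data2 if row["designation"] == "Admin"]
print([row["firstname"] for row in admin_data])  # ['Rajesh', 'Archana']

# A lowercase literal matches nothing in this data.
lower_match = [row for row in data2 if row["designation"] == "admin"]
print(len(lower_match))  # 0
```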

5. Display the count of employees per department:

---------------------------------------------

grunt> group_data = group data2 by (firstname, designation);

2024-03-20 21:34:58,422 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt>

grunt> count_emp = foreach group_data generate group.designation as department, COUNT(data2.designation);

2024-03-20 21:37:26,605 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> dump count_emp;


output:

------

(Technical Manager,1)

(Admin,1)

(Admin,1)

(Manager,1)

(Hr,1)

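(Hr,1)

Because data2 is grouped by (firstname, designation), each employee forms a group of its own, so every count is 1. Grouping by designation alone, as in step 8, gives the actual per-department count.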
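The difference between the two grouping keys can be checked with a plain-Python model, where Counter stands in for GROUP followed by COUNT. This is an illustration only; the variable names are hypothetical.

```python
from collections import Counter

SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# GROUP data2 BY (firstname, designation): every key is unique here, so every count is 1.
by_name_and_dept = Counter((row["firstname"], row["designation"]) for row in data2)

# GROUP data2 BY designation: one group per department.
by_dept = Counter(row["designation"] for row in data2)

print(sorted(by_name_and_dept.values()))  # [1, 1, 1, 1, 1, 1]
print(by_dept["Admin"], by_dept["Hr"])    # 2 2
```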

Total employee count (empcount):

--------------------------------

grunt> group_data = group data2 all;

2024-03-20 21:47:18,975 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt>

grunt> dump group_data;

grunt> empcount= foreach group_data generate COUNT(data2.salary);

2024-03-20 21:48:51,887 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> dump empcount;

(6)
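GROUP ... ALL puts every tuple into a single group, so one COUNT covers the whole relation. COUNT skips null values, while COUNT_STAR counts them. The plain-Python model below illustrates both, using a hypothetical extra row with a null salary that is not in the real data. pig_count and pig_count_star are illustrative names, not Pig functions.

```python
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]


def pig_count(values):
    """Model of COUNT: null values (None here) are not counted."""
    return sum(1 for v in values if v is not None)


def pig_count_star(values):
    """Model of COUNT_STAR: every value is counted, nulls included."""
    return len(values)


# GROUP data2 ALL -> one group holding every row; project the salary column.
salaries = [row["salary"] for row in data2]
print(pig_count(salaries))  # 6

# Hypothetical extra row with a null salary, only to show the difference.
with_null = salaries + [None]
print(pig_count(with_null), pig_count_star(with_null))  # 6 7
```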
6. Display the total salary paid by the company:

-------------------------------------------

grunt> group_data = group data2 all;

2024-03-20 21:42:17,631 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt>

grunt> totalsalary = foreach group_data generate COUNT(data2.salary) ;

2024-03-20 21:43:12,422 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt>

grunt> dump totalsalary;

2024-03-20 21:43:28,763 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY

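grunt>

COUNT only returns the number of non-null salary values (6), not their sum, so the total is computed again with SUM: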

grunt> empcount= foreach group_data generate SUM(data2.salary);

2024-03-20 21:51:19,221 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> totalsalary= empcount;

2024-03-20 21:51:31,529 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> dump totalsalary;

(403000.0)
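SUM over an untyped (bytearray) field casts the values to double, which is why the result prints as 403000.0 rather than 403000. As the result shows, the leading space in each salary does not prevent the conversion. Here is a plain-Python model of the same sum, offered as an illustration and not as Pig's code.

```python
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# SUM(data2.salary) on an untyped field: each value is cast to double before adding.
# Python's float() likewise ignores the surrounding whitespace in ' 65000'.
totalsalary = sum(float(row["salary"]) for row in data2)
print(totalsalary)  # 403000.0
```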
7. Display the average salary:

--------------------

grunt> group_data = group data2 all;

grunt> avgsalary= foreach group_data generate AVG(data2.salary);

2024-03-20 21:53:38,391 [main] WARN org.apache.pig.PigServer - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 2 time(s).

grunt> dump avgsalary;

(67166.66666666667)
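AVG ignores nulls and divides the sum of the remaining values by how many there are: 403000 / 6 ≈ 67166.67, which matches the output above. A plain-Python check of that arithmetic, as an illustration only:

```python
SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# AVG(data2.salary): cast each value to double, then total / number of values.
salaries = [float(row["salary"]) for row in data2]
avgsalary = sum(salaries) / len(salaries)
print(round(avgsalary, 2))  # 67166.67
```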

8. Department-wise employee count:

----------------------------

grunt>

grunt> groupdata= Group data2 by designation;

grunt>

grunt> ecount= FOREACH groupdata GENERATE group, data2.designation, COUNT(data2);

grunt>

grunt> dump ecount;

output:

-------

(Hr,{(Hr),(Hr)},2)

(Admin,{(Admin),(Admin)},2)

(Manager,{(Manager)},1)

(Technical Manager,{(Technical Manager)},1)

grunt>
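GROUP ... BY designation produces one tuple per department: the group key plus a bag of the matching rows. The FOREACH then projects the designation column out of that bag and counts the bag. The following plain-Python model shows the same shape, with a list standing in for a Pig bag. It is an illustration, not Pig's implementation. The order of groups printed here differs from the dump above, and Pig's group order should not be relied on either.

```python
from collections import defaultdict

SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]

# groupdata = GROUP data2 BY designation  -> key: designation, value: bag of rows
groupdata = defaultdict(list)
for row in data2:
    groupdata[row["designation"]].append(row)

# ecount = FOREACH groupdata GENERATE group, data2.designation, COUNT(data2)
ecount = [
    (group, [(row["designation"],) for row in bag], len(bag))
    for group, bag in groupdata.items()
]
for t in ecount:
    print(t)
```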
9. Separate the first name and last name:

------------------------------------

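-- x: one "firstname lastname" string per employee, built from data2
x = FOREACH data2 GENERATE CONCAT(CONCAT((chararray)firstname, ' '), (chararray)lastname) AS name;

name_sep = FOREACH x GENERATE TOKENIZE(name);

DUMP name_sep;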
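TOKENIZE splits a chararray into a bag of single-word tuples. By default it breaks on whitespace and a few punctuation characters (double quote, comma, parentheses, asterisk). The regular expression below approximates that and is an assumption, not Pig's exact rule. This plain-Python model builds the full-name strings and tokenizes them. The tokenize helper is hypothetical.

```python
import re

SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]


def tokenize(text):
    """Approximation of TOKENIZE: a list of one-word tuples (standing in for a bag)."""
    return [(word,) for word in re.split(r'[\s",()*]+', text) if word]


# x: one "firstname lastname" string per employee
x = [row["firstname"] + " " + row["lastname"] for row in data2]
name_sep = [tokenize(name) for name in x]
print(name_sep[0])  # [('Rajiv',), ('Reddy',)]
```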

10. Find the word count of the names:

---------------------------
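The notes stop before this step. The usual Pig word-count pattern flattens TOKENIZE's bag into one word per tuple (FLATTEN(TOKENIZE(name))), groups by word and applies COUNT to each group. The sketch below models that dataflow in plain Python on the names from the sample data. It assumes "word count" means the count of each distinct word; the helper names are hypothetical. In this data every name word occurs once.

```python
import re
from collections import Counter

SAMPLE = """\
001,Rajiv,Reddy,Technical Manager, 65000,9848022337,Hyderabad
002,siddarth,Battacharya,Hr, 55000,9848022338,Kolkata
003,Rajesh,Khanna,Admin, 45000,9848022339,Delhi
004,Preethi,Agarwal,Manager, 78000,9848022330,Pune
005,Trupthi,Mohanthy,Hr, 65000,9848022336,Bhuwaneshwar
006,Archana,Mishra,Admin, 95000,9848022335,Chennai"""
FIELDS = ("id", "firstname", "lastname", "designation", "salary", "phone", "city")
data2 = [dict(zip(FIELDS, line.split(","))) for line in SAMPLE.splitlines()]


def tokenize(text):
    """Approximation of TOKENIZE's default word separators."""
    return [word for word in re.split(r'[\s",()*]+', text) if word]


# x: one "firstname lastname" string per employee
x = [row["firstname"] + " " + row["lastname"] for row in data2]

# Flatten: one word per element (like FLATTEN(TOKENIZE(name))).
words = [word for name in x for word in tokenize(name)]

# Group by word and count each group.
wordcount = Counter(words)
print(len(words), len(wordcount))  # 12 12
print(wordcount["Preethi"])        # 1
```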
