Session 17: Snowflake Snowpipe

AGENDA:

-------
- WHAT IS CONTINUOUS LOADING
- SNOWPIPE
- STEPS IN CREATING PIPE
- SNOWPIPE SYNTAX
- SNOWPIPE DDL
- TROUBLESHOOTING SNOWPIPE
- MANAGING SNOWPIPE

- BULK LOADING means loading large amounts of data at one time.

- CONTINUOUS DATA LOADING means loading small volumes of data continuously, whenever data becomes available on the cloud storage side.

WHAT IS CONTINUOUS DATA LOADING:
--------------------------------
- Loading small volumes of data in a continuous manner, for example every hour or every 10 minutes.
- Live or near real-time data.
- This ensures users have the latest data for business analysis.
- Snowflake uses Snowpipe for continuous data loads into Snowflake tables.

SNOWPIPE:
----------

- A pipe is a named database object that contains the COPY command used to load the data.
- Snowpipe loads data within minutes after files are added to a stage and submitted for ingestion.
- Snowpipe uses compute resources provided by Snowflake; it is serverless.
- One-time setup.
- Recommended file size is 100 - 250 MB (compressed).
- Snowpipe uses file-loading metadata associated with each pipe object to prevent reloading the same files.

How Snowpipe works:
-------------------
file ----> cloud storage stage [AWS S3, Azure Blob, GCP Cloud Storage] --cloud storage notification--> load to table

Steps in creating a Snowpipe:
-----------------------------
1. Create a storage integration object.
2. Create a stage object using the storage integration object.
3. Create and test the COPY command to load the data.
4. Create a pipe using the COPY command.
5. Set up event notification at the cloud storage provider's end.

Snowpipe Syntax:
----------------
CREATE OR REPLACE PIPE <pipe_name>
AUTO_INGEST = [TRUE|FALSE]
AS
<copy_statement>;

SNOWPIPE DDL:
-------------
- CREATE PIPE   --> To create a pipe.
- ALTER PIPE    --> To alter a pipe, or to pause/resume a pipe:
  ALTER PIPE <pipe_name> SET PIPE_EXECUTION_PAUSED = TRUE|FALSE;
- DROP PIPE     --> To drop a pipe.
- DESCRIBE PIPE --> To describe the pipe properties, e.g. to get the notification channel ARN.
- SHOW PIPES    --> To see all the pipes.
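
A minimal sketch of the inspection commands against a hypothetical pipe named mydb.pipes.employee_pipe (the same name used in the hands-on section later):

//show the pipe properties, including the notification_channel ARN
DESC PIPE mydb.pipes.employee_pipe;

//list all pipes visible to the current role
SHOW PIPES;

//remove the pipe
DROP PIPE mydb.pipes.employee_pipe;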

TROUBLESHOOTING SNOWPIPE:
-------------------------

STEP 1: CHECK THE PIPE STATUS
STEP 2: VIEW THE COPY HISTORY FOR THE TABLE
STEP 3: VALIDATE THE DATA FILES

STEP 1: Checking the pipe status:
---------------------------------

select system$pipe_status('pipe_name');

lastReceivedMessageTimestamp:
-----------------------------
- Specifies the timestamp of the last event message received from the message queue.
- If the timestamp is earlier than expected, this indicates an issue with the service configuration (i.e. Amazon SQS).
- Verify whether any settings were changed in your service configuration.

lastForwardedMessageTimestamp:
------------------------------
- Specifies the timestamp of the last "create object" event message that was forwarded to the pipe.
- If this value is not close to the above timestamp, there is a mismatch between the cloud storage path where the new data files are created and the path specified in the Snowflake stage object.
- Verify the paths and correct them.
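
SYSTEM$PIPE_STATUS returns a JSON string, so both timestamps can be pulled out with PARSE_JSON; a minimal sketch, assuming the pipe name from the hands-on section later:

select parse_json(system$pipe_status('mydb.pipes.employee_pipe')):executionState::string        as execution_state,
       parse_json(system$pipe_status('mydb.pipes.employee_pipe')):lastReceivedMessageTimestamp  as last_received,
       parse_json(system$pipe_status('mydb.pipes.employee_pipe')):lastForwardedMessageTimestamp as last_forwarded;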

STEP 2: View the copy history:
------------------------------
- Copy history shows the history of all file loads and any errors.
- View the copy history by using the below query.

SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'table_name',
    START_TIME => 'timestamp or expression'
));
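
For example, to see the last hour of loads into the emp_data table created in the hands-on section later (table name and time window are illustrative):

SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'MYDB.PUBLIC.EMP_DATA',
    START_TIME => DATEADD(hours, -1, CURRENT_TIMESTAMP())
));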

STEP 3: Validate the data files:
--------------------------------
- If the load operation encounters errors in the data files, the COPY_HISTORY table function describes the first error encountered in each file.
- To validate the data files, query the VALIDATE_PIPE_LOAD table function.

SELECT * FROM TABLE(INFORMATION_SCHEMA.VALIDATE_PIPE_LOAD(
    PIPE_NAME  => 'pipe_name',
    START_TIME => 'timestamp or expression'
));
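
For example, against the pipe created in the hands-on section later (pipe name and time window are illustrative):

SELECT * FROM TABLE(INFORMATION_SCHEMA.VALIDATE_PIPE_LOAD(
    PIPE_NAME  => 'MYDB.PIPES.EMPLOYEE_PIPE',
    START_TIME => DATEADD(hours, -1, CURRENT_TIMESTAMP())
));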

MANAGING PIPES:
---------------

- Use the DESC PIPE <pipe_name> command to see the pipe properties and the COPY command.
- Use the SHOW PIPES command to see all the pipes.
- We can pause/resume pipes with ALTER PIPE ... SET PIPE_EXECUTION_PAUSED = TRUE|FALSE.
- It is best practice to pause pipes before, and resume them after, performing the below actions (see the sketch after this list):
    - when modifying the stage object.
    - when modifying the file format object the stage is using.
    - when modifying the COPY command.
- To modify the COPY command, recreating the pipe is the only possible way.
- When you recreate a pipe, all the load history will be dropped.
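
A minimal sketch of the pause/modify/resume workflow, assuming the pipe name used in the hands-on section below:

//pause the pipe before modifying the stage or file format it depends on
ALTER PIPE mydb.pipes.employee_pipe SET PIPE_EXECUTION_PAUSED = TRUE;

//...modify the stage object or file format object here...

//resume the pipe afterwards
ALTER PIPE mydb.pipes.employee_pipe SET PIPE_EXECUTION_PAUSED = FALSE;

//confirm the pipe is running again
select system$pipe_status('mydb.pipes.employee_pipe');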

------------------------------------------------------------------------------------
- Log in to the AWS account, go to the S3 bucket, and create the folder pipes/csv:

Amazon S3 > Buckets > awss3bucketjana > pipes/csv

//create databases and schemas if not exists

CREATE DATABASE IF NOT EXISTS MYDB;


CREATE SCHEMA IF NOT EXISTS MYDB.file_formats;
CREATE SCHEMA IF NOT EXISTS MYDB.external_stages;

//alter your storage integration

alter storage integration s3_int
set STORAGE_ALLOWED_LOCATIONS = ('s3://awss3bucketjana/csv/','s3://awss3bucketjana/json/','s3://awss3bucketjana/pipes/csv/');

(or)

CREATE OR REPLACE STORAGE INTEGRATION s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::555064756008:role/snowflake_access_role'
    STORAGE_ALLOWED_LOCATIONS = ('s3://awss3bucketjana/csv/','s3://awss3bucketjana/json/','s3://awss3bucketjana/pipes/csv/')
    COMMENT = 'Integration with aws s3 buckets';
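
Recreating the storage integration generates a new external ID, so the trust policy on the AWS role has to be updated again; a minimal sketch of fetching the values needed on the AWS side:

//copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from this output
//into the trust relationship of snowflake_access_role in AWS IAM
DESC INTEGRATION s3_int;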

//create a file format object of csv type

CREATE OR REPLACE file format MYDB.file_formats.csv_fileformat
    type = csv
    field_delimiter = ','
    skip_header = 1
    empty_field_as_null = TRUE;

//Create a stage object using storage integration

CREATE OR REPLACE stage mydb.external_stages.stage_aws_pipes
    URL = 's3://awss3bucketjana/pipes/csv/'
    STORAGE_INTEGRATION = s3_int
    FILE_FORMAT = MYDB.file_formats.csv_fileformat;

//List the files in the stage

LIST @mydb.external_stages.stage_aws_pipes;

//Create a table to load the data from these files


CREATE OR REPLACE TABLE MYDB.PUBLIC.emp_data
(
id INT,
first_name STRING,
last_name STRING,
email STRING,
location STRING,
department STRING
);
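
Step 3 of the checklist earlier in this session is to create and test the COPY command before wrapping it in a pipe; a minimal sketch using VALIDATION_MODE so no rows are actually loaded:

//test the COPY command against the staged files without loading any rows
COPY INTO MYDB.PUBLIC.emp_data
FROM @mydb.external_stages.stage_aws_pipes
pattern = '.*employee.*'
VALIDATION_MODE = RETURN_ERRORS;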

//Create a schema to keep pipe objects

CREATE OR REPLACE SCHEMA MYDB.PIPES;

// Create a pipe
CREATE OR REPLACE PIPE MYDB.PIPES.EMPLOYEE_PIPE
AUTO_INGEST = TRUE
AS
COPY INTO MYDB.PUBLIC.emp_data
FROM @mydb.external_stages.stage_aws_pipes
pattern = '.*employee.*';

//Describe pipe to get ARN

DESC PIPE MYDB.PIPES.EMPLOYEE_PIPE;

//Get the notification_channel ARN and use it in the S3 event notification SQS queue setting

Go to the Amazon S3 bucket:
---------------------------

Amazon S3 > Buckets > awss3bucketjana > Create event notification

Event name (snowfile_employee) > Prefix - optional (pipes/csv/) > Suffix - optional (ignore it) >
Event types (Object creation: enable "All object create events") > Destination (select the SQS queue option) >
Specify SQS queue (choose "Enter SQS queue ARN"), paste the notification channel ARN from the DESC PIPE output, and save changes.

//Upload the file and verify the data in the table after a minute

SELECT * FROM MYDB.PUBLIC.emp_data;

//Troubleshooting pipes
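
If no rows show up, the troubleshooting steps from earlier in the session apply directly to this pipe; a minimal first check:

//check executionState, pendingFileCount and the message timestamps
select system$pipe_status('MYDB.PIPES.EMPLOYEE_PIPE');

//if needed, follow up with the COPY_HISTORY and VALIDATE_PIPE_LOAD queries shown
//in the troubleshooting section, using MYDB.PUBLIC.EMP_DATA and MYDB.PIPES.EMPLOYEE_PIPE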

************************************************************************END************************************************************************
