
52 Scheduling - Cron - Airflow Job and While True

The document discusses two methods for scheduling tasks in Python: using 'while True' for full control within Python and cron jobs for system-level automation. It provides example scripts for both methods, highlights their use cases, limitations, and compares their features. Additionally, it includes an overview of an ETL process using Airflow and MySQL, along with a brief author bio.

Uploaded by

joanantoranjith

Gowtham SB

www.linkedin.com/in/sbgowtham/ Instagram - @dataengineeringtamil

🔁 Python Task Scheduling: while True vs Cron (with Example)

🧩 PART 1: Scheduling with while True + sleep() in Python


✅ Use Case:

Useful when:

● You want full control inside Python

● You don’t want to use system tools like cron

● You're running in PyCharm, Jupyter, or scripts

🧪 Example Script: run_every_2min.py


import time
from datetime import datetime

def task():
    # Append a timestamp to the log file on every run
    with open("/home/ubuntu/time_log.txt", "a") as f:
        f.write(f"Script ran at: {datetime.now()}\n")
    print(f"Task ran at: {datetime.now()}")

while True:
    task()
    time.sleep(120)  # Wait 2 minutes

💻 Run This in PyCharm:

1. Create a Python file run_every_2min.py

2. Paste the code above

3. Run it in PyCharm
4. The script runs the task every 2 minutes, printing and logging the time, until you stop it
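One detail worth knowing: `time.sleep(120)` starts counting only after `task()` returns, so each cycle lasts 2 minutes plus however long the task took. The sketch below shows one possible way to keep a fixed interval by sleeping only for the time remaining. The function name `run_periodically` and its parameters are illustrative assumptions, not part of the original script.

```python
import time


def run_periodically(task, interval_seconds, max_runs=None,
                     clock=time.monotonic, sleep=time.sleep):
    """Run task at a fixed interval, subtracting the task's own run time.

    max_runs=None loops forever, like `while True`. The clock and sleep
    parameters exist only so the loop can be exercised without waiting.
    """
    next_run = clock()
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # no need to sleep after the final run
        # Schedule the next slot relative to the previous slot, not to "now".
        next_run += interval_seconds
        delay = next_run - clock()
        if delay > 0:
            sleep(delay)
        # If the task overran its slot, the next run starts immediately.
    return runs
```

Calling `run_periodically(task, 120)` would behave like the loop above, but a task that takes 5 seconds would be followed by a 115-second sleep instead of a 120-second one.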

⚠ Limitations of This Method


Feature | while True
Background Scheduling | ❌ Manual only
Auto-start on reboot | ❌ No
Time-based control | ✅ Custom logic
Resource-efficient | ❌ Keeps script running forever

PART 2: Scheduling with Cron Job (Linux/macOS Only)

✅ Use Case:

Best for:

● System-level automation

● Tasks like backups, data processing, email jobs

● Scripts that run in the background on a fixed schedule (every minute, every 2 hours, daily, etc.)

📦 Step-by-Step to Schedule with Cron

📌 1. Create Python Script (e.g., print_time.py)


from datetime import datetime

# Append a timestamp to the log file each time cron runs the script
with open("/home/ubuntu/time_log.txt", "a") as f:
    f.write(f"Script ran at: {datetime.now()}\n")

🧪 2. Test It Manually
python3 /home/ubuntu/print_time.py

📂 3. Open Cron Editor


crontab -e

⏰ 4. Add Cron Line (Every 2 minutes)


*/2 * * * * /usr/bin/python3 /home/ubuntu/print_time.py >> /home/ubuntu/cron_debug.log 2>&1

✅ Done! Now your script runs automatically every 2 mins.

🌐 Use This Website to Create Cron Expressions:


👉 Visit: https://crontab.guru

The site:

● Shows the human-readable meaning of your cron expression

● Helps you build complex patterns easily

For example:

*/2 * * * * → every 2 minutes


0 9 * * 1 → every Monday at 9 AM
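To make the five fields (minute, hour, day of month, month, day of week) concrete, here is a minimal Python sketch that checks whether a cron expression matches a given time. The helpers `field_matches` and `cron_matches` are hypothetical names written for this illustration. It is a simplification: it handles only `*`, single numbers, ranges, comma lists, and `/step`, and it requires every field to match, whereas real cron applies special rules when both day fields are restricted.

```python
from datetime import datetime


def field_matches(field, value, minimum):
    """Check one cron field against a value.

    Handles '*', 'a', 'a-b', an optional '/step' suffix, and comma lists.
    """
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = minimum, None  # open-ended range
        elif "-" in part:
            low, high = part.split("-")
            start, end = int(low), int(high)
        else:
            start = end = int(part)
        in_range = value >= start and (end is None or value <= end)
        if in_range and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression, when):
    """Return True if the 5-field cron expression matches the datetime."""
    minute, hour, day, month, weekday = expression.split()
    # cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(day, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(weekday, cron_weekday, 0)
    )
```

For instance, `cron_matches("0 9 * * 1", datetime(2024, 1, 1, 9, 0))` is true because 1 January 2024 was a Monday, while the same expression does not match 9 AM on the following day.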

✅ Compare Cron vs While True


Feature | while True | cron
Works on Windows | ✅ Yes | ❌ No (Linux/macOS only)
Background Execution | ❌ Manual handling | ✅ Built-in
System Reboot Persistence | ❌ No | ✅ Yes
Simple to Start in PyCharm | ✅ Very easy | ❌ Needs terminal
Best for real-time apps | ✅ Yes | ❌ No



✅ Summary
Use while True if:

● You're learning, using PyCharm, or just testing

Use cron if:

● You want system-level background scheduling that just works!

Airflow - Python ETL Automation


etl_dag.py

from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

dag = DAG(
    'mysql_etl_dag',  # DAG name
    default_args=default_args,
    description='A simple ETL DAG',
    # Run every 5 minutes (Airflow 2.4+ names this parameter `schedule`)
    schedule_interval=timedelta(minutes=5),
    start_date=datetime(2023, 7, 21),
    catchup=False,  # do not backfill runs missed since start_date
)

run_etl = BashOperator(
    task_id='run_etl',
    # Keep the trailing space after the path: without it, Airflow treats a
    # command ending in .sh as a Jinja template file and fails to load it.
    bash_command='bash /home/ubuntu/wrapper_script.sh ',
    dag=dag,
)

etl_script.py

import pymysql
import pandas as pd
from datetime import datetime
import os

def fetch_data_from_mysql():
    # Extract: demo credentials only; change them outside a local test setup
    mysql_config = {
        'host': 'localhost',
        'user': 'root',
        'password': 'root',
        'database': 'etl_example'
    }

    connection = pymysql.connect(**mysql_config)
    try:
        query = 'SELECT * FROM sample_data'
        df = pd.read_sql(query, connection)
    finally:
        connection.close()  # close the connection even if the query fails
    return df

def transform_data(df):
    # Transform: keep only rows where age is greater than 30
    df_transformed = df[df['age'] > 30]
    return df_transformed

def write_data_to_file(df):
    # Load: write a timestamped CSV so each run creates a new file
    output_dir = '/home/ubuntu/extract'
    os.makedirs(output_dir, exist_ok=True)
    timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
    file_name = f'etl_output_{timestamp}.csv'
    file_path = os.path.join(output_dir, file_name)
    df.to_csv(file_path, index=False)
    print(f'Data written to {file_path}')

def etl_process():
    df = fetch_data_from_mysql()
    df_transformed = transform_data(df)
    write_data_to_file(df_transformed)

if __name__ == "__main__":
    etl_process()

Mysql_ddl

CREATE DATABASE IF NOT EXISTS etl_example;

USE etl_example;

CREATE TABLE IF NOT EXISTS sample_data (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255),
    age INT,
    city VARCHAR(255),
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO sample_data (name, age, city) VALUES
    ('Alice', 30, 'New York'),
    ('Bob', 25, 'Los Angeles'),
    ('Charlie', 35, 'Chicago');

INSERT INTO sample_data (name, age, city) VALUES
    ('kumar', 40, 'New York');
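To see what the ETL should produce from these four rows without a MySQL server, the sketch below replays the same extract, filter, and CSV steps using only the Python standard library. It swaps in an in-memory `sqlite3` database and the `csv` module in place of `pymysql` and `pandas`, so it is an illustration of the logic, not the script itself. The function name `run_sample_etl` is a hypothetical helper.

```python
import csv
import io
import sqlite3

SAMPLE_ROWS = [
    ("Alice", 30, "New York"),
    ("Bob", 25, "Los Angeles"),
    ("Charlie", 35, "Chicago"),
    ("kumar", 40, "New York"),
]


def run_sample_etl():
    """Extract the sample rows, keep age > 30, and render them as CSV text."""
    connection = sqlite3.connect(":memory:")  # stand-in for the MySQL database
    try:
        connection.execute(
            "CREATE TABLE sample_data ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, age INTEGER, city TEXT)"
        )
        connection.executemany(
            "INSERT INTO sample_data (name, age, city) VALUES (?, ?, ?)",
            SAMPLE_ROWS,
        )
        rows = connection.execute(
            "SELECT name, age, city FROM sample_data ORDER BY id"
        ).fetchall()
    finally:
        connection.close()

    # Same rule as the age filter in the ETL script: strictly greater than 30.
    kept = [row for row in rows if row[1] > 30]

    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "age", "city"])
    writer.writerows(kept)
    return kept, buffer.getvalue()
```

Only Charlie and kumar pass the filter. Alice is dropped because the comparison is strict and her age is exactly 30.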

wrapper_script.sh

#!/bin/bash
python3 /home/ubuntu/etl_script.py

About the Author


Gowtham SB is a Data Engineering expert, educator, and content creator with a
passion for big data technologies, as well as cloud and Gen AI. With years of
experience in the field, he has worked extensively with cloud platforms, distributed
systems, and data pipelines, helping professionals and aspiring engineers master the
art of data engineering.

Beyond his technical expertise, Gowtham is a renowned mentor and speaker, sharing
his insights through engaging content on YouTube and LinkedIn. He has built one of
the largest Tamil Data Engineering communities, guiding thousands of learners to
excel in their careers.

Through his deep industry knowledge and hands-on approach, Gowtham continues to
bridge the gap between learning and real-world implementation, empowering
individuals to build scalable, high-performance data solutions.

Socials

YouTube - https://www.youtube.com/@dataengineeringvideos

Instagram - https://instagram.com/dataengineeringtamil

Instagram - https://instagram.com/thedatatech.in

Connect for 1:1 - https://topmate.io/dataengineering/

LinkedIn - https://www.linkedin.com/in/sbgowtham/

Website - https://codewithgowtham.blogspot.com

GitHub - http://github.com/Gowthamdataengineer

WhatsApp - https://lnkd.in/g5JrHw8q

Email - [email protected]

All My Socials - https://lnkd.in/gf8k3aCH
