Shell Script That Saved Me 5 Hours a Week

(A Practical Guide for DBAs & Sysadmins)

In the world of system and database administration, time leaks often go unnoticed.
We fix the same issue over and over.
We run the same checks manually.
We write the same SQL queries from scratch.

Here’s how one shell script I wrote reclaimed 5 hours a week, and how you can build your own.

Problem: Repetitive Daily Health Checks

Every morning, I was manually:

• Logging into multiple Linux DB servers
• Checking disk space, CPU load, DB listener status
• Looking for error logs or failed backups
• Running Oracle health checks (AWR/ASH if needed)

~45-60 minutes daily. 5+ hours a week. Gone.

Solution: One Unified Shell Script

I wrote a single script that:

• Logs into multiple remote DB servers via SSH
• Runs all checks automatically
• Saves output to a local timestamped log
• Sends me an email or Slack/Telegram notification only if something looks off

Below is a ready-to-use Bash script that automates the most common daily checks for a fleet of
Oracle or PostgreSQL servers, logs the results, and only notifies you when something crosses a
threshold. Copy it into a file such as daily_db_healthcheck.sh, make it executable (chmod +x …), and
schedule it with cron.

The following steps assume you are logged in to the jump host where the script will live and
that your user has sudo (or higher) privileges.

1. Open a new file: run vi daily_db_healthcheck.sh. This starts vi on an empty file.
2. Enter Insert mode: press i. You can now type or paste text.
3. Paste the script: in most terminals Shift + Insert or right-click → Paste; in macOS Terminal Cmd + V. The full Bash script below appears in the file.
4. Save and quit: press Esc, then type :wq and hit Enter (:wq = write + quit). You return to the shell prompt.
5. Make it executable: chmod +x daily_db_healthcheck.sh. This sets the execute bit so cron (and you) can run it.
6. Test-run once: ./daily_db_healthcheck.sh. Verify that logs appear in /var/log/db_health and that no errors show.
7. Open your user’s crontab: crontab -e. If prompted for an editor, select vi (usually option 1).
8. Add the scheduled line: press i to enter Insert mode, then paste:
   0 6 * * 1-5 /path/to/daily_db_healthcheck.sh >> /var/log/db_health/[Link] 2>&1
   This runs the script at 06:00 Monday–Friday and appends output to a rotating log. Adjust the absolute path if needed.
9. Save and quit crontab: press Esc, type :wq, hit Enter. Cron installs the new job immediately.
10. Verify: crontab -l. Confirm the entry is listed.
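
Before the first test run (step 6), note that the script logs in to each DB host over
password-less SSH, so your jump-host user needs its public key installed on every target.
A minimal one-time setup sketch (db1/db2/db3 mirror the HOSTS placeholders in the script
and should be replaced with your own host names):

# Run once on the jump host; skip ssh-keygen if a key already exists
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519 -N ""
for h in db1 db2 db3; do
    ssh-copy-id -i ~/.ssh/id_ed25519.pub "$h"   # prompts for that host's password once
done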

#!/usr/bin/env bash
################################################################################
# daily_db_healthcheck.sh
# Automates routine DB server health checks and sends an alert summary.
#
# Saves ~45-60 minutes per weekday by:
#   • Logging in to each host once
#   • Gathering CPU, memory, disk, listener, backup, and DB session metrics
#   • Storing a timestamped report locally
#   • Emailing or posting to Slack only if any threshold is breached
#
# Requirements:
#   • Password-less SSH (keys) from the jump host to all DB hosts
#   • `mailx` (or `sendmail`) for email alerts OR a Slack webhook URL
#   • Bash 4+ on the jump host
################################################################################

set -Eeuo pipefail

############################
# CONFIGURE THESE SECTIONS #
############################

# 1. Hosts to check
HOSTS=("db1" "db2" "db3")

# 2. Email settings (comment out if using Slack)
ALERT_EMAIL="dba_team@[Link]"
MAIL_SUBJECT_PREFIX="[DB-Health-Alert]"

# 3. Slack settings (comment out if using email)
# SLACK_WEBHOOK="[Link]"

# 4. Thresholds
CPU_THRESHOLD=80        # percent
MEM_THRESHOLD=80        # percent used
DISK_THRESHOLD=80       # percent used on /u01 and /
SESSIONS_THRESHOLD=250  # Oracle or PostgreSQL sessions
BACKUP_MAX_AGE_HOURS=24 # hours since last successful backup

# 5. Local log directory
LOG_DIR="/var/log/db_health"
mkdir -p "$LOG_DIR"

############################
# INTERNAL FUNCTIONS #
############################

send_alert() {
    local host="$1"
    local message="$2"

    if [[ -n "${SLACK_WEBHOOK:-}" ]]; then
        curl -s -X POST --data-urlencode \
            "payload={\"text\":\"${host}: ${message}\"}" \
            "$SLACK_WEBHOOK" > /dev/null
    else
        echo "$message" | mailx -s "${MAIL_SUBJECT_PREFIX} ${host}" "$ALERT_EMAIL"
    fi
}

check_host() {
    local host="$1"
    local tmp_log report
    tmp_log="$(mktemp)"
    report="$LOG_DIR/${host}_$(date +%F_%H%M).log"

    {
        echo "##### $(date '+%F %T') HEALTH CHECK for ${host} #####"

        ############ CPU and MEM ############
        # Field positions assume standard procps `top` batch output
        read -r cpu mem <<<"$(ssh "$host" \
            "top -b -n1 | head -5 | awk '
                /^%Cpu/             {cpu = 100 - \$8}          # 100 - idle%
                /^MiB Mem|^KiB Mem/ {mem = (\$8 / \$4) * 100}  # used / total
                END {printf(\"%d %d\", cpu, mem)}'")"
        echo "CPU Used: ${cpu}%"
        echo "Mem Used: ${mem}%"

        ############ DISK ############
        disk_alert=0
        while read -r pct mount; do
            pct_clean=${pct//%/}
            echo "Disk ${mount}: ${pct_clean}%"
            [[ "$pct_clean" -ge "$DISK_THRESHOLD" ]] && disk_alert=1
        done < <(ssh "$host" "df -hP | awk '/\/u01|\/\$/{print \$5, \$6}'")

        ############ LISTENER ############
        listener_status=$(ssh "$host" \
            "lsnrctl status | grep -qi 'listener.*running' && echo OK || echo DOWN")
        echo "Listener: ${listener_status}"

        ############ DB SESSIONS ############
        sessions=$(ssh "$host" "{ echo 'set heading off feedback off'; \
            echo \"select count(*) from v\\\$session where status='ACTIVE';\"; } \
            | sqlplus -s / as sysdba 2>/dev/null" || echo 0)
        sessions=${sessions//[^0-9]/}
        sessions=${sessions:-0}
        echo "Active Sessions: ${sessions}"

        ############ BACKUP ############
        # Epoch mtime of the marker file written by the last successful backup
        backup_mtime=$(ssh "$host" \
            "stat -c '%Y' /path/to/last_successful_backup.log 2>/dev/null || echo 0")
        if [[ "$backup_mtime" -gt 0 ]]; then
            backup_age=$(( ($(date +%s) - backup_mtime) / 3600 ))
        else
            backup_age=$((BACKUP_MAX_AGE_HOURS + 1))
        fi
        echo "Backup Age (h): ${backup_age}"
    } > "$tmp_log"

    cp "$tmp_log" "$report"

    ############################
    # EVALUATE ALERT CONDITIONS
    ############################
    local alert_msg=""
    if (( cpu >= CPU_THRESHOLD ));              then alert_msg+=" CPU=${cpu}%"; fi
    if (( mem >= MEM_THRESHOLD ));              then alert_msg+=" MEM=${mem}%"; fi
    if (( disk_alert ));                        then alert_msg+=" DISK>=${DISK_THRESHOLD}%"; fi
    if [[ "$listener_status" != "OK" ]];        then alert_msg+=" LISTENER_DOWN"; fi
    if (( sessions >= SESSIONS_THRESHOLD ));    then alert_msg+=" SESSIONS=${sessions}"; fi
    if (( backup_age > BACKUP_MAX_AGE_HOURS )); then alert_msg+=" BACKUP_AGE=${backup_age}h"; fi

    if [[ -n "$alert_msg" ]]; then
        send_alert "$host" "ALERT:${alert_msg}. See log: ${report}"
    fi

    rm -f "$tmp_log"
}

############################
# MAIN LOOP #
############################

for host in "${HOSTS[@]}"; do
    echo "=== Checking ${host} ==="
    check_host "$host" &
done

wait
echo "Health checks completed. Logs saved at ${LOG_DIR}"
################################################################################
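
The send_alert function above covers email and Slack. If you would rather use the Telegram
route mentioned earlier, a drop-in variant could look like the sketch below; TELEGRAM_TOKEN
and TELEGRAM_CHAT_ID are assumed to be added to the CONFIGURE section and are placeholders
for your own bot credentials, not part of the original script.

# Hypothetical Telegram variant of send_alert (uses the Bot API sendMessage endpoint)
send_alert() {
    local host="$1"
    local message="$2"
    curl -s "https://api.telegram.org/bot${TELEGRAM_TOKEN}/sendMessage" \
         --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
         --data-urlencode "text=${host}: ${message}" > /dev/null
}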

How it saves real hours each week


1. Single touchpoint: One script handles every host.
2. Conditional alerts: You read email or Slack only when something breaks.
3. Logs for audit: Timestamped files let you review status changes fast.
4. Cron job: Schedule once (0 6 * * 1-5) and reclaim your mornings.

Feel free to adapt paths, thresholds, and backup checks for your environment.
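
One thing the script does not do is prune its own output; the timestamped reports pile up in
/var/log/db_health, one file per host per run. A simple retention line, either appended to the
end of the script or scheduled as its own cron entry, keeps the directory bounded (the 30-day
window below is just an example):

# Delete health-check reports older than 30 days (adjust path and retention to taste)
find /var/log/db_health -name '*.log' -mtime +30 -delete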

What I Learned

• Automate what’s boring but important.
• Save logs for traceability.
• Silence is golden: only alert on anomalies.
• One script can multiply your reach across 10+ servers.

Want to Do the Same? Start Here

1. List your repeatable daily/weekly tasks.
2. Convert one into a function and wrap it in a Bash script or cron job.
3. Test it. Log all output, and test on a staging server first.
4. Add conditions. Use grep, if, and mail for basic alerting (see the sketch after this list).
5. Document it. Use [Link], comments, and version control (e.g., Git).
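
As a concrete starting point for steps 2-4, here is a minimal sketch of that pattern: one check
wrapped in a function, a grep condition, and a mail alert. The RMAN log path and the recipient
address are placeholders, not values from the script above.

#!/usr/bin/env bash
# Minimal "function -> grep -> mail" skeleton for a single check
check_backup_log() {
    local log=/u01/backup/rman_last_run.log        # placeholder path to your backup log
    if grep -qiE 'ORA-|RMAN-|ERROR' "$log"; then
        # Send the whole log as the mail body only when an error string is found
        mail -s "[backup-alert] errors on $(hostname)" dba_team@example.com < "$log"
    fi
}
check_backup_log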

Result

• Saved 5+ hours/week
• Less morning stress
• Alert-driven, not routine-driven
• Became reusable across teams and client environments

Final Thought
A good DBA solves problems.
A great DBA prevents them and automates the boring parts.

Your bash prompt is more powerful than you think.

What’s the best script you’ve ever written?


Drop your favorites (or struggles) in the comments.
