MySQL Stored Procedures: Building High Performance Web Applications
The document details the use of MySQL stored procedures, functions, triggers, and events as tools for building high-performance web applications. It discusses their advantages, such as improved performance and security, along with the performance implications and alternatives for various tasks, including finding prime numbers and generating sales statistics. Additionally, it highlights best practices for optimizing stored programs and provides performance comparisons to other programming languages.
Overview
● Programs as Database Schema Objects
● Executed in-process with the Database
● Types of Stored Routines:
  ● Procedures
  ● Functions
  ● Triggers
  ● Events (Temporal triggers; new in MySQL 5.1)
● Language:
  ● Subset of Standard SQL:2003 SQL/PSM
  ● Procedural, Block structured
● Do not confuse with User Defined Functions (UDF)!
● Stored Procedures & Functions
  ● Encapsulate tasks or calculations for reuse
  ● Single point of definition for business logic
  ● Source safely stored and backed up
  ● Added layer of security
● Triggers
  ● Data-driven
  ● Enforce data quality through basic validation
  ● Enforce complex business rules
  ● Automatically update aggregate tables
● Events (MySQL Server 5.1 beta)
  ● Schedule code execution in time
  ● Use instead of cron or the Windows task scheduler
  ● Automatically update aggregate tables
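As an illustration of the "Events" bullets, here is a minimal sketch of a scheduled event that rebuilds an aggregate table once an hour. The event name ev_refresh_sales_totals is hypothetical; the sales and customer_sales_totals tables are the ones used in the trigger slides later in this deck, and the column list is an assumption based on those slides. The event scheduler must be running (for example, SET GLOBAL event_scheduler = ON) for the event to fire.

```sql
DELIMITER $$

-- Hypothetical event: rebuild the per-customer totals every hour.
CREATE EVENT ev_refresh_sales_totals
  ON SCHEDULE EVERY 1 HOUR
  DO
  BEGIN
    -- Discard the old aggregate rows, then recompute them from sales.
    DELETE FROM customer_sales_totals;

    INSERT INTO customer_sales_totals (customer_id, sale_value)
      SELECT customer_id, SUM(sale_value)
        FROM sales
       GROUP BY customer_id;
  END$$

DELIMITER ;
```

A scheduled refresh like this trades freshness of the totals for cheaper inserts into sales, compared with maintaining the totals in a trigger.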
Advantages
● Performance
  ● Save network roundtrips, lower latency
● Portability and Reuse
  ● Single point of definition
  ● Reusable from many application contexts
● Security
  ● DEFINER versus INVOKER
  ● Grant only Execution Privilege
● Ease of Maintenance
  ● Code stored in the database
  ● Browse using information_schema database
● 'Headless' administrative tasks
  ● No additional runtime environment required
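The security bullets can be sketched as follows. This is an illustrative example, not taken from the original slides: the procedure name sp_customer_name, the schema name mydb, and the account 'web_user'@'localhost' are all hypothetical. The procedure runs with the privileges of its definer, so the web account needs only EXECUTE on the procedure and no direct SELECT privilege on the customers table.

```sql
DELIMITER $$

-- Runs with the definer's privileges (SQL SECURITY DEFINER is also the default).
CREATE PROCEDURE mydb.sp_customer_name
      (p_customer_id INT, OUT p_customer_name VARCHAR(30))
    SQL SECURITY DEFINER
    READS SQL DATA
BEGIN
    SELECT customer_name
      INTO p_customer_name
      FROM customers
     WHERE customer_id = p_customer_id;
END$$

DELIMITER ;

-- Assumes the account already exists; it gets no table-level privileges.
GRANT EXECUTE ON PROCEDURE mydb.sp_customer_name TO 'web_user'@'localhost';

-- The definitions can then be browsed through information_schema.
SELECT routine_name, routine_type, security_type
  FROM information_schema.routines
 WHERE routine_schema = 'mydb';
```

With SQL SECURITY INVOKER instead, the same call would succeed only if the calling account itself could read the customers table.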
Disadvantages
● Performance
  ● Overhead may result in higher latency
  ● Increased usage of database server computing power may negatively affect throughput
Stored program to find prime numbers
CREATE PROCEDURE sp_nprimes(p_num INT)
BEGIN
    DECLARE i INT;
    DECLARE j INT;
    DECLARE nprimes INT;
    DECLARE isprime INT;
    SET i=2;
    SET nprimes=0;
    main_loop: WHILE (i<p_num) DO
        SET isprime=1;
        SET j=2;
        divisor_loop: WHILE (j<i) DO
            IF (MOD(i,j)=0) THEN
                SET isprime=0;
                LEAVE divisor_loop;
            END IF;
            SET j=j+1;
        END WHILE;
        IF (isprime=1) THEN
            SET nprimes=nprimes+1;
        END IF;
        SET i=i+1;
    END WHILE;
    SELECT CONCAT(nprimes,' prime numbers less than ',p_num);
END;
Oracle implementation of the prime number procedure
PROCEDURE N_PRIMES (p_num NUMBER)
IS
    i INT;
    j INT;
    nprimes INT;
    isprime INT;
BEGIN
    i:=2;
    nprimes:=0;
    <<main_loop>>
    WHILE (i<p_num) LOOP
        isprime:=1;
        j:=2;
        <<divisor_loop>>
        WHILE (j<i) LOOP
            IF (MOD(i,j)=0) THEN
                isprime:=0;
                EXIT divisor_loop;
            END IF;
            j:=j+1;
        END LOOP;
        IF (isprime=1) THEN
            nprimes:=nprimes+1;
        END IF;
        i:=i+1;
    END LOOP;
    dbms_output.put_line(nprimes||' prime numbers less than '||p_num);
END;
The MySQL stored program language is relatively slow when it comes to performing arithmetic calculations. Avoid using stored programs to perform number crunching.
Feeling less enthusiastic about stored program performance?
Stored program to generate statistics
CREATE PROCEDURE sales_summary( )
    READS SQL DATA
BEGIN
    DECLARE SumSales FLOAT DEFAULT 0;
    DECLARE SumSquares FLOAT DEFAULT 0;
    DECLARE NValues INT DEFAULT 0;
    DECLARE SaleValue FLOAT DEFAULT 0;
    DECLARE Mean FLOAT;
    DECLARE StdDev FLOAT;
    DECLARE last_sale INT DEFAULT 0;
    DECLARE sale_csr CURSOR FOR
        SELECT sale_value FROM SALES s
         WHERE sale_date > date_sub(curdate( ), INTERVAL 6 MONTH);
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_sale=1;
    OPEN sale_csr;
    sale_loop: LOOP
        FETCH sale_csr INTO SaleValue;
        IF last_sale=1 THEN
            LEAVE sale_loop;
        END IF;
        SET NValues=NValues+1;
        SET SumSales=SumSales+SaleValue;
        SET SumSquares=SumSquares+POWER(SaleValue,2);
    END LOOP sale_loop;
    CLOSE sale_csr;
    SET StdDev = SQRT((SumSquares - (POWER(SumSales,2) / NValues)) / NValues);
    SET Mean = SumSales / NValues;
    SELECT CONCAT('Mean=',Mean,' StdDev=',StdDev);
END
Stored programs do not incur the network overhead of languages such as PHP or Java. If network overhead is an issue, then using a stored program can be an effective optimization.
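For comparison, the same mean and standard deviation can also be computed without any row-by-row code, using MySQL's built-in aggregate functions. This is a sketch that is not part of the original deck; it assumes the same sales table and uses STDDEV_POP because the formula in sales_summary divides by N, which is the population standard deviation.

```sql
-- Single set-based statement: only one row crosses the network.
SELECT CONCAT('Mean=',   AVG(sale_value),
              ' StdDev=', STDDEV_POP(sale_value)) AS sales_summary
  FROM sales
 WHERE sale_date > DATE_SUB(CURDATE(), INTERVAL 6 MONTH);
```

Like the stored program, this sends a single result row to the client, so it also avoids transferring every sale_value over the network.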
Avoid Self-Joins with Procedural Logic
Finding the maximum sale for each customer
SELECT s.customer_id, s.product_id, s.quantity, s.sale_value
  FROM sales s,
       (SELECT customer_id, max(sale_value) max_sale_value
          FROM sales
         GROUP BY customer_id) t
 WHERE t.customer_id=s.customer_id
   AND t.max_sale_value=s.sale_value
   AND s.sale_date>date_sub(curdate( ),interval 6 month);
To run this query, MySQL first has to build a temporary table holding each customer ID and its maximum sale value, and then join that back to the sales table to find the full details for each of those rows.
Stored program to return maximum sales for each customer over the last 6 months
CREATE PROCEDURE sp_max_sale_by_cust( )
    MODIFIES SQL DATA
BEGIN
    DECLARE last_sale INT DEFAULT 0;
    DECLARE l_last_customer_id INT DEFAULT -1;
    DECLARE l_customer_id INT;
    DECLARE l_product_id INT;
    DECLARE l_quantity INT;
    DECLARE l_sale_value DECIMAL(8,2);
    DECLARE counter INT DEFAULT 0;
    DECLARE sales_csr CURSOR FOR
        SELECT customer_id, product_id, quantity, sale_value
          FROM sales
         WHERE sale_date>date_sub(curdate( ),interval 6 month)
         ORDER BY customer_id, sale_value DESC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_sale=1;
    OPEN sales_csr;
    sales_loop: LOOP
        FETCH sales_csr INTO l_customer_id, l_product_id, l_quantity, l_sale_value;
        IF (last_sale=1) THEN
            LEAVE sales_loop;
        END IF;
        IF l_customer_id <> l_last_customer_id THEN
            /* This is a new customer so first row will be max sale */
            INSERT INTO max_sales_by_customer
                (customer_id, product_id, quantity, sale_value)
            VALUES (l_customer_id, l_product_id, l_quantity, l_sale_value);
        END IF;
        SET l_last_customer_id=l_customer_id;
    END LOOP;
    CLOSE sales_csr;
END
Because the cursor returns rows ordered by customer and descending sale value, the stored program can retrieve the data in a single pass through the sales table.
Optimize Correlated Updates
Correlated UPDATE statement
UPDATE customers c
   SET sales_rep_id =
       (SELECT manager_id
          FROM employees
         WHERE surname = c.contact_surname
           AND firstname = c.contact_firstname
           AND date_of_birth = c.date_of_birth)
 WHERE (contact_surname, contact_firstname, date_of_birth) IN
       (SELECT surname, firstname, date_of_birth
          FROM employees);
Here the employees table is accessed twice: once in the SET subquery and once in the WHERE subquery.
Stored program alternative to the correlated update
CREATE PROCEDURE sp_correlated_update( )
    MODIFIES SQL DATA
BEGIN
    DECLARE last_customer INT DEFAULT 0;
    DECLARE l_customer_id INT;
    DECLARE l_manager_id INT;
    DECLARE cust_csr CURSOR FOR
        SELECT c.customer_id, e.manager_id
          FROM customers c, employees e
         WHERE e.surname=c.contact_surname
           AND e.firstname=c.contact_firstname
           AND e.date_of_birth=c.date_of_birth;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET last_customer=1;
    OPEN cust_csr;
    cust_loop: LOOP
        FETCH cust_csr INTO l_customer_id, l_manager_id;
        IF (last_customer=1) THEN
            LEAVE cust_loop;
        END IF;
        UPDATE customers
           SET sales_rep_id=l_manager_id
         WHERE customer_id=l_customer_id;
    END LOOP;
    CLOSE cust_csr;
END;
Here the employees table is accessed only once, by the cursor query, and the cursor supplies the matching customer and manager IDs to the UPDATE.
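A further alternative, which the original slides do not cover, is MySQL's multi-table UPDATE syntax. The sketch below assumes the same customers and employees tables; like the cursor version, it reads employees once and updates only customers that have a matching employee.

```sql
-- Join-based update: one statement, no cursor, employees read once.
-- Assumption: at most one employee matches each customer; if several match,
-- which manager_id is applied is not defined.
UPDATE customers c
  JOIN employees e
    ON e.surname       = c.contact_surname
   AND e.firstname     = c.contact_firstname
   AND e.date_of_birth = c.date_of_birth
   SET c.sales_rep_id = e.manager_id;
```

Whether this beats the stored program depends on the data and indexes, so it is worth timing both on your own tables.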
Performance of a correlated update and stored program alternative
Optimizing Loops: Move Unnecessary Statements Out of a Loop
A poorly constructed loop
WHILE (i<=1000) DO
    SET j=1;
    WHILE (j<=5000) DO
        SET rooti=sqrt(i);
        SET rootj=sqrt(j);
        SET sumroot=sumroot+rooti+rootj;
        SET j=j+1;
    END WHILE;
    SET i=i+1;
END WHILE;
There are only 1000 different values of i, but because the square root of i is calculated inside the j loop, it is calculated 1000*5000 times, i.e. 5 million times.
Moving unnecessary calculations out of a loop
WHILE (i<=1000) DO
    SET rooti=sqrt(i);
    SET j=1;
    WHILE (j<=5000) DO
        SET rootj=sqrt(j);
        SET sumroot=sumroot+rootj+rooti;
        SET j=j+1;
    END WHILE;
    SET i=i+1;
END WHILE;
Ensure that all statements within a loop truly belong within the loop. Move any loop-invariant statements outside of the loop.
Use LEAVE or CONTINUE (ITERATE in MySQL) to Avoid Needless Processing
Loop that iterates unnecessarily
divisor_loop: WHILE (j<i) DO            /* Look for a divisor */
    IF (MOD(i,j)=0) THEN
        SET isprime=0;                  /* This number is not prime */
    END IF;
    SET j=j+1;
END WHILE;
Loop with a LEAVE statement to avoid unnecessary iterations
divisor_loop: WHILE (j<i) DO            /* Look for a divisor */
    IF (MOD(i,j)=0) THEN
        SET isprime=0;                  /* This number is not prime */
        LEAVE divisor_loop;             /* No need to keep checking */
    END IF;
    SET j=j+1;
END WHILE;
Modifying the WHILE condition to avoid unnecessary iterations
divisor_loop: WHILE (j<i AND isprime=1) DO    /* Look for a divisor */
    IF (MOD(i,j)=0) THEN
        SET isprime=0;                        /* This number is not prime */
    END IF;
    SET j=j+1;
END WHILE;
Effect of using LEAVE or modifying the WHILE clause to avoid unnecessary iterations
Make sure that your loops terminate when all of the work has been done, either by ensuring that the loop continuation expression is comprehensive or, if necessary, by using a LEAVE statement to terminate when appropriate.
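The slides above demonstrate LEAVE; the other half of the advice, skipping the remainder of a single iteration, is done in MySQL with ITERATE. The following minimal sketch is not from the original deck, and the procedure name sp_sum_odd is hypothetical. Note that the loop counter is advanced before ITERATE, otherwise the loop would never terminate.

```sql
DELIMITER $$

-- Hypothetical example: sum the odd numbers from 1 to p_max.
CREATE PROCEDURE sp_sum_odd(p_max INT)
BEGIN
    DECLARE i INT DEFAULT 0;
    DECLARE total INT DEFAULT 0;

    odd_loop: WHILE (i < p_max) DO
        SET i = i + 1;                 -- advance first so ITERATE cannot loop forever
        IF (MOD(i, 2) = 0) THEN
            ITERATE odd_loop;          -- even number: skip the rest of this pass
        END IF;
        SET total = total + i;
    END WHILE odd_loop;

    SELECT total;
END$$

DELIMITER ;

CALL sp_sum_odd(10);   -- 1 + 3 + 5 + 7 + 9 = 25
```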
IF and CASE Statements: Test for the Most Likely Conditions First
Poorly constructed IF statement
IF (percentage>95) THEN
    SET Above95=Above95+1;
ELSEIF (percentage>=90) THEN
    SET Range90to95=Range90to95+1;
ELSEIF (percentage>=75) THEN
    SET Range75to89=Range75to89+1;
ELSE
    SET LessThan75=LessThan75+1;
END IF;
Optimized IF statement
IF (percentage<75) THEN
    SET LessThan75=LessThan75+1;
ELSEIF (percentage>=75 AND percentage<90) THEN
    SET Range75to89=Range75to89+1;
ELSEIF (percentage>=90 AND percentage<=95) THEN
    SET Range90to95=Range90to95+1;
ELSE
    SET Above95=Above95+1;
END IF;
If an IF statement is to be executed repeatedly, placing the most commonly satisfied condition earlier in the IF structure may optimize performance.
Avoid Unnecessary Comparisons
IF statement with common condition in each expression
IF (employee_status='U' AND employee_salary>150000) THEN
    SET categoryA=categoryA+1;
ELSEIF (employee_status='U' AND employee_salary>100000) THEN
    SET categoryB=categoryB+1;
ELSEIF (employee_status='U' AND employee_salary<50000) THEN
    SET categoryC=categoryC+1;
ELSEIF (employee_status='U') THEN
    SET categoryD=categoryD+1;
END IF;
Nested IF statement to avoid redundant comparisons
IF (employee_status='U') THEN
    IF (employee_salary>150000) THEN
        SET categoryA=categoryA+1;
    ELSEIF (employee_salary>100000) THEN
        SET categoryB=categoryB+1;
    ELSEIF (employee_salary<50000) THEN
        SET categoryC=categoryC+1;
    ELSE
        SET categoryD=categoryD+1;
    END IF;
END IF;
If your IF or CASE statement contains compound expressions with redundant comparisons, consider nesting multiple IF or CASE statements to avoid redundant processing.
CASE Versus IF
CASE customer_code
    WHEN 1 THEN SET process_flag=7;
    WHEN 2 THEN SET process_flag=9;
    WHEN 3 THEN SET process_flag=2;
    ELSE SET process_flag=0;
END CASE;
IF customer_code=1 THEN
    SET process_flag=7;
ELSEIF customer_code=2 THEN
    SET process_flag=9;
ELSEIF customer_code=3 THEN
    SET process_flag=2;
ELSE
    SET process_flag=0;
END IF;
The IF statement is roughly 15% faster than the equivalent CASE statement; presumably this is the result of a more efficient internal algorithm for IF in the MySQL code.
Recursion
Recursive implementation of the Fibonacci algorithm
CREATE PROCEDURE rec_fib(n INT, OUT out_fib INT)
BEGIN
    DECLARE n_1 INT;
    DECLARE n_2 INT;
    IF (n=0) THEN
        SET out_fib=0;
    ELSEIF (n=1) THEN
        SET out_fib=1;
    ELSE
        CALL rec_fib(n-1,n_1);
        CALL rec_fib(n-2,n_2);
        SET out_fib=(n_1 + n_2);
    END IF;
END
Nonrecursive implementation of the Fibonacci sequence
CREATE PROCEDURE nonrec_fib(n INT, OUT out_fib INT)
BEGIN
    DECLARE m INT DEFAULT 0;
    DECLARE k INT DEFAULT 1;
    DECLARE i INT;
    DECLARE tmp INT;
    SET m=0;
    SET k=1;
    SET i=1;
    WHILE (i<=n) DO
        SET tmp=m+k;
        SET m=k;
        SET k=tmp;
        SET i=i+1;
    END WHILE;
    SET out_fib=m;
END
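To try the two procedures side by side, something like the following can be used. It is a usage sketch rather than part of the original slides, and the user variable names are arbitrary. Recursive procedure calls are disabled by default in MySQL, so the session variable max_sp_recursion_depth has to be raised before rec_fib can call itself; the value 20 is just an assumption large enough for this input.

```sql
-- Recursion in stored procedures is off by default (depth 0).
SET max_sp_recursion_depth = 20;

CALL rec_fib(10, @rec_result);        -- recursive version
CALL nonrec_fib(10, @nonrec_result);  -- iterative version

-- Both variables should hold the 10th Fibonacci number, 55.
SELECT @rec_result, @nonrec_result;
```

The need to configure a recursion depth at all is one more practical reason to prefer the iterative form.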
Cursors
Two equivalent stored programs, one using INTO and the other using a cursor
CREATE PROCEDURE using_into
      (p_customer_id INT, OUT p_customer_name VARCHAR(30))
    READS SQL DATA
BEGIN
    SELECT customer_name
      INTO p_customer_name
      FROM customers
     WHERE customer_id=p_customer_id;
END;
CREATE PROCEDURE using_cursor
      (p_customer_id INT, OUT p_customer_name VARCHAR(30))
    READS SQL DATA
BEGIN
    DECLARE cust_csr CURSOR FOR
        SELECT customer_name
          FROM customers
         WHERE customer_id=p_customer_id;
    OPEN cust_csr;
    FETCH cust_csr INTO p_customer_name;
    CLOSE cust_csr;
END;
Relative performance of INTO versus CURSOR fetch: over 11,000 executions, the INTO-based stored program was approximately 15% faster than the stored program that used an explicit cursor.
If you know that a SQL statement will return only one row, then a SELECT ... INTO statement will be slightly faster than declaring, opening, and fetching from a cursor.
Trigger Overhead
"Trivial" trigger
CREATE TRIGGER sales_bi_trg
    BEFORE INSERT ON sales
    FOR EACH ROW
    SET @x=NEW.sale_value;
When we implemented this trigger, the time taken to insert 100,000 sales rows increased by 45%.
A more complex trigger
CREATE TRIGGER sales_bi_trg
    BEFORE INSERT ON sales
    FOR EACH ROW
BEGIN
    DECLARE row_count INTEGER;
    SELECT COUNT(*)
      INTO row_count
      FROM customer_sales_totals
     WHERE customer_id=NEW.customer_id;
    IF row_count > 0 THEN
        UPDATE customer_sales_totals
           SET sale_value=sale_value+NEW.sale_value
         WHERE customer_id=NEW.customer_id;
    ELSE
        INSERT INTO customer_sales_totals (customer_id,sale_value)
        VALUES (NEW.customer_id,NEW.sale_value);
    END IF;
END
This trigger increased the execution time by a factor of 100.
Index to support our trigger
CREATE UNIQUE INDEX customer_sales_totals_cust_id
    ON customer_sales_totals(customer_id);
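Once a unique index on customer_id exists, the trigger body can also be reduced to a single statement. The variant below is a sketch that does not appear in the original deck: it relies on MySQL's INSERT ... ON DUPLICATE KEY UPDATE and on the unique index customer_sales_totals_cust_id to detect existing customers, so the separate SELECT COUNT(*) is no longer needed. No timing is claimed for it here.

```sql
DELIMITER $$

-- Replaces the earlier version of the trigger.
DROP TRIGGER IF EXISTS sales_bi_trg$$

CREATE TRIGGER sales_bi_trg
    BEFORE INSERT ON sales
    FOR EACH ROW
BEGIN
    -- Insert a new total, or add to the existing one when the
    -- unique index on customer_id reports a duplicate.
    INSERT INTO customer_sales_totals (customer_id, sale_value)
    VALUES (NEW.customer_id, NEW.sale_value)
    ON DUPLICATE KEY UPDATE sale_value = sale_value + NEW.sale_value;
END$$

DELIMITER ;
```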
The optimization of stored program code follows the same general principles that are true for other languages. In particular:
● Optimize loop processing: ensure that no unnecessary statements occur within a loop; exit the loop as soon as you are logically able to do so.
● Reduce the number of comparisons by testing for the most likely match first, and nest IF or CASE statements when necessary to eliminate unnecessary comparisons.
● Avoid recursive procedures.
Because MySQL triggers execute once for each row affected by a DML statement, the effect of any unoptimized statements in a trigger will be magnified during bulk DML operations. Trigger code needs to be very carefully optimized; expensive SQL statements have no place in triggers.
...and we are hiring ^_^ For more information, please feel free to drop a line to [email_address] or visit www.osscube.com
Editor's Notes
#48 Not only is the recursive algorithm less efficient for almost any given input value, it also degrades rapidly as the number of recursions increases (which is, in turn, dependent on which element of the Fibonacci sequence is requested). As well as being inherently a less efficient algorithm, each recursion requires MySQL to create the context for a new stored program (or function) invocation. As a result, recursive algorithms tend to waste memory as well as being slower than their iterative alternatives.