Showing posts with label indexes. Show all posts
Showing posts with label indexes. Show all posts

Thursday, March 31, 2016

SQL - Statistics in SQL Server

Statistics are very important object in the SQL Server which is used for query optimization to define the query execution plans. They contain statistical information about the distribution of values in one or more columns of a table. 
The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator. 
With the help of statistics information, the optimizer would know the selectivity for the data.
Whenever statistics are created against any table then Database Engine sorts the values of the columns on which the statistics are being built. The query optimizer computes a histogram on the column values in the first key column of the statistics object. A histogram measures the frequency of occurrence for each distinct value in a data set. Histogram has limitation of 200 steps, separated by intervals. 



What Are Statistics Objects?

Each statistics object is created on a list of one or more table columns which includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values.


In a very simple manner, we can understand that histogram is nothing but it is something which specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.
As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.

Multicolumn statisticsJust what they sound like, statistics over multiple columns.  These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics We can also put a filter on our statistics.  This is useful when we have a very large table but normally only query on a small subset of rows.  If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?

Filtered statistics are nothing but they play a very important role to improve query performance for SQL data queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. Well-designed filtered statistics can improve the query execution plan compared with full-table statistics.
SQL Server is responsible to maintain statistical information about key value distribution for all b-tree indexes so that it can more accurately estimate the number of qualifying rows. This information is used by the cost-based optimizer to generate the most efficient execution plan for the query, and aid in decisions about whether to use or not use an index for a particular query.


The most important thing is that Statistics do not have the b-tree structures to facilitate locating the data. However, they do provide cardinality statistics that the optimizer can use to develop more accurate row count estimates, resulting in more efficient query plans. Cardinality estimator is only interested in predicates. 

How to view Statistics within Data table?
DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view.  The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name:

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )
[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]
< option > :: =
STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM

Arguments
table_or_indexed_view_name: Name of the table or indexed view for which to display statistics information. Name of the table that contains the statistics to display. The table cannot be an external table.
Target: Name of the index, statistics, or column for which to display statistics information. Target is enclosed in brackets, single quotes, double quotes, or no quotes. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned
NO_INFOMSGS: Suppresses all informational messages that have severity levels from 0 through 10.
STATS_STREAM is Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.



NOTE: If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned. In SQL Data Warehouse and Parallel Data Warehouse, target cannot be a column name.
Example:

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]);

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]) WITH STATS_STREAM;









The Different way statistics being created
There are many ways to create statistics on the indexes on tables. Some of them are given below:
  1. The query optimizer creates statistics on key column for indexes on tables when the index is created.
  2. The query optimizer creates statistics for single columns in query predicates.
  3. Composite indexes creates Multi column statistics.
  4. sp_createstats stored procedure.
  5. Create Statistics Statement.
We will explain them one by one here.
References: https://msdn.microsoft.com/en-gb/library/ms174384.aspx

Wednesday, January 13, 2016

SQL - TempDB is Full

We already know that the tempdb database is a system database that contains temporary tables created by the server and by the user and always recreated each time SQL Server is restarted. Another points is that tempdb database consists of two parts: the data file(s) in the primary file group, and the tempdb log file(s).
Tempdb information
The tempdb database is a temporary workspace and SQL Server uses tempdb to perform many tasks which are given below:
  1. Storage of explicitly created temporary tables
  2. Worktables that hold intermediate results that are created during query processing and sorting
  3. Materialized static cursors
  4. Storage of version records when snapshot isolation levels or read-committed snapshot isolation levels are used
We know that temporary database is the one storing the snapshot of the tables before committing the commands so that on event of roll back, it can be restored. Consequently, it holds lot of space depending on the size of the table and the data to be processed.
In the SQL, we need to pull the data into temporary tables from the database to do more actions to meet the business requirements. These activities are the part of our day to day jobs which are the main root cause for the growing temporary database. Tempdb can grow very large on some systems if a lot of ‘dirty’ queries are run that sluff a lot of data into tempdb. We should be habitual to drop these temp tables after accomplish the tasks.
If our temporary database is full then we should need to look and carefully examine our functions, views, stored procedures and other data components. Test run each of them with analyzer to know how much does it take to query the data needed. If it returns on a longer time than expected, revise the codes to simplify.
How to Find out the correct size of tempdb?
To find the correct size of tempdb files after a shrink operation, execute the following statement in SQL Server Management Studio:
USE TEMPDB
GO

SELECT NAME, PHYSICAL_NAME, TYPE_DESC,
---- convert into KB
ACTUALSIZEKB=(SIZE*8)
FROM
SYS.DATABASE_FILES

NAME
PHYSICAL_NAME
TYPE_DESC
ACTUALSIZEKB
tempdev
d:datatempdb.mdf
ROWS
28070784
templog
d:datatemplog.ldf
LOG
9027840


Many programmers don’t consider the load of the database server when executing queries which is not a right approach. If you think that your tempDB is growing and taking a lots of the space of your disk then you should move your database to another location because-
  1. TempDB grows big and the existing drive does not have enough space.
  2. Moving TempDB to another file group which is on different physical drive helps to improve database disk read, as they can be read simultaneously.
Shrink tempdb
We can use to shrink the tempdb database in Microsoft SQL Server where tempdb resets to its configured size after each restart of the instance of SQL Server. A limitation of below method is that it operates only on the default tempdb logical files tempdev and templog. If more files are added to tempdb, you can shrink them after you restart SQL Server as a service. All tempdb files are re-created during start-up.
ALTER DATABASE tempdb MODIFY FILE
   (NAME = 'tempdev', SIZE = target_size_in_MB)
   --Desired target size for the data file

   ALTER DATABASE tempdb MODIFY FILE
   (NAME = 'templog', SIZE = target_size_in_MB)
   --Desired target size for the log file
       /*
       Note: The file size, max size cannot be greater than 2147483647 in units of a page size.
       The file growth cannot be greater than 2147483647 in units of both page size and percentage.
       */


Move TempDB from one drive to another drive
Before move the TempDB to another location, we should make sure that TempDB is set to autogrow and do not set a maximum size for TempDB. We need to check the file size of the tempdb becasue SQL Server locks tempdb while it is expanding and If the current drive is too full to allow autogrow events, then arrange a bigger drive, or add files to TempDB on another device. 
USE master
GO
---- change the location of tempdev file
ALTER DATABASE TempDB MODIFY FILE
(NAME = tempdev, FILENAME = 'd:datatempdb.mdf')
GO
---- change the location of templog file
ALTER DATABASE TempDB MODIFY FILE
(NAME = templog, FILENAME = 'd:datatemplog.ldf')
GO
The above commands are capable to change the physical location of the tempdb but no changes are made to TempDB till SQL Server restarts. So, we need to stop and restart SQL Server and after that it will create TempDB files in new locations.

Sunday, January 10, 2016

SQL - Remove non-numeric characters from a string

SQL - Remove expected characters from a string
In our day to day activities, we need to remove non-numeric, numeric or sometimes need to remove special characters from the string. We understand that there are many situations where you need to do a lot of string manipulation using T-SQL. The true fact is that many things which ideally should be done via SQLCLR. To handle this situation, we need an innovative solution which will be capable to return the expected result.
We are going to create a User defined SQL function which will be return the expected result based on the passed arguments. To meet the requirement, this function will perform the action based on the two parameters such as -
  1. First parameter will take the requested string to perform the action and,
  2. Another parameter will tell the action type to function.
There are lot of methods to get the expected result but we will use the ASCII value to get the expected result sets because every character position already defined in the ASCII.

/*
-- Removes special characters from a string value.
-- All characters except 0-9, a-z and A-Z are removed and
-- the remaining characters are returned.
@inpActionType = 1 : Remove Special Characters
@inpActionType = 2 : Remove all characters
@inpActionType = 3 : Remove integer values
*/
CREATE FUNCTION dbo.fun_GetString
(
----- Input String Argument on which action will be performed
@inpString VARCHAR(50),
----- Input Action Type Argument
@inpActionType int
)
---- Set return type
RETURNS VARCHAR(50)
  
BEGIN
   ---- check the nullables
   IF @inpString IS NULL
      RETURN NULL
   ----- declare local variable with default blank value
   DECLARE @outResult VARCHAR(50) = ''
   ----- declare local variable to get the length of the string
   DECLARE @intLength INT = 0, @intStartingPoint int=1
   ----- Set the length into the variable
   SET @intLength = LEN(@inpString)
  
   ---- check the length with the default starting point
   WHILE @intStartingPoint <= @intLength
  
   BEGIN
      ---- declare local variable to get the ASCII value
      DECLARE @ASCII INT
                    ---- Set the ASCII Value into the variable
      SET @ASCII = ASCII(SUBSTRING(@inpString, @intStartingPoint, 1))
                    ----- Remove Special Characters
                    IF @inpActionType = 1
                    BEGIN
      IF @ASCII BETWEEN 48 AND 57 OR @ASCII BETWEEN 65 AND 90 OR @ASCII BETWEEN 97 AND 122
         SET @outResult = @outResult + CHAR(@ASCII)
      END
                    ---- Remove all characters
                    ---- Contains only numeric ASCII Values
                    IF @inpActionType = 2
                    BEGIN
                    IF @ASCII BETWEEN 48 AND 57
         SET @outResult = @outResult + CHAR(@ASCII)
                    END
                    ---- Remove integer values
                    ---- Contains only character ASCII Values
                    IF @inpActionType = 3
                    BEGIN
                    IF @ASCII BETWEEN 65 AND 90 OR @ASCII BETWEEN 97 AND 122
         SET @outResult = @outResult + CHAR(@ASCII)
                    END
                    ---- set the next length position
      SET @intStartingPoint = @intStartingPoint + 1
      END
   ---- Return Null when output result string lenth is 0
   IF LEN(@outResult) = 0
      RETURN NULL
   ---- Return Output result
   RETURN @outResult
   END
   ----- Execute the function with arguments
   declare @inpRealString Varchar(50)='Ryan$010Arjun#123%India'
   select
   ----- Result without special characters
   NoSpecialChar=dbo.fun_GetString (@inpRealString,1),
   ----- Result for numeric only
   OnlyNumeric=dbo.fun_GetString (@inpRealString,2),
   ----- Result for characters only
   OnlyString=dbo.fun_GetString (@inpRealString,3)
NoSpecialChar
OnlyNumeric
OnlyString
Ryan010Arjun123India
10123
RyanArjunIndia

Conclusion
There’s SQL replace and pattern functions, but they only work once on a string. By using the above function, we can get the expected result based on  passed arguments. 
Learn more on another feature of SQL as:

  1. SQL - Stored Procedure
  2. SQL - Create Stored Procedure
  3. SQL - Execute Stored Procedure
  4. SQL - Alter Stored Procedure
  5. SQL - Views
  6. SQL – Stored Procedure Vs Ad-Hoc (In-Line)
  7. SQL - Merge Statement
  8. SQL - Functions
  9. SQL - Cursors
  10. SQL - Logical IIF() Function
  11. SQL - ISNULL Function
  12. SQL - CONCAT() function
  13. SQL - Stuff() Function

Thursday, December 31, 2015

SQL - EXP() Mathematical Function

SQL - EXP()
Exponential is a mathematical function which is almost always understood to mean the natural exponential function. The function EXP() returns the exponential value for the numeric expression. I have come across the EXP built in function in SQL Server and this function plays a very important role in data analytics calculations.
This function has one mandatory argument and it returns float type exponential value. In SQL, we have use the following syntax:
SQL Syntax: EXP(Argument)
Argument: Any numeric expression.
Return Type: float.
Note: If value is numeric and not equal to zero then it will return float type otherwise it will return 1 for the 0 value. if argument is null then return type should be null.
The exponential function is equal to ex. The value for the constant float number e is approximately 2.718281828 (natural logarithm).
----- declare local veriables
declare @inpfloat float=1.2,
@inpInt INT=1,
@inpDecimal Decimal=1.2
 
----- output parameters
select EXP(@inpfloat) as 'Float EXP()',
EXP(@inpINT) as 'Int EXP()',
EXP(@inpDecimal) as 'Decimal EXP()'
 
Float EXP()
Int EXP()
Decimal EXP()
3.320116923
2.718281828
2.718281828
---- set value 0 for veriable
SET @inpfloat=0
SET @inpInt=0
SET @inpDecimal=0
 
----- output parameters
select EXP(@inpfloat) as 'Float EXP()',
EXP(@inpINT) as 'Int EXP()',
EXP(@inpDecimal) as 'Decimal EXP()'
Float EXP()
Int EXP()
Decimal EXP()
1
1
1
We can view the difference on the above results. this is because 0 is to the power anything is equal to 1.
Conclusion
In simple words EXP() function is used to get the value of e ^ x because Exp expects a float and returns one, so the results from a decimal type include an implicit conversion to and from float and in turn this behaviour is due to inherent imprecision of FP numbers.