Microsoft Business Intelligence (Data Tools)|indexes

Showing posts with label indexes. Show all posts

Thursday, March 31, 2016

SQL - Statistics in SQL Server

Statistics are very important object in the SQL Server which is used for query optimization to define the query execution plans. They contain statistical information about the distribution of values in one or more columns of a table.

The optimiser does the smart work and always uses a cost based weighting to choose a best plan, and the statistics are used to estimate the number of rows that will be processed by a given operator, and hence the cost of that operator.
With the help of statistics information, the optimizer would know the selectivity for the data.

Whenever statistics are created against any table then Database Engine sorts the values of the columns on which the statistics are being built. The query optimizer computes a histogram on the column values in the first key column of the statistics object. A histogram measures the frequency of occurrence for each distinct value in a data set. Histogram has limitation of 200 steps, separated by intervals.

What Are Statistics Objects?

Each statistics object is created on a list of one or more table columns which includes a histogram displaying the distribution of values in the first column. Statistics objects on multiple columns also store statistical information about the correlation of values among the columns. These correlation statistics, or densities, are derived from the number of distinct rows of column values.

In a very simple manner, we can understand that histogram is nothing but it is something which specifies how many rows exactly match each interval value, how many rows fall within an interval, and a calculation of the density of values, or the incidence of duplicate values, within an interval.

As the data in a column changes, index and column statistics can become out-of-date and cause the query optimizer to make less-than-optimal decisions on how to process a query. However, the query optimizer still considers it to be a good candidate based on the index's outdated distribution statistics that are based on the data before the update.

Multicolumn statistics – Just what they sound like, statistics over multiple columns. These are created by default whenever we create a composite index. We can also create them ourself without an index using the CREATE STATISTICS command. This will be helpful us when we have queries that filter on multiple columns absence of indexes.
Filtered statistics – We can also put a filter on our statistics. This is useful when we have a very large table but normally only query on a small subset of rows. If these rows are highly selective, creating filtered statistics will help the optimizer choose the best path.
What are Filtered Statistics?

Filtered statistics are nothing but they play a very important role to improve query performance for SQL data queries that select from well-defined subsets of data. Filtered statistics use a filter predicate to select the subset of data that is included in the statistics. Well-designed filtered statistics can improve the query execution plan compared with full-table statistics.

SQL Server is responsible to maintain statistical information about key value distribution for all b-tree indexes so that it can more accurately estimate the number of qualifying rows. This information is used by the cost-based optimizer to generate the most efficient execution plan for the query, and aid in decisions about whether to use or not use an index for a particular query.

The most important thing is that Statistics do not have the b-tree structures to facilitate locating the data. However, they do provide cardinality statistics that the optimizer can use to develop more accurate row count estimates, resulting in more efficient query plans. Cardinality estimator is only interested in predicates.

How to view Statistics within Data table?

DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view. The syntax lets you specify a table or indexed view along with a target index name, statistics name, or column name:

DBCC SHOW_STATISTICS ( table_or_indexed_view_name , target )

[ WITH [ NO_INFOMSGS ] < option > [ , n ] ]

< option > :: =

STAT_HEADER | DENSITY_VECTOR | HISTOGRAM | STATS_STREAM

Arguments

table_or_indexed_view_name: Name of the table or indexed view for which to display statistics information. Name of the table that contains the statistics to display. The table cannot be an external table.

Target: Name of the index, statistics, or column for which to display statistics information. Target is enclosed in brackets, single quotes, double quotes, or no quotes. If target is a name of an existing index or statistics on a table or indexed view, the statistics information about this target is returned

NO_INFOMSGS: Suppresses all informational messages that have severity levels from 0 through 10.

STATS_STREAM is Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

NOTE: If target is the name of an existing column, and an automatically created statistics on this column exists, information about that auto-created statistic is returned. If an automatically created statistic does not exist for a column target, error message 2767 is returned. In SQL Data Warehouse and Parallel Data Warehouse, target cannot be a column name.

Example:

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]);

DBCC SHOW_STATISTICS('[dbo].[Employee]', [_WA_Sys_00000001_0EA330E9]) WITH STATS_STREAM;

The Different way statistics being created

There are many ways to create statistics on the indexes on tables. Some of them are given below:

The query optimizer creates statistics on key column for indexes on tables when the index is created.
The query optimizer creates statistics for single columns in query predicates.
Composite indexes creates Multi column statistics.
sp_createstats stored procedure.
Create Statistics Statement.

We will explain them one by one here.
References: https://msdn.microsoft.com/en-gb/library/ms174384.aspx

Mukesh Singh

With over 17 years of experience in the Data Engineering stack across a variety of cloud and on-premises systems, I have successfully delivered more than ten complete business product solutions. My expertise lies in building robust infrastructure and architecture to support data engineering, data analytics, and machine learning processes. These solutions have significantly improved collaboration among cross-functional teams, including data scientists, business analysts, software engineers, and stakeholders. Key Contributions Data Modelling and Integration • Data Modeling: Developed various data models to produce suitable data for business users, data analytics, data science, and data visualization teams. • Legacy Systems and Cloud Technologies: Integrated legacy systems with modern cloud-based technologies (AWS, Azure, GCP), data lakes, and data warehouses. • Streamlined Data Pipelines: Built efficient data pipelines, data warehouses, BI reports, and dashboards to streamline data access and insights.

Wednesday, January 13, 2016

SQL - TempDB is Full

We already know that the tempdb database is a system database that contains temporary tables created by the server and by the user and always recreated each time SQL Server is restarted. Another points is that tempdb database consists of two parts: the data file(s) in the primary file group, and the tempdb log file(s).
Tempdb information
The tempdb database is a temporary workspace and SQL Server uses tempdb to perform many tasks which are given below:

Storage of explicitly created temporary tables
Worktables that hold intermediate results that are created during query processing and sorting
Materialized static cursors
Storage of version records when snapshot isolation levels or read-committed snapshot isolation levels are used

We know that temporary database is the one storing the snapshot of the tables before committing the commands so that on event of roll back, it can be restored. Consequently, it holds lot of space depending on the size of the table and the data to be processed.

In the SQL, we need to pull the data into temporary tables from the database to do more actions to meet the business requirements. These activities are the part of our day to day jobs which are the main root cause for the growing temporary database. Tempdb can grow very large on some systems if a lot of ‘dirty’ queries are run that sluff a lot of data into tempdb. We should be habitual to drop these temp tables after accomplish the tasks.

If our temporary database is full then we should need to look and carefully examine our functions, views, stored procedures and other data components. Test run each of them with analyzer to know how much does it take to query the data needed. If it returns on a longer time than expected, revise the codes to simplify.

How to Find out the correct size of tempdb?

To find the correct size of tempdb files after a shrink operation, execute the following statement in SQL Server Management Studio:

USE TEMPDB

SELECT NAME, PHYSICAL_NAME, TYPE_DESC,

---- convert into KB

ACTUALSIZEKB=(SIZE*8)

FROM

SYS.DATABASE_FILES

NAME	PHYSICAL_NAME	TYPE_DESC	ACTUALSIZEKB
tempdev	d:datatempdb.mdf	ROWS	28070784
templog	d:datatemplog.ldf	LOG	9027840

Many programmers don’t consider the load of the database server when executing queries which is not a right approach. If you think that your tempDB is growing and taking a lots of the space of your disk then you should move your database to another location because-

TempDB grows big and the existing drive does not have enough space.
Moving TempDB to another file group which is on different physical drive helps to improve database disk read, as they can be read simultaneously.

Shrink tempdb

We can use to shrink the tempdb database in Microsoft SQL Server where tempdb resets to its configured size after each restart of the instance of SQL Server. A limitation of below method is that it operates only on the default tempdb logical files tempdev and templog. If more files are added to tempdb, you can shrink them after you restart SQL Server as a service. All tempdb files are re-created during start-up.

ALTER DATABASE tempdb MODIFY FILE

(NAME = 'tempdev', SIZE = target_size_in_MB)

--Desired target size for the data file

ALTER DATABASE tempdb MODIFY FILE

(NAME = 'templog', SIZE = target_size_in_MB)

--Desired target size for the log file

Note: The file size, max size cannot be greater than 2147483647 in units of a page size.

The file growth cannot be greater than 2147483647 in units of both page size and percentage.

Move TempDB from one drive to another drive

Before move the TempDB to another location, we should make sure that TempDB is set to autogrow and do not set a maximum size for TempDB. We need to check the file size of the tempdb becasue SQL Server locks tempdb while it is expanding and If the current drive is too full to allow autogrow events, then arrange a bigger drive, or add files to TempDB on another device.

USE master

---- change the location of tempdev file

ALTER DATABASE TempDB MODIFY FILE

(NAME = tempdev, FILENAME = 'd:datatempdb.mdf')

---- change the location of templog file

ALTER DATABASE TempDB MODIFY FILE

(NAME = templog, FILENAME = 'd:datatemplog.ldf')

The above commands are capable to change the physical location of the tempdb but no changes are made to TempDB till SQL Server restarts. So, we need to stop and restart SQL Server and after that it will create TempDB files in new locations.

Mukesh Singh

Sunday, January 10, 2016

SQL - Remove non-numeric characters from a string

SQL - Remove expected characters from a string

In our day to day activities, we need to remove non-numeric, numeric or sometimes need to remove special characters from the string. We understand that there are many situations where you need to do a lot of string manipulation using T-SQL. The true fact is that many things which ideally should be done via SQLCLR. To handle this situation, we need an innovative solution which will be capable to return the expected result.

We are going to create a User defined SQL function which will be return the expected result based on the passed arguments. To meet the requirement, this function will perform the action based on the two parameters such as -

First parameter will take the requested string to perform the action and,
Another parameter will tell the action type to function.

There are lot of methods to get the expected result but we will use the ASCII value to get the expected result sets because every character position already defined in the ASCII.

-- Removes special characters from a string value.

-- All characters except 0-9, a-z and A-Z are removed and

-- the remaining characters are returned.

@inpActionType = 1 : Remove Special Characters

@inpActionType = 2 : Remove all characters

@inpActionType = 3 : Remove integer values

CREATE FUNCTION dbo.fun_GetString

(

----- Input String Argument on which action will be performed

@inpString VARCHAR(50),

----- Input Action Type Argument

@inpActionType int

)

---- Set return type

RETURNS VARCHAR(50)

BEGIN

---- check the nullables

IF @inpString IS NULL

RETURN NULL

----- declare local variable with default blank value

DECLARE @outResult VARCHAR(50) = ''

----- declare local variable to get the length of the string

DECLARE @intLength INT = 0, @intStartingPoint int=1

----- Set the length into the variable

SET @intLength = LEN(@inpString)

---- check the length with the default starting point

WHILE @intStartingPoint <= @intLength

BEGIN

---- declare local variable to get the ASCII value

DECLARE @ASCII INT

---- Set the ASCII Value into the variable

SET @ASCII = ASCII(SUBSTRING(@inpString, @intStartingPoint, 1))

----- Remove Special Characters

IF @inpActionType = 1

BEGIN

IF @ASCII BETWEEN 48 AND 57 OR @ASCII BETWEEN 65 AND 90 OR @ASCII BETWEEN 97 AND 122

SET @outResult = @outResult + CHAR(@ASCII)

END

---- Remove all characters

---- Contains only numeric ASCII Values

IF @inpActionType = 2

BEGIN

IF @ASCII BETWEEN 48 AND 57

SET @outResult = @outResult + CHAR(@ASCII)

END

---- Remove integer values

---- Contains only character ASCII Values

IF @inpActionType = 3

BEGIN

IF @ASCII BETWEEN 65 AND 90 OR @ASCII BETWEEN 97 AND 122

SET @outResult = @outResult + CHAR(@ASCII)

END

---- set the next length position

SET @intStartingPoint = @intStartingPoint + 1

END

---- Return Null when output result string lenth is 0

IF LEN(@outResult) = 0

RETURN NULL

---- Return Output result

RETURN @outResult

END

----- Execute the function with arguments

declare @inpRealString Varchar(50)='Ryan$010Arjun#123%India'

select

----- Result without special characters

NoSpecialChar=dbo.fun_GetString (@inpRealString,1),

----- Result for numeric only

OnlyNumeric=dbo.fun_GetString (@inpRealString,2),

----- Result for characters only

OnlyString=dbo.fun_GetString (@inpRealString,3)

NoSpecialChar	OnlyNumeric	OnlyString
Ryan010Arjun123India	10123	RyanArjunIndia

Conclusion
There’s SQL replace and pattern functions, but they only work once on a string. By using the above function, we can get the expected result based on passed arguments.

Learn more on another feature of SQL as:

Mukesh Singh

Thursday, December 31, 2015

SQL - EXP() Mathematical Function

SQL - EXP()

Exponential is a mathematical function which is almost always understood to mean the natural exponential function. The function EXP() returns the exponential value for the numeric expression. I have come across the EXP built in function in SQL Server and this function plays a very important role in data analytics calculations.

This function has one mandatory argument and it returns float type exponential value. In SQL, we have use the following syntax:

SQL Syntax: EXP(Argument)

Argument: Any numeric expression.

Return Type: float.

Note: If value is numeric and not equal to zero then it will return float type otherwise it will return 1 for the 0 value. if argument is null then return type should be null.

The exponential function is equal to e^x. The value for the constant float number e is approximately 2.718281828 (natural logarithm).

----- declare local veriables

declare @inpfloat float=1.2,

@inpInt INT=1,

@inpDecimal Decimal=1.2

----- output parameters

select EXP(@inpfloat) as 'Float EXP()',

EXP(@inpINT) as 'Int EXP()',

EXP(@inpDecimal) as 'Decimal EXP()'

Float EXP()	Int EXP()	Decimal EXP()
3.320116923	2.718281828	2.718281828

---- set value 0 for veriable

SET @inpfloat=0

SET @inpInt=0

SET @inpDecimal=0

----- output parameters

select EXP(@inpfloat) as 'Float EXP()',

EXP(@inpINT) as 'Int EXP()',

EXP(@inpDecimal) as 'Decimal EXP()'

Float EXP()	Int EXP()	Decimal EXP()
1	1	1

We can view the difference on the above results. this is because 0 is to the power anything is equal to 1.

Conclusion

In simple words EXP() function is used to get the value of e ^ x because Exp expects a float and returns one, so the results from a decimal type include an implicit conversion to and from float and in turn this behaviour is due to inherent imprecision of FP numbers.

Mukesh Singh

Thursday, March 31, 2016

SQL - Statistics in SQL Server

Wednesday, January 13, 2016

SQL - TempDB is Full

Sunday, January 10, 2016

SQL - Remove non-numeric characters from a string

Thursday, December 31, 2015

SQL - EXP() Mathematical Function

Popular Posts