SQL Server 2017 is bringing the new capability
to deploy Python based machine learning application in enterprise production environment
which is already running in SQL Server. This additional capability allows us to
bring any open source Python packages in SQL server to execute within T-SQL based stored procedures or codes.
Architecture - Python in SQL Server
Microsoft introduced sp_execute_external_script
which is a system stored procedure to execute external scripts. We can submit R
or Python script to this stored procedure in the secure manner. SQL server runs
this query with the help of external process called Launchpad which launches
this query and this concept is known as runtime launcher. After executing, it’s
connected back to SQL Server on the SQL Server satellite data channel. It’s working
as additional post process.
Python integration
in SQL Server
gives us the following key advantages-
Eliminate data movement with in-database machine
learning means you are free to do machine learning on the Python based compute
on the large datasets and there is no need to move your data into another
place.
Deploy your machine learning model and Python
scripts right in T-SQL stored procedure with will talk more about in the breath.
In this case, any app that can talk to SQL server and invoke with T-SQL stored
procedures without being aware or knowing that Python package or Python model
is running in the background.
It is allowing the performance and scale
advantages also. Database analytics integration in SQL Server 2016 introduced
new communication channel to exchange the data from SQL server to scripts which
are running inside SQL Server which is like columnar based compression channel,
it’s much faster than the ODBC. So, Python also uses the same integration
because SQL Server also has the in-memory column store indexes and in-memory
tables which are very highly recommended for large scale data analytics.
In addition of this integration, SQL Server 2017
is also bring the something called revoscalepy which is equalent to revoscaleR which
are not bonded by the memory or scale like that if we are doing the liner
regression on the billion rows of dataset then it’s really doable without any
memory limiting which means enterprise level scale grade.
Key scenarios which are enables by Python integration in SQL Server
Data scientists are able to do data exploration
and data modeling development on their local machines as well as SQL Servers machine. This is
the main advantage of the revoscale Python package which makes the local or remote
compute more transparent and there is no need to pushing data round because computed results come back to the local work-stations from the remote SQL Server.
Application developers are capable to operationalize
the Python code because it’s running inside the stored procedures and they do not
worry to care about that which version of Python is running inside T-SQL code.
We can say that intelligence is truly with the data and application developers do not
worry all about it.
In additional key benefit is for database administrator
where admin can choose to say that this kind of logins cannot use 5-6% CPU. So,
Python code cannot use 5-6% CPU. A dba can also see the database logs and views
to verify that who is executing the Python code as the regular part of the
database management.
Conclusion
Native integration of Python in SQL server for
in-database machine learning and the high-performance and distributed
statistical and machine learning functions from the RevoScaleR, RevoScalePy and MicrosoftML packages in
Microsoft R will be available as Python functions for use within SQL Server.
Reference - Microsoft blogs