
80x faster | 600x more efficient

Accelerate Your Python Development Journey With PyKX
Author: Steve Wilcockson
Python: 80x faster | 600x more memory efficient
Python is an amazing language, and NumPy, SciPy and Pandas are outstanding
packages. But sometimes you need to run Python models and analytics faster with
more data, deploy them more efficiently, and move to production sooner.

PyKX can help; see the example below.
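The example itself appears as a screenshot in the original. As a minimal sketch of the idea, assuming PyKX is installed and importable, the embedded q engine can be driven directly from Python:

    import pykx as kx

    # Generate 100 million random floats inside the embedded q engine,
    # then view the result as a NumPy array (normally a zero-copy view).
    floats = kx.q('100000000?1f')
    arr = floats.np()
    print(arr[:5])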



In this whitepaper, we investigate several use cases and examples – from fast and big data running in research notebooks through to mission-critical models. We also explore streamlining the journey from data analytics research to production, referencing workflows ranging from simple aggregations to deep neural networks.

We examine use cases like those depicted above, where PyKX can enhance
Python’s efficiency by over 600x and speed by 80x. Additionally, we dive into
scenarios where the performance gains might be less significant in terms of
multiples but equally impactful.

Let’s start by exploring the very foundations of Python, NumPy and Pandas
alongside kdb from KX, which has a surprisingly similar background.

Python and kdb: Yin and Yang
Python has an interesting history. It was developed in the early ‘90s as the brainchild of Dutchman Guido van Rossum, subsequently dubbed “Benevolent Dictator for Life of the Python Programming Language”. Its name can be attributed to his love of the British comedy series Monty Python’s Flying Circus – hence the supporting IDEs Eric and IDLE along the way.

It was used originally as much for software testing and web scripting as for data
analysis, but its versatility and ease of use contributed to widespread adoption
among programmers, engineers and scientists. As a result, it has become the
lingua franca of data scientists for data analytics, machine learning and artificial
intelligence. Much of that success can be attributed to some of its key design
features:

Easy-to-read syntax: Python uses indentation to define code blocks, making it highly readable.
Dynamic typing: types are checked at runtime, allowing flexible and dynamic programming.
Interpreted nature: programs are executed by an interpreter, enabling quick development and testing.
Cross-platform compatibility: it's available on major operating systems like Windows, macOS, and Linux, ensuring portability of code.

Add to that an unsurpassed ecosystem of libraries powering modern data science and AI, supported by millions of experts across the world.

The q programming language in kdb has a similar vintage. It grew from the k programming language, based on APL, and shares similar design principles in terms of its dynamic typing, interpreted nature and cross-platform compatibility. While it may not match Python's readability, those well versed in it would argue that this is compensated by mathematical and data management simplicity, which comes from its columnar design, vector processing and efficient compression.

Kdb and q are optimal for time series and vector-native workloads in data science and AI. Their performance is proven in multiple independent benchmarks from the Securities Technology Analysis Center (STAC), and they are used widely by major banks and by innovators in automotive, healthcare, telecommunications, and manufacturing.

In the capital markets, specifically at hedge fund AQR Capital Management, Wes McKinney introduced Pandas (PANel Data & AnalysiS), which sits alongside NumPy (Numerical Python) and the SciPy (Scientific Python) stack. This brought foundational mathematical operations and data types to Python, focusing on usability and convenience rather than performance, and helping the development of data management and analytics applications not just in finance but across industries and academia. However, ease of use and convenience do not automatically mean production-level performance or the ability to service big and fast data use cases.

Whether your Pythonic data analysis use case is in fraud detection, capital
markets, predictive maintenance, healthcare, or retail analytics—regardless of
data set and model type—consider this: How capable are your Python apps really?
Can they take on more data, deliver absolute performance and efficiency, and
become production-worthy sooner?
PyKX: Combining Ease of Use, Data Management and Hyper-Efficient Analytics
By adding PyKX into Jupyter Notebooks, Python users can tap into the capabilities
of kdb to instantly get the best of all worlds. Python users can:

Query kdb-served data from Python via an easily implemented API, ideal for the
Python-using analyst and data scientist wanting clean, fast, big data.
Store, query, manipulate and use high-performance q objects within a Python
process.
Embed Python functionality, for example deep learning capabilities, within q
sessions (for the kdb/q developers).

In addition, PyKX provides a module interface that can easily load q scripts,
namespaces, and contexts. Once loaded, their production-worthy hyper-efficient
functions can be accessed as Python modules. Add to that support for ANSI SQL
querying and your data science becomes a lot more powerful.
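As a rough sketch of these interaction styles (the table, script and function names here are hypothetical, and the parameterized SQL call assumes PyKX's SQL interface is available):

    import pykx as kx

    # Hold a q table as a high-performance q object inside the Python process
    tab = kx.q('([] sym:`a`b`c; price:1.5 2.5 3.5)')

    # Query it with ANSI SQL; $1 is a positional parameter
    res = kx.q.sql('SELECT sym, price FROM $1 WHERE price > 2.0', tab)
    print(res.pd())

    # Load a q script and call its functions like Python modules
    # (analytics.q and .stats.vol are hypothetical names for illustration)
    # kx.q('\\l analytics.q')
    # vol = kx.q.stats.vol(tab['price'])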

With PyKX, then, Python developers can deploy their skills, re-use their libraries and
enjoy the high performance and resource efficiency of kdb.
Successful Users Agree
The benefits of combining the power of kdb with the familiarity of Python were
compellingly told by Erin Stanton, Data Scientist at Virtu Financial, at KXCON[23].
She explained how it enabled them to accelerate their machine learning adoption
by allowing Python users in research, model, and analytics functions to focus on
model selection and feature engineering rather than data management. She
further explained how their response times were accelerated, citing one 8-hour
SQL-based report executing in 5 minutes with Python supported by kdb. That was
attractive enough to provide the same valuable ML services directly to their
customers, distinguishing Virtu from its peers.

At the same event, a leading high frequency hedge fund discussed how PyKX
helped embed kdb functionality directly into its Python applications to deliver real-
time analytics and process over 1 trillion events per day. For the fund, the more
agile, convenient Python programming language could take on production
capabilities that previously might have been limited to C++.

That same combination of Python for data manipulation and kdb for number crunching was commended by Alex Donoghue of TD Securities, who explained how data engineers, quants and electronic traders could leverage their Python skills and libraries over kdb without learning the q language. However, he also commented that once customers were familiar with q, further performance and efficiency doors opened.
Using PyKX
PyKX is easy to install and use - just add PyKX to your Jupyter Notebook as shown
below. From there on, it's simply “Python as normal”.
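The installation step is shown as a screenshot in the original; a minimal equivalent is:

    # In a terminal, or prefixed with ! in a notebook cell:
    #   pip install pykx

    import pykx as kx
    print(kx.q('til 5'))   # evaluate q from Python: 0 1 2 3 4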

Well, not quite “as normal” - it’s all the familiarity of Python but powered by kdb –
for the best of both worlds. Actually, it’s the best of three worlds. Along with
Pythonic usability and readability, users receive fast data and high-performance
analytics to:
Test with more data.
Run faster code, including real-time microsecond applications.
Take code from research to production faster (kdb serves both production and
research environments).

The code below shows the time difference when generating 100 million floats
compared to normal NumPy. The change is clear - in q it took 232 milliseconds, in
NumPy it took 648.
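The code itself is shown as a screenshot; a sketch of the comparison, using IPython's %timeit in a notebook (timings will vary by machine; the 232 ms and 648 ms figures are from the original):

    import numpy as np
    import pykx as kx

    # 100 million random floats in q versus NumPy
    %timeit kx.q('100000000?1f')
    %timeit np.random.rand(100_000_000)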

With support for ANSI SQL, data engineers can access the benefits of kdb without
having to learn q as its querying language.
Once again, both easy and fast. That speed is illustrated below in running a SQL
query in PyKX (2.31 milliseconds) compared to Pandas (39.5 milliseconds).
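The query isn't reproduced here; a hypothetical comparison along the same lines (the table and query are illustrative, and the SQL call assumes PyKX's parameterized SQL interface):

    import pykx as kx

    # Illustrative table of one million rows
    trades = kx.q('([] sym:1000000?`AAPL`MSFT`GOOG; price:1000000?100f)')
    df = trades.pd()   # a Pandas copy for comparison

    # In a notebook, compare the two:
    # %timeit kx.q.sql('SELECT sym, AVG(price) FROM $1 GROUP BY sym', trades)
    # %timeit df.groupby('sym')['price'].mean()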

As memory space is shared between Python and q, data access is simplified and
highly efficient, normally zero copy. This is shown below in transferring one million q
floats to NumPy in what is effectively a constant time operation. Similarly, data
can transfer from NumPy to q in the same manner.
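A sketch of that round trip, assuming the standard PyKX conversion methods:

    import pykx as kx

    qvec = kx.q('1000000?1f')   # one million q floats
    arr = qvec.np()             # to NumPy; normally zero-copy, effectively constant time
    back = kx.toq(arr)          # and from NumPy back to q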

Developers can also interact with PyKX tables. As well as supporting ANSI SQL and
an API for qSQL, a Pandas API allows users to reference the metadata as they
would in Pandas, but also index into PyKX data with the same syntax. Moreover, it
tends to be faster, as illustrated below. The same query takes 158 milliseconds when
operating on a standard Pandas dataframe, but 46 milliseconds on a PyKX table.
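A small illustration of that Pandas-style interaction on a hypothetical PyKX table:

    import pykx as kx

    tab = kx.Table(data={'sym': ['a', 'b', 'c'], 'price': [1.5, 2.5, 3.5]})

    print(tab.dtypes)    # inspect metadata as in Pandas
    print(tab['price'])  # column access with the same syntax
    print(tab.head(2))   # familiar Pandas-style methods on a PyKX table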
Sample Use Cases
1. Transaction Cost Analysis

Transaction Cost Analysis (TCA) measures what actually occurred versus what was
expected, particularly in financial markets. However, similar examples can apply in
other contexts. For instance, how do my actual discounted grocery sales over the
day compare to my expectation, or is my machine in my manufacturing pod
processing the expected number of components per unit of time?

For those interested in the detail, TCA is normally both internally useful and a regulatory and compliance obligation, quantifying trade efficiency for financial institutions. Because transactions happen at high volumes and velocities in finance, kdb is commonly the production environment, while Python is often the algorithmic prototyping environment.

Consequently, bringing these two technologies closer together makes a lot of sense. Interoperability lowers the overheads of the production platform, making it more accessible and powerful for larger Python development and quant teams.

In our example use case, we first ingest datasets of quotes (9 million), market trades (900K) and broker trades (20K) into Pandas dataframes and PyKX tables. To quantify slippage, we then calculate the difference between execution price and a chosen benchmark, assessing the bid-ask spread, its volatility and market impact. For volatility specifically, an existing, highly efficient q function is invoked from within PyKX. See below.
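The whitepaper's volatility function isn't reproduced here; as a hypothetical stand-in, a q lambda computing a moving standard deviation (q's mdev) can be defined and called like a Python function:

    import pykx as kx

    # Hypothetical stand-in for the whitepaper's q volatility function
    vol = kx.q('{[w;p] w mdev p}')   # moving standard deviation over window w
    prices = kx.q('1000?100f')       # illustrative price series
    print(vol(20, prices).np()[:5])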

Native Python functions combine with SQL and kdb to derive the bid-ask spread. In
this case a moving average calculation is used.
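A sketch of that pattern on hypothetical quote data, using q's mavg for the moving average:

    import pykx as kx

    # Illustrative quotes where ask >= bid
    quotes = kx.q('{[n] b:n?100f; ([] bid:b; ask:b+n?1f)}', 100)
    # 10-point moving average of the bid-ask spread
    spread = kx.q('{10 mavg x[`ask]-x[`bid]}', quotes)
    print(spread.np()[:5])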

To quantify liquidity, a PyKX-enabled SQL query within the code calculates trading volumes over 10-minute intervals during the trading day.
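The SQL itself isn't shown here; an equivalent sketch in qSQL, bucketing a hypothetical trades table into 10-minute bars with xbar:

    import pykx as kx

    # Illustrative trades across a 6.5-hour trading day (times in milliseconds)
    trades = kx.q('([] time:09:30:00.000+1000?23400000; size:1000?100)')
    # Total volume per 10-minute bucket
    vols = kx.q('{select volume:sum size by bar:10 xbar time.minute from x}', trades)
    print(vols.pd())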

Results are then visualized in the notebook. The graphs on the left-hand side (below) show heightened volatility at the start and end of the day, and how, as volatility increases, so too does the spread. The right-hand side shows the corresponding increase in trading volume at market opening and close. For those familiar with the domain, that’s standard market behaviour, but important to understand.

Having run these calculations to understand the so-called slippage factor — the
difference between the executed price and expected price — and the times it may
become elevated, the 'asof' join function combines quotes and executions tables.
The merged table accurately informs the calculation and plotting of the slippage
factor over time and across venues.
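A minimal sketch of that asof join on hypothetical tables, using q's aj to match each execution with the prevailing quote at or before its timestamp:

    import pykx as kx

    quotes = kx.q('([] sym:`a`a`b; time:09:30:00.000 09:31:00.000 09:30:00.000; bid:1.0 1.1 2.0)')
    execs = kx.q('([] sym:`a`b; time:09:30:30.000 09:30:45.000; px:1.05 2.02)')
    merged = kx.q('{aj[`sym`time; x; y]}', execs, quotes)
    print(merged.pd())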

Venue 1 has higher slippage than the other two venues, a useful insight for the
trading desk and its management.

But most importantly, whether you care about slippage and trade execution or not,
the execution times and resource usage when processing the files and running the
calculations are transformed. The example below shows an 85x response
improvement with PyKX compared to Pandas (1150 versus 13.5 milliseconds).

Even more impressive, memory usage shows an almost 630x reduction.
Sample Use Cases
2. Anomaly Detection

Anomaly detection is key to many use cases, including fraud detection, criminal investigations, and cybersecurity. It’s particularly important for predictive maintenance in manufacturing, helping to identify unusual behavior that can signify maintenance requirements. With early intervention, downtime can be reduced or eliminated, and yields improved. Achieving this requires the ability to process streams of data from the machinery, normally sensors, and correlate those streams with expectations drawn from historical data using statistical and machine learning techniques. Once again, coupling Python with kdb to integrate the data and allow it to be interrogated makes sense. In the use case below, we deploy Python with the low/no-code kdb Insights Enterprise.

kdb Insights Enterprise provides an intuitive GUI to define, run and maintain data
pipelines from ingestion and transformation to storage and publication. The drag-
and-drop interface, shown below, outlines how data is captured, decoded and
transformed from an MQTT messaging service in JSON format and written to a kdb
database.

Once the data is prepared, it can be accessed by SQL for simple queries and analytics. In this instance, a query calculates 3-sigma upper and lower control limits, accounting for 99.7% of expected temperature fluctuations. Temperatures recorded outside that range could signify trouble and justify investigation. This is a prescriptive approach.
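The whitepaper runs this in kdb Insights Enterprise's SQL; an equivalent qSQL sketch over hypothetical sensor readings:

    import pykx as kx

    readings = kx.q('([] temp:1000?10f)')
    # 3-sigma control limits: mean plus/minus three standard deviations
    limits = kx.q('{select lower:avg[temp]-3*dev temp, upper:avg[temp]+3*dev temp from x}',
                  readings)
    print(limits.pd())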

Alternatively, or in addition, a deep neural network regression model could run a
data-driven approach to assess the likelihood of breakdowns, dynamically
accessing live data with trained model sets. The architecture is illustrated below.

With Python, functionality can be written to calculate time intervals between threshold breaches and moving averages, as below.
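The function shown in the original isn't reproduced; a hypothetical sketch of the two calculations, mixing Python for the breach intervals with q's mavg for the moving average:

    import numpy as np
    import pykx as kx

    def breach_intervals(times: np.ndarray, temps: np.ndarray, upper: float) -> np.ndarray:
        """Time gaps between successive readings that breach the upper limit."""
        return np.diff(times[temps > upper])

    moving_avg = kx.q('{20 mavg x}')   # 20-point moving average in q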
They can then be incorporated as a parallel process within the data pipeline:

From a machine learning model registry, an appropriate model can be selected:



This too can be added to the data pipeline and fed data from the newly defined
processes:

Operators can then compare the prediction of breaches in temperature levels that would warrant intervention against "real-world" data, as below, further fine-tuning the model to improve its accuracy for production deployment.

Conclusion
Whatever your use case, if large amounts of data or constant data streams are to be analyzed, whether in real-time streaming or frequent batch analyses, Python can draw upon the ultra-efficient kdb world to provide data or models. Whether you want to stay in Python, NumPy, and Pandas or explore the full performance of q, as TD Securities and others did in the use case examples, efficiencies can be gained. This allows larger data volumes and faster velocities to be processed, improving the capacity of analytics and helping take solutions to production faster.
Getting Started and More Information
The PyKX documentation site provides an overview of PyKX, a comprehensive user
guide, details on its APIs, and supporting examples of it in use.
A thorough reference card of key functionality in kdb's q language is also available to Python developers.

Two ways to get started


1. Click here for more introductory information on PyKX. To gain hands-on experience, you can easily install it from PyPI using a simple “pip install pykx” command - this helpful guide will kickstart your journey.

2. Visit the KX Academy page to access PyKX in a sandbox environment and learn how to store, query, manipulate and use kdb objects.

Either way, you are on the brink of being able to access the power of kdb from the
familiarity of Python. Enjoy!

