Applications of Artificial Intelligence To Cryptography
ARROW@TU Dublin
2020
Napo Mosola
University of KwaZulu-Natal
Recommended Citation
Blackledge, J. & Mosola, N. (2020) Applications of Artificial Intelligence to Cryptography, Transactions on Machine Learning & Artificial Intelligence, Vol. 8 No 3, 6th June 2020, pp. 21-60. doi:10.14738/tmlai.83.8219
This Article is brought to you for free and open access by the School of Electrical and Electronic Engineering at ARROW@TU Dublin.
Volume 8 No 3
ABSTRACT
This paper considers some recent advances in the field of Cryptography using Artificial Intelligence (AI). It
specifically considers the applications of Machine Learning (ML) and Evolutionary Computing (EC) to
analyze and encrypt data. A short overview is given on Artificial Neural Networks (ANNs) and the principles
of Deep Learning using Deep ANNs. In this context, the paper considers: (i) the implementation of EC and
ANNs for generating unique and unclonable ciphers; (ii) ML strategies for detecting the genuine
randomness (or otherwise) of finite binary strings for applications in Cryptanalysis. The aim of the paper
is to provide an overview on how AI can be applied for encrypting data and undertaking cryptanalysis of
such data and other data types in order to assess the cryptographic strength of an encryption algorithm,
e.g. to detect patterns of intercepted data streams that are signatures of encrypted data. This includes
some of the authors’ prior contributions to the field, which are referenced throughout. Applications are
presented which include the authentication of high-value documents such as bank notes with a
smartphone. This involves using the antenna of a smartphone to read (in the near field) a flexible radio
frequency tag that couples to an integrated circuit with a non-programmable coprocessor. The
coprocessor retains ultra-strong encrypted information generated using EC that can be decrypted on-line,
thereby validating the authenticity of the document through the Internet of Things with a smartphone.
The application of optical authentication methods using a smartphone and optical ciphers is also briefly
explored.
Keywords: Artificial Intelligence, Artificial Neural Networks, Evolutionary Computing, Machine Learning,
Cryptography, Cryptanalysis, Radio Frequency Identification, Optical Authentication.
1 Introduction
The term Artificial Intelligence (AI) covers a range of methodologies and applications that are designed to
enable a computer to undertake tasks that are conventionally the domain of human intelligence. AI is
‘the ability of a digital computer or a computer-controlled robot to perform tasks commonly associated
with intelligent beings’ [1]. The principal test of AI is predicated on the Turing Test, developed in 1950 by
Alan Turing. This is the test of a machine's ability to exhibit intelligent behavior equivalent to, or
indistinguishable from, that of a human [2]. The current applications of AI range from speech recognition, finance, avionics, navigation, gaming, robotics and medicine, for example. This paper has been composed
to give an overview of such applications focusing on cryptography, while referencing more technically
oriented papers that are relevant to the material. In particular, the paper focuses on the application of
Evolutionary Computing and Artificial Neural Networks for generating unique and unclonable non-linear
ciphers using real-world noise sources.
then the inputs to the 4 nodes of the hidden layer are (from top to bottom in Figure 1) given by the elements of the vector:

$$(w_{11}x_1 + w_{12}x_2 + w_{13}x_3,\; w_{21}x_1 + w_{22}x_2 + w_{23}x_3,\; w_{31}x_1 + w_{32}x_2 + w_{33}x_3,\; w_{41}x_1 + w_{42}x_2 + w_{43}x_3)$$

where $w_{ij}$, $i = 1, 2, 3, 4$; $j = 1, 2, 3$ are the 12 weights required to modify the 3 inputs from the input layer as they are fed into the 4 nodes of the hidden layer (indicated graphically by the arrows given in Figure 1).
From the description above, and with reference to Figure 1, it is clear that there are four issues that need to be considered in the design of any such network of inputs and outputs: (i) what should the size of the feature vector be, thereby specifying the number of nodes associated with the input layer?; (ii) how many outputs are required in the output layer?; (iii) how many hidden layers should be used, and how many nodes should each layer contain?; (iv) how can we automate the process by which the weights are initialized
and/or then adjusted? In regard to point (iv), it is necessary to design an algorithm for automation of the
adjustment of the weights subject to knowledge of the expected output(s) in the output layer. This
requires training data to be provided so that the weights can be updated by comparing the outputs they
provide with ‘target’ data.
Figure 1. Example of a simple neural network consisting of an input layer with three inputs or nodes (i.e. a three-component feature vector), a single 4-node hidden layer, and an output layer consisting of two output nodes [3].
There are a number of different, but closely related, methods (algorithms) to train an ANN by adjusting the weights until the output of the network for a given input matches the training data (within a specified tolerance). The original, and arguably the most common, approach is to apply the back-propagation algorithm [4] to feed-forward networks. This is based on the Gradient Descent Method, an iterative optimization algorithm for finding a local minimum of a differentiable function. By computing the gradient of a loss function with respect to all the weights (taken to be a discrete vector) and iterating the process, the weights can be adjusted to generate an output that is close to the training data. The algorithm is analogous to the application of a least squares method for solving a system of linear equations (especially over-determined systems) subject to minimizing an error function, and is an example of dynamic programming. The chain rule is used to compute the gradient of the loss function with respect to each weight, and a threshold (activation) function is applied which determines whether or not the value from one node should ‘flow’ further (i.e. continue on to the next node(s) in the next layer).
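As a hedged illustration (a minimal NumPy sketch, not the authors' implementation), back-propagation by gradient descent can be demonstrated on the Figure 1 topology of 3 inputs, one 4-node hidden layer and 2 outputs; the sigmoid activation, learning rate and toy targets below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Topology of Figure 1: 3 inputs, one 4-node hidden layer, 2 outputs.
W1 = rng.normal(scale=0.5, size=(4, 3))   # the 12 hidden-layer weights w_ij
W2 = rng.normal(scale=0.5, size=(2, 4))   # hidden-to-output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    H = sigmoid(X @ W1.T)                 # hidden-layer activations
    return H, sigmoid(H @ W2.T)           # network outputs

# Toy training data: inputs X and 'target' outputs T.
X = rng.random((8, 3))
T = np.stack([X.sum(axis=1) / 3.0, X[:, 0]], axis=1)

_, Y = forward(X, W1, W2)
loss0 = float(np.mean((Y - T) ** 2))      # error before training

lr = 0.2
for _ in range(5000):
    H, Y = forward(X, W1, W2)
    # Back-propagation: the chain rule applied to the squared-error loss.
    dY = (Y - T) * Y * (1 - Y)            # delta at the output layer
    dH = (dY @ W2) * H * (1 - H)          # delta propagated back to the hidden layer
    W2 -= lr * dY.T @ H / len(X)          # gradient descent updates
    W1 -= lr * dH.T @ X / len(X)

_, Y = forward(X, W1, W2)
loss = float(np.mean((Y - T) ** 2))
print(f"MSE before training: {loss0:.4f}, after: {loss:.4f}")
```

The weights are updated by comparing the network outputs with the ‘target’ data, exactly as described above; the error decreases as the iterations proceed.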
The principal issue that is driving the deep learning approach is the increase in computing power and
efficiency. For example, Google’s Inception-V3 is based on using a network consisting of 49 layers with
more than 20 million interconnections [13]. But this is of little value unless there is a sufficient amount of
training data to compute the millions of weights now required. However, with the increase in
computational power coupled with the exponential growth in the Internet of Things (IoT), it is becoming
much easier to obtain online access to the training data required. In this sense, deep ANNs are possible because of the growth in the Big Data Society that is now prevalent, as well as the computational performance available. The growth in computing power, which is allowing a deep learning paradigm to evolve, is primarily industry driven by companies such as Facebook and Google, and by national security services where issues such as face recognition and track-and-trace, for example, are primary concerns. This is possible because supercomputers from some ten years ago, consisting of thousands of processor units and accommodated in buildings with a large floor space, can now be condensed into the single Graphics Processing Units (GPUs) required for high-resolution gaming, for example.
Figure 2. Examples of a single-layer ‘shallow’ neural network (left) with a 5-element feature vector and a single hidden layer, and a deep neural network (right) consisting of 4 hidden layers, both networks consisting of a 4-node output layer [12].
Irrespective of the computational power available, ANNs require real world data to operate in real world
environments. Although this necessity is catered for by the development of access to big data, in some
cases, certain data may not be available or is missing from existing data fields. In this case, ANNs can also be of value by training them to create the missing data using other related (real-world) data that is available. Thus, not only is AI benefiting from the big data society, but it can also be active in the development, growth and diversity of the big data society (by creating even more data).
Coupled with the training data, the design of an ANN can vary considerably (i.e. the number of layers, the
nodes per layer and homogeneity or otherwise of the connections provided etc.) as can the details of the
algorithms designed to compute the weights. This includes diffusing the values of the weights over a layer
and/or a partition through the layers to a specific depth by applying various spatially invariant filters using
the (discrete) convolution operation. These Convolutional Neural Networks (CNNs) were first introduced in the early 1980s. They are a class of deep learning ANNs developed for, and applied mainly in, the fields of digital image processing and computer vision [14]. In this context, they are replacing many digital image processing algorithms originally developed for pattern recognition using (among other techniques) convolutional filters to segment images into regions of similarity and/or discontinuity (edge detection, for example) [15]. This approach has been, and continues to be, used to develop a set of deterministic, geometric
with both data encryption and cryptanalysis. In the latter case, this can be extended to problems
associated with information hiding when it is required to detect the signature of a plaintext hidden in
another plaintext (Steganalysis) and the signature of a ciphertext hidden in a plaintext
(Steganocryptanalysis). In the following section, we briefly review the role of ANNs in Cyber Security.
write the integer plaintext and cipher vectors as binary strings (using ASCII, for example). In binary space, the ciphertext is then given by $\mathbf{c} = \mathbf{x} \oplus \mathbf{p}$, where $\oplus$ denotes the binary exclusive OR (XOR) operator, the decrypt being given by $\mathbf{p} = \mathbf{x} \oplus \mathbf{c}$. In this case, the vector notation used to denote the binary space data $\mathbf{c}$, $\mathbf{x}$ and $\mathbf{p}$ denotes binary strings that are of a finite length $L > N$ where typically $L \gg N$ for a standard plaintext message.
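As a minimal sketch of this XOR round trip (here the cipher stream x is generated with Python's `secrets` module purely as a stand-in; in the schemes discussed in this paper it would come from a chaotic or evolved iterator):

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Element-wise XOR of two equal-length byte strings."""
    return bytes(u ^ v for u, v in zip(a, b))

p = b"a standard plaintext message"
x = secrets.token_bytes(len(p))   # stand-in for the cipher stream x
c = xor_bytes(x, p)               # encrypt: c = x XOR p
d = xor_bytes(x, c)               # decrypt: p = x XOR c
print(d == p)
```

Because XOR is its own inverse, the same operation both encrypts and decrypts.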
Whatever the method of encryption that is implemented, a principal issue is how to design an algorithm
or a class of algorithms that output a cipher with properties that are consistent with strong encryption.
These properties include ensuring that 𝒙𝒙 is statistically unbiased so that the histogram of the cipher is
uniformly distributed and equally so, has a power spectral density function that is uniform. Most cipher
generating algorithms are based on iterations in which the initial value is the key. They produce pseudo
random number streams for which certain critical conditions are required to be met. These conditions
include: (i) ensuring that the algorithm generates random numbers that are equally and uniformly distributed irrespective of the key that is used; (ii) ensuring that, given that the cycle length of any finite state computation is itself finite, the characteristic cycle length of the iteration is longer than the length of the plaintexts that are to be used, thereby avoiding patterns in the random number stream that are correlated cyclically.
There are numerous cipher-generating algorithms based on the design of Pseudo Random Number
Generators (PRNGs) that have been developed. They tend to fall into three classes which generate: (i)
decimal integer random number streams; (ii) floating point random number streams; (iii) random binary
streams. The traditional focus has been on the computation of integer streams because of the
computational efficiency associated with integer arithmetic. This has typically involved the coupling of
modular arithmetic with prime numbers which is why prime number-based cryptography has evolved in
the way that it has. By way of an example, the Blum Blum Shub (BBS) cipher [22] is a PRNG first proposed in 1986 that is based on the iteration (for $k = 0, 1, 2, \dots, N-1$)

$$x_{k+1} = x_k^2 \bmod M, \quad M = pq \quad (2)$$

where $p$ and $q$ are prime numbers and $x_0$ is the key (the initial condition or ‘seed’), which is co-prime to $M$, meaning that $p$ and $q$ are not factors of $x_0$, and $x_0$ is not 1 or 0. This algorithm operates on, and outputs, a string of pseudo-random values that are decimal integers.
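A minimal sketch of Equation (2) (the primes here are deliberately tiny and chosen for illustration only; a real deployment would use large Blum primes with p, q ≡ 3 mod 4):

```python
# Illustrative (insecure) parameters; real BBS uses large Blum primes.
p, q = 11, 23
M = p * q
x0 = 3            # seed: coprime to M, and neither 0 nor 1

def bbs(x0, M, n):
    """Generate n values of the BBS iteration x_{k+1} = x_k^2 mod M."""
    xs, x = [], x0
    for _ in range(n):
        x = (x * x) % M
        xs.append(x)
    return xs

stream = bbs(x0, M, 8)
bits = [x & 1 for x in stream]    # in practice only the low-order bit(s) are used
print(stream, bits)
```

Extracting only the least-significant bit of each value is the usual way the BBS integer stream is converted into a binary cipher stream.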
The design of cipher generating algorithms that output decimal integers evolved in parallel with the
limited computing power available primarily, the limited processing time associated with floating point
array processing. With the more recent development of fast floating-point processors and co-processors
and specialist real-time digital signal processors, floating-point cipher generation has been able to exploit
the study of chaos to produce a new class of chaotic iterators that are based on non-linear iterations and
require high precision floating-point arithmetic to be performed.
An example of such an iterator is the Verhulst cipher given by

$$x_{k+1} = 4rx_k(1 - x_k), \quad r \in (0,1) \quad (3)$$

which has a certain synergy with the BBS PRNG given that both are quadratic iterators. In this case, the initial condition $x_0$ is a floating-point number between 0 and 1. However, this iteration only provides full chaos when $r = 1$, which is prohibitive given that $x_k$ converges and then bifurcates multiple times as $r$ increases towards 1.
Thus, for $k = 1$, it is possible to invert this equation to obtain the key $x_0$, meaning that the key can be obtained from knowledge of the first iteration (or higher order iterations), which is clearly not acceptable in the design of a cipher generating algorithm. This result has a synergy with Equation (2) in so much as this equation can be written in the form

$$x_k = \left( x_0^{2^k \bmod C(M)} \right) \bmod M$$

where $C$ is the Carmichael function [25]. Thus, in order to use an iteration of the type given by Equation (4), the nonlinear function must be proved to be a one-way function and not to have an equivalent analytical solution that is invertible.
2.1.2 Equal Probability of States
Whatever nonlinear function is applied, modified or invented, the distribution of $\mathbf{x}$ is rarely uniform and the vector must be post-processed to generate such a distribution. This can be done by windowing all values of the vector that conform to a uniform distribution. However, this wastes data and the processing time required to generate it, given that many computed floating-point values may have to be discarded.
This is an important issue because in assuming the existence of one-way functions, there can exist
probability distributions, which are not uniform and are not even statistically close to a uniform
distribution, but are nevertheless, computationally indistinguishable from real-world chaos [26]. Hence,
checking for equal probability of the states is fundamental. One way of enforcing this requirement is to
implement a partitioning strategy to output a random binary string as illustrated in Figure 3, which shows the histogram for Equation (3) and is uniformly distributed for $x_k \in [0.3, 0.7]$ (approximately). A binary string can therefore be created by outputting a ‘0’ when $x_k \in [0.3, 0.5]$ and a ‘1’ when $x_k \in (0.5, 0.7]$, or vice versa. In this way, a so-called maximum entropy bit stream cipher is obtained in which the encryption process is undertaken using a standard XOR operation.
Figure 3. Histogram of the output for a Verhulst process given by Equation (3) and the partitioning strategy used to generate a maximum entropy binary cipher [24].
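The partitioning strategy of Figure 3 can be sketched as follows (the thresholds 0.3, 0.5 and 0.7 are those quoted in the text; the seed and sample count are illustrative choices):

```python
def verhulst(x0, r=1.0, n=100000, burn=100):
    """Iterate Equation (3), x_{k+1} = 4 r x_k (1 - x_k), discarding a burn-in."""
    x, out = x0, []
    for k in range(n + burn):
        x = 4 * r * x * (1 - x)
        if k >= burn:
            out.append(x)
    return out

xs = verhulst(0.615)                      # r = 1 for full chaos
# Keep only samples in [0.3, 0.7] and threshold at 0.5, as in Figure 3.
bits = [0 if x <= 0.5 else 1 for x in xs if 0.3 <= x <= 0.7]
ones = sum(bits) / len(bits)
print(f"kept {len(bits)} of {len(xs)} samples; fraction of 1s: {ones:.3f}")
```

The fraction of 1s comes out close to 0.5, which is the maximum-entropy property the partitioning is designed to enforce; the cost is that the majority of the computed samples are discarded.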
Further examples of such ciphers include the functions or maps given in Table 1 which specify the form of
the function given in Equation (4). These are examples of stream ciphers obtained by inputting a key
$x_0 \in (0,1)$, optimizing their output (through adjusting the value of $r$) to produce a chaotic number-stream and post-processing this output to generate a stream that is uniformly distributed [27].
Table 1. Examples of chaotic maps for the generation of stream ciphers.
Not all invented or otherwise maps are suitable from a probabilistic point of view even though they may exhibit chaos. For example, the map $f(x) = r|1 - \tan[\sin(x)]|$, $r = 1.5$, has a highly non-uniform distribution over all probability states and can therefore not be conditioned to produce a maximum entropy cipher using the partitioning strategy illustrated in Figure 3, for example.
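Whether a candidate map is usably chaotic can also be checked numerically via the Lyapunov exponent discussed below. A sketch for the Verhulst map, using the common definition λ = (1/n) Σ ln|f′(xₖ)| (conventions for the logarithm base and the chaos threshold vary between authors):

```python
import math

def lyapunov_verhulst(x0, n=100000, burn=100):
    """Estimate the Lyapunov exponent of f(x) = 4x(1 - x), with f'(x) = 4 - 8x."""
    x, s = x0, 0.0
    for k in range(n + burn):
        if k >= burn:
            s += math.log(abs(4 - 8 * x))   # accumulate ln|f'(x_k)|
        x = 4 * x * (1 - x)
    return s / n

lam = lyapunov_verhulst(0.615)
print(f"estimated Lyapunov exponent: {lam:.3f}")   # theory gives ln 2 for full chaos
```

A strictly positive estimate indicates exponential divergence of nearby orbits, i.e. sensitivity to the initial condition (the key).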
We require that λ > 1 for all $x_0 \in (0,1)$. A high, but strictly positive, value of the Lyapunov exponent is preferable because the iteration function it characterizes will then generate chaotic trajectories within a few iterations. Thus, ideally, what we require is a nonlinear function for which λ ≫ 1 for all $x_0 \in (0,1)$.
Many aspects of EC are stochastic, but the starting point of the candidate solutions can be either deterministic or stochastic. In EC, genetic algorithms provide methods for modelling biological systems that are linked to the theory of dynamical systems, since both are used to predict the future states of a system.
Within the field of EC itself, a software system called ‘Eureqa’ was one of the first such systems to be
developed by the Cornell Creative Machine Laboratory (Cornell University, USA) and then commercialized
by Nutonian Inc. [35]. The underlying principle is to use genetic programming to generate dynamic
equations, each of which provides an increasingly better ‘fitness function’ to model a given dataset. If this
dataset is complex, such as a noise field, the system iteratively generates a sequence of non-linear
functions to approximate the input signals [36]. Thus, Eureqa is a modeling engine predicated on machine
learning, using evolutionary searches to determine an equation that best represents a given data set.
Eureqa claims to ‘automate the heavy lifting inherent in analytics and data science’ [37]. The system
automatically discovers analytical models requiring almost no human intervention and randomly creates
equations via sequences of mathematical building blocks based on a combination of common functions.
It is therefore suitable for generating non-linear functions by seeding the system with data from real world
noise sources. The functions can then be iterated as required, to produce a key seeded pseudo random
number sequence.
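Eureqa's genetic programming engine is proprietary, but the underlying idea of searching for a function built from common mathematical building blocks that best fits a noise vector can be conveyed with a toy random search over small expression trees (a sketch only; a real EC system evolves candidates via crossover, mutation and fitness-driven selection, and the target here is synthetic rather than a physical noise source):

```python
import math, random

random.seed(1)

# Toy stand-in for an EC engine: random search over small expression trees
# built from common building blocks (sin, cos, tanh, +, *, constants, x).
UNARY = [math.sin, math.cos, math.tanh]

def rand_expr(depth=0):
    """Return a random function f: [0,1] -> R as a closure."""
    r = random.random()
    if depth >= 3 or r < 0.3:                       # leaf: identity or constant
        c = random.uniform(-2, 2)
        return (lambda x: x) if r < 0.15 else (lambda x, c=c: c)
    if r < 0.6:                                     # unary node: g(a * f(x))
        g, a = random.choice(UNARY), random.uniform(-5, 5)
        f = rand_expr(depth + 1)
        return lambda x, g=g, a=a, f=f: g(a * f(x))
    f1, f2 = rand_expr(depth + 1), rand_expr(depth + 1)
    op = random.choice([lambda u, v: u + v, lambda u, v: u * v])
    return lambda x, f1=f1, f2=f2, op=op: op(f1(x), f2(x))

# 'Noise' target: in a real system this would come from a physical source.
xs = [i / 63 for i in range(64)]
target = [random.random() for _ in xs]

def mse(f):
    return sum((f(x) - t) ** 2 for x, t in zip(xs, target)) / len(xs)

best = min((rand_expr() for _ in range(3000)), key=mse)
print(f"best-fit MSE after 3000 random candidates: {mse(best):.4f}")
```

The winning closure plays the role of the evolved map: it is a composition of elementary functions selected purely for its fit to the (noise) data, which is the essence of the approach described above.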
Evolutionary Computing is associated with the field of Computational Intelligence, and like AI, involves
the process of continuous optimization. AI aims, through iterative processes, to compute a set of optimal
weights that determine the flow of information (the amplitude of a signal at a given node) through a
network that simulates a simple output subject to a complex input. In this sense, an ANN simulates a high
entropy input with the aim of transforming the result into a low entropy output. However, this process
can be reversed to generate a high entropy output from a low entropy input. In this sense, ANNs can be
used to generate ciphers by simulating natural noise once it has been trained to do so.
Diagram 1. Schematic of the processes for evolving a stream cipher using EC.
Diagram 2. Schematic of the processes for generating a stream cipher using an ANN.
semiconductor junction. This can be provided in the form of an external USB interface manufactured and
supplied by Araneus Information Systems in Finland, for example. Their Alea II is a compact true random
number generator, also known as a hardware random number generator, non-deterministic random bit
generator, or entropy source [42].
2.4.2 Evolved Cipher Generation Examples
Given that this paper is a publication composed for the Transactions of a Society for Science and
Education, the authors have chosen to present examples based on previous and current teaching activities
associated with the Information and Communications Security course (COS 792) given as part of the
Honors program in Computer Science in the Department of Computer Science at the University of Western
Cape (UWC), South Africa. One of the course work assignments is as follows: Read the paper
‘Cryptography using Evolutionary Computing’ by Blackledge et al. 2013 [38]. Using the information and
online references discussed in this paper, develop another example of the cipher given in Section V of this
paper using the same approach and online facilities.
Figure 4 shows a screen shot of Eureqa which generated the map given by the following non-linear
function:
$$f(x) = 0.7529 - 0.4697\sin\big(0.7529 + \sin(\sin(\sin(-4.334 \times 10^5 \cos(x))))\big) - 0.569\,x\cos(x)\arccos(x)\sin\big(4.059x^2\sin(\sin(-4.331 \times 10^5 \cos(x)) - 2.212 \times 10^5 \cos(x))\big)$$
Figure 4. Screen shot of Eureqa evolving a nonlinear equation to approximate an input vector consisting of
random data obtained from Random.org.
obtained by one of the students of the 2018 class [43]. The figure shows the outputs of the system which
includes the list of equations obtained post 100 iterations, the solution chosen, its goodness of fit, the
correlation coefficient, the maximum error and the mean squared error. The plots given in Figure 4 are
of the normalized solution fit to an input vector consisting of 138 random data elements (top-right) and
the solution's accuracy (Error) plotted against the complexity (lower-right) as the iterations evolve
different equations (of greater complexity but with a lower error). The selected equation had a 50%
Stability and 76.2% Maturity and used 95% of the training data obtained from Random.org [41].
By repeating the process and using Eureqa to generate functions that approximate different random vectors (obtained using different noise sources or different noise fields from the same source), it is possible to generate an unlimited number of key-dependent stream ciphers by iterating the function obtained.
2.5 Discussion
While the output maps evolved by Eureqa should be routinely tested against the NIST CAVP (since the
maps are essentially an approximation of real-world noise), it can be expected that they will conform to
the basic tests for randomness. Thus, the distribution will, by default, be uniform and there is no need to apply a partitioning strategy as illustrated in Figure 3, for example, to generate a maximum entropy cipher. In principle, any evolutionary computed cipher can be iterated according to Equation (4), so that it can be used a number of times with a different key $x_0 \in (0,1)$.
Another approach is to use an EC cipher once and once only, in which case iteration is not required. This approach overcomes the issue of ensuring that the cipher is structurally stable so that the iterator has (almost) the same cycle and Lyapunov exponents for all initial conditions (given that most of the known pseudo-chaotic systems do not possess this property). However, this assumes that a large database of such ciphers has been created a priori and/or can be created in real time as and when required. But since each cipher requires a real-world noise sequence to generate a map, it is arguable that instead of using EC to evolve a map, the original data might be used for encryption instead, which then obviates the need for EC in the first place. On the other hand, since either the map or the data from which it has been evolved must be exchanged between Alice and Bob, the exchange of a map will be preferable given that maps such as those given in Table 2 require fewer bits to transmit over an unsecured network using a known key exchange algorithm. In either case, a key exchange algorithm is necessary to exchange the map (once)
and then the keys used to generate different initial conditions for each plaintext, or the map alone. In the
latter case, the map becomes equivalent to the key for a known algorithm.
In addition to providing a cipher, the method can also be used to generate encryption keys. This has been
considered in [44] which uses EC to generate keys for encrypting cloud-bound data with Advanced
Encryption Standard (AES-128, 192 and 256), the Data Encryption Standard (DES and 3DES) and a novel
cryptosystem called ‘Cryptor’. One of the problems faced in cryptography is key management. Key management in this case is addressed by using an ANN to learn patterns of the encryption key. Once learnt, the key is discarded to thwart an attack on it, and is later reconstructed by the ANN for the purpose of decryption.
2.5.1 Issues on Practical Implementation
An important issue is that EC generated maps can take a significant amount of CPU time to evolve on a
standard personal computer, for example, and once generated, may require a large number of high
precision floating point operations to compute. In general, the longer EC is left to evolve a function, the
more complex the output map becomes. However, this issue is essentially a hardware development
concern, and, like the rise of deep learning using complex and computationally intensive ANNs (as
discussed in Section 1.3), provides a way forward to generating an unlimited number of unique and
unclonable ciphers without the maps having to be ‘designed by hand’. This makes available a solution to
encrypting data using one-time pads where an evolutionary computed cipher (and the key used to set the
initial condition as required) is used once and once only, after which it is discarded. In this context, a
known algorithm attack becomes a legacy of the past and conventional PRNGs such as those listed in [45]
may eventually become of historical interest alone.
The purpose of allowing an EC system such as Eureqa to iterate for what may be many hours on a
conventional CPU, is to evolve a cipher that provides a best approximation to the input noise. However,
for applications in cryptography, this may not be required as the output obtained after just a few iterations
may provide a ‘solution’ of the type specified by Equation (4) that generates suitable chaos, thereby
significantly reducing the evolutionary computational time required. Further, such maps can be implemented on a complementary basis in which a database of maps (constructed a priori) is accessed so that different maps can be randomly selected and used to encrypt different blocks of data over a randomized window of plaintext – multi-algorithmic cryptography.
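A hedged sketch of this multi-algorithmic idea (the two maps, the block size, the per-block seed derivation and the map-selection rule are all illustrative assumptions, not the authors' scheme):

```python
import hashlib, math

# A small 'database' of chaotic maps, constructed a priori (illustrative choices).
MAPS = [
    lambda x: 4 * x * (1 - x),              # Verhulst / logistic map
    lambda x: math.sin(math.pi * x),        # sine map
]

def keystream(seed: float, map_id: int, n: int) -> bytes:
    """Iterate the selected map and threshold its orbit into key bytes."""
    f, x = MAPS[map_id], seed
    for _ in range(100):                    # burn-in
        x = f(x)
    out = bytearray()
    for _ in range(n):
        b = 0
        for _ in range(8):
            x = f(x)
            b = (b << 1) | (1 if x > 0.5 else 0)
        out.append(b)
    return bytes(out)

def encrypt(data: bytes, key: float, block: int = 16) -> bytes:
    out = bytearray()
    for i in range(0, len(data), block):
        chunk = data[i:i + block]
        # Pseudo-randomly select a map and a per-block seed from key and index.
        h = hashlib.sha256(f"{key}:{i}".encode()).digest()
        sel = h[0] % len(MAPS)
        seed = (key + (h[1] + 1) / 257) % 1.0
        ks = keystream(seed, sel, len(chunk))
        out += bytes(p ^ k for p, k in zip(chunk, ks))
    return bytes(out)

msg = b"multi-algorithmic encryption over randomized blocks of plaintext"
ct = encrypt(msg, 0.723)
print(ct != msg, encrypt(ct, 0.723) == msg)
```

Because the keystream is XORed with each block, running the same routine over the ciphertext with the same key recovers the plaintext; an attacker must now identify not just the key but which algorithm was used for each block.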
Practical cryptography is based on passing known statistical tests, available at [40], for example, which are designed to ensure the pseudo-random property of a generator. Pseudo-random sequences are used instead of truly random sequences in most cryptographic applications. This is because a symmetric crypto-system can then focus on the application of a key to initiate an iterative formula, a formula that is known to both Alice and Bob and preferably in the public domain. We have considered a method of designing
algorithms for generating pseudo random (chaotic) sequences using truly random strings to evolve an
iterator that is taken to be an approximation to these sequences. This approach pays no attention to the algorithmic complexity of the iterator, which is one of the main problems in the application of chaos to cryptography; neither does it consider the structural stability of the iterator.
However, it does provide a practical solution to the problem of developing a large database of PRNGs in
the application of personalizing encryption algorithms for strictly 1-to-1 communications or ‘1-to-Cloud’
then the spectrum of the plaintext is given by $P = X^*C/|X|^2$, where $X^*$ denotes the complex conjugate. In practice, this approach requires application of the Fast Fourier Transform. Also, in practice, the condition that $|X|^2 > 0$ can be satisfied by letting $|X|^2 = 1$ if $|X|^2 = 0$, where the ‘0’ is taken to be a floating-point number whose value is below the floating-point accuracy available on a given processor. This method has been successfully applied for encrypted
information hiding (specifically, image Steganocryptography) – see [49], [50] and [51], for example. One of the more useful properties that convolution-based encryption provides is that it generates a ciphertext that is data redundant. This facilitates a decrypt subject to a ciphertext with relatively significant data error, compared to an XOR process, which requires bit error correction algorithms to be applied if the ciphertext incurs errors due to transmission noise, for example.
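A minimal NumPy sketch of convolution-based encryption, assuming the scheme encrypts by multiplying the plaintext and cipher spectra (circular convolution in data space) and decrypts by a regularized spectral division of the kind indicated above (the signal length, noise source and guard threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

n = 256
p = rng.random(n)                  # plaintext (as floats, for illustration)
x = rng.random(n)                  # cipher stream from a noise source

P, X = np.fft.fft(p), np.fft.fft(x)
C = X * P                          # encrypt: circular convolution in data space
# Decrypt by dividing the spectra, guarding against |X|^2 ~ 0 as in the text.
Xs = np.where(np.abs(X) ** 2 < 1e-300, 1.0, X)
p_rec = np.real(np.fft.ifft(C / Xs))

err = float(np.max(np.abs(p - p_rec)))
print(f"max reconstruction error: {err:.2e}")
```

The FFT makes both the convolution (encrypt) and deconvolution (decrypt) inexpensive, and the guard on $|X|^2$ prevents division-by-zero failures at spectral nulls.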
Another method of encryption is to apply a phase-only process [52], [53]. In this case, the ciphertext is obtained using the equation $c_k = p_k e^{ix_k}$, the decrypt then being given by $p_k = c_k e^{-ix_k}$. This is a process that can be applied in any dimension and in data or frequency space. In data space, however, the ciphertext is complex and therefore increases the data by a factor of 2. On the other hand, the plaintext throughput can be doubled in size by using both real and imaginary components of a complex vector $\mathbf{p}$.
It is of course possible to intercept the ciphertext, compute |𝑐𝑐𝑘𝑘 | and then attempt to retrieve the phase
from this data. However, the phase retrieval problem is severely ill-posed with no uniformly stable
solutions. Moreover, the practicality of implementing phase retrieval algorithms is dependent on the
dimension. It is well known that for the two-dimensional case, phase estimation algorithms have been
developed to provide approximate solutions. These solutions tend to rely on the data (whose amplitude
spectrum is known) being sparse. However, for the one-dimensional case, the phase retrieval problem is
ambiguous: the determination of the phase within the extensive solution set is challenging and can only be considered under suitable a priori assumptions or with additional information. This is a consequence of the Fundamental Theorem of Algebra, which states that every single-variable polynomial with complex coefficients has at least one complex root (and hence factors completely into linear factors). This theorem fails for polynomials of two variables. The inability to factor polynomials of two variables is the reason why two-dimensional phase retrieval is possible, while the ability to factor single-variable polynomials is what prevents the one-dimensional phase retrieval problem from being solved uniquely.
This is because the ability to factor polynomials generates ambiguities where multiple phases correspond
to the same data. Thus, any attack associated with attempting to solve the one-dimensional phase
retrieval problem will no doubt continue to remain a significant challenge for a cryptanalyst.
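As a minimal sketch of the phase-only process (in Python; the function names are ours, and a seeded PRNG stands in for the phase key stream x_k, which is an assumption for illustration), the encrypt/decrypt pair and the fact that an interceptor only observes |c_k| = |p_k| can be demonstrated as follows:

```python
import cmath
import math
import random

def phase_key(length, seed):
    # Hypothetical key stream: uniformly distributed phases from a seeded PRNG
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(length)]

def phase_only_encrypt(p, seed):
    # c_k = p_k * exp(i * x_k): the ciphertext is complex-valued
    x = phase_key(len(p), seed)
    return [pk * cmath.exp(1j * xk) for pk, xk in zip(p, x)]

def phase_only_decrypt(c, seed):
    # p_k = c_k * exp(-i * x_k): exact inversion given the phase key
    x = phase_key(len(c), seed)
    return [ck * cmath.exp(-1j * xk) for ck, xk in zip(c, x)]
```

Note that |c_k| = |p_k|, which is precisely why an attack must solve a phase retrieval problem: the amplitude carries no information about x_k.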
2.5.3 Natively Binary Chaos
The methodology discussed in Section 2.3 produces ciphers that are based on floating point iterators. In
some cases, the output of such iterators can be converted into binary streams either through necessity
(such as with the segmentation scheme illustrated in Figure 3) or through a preference to encrypt in binary
space (when developing real-time systems, for example). In the latter case, for example, any algorithm
that provides a positive-only floating-point cipher x ∈ [x_min, x_max], say, can be post-processed to
generate a cipher x ∈ [0, 1] using the equation
x := (x − x_min) / ||x − x_min||_∞
The binary string is then obtained by applying a round transformation which rounds each element of x to
the nearest integer (in this case, 0 or 1). It is therefore natural to ask whether it is possible to design
algorithms that output binary strings directly without the need for conversion from decimal integer form,
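The normalization and rounding described above can be sketched as follows (a minimal illustration; the function name is ours):

```python
def to_binary_stream(x):
    # Map a positive-only floating-point cipher onto [0, 1] by subtracting
    # x_min and dividing by the infinity norm, then round each element to 0 or 1
    x_min = min(x)
    shifted = [v - x_min for v in x]
    norm = max(shifted)  # ||x - x_min||_inf for a non-negative vector
    if norm == 0.0:
        raise ValueError("a constant cipher cannot be normalized")
    return [int(round(v / norm)) for v in shifted]
```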
URL: http://dx.doi.org/10.14738/tmlai.83.8219 40
Transactions on Machine Learning and Artificial Intelligence Volume 8, Issue 3, June 2020
S = −∑_{n=1}^{N} P_n log P_n
The higher the entropy of a signal, the greater its ambiguity, and, in this context, information
entropy is a measure of the unpredictability or randomness of any message contained in the signal. This
is typically determined by the noise that distorts the information contained in a signal. In general, the
information entropy associated with the transmission of information in a signal tends to increase with
time. This is due to the increase in noise with time that distorts any signal as it propagates in time, the
sources of this noise being multi-faceted but tending to a Gaussian distribution as a consequence of the
Central Limit Theorem.
Instead of considering a digital signal composed of decimal integer or floating-point elements, let us now
consider the signal to be a binary string whose elements can take on only two values, namely 0 or 1, which
are mutually exclusive. In this case, the signal will have a binary distribution P_n, n = 1, 2, consisting of just
two bins, and the Binary Information Entropy (BIE) function, denoted by H, becomes
H = −∑_{n=1}^{2} P_n log_2 P_n
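A direct computation of H from a bit string can be sketched as follows (under the usual convention that 0·log 0 = 0; the function name is ours):

```python
import math

def binary_entropy(bits):
    # Two-bin histogram of a finite binary string: P_1 (zeros) and P_2 (ones)
    n = len(bits)
    p1 = sum(bits) / n
    histogram = (1.0 - p1, p1)
    # H = -P_1 log2 P_1 - P_2 log2 P_2, with 0 log 0 taken as 0
    return -sum(p * math.log2(p) for p in histogram if p > 0.0)
```

H is maximal (1 bit) for an equi-distributed string and zero for a constant string.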
There have been a number of algorithms designed to compute various entropy-based metrics for the
determination or measurement of the randomness, regularity, irregularity, order, disorder and entropy
for binary and other strings [59], [60]. These include measures that aim to characterize randomness and
disorder through the entropy of finite strings such as the Approximate Entropy [61], [62], Sample Entropy
[63] and Fuzzy Entropy [64]. Such metrics are based on algorithms that can be classified as follows: (i)
moving window methods that examine sub-strings of the original; (ii) algorithms which generate a metric
based on the entire string length. Applications include basic randomness tests [65], [66] and cryptanalysis
[67]. This includes the development of a BiEntropy measure which is based upon a weighted average of
the Shannon entropy for all but the last binary derivatives [68].
These metrics are the product of many studies that have been and continue to be undertaken to develop
suitable tests and measures based on the computation of a single metric. While desirable
computationally, focusing on the use of a single metric for this purpose is restrictive and may be
statistically insignificant because of its self-selecting data predication. For this reason, we consider a
complementary approach to the problem which is based on the application of the Kullback-Leibler
Divergence for a stream of data that yields a statistically significant result as opposed to a single metric.
This provides the foundations for the application of a machine learning approach.
R = −∑_{n=1}^{2} P_n log_2 (Q_n / P_n)
where P_n is the binary histogram of a binary string f_k and Q_n is the binary histogram of some reference
binary string g_k, both binary strings being finite and of the same length.
Suppose the string f_k is mostly ordered or regular and not disordered (e.g. a binary string representation
of some text from a natural language) and g_k is a random string. We require the metric R to be
significantly different in numerical value from the case when both f_k and g_k are random binary
strings. Ideally, what is required is to establish a threshold for the value of R below which f_k can be
classified as ordered, say, and above which f_k can be classified as random. This is an example of a binary
decision-making process that may not be statistically significant and may not necessarily be able to
‘monitor’ a transition from f_k being random to non-random. Thus, suppose we consider the relative
entropy digital signal given by
R_m = −∑_{n=1}^{2} P_{nm} log_2 (Q_{nm} / P_{nm}),  m = 1, 2, …, M    (8)
where P_{nm} is the mth binary histogram of the input binary string with P_{n1} = P_{n2} = ⋯ = P_{nM}, and Q_{nm} is
the mth binary histogram of an mth random binary string obtained using a PRNG or obtained from a
real-world random binary string source. To illustrate the characteristics of this relative entropy signal, we
consider the following cases (which are referred to as such in regard to presenting the results that follow):
• For Case (i), when f_k is a non-random string, R_m has a Gaussian-type distribution.
• For Case (ii), when both strings are random, R_m also has a Gaussian-type distribution.
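Equation (8) can be prototyped as below, a sketch in which the M reference strings are drawn from Python's built-in PRNG (an assumption for illustration; a real-world random source could equally be used) and the function name is ours:

```python
import math
import random

def relative_entropy_signal(f, M, seed=0):
    # R_m = -sum_n P_nm log2(Q_nm / P_nm), m = 1, ..., M  (Equation (8)),
    # where P is the fixed two-bin histogram of the input string f and
    # Q_m is the histogram of the m-th PRNG-generated reference string
    n = len(f)
    p1 = sum(f) / n
    P = (1.0 - p1, p1)
    rng = random.Random(seed)
    R = []
    for _ in range(M):
        g = [rng.randint(0, 1) for _ in range(n)]
        q1 = sum(g) / n
        Q = (1.0 - q1, q1)
        R.append(-sum(pn * math.log2(qn / pn)
                      for pn, qn in zip(P, Q) if pn > 0.0 and qn > 0.0))
    return R
```

For a heavily biased (non-random) input string, the mean of R_m exceeds that of a balanced random string by well over two orders of magnitude, consistent with the demarcation discussed below.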
In both cases, application of the Jarque-Bera test for normality shows that the null hypothesis must be
rejected, i.e. the series R_m does not actually conform to a normal distribution even though it appears to
have the characteristics of one (as given in Figure 5). Nevertheless, Figure 5 demonstrates the ability of
Equation (8) to provide a statistically significant measure in regard to evaluating the randomness or
otherwise of a finite binary string. The difference in the mean R̄ of the relative entropy signal for the two
cases is at least two orders of magnitude and can therefore easily be used to differentiate between a
random and a non-random binary string. But this is just one of many statistical metrics that may be
computed from R_m by computing its J-bin histogram H_j, j = 1, 2, …, J and evaluating the rth central
moment (the moments about the mean R̄) given by
μ_r = ∑_{j=1}^{J} (x_j − R̄)^r H_j, where x_j denotes the centre of the jth bin,
providing the variance (r = 2), the skewness (r = 3) and the kurtosis (r = 4), for example, coupled with
metrics such as the median and the mode given by ||H_j||_∞.
Given the demarcation between a random and non-random binary string using Equation (8), the potential
exists to compute further statistical metrics and other parameters based on an analysis of the signatures
given in Figure 5. These may include the higher statistical moments and spectral properties of the digital
signal R_m, for example, designed to develop a feature vector whose purpose is to provide a multi-class
classification using an ANN starting with a simple four component feature vector consisting of the Mean,
Standard Deviation, Median and Mode.
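The four-component feature vector could, for instance, be assembled as follows (a sketch only; the mode is estimated as the centre of the most populated bin of a coarse histogram, i.e. via ||H_j||_∞, and the function name is ours):

```python
import statistics

def feature_vector(R, bins=100):
    # [mean, standard deviation, median, mode] of the relative entropy signal R_m
    lo, hi = min(R), max(R)
    width = (hi - lo) / bins or 1.0  # guard against a constant signal
    idx = [min(int((r - lo) / width), bins - 1) for r in R]
    mode_bin = max(set(idx), key=idx.count)  # bin attaining ||H_j||_inf
    mode = lo + (mode_bin + 0.5) * width
    return [statistics.mean(R), statistics.pstdev(R), statistics.median(R), mode]
```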
Since all data can be expressed as a binary string, irrespective of the code used to do so (i.e. ASCII or
otherwise), the approach embodied in Equation (8) provides a relatively generic method of
differentiating between random and non-random binary strings. Unlike other approaches to solving this
problem, this approach is based on generating a signal computed using Equation (8) and characterizing
the distribution of the signal rather than computing a single metric for a binary string. This allows some
common statistical metrics to be used to classify changes to the distribution of the signal R_m and how
these changes can form the basis for a machine learning approach using complementary statistical
parameters for the signal, coupled with metrics associated with the spectrum of the signal and other
transformations. The purpose of this is to provide an analysis of a binary stream that can specify whether
it is truly random, intelligible (i.e. the product of some communicable information, for example) or
partially random in some way. This provides the ability to continuously monitor binary streams to test
for plaintext (intelligible) or encrypted (non-intelligible and random) data.
Figure 5. Plots of R_m (left) given by Equation (8) for M = 8000 and the associated 100-bin histograms
(right) for Case (i) (above) and Case (ii) (below), respectively.
Figure 6. Schematic illustration of the application to a bank note of a flexible IC with a unique ID number
(encryption cipher/key) maintained on a non-programmable coprocessor with a ROM.
Figure 7. Example of the IoT return for a banknote with an embedded flexible IC using a smartphone to read
and transmit the ciphertext and provide the user with a decision on the authenticity of the banknote, i.e.
GENUINE (left) or counterfeit and NOT AUTHORISED (right).
4.2 Optical ID
In addition to using online authentication technologies associated with unique RFIDs on a smartphone, it
is also possible to consider active ‘optical solutions’ using the digital camera of a smartphone to obtain
digital images in ambient light and/or flash light conditions. The retro-reflector method described here is
based on the latter case. Compared to RFID, optical ID provides the opportunity to develop a range of
optical ciphers and associated identifiers, but this comes at the cost of having to process and transfer
digital images using the IoT, which requires higher data throughputs than those associated with an RFID.
On the other hand, relatively simple print-only and/or additive optical elements and complexes can be
included that are simple and cost effective as discussed in this section.
4.2.1 Printed Binary Texture Coding
We consider an approach which is based on the introduction of binary texture codes to a document.
Texture codes are based on applying convolutional coding to an image plaintext based on Equation (6) to
produce a ciphertext image [51]. The method is relatively robust to distortions of the ciphertext except
in regard to three issues: size, rotation and cropping. In other words, if the ciphertext is printed on a
document and then scanned, the scanned image must be cropped correctly, re-oriented (as necessary)
and then re-sampled back to the original size of the plaintext image. However, auto-orientation and auto-
cropping are routine features of most scanning and remote imaging systems as is the re-sampling of a
digital image. Moreover, if the plaintext image is binary, the texture code is data redundant and can
therefore be binarized. Correlation of this binary texture code with the original cipher then recovers
the binary plaintext with background noise but with a high signal-to-noise ratio thereby allowing the
plaintext to be identified.
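As a one-dimensional analogue (a sketch only; the texture-coding scheme itself is two-dimensional, and the function names are ours), encryption by circular convolution with a white-noise cipher and decryption by correlation can be demonstrated as follows:

```python
import random

def circular_convolve(x, p):
    # Ciphertext: circular convolution of the noise cipher x with the plaintext p
    n = len(p)
    return [sum(x[(k - j) % n] * p[j] for j in range(n)) for k in range(n)]

def circular_correlate(x, c):
    # Decrypt: correlating x with the ciphertext recovers p (plus background
    # noise) because the autocorrelation of white noise approximates a delta
    n = len(c)
    return [sum(x[j] * c[(j + k) % n] for j in range(n)) for k in range(n)]

rng = random.Random(1)
n = 256
x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # white-noise cipher
p = [0.0] * n
p[10] = p[100] = 1.0                         # sparse binary plaintext
d = circular_correlate(x, circular_convolve(x, p))
peaks = sorted(range(n), key=lambda k: d[k], reverse=True)[:2]
```

The recovered signal d has a high signal-to-noise ratio at the plaintext positions, so binarizing it identifies the plaintext.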
Figure 8 shows an example of binary texture coding for a Euro banknote. In this case, the serial number
has been used as the binary plaintext, and the binary texture code printed in ultraviolet ink over the entire
surface of the banknote at 150 dpi. Under ultraviolet light, an image of the texture code is captured and
upon auto-cropping, auto-correcting (for orientation, as required) and re-sampling the image, the result
is decrypted to recover the serial number. In this case, an EC cipher can be used and iterated, according
Figure 8. Example of binary texture coding applied to authenticate a bank note. Original note (left), binary
texture code (center) and decrypt to recover the serial number (right).
The key to using Laser Speckle Analysis for document authentication is to appreciate that the speckle
pattern is a signature of the surface topology (on the scale of the wavelength) associated with a chosen
surface patch. Moreover, if the physical parameters associated with the incident coherent radiation can
be reproduced accurately enough (including wavelength, angle of incidence of the laser beam to the
surface, the beam profile and the angle, relative to the surface at which the speckle pattern is recorded)
and the surface patch is not deformed over time, then the speckle pattern provides a unique signature of
the patch on the document, thereby providing a method of authentication. Further, this method of
authentication is passive given that the surface structure of the patch is a natural feature of the material
from which the document is composed. All that is required is to decide upon the position and the extent
of the surface patch whose speckle pattern signature is desired and then record the pattern produced for
a specific optical configuration.
Figure 9 shows a simulated speckle pattern and the characteristic distribution of the pattern plotted on a
logarithmic scale. This simulation is obtained using Equation (6) without power spectrum normalisation,
i.e. we use C = X ∗ P, where the plaintext spectrum P is an OTF given by a two-dimensional rectangular
function and X is the spectrum of a zero-mean Gaussian distributed random field. The speckle pattern is
then given by the absolute square of the inverse Fourier transform of the cipher C, which models the
scattering cross-section (the intensity of scattered light).
The distribution of the scattering cross-section (the observed laser intensity, I ≥ 0) associated with a
speckle pattern is well known and given by the (negative) exponential distribution
P(I) = λ exp(−λI)
where λ is the ‘rate’. Thus, under a logarithmic transformation,
ln P(I) = ln λ − λI,
a relationship that is well demonstrated in the logarithmic scaled histogram given in Figure 9.
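This log-linear relationship is easy to verify numerically: sampling exponentially distributed intensities, histogramming them, and fitting a line to the log-density recovers a slope of −λ and an intercept of ln λ (a sketch with an assumed rate λ = 2):

```python
import math
import random

lam = 2.0
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(200_000)]

# 20-bin normalized histogram (density) of the intensities over [0, 2)
width, nbins = 0.1, 20
counts = [0] * nbins
for s in samples:
    if s < nbins * width:
        counts[min(int(s / width), nbins - 1)] += 1
density = [c / (len(samples) * width) for c in counts]

# Least-squares fit of ln P(I) against the bin centres I_j
xs = [(j + 0.5) * width for j in range(nbins)]
ys = [math.log(d) for d in density]
m = len(xs)
sx, sy = sum(xs), sum(ys)
slope = (m * sum(a * b for a, b in zip(xs, ys)) - sx * sy) / \
        (m * sum(a * a for a in xs) - sx * sx)
intercept = (sy - slope * sx) / m  # slope ~ -lam, intercept ~ ln(lam)
```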
By binarizing the speckle pattern, it is possible to generate a code similar to a QR code but one that is
produced naturally by the interaction of coherent light with a surface. Such a speckle code can be viewed
as a cipher, a cipher that is a unique signature associated with a known surface patch and optical
configuration. A speckle pattern can be considered to be another natural noise source in addition to those
discussed in Section 2.3.1, for example, and could be used as an input to the EC systems and approach
discussed in Section 2.3. An optical cipher of the type simulated in Figure 9 could be used to encrypt the
serial number of a bank note to produce a binary texture code following the approach discussed in Section
4.2.1, for example.
Figure 9. Simulation of a speckle pattern (top-left) and the corresponding grey-level 100-bin histogram (top-
right) plotted on a logarithmic scale. A binarization of the speckle pattern designed to produce a matrix of 0’s
and 1’s of near equal population densities is shown in the lower left-hand image. The corresponding 2-bin
histogram of this image is given in the lower right-hand side bar graph.
While the methods discussed in this section and that of Section 4.2.1 can be implemented in practice,
they both require ‘laboratory conditions’ to be realised. They require the use of UV or laser light to
generate and recover the authentication patterns, patterns that need to be recovered, processed and
analysed under controlled conditions with specialist equipment and minimal error in terms of an optical
recording configuration. Although the analysis of such patterns for authentication could be considered
using Deep Learning, for example, it is clearly not currently possible to implement such an approach in
the field using smartphones which are not equipped with UV and laser light sources. Thus, in the following
section, another approach is considered that is compatible with the optical facilities provided by a
smartphone and can be operated in the field.
4.2.3 Micro-Retro Reflector Texture Coding
Although binary texture codes can be generated without UV printing, a UV print provides a way of hiding
the existence of the code under visible ambient lighting conditions. In this way the cipher remains hidden
in the optical spectrum. Another way to do this (i.e. introduce a cipher that is invisible under ambient
lighting conditions) is to use a random distribution of micro retro-reflectors which is set into the surface
of the document providing a random pattern that is unique to that document. Under ambient lighting
conditions, the retro-reflector complex will not be observed visually and will not be detected in a captured
digital image of a smartphone. However, with a halogen flash that is typically coupled with most modern
smartphone cameras, the retro-reflectors will reflect the light and their distribution (or rather the
combined effect of that distribution) on the document captured in the digital image that is retained.
Retro-reflectors are devices that reflect light back to its source with minimal scattering [74]. There are
many types of such devices including corner reflectors, prism-based reflectors and phase-conjugate
mirrors but the most common type makes use of micro retro-reflectors which are each composed of small
glass beads or micro-spheres. The beads vary in size from 0.1-1 mm and can be added to paint and
lacquers which then cause the surface of the material to which the paint/lacquer has been applied to
glow. Highway signs and painted stripes on roadways, for example, have been coated with micro-
spherical retro-reflectors for many years. Various reflective sheets are used to coat signs, some of which
employ layers of glass beads, while others use sheets embedded with micro-plastic corner prisms or
micro-prisms.
Figure 10 shows an example of a bank note which has been coated with Glowtec fine grade retro-reflective
powder (a pigment for use in spirit- and water-based mediums) [75] over a rectangular window within the
central component of the main graphic. The powder is composed of retro-reflective glass microspheres
and has been introduced by adding the powder to a clear nail varnish and then applying the varnish to the
surface of the note with the (nail varnish) brush. The microbeads are 30-40 microns in size with a
refractive index of 1.93, and their application to the surface using a clear lacquer produces a naturally random
distribution of microbeads which are unique for any particular note. Under ambient lighting conditions,
the existence and distribution of the micro-spheres is not visible and the captured image of the bank-note
on a smartphone is ‘normal’. However, application of a halogen flash reveals the rectangular region over
which the powder doped lacquer has been applied. Thus, by cropping this region of the image, a unique
optically generated cipher is obtained which is a combination of the physical micro-bead distribution
subject to the resolution at which the image is captured.
Figure 10. Smartphone images of a banknote with reflective powder applied under ambient lighting
conditions (left) and with a LED halogen flash (right). The reflective powder, consisting of a randomized complex
of micro retro-reflectors, has been applied by doping a clear nail varnish with the powder and ‘painting’ it onto a
rectangular patch that covers the central component of the note’s principal graphic, as shown on the example on
the right-hand-side.
By diffusing a grey-level version of this image using Equation (6) with a critical feature of the banknote, such
as a binary image of the serial number, a ciphertext can be generated and stored in a database. This
ciphertext is then a uniquely encrypted record of the banknote’s serial number coupled with the unique
optical cipher on the same bank note which remains invisible under ambient lighting conditions. Figure
11 shows example images obtained by applying this process, namely, the cropped optical cipher, the
ciphertext for the serial number and a decrypt obtained by re-imaging the cipher. As discussed in Section
4.2.1, the method of encryption by convolutional coding means that the image of the cipher is relatively
tolerant to distortion other than orientation, cropping and compatibility with the original digital image
size (and the cipher).
where Y is the spectrum of an evolved cipher using EC, whose key could also be the serial number of a
bank note.
In this application, ANNs will be of value in regard to the automatic recognition of the optical cipher and
digital image post-processing that may be required in order to provide a ciphertext that is suitable for
decryption. The purpose of this is to allow the recovery of the optical ID to be undertaken in non-ideal
conditions associated with the way in which users operate a smartphone subject to critical tolerances.
This includes optical clarity, depth and resolution coupled with excess distances from the banknote at
which the image is taken, for example, and other issues associated with inevitable mismanagement in the
use of an app which is critically dependent on the quality of image capture.
created with relative ease without having to resort to ‘hand-crafted’ ciphers, many of which are now
vulnerable to attack by the implementation of Shor’s algorithm [76] on a quantum computer. This is
because encryption algorithms that are predicated on the product of large prime numbers can be broken
using Shor’s algorithm but, until the development of quantum computing, this algorithm was not
computationally feasible. In this respect, the approach considered in the paper falls into the category of
Post-quantum Cryptography and the generation of software for prototyping quantum resistant
cryptography [77].
5.1 Conclusion
In a broad context, and, in regard to the use of AI for cryptography, there is an interesting analogy that
can be made with the work of Station-X at Bletchley Park during the second world war. In this case, the
problem was to break the key settings of the Enigma cipher used by the German Armed forces for
communications. However, the design of the cipher was known, as was the language being encrypted and
many of the expected and repetitive words, phrases and statements (passive cribs). Moreover, the
intercepted Morse-coded traffic was known to be encrypted because it was scrambled. Had the traffic
been disguised to appear routine through encrypted information hiding (not technically possible at the
time), had the Morse code been reconfigured (quite possible at the time), the language been translated
to a non-Indo-European language (possible at the time but requiring multi-lingual service providers),
repetitive words and phrases negated (possible) and the Enigma machine itself changed on a regular basis
(not technically possible at the time), then history might have been very different (albeit not necessarily
tolerable). It is the last component of this analogy that is the basis for using AI to design ciphers that can
be changed regularly or even changed for each communication to give an algorithmic one-time pad.
Because this is not yet practicable for all individuals (many of whom may lack the technical IT skills), in the
interim, there is ample potential for business opportunities based on establishing on-line service
providers to distribute personalized ciphers for 1-to-Cloud applications, 1-to-1 encrypted communication
and 1-to-many applications depending on the available distribution infrastructure of an organisation.
REFERENCES
[2] Turing, A. M., Computing Machinery and Intelligence. Mind, 1950, 59: p. 433-460 [Online]
Available from: https://www.csee.umbc.edu/courses/471/papers/turing.pdf [Accessed 27
April 2020].
[4] Kostadinov, S., Understanding the Back-Propagation Algorithm, Towards Data Science, 2019.
https://towardsdatascience.com/understanding-backpropagation-algorithm-7bb3aa2f95fd
URL: http://dx.doi.org/10.14738/tmlai.83.8219 54
Transactions on Machine Learning and Artificial Intelligence Volume 8, Issue 3, June 2020
[9] Joshi, P., Artificial Intelligence with Python: A Comprehensive Guide to Building Intelligent Apps for
Python Beginners and Developers, Packt Publishing Limited, 2017. ISBN: 978-1-78646-439-
2. https://www.amazon.com/Artificial-Intelligence-Python-ComprehensiveIntelligent/dp/178646439X
[10] Turner, M. J., Blackledge, J. M. and Andrews, P., Fractal Geometry in Digital Imaging, Academic
Press, 1998. ISBN: 0-12-703970-8, 1998.
[11] Fractal Geometry: Mathematical Methods, Algorithms and Applications (Eds. J. M. Blackledge,
A. K. Evans and M. J. Turner), Woodhead Publishing: Series in Mathematics and Applications, 2002.
ISBN: 190427500.
[14] Google Cloud. AI & Machine Learning Products, Advanced Guide to Inception V3 on Cloud TPU,
https://cloud.google.com/tpu/docs/inception-v3-advanced
[15] Zhang, W., Itoh, K., Tanida, J. and Ichioka, Y., Parallel Distributed Processing Model with Local
Space-invariant Interconnections and its Optical Architecture, Applied Optics, 1990, 29(32), p.
4790–4797. https://drive.google.com/file/d/0B65v6Wo67Tk5ODRzZmhSR29VeDg/view
[16] Blackledge, J. M., Digital Image Processing, Woodhead Publishing Series in Electronic and
Optical Materials, 2005, ISBN-13: 978-1898563495. https://arrow.tudublin.ie/engschelebk/3/
[18] Maghrebi, H., Portigliatti, T., Prouf, E., Breaking Cryptographic Implementations Using Deep
Learning Techniques, Security, Privacy, and Applied Cryptography Engineering (SPACE), 6th
International Conference, 2016. [Online] Available from: https://eprint.iacr.org/2016/921.pdf
[19] Bezobrazov, S., Blackledge, J. M. and Tobin, P., Cryptography using Artificial Intelligence, The
International Joint Conference on Neural Networks (IJCNN2015), Killarney, Ireland, 12-17 July,
2015.
[21] Hoare, O., Enigma: Code Breaking and the Second World War - The True Story through
Contemporary Documents. 2002, Introduced and Selected by Oliver Hoare, UK Public Records.
[22] Vernam, G. S., Cipher Printing Telegraph Systems for Secret Wire and Radio Telegraphic
Communications, Transactions of the American Institute of Electrical Engineers, 1926, 55, p.
109–115.
[23] Blum, L., Blum, M. and Shub, M. A., Simple Unpredictable Pseudo-Random Number
Generator, SIAM Journal on Computing, 1986, 15 (2), p. 364–383.
[24] Matthews, R., On the Derivation of a ‘Chaotic’ Encryption Algorithm, Cryptologia, 1989, 13(1), p.
29–41.
[25] Ptitsyn, N. V., Deterministic Chaos in Digital Cryptography, PhD Thesis, De Montfort University,
UK and Moscow State Technical University, Russia, 2002.
[26] Erdos, P., Pomerance, C. and Schmutz, E., Carmichael’s Lambda Function, Acta Arithmetica,
1991, 58(4), p. 363-385.
[28] Blackledge, J. M., Ptitsyn, N. V. and Chernenky, V. M., Deterministic Chaos in Digital
Cryptography, First IMA Conference on Fractal Geometry: Mathematical Methods, Algorithms
and Applications (Eds. J M Blackledge, A K Evans and M Turner), Horwood Publishing Series in
Mathematics and Applications, 2002, p. 189-222.
[29] Blackledge, J. M., Cryptography and Steganography: New Algorithms and Applications, Centre
for Advanced Studies Text-books, Warsaw University of Technology, ISBN: 978-83-61993-05-6,
2012.
[30] Eiben, A. E. and Schoenauer, M., Evolutionary Computing, Information Processing Letters,
2002, 82(1), p. 1-6.
[31] Ehrenberg, R., Software Scientist: With a Little Data, Eureqa Generates Fundamental Laws of
Nature, Science News, 2012, 181(1), p. 20-21.
[33] Fogel, L. J. Owens, A. J. and Walsh, M. J., Artificial Intelligence through Simulated Evolution,
Wiley, 1966.
URL: http://dx.doi.org/10.14738/tmlai.83.8219 56
Transactions on Machine Learning and Artificial Intelligence Volume 8, Issue 3, June 2020
[34] Holland, J. H., Adaptation in Natural and Artificial Systems: An Introductory Analysis with
Applications to Biology, Control and Artificial Intelligence. University of Michigan Press, 1975.
[35] Eiben, A. E. and Smith, J. E., Introduction to Evolutionary Computing. Springer, 2003, ISBN 978-
3-662-05094-1.
[36] Nutonian, Eureqa: The AI-Powered Modeling Engine, 2020. [Online] Available from:
www.nutonian.com [Accessed 29 January 2020].
[37] Schmidt, M. and Lipson, H., Distilling Free-form Natural Laws from Experimental Data,
Science, 2009. [Online] Available from: https://science.sciencemag.org/content/324/5923/81
[38] Dubcakova, R., Eureqa: Software Review. Genetic Programming and Evolvable Machines, 2011,
12(2), p. 173-178.
[39] Blackledge, J. M., Bezobrazov, S. Tobin, P. and Zamora, F., Cryptography using Evolutionary
Computing, Proc. IET ISSC2013, Letterkenny, Co Donegal, Ireland, June 20-21, 2013 (Awarded
Best Paper Prize for ISSC2013). https://arrow.tudublin.ie/aaconmuscon/5/
[40] NIST, Cryptographic Algorithm Validation Program (CAVP), 2020, [Online] Available at
https://csrc.nist.gov/Projects/cryptographic-algorithm-validation-program
[43] Araneus Information Systems Oy, 2020. [Online] Available from https://www.araneus.fi/
[44] Ntolo, S., Student No. 3175732, Course Work Assignment Report, Information and Communications
Security (COS 792), Department of Computer Science, University of Western Cape, 2018.
[45] Mosola, N. N., Dlamini, M. T., Blackledge, J. M., Eloff, J. H. P. and Venter, H. S., Chaos-based Encryption
Keys and Neural Key-store for Cloud-hosted Data Confidentiality, Southern Africa Telecommunication
Networks and Applications Conference (SATNAC), 2017, September 3-10, p. 168-173.
[46] Wikipedia, List of Random Number Generators, 2020. [Online] Available from
https://en.wikipedia.org/wiki/List_of_random_number_generators
[47] Blackledge, J. M. and Ptitsyn, N. V., On the Applications of Deterministic Chaos for Encrypting Data on the
Cloud, Third International Conference on Evolving Internet, INTERNET 2011, 19-24 June, IARIA, 2011,
Luxembourg, p. 78-87. ISBN: 978-1-61208-008-6, 2011.
[48] Tobin, P., Blackledge, J. M., Bezobrasov, S., Personalized Encryptor Generation for the Cloud using
Evolutionary Computing, International Workshop on Nonlinear Maps and their Applications (NOMA-
2015), Dublin, Ireland, 15-16 June, 2015.
[50] Blackledge, J. M., Information Hiding using Stochastic Diffusion for the Covert Transmission of Encrypted
Images, Proc of IET ISSC2010 UCC Cork, 23-24 June, 2010.
[51] Blackledge, J. M. and Al-Rawi, A. R., Steganography using Stochastic Diffusion for the Covert
Communication of Digital Images, IAENG International Journal of Applied Mathematics, 2011, 41(4), p.
270-298.
[52] Blackledge, J. M., Tobin, P., Myeza, J. and Adolfo, C. M., Information Hiding with Data Diffusion Using
Convolutional Encoding for Super-Encryption, Mathematica Aeterna, 2017, 7(4), p. 319-356.
https://arrow.tudublin.ie/engscheleart2/127/
[53] Blackledge, J. M., Govere, W. and Sibanda, D., Phase-Only Digital Encryption, IAENG
International Journal of Applied Mathematics, 2019, 49(2), p. 212-228.
https://arrow.tudublin.ie/engscheleart2/192/
[54] Blackledge, J. M., Tobin, P., Govere, W. and Sibanda, D., Phase-only Digital Encryption using a
Three-pass Protocol, ISSC2019, IEEE UK and Ireland Signal Processing Chapter, Maynooth
University, 17-18 June, 2019.
[55] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and
Bengio, Y., Generative Adversarial Networks, Proceedings of the International Conference on
Neural Information Processing Systems (NIPS), 2014, p. 2672–2680.
https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
[56] Szilard, L., On the Decrease of Entropy in a Thermodynamic System by the Intervention of
Intelligent Beings, Zeitschrift für Physik, 1929, 53, p. 840-856.
http://fab.cba.mit.edu/classes/862.19/notes/computation/Szilard-1929.pdf
[57] Shannon, C. E., A Mathematical Theory of Communication, The Bell System Technical Journal,
1948, 27, p. 379-423.
http://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
[58] Kolmogorov, A. N., Entropy per Unit Time as a Metric Invariant of Automorphisms, Russian
Academy of Sciences, 1959, 124, p. 754-755.
[59] Sinai, Y. G., On the Notion of Entropy of a Dynamical System, Russian Academy of Sciences, 1959,
124, p. 768-771.
[60] Marsaglia, G., Random Numbers Fall Mainly in the Planes, Proc. Natl. Acad. Sci., 1968, 61(1), p.
25–28.
URL: http://dx.doi.org/10.14738/tmlai.83.8219 58
Transactions on Machine Learning and Artificial Intelligence Volume 8, Issue 3, June 2020
[61] Gao, Y., Kontoyiannis, I., Bienenstock, E., Estimating the Entropy of Binary Time Series:
Methodology, Some Theory and a Simulation Study, Entropy, 2008, 10, p. 71-99.
[62] Pincus, S., Approximate Entropy as a Measure of System Complexity, Proc. Natl. Acad. Sci., 1991,
88, p. 2297-2301.
[63] Rukhin, A. L., Approximate Entropy for Testing Randomness, J. Appl. Prob., 2000, 37(1), p. 88-
100.
[64] Richman, J. S. and Moorman, J. R., Physiological Time-series Analysis using Approximate Entropy
and Sample Entropy, American J. of Physiology-Heart and Circulatory Physiology, 2000, 278(6),
p. H2039-H2049.
[65] Chen, W., Zhuang, J., Yu W. and Wang, Z., Measuring Complexity using FuzzyEn, ApEn, and
SampEn, Med. Eng. and Phys., 2009, 31(1), p. 61-68.
[66] Carroll, J. M., The Binary Derivative Test: Noise Filter, Crypto Aid and Random Number Seed
Selector, Simulation, 1989, 53, p. 129-135.
[67] Carroll, J. M. and Sun, Y., The Binary Derivative Test for the Appearance of Randomness and its
use as a Noise Filter, Technical Report No. 221, Department of Computer Science, University of
Western Ontario, 1998.
[68] Bruwer, C. S., Correlation Attacks on Stream Ciphers using Convolutional Codes, Masters Thesis,
University of Pretoria, 2005.
[69] Croll, G. J., BiEntropy – The Approximate Entropy of a Finite Binary String, 2013.
https://pdfs.semanticscholar.org/d8d6/bf07050f9dcd026bfafe62c079b4e309dd7a.pdf
[70] Text to Binary translator, Rapid Tables, 2019. [Online] Available from
https://www.rapidtables.com/convert/number/ascii-to-binary.html
[73] Gołofit, K. and Wieczorek, P. Z., Chaos-based Physical Unclonable Functions, MDPI Applied Sciences,
Special Issue on ‘Side Channel Attacks’, 2019. https://www.mdpi.com/2076-3417/9/5/991
[74] Allen, L. et al., The Hubble Space Telescope Optical Systems Failure Report, NASA, Technical Report,
November, 1990. https://ntrs.nasa.gov/search.jsp?R=19910003124
[78] Open Quantum Safe Project, Software for Prototyping Quantum Resistant Cryptography, 2016.
https://openquantumsafe.org/
[79] Stamp, M., Introduction to Machine Learning with Applications in Information Security, Chapman &
Hall/CRC Machine Learning & Pattern Recognition, 2018. ISBN-13: 978-1138626782.
[81] Advances of DNA Computing in Cryptography (Eds. S. Namasudra and G. Chandra Deka), Chapman and
Hall/CRC, 2018.
URL: http://dx.doi.org/10.14738/tmlai.83.8219 60