Deep Learning-Based Prediction of Test Input Validity For RESTful APIs
2021 IEEE/ACM Third International Workshop on Deep Learning for Testing and Testing for Deep Learning (DeepTest) | 978-1-6654-4565-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/DEEPTEST52559.2021.00008
Abstract—Automated test case generation for RESTful web APIs is a thriving research topic due to their key role in software integration. Most approaches in this domain follow a black-box approach, where test cases are randomly derived from the API specification. These techniques show promising results, but they neglect constraints among input parameters (so-called inter-parameter dependencies), as these cannot be formally described in current API specification languages. As a result, when testing real-world services, most random test cases tend to be invalid, since they violate some of the inter-parameter dependencies of the service, making human intervention indispensable. In this paper, we propose a deep learning-based approach for automatically predicting the validity of an API request (i.e., a test input) before calling the actual API. The model is trained with the API requests and responses collected during the generation and execution of previous test cases. Preliminary results with five real-world RESTful APIs and 16K automatically generated test cases show that the validity of test inputs can be predicted with an accuracy ranging from 86% to 100% in APIs like Yelp, GitHub, and YouTube. These are encouraging results that show the potential of artificial intelligence to improve current test case generation techniques.

Index Terms—RESTful web API, web services testing, artificial neural network

I. INTRODUCTION

RESTful web APIs [1] are heavily used nowadays for integrating software systems over the network. A common phenomenon in RESTful APIs is that they contain inter-parameter dependencies (or simply dependencies for short), i.e., constraints that restrict the way in which two or more input parameters can be combined to form valid calls to the service. For example, in the Google Maps API, when searching for places, if the location parameter is set, then the radius parameter must be set too, otherwise a 400 status code ("bad request") is returned. Likewise, when querying the GitHub API [2] to retrieve the authenticated user's repositories, the optional parameters type and visibility must not be used together in the same API request, otherwise an error will be returned. A recent study [3] revealed that these dependencies are extremely common and pervasive: they appear in 4 out of every 5 APIs across all application domains and types of operations. Unfortunately, current API specification languages like the OpenAPI Specification (OAS) [4] provide no support for the formal description of this type of dependency, despite it being a highly demanded feature among practitioners1. Instead, users are encouraged to describe dependencies among input parameters informally, using natural language, which leads to ambiguities and makes it hardly possible to interact with services without human intervention2.

Automated testing of RESTful web APIs is an active research topic [5]–[10]. Most techniques in the domain follow a black-box approach, where the specification of the API under test (e.g., an OAS document) is used to drive the generation of test cases [6]–[8], [10]. Essentially, these approaches exercise the API using (pseudo-)random test data. To test a RESTful API thoroughly, it is crucial to generate valid test inputs (i.e., successful API calls) that go beyond the input validation code and exercise the actual functionality of the API. Valid test inputs are those satisfying all the input constraints of the API under test, including inter-parameter dependencies.

Problem: Current black-box testing approaches for RESTful web APIs do not support inter-parameter dependencies since, as previously mentioned, these are not formally described in the API specification used as input. As a result, existing approaches simply ignore dependencies and resort to brute force to generate valid test cases, i.e., those satisfying all input constraints. This is hardly feasible for most real-world services, where inter-parameter dependencies are complex and pervasive. For example, the search operation in the YouTube API has 31 input parameters, out of which 25 are involved in at least one dependency: trying to generate valid test cases randomly is like hitting a wall. This was confirmed in a previous study [11], which found that 98 out of every 100 random test cases for the YouTube search operation violated one or more inter-parameter dependencies.

1 https://github.com/OAI/OpenAPI-Specification/issues/256
2 https://swagger.io/docs/specification/describing-parameters/
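To see why random generation hits this wall, consider a toy operation with a handful of optional parameters tied by "mutually exclusive" and "requires" dependencies. The parameter names and the dependency set below are made up for illustration; they are not the real constraints of any API mentioned in this paper. Randomly including or excluding each parameter rarely yields a valid combination:

```python
import random

# Hypothetical optional parameters with "mutually exclusive" and
# "requires" dependencies (made up for illustration; NOT the real
# constraints of any API discussed in the paper).
PARAMS = ["a", "b", "c", "d", "e", "f", "g", "h"]
MUTEX = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f")]
REQUIRES = [("g", "h")]  # if g is set, h must be set too (cf. location/radius)

def is_valid(selected):
    """True if the selected parameters violate no dependency."""
    no_mutex = not any(p in selected and q in selected for p, q in MUTEX)
    reqs_ok = all(q in selected for p, q in REQUIRES if p in selected)
    return no_mutex and reqs_ok

random.seed(0)
trials = 10_000
valid = sum(
    is_valid({p for p in PARAMS if random.random() < 0.5})
    for _ in range(trials)
)
print(f"valid random combinations: {valid / trials:.1%}")
```

Even with only seven dependencies, roughly four out of five random combinations are invalid here; with the 25 interacting parameters of the YouTube search operation, the valid fraction collapses to the 2% reported above.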
To address this problem, Martin-Lopez et al. [11] devised a constraint-based testing technique, where valid combinations of parameters are automatically generated by analyzing the dependencies of the API expressed in the Inter-parameter Dependency Language (IDL), a domain-specific language proposed by the same authors. Although effective, this approach requires specifying the dependencies of the API in IDL, which is a manual and error-prone task. Furthermore, sometimes dependencies are simply not mentioned in the API specification, not even in natural language, and thus they can only be discovered when debugging unexpected API failures.
Approach: In this paper, we propose a deep learning-based approach for automatically inferring whether a RESTful API request is valid, i.e., whether the request satisfies all the inter-parameter dependencies. In contrast to state-of-the-art methods, no input specification is needed. The model is trained with the API requests and their corresponding API responses observed in previous calls to the API. In effect, this makes it possible to rule out potentially invalid test cases (unable to test the actual functionality of the API under test) without calling the actual API. This makes the testing process significantly more efficient and cost-effective, especially in resource-constrained scenarios where API calls may be limited. Preliminary evaluation results show that the validity of test inputs can be predicted with an accuracy ranging from 86% to 100% in APIs like Yelp, GitHub, and YouTube. These results are promising and show the potential of artificial intelligence techniques in the context of system-level test case generation.
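The training labels are derived from the HTTP responses observed in previous calls: requests answered with a 2XX status code are treated as valid, and those answered with a 4XX ("client error") status code as invalid, as detailed in Sections II and III-A. A minimal sketch of this labeling convention (the function name is ours):

```python
def label_request(status_code: int):
    """Derive a training label from an HTTP status code.

    2XX -> valid request; 4XX -> invalid request (a client error,
    e.g., a violated inter-parameter dependency). Other codes,
    such as 5XX server errors, are left unlabeled here.
    """
    if 200 <= status_code < 300:
        return "valid"
    if 400 <= status_code < 500:
        return "invalid"
    return None  # not used for training
```

For instance, a 200 response yields a "valid" sample and a 400 ("bad request") an "invalid" one.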
The rest of the paper is organized as follows: Section II introduces the basics of automated testing of RESTful web APIs. Section III presents our approach for predicting the validity of test inputs for RESTful APIs. Section IV explains the evaluation process performed and the results obtained. Possible threats to validity are discussed in Section V. Related literature is discussed in Section VI. Lastly, Section VII outlines future lines of research and concludes the paper.

II. RESTFUL WEB APIS

Web APIs allow systems to interact over the network by means of simple HTTP interactions. A web API exposes one or more endpoints through which it is possible to retrieve data (e.g., an HTML page) or to invoke operations (e.g., create a Spotify playlist [12]). Most modern web APIs adhere to the REpresentational State Transfer (REST) architectural style [1], being referred to as RESTful web APIs. Such APIs are usually decomposed into multiple RESTful web services [13], each one allowing to manage one or more resources. A resource can be any piece of data exposed to the Web, such as a YouTube video [14] or a GitHub repository [2]. These resources can be accessed and manipulated via create, read, update, and delete (CRUD) operations, using specific HTTP methods such as GET and POST.

Fig. 1. Excerpt of the OAS specification of the GitHub API.

RESTful APIs are commonly described with languages such as the OpenAPI Specification (OAS) [4], arguably considered the industry standard. An OAS document describes an API in terms of the allowed inputs and the expected outputs. Figure 1 depicts an excerpt of the OAS specification of the GitHub API. As illustrated, the operation GET /user/repos accepts five input parameters (lines 67, 77, 85, 94 and 103), three of which are involved in two inter-parameter dependencies, according to the documentation of the GitHub API [2]: 1) type and visibility cannot be used together; and 2) type and affiliation cannot be used together either. These dependencies must be satisfied in order to generate valid API inputs (i.e., HTTP requests). Upon valid API inputs, successful HTTP responses with a 200 status code will be returned (line 113); otherwise, an API error will be obtained, identifiable by a 4XX status code (lines 114 and 116).
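The two dependencies above can be checked mechanically before a request is ever sent. A minimal sketch (the helper name is ours; the list of conflicting pairs comes from the GitHub documentation cited above):

```python
# Inter-parameter dependencies of GET /user/repos, per the GitHub
# API documentation [2]: the listed parameter pairs must not co-occur.
MUTUALLY_EXCLUSIVE = [("type", "visibility"), ("type", "affiliation")]

def satisfies_dependencies(params: dict) -> bool:
    """Return True if the query parameters violate no dependency."""
    return not any(p in params and q in params
                   for p, q in MUTUALLY_EXCLUSIVE)

# Valid: only one of the conflicting parameters is used.
satisfies_dependencies({"type": "all", "sort": "pushed"})         # True
# Invalid: type and visibility appear in the same request.
satisfies_dependencies({"type": "private", "visibility": "all"})  # False
```

The approach proposed in this paper aims to learn exactly this kind of predicate from observed requests and responses, without it ever being written down.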
Automated test case generation for RESTful APIs is an active research topic [5]–[10]. Most approaches in this domain exploit the OAS specification of the service to generate test cases, typically in the form of one or more HTTP requests. An HTTP request is identified by a method (e.g., GET), a path (e.g., /user/repos) and a set of parameters (e.g., type and sort). In some cases, parameter values can also be extracted from the API specification, as in the example shown in Figure 1: all parameters are defined as enum strings, i.e., with a finite set of values. Crafting an HTTP request would just be a matter of assigning values to random parameters within their domain, resulting in a test input such as GET /user/repos?type=all&sort=pushed. In order to generate a valid request, all inter-parameter dependencies of the API operation should be satisfied. Valid requests are essential for thoroughly testing APIs, since they exercise their inner functionality, going beyond their input validation logic. According to REST best practices [13], valid requests should return successful responses (2XX status codes), while invalid requests should be gracefully handled and return "client error" responses (4XX status codes).

III. APPROACH

In this paper, we propose an artificial neural network for the automated inference of test input validity in the context of RESTful APIs (or simply APIs henceforth). The goal is to automatically infer whether an API call is valid or not without calling the actual API. This would be extremely helpful for the automated generation of cost-effective test suites, especially in resource-constrained scenarios where stressing the API with thousands or even millions of random API calls is not an option. The approach is divided into three steps, described below.

A. Data collection

The first step consists in the collection of a dataset of API calls properly labeled as valid or invalid. We consider an API call valid if it returns a successful response (2XX status codes), and invalid if it returns a "client error" response (4XX status codes). As a precondition, the API calls must meet all the individual parameter constraints indicated in the API specification regarding data types, domains, regular expressions, and so on. For example, if a parameter is defined as an integer between 0 and 20, the value 42 would be invalid. Therefore, it can be assumed that all the invalid API calls in the dataset are invalid due to the violation of one or more inter-parameter dependencies, and not due to errors in individual parameters. A dataset of these characteristics (a set of valid and invalid API calls) can be automatically generated using state-of-the-art testing tools like RESTest [11] or RESTler [15], or collected directly from user activity.

TABLE I
DATASET EXAMPLE.

sort       direction  visibility  type     affiliation         faulty
full_name  -          public      public   -                   True
-          -          all         private  -                   True
-          -          private     -        collaborator,owner  False
-          desc       -           all      -                   False

Table I depicts a sample dataset for the operation to browse the user's repositories from the GitHub API (described in Figure 1). The dataset, in tabular format, contains one row for each API call. It contains n + 1 columns, n being the number of input parameters of the API operation. Each cell [i, j] represents the value of the parameter in column j for the API call in row i. The last cell of each row (column "faulty") states whether the API call is valid or not, which is the value to be predicted in our work. For example, row 2 in the dataset represents a request to the GitHub API (operation GET /user/repos) with the following key-value pairs: <visibility=all, type=private>. As stated in the faulty column, such a call is invalid because, as explained in the API documentation [2], the parameters type and visibility cannot be used together. By contrast, row 4 is valid, because <direction=desc, type=all> does not violate any dependency.

B. Data preprocessing

The raw dataset is not ready to be fed into the network, as it contains many empty values, which are not supported by standard deep learning frameworks. In order to fill those empty values, different techniques could be used. For instance, empty values of numeric variables could be filled with the mean or the median value of the column, or with zero values. Unfortunately, none of these strategies can be applied in this domain, because the sole presence of an input parameter in an API call can be crucial for the violation or fulfillment of a dependency. Additionally, artificial neural networks do not support string inputs, only numeric values. To address these problems, we propose enriching and processing the dataset as described below.

1) Data enrichment: we propose extending the original data as follows.
– For each number and string column, an auxiliary boolean column will be created indicating whether the value is empty or not. Columns representing free string parameters will be deleted, since dependencies typically constrain their presence or absence exclusively [3].
– For each enum column, a string fake_value will be assigned to empty values (e.g., "None0"), making sure that such values do not already exist in the dataset.

2) Data processing: next, we propose processing the dataset as follows.
– Numerical variables will be normalized in order to even out the influence of variables of different orders of magnitude. For example, the operation for creating a coupon in the Stripe API includes, among others, the numerical parameters redeem_by and duration_in_months. While the latter is ~10^2, the former is typically ~10^9, which makes its influence on the network output much stronger. The normalization step guarantees that the values of both parameters are reduced to the same order of magnitude, so that there is equity among inputs.
– enum values will be processed using one-hot encoding [16], which creates an alphabet of the k possible values found in the dataset, and then transforms the string parameter into a binary vector of k elements. For example, the possible values for the parameter direction of the GitHub API are "asc", "desc", or fake_value (if it is missing). These three possibilities are encoded as the vectors (1, 0, 0), (0, 1, 0) and (0, 0, 1), respectively, where each position of the vector corresponds to a specific value and only one element is equal to 1. The one-hot encoding technique provides better performance than simple integer encoding when there is no logical order among the enum values [16].

C. Network design

Deep learning is a common choice when facing a classification problem. Specifically, the proposed model is a multilayer perceptron [17], [18]: it can easily handle numerical data and learn knowledge representations; moreover, its implementation is relatively straightforward, and it offers many possible configurations and architectures to experiment with.

The system is a classifier of R^n into 2 classes, valid or faulty; it accepts the parameters of an API call as input, and returns the predicted validity as output. While the number of classes is already known, the number of inputs n is variable, as it depends on the number of API parameters. For this reason, the network automatically adapts to the number of inputs of the corresponding API operation. More specifically, it has the following structure:
– Input layer: an input layer is automatically built to fit the number of columns of the dataframe, so that the accepted input shape is flexible.
– Inner layers: we have experimented with several depths and different numbers of neural units per layer.
– Output layer: one neuron, emitting the probability of the instance belonging to the valid class. The class is predicted as faulty or valid depending on whether the output is greater than 0.5 or not.

In order to achieve better results, we experimented with different combinations of values for the key hyperparameters driving the learning process. The optimizer is the computation method used to update the network weights [19]. The batch size refers to the number of instances the system is fed with before each weight update happens. Finally, the learning rate determines how much the gradient of the error influences the weight update. The best configuration found is optimizer="Adam", batch size=8, and learning rate=0.002.

Fig. 2. Network architecture.

Figure 2 depicts a visual representation of the proposed architecture, consisting of an n-unit input layer, five inner layers with 32, 16, 8, 4 and 2 units, respectively, and a final one-unit output layer. As usual in these feed-forward networks, all layers are densely connected to the preceding and following ones. Besides, the first, second, and third layers are followed in this case by three dropout layers, with the dropout coefficient equal to 0.3.

D. Training and automated inference

For each target API, the network must be trained with a dataset of previous calls to the API before it can be used as an effective predictor. Training the network excessively is counterproductive, as it may result in the network memorizing instances rather than learning, a phenomenon called overfitting. To avoid overfitting, we propose the following strategies:
– L1 and L2 regularization factors in each layer [20], which prevent the entropy among weights from increasing excessively. A lower entropy guarantees the uniform distribution of the information, in this case the knowledge, among the weights.
– Several dropout layers along the architecture [21], which randomly silence neurons and prevent the network from activating them, and therefore from improving their weights.
– Early stopping of the training process [20]: several metrics are monitored along the training process, and when they stop improving the training is forced to stop.

Once the system has been properly trained, the validity of new API calls can be predicted by simply applying the network to new instances of the same API (i.e., introducing the instance as an input and applying the forward step only, thus obtaining the output of the network). In the GitHub example, for instance, when running the network on the API call GET /user/repos?sort=created, not included in the original dataset depicted in Table I, the system should ideally predict it as valid, since it satisfies all the input constraints, including the inter-parameter dependencies described in Section II.

IV. EVALUATION

For the evaluation of the approach, we implemented a neural network in Python and assessed its prediction capability on an automatically generated dataset. Next, we describe the research questions, the experimental setup and the results obtained.

A. Research questions

We address the following research questions (RQs):
– RQ1: How effective is the approach in predicting the validity of test inputs for RESTful APIs? This is the main goal of our work. To answer it, we will study the accuracy of the network predictions.
– RQ2: How does the size of the training dataset affect the prediction accuracy? To answer it, we will study the evolution of the accuracy as the dataset grows.
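Putting Sections III-B and III-C together, the following is a minimal NumPy sketch that encodes one Table I-style API call (one-hot enums with a fake value for absences; all parameters of this operation are enums, so no presence columns or normalization are needed) and pushes it through an untrained perceptron with the layer sizes of Figure 2. The enum domains are simplified from the GitHub documentation, the weights are random, and the helper names are ours; the output is meaningless until the network is trained.

```python
import numpy as np

# Simplified enum domains of GET /user/repos (cf. Figure 1); "None0"
# is the fake value assigned to absent parameters (Section III-B).
DOMAINS = {
    "sort": ["created", "updated", "pushed", "full_name", "None0"],
    "direction": ["asc", "desc", "None0"],
    "visibility": ["all", "public", "private", "None0"],
    "type": ["all", "owner", "public", "private", "member", "None0"],
    "affiliation": ["owner", "collaborator", "organization_member", "None0"],
}

def encode(call: dict) -> np.ndarray:
    """One-hot encode an API call, using the fake value for absences."""
    vec = []
    for param, domain in DOMAINS.items():
        value = call.get(param, "None0")
        vec.extend(1.0 if value == v else 0.0 for v in domain)
    return np.array(vec)

def forward(x: np.ndarray, rng: np.random.Generator) -> float:
    """Forward pass through the 32-16-8-4-2-1 architecture (Figure 2),
    with random, untrained weights; sigmoid on the output neuron."""
    sizes = [x.size, 32, 16, 8, 4, 2, 1]
    for n_in, n_out in zip(sizes, sizes[1:]):
        w = rng.standard_normal((n_out, n_in)) * 0.1
        x = np.maximum(w @ x, 0.0) if n_out > 1 else w @ x  # ReLU inside
    return float(1.0 / (1.0 + np.exp(-x[0])))  # probability of "valid"

x = encode({"direction": "desc", "type": "all"})  # row 4 of Table I
p_valid = forward(x, np.random.default_rng(0))
print(f"{x.size} input features, predicted P(valid) = {p_valid:.2f}")
```

In the actual approach this forward pass is performed by a trained model, and the prediction (valid if the output exceeds 0.5) decides whether the call is worth sending to the real API.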
TABLE II
RESTFUL SERVICES USED IN THE EVALUATION.

TABLE III
PREDICTION ACCURACY FOR EACH SERVICE.
There is a strong similarity between CV and test scores for each API operation.

It would be sensible to think that a correlation should exist between the goodness of the results and the complexity of the API operation, i.e., the validity of services with fewer parameters and dependencies should be more easily predictable. This intuition is confirmed by GitHub and Stripe-CC, for instance, where accuracy is 100%. However, Yelp scores are counter-intuitively lower, which could be due to its dependencies being arithmetic; similarly, LanguageTool validity is by far the hardest to predict despite having only 4 dependencies. By contrast, the validity of the search operation in YouTube is predicted very accurately (99.3%), despite its parameters and dependencies being the most numerous. This may be due to the fact that most of its parameters are enums, with a fixed set of possible values, and therefore there are fewer combinations of parameters and values for which to predict input validity. All these facts suggest that the complexity of validity prediction does not lie only in the number of dependencies, but also in their type and in the shape of the parameters involved.

In view of these results, RQ1 can be answered as follows:

The accuracy of the network ranges from 86% to 100%, with a mean accuracy across the eight API operations under study of 97.3%.

Fig. 3. Accuracy evolution by dataset size.

Figure 3 shows the evolution of the mean accuracy across all the API operations under study with respect to the size of the dataset. As illustrated, accuracy increases drastically as the size approaches 400 instances, properly balanced. Then, the prediction accuracy increases slowly until 800-1000 instances, after which the growth is asymptotic. As a result, RQ2 can be answered as follows:

The approach achieves an accuracy of 90% with balanced datasets of about 400 instances, reaching its top performance with 800 or more instances.

V. THREATS TO VALIDITY

Next, we discuss the possible internal and external validity threats that may have influenced our work, and how they were mitigated.

A. Internal validity

Threats to internal validity relate to those factors that might introduce bias and affect the results of our investigation. A possible threat in this regard is the existence of bugs in the implementation of the approach or the tools used. For the generation of the API requests, we relied on RESTest [11], a state-of-the-art testing framework for RESTful APIs. It could happen that, due to bugs in RESTest or errors in the API specifications, some valid requests were actually invalid, or vice versa. To neutralize this threat, we executed all the generated requests against each of the APIs, to make sure that all of them were correctly labeled as valid or invalid. For the implementation of our approach, we used cross-validation to select the combination of hyperparameters that would yield the best possible results.

Regarding the dataset used, it could be argued that it is not diverse enough, especially for the invalid requests: since these were generated with RESTest (which we used as a black box), it could happen that all of them were invalid due to violating the same dependency over and over again. To mitigate this threat, we manually confirmed that every single dependency was violated at least once in the dataset.

B. External validity

External validity concerns the extent to which we can generalize from the results obtained in the experiments. Our approach has been validated on five subject APIs, and therefore the results might not fully generalize. To minimize this threat, we resorted to industrial APIs with millions of users worldwide. Specifically, we selected API operations with different characteristics in terms of number of parameters (from 5 to 31), parameter types (e.g., strings, numbers, enums), number of inter-parameter dependencies (from 2 to 16), and type of dependencies. In this regard, it is worth mentioning that the selected API operations include instances of all eight dependency patterns identified in web APIs [3].

VI. RELATED WORK

The automated generation of valid test inputs for RESTful APIs is a somewhat overlooked topic in the literature. Most testing approaches focus on black-box fuzzing or related techniques [7], [8], [10], [15], where test inputs are derived from the API specification (when available) or randomly generated, with the hope of causing service crashes (i.e., 5XX status codes). Such inputs are unlikely to be valid, especially in the presence of inter-parameter dependencies like the ones discussed in this paper.

Applications of artificial intelligence and deep learning techniques for enhancing RESTful API testing are still in their infancy. Arcuri [5] advocates for a white-box approach where genetic algorithms are used for generating test inputs
that cover more code and find faults in the system under test. Atlidakis et al. [22] proposed a learning-based mutation approach, where two recurrent neural networks are employed for evolving inputs that are likely to find bugs. In both cases, the source code of the system is required, which is seldom available, especially for commercial APIs such as the ones studied in our evaluation (e.g., YouTube and Yelp).

The handling of inter-parameter dependencies is key for ensuring the validity of RESTful API inputs, as shown in a recent study of 40 industrial APIs [3]. Two previous papers have addressed this issue to a certain degree: Wu et al. [23] proposed a method for inferring these dependencies by leveraging several resources of the API, and Oostvogels et al. [24] proposed a DSL for formally specifying these dependencies, but no approach was provided for automatically analyzing them. The most related work is probably that of Martin-Lopez et al. [11], [25], where an approach for specifying and analyzing inter-parameter dependencies was proposed, as well as a testing framework supporting these dependencies. However, the API dependencies must be manually written in the IDL language, which is time-consuming and error-prone. The work presented in this paper is a step forward in automating this process, since we provide a way of predicting the validity of API inputs without the need to formally specify the dependencies among their parameters, just by analyzing the inputs and outputs of the API.

VII. CONCLUSION AND FUTURE WORK

In this paper, we proposed a deep learning-based approach for predicting the validity of test inputs for RESTful APIs. Starting from a dataset of previous calls labeled as valid or faulty, the proposed network is able to predict the validity of new API calls with an accuracy of ~97%. In contrast to existing methods, which rely on manual means or brute force, our approach is fully automated, leveraging the power of current deep learning frameworks. This is a small but promising step, showing the potential of AI to achieve an unprecedented degree of automation in software testing.

Many plans remain for our future work: first and foremost, we plan to perform a more thorough evaluation including different datasets and more APIs. In addition, we are currently exploring the use of different machine learning techniques for the automated inference of inter-parameter dependencies in RESTful APIs.

VERIFIABILITY

For the sake of reproducibility, the code and the dataset of our work are publicly accessible in the following anonymous repository:
https://anonymous.4open.science/r/8954c607-8d6c-4348-a23c-d57c920cdc22/

REFERENCES

[1] R. T. Fielding, "Architectural Styles and the Design of Network-based Software Architectures," Ph.D. dissertation, 2000.
[2] "GitHub API," accessed January 2020. [Online]. Available: https://developer.github.com/v3/
[3] A. Martin-Lopez, S. Segura, and A. Ruiz-Cortés, "A Catalogue of Inter-Parameter Dependencies in RESTful Web APIs," in International Conference on Service-Oriented Computing, 2019, pp. 399–414.
[4] "OpenAPI Specification," accessed April 2020. [Online]. Available: https://www.openapis.org
[5] A. Arcuri, "RESTful API Automated Test Case Generation with EvoMaster," ACM Transactions on Software Engineering and Methodology, vol. 28, no. 1, pp. 1–37, 2019.
[6] V. Atlidakis, P. Godefroid, and M. Polishchuk, "Checking Security Properties of Cloud Services REST APIs," in IEEE International Conference on Software Testing, Validation and Verification, 2020, pp. 387–397.
[7] H. Ed-douibi, J. L. C. Izquierdo, and J. Cabot, "Automatic Generation of Test Cases for REST APIs: A Specification-Based Approach," in IEEE International Enterprise Distributed Object Computing Conference, 2018, pp. 181–190.
[8] S. Karlsson, A. Causevic, and D. Sundmark, "QuickREST: Property-based Test Generation of OpenAPI Described RESTful APIs," in IEEE International Conference on Software Testing, Validation and Verification, 2020, pp. 131–141.
[9] S. Segura, J. A. Parejo, J. Troya, and A. Ruiz-Cortés, "Metamorphic Testing of RESTful Web APIs," IEEE Transactions on Software Engineering, vol. 44, no. 11, pp. 1083–1099, 2018.
[10] E. Viglianisi, M. Dallago, and M. Ceccato, "RestTestGen: Automated Black-Box Testing of RESTful APIs," in IEEE International Conference on Software Testing, Validation and Verification, 2020, pp. 142–152.
[11] A. Martin-Lopez, S. Segura, and A. Ruiz-Cortés, "RESTest: Black-Box Constraint-Based Testing of RESTful Web APIs," in International Conference on Service-Oriented Computing, 2020, pp. 459–475.
[12] "Spotify Web API," accessed November 2016. [Online]. Available: https://developer.spotify.com/web-api/
[13] L. Richardson, M. Amundsen, and S. Ruby, RESTful Web APIs. O'Reilly Media, Inc., 2013.
[14] "YouTube Data API," accessed April 2020. [Online]. Available: https://developers.google.com/youtube/v3/
[15] V. Atlidakis, P. Godefroid, and M. Polishchuk, "RESTler: Stateful REST API Fuzzing," in International Conference on Software Engineering, 2019, pp. 748–758.
[16] F. N. Kerlinger and E. J. Pedhazur, Multiple Regression in Behavioral Research. Holt, Rinehart and Winston, New York, 1973.
[17] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Washington, DC: Spartan, 1962.
[18] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning Internal Representations by Error Propagation. Cambridge, MA: MIT Press, 1986, pp. 318–362.
[19] Keras, "Optimizers." [Online]. Available: https://keras.io/api/optimizers/
[20] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016.
[21] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html
[22] V. Atlidakis, R. Geambasu, P. Godefroid, M. Polishchuk, and B. Ray, "Pythia: Grammar-Based Fuzzing of REST APIs with Coverage-guided Feedback and Learning-based Mutations," Tech. Rep., 2020.
[23] Q. Wu, L. Wu, G. Liang, Q. Wang, T. Xie, and H. Mei, "Inferring Dependency Constraints on Parameters for Web Services," in World Wide Web, 2013, pp. 1421–1432.
[24] N. Oostvogels, J. De Koster, and W. De Meuter, "Inter-parameter Constraints in Contemporary Web APIs," in International Conference on Web Engineering, 2017, pp. 323–335.
[25] A. Martin-Lopez, S. Segura, C. Müller, and A. Ruiz-Cortés, "Specification and Automated Analysis of Inter-Parameter Dependencies in Web APIs," IEEE Transactions on Services Computing, 2020, in press.

APPENDIX A
IDL DEPENDENCIES FROM THE APIS UNDER STUDY

Table IV shows the inter-parameter dependencies present in the eight API operations considered in our study, expressed in the IDL language.
TABLE IV
INTER-PARAMETER DEPENDENCIES INCLUDED IN THE API OPERATIONS UNDER STUDY, DESCRIBED IN IDL.