
Rcpp syntactic sugar

Dirk Eddelbuettel (a) and Romain François (b)

(a) http://dirk.eddelbuettel.com; (b) https://romain.rbind.io/

This version was compiled on November 8, 2019

This note describes Rcpp sugar, which was introduced in version 0.8.3 of Rcpp
(Eddelbuettel et al., 2019a; Eddelbuettel and François, 2011). Rcpp sugar
brings a higher level of abstraction to C++ code written using the Rcpp API.
Rcpp sugar is based on expression templates (Abrahams and Gurtovoy, 2004;
Vandevoorde and Josuttis, 2003) and provides some ‘syntactic sugar’ facilities
directly in Rcpp. This is similar to how RcppArmadillo (Eddelbuettel et al.,
2019b) offers linear algebra C++ classes based on Armadillo (Sanderson, 2010).

Rcpp | sugar | R | C++

1. Motivation
Rcpp facilitates development of internal compiled code in an R package by
abstracting low-level details of the R API (R Core Team, 2018) into a
consistent set of C++ classes.

Code written using Rcpp classes is easier to read, write and maintain,
without losing performance. Consider the following code example, which
provides a function foo as a C++ extension to R by using the Rcpp API:

    RcppExport SEXP foo(SEXP x, SEXP y) {
        Rcpp::NumericVector xx(x), yy(y);
        int n = xx.size();
        Rcpp::NumericVector res(n);
        double x_ = 0.0, y_ = 0.0;
        for (int i=0; i<n; i++) {
            x_ = xx[i];
            y_ = yy[i];
            if (x_ < y_) {
                res[i] = x_ * x_;
            } else {
                res[i] = -(y_ * y_);
            }
        }
        return res;
    }

The goal of the function foo is simple. Given two numeric vectors, we create
a third one. This is typical low-level C++ code that could be written much
more concisely in R thanks to vectorisation, as shown in the next example.

    foo <- function(x, y) {
        ifelse(x < y, x * x, -(y * y))
    }

Put succinctly, the motivation of Rcpp sugar is to bring a subset of the
high-level R syntax to C++. Hence, with Rcpp sugar, the C++ version of foo
now becomes:

    Rcpp::NumericVector foo(Rcpp::NumericVector x,
                            Rcpp::NumericVector y) {
        return ifelse(x < y, x * x, -(y * y));
    }

Apart from being strongly typed and needing an explicit return statement,
the code is now identical between highly vectorised R and C++.

Rcpp sugar is written using expression templates and lazy evaluation
techniques (Abrahams and Gurtovoy, 2004; Vandevoorde and Josuttis, 2003).
This not only allows a much nicer high-level syntax, but also makes it rather
efficient (as we detail in section 4 below).

2. Operators
Rcpp sugar takes advantage of C++ operator overloading. The next few sections
discuss several examples.

2.1. Binary arithmetic operators. Rcpp sugar defines the usual binary
arithmetic operators: +, -, *, /.

    // two numeric vectors of the same size
    // (each statement below is an independent example)
    NumericVector x;
    NumericVector y;

    // expressions involving two vectors
    NumericVector res = x + y;
    NumericVector res = x - y;
    NumericVector res = x * y;
    NumericVector res = x / y;

    // one vector, one single value
    NumericVector res = x + 2.0;
    NumericVector res = 2.0 - x;
    NumericVector res = y * 2.0;
    NumericVector res = 2.0 / y;

    // two expressions
    NumericVector res = x * y + y / 2.0;
    NumericVector res = x * (y - 2.0);
    NumericVector res = x / (y * y);

The left hand side (lhs) and the right hand side (rhs) of each binary
arithmetic expression must be of the same type (for example, both should be
numeric expressions).

The lhs and the rhs can either have the same size, or one of them can be a
primitive value of the appropriate type, for example when adding a
NumericVector and a double.

2.2. Binary logical operators. Binary logical operators create a logical
sugar expression from either two sugar expressions of the same type or one
sugar expression and a primitive value of the associated type.

    // two numeric vectors of the same size
    NumericVector x;
    NumericVector y;

    // expressions involving two vectors

https://cran.r-project.org/package=Rcpp    Rcpp Vignette | November 8, 2019 | 1–6


    LogicalVector res = x < y;
    LogicalVector res = x > y;
    LogicalVector res = x <= y;
    LogicalVector res = x >= y;
    LogicalVector res = x == y;
    LogicalVector res = x != y;

    // one vector, one single value
    LogicalVector res = x < 2;
    LogicalVector res = 2 > x;
    LogicalVector res = y <= 2;
    LogicalVector res = 2 != y;

    // two expressions
    LogicalVector res = (x + y) < (x*x);
    LogicalVector res = (x + y) >= (x*x);
    LogicalVector res = (x + y) == (x*x);

2.3. Unary operators. The unary operator- can be used to negate a (numeric)
sugar expression, whereas the unary operator! negates a logical sugar
expression:

    // a numeric vector
    NumericVector x;

    // negate x
    NumericVector res = -x;

    // use it as part of a numerical expression
    NumericVector res = -x * (x + 2.0);

    // two numeric vectors of the same size
    NumericVector y;
    NumericVector z;

    // negate the logical expression "y < z"
    LogicalVector res = !(y < z);

3. Functions
Rcpp sugar defines functions that closely match the behavior of R functions
of the same name.

3.1. Functions producing a single logical result. Given a logical sugar
expression, the all function identifies whether all the elements are TRUE.
Similarly, the any function identifies whether any of the elements is TRUE
when given a logical sugar expression.

    IntegerVector x = seq_len(1000);
    all(x*x < 3);
    any(x*x < 3);

Either call to all and any creates an object of a class that has member
functions is_true, is_false, is_na and a conversion to SEXP operator.

One important thing to highlight is that all is lazy. Unlike R, there is no
need to fully evaluate the expression. In the example above, the result of
all is fully resolved after evaluating only the first two indices of the
expression x * x < 3. any is lazy too, so it only needs to resolve the first
element of the example above.

3.1.1. Conversion to bool. One important thing to note concerns the
conversion to the bool type. In order to respect the concept of missing
values (NA) in R, expressions generated by any or all cannot be converted to
bool. Instead one must use is_true, is_false or is_na:

    // wrong: will generate a compile error
    bool res = any(x < y);

    // ok
    bool res = is_true(any( x < y ));
    bool res = is_false(any( x < y ));
    bool res = is_na(any( x < y ));

3.2. Functions producing sugar expressions.

3.2.1. is_na. Given a sugar expression of any type, is_na (just like the
other functions in this section) produces a logical sugar expression of the
same length. Each element of the result expression evaluates to TRUE if the
corresponding input is a missing value, or FALSE otherwise.

    IntegerVector x =
        IntegerVector::create(0, 1, NA_INTEGER, 3);

    is_na(x)
    all(is_na( x ))
    any(!is_na( x ))

3.2.2. seq_along. Given a sugar expression of any type, seq_along creates an
integer sugar expression whose values go from 1 to the size of the input.

    IntegerVector x =
        IntegerVector::create( 0, 1, NA_INTEGER, 3 );

    IntegerVector y = seq_along(x);
    IntegerVector z = seq_along(x * x * x * x * x * x);

This is the laziest function, as it only needs to call the size member
function of the input expression. The input expression need not be resolved.
The two examples above give the same result with the same efficiency at
runtime. The compile time will be affected by the complexity of the second
expression, since the abstract syntax tree is built at compile time.

3.2.3. seq_len. seq_len creates an integer sugar expression whose i-th
element expands to i. seq_len is particularly useful in conjunction with
sapply and lapply.

    // 1, 2, ..., 10
    IntegerVector x = seq_len(10);

    List y = lapply(seq_len(10), seq_len);

3.2.4. pmin and pmax. Given two sugar expressions of the same type and size,
or one expression and one primitive value of the appropriate type, pmin
(pmax) generates a sugar expression of the same type whose i-th element
expands to the lowest (highest) value between the i-th element of the first
expression and the i-th element of the second expression.



    IntegerVector x = seq_len(10);

    pmin(x, x*x);
    pmin(x*x, 2);

    pmax(x, x*x);
    pmax(x*x, 2);

3.2.5. ifelse. Given a logical sugar expression and either:

• two compatible sugar expressions (same type, same size)
• one sugar expression and one compatible primitive

ifelse expands to a sugar expression whose i-th element is the i-th element
of the first expression if the i-th element of the condition expands to TRUE,
the i-th element of the second expression if the i-th element of the
condition expands to FALSE, or the appropriate missing value otherwise.

    IntegerVector x;
    IntegerVector y;

    ifelse(x < y, x, (x+y)*y)
    ifelse(x > y, x, 2)

3.2.6. sapply. sapply applies a C++ function to each element of the given
expression to create a new expression. The type of the resulting expression
is deduced by the compiler from the result type of the function.

The function can be a free C++ function such as the overload generated by
the template function below:

    template <typename T>
    T square(const T& x){
        return x * x;
    }
    sapply(seq_len(10), square<int>);

Alternatively, the function can be a functor whose type has a nested type
called result_type:

    template <typename T>
    struct square : std::unary_function<T, T> {
        // std::unary_function supplies the nested result_type; it was
        // removed in C++17, where a plain typedef serves the same purpose
        T operator()(const T& x) const {
            return x * x;
        }
    };
    sapply(seq_len(10), square<int>());

3.2.7. lapply. lapply is similar to sapply except that the result is always
a list expression (an expression of type VECSXP).

3.2.8. sign. Given a numeric or integer expression, sign expands to an
expression whose values are one of 1, 0, -1 or NA, depending on the sign of
the input expression.

    IntegerVector xx;

    sign(xx)
    sign(xx * xx)

3.2.9. diff. The i-th element of the result of diff is the difference
between the (i + 1)-th and the i-th element of the input expression.
Supported types are integer and numeric.

    IntegerVector xx;

    diff(xx)

3.3. Mathematical functions. For the following set of functions, generally
speaking, the i-th element of the result of the given function (say, abs) is
the result of applying that function to the i-th element of the input
expression. Supported types are integer and numeric.

    IntegerVector x;

    abs(x)
    exp(x)
    floor(x)
    ceil(x)
    pow(x, z) // x to the power of z

3.4. The d/q/p/r statistical functions. The framework provided by Rcpp sugar
also permits easy and efficient access to the density, distribution
function, quantile and random number generation functions provided by R in
the Rmath library.

Currently, most of these functions are vectorised over their first argument
(for the r functions, the first argument denotes the number of draws).
Consequently, these calls work in C++ just as they would in R:

    x1 = dnorm(y1, 0, 1); // density of y1 at m=0, sd=1
    x2 = qnorm(y2, 0, 1); // quantiles of y2
    x3 = pnorm(y3, 0, 1); // distribution of y3
    x4 = rnorm(n, 0, 1);  // 'n' RNG draws of N(0, 1)

Similar d/q/p/r functions are provided for the most common distributions:
beta, binom, cauchy, chisq, exp, f, gamma, geom, hyper, lnorm, logis, nbeta,
nbinom, nbinom_mu, nchisq, nf, norm, nt, pois, t, unif, and weibull.

Note that the parameterization used in these sugar functions may differ from
that of the top-level functions exposed in an R session. For example, the
internal rexp is parameterized by scale, whereas the R-level stats::rexp is
parameterized by rate. Consult Distribution Functions for more details on
the parameterization used for these sugar functions.

One point to note is that the programmer using these functions needs to
initialize the state of the random number generator as detailed in Section
6.3 of the ‘Writing R Extensions’ manual (R Core Team, 2018). A nice C++
solution for this is to use a scoped class that sets the random number
generator on entry to a block and resets it on exit. We offer the RNGScope
class which allows code such as

    RcppExport SEXP getRGamma() {
        RNGScope scope;
        NumericVector x = rgamma(10, 1, 1);
        return x;
    }

As there is some computational overhead involved in using RNGScope, we are
not wrapping it around each inner function. Rather, the user of these
functions (i.e. you) should place an RNGScope at the appropriate level of
your code.
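The scoped-class idea can be illustrated without R at all. The sketch below is a hypothetical stand-in, not the RNGScope source: ScopedState, load_state and save_state are invented names, and the two free functions merely count calls where a real extension would read and write back the generator state (GetRNGstate() and PutRNGstate() in the R API). Under those assumptions it shows why one scope object at the outer level is cheaper than one inside every inner function.

```cpp
#include <cassert>

// Hypothetical stand-ins for reading and writing back generator state.
// They only count calls so that the cost of each placement is visible.
static int loads = 0;
static int saves = 0;
inline void load_state() { ++loads; }
inline void save_state() { ++saves; }

// Scoped guard: acquires state on entry to a block, writes it back on exit.
class ScopedState {
public:
    ScopedState() { load_state(); }
    ~ScopedState() { save_state(); }
    ScopedState(const ScopedState&) = delete;
    ScopedState& operator=(const ScopedState&) = delete;
};

// Inner function that guards itself: pays the overhead on every call.
inline double draw_guarded(double x) {
    ScopedState scope;
    return x * 0.5;  // placeholder for an actual random draw
}

// Inner function that relies on the caller to hold a guard.
inline double draw_unguarded(double x) {
    return x * 0.5;  // placeholder for an actual random draw
}

// One guard at the outer level covers all n inner calls.
inline double outer_level(int n) {
    ScopedState scope;
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += draw_unguarded(1.0);
    return total;
}

// A guard inside every inner call: n load/save pairs.
inline double inner_level(int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) total += draw_guarded(1.0);
    return total;
}
```

Calling outer_level(10) triggers a single load/save pair, while inner_level(10) triggers ten, which is the overhead the paragraph above refers to.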



4. Performance
TBD

5. Implementation
This section details some of the techniques used in the implementation of
Rcpp sugar. Note that the user need not be familiar with the implementation
details in order to use Rcpp sugar, so this section can be skipped upon a
first read of the paper.

Writing Rcpp sugar functions is fairly repetitive and follows a
well-structured pattern. So once the basic concepts are mastered (which may
take time given the inherent complexities in template programming), it
should be possible to extend the set of functions further following the
established pattern.

5.1. The curiously recurring template pattern. Expression templates such as
those used by Rcpp sugar use a technique called the Curiously Recurring
Template Pattern (CRTP). The general form of CRTP is:

    // The Curiously Recurring Template Pattern (CRTP)
    template <typename T>
    struct base {
        // ...
    };
    struct derived : base<derived> {
        // ...
    };

The base class is templated by the class that derives from it: derived. This
shifts the relationship between a base class and a derived class as it
allows the base class to access methods of the derived class.

5.2. The VectorBase class. The CRTP is used as the basis for Rcpp sugar with
the VectorBase class template. All sugar expressions derive from one class
generated by the VectorBase template. The current definition of VectorBase
is given here:

    template <int RTYPE, bool na, typename VECTOR>
    class VectorBase {
    public:
        struct r_type :
            traits::integral_constant<int,RTYPE>{};
        struct can_have_na :
            traits::integral_constant<bool,na>{};

        typedef typename
            traits::storage_type<RTYPE>::type
            stored_type;

        VECTOR& get_ref(){
            return static_cast<VECTOR&>(*this);
        }

        inline stored_type operator[](int i) const {
            return static_cast<const VECTOR*>(
                this)->operator[](i);
        }

        inline int size() const {
            return static_cast<const VECTOR*>(
                this)->size();
        }

        /* definition omitted here */
        class iterator;

        inline iterator begin() const {
            return iterator(*this, 0);
        }
        inline iterator end() const {
            return iterator(*this, size());
        }
    };

The VectorBase template has three parameters:

• RTYPE: This controls the type of expression (INTSXP, REALSXP, ...).
• na: This embeds in the derived type information about whether instances
  may contain missing values. Rcpp vector types (IntegerVector, ...) derive
  from VectorBase with this parameter set to true because there is no way to
  know at compile time if the vector will contain missing values at run
  time. However, this parameter is set to false for types generated by sugar
  expressions that are guaranteed to produce no missing values. An example
  is the is_na function. This parameter is used in several places as part of
  the compile-time dispatch to limit the occurrence of redundant operations.
• VECTOR: This parameter is the key of Rcpp sugar. This is the manifestation
  of CRTP. The indexing operator and the size method of VectorBase use a
  static cast of this to the VECTOR type to forward calls to the actual
  method of the derived class.

5.3. Example: sapply. As an example, the current implementation of sapply,
supported by the template class Rcpp::sugar::Sapply, is given below:

    template <int RTYPE, bool NA,
              typename T, typename Function>
    class Sapply : public VectorBase<
        Rcpp::traits::r_sexptype_traits< typename
            ::Rcpp::traits::result_of<Function>::type
        >::rtype,
        true,
        Sapply<RTYPE, NA, T, Function>
    > {
    public:
        typedef typename
            ::Rcpp::traits::result_of<Function>::type
            result_type;

        const static int RESULT_R_TYPE =
            Rcpp::traits::r_sexptype_traits<
                result_type>::rtype;

        typedef Rcpp::VectorBase<RTYPE,NA,T> VEC;

        typedef typename
            Rcpp::traits::r_vector_element_converter<
                RESULT_R_TYPE>::type
            converter_type;



        typedef typename Rcpp::traits::storage_type<
            RESULT_R_TYPE>::type STORAGE;

        Sapply(const VEC& vec_, Function fun_) :
            vec(vec_), fun(fun_){}

        inline STORAGE operator[]( int i ) const {
            return converter_type::get(fun(vec[i]));
        }
        inline int size() const {
            return vec.size();
        }

    private:
        const VEC& vec;
        Function fun;
    };

    // sugar

    template <int RTYPE, bool _NA_,
              typename T, typename Function >
    inline sugar::Sapply<RTYPE, _NA_, T, Function>
    sapply(const Rcpp::VectorBase<RTYPE,_NA_,T>& t,
           Function fun) {

        return
            sugar::Sapply<RTYPE,_NA_,T,Function>(t, fun);
    }

5.3.1. The sapply function. sapply is a template function that takes two
arguments. The first argument is a sugar expression, which we recognize
because of the relationship with the VectorBase class template. The second
argument is the function to apply.

The sapply function itself does not do anything; it is just used to trigger
compiler detection of the template parameters that will be used in the
sugar::Sapply template.

5.3.2. Detection of return type of the function. In order to decide which
kind of expression is built, the Sapply template class queries the template
argument via the Rcpp::traits::result_of template.

    typedef typename
        ::Rcpp::traits::result_of<Function>::type
        result_type;

The result_of type trait is implemented as such:

    template <typename T>
    struct result_of{
        typedef typename T::result_type type;
    };

    template <typename RESULT_TYPE,
              typename INPUT_TYPE>
    struct result_of<RESULT_TYPE (*)(INPUT_TYPE)> {
        typedef RESULT_TYPE type;
    };

The generic definition of result_of targets functors with a nested
result_type type.

The second definition is a partial specialization targeting function
pointers.

5.3.3. Identification of expression type. Based on the result type of the
function, the r_sexptype_traits trait is used to identify the expression
type.

    const static int RESULT_R_TYPE =
        Rcpp::traits::r_sexptype_traits<
            result_type>::rtype;

5.3.4. Converter. The r_vector_element_converter class is used to convert an
object of the function’s result type to the actual storage type suitable for
the sugar expression.

    typedef typename
        Rcpp::traits::r_vector_element_converter<
            RESULT_R_TYPE>::type
        converter_type;

5.3.5. Storage type. The storage_type trait is used to get access to the
storage type associated with a sugar expression type. For example, the
storage type of a REALSXP expression is double.

    typedef typename
        Rcpp::traits::storage_type<RESULT_R_TYPE>::type
        STORAGE;

5.3.6. Input expression base type. The input expression (the expression over
which sapply runs) is also typedef’ed for convenience:

    typedef Rcpp::VectorBase<RTYPE, NA, T> VEC;

5.3.7. Output expression base type. In order to be part of the Rcpp sugar
system, the type generated by the Sapply class template must inherit from
VectorBase.

    template <int RTYPE, bool NA,
              typename T, typename Function>
    class Sapply : public VectorBase<
        Rcpp::traits::r_sexptype_traits<
            typename
            ::Rcpp::traits::result_of<Function>::type
        >::rtype,
        true,
        Sapply<RTYPE,NA,T,Function>
    >

The expression built by Sapply depends on the result type of the function
and may contain missing values; the third argument is the manifestation of
the CRTP.

5.3.8. Constructor. The constructor of the Sapply class template is
straightforward: it simply consists of holding the reference to the input
expression and the function.

    Sapply(const VEC& vec_, Function fun_):
        vec(vec_), fun(fun_){}

    private:
        const VEC& vec;
        Function fun;
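The two result_of definitions from Section 5.3.2 can be tried outside Rcpp. The sketch below copies them into a hypothetical namespace demo (so nothing here is the Rcpp header itself) and checks at compile time which type each definition reports. Halve, twice and apply_one are illustrative names that do not appear in Rcpp.

```cpp
#include <cassert>
#include <type_traits>

namespace demo {

// Generic definition: functors advertise their result via result_type.
template <typename T>
struct result_of {
    typedef typename T::result_type type;
};

// Partial specialization: pointers to unary functions.
template <typename RESULT_TYPE, typename INPUT_TYPE>
struct result_of<RESULT_TYPE (*)(INPUT_TYPE)> {
    typedef RESULT_TYPE type;
};

// Illustrative functor with the nested typedef the generic case looks for.
struct Halve {
    typedef double result_type;
    double operator()(int x) const { return x / 2.0; }
};

// Illustrative free function; passed by value it decays to a pointer.
inline int twice(int x) { return 2 * x; }

// Deduce Function from the argument, then ask the trait for the result
// type, in the spirit of how sapply hands Function to Sapply.
template <typename Function>
typename result_of<Function>::type apply_one(Function fun, int x) {
    return fun(x);
}

static_assert(std::is_same<result_of<int (*)(int)>::type, int>::value,
              "function pointer: the result type is the return type");
static_assert(std::is_same<result_of<Halve>::type, double>::value,
              "functor: the result type is the nested result_type");

}  // namespace demo
```

With these definitions, demo::apply_one(demo::twice, 21) goes through the function-pointer specialization, while demo::apply_one(demo::Halve(), 3) goes through the generic definition.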



5.3.9. Implementation. The indexing operator and the size member function
are what VectorBase expects. The size of the result expression is the same
as the size of the input expression, and the i-th element of the result is
simply retrieved by applying the function and the converter. Both these
methods are inline to maximize performance:

    inline STORAGE operator[](int i) const {
        return converter_type::get(fun(vec[i]));
    }
    inline int size() const {
        return vec.size();
    }
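The pieces of this section can be put together in a standalone miniature written in plain C++. This is a sketch under simplifying assumptions, not Rcpp code: Base, IntVec, Mapped, map_lazy and the call counter are invented for illustration, the SEXP type machinery and the converter are left out, and the element type is fixed to int. It shows the CRTP forwarding and that the function runs only when an element is indexed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CRTP base: forwards indexing and size to the derived expression type.
template <typename Derived>
class Base {
public:
    int operator[](int i) const {
        return static_cast<const Derived*>(this)->operator[](i);
    }
    int size() const {
        return static_cast<const Derived*>(this)->size();
    }
};

// A concrete vector that owns its data (illustrative).
class IntVec : public Base<IntVec> {
public:
    explicit IntVec(std::vector<int> v) : data(std::move(v)) {}
    int operator[](int i) const { return data[static_cast<std::size_t>(i)]; }
    int size() const { return static_cast<int>(data.size()); }
private:
    std::vector<int> data;
};

// Lazy expression: keeps a reference to its input and a copy of the
// function; nothing is computed until operator[] is called.
template <typename T, typename Function>
class Mapped : public Base<Mapped<T, Function> > {
public:
    Mapped(const Base<T>& vec_, Function fun_) : vec(vec_), fun(fun_) {}
    int operator[](int i) const { return fun(vec[i]); }
    int size() const { return vec.size(); }
private:
    const Base<T>& vec;
    Function fun;
};

// Helper whose only job is to let the compiler deduce T and Function.
template <typename T, typename Function>
Mapped<T, Function> map_lazy(const Base<T>& t, Function fun) {
    return Mapped<T, Function>(t, fun);
}

// Counts how many times the function is actually applied.
static int calls = 0;
inline int square_counted(int x) {
    ++calls;
    return x * x;
}
```

Building an expression with map_lazy(v, square_counted) leaves calls at zero; each indexed element then costs exactly one call per nesting level. Because Mapped stores only a reference to its input, the input has to outlive the expression, which is the same constraint implied by the const VEC& member of Sapply.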

6. Summary
TBD

References
Abrahams D, Gurtovoy A (2004). C++ Template Metaprogramming: Concepts,
Tools and Techniques from Boost and Beyond. Addison-Wesley, Boston.
Eddelbuettel D, François R (2011). “Rcpp: Seamless R and C++ Integration.”
Journal of Statistical Software, 40(8), 1–18. URL
http://www.jstatsoft.org/v40/i08/.
Eddelbuettel D, François R, Allaire J, Ushey K, Kou Q, Russell N, Chambers J,
Bates D (2019a). Rcpp: Seamless R and C++ Integration. R package version
1.0.3, URL http://CRAN.R-Project.org/package=Rcpp.
Eddelbuettel D, François R, Bates D, Ni B (2019b). RcppArmadillo: Rcpp
integration for Armadillo templated linear algebra library. R package version
0.9.800.1.0, URL http://CRAN.R-Project.org/package=RcppArmadillo.
R Core Team (2018). Writing R extensions. R Foundation for Statistical
Computing, Vienna, Austria. URL
http://CRAN.R-Project.org/doc/manuals/R-exts.html.
Sanderson C (2010). “Armadillo: An open source C++ Algebra Library for Fast
Prototyping and Computationally Intensive Experiments.” Technical report,
NICTA. URL http://arma.sf.net.
Vandevoorde D, Josuttis NM (2003). C++ Templates: The Complete Guide.
Addison-Wesley, Boston.
