Us and Them: A Study of Privacy Requirements Across North America, Asia, and Europe
ABSTRACT

Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. However, little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America do. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.

Categories and Subject Descriptors

D.2.1 [Software Engineering]: Requirements/Specifications; K.4.1 [Computers and Society]: Public Policy Issues—Privacy

General Terms

Human Factors

Keywords

Human factors in software engineering, requirements engineering, privacy, user developer collaboration, interaction data, empirical studies

1. INTRODUCTION

As systems that collect and use personal data, such as Facebook and Amazon, become more pervasive in our daily lives, users are starting to worry about their privacy. There has been a lot of media coverage about data privacy. One of the earliest articles in the New York Times reported how it was possible to break the anonymity of AOL's search engine's users [7]. A more recent article mentions privacy concerns about Google Glass [29]. Both technical and, especially, non-technical users are finding it increasingly hard to navigate this privacy minefield [21]. This is further exacerbated by well-known systems periodically making changes that breach privacy and not allowing users to opt out a priori [19].

There is a large body of research on privacy in various research communities. This ranges from data anonymization techniques in different domains [13, 23, 35, 44] to novel approaches that make privacy settings more understandable [18, 34]. Recent studies have shown that there is a discrepancy between users' intentions and reality for privacy settings [24, 27]. The assumption behind most of this work is that privacy is well-specified and important. However, there is very little evidence about what exactly the user concerns, priorities, and trade-offs are, and how users think these concerns can be mitigated. In particular, in the software engineering community, there have been no systematic studies to find out what privacy requirements are and how these requirements should be addressed by developers.

This research aims to understand the privacy expectations and needs for modern software systems. To this end, we conducted an online survey. We received 595 responses and selected 408 of them as valid. The responses represented diverse populations, including developers and users, and people from North America, Europe, and Asia. The results of our study show that the biggest privacy concerns are data sharing and data breaches. However, there is disagreement on the best approach to address these concerns. With respect to the types of data that are critical for privacy, respondents are least concerned about metadata and interaction data and most concerned about their personal data and the content of documents. Most respondents are not willing to accept less privacy in exchange for fewer advertisements and financial incentives such as discounts on purchases.

The main contribution of this paper is threefold. First, it illustrates and quantifies the general trends on how users understand privacy and on how they assess different privacy concerns and measures to address them. Second, the paper identifies differences in privacy expectations between various groups: developers versus users and people from different
geographic regions. Finally, the paper gives insights into how software developers and managers can identify, analyze, and address privacy concerns of their users – building a first step towards a software engineering privacy framework.

Our analysis for geographic regions, for example, shows that there is a significant difference between respondents from North America, Europe, and Asia/Pacific. People from Europe and Asia/Pacific rate different types of data, such as metadata, content, and interaction data, as far more critical for privacy than respondents from North America do. People from Europe are a lot more concerned about data breaches than data sharing, whereas people from North America are equally concerned about the two. Similarly, our analysis for developers versus users shows a marked difference between the two groups. For example, developers believe that privacy laws and policies are less effective for reducing privacy concerns than data anonymization.

The rest of the paper is organized as follows. Section 2 describes the design of our study. Sections 3, 4, and 5 highlight its key results. Section 6 discusses the implications of the results and their limitations. Finally, Section 7 describes related work and Section 8 concludes the paper.

2. STUDY DESIGN

We describe the research questions, methods, and respondents of our study.

2.1 Research Questions

There have been many different definitions of privacy over time. One of the earliest was the "right to be left alone", as described by Warren and Brandeis [50]. Solove claims that "privacy is an umbrella term, referring to a wide and disparate group of related things". The author proposes a taxonomy of privacy in the context of harmful activities such as information collection, information processing, information dissemination, and invasion [38]. According to the Merriam-Webster dictionary, privacy is the "freedom from unauthorized intrusion". We are interested specifically in data privacy; other notions of privacy, such as physical privacy, are beyond the scope of our work.

The goal of this study is to gather and analyze privacy requirements for modern software systems. In particular, we want to study the perception of different groups of people on privacy. We focused on the following research questions:

• RQ 1: What are developers' and users' perceptions of privacy? What aspects of privacy are more important and what are the best measures to address them? (Section 3)

• RQ 2: Does software development experience have any impact on privacy requirements? (Section 4)

• RQ 3: Does geography have any impact on privacy requirements? (Section 5)

By perception, we mean the subjective understanding and assessment of privacy aspects. Since privacy is a very broad term, we are interested in specific aspects, in particular: types of concerns, measures to mitigate these concerns, types of data that are critical to privacy, and whether people would give up privacy. We think these aspects are most related to software and requirements engineering concerns.

2.2 Research Method

We designed an online survey with 16 questions, which took 5–10 minutes to answer. Out of the 16 questions, 14 were closed and respondents had to choose an answer from a list of options. The survey also had two open-ended questions. This helped us get qualitative insights about privacy and gave respondents an opportunity to report aspects that were not already included in the closed questions.

We chose a survey instead of observations or interviews for the following reasons. First, surveys are scalable and allow us to get a large number and broad cross-section of responses. Second, we were interested in the subjective opinions of people, which can differ from real behavior. Third, the closed questions were purely quantitative and allowed us to analyze general trends and correlations. We did not aim for a representative report of the opinions. This would have been possible only through a well-defined population and random representative sampling. Instead, we were interested in the priority trends and inter-relations, which can be analyzed through a cross-tabulation of the survey answers.

We used semantic scales for the closed questions, allowing for the measurement of subjective assessments while giving respondents some flexibility of interpretation [37]. For example, one question was: "Would users be willing to use your system if they are worried about privacy issues?" and the answer options were: "Definitely yes — Users don't care about privacy", "Probably yes", "Unsure", "Probably not", and "Definitely not — if there are privacy concerns, users will not use this system". To reduce the complexity of matrix questions (which include multiple answer options) we used a 3-point scale consisting of "Yes", "No", and "Uncertain". When we observed in the dry runs that higher discriminative power was needed, we used a 5-point scale [22].

Respondents could choose to fill out our survey in two languages: English or German. For each language, there were two slightly different versions based on whether the respondents had experience in software development or not. The difference between the versions was only in the phrasing of the questions, in order to reduce confusion. For example, developers were asked: "Would users be willing to use your system if they are worried about privacy issues?" whereas users were asked: "Would you be willing to use the system if you are worried about privacy issues?"

To increase the reliability of the study [37], we took the following measures:

• Pilot Testing: We conducted pilot testing in four iterations with a total of ten users, focused on improving the timing and understandability of the questions. We wanted to reduce ambiguity about the questions and answers and ensure that none of the semantics were lost in translation. We used the feedback from pilot testing to improve the phrasing and the order of questions for the English and German versions.

• Random order of answers: The answer options for the closed questions were randomly ordered. This ensures that answer order does not influence the response [48].

• Validation questions: To ensure that respondents did not fill out the answers arbitrarily, we included two validation questions [3]. For example, one of the validation questions was: "What is the sum of 2 and 5?" Respondents who did not answer these correctly were not included in the final set of valid responses.
• Post sampling: We monitored the number of respondents from each category of interest: developers, users, and geographic location. We conducted post-sampling and stratification to ensure that we got sufficient responses for each category and that the ratio of developers to users for each geographic location was roughly similar. For categories that did not have sufficient respondents, we targeted those populations by posting the survey in specific channels. We stopped data collection when we had a broad spectrum of respondents and sufficient representation in all the categories.

Finally, to corroborate our results, we conducted a number of statistical tests. In particular, we used the Z-test for equality of proportions [40] and Welch's two-sample t-test to check whether our results are statistically significant.

Table 1: Summary of study respondents based on location and software development experience

                   Developers   Users
  North America        85         44
  Europe              116         65
  Asia                 61         30
  South America         3          2
  Africa                2          0

2.3 Survey Respondents

We did not have any restrictions on who could fill out the survey. We wanted, in particular, people with and without software development experience and people from different parts of the world. We distributed our survey through a variety of channels including various mailing lists, social networks like Facebook and Twitter, personal contacts, and colleagues. We circulated the survey across companies with which we are collaborating. We also asked specific people with many contacts (e.g., with many followers on Twitter) to forward the survey. As an incentive, two iPads were raffled among the respondents.

In total, 595 respondents filled out our survey between November 2012 and September 2013. Filtering out the incomplete and invalid responses resulted in 408 valid responses (68.6%). Table 1 shows the respondents based on location and software development experience. The four versions of the survey along with the raw data and summary information are available on our website (http://mobis.informatik.uni-hamburg.de/privacy-requirements/). Among the respondents, 267 have software development experience and 141 do not. Of the respondents with development experience, 28 have less than one year of experience, 129 have 1–5 years, 57 have 5–10 years, and 53 have more than ten years of experience. 129 respondents live in North America, 181 in Europe, and 91 in Asia/Pacific. 166 are affiliated with industry or the public sector, 182 are in academia and research, and 56 are students.

3. PRIVACY PERCEPTIONS

We asked respondents: "How important is the privacy issue in online systems?" They answered using a 5-point semantic scale ranging from "Very important" to "Least important". Two thirds of the respondents chose "Very Important", 25.3% chose "Important", and the remaining three options ("Average", "Less Important", "Least Important") combined were chosen by a total of 8.1% of the respondents.

The location of the data storage was a key concern for the respondents. We asked respondents whether privacy concerns depend on where the data is stored and provided a 5-point semantic scale with the options: "Yes", "Maybe yes", "Unsure", "Maybe not", and "No". 57.7% of the respondents chose "Yes", 28.6% chose "Maybe yes", while only 13.7% of the respondents chose the remaining three options.

On the other hand, there was disagreement about whether users would be willing to use such systems if there were privacy concerns. The answer options were: "Definitely yes — Users don't care about privacy", "Probably yes", "Unsure", "Probably not", and "Definitely not — if there are privacy concerns, users will not use this system". 20.8% of the respondents chose "Unsure", while 34.8% and 29.4% chose "Probably yes" and "Probably not" respectively.

3.1 Factors that Increase and Reduce Privacy Concerns

We asked respondents if the following factors would increase privacy concerns:

• Data Aggregation: The system discovers additional information about the user by aggregating data over a long period of time.
• Data Distortion: The system might misrepresent the data or user intent.
• Data Sharing: The collected data might be given to third parties for purposes like advertising.
• Data Breaches: Malicious users might get access to sensitive data about other users.

For each concern, the respondents could answer using a 3-point semantic scale with the options: "Yes", "Uncertain", and "No". We also asked respondents if the following would help to reduce concerns about privacy:

• Privacy Policy, License Agreements, etc.: Describing what the system will/won't do with the data.
• Privacy Laws: Describing which national law the system is compliant with (e.g., HIPAA in the US, European privacy laws).
• Anonymizing all data: Ensuring that none of the data has any personal identifiers.
• Technical Details: Describing the algorithms/source code of the system in order to achieve higher trust (e.g., encryption of data).
• Details on usage: Describing, e.g., in a table, how different data are used.

Figure 1 shows the overall answers for both questions. In the figure, each answer option is sorted by the number of "Yes" respondents. Most respondents agreed that the biggest privacy concerns are data breaches and data sharing. There is disagreement about whether data distortion and data aggregation would increase privacy concerns. To check if these results are statistically significant, we ran Z-tests for equality of proportions. This would help us validate, for example, if there is a statistically significant difference in the number of respondents who said "Yes" for two different concerns.
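The two-proportion Z-test used here and throughout the paper is available in standard statistics libraries. The following sketch illustrates the test with statsmodels; the counts are hypothetical placeholders for illustration, not the study's raw data (which is available on the website above).

    # Illustrative two-proportion Z-test, as used throughout the paper.
    # The counts below are hypothetical placeholders, NOT the study's data.
    from statsmodels.stats.proportion import proportions_ztest

    n = 408                 # number of valid responses in the study
    yes_breach = 350        # hypothetical "Yes" count for data breaches
    yes_aggregation = 220   # hypothetical "Yes" count for data aggregation

    # H0: the proportion of "Yes" answers is the same for both concerns.
    stat, p_value = proportions_ztest(count=[yes_breach, yes_aggregation],
                                      nobs=[n, n])
    print(f"z = {stat:.3f}, p = {p_value:.3g}")
    # A small p-value indicates a statistically significant difference
    # between the two proportions.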
[Figure 1: Factors that increase privacy concerns (data breach, data sharing, data distortion, data aggregation) and measures that reduce them (anonymization, usage details, laws, policy, technical details). Responses: No / Uncertain / Yes.]
is activated by default in applications (unaware users would leave it so)". One respondent wrote: "Transparency and letting the user choose make a huge difference. Maybe not in the beginning and maybe not for all users but definitely for a specific user group".

Intentional or unintentional misuse: At least seven respondents mentioned different forms of misusing the data as main concerns. This includes commercial misuse, such as making products of interest more expensive, but the data could also be misused for social and political purposes. Apart from abusing the data to put pressure on users, respondents mentioned using fake data to manipulate public opinion or inferring sensitive information about groups of people and minorities. One respondent wrote: "Whenever something happen [sic] the media uses their data accessible online to 'sell' this person as good or evil".

Lack of control: Seven respondents mentioned the lack of control, in particular options to delete data collected about them, as their main concern. One wrote: "if we agree to give the data, we are not able anymore to revise this decision and delete the data. Even if the service confirms the deletion, we don't have any mean of control". Another respondent explicitly mentioned the case where a company owning their data goes bankrupt or is sold, and the privacy of their data is lost with it: "Company A has a decent privacy policy, Company B acquires the company, and in doing so, now has access to Company A's data".

Combined data sources: Five respondents explicitly mentioned combining data about users from different sources as a main privacy concern. In most cases, this cannot be anticipated when developing or using a single system or service. One respondent wrote: "It's difficult to anticipate or assess the privacy risk in this case". Another claimed: "Continuous monitoring combined with aggregation over multiple channels or sources leads to complex user profiling. It's disturbing to know that your life is monitored on so many levels".

Collecting and storing data: Five respondents wrote that collecting and storing data is, on its own, a privacy concern. In particular, respondents complained about too much data being collected about them and stored for too long. One respondent mentioned: "The sheer amount of cookies that are placed on my computer just by landing on their website". Another claimed: "Collecting the data and storing for a long period of time is seen more critical than just collecting".

Other issues: Three respondents mentioned problems with the legal framework, in particular the compatibility of laws between the developer's and the user's countries. Three respondents said that in some cases there is no real option not to use a system or service, e.g., due to "social pressure as all use Facebook" or since "we depend on technology".

3.2.2 Suggestions for Reducing Privacy Concerns

In total, 69 respondents answered the open question on additional measures to reduce user concerns about privacy. Ten of these answers either repeated the standard options or were useless. The remaining 59 comments showed more convergence in opinion than the comments on the additional concerns, possibly because this question was more concrete. The suggestions can be grouped into the following measures:

Easy and fine-grained control over the data, including access and deletion: 17 respondents recommended allowing the users to easily access and control the collected and processed data about them. In particular, respondents mentioned the options of deactivating the collection and deleting the data. One respondent wrote: "To alleviate privacy concerns, it should be possible to opt out of or disagree with certain terms". Another wrote: "Allow users to access a summary of all the data stored on their behalf, and allow them to delete all or part of it if they desire". The respondents also highlighted that this should be simple and easy to do and embedded into the user interface at the data level (a small code sketch of such controls follows at the end of this subsection).

Certification from independent trusted organizations: 14 respondents suggested introducing a privacy certification mechanism run by independent trusted authorities. A few also suggested continuously conducting privacy audits, similar to other fields such as safety and banking. Respondents also suggested that the results of the checks and audits should be made public to increase the pressure on software vendors. One respondent even suggested "having a privacy police to check on how data is handled".

Transparency and risk communication, open source: 13 respondents mentioned increased transparency about the collection, aggregation, and sharing of the data. In particular, respondents mentioned that the risks of misusing the data should also be communicated clearly and continuously. Three respondents suggested that making the code open source would be the best approach to transparency. One wrote: "tell users (maybe in the side-bar) how they are being tracked. This would educate the general public and ideally enable them to take control of their own data". The spectrum of transparency ranged from the data being collected, to the physical safety measures of servers and the qualifications of the people handling data, to how long the data is stored.

Period and amount of data: 11 respondents recommended always limiting and minimizing the amount of data and the period of storage, referring to the principle of minimality. The period of time for which the data is stored seems to be crucial for users. One wrote: "Not allowing users data being stored in servers. Just maintaining them in the source".

Security and encryption: We noted that respondents strongly relate privacy issues to information security. At least seven suggested security measures, mainly complete encryption of data and communication channels.

Trust and education: Seven respondents mentioned building trust in the system and vendor, as well as educating users on privacy, as effective means to reduce privacy concerns.

Short, usable, precise, and understandable description, in the UI: At least six respondents mentioned making the data and the privacy policy easier to access and understand as an important measure to reduce privacy concerns. One wrote: "the disclaimer should be directly accessible from the user interface when conducting a function which needs my data". Another respondent wrote: "short understandable description and no long complicated legal text".
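To make the first suggestion concrete, the sketch below shows one hypothetical shape such data controls could take. The UserDataStore class and its methods are invented for illustration and are not part of the study or of any particular system.

    # Hypothetical sketch of the user-facing data controls respondents asked
    # for: inspecting, deleting, and opting out of data collection.
    # UserDataStore and its methods are invented for illustration.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class UserDataStore:
        user_id: str
        records: dict = field(default_factory=dict)  # category -> list of items
        collection_enabled: bool = True

        def collect(self, category: str, item: object) -> None:
            # Only collect while the user has not opted out.
            if self.collection_enabled:
                self.records.setdefault(category, []).append(item)

        def export_summary(self) -> dict:
            # Let users access a summary of all data stored on their behalf.
            return {category: list(items)
                    for category, items in self.records.items()}

        def delete(self, category: Optional[str] = None) -> None:
            # Delete one category of data, or everything if none is given.
            if category is None:
                self.records.clear()
            else:
                self.records.pop(category, None)

        def opt_out(self) -> None:
            # Deactivate further collection without deleting the account.
            self.collection_enabled = False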
[Figure 2: How critical would you rate the collection of the following data? Responses for content, personal data, location, preferences, interaction, and metadata on a scale from "Very Critical" to "Uncritical".]
3.3 Criticality of Collected Data

We asked respondents how critical they would rate the collection of the following types of data:

• Content of documents (such as an email body)
• Metadata (such as dates)
• Interaction (such as a mouse click to open or send an email)
• User location (such as the city from where the email was sent)
• Name or personal data (such as an email address)
• User preferences (such as inbox or email settings)

The results are shown in Figure 2. Respondents chose content as most critical, followed by personal data, location, and preferences; interaction and metadata are the least critical as far as privacy is concerned.

We used Welch's two-sample t-test to check whether the differences among the types of data are statistically significant. The null hypothesis was that the difference in means is equal to zero. Table 3 summarizes the results. It shows, for example, that there is no statistically significant difference between content and personal data. On the other hand, there is a statistically significant difference between content and location for p < 0.01.

Hypothesis 3: People are more concerned about content and personal data than interaction and metadata.

3.4 Giving up Privacy

We asked respondents if they would accept less privacy in exchange for the following:

• Monetary discounts (e.g., a 10% discount on the next purchase)
• "Intelligent" or added functionality of the system (such as the Amazon recommendations)
• Fewer advertisements

For each option, the respondents could answer using a 3-point semantic scale with the options: "Yes", "Uncertain", and "No". The results are shown in Figure 3.

[Figure 3: Would users accept less privacy for the following?]

36.7% of the respondents said they would accept less privacy for added functionality of the system, while only 20.7% and 13.7% would accept less privacy for monetary discounts and fewer advertisements, respectively. Added functionality seems to be the most important reason to accept less privacy. These results are statistically significant using the Z-test for equality of proportions (p < 3.882e−5 for monetary discounts and p < 1.854e−9 for fewer advertisements). It is important to note that less than half of the respondents would accept less privacy for added functionality of the system.

Previous studies, such as the one conducted by Acquisti et al. [1], have shown, however, that people's economic valuations of privacy vary significantly and that people do accept less privacy for monetary discounts. This contrast in results might be due to a difference between people's opinions and their actual behavior.

Hypothesis 4: There are different groups of opinions about accepting less privacy for certain benefits. The largest group of users say that they are not inclined to give up privacy for additional benefits. However, their actual behavior might be different.

4. DEVELOPER VS USER PERCEPTIONS

The results from the previous section describe the broad concerns for all respondents of our survey. In this section, we report on the important results from a differential analysis between two groups of respondents: developers (267 out of 408) versus users of software systems (141 out of 408). We used Z-tests for equality of proportions for the rest of this section, unless otherwise noted.
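Such a differential analysis amounts to cross-tabulating answers by group and then applying the proportion test from Section 2.2. The sketch below illustrates this with pandas; the response records are invented placeholders, not the study's data.

    # Hypothetical sketch of the cross-tabulation behind the group analysis.
    # The records below are invented placeholders, NOT the study's data.
    import pandas as pd

    responses = pd.DataFrame({
        "group": ["developer", "developer", "user", "user",
                  "developer", "user"],
        "anonymization_helps": ["Yes", "No", "Yes", "Uncertain",
                                "Yes", "No"],
    })

    # Cross-tabulate answers by group, normalized to per-group proportions.
    table = pd.crosstab(responses["group"],
                        responses["anonymization_helps"],
                        normalize="index")
    print(table)
    # The per-group "Yes" proportions can then be compared with the
    # two-proportion Z-test shown in Section 3.1.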
Table 3: The significance of the difference between the criticality of collecting different data. p-values: '+++' for p < e−11, '++' for p < e−6, '+' for p < 0.01, and blank for p > 0.01. The rows and columns are ordered from most to least critical. For each cell, t-tests compare whether the difference in criticality is statistically significant. For example, the difference between interaction and content is statistically significant for p < e−11.

                   Content   Personal Data   Location   Preferences   Interaction   Metadata
  Content             –
  Personal Data                    –
  Location            +            +             –
  Preferences        +++          ++             +            –
  Interaction        +++         +++            ++            +             –
  Metadata           +++         +++           +++           ++             +            –
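The pairwise comparisons in Table 3 use Welch's two-sample t-test, which does not assume equal variances. Below is a minimal sketch with SciPy, assuming the ratings are coded numerically (e.g., 1 = "Uncritical" to 5 = "Very Critical"); the rating vectors are hypothetical placeholders, not the study's data.

    # Illustrative Welch's t-test, as used for the comparisons in Table 3.
    # Ratings are coded 1 ("Uncritical") to 5 ("Very Critical"); the values
    # below are hypothetical placeholders, NOT the study's raw data.
    from scipy import stats

    content_ratings  = [5, 5, 4, 5, 3, 4, 5, 4]   # hypothetical ratings
    metadata_ratings = [2, 3, 1, 2, 3, 2, 1, 2]   # hypothetical ratings

    # Welch's variant: equal_var=False drops the equal-variance assumption.
    # H0: the mean criticality ratings of the two data types are equal.
    t_stat, p_value = stats.ttest_ind(content_ratings, metadata_ratings,
                                      equal_var=False)
    print(f"t = {t_stat:.3f}, p = {p_value:.3g}")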
4.2 Measures to Reduce Concerns

Developers and reducing concerns: A larger fraction of developers (71.2%) feel that data anonymization is a better option for reducing privacy concerns compared to privacy policies or privacy laws (both 56.9%) (p = 0.0006). 66.3% of developers prefer providing details on data usage for mitigating privacy concerns compared to privacy policies (56.9%) (p = 0.03).

Similarly, 20.2% of developers feel that privacy policies will not reduce privacy concerns, whereas only 11.2% feel that providing details on data usage will not be beneficial (p = 0.004).

Users and reducing concerns: In contrast, for users, there is no statistically significant difference between their perceptions of privacy policies, laws, anonymization, and providing usage details (0.6 < p for all combinations).

Hypothesis 6: Developers prefer anonymization and providing usage details as measures to reduce privacy concerns. Users, on the other hand, do not have a strong preference.

Less privacy for added functionality: A larger fraction of respondents in Europe (50.6%) claim that they would not give up privacy for added functionality. In North America, on the other hand, this fraction is 24.1%. The difference between the two regions is statistically significant (p = 0.0001).

Hypothesis 7: People from North America are more willing to give up privacy and feel that different types of data are less critical for privacy compared to people from Europe.

Concerns about data sharing versus data distortion: A larger fraction of respondents in North America (88.9%) feel that data sharing is a concern, compared to 46.3% for data distortion (p = 6.093e−6). On the other hand, there is no statistically significant difference among respondents in Asia/Pacific (p > 0.67).

Concerns about data sharing versus data breach: In Europe, a larger fraction of the respondents (94.3%) are concerned about data breaches as compared to 76.4%
for data sharing. The difference is statistically significant (p = 5.435e−6). On the other hand, there is no statistically significant difference among respondents in North America (p > 0.12).

Laws versus usage details: In Europe, a larger fraction of respondents (75.9%) feel that providing details on how the data is being used will reduce privacy concerns, as opposed to 58.1% who feel that privacy laws will be effective (p = 0.00063). On the other hand, there is no statistically significant difference among respondents in North America, where the percentages of respondents are 67.9% and 64.2% respectively (p > 0.43).

Usage details versus privacy policies: A larger fraction of respondents in Europe (75.9%) feel that providing usage details can mitigate privacy concerns, compared to 63.2% for using a privacy policy (p = 0.015). On the other hand, there is no statistically significant difference among respondents in North America (p > 0.32).

Hypothesis 8: People from Europe feel that providing usage details can be more effective for mitigating privacy concerns than privacy laws and privacy policies, whereas people from North America feel that these three options are equally effective.

6. DISCUSSION

We discuss our results, potential reasons, and the implications for software developers and analysts. We also reflect on the limitations and threats to validity of our results.

6.1 Privacy Interpretation Gaps

Data privacy is often an implicit requirement: everyone talks about it but no one specifies what it means and how it should be implemented. This topic also attracts the interest of different stakeholders including users, lawyers, sales people, and security experts, which makes it even harder to define and implement. One important result from our study is that while almost all respondents agree about the importance of privacy, the understanding of the privacy issues and the measures to reduce privacy concerns are divergent. This calls for an even more careful and differentiated analysis of privacy when designing and building a system.

Our results from Sections 4 and 5 show there is a definite gap in privacy expectations and needs between users and developers and between people from different regions of the world. Developers have assumptions about privacy, which do not always correspond to what users need. Developers seem to be less concerned about data distortion and aggregation compared to users. It seems that developers trust their systems more than users do when it comes to wrong interpretation of privacy-critical data. Unlike users, developers prefer anonymization and providing usage details for mitigating privacy concerns. If the expectations and needs of users do not match those of developers, developers might have wrong assumptions and might end up making wrong decisions when designing and building privacy-aware software systems.

In addition, privacy is not a universal requirement, as it appears to have an internationalization aspect to it. Different regions seem to have different concrete requirements and understandings of privacy. Our results confirm that there exist many cultural differences between various regions of the world as far as privacy is concerned. The recent NSA PRISM scandal has also brought these differences into sharp focus. A majority of Americans considered the NSA's accessing of personal data to prevent terrorist threats more important than privacy concerns [14]. In contrast, there was widespread "outrage" in Europe over these incidents [16]. It also led to an article in the New York Times by Malte Spitz, a member of the German Green Party's executive committee, titled "Germans Loved Obama. Now We Don't Trust Him" [39]. These differences, both in terms of laws and people's perceptions, should be considered carefully when designing and deploying software systems.

We think that privacy should become an explicit requirement, with measurable and testable criteria. We also think that privacy should become a main design criterion for developers, as software systems are collecting more and more data about their users [15]. To this end, we feel that there is a need to develop a standard survey for privacy that software teams can customize and reuse for their projects and users. Our survey can be reused to conduct additional user studies on privacy for specific systems. Our results can also serve as a benchmark for comparing the data. This can help build a body of knowledge and provide guidelines such as best practices.

6.2 The Security Dimension of Privacy

We think that people are more concerned about data breaches and data sharing because there have been many recent instances that have received a lot of media coverage. To list a few recent examples, Sony suffered a massive data breach in its Playstation network that led to the theft of personal information belonging to 77 million users [6]. One hundred and sixty million credit card numbers were stolen and sold from various companies including Citibank, the Nasdaq stock exchange, and Carrefour [36]. The Federal Trade Commission publicly lent support to the "Do-Not-Track" system for advertising [4]. Compared to these high-profile stories, we feel that there have been relatively few "famous" instances of privacy problems caused by data aggregation or data distortion yet.

There is a large body of research that has advanced the state of the art in security (encryption) and authorization. One short-term implication for engineers and managers is to systematically implement security solutions when designing and deploying systems that collect user data, even if it is not a commercially or politically sensitive system. This would significantly and immediately reduce privacy concerns. In the medium term, more research should be conducted on deployable data aggregation and data distortion solutions.

As for measures to mitigate privacy concerns, our results show more disagreement. We believe that the reason for this is that online privacy concerns are a relatively recent phenomenon. Due to this, people are not sure which approach works best and might be beneficial in the long run.

6.3 Privacy Framework

We feel that an important direction for software and requirements engineering researchers is to develop a universal, empirically grounded framework for collecting, analyzing, and implementing privacy requirements. This study is the first building block towards such a framework. Some of the lessons learned from our study can be translated into concrete qualities and features, which should be part of such a framework. This includes:
   • Anonymization: This is perhaps the most well-known                  within our set of respondents, enabled us to identify sta-
     privacy mitigating technique and seems to be perceived              tistically significant relationships and correlations. Hence,
     as an important and effective measure by both users                 many of our results deliberately focus on correlations and
and developers. Developers should therefore use anonymization algorithms and libraries.

• Data usage: Although anonymization is perceived as the most effective measure for addressing privacy concerns, it is currently not practical, as approaches like differential privacy are computationally infeasible [30, 45]. In such situations, it is better to provide users with data usage details and to make these more transparent and easier to understand (see the first sketch after this list). Our findings show that there is no statistical difference between anonymization and providing usage details as far as users are concerned. Thus, in terms of future research, it is perhaps better to focus on improving techniques for providing data usage details rather than, or in addition to, making anonymization computationally feasible.

• Default encryption: As users are mainly concerned about the loss and abuse of their data, systems collecting user data should implement and activate encryption mechanisms for storing and transmitting these data (see the second sketch after this list). In Facebook, e.g., the default communication protocol should be HTTPS and not HTTP.

• Fine-grained control over the data: Users become less concerned about privacy if the system provides a mechanism to control their data. This includes activating and deactivating the collection at any time, the possibility to access and delete the raw and processed data, and defining who should have access to which data (see the fourth sketch after this list).

• Interaction data first: Users have a rather clear preference regarding the criticality of the different types of data collected about them. Therefore, software researchers and designers should first try to build their systems on collecting and mining interaction data instead of the content of files and documents (see the third sketch after this list). Research in this area, especially on recommender systems, has advanced considerably [25].

• Time and space-limited storage: The storage of data about users should be limited in time and space. The location where the data is stored is an important factor for many respondents. Therefore, systems should provide options for choosing the location where privacy-sensitive data is stored (the fourth sketch after this list also illustrates this).

• Privacy policies, laws, and usage details: Users rated all these options as equally effective for mitigating their privacy concerns. Therefore, developers could utilize any of these options, giving them better flexibility in the design and deployment of software systems.
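To make the data-usage guideline concrete, the first sketch below shows a machine-readable usage summary that a system could expose and render for its users. It is a minimal illustration in Python; the dictionary layout, field names, and example entries are our own assumptions, not taken from any existing system.

# Sketch 1: a minimal machine-readable data-usage summary.
# All names and values below are illustrative assumptions.
import json

DATA_USAGE = {
    "location": {"purpose": "show nearby friends",
                 "shared_with": [],
                 "retention_days": 30},
    "search_queries": {"purpose": "personalize results",
                       "shared_with": ["advertising partners"],
                       "retention_days": 180},
}

def usage_details(data_type):
    """Render one entry as a short, user-readable explanation."""
    entry = DATA_USAGE[data_type]
    shared = ", ".join(entry["shared_with"]) or "no one"
    return ("We use your %s to %s; it is shared with %s and deleted after %d days."
            % (data_type, entry["purpose"], shared, entry["retention_days"]))

print(usage_details("location"))
print(json.dumps(DATA_USAGE, indent=2))  # the machine-readable form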
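The second sketch illustrates the default-encryption guideline for data at rest, assuming the third-party Python package cryptography (installable via pip) is available; key handling is deliberately simplified here. For data in transit, the analogous default is to serve every endpoint over HTTPS only.

# Sketch 2: encrypting user data at rest with the "cryptography" package.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in production, load the key from a key store
cipher = Fernet(key)

def store(record):
    """Encrypt a record before writing it to disk or a database."""
    return cipher.encrypt(record)

def load(token):
    """Decrypt a previously stored record."""
    return cipher.decrypt(token)

token = store(b"user=42; location=52.52,13.40")
assert load(token) == b"user=42; location=52.52,13.40"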
                                                                         and what they should be and how settings evolve over time.
6.4  Limitations and Threats to Validity
There are several limitations to our study, which we discuss in this section. The first limitation is a potential selection bias. Respondents who volunteered to fill out our survey were self-selected. Such selection bias implies that our results are only applicable to the volunteering population and may not necessarily generalize to other populations. The summaries have helped us identify certain trends and hypotheses, and these should be validated and tested with representative samples, e.g., for certain countries. In contrast, the differential analysis (also called pseudo-experimentation) was conducted via cross-tabulations between different populations.

As for internal validity, we are aware that by filling out a brief survey, we can only understand a limited amount of the concerns that the respondents have in mind. Similarly, the format and questions of the survey might constrain the expressiveness of some of the respondents. We might have missed certain privacy concerns, and measures to reduce them, through the design of the survey. We tried to mitigate this risk by providing open-ended questions that respondents could use to express additional aspects they had in mind. Moreover, we designed the survey in a highly iterative process and tested it in dry runs to ensure that all options are understandable and that we did not miss any obvious option.

As with any online survey, there is a possibility that respondents did not fully understand the questions or chose the response options arbitrarily. We conducted several pilot tests, gave the option to input comments, and the incompletion rate is relatively small. We included a few validation questions, and we only report responses in this paper from respondents who answered these questions correctly. We also provided two versions of the survey, in English and German, to make it easier for non-native speakers.

In spite of these limitations, we managed to get a large and diverse population that filled out our survey. This gives us confidence about the overall trends reported in this paper.

7.  RELATED WORK
There has been a lot of research about privacy and security in different research communities. We summarize the important related work, focusing on usability and economic aspects of privacy, anonymization techniques, and work from the software and requirements engineering communities.

Many recent studies on online social networks show that there is a (typically, large) discrepancy between users' intentions for what their privacy settings should be versus what they actually are. For example, Madejski et al. [27] report in their study of Facebook that 94% of their participants (n = 65) were sharing something they intended to hide and 85% were hiding something that they intended to share. Liu et al. [24] found that Facebook users' privacy settings match their expectations only 37% of the time. A recent longitudinal study by Stutzman et al. [42] shows how the privacy settings of Facebook users have evolved over a period of time. These studies have focused on privacy settings in a specific online system, whereas our study was designed to be agnostic to any modern system collecting sensitive user data. Further, the main contribution of these studies is to show that there is a discrepancy between what the settings are and what they should be, and how settings evolve over time. Our study aims to gain a deeper understanding of what the requirements are and how they change across geography and depending on software development experience.

Fang and LeFevre [18] proposed an automated technique for configuring a user's privacy settings in online social networking sites. Paul et al. [34] present a color-coding scheme for making privacy settings more usable. Squicciarini, Shehab, and Paci [41] propose a game-theoretic approach for collaborative sharing and control of images in a social network. Toubiana et al. [46] present a system that automatically applies users' privacy settings for photo tagging.
All these papers propose new approaches to make privacy settings “better” from a user's perspective (i.e., more usable and more visible). Our results help development teams decide when and which of these techniques should be implemented. We focus on a broader requirements and engineering perspective on privacy rather than on a specific technical perspective.

There has been a lot of recent work on the economic ramifications of privacy. For example, Acquisti et al. [1] (and the references therein) conducted a number of field and online experiments to investigate the economic valuations of privacy. In Section 3.4, we discussed whether users would give up privacy for additional benefits like discounts or fewer advertisements. Our study complements and contrasts the work of Acquisti et al. as described earlier.
There has also been a lot of work about data anonymization and building accurate data models for statistical use (e.g., [2, 17, 23, 35, 49]). These techniques aim to preserve certain properties of the data (e.g., statistical properties like the average) so that they can be useful in data mining while trying to preserve the privacy of individual records. Similarly, there has also been work on anonymizing social networks [8] and anonymizing user profiles for personalized web search [52]. The broad approaches include aggregating data to a higher level of granularity or adding noise and random perturbations. There has been research on breaking the anonymity of data as well. Narayanan and Shmatikov [32] show how it is possible to correlate public IMDb data with private anonymized Netflix movie rating data, resulting in the potential identification of the anonymized individuals. Backstrom et al. [5] and Wondracek et al. [51] describe a series of attacks for de-anonymizing social networks.
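As a minimal illustration of the two broad approaches just named, the following sketch coarsens an attribute to a higher granularity and perturbs a value with random noise; the bucket size and noise scale are arbitrary choices of ours, not taken from the cited work.

# A minimal sketch of aggregation and random perturbation.
import random

def coarsen_age(age, bucket=10):
    """Aggregate to a higher level of granularity: 37 -> '30-39'."""
    low = (age // bucket) * bucket
    return "%d-%d" % (low, low + bucket - 1)

def perturb(value, scale=1.0):
    """Add Gaussian noise: the individual record is masked while
    aggregates such as the mean are roughly preserved."""
    return value + random.gauss(0.0, scale)

ages = [23, 37, 41, 58]
print([coarsen_age(a) for a in ages])             # ['20-29', '30-39', '40-49', '50-59']
print(sum(perturb(a) for a in ages) / len(ages))  # close to the true mean of 39.75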
Also in the software engineering community, recent papers on privacy have mainly focused on data anonymization techniques. Clause and Orso [13] propose techniques for the automated anonymization of field data for software testing. They extend the work done by Castro et al. [12] using novel concepts of path condition relaxation and breakable input conditions, resulting in improved effectiveness of input anonymization. Taneja et al. [44] and Grechanik et al. [20] propose using k-anonymity [43] for privacy by selectively anonymizing certain attributes of a database for software testing. They propose novel approaches using static analysis for selecting which attributes to anonymize so that test coverage remains high. Our work complements these papers: respondents in our study considered anonymization an effective technique for mitigating privacy concerns, and these techniques could be used as part of a privacy framework.
There have been some recent papers on extracting privacy requirements from privacy regulations and laws [9, 10]. These could be part of the privacy framework as well and help in reducing the impact of cultural differences on privacy. While this work focuses on legal requirements, we focus on users' understanding of privacy and how it differs from developers' views. A few recent papers have also discussed privacy requirements, mainly in the context of mobile applications. Mancini et al. [28] conducted a field study to evaluate the impact of privacy and location tracking on social relationships. Tun et al. [47] introduce a novel approach called “privacy arguments” and use it to represent and analyze privacy requirements in mobile applications. Omoronyia et al. [33] propose an adaptive framework using privacy-aware requirements that satisfy runtime privacy properties. Our focus is broader than the first two papers, as we do not limit our scope to mobile applications; nonetheless, many of our findings would apply directly. Our work is complementary to the last paper, where our findings could be used as part of the adaptive framework.

Finally, many authors in the software engineering and requirements engineering communities mention privacy in the discussion or challenges sections of their papers (e.g., [11, 25, 26, 31]). But in most cases, there is little evidence and grounded theory about what, how, and in which context privacy concerns exist and what the best measures for addressing them are. Our study helps in clarifying these concerns and measures as well as comparing the different perceptions of people.

8.  CONCLUSION
In this paper, we conducted a study to explore the privacy requirements of users and developers in modern software systems, such as Amazon and Facebook, that collect and store data about the user. Our study consisted of 408 valid responses representing a broad spectrum of respondents: people with and without software development experience and people from North America, Europe, and Asia. While the broad majority of respondents (more than 91%) agreed about the importance of privacy as a main issue for modern software systems, there was disagreement concerning the concrete importance of different privacy concerns and the measures to address them. The biggest concerns about privacy were data breaches and data sharing. Users were more concerned about data aggregation and data distortion than developers. As for mitigating privacy concerns, there was little consensus on the best measure among users. In terms of data criticality, respondents rated the content of documents and personal data as most critical, and metadata and interaction data as least critical.

We also identified differences in privacy perceptions based on the geographic location of the respondents. Respondents from North America, for example, consider all types of data as less critical for privacy than respondents from Europe or Asia/Pacific. Respondents from Europe are more concerned about data breaches than data sharing, whereas respondents from North America are equally concerned about the two.

Finally, we gave some insight into a framework and a set of guidelines on privacy requirements for developers when designing and building software systems. This is an important direction for future research, and our results can help establish such a framework, which can be a catalog of privacy concerns and measures, a questionnaire to assess and fine-tune them, and perhaps a library of reusable privacy components.

9.  ACKNOWLEDGMENTS
We would like to thank all the respondents for filling out our online survey. We would also like to thank Timo Johann, Mathias Ellman, Zijad Kurtanovic, Rebecca Tiarks, and Zardosht Hodaie for help with the translations. Sheth and Kaiser are members of the Programming Systems Laboratory and are funded in part by NSF CCF-1302269, NSF CCF-1161079, NSF CNS-0905246, and NIH 2 U54 CA121852-06. This work is a part of the EU research project MUSES (grant FP7-318508).
10.  REFERENCES
[1] A. Acquisti, L. John, and G. Loewenstein. What is privacy worth? In Workshop on Information Systems and Economics (WISE), 2009.
[2] D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In PODS '01: Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 247–255, New York, NY, USA, 2001. ACM.
[3] T. Anderson and H. Kanuka. E-research: Methods, strategies, and issues. 2003.
[4] J. Angwin and J. Valentino-Devries. FTC Backs Do-Not-Track System for Web. http://online.wsj.com/article/SB10001424052748704594804575648670826747094.html, December 2010.
[5] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 181–190, New York, NY, USA, 2007. ACM.
[6] L. B. Baker and J. Finkle. Sony PlayStation suffers massive data breach. http://www.reuters.com/article/2011/04/26/us-sony-stoldendata-idUSTRE73P6WB20110426, April 2011.
[7] M. Barbaro, T. Zeller, and S. Hansell. A face is exposed for AOL searcher no. 4417749. http://www.nytimes.com/2006/08/09/technology/09aol.html?_r=1, August 2006.
[8] S. Bhagat, G. Cormode, B. Krishnamurthy, and D. Srivastava. Privacy in dynamic social networks. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 1059–1060, New York, NY, USA, 2010. ACM.
[9] T. D. Breaux and A. I. Anton. Analyzing regulatory rules for privacy and security requirements. IEEE Transactions on Software Engineering, 34(1):5–20, 2008.
[10] T. D. Breaux and A. Rao. Formal analysis of privacy requirements specifications for multi-tier applications. In RE '13: Proceedings of the 21st IEEE International Requirements Engineering Conference, Washington, DC, USA, July 2013. IEEE Society Press.
[11] R. P. L. Buse and T. Zimmermann. Information needs for software development analytics. In Proceedings of the 2012 International Conference on Software Engineering, ICSE 2012, pages 987–996, Piscataway, NJ, USA, 2012. IEEE Press.
[12] M. Castro, M. Costa, and J.-P. Martin. Better bug reporting with better privacy. In Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, ASPLOS XIII, pages 319–328, New York, NY, USA, 2008. ACM.
[13] J. Clause and A. Orso. Camouflage: automated anonymization of field data. In Proceedings of the 33rd international conference on Software engineering, ICSE '11, pages 21–30, New York, NY, USA, 2011. ACM.
[14] J. Cohen. Most Americans back NSA tracking phone records, prioritize probes over privacy. http://www.washingtonpost.com/politics/most-americans-support-nsa-tracking-phone-records-prioritize-investigations-over-privacy/2013/06/10/51e721d6-d204-11e2-9f1a-1a7cdee20287_story.html, June 2013.
[15] L. F. Cranor and N. Sadeh. A shortage of privacy engineers. IEEE Security & Privacy, 11(2):77–79, 2013.
[16] S. Erlanger. Outrage in Europe Grows Over Spying Disclosures. http://www.nytimes.com/2013/07/02/world/europe/france-and-germany-piqued-over-spying-scandal.html, July 2013.
[17] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 211–222, New York, NY, USA, 2003. ACM.
[18] L. Fang and K. LeFevre. Privacy wizards for social networking sites. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 351–360, New York, NY, USA, 2010. ACM.
[19] D. Fletcher. How Facebook Is Redefining Privacy. http://www.time.com/time/business/article/0,8599,1990582.html, May 2010.
[20] M. Grechanik, C. Csallner, C. Fu, and Q. Xie. Is data privacy always good for software testing? In International Symposium on Software Reliability Engineering, pages 368–377, 2010.
[21] S. Grobart. The Facebook Scare That Wasn't. http://gadgetwise.blogs.nytimes.com/2011/08/10/the-facebook-scare-that-wasnt/, August 2011.
[22] J. Jacoby and M. S. Matell. Three-point Likert scales are good enough. Journal of Marketing Research, 8(4):495–500, 1971.
[23] N. Lathia, S. Hailes, and L. Capra. Private distributed collaborative filtering using estimated concordance measures. In RecSys '07: Proceedings of the 2007 ACM conference on Recommender systems, pages 1–8, New York, NY, USA, 2007. ACM.
[24] Y. Liu, K. P. Gummadi, B. Krishnamurthy, and A. Mislove. Analyzing Facebook privacy settings: user expectations vs. reality. In Proceedings of the 2011 SIGCOMM Conference on Internet Measurement, pages 61–70, 2011.
[25] W. Maalej, T. Fritz, and R. Robbes. Collecting and processing interaction data for recommendation systems. In M. Robillard, W. Maalej, R. Walker, and T. Zimmermann, editors, Recommendation Systems in Software Engineering, pages 173–197. Springer, 2014.
[26] W. Maalej and D. Pagano. On the socialness of software. In Proceedings of the International Conference on Social Computing and its Applications. IEEE Computer Society, 2011.
[27] M. Madejski, M. Johnson, and S. M. Bellovin. A study of privacy settings errors in an online social network. In IEEE International Conference on Pervasive Computing and Communications Workshops, pages 340–345, 2012.
[28] C. Mancini, Y. Rogers, K. Thomas, A. N. Joinson, B. A. Price, A. K. Bandara, L. Jedrzejczyk, and B. Nuseibeh. In the best families: Tracking and relationships. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '11, pages 2419–2428, New York, NY, USA, 2011. ACM.
[29] C. C. Miller. Privacy Officials Worldwide Press Google About Glass. http://bits.blogs.nytimes.com/2013/06/19/privacy-officials-worldwide-press-google-about-glass/, June 2013.
[30] I. Mironov, O. Pandey, O. Reingold, and S. Vadhan. Computational differential privacy. In Advances in Cryptology – CRYPTO 2009, pages 126–142. Springer, 2009.
[31] H. Muccini, A. Di Francesco, and P. Esposito. Software testing of mobile applications: Challenges and future research directions. In 7th International Workshop on Automation of Software Test (AST), pages 29–35, June 2012.
[32] A. Narayanan and V. Shmatikov. How to break anonymity of the Netflix prize dataset. CoRR, abs/cs/0610105, 2006.
[33] I. Omoronyia, L. Cavallaro, M. Salehie, L. Pasquale, and B. Nuseibeh. Engineering adaptive privacy: On the role of privacy awareness requirements. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 632–641, Piscataway, NJ, USA, 2013. IEEE Press.
[34] T. Paul, M. Stopczynski, D. Puscher, M. Volkamer, and T. Strufe. C4PS: colors for privacy settings. In Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, pages 585–586, New York, NY, USA, 2012. ACM.
[35] H. Polat and W. Du. Privacy-preserving collaborative filtering using randomized perturbation techniques. In Third IEEE International Conference on Data Mining (ICDM 2003), pages 625–628, Nov. 2003.
[36] N. Popper and S. Sengupta. U.S. Says Ring Stole 160 Million Credit Card Numbers. http://dealbook.nytimes.com/2013/07/25/arrests-planned-in-hacking-of-financial-companies/, July 2013.
[37] R. L. Rosnow and R. Rosenthal. Beginning behavioral research: A conceptual primer. Prentice-Hall, Inc., 1996.
[38] D. J. Solove. A Taxonomy of Privacy. University of Pennsylvania Law Review, pages 477–564, 2006.
[39] M. Spitz. Germans Loved Obama. Now We Don't Trust Him. http://www.nytimes.com/2013/06/30/opinion/sunday/germans-loved-obama-now-we-dont-trust-him.html, June 2013.
[40] R. C. Sprinthall and S. T. Fisk. Basic statistical analysis. Prentice Hall, Englewood Cliffs, NJ, 1990.
[41] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy management in social networks. In Proceedings of the 18th international conference on World wide web, WWW '09, pages 521–530, New York, NY, USA, 2009. ACM.
[42] F. Stutzman, R. Gross, and A. Acquisti. Silent listeners: The evolution of privacy and disclosure on Facebook. Journal of Privacy and Confidentiality, 4(2):2, 2013.
[43] L. Sweeney. k-anonymity: a model for protecting privacy. Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557–570, 2002.
[44] K. Taneja, M. Grechanik, R. Ghani, and T. Xie. Testing software in age of data privacy: a balancing act. In Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering, SIGSOFT/FSE '11, pages 201–211, New York, NY, USA, 2011. ACM.
[45] C. Task and C. Clifton. A guide to differential privacy theory in social network analysis. In Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pages 411–417. IEEE Computer Society, 2012.
[46] V. Toubiana, V. Verdot, B. Christophe, and M. Boussard. Photo-tape: user privacy preferences in photo tagging. In Proceedings of the 21st international conference companion on World Wide Web, WWW '12 Companion, pages 617–618, New York, NY, USA, 2012. ACM.
[47] T. T. Tun, A. Bandara, B. Price, Y. Yu, C. Haley, I. Omoronyia, and B. Nuseibeh. Privacy arguments: Analysing selective disclosure requirements for mobile applications. In 20th IEEE International Requirements Engineering Conference (RE), pages 131–140, Sept. 2012.
[48] T. L. Tuten, D. J. Urban, and M. Bosnjak. Internet surveys and data quality: A review. Online Social Sciences, page 7, 2000.
[49] V. S. Verykios, E. Bertino, I. N. Fovino, L. P. Provenza, Y. Saygin, and Y. Theodoridis. State-of-the-art in privacy preserving data mining. SIGMOD Record, 33(1):50–57, 2004.
[50] S. D. Warren and L. D. Brandeis. The Right to Privacy. Harvard Law Review, pages 193–220, 1890.
[51] G. Wondracek, T. Holz, E. Kirda, and C. Kruegel. A practical attack to de-anonymize social network users. In 2010 IEEE Symposium on Security and Privacy (SP), pages 223–238, 2010.
[52] Y. Zhu, L. Xiong, and C. Verdery. Anonymizing user profiles for personalized web search. In Proceedings of the 19th international conference on World wide web, WWW '10, pages 1225–1226, New York, NY, USA, 2010. ACM.