5212-1693457982871-NEW - Unit 16 - CRP-SEM3 - Proposal 2023 Big Data (AutoRecovered)

The document provides guidance for a research project proposal on big data. It outlines the submission format, learning outcomes, and gives an overview of big data as the topic area. Students are asked to examine appropriate research methodologies and approaches for their big data proposal.

Higher Nationals

Internal verification of assessment decisions – BTEC (RQF)

INTERNAL VERIFICATION – ASSESSMENT DECISIONS

Programme title: BTEC Higher National Diploma in Computing

Assessor:
Internal Verifier:

Unit(s): Unit 16: Computing Research Project (Pearson Set)

Assignment title: Research Proposal – Big Data

Student’s name:

List which assessment criteria the Assessor has awarded: Pass / Merit / Distinction

INTERNAL VERIFIER CHECKLIST

Do the assessment criteria awarded match those shown in the assignment brief? Y/N

Is the Pass/Merit/Distinction grade awarded justified by the assessor’s comments on the student work? Y/N

Has the work been assessed accurately? Y/N

Is the feedback to the student (give details):
• Constructive? Y/N
• Linked to relevant assessment criteria? Y/N
• Identifying opportunities for improved performance? Y/N
• Agreeing actions? Y/N

Does the assessment decision need amending? Y/N

Assessor signature: Date:

Internal Verifier signature: Date:

Programme Leader signature (if required): Date:

Confirm action completed

Remedial action taken

Give details:

Internal Verifier signature: Date:
Programme Leader signature (if required): Date:

Higher Nationals - Summative Assignment Feedback Form

Student Name/ID: E140891
Unit Title: Unit 16: Computing Research Project (Pearson Set)
Assignment Number:
Assessor: Mr. Lasitha Ranawaka
Submission Date:
Date Received 1st submission:
Re-submission Date:
Date Received 2nd submission:
Assessor Feedback:

LO1 Examine appropriate research methodologies and approaches as part of the research process

Pass, Merit & Distinction Descripts: P1 ☐ P2 ☐ M1 ☐ D1 ☐

Grade: Assessor Signature: Date:


Resubmission Feedback:

Grade: Assessor Signature: Date:


Internal Verifier’s Comments:

Signature & Date:


* Please note that grade decisions are provisional. They are only confirmed once internal and external moderation have taken place and
grade decisions have been agreed at the assessment board.

Assignment Feedback

Formative Feedback: Assessor to Student

Action Plan

Summative feedback

Feedback: Student to Assessor


Assessor signature: Date:

Student signature: Date:

Pearson Higher Nationals in Computing
Unit 16: Computing Research Project (Pearson Set)
Research Project Proposal

General Guidelines

1. A cover page or title page – You should always attach a title page to your assignment. Use the
previous page as your cover sheet and make sure all the details are accurately filled in.
2. Attach this brief as the first section of your assignment.
3. All assignments should be prepared using word processing software.
4. All assignments should be printed on A4-sized paper. Use single-sided printing.
5. Allow 1” for the top, bottom, and right margins and 1.25” for the left margin of each page.

Word Processing Rules

1. The font size should be 12 point and the font should be Times New Roman.
2. Use 1.5 line spacing. Left justify all paragraphs.
3. Ensure that all the headings are consistent in terms of the font size and font style.
4. Use the footer function in the word processor to insert your name, subject, assignment number,
and page number on each page. This is useful if individual sheets become detached for any
reason.
5. Use the word processor’s spell check and grammar check functions to help edit your
assignment.

Important Points:

1. It is strictly prohibited to use text boxes to add text in the assignments, except for
compulsory information, e.g. figures, tables of comparison, etc. Adding text boxes in the body
other than for the aforementioned compulsory information will result in rejection of your work.
2. Carefully check the hand in date and the instructions given in the assignment. Late
submissions will not be accepted.
3. Ensure that you give yourself enough time to complete the assignment by the due date.
4. Excuses of any nature will not be accepted for failure to hand in the work on time.
5. You must take responsibility for managing your own time effectively.
6. If you are unable to hand in your assignment on time and have valid reasons such as illness,
you may apply (in writing) for an extension.
7. Failure to achieve at least PASS criteria will result in a REFERRAL grade.
8. Non-submission of work without valid reasons will lead to an automatic REFERRAL. You
will then be asked to complete an alternative assignment.
9. If you use other people’s work or ideas in your assignment, reference them properly using the
Harvard referencing system to avoid plagiarism. You have to provide both in-text
citations and a reference list.
10. If you are proven to be guilty of plagiarism or any academic misconduct, your grade could be
reduced to a REFERRAL or, at worst, you could be expelled from the course.

Student Declaration

I hereby declare that I know what plagiarism entails, namely to use another’s work and to present it
as my own without attributing the sources in the correct way. I further understand what it means to
copy another’s work.

1. I know that plagiarism is a punishable offence because it constitutes theft.
2. I understand the plagiarism and copying policy of Pearson UK.
3. I know what the consequences will be if I plagiarise or copy another’s work in any of the
assignments for this program.
4. I declare therefore that all work presented by me for every aspect of my program will be my
own, and where I have made use of another’s work, I will attribute the source in the correct way.
5. I acknowledge that the attachment of this document, signed or not, constitutes a binding
agreement between myself and Pearson UK.
6. I understand that my assignment will not be considered as submitted if this document is not
attached to the assignment.

Student’s Signature: [email protected] (Provide E-mail ID)
Date: 4/1/2024 (Provide Submission Date)

Assignment Brief
Student Name /ID Number E140891

Unit Number and Title Unit 16: Computing Research Project (Pearson Set)

Academic Year 2024

Unit Tutor 16

Assignment Title Final Research Project Proposal - Big Data

Issue Date

Submission Date

IV Name & Date

Submission Format:
Research Project Proposal

• The submission is in the form of an individual written report.
• This should be written in a concise, formal business style using single spacing and font size 12.
• You are required to make use of headings, paragraphs and subsections as appropriate, and all
work must be supported with research.
• Reference using the Harvard referencing system.
• Please provide a reference list using the Harvard referencing system.
• The recommended minimum length is 2000 words.

Unit Learning Outcomes:

LO1. Examine appropriate research methodologies and approaches as part of the research
process.

Assignment Brief and Guidance:

Big Data
Big data is a term that has become more and more common over the last decade. It was originally
defined as data that is generated in incredibly large volumes, such as internet search queries, data
from weather sensors or information posted on social media. Today big data has also come to
represent large amounts of information generated from multiple sources that cannot be processed
in a conventional way and that cannot be processed by humans without some form of
computational intervention.
Big data can be stored in several ways: structured, whereby the data is organised into some form of
relational format; unstructured, where data is held as raw, unorganised data prior to being turned into a
structured form; or semi-structured, where the data will have some key definitions or structural
form but is still held in a format that does not conform to standard data storage models.
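
As a rough illustration of the three storage forms described above, the following Python sketch holds the same hypothetical weather-sensor reading in each form. The table name, field names, values, and the regular expression are all assumptions made for this example; they are not part of the brief.

```python
import json
import re
import sqlite3

# Structured: the reading fits a fixed relational schema (hypothetical table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, temp_c REAL, taken_at TEXT)")
conn.execute("INSERT INTO readings VALUES (?, ?, ?)", ("S1", 21.5, "2024-01-04T09:00"))
structured = conn.execute("SELECT sensor_id, temp_c FROM readings").fetchall()

# Semi-structured: keys give some structure, but records need not share a schema.
semi_structured = json.loads(
    '{"sensor_id": "S1", "temp_c": 21.5, "notes": {"battery": "low"}}'
)

# Unstructured: raw text that must be parsed before it can be queried.
unstructured = "Sensor S1 reported 21.5 degrees at 9am; battery looks low."
match = re.search(r"(\d+(?:\.\d+)?) degrees", unstructured)
extracted_temp = float(match.group(1)) if match else None

print(structured)                 # rows returned from the relational table
print(semi_structured["temp_c"])  # value looked up by key
print(extracted_temp)             # value recovered by pattern matching
```

The point of the comparison is the amount of work needed before the value can be used: a query for the structured form, a key lookup for the semi-structured form, and ad hoc parsing for the unstructured form.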

Many systems and organisations now generate massive quantities of big data on a daily basis, with
some of this data being made publicly available to other systems for analysis and processing. The
generation of such large amounts of data has necessitated the development of machine learning
systems that can sift through the data to rapidly identify patterns, to answer questions or to solve
problems. As these new systems continue to be developed and refined, a new discipline of data
science analytics has evolved to help design, build and test these new machine learning and
artificial intelligence systems.

Utilising Big Data requires a range of knowledge and skills across a broad spectrum of areas and
consequently opens opportunities to organisations that were not previously accessible. The ability
to store and process large quantities of data from multiple sources has meant that organisations and
businesses are able to get a larger overall picture of the pattern of global trends in the data to allow
them to make more accurate and up to date decisions. Such data can be used to identify potential
business risks earlier and to make sure that costs are minimised without compromising on
innovation.

However, the rapid application and use of Big Data has raised several concerns. The storage of
such large amounts of data means that security concerns need to be addressed in case the data is
compromised or altered in such a way to make the interpretation erroneous. In addition, the ethical
issues of the storage of personal data from multiple sources have yet to be addressed, as well as
any sustainability concerns in the energy requirements of large data warehouses and lakes.

The theme will enable students to explore some of the topics concerned with Big Data from the
standpoint of a prospective computing professional or data scientist. It will provide the opportunity
for students to investigate the applications, benefits and limitations of Big Data while exploring the
responsibilities and solutions to the problems it is being used to solve.
Choosing a research objective/question
Students are to choose their own research topic for this unit. Strong research projects are those with
clear, well-focused and defined objectives. A central research skill is the
ability to select a suitable and focused research objective. One of the best ways to do this is to put
it in the form of a question. Students should be encouraged by tutors to discuss a variety of topics
related to the theme to generate ideas for a good research objective.

The range of topics discussed on Big Data could cover the following areas:
• Storage models
• Cyber security risks
• Future developments and driving innovation
• Legal and ethical trade-offs

The project proposal should cover the following areas:

1. Definition of research problem or question. (This can be stated as a research question,
objectives, or hypothesis.)
2. Provide a literature review giving the background and conceptualisation of the proposed area
of study. (This would provide existing knowledge and benchmarks by which the data can be
judged)
3. Examine and critically evaluate research methodologies and research processes available.
Select the most suitable methodologies and the process and justify your choice based on
theoretical/philosophical frameworks. Demonstrate understanding of the pitfalls and
limitations of the methods chosen and ethical issues that might arise.
4. Draw points (1–3, above) together into a research proposal by getting agreement with your
tutor.

Useful links
Useful resources for underlying principles, examples of articles and webinars on the theme:

Resource Number, Type of Resource, Resource Title, Link

1 Article 6V’s of Big Data https://www.geeksforgeeks.org/5vs-of-big-data/
2 Article Business Ethics and Big Data https://www.ibe.org.uk/resource/business-ethics-and-big-data.html
3 Article What is Big Data Security? Challenges & Solutions https://www.datamation.com/bigdata/big-data-security/
4 Article What is Big Data? https://www.oracle.com/uk/bigdata/what-is-big-data/
5 Magazine Information Sciences https://www.sciencedirect.com/journal/information-sciences
6 Magazine Big Data Research https://www.sciencedirect.com/journal/big-data-research
7 Report Big Data & Investment Management: The Potential to Quantify Traditionally Qualitative Factors https://tinyurl.com/yff4uenz
8 Webinar Big Data Sources & Analysis Webinar https://tinyurl.com/2p85d7mb
9 Video Big Data In 5 Minutes | What Is Big Data? | Introduction To Big Data | Big Data Explained https://www.youtube.com/watch?v=bAyrObl7TYE
10 Video Challenges of Securing Big Data https://www.youtube.com/watch?v=3xIuIcPzMVs
11 Video The Importance of Data Ethics https://www.youtube.com/watch?v=gLHMhCtxEYE
12 Book A Bite-Sized Guide to Visualising Data https://tinyurl.com/38d6thsk
13 Book Business Intelligence Strategy and Big Data Analytics https://www.sciencedirect.com/book/9780128091982/businessintelligence-strategy-and-big-data-analytics
14 Book Principles and Practice of Big Data https://www.sciencedirect.com/book/9780128156094/principles-and-practice-of-big-data
15 Book Systems Simulation and Modelling for Cloud Computing and Big Data Applications https://tinyurl.com/2s3wkehn
16 Journal Big Data in Construction: Current Applications and Future Opportunities https://www.mdpi.com/25042289/6/1/18
17 Journal Big Data with Cloud Computing: Discussions and Challenges https://www.sciopen.com/article/pdf/10.26599/BDMA.2021.9020016.pdf
18 Journal Mobile Big Data Solutions for a better Future https://tinyurl.com/hpk2zvvw
19 Journal The social implications, risks, challenges and opportunities of big data https://tinyurl.com/yw593svk
20 Journal Policy discussion – Challenges of big data and analytics driven demand-side management https://tinyurl.com/kyb3j6x7
21 Journal Explore Big Data Analytics Applications and Opportunities: A Review https://tinyurl.com/597j8nd3
22 Journal What is Big Data? https://www.oracle.com/cl/a/ocom/docs/what-is-big-data-ebook-4421383.pdf
23 Journal Towards felicitous decision making: An overview on challenges and trends of Big Data https://www.sciencedirect.com/science/article/abs/pii/S0020025516304868
24 Journal Critical analysis of Big Data challenges and analytical methods https://www.sciencedirect.com/science/article/pii/S014829631630488X
25 Journal Big Data Security Issues and Challenges https://tinyurl.com/wabx7zya
26 Journal IoT Big Data Security and Privacy Versus Innovation https://ieeexplore.ieee.org/abstract/document/8643026
27 Journal Big Data Security and Privacy Protection https://www.atlantis-press.com/proceedings/icmcs18/25904185
28 Journal Big data analytics in Cloud computing: an overview https://journalofcloudcomputing.springeropen.com/articles/10.1186/s13677-022-00301-w


Grading Rubric

Grading Criteria Achieved Feedback

P1 Produce a research proposal that clearly defines a research question or hypothesis, supported by a literature review.

P2 Examine appropriate research methods and approaches to primary and secondary research.

M1 Analyse different research approaches and methodology and make justifications for the choice of methods selected based on philosophical/theoretical frameworks.

D1 Critically evaluate research methodologies and processes in application to a computing research project to justify chosen research methods and analysis.

Research Proposal Form


Student Name N. Kaweesha Nayananjana Bandara
Student number E140891 Date 4/1/2024
Centre Name
Unit Unit 16: Computing Research Project (Pearson Set)
Tutor Mr. Lasith Ranawaka
Proposed title
Responsible data science: a comprehensive analysis and approach to eliminate big data's
moral and safety issues.

Section One: Title, objective, responsibilities


Title or working title of research project (in the form of a question, objective or
hypothesis): Research project objectives (e.g. what is the question you want to answer?
What do you want to learn how to do? What do you want to find out?): Introduction,
Objective, Sub Objective(s), Research Questions and/or Hypothesis
Responsible data science: a comprehensive analysis and approach to eliminate big data's
moral and safety issues.

Introduction:
In recent years, rapid advancements in technology have ushered in the era of big data,
revolutionizing industries and reshaping the way information is processed and utilized.
However, this surge in data collection and analysis has raised significant moral and safety
concerns, prompting the need for responsible data science practices. The ethical
implications of big data, including privacy breaches, bias in algorithms, and potential
misuse of sensitive information, have sparked a growing demand for a comprehensive
analysis and approach to address these challenges.

Objective:
The primary objective of this study is to delve into the realm of responsible data science,
aiming to eliminate moral and safety issues associated with big data. By understanding the
ethical dimensions of data science, the goal is to develop a framework that fosters
responsible and ethical practices, ensuring the conscientious use of data in various
domains.

Sub Objective:
This research aims to enhance responsible data science by proposing an ethical framework
emphasizing transparency, fairness, accountability, and user consent. Strategies will be
developed to strengthen privacy protection in big data, mitigate algorithmic bias, ensure
regulatory compliance, foster stakeholder collaboration, promote educational initiatives,
and establish continuous improvement mechanisms, ensuring ethical practices endure in
the dynamic landscape of data science.

Section Two: Reasons for choosing this research project


Reasons for choosing the project (e.g. links to other subjects you are studying, personal
interest, future plans, knowledge/skills you want to improve, why the topic is important):
Motivation, Research gap

Section Three: Literature sources searched


Use of key literature sources to support your objective, Sub Objective, research question
and/or hypothesis: Can include the Conceptual Framework

Section Four: Activities and timescales


Activities to be carried out during the research project (e.g. research, development,
analysis of ideas, writing, data collection, numerical analysis, tutor meetings, production
of final outcome, evaluation, writing the report) and How long this will take:
Milestone Proposed completion date

Section Five: Research approach and methodologies


Type of research approach and methodologies you are likely to use, and reasons for your
choice: What your areas of research will cover: Research Onion; Sample
Strategy/Method; Sample Size

Comments and agreement from tutor


Comments (optional):

I confirm that the project is not work which has been or will be submitted for another
qualification and is appropriate.

Agreed Yes ☐ No ☐ Name Date

Comments and agreement from project proposal checker (if applicable)

Comments (optional):

I confirm that the project is appropriate.

Agreed Yes ☐ No ☐ Name Date

Research Ethics Approval Form


All students conducting research activity that involves human participants or the use of data
collected from human participants are required to gain ethical approval before commencing
their research. Please answer all relevant questions and note that your form may be returned if
incomplete.

Section 1: Basic Details

Project title: Responsible data science: a comprehensive analysis and approach to eliminate big data's moral and safety issues.

Student name: N. Kaweesha Nayananjana Bandara

Student ID number: E140891

Programme:

School:

Intended research start date:

Intended research end date:

Section 2: Project Summary

Please select all research methods that you plan to use as part of your project

• Interviews: ☐
• Questionnaires: ☐
• Observations: ☐
• Use of Personal Records: ☐
• Data Analysis: ☐
• Action Research: ☐
• Focus Groups: ☐
• Other (please specify): ☐ ............................................................

Section 3: Participants

Please answer the following questions, giving full details where necessary.

Will your research involve human participants?

Who are the participants? Tick all that apply:

Age 12-16 ☐ Young People aged 17–18 ☐ Adults ☐

How will participants be recruited (identified and approached)?

Describe the processes you will use to inform participants about what you are doing:

Studies involving questionnaires:

Will participants be given the option of omitting questions they do not wish to answer?

Yes ☐ No ☐

If “NO” please explain why below and ensure that you cover any ethical issues arising
from this.

Studies involving observation:

Confirm whether participants will be asked for their informed consent to be observed.

Yes ☐ No ☐

Will you debrief participants at the end of their participation (i.e. give them a brief
explanation of the study)?

Yes ☐ No ☐

Will participants be given information about the findings of your study? (This could be a
brief summary of your findings in general)

Yes ☐ No ☐

Section 4: Data Storage and Security

Confirm that all personal data will be stored and processed in compliance with the Data
Protection Act (1998)
Yes ☐ No ☐

Who will have access to the data and personal information?

During the research:

Where will the data be stored?

Will mobile devices such as USB storage and laptops be used?


Yes ☐ No ☐

If “YES”, please provide further details:

After the research:

Where will the data be stored?

How long will the data and records be kept for and in what format?

Will data be kept for use by other researchers?


Yes ☐ No ☐

If “YES”, please provide further details:

Section 5: Ethical Issues

Are there any particular features of your proposed work which may raise ethical
concerns? If so, please outline how you will deal with these:

Section 6: Declaration

I have read, understood and will abide by the institution’s Research and Ethics Policy:

Yes ☐ No ☐

I have discussed the ethical issues relating to my research with my Unit Tutor:

Yes ☐ No ☐

I confirm that to the best of my knowledge:

The above information is correct and that this is a full description of the ethics issues that
may arise in the course of my research.

Name:

Date:

Please submit your completed form to: ESOFT Learning Management System
(ELMS)

THE RESEARCH PROPOSAL

<YOUR TITLE>.

By

<NAME>

<Registration Number>

Research Proposal Submitted in accordance with the requirements for the


COMPUTING RESEARCH PROJECT MODULE OF
PEARSON’S HND IN < YOUR STREAM> PROGRAMME
at the
ESOFT METRO CAMPUS

Name of research Tutor: <Tutor’s Name>

ACKNOWLEDGMENT

EXECUTIVE SUMMARY
This summary discusses the ethical implications of big data practices, focusing on four
pillars: privacy, algorithmic bias, data security, and ethical frameworks. Privacy safeguards
are needed to protect individuals' rights; Floridi and Taddeo emphasize strong privacy
protections, while Acquisti and Grossklags stress informed consent and data anonymization.
Algorithmic fairness is essential to prevent discrimination against marginalized groups. Data
security measures protect sensitive information against unauthorized access and cyber
threats. Ethical frameworks and guidelines, such as Floridi's proposed ethical principles for
big data analytics and the IEEE Global Initiative's guidelines for the ethically aligned design
of AI and data technologies, provide guidance for responsible data usage. By addressing these
pillars, organizations can navigate the ethical implications of big data, promoting transparency,
accountability, and respect for individuals' rights. The research aims to provide actionable
recommendations to promote responsible data science practices and mitigate moral and safety
concerns in the era of big data.

CONTENTS

ACKNOWLEDGMENT
EXECUTIVE SUMMARY
CONTENTS
LIST OF TABLES
LIST OF FIGURES

INTRODUCTION
1.1. Introduction
1.2. Purpose of research
1.3. Significance of the Research
1.4. Research objectives
1.5. Research Sub objectives
1.6. Research questions
1.7. Hypothesis

LITERATURE REVIEW
2.1. Literature Review
2.2. Conceptual framework

METHODOLOGY
3.1. Research philosophy
3.2. Research approach
3.3. Research strategy
3.4. Research Choice
3.5. Time frame
3.6. Data collection procedures
3.6.1. Type of Data
3.6.2. Data Collection Method
3.6.3. Data Collection and Analysis Tools
3.7. Sampling
3.7.1. Sampling Strategy
3.7.2. Sample Size
3.8. The selection of participants

REFERENCES

LIST OF TABLES

LIST OF FIGURES

INTRODUCTION

1.1. Introduction

In recent years, rapid advancements in technology have ushered in the era of big data,
revolutionizing industries and reshaping the way information is processed and utilized.
However, this surge in data collection and analysis has raised significant moral and safety
concerns, prompting the need for responsible data science practices. The ethical implications
of big data, including privacy breaches, bias in algorithms, and potential misuse of sensitive
information, have sparked a growing demand for a comprehensive analysis and approach to
address these challenges.

1.2. Purpose of research

Addressing big data ethics involves safeguarding privacy, tackling algorithmic bias, and
preventing misuse of sensitive data. This includes implementing strong data protection
measures, mitigating bias in decision-making algorithms, and enforcing strict security
protocols. Ultimately, the goal is to ensure technology benefits society ethically, with
privacy, fairness, and transparency as guiding principles.

1.3. Significance of the Research

Recent technological advances have propelled us into the era of big data, transforming
industries and changing how we use information. However, this surge in data comes with
ethical challenges such as privacy breaches and biased algorithms. This has led to a growing
demand for responsible data science practices to address these concerns and ensure data is
used ethically and safely.

1.4. Research objectives

The primary objective of this study is to delve into the realm of responsible data science,
aiming to eliminate moral and safety issues associated with big data. By understanding the
ethical dimensions of data science, the goal is to develop a framework that fosters responsible
and ethical practices, ensuring the conscientious use of data in various domains.

1.5. Research Sub objectives

This research aims to enhance responsible data science by proposing an ethical framework
emphasizing transparency, fairness, accountability, and user consent. Strategies will be
developed to strengthen privacy protection in big data, mitigate algorithmic bias, ensure
regulatory compliance, foster stakeholder collaboration, promote educational initiatives, and
establish continuous improvement mechanisms, ensuring ethical practices endure in the
dynamic landscape of data science.

LITERATURE REVIEW

1.6. Literature Review

Scholars such as Floridi and Taddeo (2016) point out the risks associated with collecting
and using vast amounts of personal data. They emphasize the need for strong privacy
protections to safeguard individuals' information. Acquisti and Grossklags (2005) explore
the balance between privacy and the benefits of data collection, stressing the
importance of informed consent and anonymizing data to protect privacy.

Noble (2018) examines how biased algorithms can perpetuate discrimination,
especially against marginalized groups, in areas like hiring and law enforcement. Barocas
and Selbst (2016) discuss the challenges of tackling algorithmic bias, advocating for
machine learning techniques that prioritize fairness to prevent discriminatory outcomes.

Cavoukian and Jonas (2012) stress the significance of incorporating privacy into the
design of systems and using encryption to safeguard data from unauthorized access.
Anderson (2008) highlights the ethical and legal implications of data breaches, emphasizing
the need for organizations to take proactive measures to secure data and be transparent in
their practices.

Floridi (2013) proposes ethical principles for big data analytics, including transparency,
accountability, and respecting individuals' rights. IEEE (2019) offers guidelines for
developing AI and data technologies that prioritize human values and societal well-being.


Initiatives like the GDPR aim to give individuals more control over their personal data and
impose obligations on organizations to protect privacy. Scholars like Solove (2006) discuss
the challenges of regulating big data while balancing innovation, advocating for a nuanced
approach that considers both ethical and legal factors.

1.7. Conceptual framework

In the realm of big data ethics, we can envision a conceptual framework that revolves around
four key pillars: privacy concerns and data protection, algorithmic bias and fairness, data
security and unauthorized access, and ethical frameworks and guidelines.

Firstly, privacy concerns and data protection are paramount. This pillar emphasizes the
importance of safeguarding individuals' privacy rights amidst the vast collection and
utilization of personal data. Floridi and Taddeo (2016) stress the need for
robust privacy safeguards, while Acquisti and Grossklags (2005) examine the delicate
balance between privacy and the utility of data, advocating for informed consent and data
anonymization.

Secondly, algorithmic bias and fairness are critical considerations. Here, the focus is on
ensuring that algorithms used in decision-making processes are fair and unbiased, particularly
to prevent discrimination against marginalized groups. Noble (2018) sheds light on
how biased algorithms perpetuate discrimination, while Barocas and Selbst
(2016) explore strategies for addressing algorithmic bias through fairness-aware machine
learning techniques.

The third pillar revolves around data security and unauthorized access. In an era of
heightened cyber threats, it is essential to prioritize data security measures to prevent
unauthorized access and protect sensitive information. Cavoukian and Jonas
(2012) emphasize the importance of privacy by design principles and encryption techniques,
while Anderson (2008) highlights the ethical and legal implications of data breaches,
advocating for proactive security measures and transparent data handling practices.


Finally, ethical frameworks and guidelines provide a guiding compass for responsible data
usage. Floridi (2013) proposes ethical principles for big data analytics, emphasizing
transparency, accountability, and respect for individuals' rights. The IEEE Global Initiative
on Ethics of Autonomous and Intelligent Systems (2019) offers practical guidelines for
developing AI and data technologies that prioritize human values and societal well-being.

Together, these four pillars form a comprehensive conceptual framework for navigating the
ethical implications of big data. By addressing privacy concerns, tackling algorithmic bias,
ensuring data security, and adhering to ethical principles, we can strive for a more ethical and
responsible approach to harnessing the power of data for societal benefit.

METHODOLOGY

1.8. Research philosophy

In approaching the research on the ethical implications of big data, we adopt a pragmatic
research philosophy that emphasizes practicality and real-world applicability. Our aim is to
not only deepen theoretical understanding but also to provide actionable insights that can
inform decision-making and guide ethical practices in the field of data science.

This research philosophy recognizes the complex and multifaceted nature of the topic,
acknowledging that ethical considerations in big data extend beyond theoretical frameworks
to encompass practical challenges and ethical dilemmas faced by practitioners, policymakers,
and organizations.

Drawing from both qualitative and quantitative research methods, our approach is
interdisciplinary, integrating insights from fields such as computer science, ethics, law,
sociology, and psychology. By adopting a holistic perspective, we seek to capture the diverse
perspectives and dimensions of ethical concerns in big data.

Moreover, our research philosophy is characterized by a commitment to ethical integrity and
transparency. We prioritize ethical considerations throughout the research process, from data
collection and analysis to dissemination of findings. This includes ensuring informed consent,
protecting privacy, and mitigating potential biases in our research methods and
interpretations.

1.9. Research approach

For our research on the ethical implications of big data, we employ a mixed-methods research
approach to provide a comprehensive understanding of the topic. This approach combines
qualitative and quantitative methods to explore different facets of ethical concerns and their
practical implications in the context of big data.

Qualitative Research:

- In-depth Interviews: We conduct interviews with experts in the fields of data science,
ethics, law, and policymaking to gain insights into their perspectives on ethical
challenges in big data, as well as potential strategies for addressing them.
- Ethnographic Studies: We observe and analyze the practices and dynamics within
organizations and communities that deal with big data, aiming to understand how
ethical considerations are integrated into decision-making processes and everyday
operations.
- Content Analysis: We analyze documents, reports, and literature related to big data
ethics to identify key themes, trends, and emerging issues in the field.

Quantitative Research:

- Surveys: We administer surveys to practitioners, policymakers, and other stakeholders
to quantify attitudes, perceptions, and practices related to ethical considerations in big
data. This allows us to identify patterns and trends across different groups and
contexts.
- Data Analysis: We analyze large datasets, such as public opinion data or anonymized
user data, to identify correlations, patterns, and potential ethical implications of data
collection, analysis, and use.
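
As an illustration of the quantitative step, the following is a minimal sketch of how survey responses might be checked for an association between two attitudes. It assumes Python with pandas and SciPy; the column names (`role`, `privacy_concern`, `supports_regulation`) and the six example rows are hypothetical and are not drawn from the study's dataset.

```python
import pandas as pd
from scipy.stats import spearmanr

# Hypothetical survey extract: one row per respondent, answers on a 1-5 Likert scale.
responses = pd.DataFrame({
    "role": ["practitioner", "practitioner", "policymaker",
             "policymaker", "practitioner", "policymaker"],
    "privacy_concern": [1, 2, 3, 4, 5, 5],
    "supports_regulation": [2, 1, 3, 5, 4, 5],
})

# Spearman's rank correlation is used because Likert items are ordinal,
# so only the ordering of the answers is assumed to be meaningful.
rho, p_value = spearmanr(responses["privacy_concern"],
                         responses["supports_regulation"])

# Mean score per stakeholder group, to compare attitudes across groups.
group_means = responses.groupby("role")[
    ["privacy_concern", "supports_regulation"]
].mean()

print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
print(group_means)
```

A correlation of this kind only indicates association. Any interpretation would still need to account for how the respondents were sampled.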

1.10. Research strategy

- Literature Review: We begin by conducting a thorough review of existing literature,
including academic papers, books, reports, and policy documents, to understand the
current state of knowledge and identify key themes, debates, and gaps in the literature.

- Stakeholder Engagement: We actively engage with stakeholders from diverse
backgrounds, including academics, industry professionals, policymakers, and civil
society organizations, through interviews, focus groups, and consultation sessions.
This allows us to gather insights from different perspectives and ensure that our
research is relevant and impactful.

- Empirical Research: We conduct empirical research using both qualitative and
quantitative methods to gather primary data on ethical challenges and practices related
to big data. This may involve surveys, interviews, case studies, and observational
studies, depending on the research questions and objectives.

- Ethical Analysis: We conduct in-depth ethical analysis of the data collected, drawing
on ethical frameworks and principles to evaluate the implications of big data practices
on privacy, fairness, transparency, and other ethical values. This involves identifying
ethical dilemmas, evaluating potential solutions, and making recommendations for
ethical decision-making in the context of big data.

- Policy Analysis: We analyze relevant policies, regulations, and guidelines governing
big data practices at the national and international levels to assess their adequacy in
addressing ethical concerns. This helps to identify gaps in current policy frameworks
and inform recommendations for policy reform.

- Dissemination and Engagement: Finally, we disseminate our research findings
through various channels, including academic publications, policy briefs, workshops,
and conferences, to raise awareness and stimulate dialogue on the ethical implications
of big data. We actively engage with stakeholders to facilitate knowledge exchange
and promote uptake of our research findings in practice and policy.

1.11. Research Choice

In conducting our research on the ethical implications of big data, we make deliberate choices
regarding our research design and methodology to ensure that our study is rigorous, relevant,
and aligned with our research objectives. Our research choices encompass several key
considerations:

- Exploratory vs. Confirmatory Research: Given the complexity and evolving nature of
the topic, we adopt an exploratory research approach to gain a deeper understanding
of the ethical challenges and opportunities in big data. This allows us to explore
diverse perspectives, identify emerging issues, and generate new insights that can
inform future research and practice.

- Qualitative vs. Quantitative Research: Recognizing the multifaceted nature of ethical
considerations in big data, we employ a mixed-methods research approach that
combines qualitative and quantitative methods. Qualitative methods, such as
interviews and ethnographic studies, enable us to capture rich, nuanced data on
stakeholders' experiences, perceptions, and values. Quantitative methods, such as
surveys and data analysis, provide statistical rigor and allow us to quantify attitudes,
behaviors, and trends related to big data ethics.


- Longitudinal vs. Cross-Sectional Research: To capture the dynamic nature of big data
ethics, we may adopt a longitudinal research design, tracking changes and
developments over time. This allows us to observe trends, assess the impact of
interventions, and identify evolving ethical challenges. Alternatively, we may conduct
cross-sectional research to provide a snapshot of ethical practices and concerns at a
particular point in time, enabling comparative analysis across different contexts or
populations.

- Case Study Selection: In selecting case studies for our research, we prioritize diversity
and relevance, aiming to encompass a range of industries, organizational contexts, and
geographical locations. This allows us to explore how ethical considerations manifest
in different settings and to identify best practices and lessons learned that can inform
broader ethical frameworks and guidelines for big data.

- Interdisciplinary Collaboration: Recognizing the interdisciplinary nature of big data
ethics, we engage in collaborative research with experts from diverse fields, including
computer science, ethics, law, sociology, and public policy. This interdisciplinary
approach enriches our research by integrating different perspectives, methodologies,
and theoretical frameworks, fostering innovation, and ensuring the relevance and
applicability of our findings.

1.12. Time frame

Given the nature of my research topic, "Responsible data science: a
comprehensive analysis and approach to eliminate big data's moral and safety issues," a
cross-sectional time horizon is more appropriate than a longitudinal one.

The reasons are as follows:


1. *Nature of the Research Question*: My study focuses on examining and resolving ethical and
security concerns linked to big data. These concerns can be analyzed at a single point in
time rather than being monitored across multiple time points.

2. *Data Collection*: My study uses secondary data, which has already been gathered and
is available for particular time periods. A cross-sectional time horizon therefore suits
my data source.

3. *Practical Constraints*: Given practical limitations, such as the time available to
conduct my study, a cross-sectional time horizon is more effective. It enables
me to concentrate on analyzing existing data instead of gathering data over a prolonged
period.

By using a cross-sectional time frame, I can obtain valuable information on the ethical and
security concerns linked to big data from a single snapshot of data. This approach
allows me to address my research goals while working within the limitations of my
project.

3.5.1. Reasons for choosing a cross-sectional time frame

1. *Snapshot of Data at a Specific Point in Time*: My study aims to assess and address
moral and safety problems associated with big data. These concerns can be investigated within a
specified period, offering a snapshot of the data environment at that point. By employing a
cross-sectional time horizon, I can capture the state of big data practices, ethical issues,
and safety concerns at a given moment in time.

2. *Existing Data Availability*: Since my study uses secondary data, the
data has already been gathered and is accessible for analysis. A cross-sectional
method enables me to use existing datasets without the need for longitudinal data
collection, which can be time-consuming and resource-intensive.


3. *Efficiency and Practicality*: Given the scope of my study and practical restrictions
such as time limits, a cross-sectional time horizon is efficient. It lets me
concentrate on studying current data within a particular period, making it feasible to
accomplish my research goals within the available resources and timescale.

4. *Focus on Current Status*: By evaluating data from a specific moment in time, I can
gain insights into the current state of big data practices and ethical issues. This can help
uncover immediate difficulties and generate recommendations for responsible data science
practices without the need to observe changes over time.

3.5.2. Reasons for not choosing a longitudinal time frame

1. *Focus on Immediate Concerns*: My study aims to address moral and safety problems
associated with big data. These challenges may demand urgent attention and action rather
than observation of developments over time. A cross-sectional method enables me to capture
the present state of these challenges and propose remedies without waiting for
longitudinal data collection and analysis.

2. *Resource Constraints*: Conducting longitudinal research requires considerable
resources, including time, funding, and sustained participant involvement. Given the complexity of
big data and ethical issues, a longitudinal study may not be feasible within the
constraints of my research project.

3. *Data Availability*: While longitudinal studies give insights into changes and trends over
time, they may not be essential when the main emphasis is on understanding the present
state of moral and safety concerns in big data. Since my study uses secondary data
sources, which were likely collected at various moments in time, a cross-sectional approach
fits the existing data better.

4. *Immediate Actionable Insights*: My study aims to deliver actionable insights
and recommendations for responsible data science practices. A cross-sectional method enables
me to assess existing data and recommend practical solutions based on the present state
of things, rather than waiting for longitudinal data to reveal trends and patterns.

5. *Ethical Considerations*: Given the ethical implications of big data and responsible data
science, it is necessary to address current problems and reduce risks as they develop. A
cross-sectional strategy allows me to concentrate on resolving moral and safety concerns
as they stand now, rather than waiting for longitudinal data to support decision-making.

Overall, although longitudinal studies have their merits, such as tracking changes over
time, a cross-sectional strategy is better suited to my study on resolving urgent moral and
safety problems associated with big data. It enables me to deliver timely insights and
recommendations for responsible data science practices based on the present state of
things.

2.1. Data collection procedures

1.1.1. Type of Data


Qualitative Data:

- Interview Transcripts: Data from in-depth interviews with experts, practitioners, and
stakeholders can offer rich insights into their perspectives, experiences, and values
related to big data ethics.
- Observational Notes: Data from ethnographic studies or participant observations can
provide detailed descriptions of behaviors, practices, and interactions within
organizations or communities dealing with big data.

Quantitative Data:

- Survey Responses: Data from surveys administered to individuals or organizations
can yield quantitative information on attitudes, perceptions, behaviors, and practices
related to big data ethics.
- Quantitative Metrics: Data on key performance indicators, such as data breach
incidents, algorithmic accuracy rates, or privacy compliance measures, can provide
quantitative benchmarks for evaluating the effectiveness of ethical practices and
interventions.
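
One way such a benchmark could be computed is sketched below for the algorithmic bias dimension. This is an illustrative example only, assuming Python with pandas; the `decisions` table, its `group` and `approved` columns, and the values in it are hypothetical. It compares the rate of favourable outcomes between two groups, which is one common way of expressing demographic parity.

```python
import pandas as pd

# Hypothetical audit log: one row per automated decision.
# "approved" is 1 for a favourable outcome and 0 otherwise.
decisions = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "approved": [1, 1, 1, 0, 1, 0, 0, 0],
})

# Share of favourable outcomes within each group.
selection_rates = decisions.groupby("group")["approved"].mean()

# Demographic parity difference: 0 means both groups are approved at the same rate.
parity_difference = selection_rates.max() - selection_rates.min()

# Ratio of the lowest to the highest selection rate: 1 means equal rates.
impact_ratio = selection_rates.min() / selection_rates.max()

print(selection_rates)
print(f"Parity difference: {parity_difference:.2f}")
print(f"Impact ratio: {impact_ratio:.2f}")
```

What counts as an acceptable difference or ratio is a judgment the study would need to justify. The metric describes the disparity but does not explain its cause.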

Documentary Data:


- Policy Documents: Data from legal and regulatory documents, such as privacy laws,
data protection regulations, and industry standards, can provide insights into the
regulatory landscape and policy frameworks governing big data ethics.
- Organizational Policies and Guidelines: Data from internal documents, such as data
governance policies, ethical guidelines, and codes of conduct, can offer insights into
organizational approaches to managing ethical considerations in big data.

Social Media Data:

- Social Media Posts: Data from social media platforms, such as Twitter, Facebook, or
LinkedIn, can provide insights into public discourse, attitudes, and opinions regarding
big data ethics.
- User Interaction Data: Data on user interactions with social media content related to
big data ethics, such as likes, shares, or comments, can offer indicators of engagement
and sentiment.

Secondary Data:

- Literature Reviews: Data extracted from existing academic literature, including
journal articles, books, and conference papers, can provide a comprehensive overview
of research findings, theories, and methodologies relevant to big data ethics.
- Meta-Analytical Data: Data synthesized from meta-analyses or systematic reviews of
existing research can offer aggregated insights into patterns, trends, and gaps in the
literature on big data ethics.

1.1.2. Data Collection Method


- Secondary Data Analysis:
Utilize existing datasets from reputable sources containing information on big data ethics.
This could include survey data, industry reports, or publicly available datasets from
organizations involved in data governance or privacy regulation.

Conduct a comprehensive review and analysis of literature, including academic papers,
books, reports, and policy documents, to gather insights into the ethical challenges and
practices related to big data.

- Surveys:
Design and administer surveys to professionals working in the field of data science,
focusing on their attitudes, perceptions, and practices regarding ethical considerations in big
data.

Use structured survey questionnaires to collect quantitative data on topics such as privacy
concerns, algorithmic bias, data security measures, and adherence to ethical frameworks and
guidelines.

- Interviews:
Conduct semi-structured interviews with key stakeholders, including data scientists,
ethicists, policymakers, industry experts, and representatives from civil society
organizations.

Explore their perspectives, experiences, and insights on ethical challenges, best practices,
and emerging trends in the field of big data ethics.

- Focus Groups:
Organize focus group discussions with professionals or stakeholders from diverse
backgrounds to facilitate interactive dialogue and collective exploration of ethical issues
related to big data.

Use guided discussions to elicit rich qualitative data on participants' perceptions, values, and
priorities regarding big data ethics.

- Ethnographic Studies:
Engage in participant observation and ethnographic research within organizations or
communities involved in big data projects.

Immerse yourself in the context of data collection, observing behaviors, practices, and
interactions related to ethical decision-making and data governance.


- Document Analysis:
Analyze organizational policies, guidelines, codes of conduct, and regulatory documents
related to data governance, privacy protection, and ethical frameworks for big data.

Extract insights from publicly available documents to understand the institutional context
and regulatory landscape surrounding big data ethics.

- Case Studies:
Conduct in-depth case studies of specific organizations, projects, or initiatives that
exemplify ethical challenges and practices in the realm of big data.

Explore real-world examples to uncover nuances, complexities, and lessons learned in
addressing ethical implications of big data.

1.1.3. Data Collection and Analysis Tools

- Survey Platforms:
Qualtrics: A widely used survey platform that allows for the creation, distribution, and
analysis of surveys.

SurveyMonkey: Another popular platform for creating and administering surveys, with
features for analyzing responses and generating reports.

- Interview and Focus Group Tools:


Zoom: A video conferencing tool that facilitates remote interviews and focus group
discussions.

Microsoft Teams: Another platform for virtual meetings and interviews, with features for
recording and transcribing conversations.

- Ethnographic Research Tools:


Fieldnotes Apps: Apps like Evernote or OneNote are useful for recording field observations
and taking detailed notes during ethnographic research.

- Document Analysis Tools:


NVivo: A qualitative data analysis software that allows for the organization, coding, and
analysis of textual data from documents, reports, and literature.

Atlas.ti: Another qualitative analysis software with features for coding and analyzing textual
data from various sources.

Data Analysis Tools:

- Statistical Analysis Software:


SPSS: A popular software for statistical analysis, including descriptive statistics, regression
analysis, and hypothesis testing.

R: A programming language and software environment for statistical computing and
graphics, widely used in academic research for data analysis.

Python with pandas and SciPy: Python libraries such as pandas and SciPy provide powerful
tools for data manipulation, analysis, and visualization.

- Data Visualization Tools:


Tableau: A data visualization tool that allows for the creation of interactive and dynamic
dashboards, charts, and graphs.

Power BI: Another visualization tool by Microsoft that integrates with other Microsoft
products and services for data analysis and reporting.

- Text Analysis Tools:


Leximancer: A text analytics software that uses machine learning algorithms to extract
themes and concepts from textual data.

Quirkos: Qualitative data analysis software with features for coding and analyzing textual
data from interviews, focus groups, and open-ended survey responses.


- Network Analysis Tools:


Gephi: An open-source network analysis tool for exploring and visualizing complex
networks, such as social networks or co-authorship networks.

- Geospatial Analysis Tools:


ArcGIS: A geographic information system (GIS) software for analyzing and visualizing
spatial data, including maps and geographic patterns.

2.2. Sampling

1.1.4. Sampling Strategy


The research population consists of data science professionals, including data scientists,
analysts, and engineers, who use large datasets and analytical techniques to make informed
decisions. The sampling frame is a comprehensive dataset of survey responses from these
professionals, obtained from a reputable source. The sampling method is simple random
sampling, which gives every respondent in the frame an equal chance of selection. The
sample size is determined by the desired precision (margin of error) and confidence level,
allowing for the detection of meaningful patterns and differences in attitudes and
behaviors related to ethical considerations in big data. Potential biases are monitored by
assessing the underrepresentation of certain demographic groups and considering
adjustments or weighting strategies to ensure the validity of the findings. The sampling
strategy is documented in full, upholding the integrity of the research and providing
readers with a clear understanding of the sample selection and its implications for the
study's findings.
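
The sampling and weighting steps described above can be sketched as follows. This is a minimal illustration, assuming Python with pandas; the sampling frame, its `role` column, the group sizes, and the sample of 200 are all hypothetical. It draws a simple random sample and then computes post-stratification weights so that each role counts in proportion to its share of the frame.

```python
import pandas as pd

# Hypothetical sampling frame of 1,000 survey respondents.
frame = pd.DataFrame({
    "respondent_id": range(1, 1001),
    "role": ["data scientist"] * 600 + ["analyst"] * 300 + ["engineer"] * 100,
})

# Simple random sampling without replacement: every row has the same chance
# of being selected. A fixed seed keeps the draw reproducible.
sample = frame.sample(n=200, random_state=42)

# Compare each role's share in the frame with its share in the sample.
population_share = frame["role"].value_counts(normalize=True)
sample_share = sample["role"].value_counts(normalize=True)

# Post-stratification weight per respondent: roles that are underrepresented
# in the sample receive weights above 1, overrepresented roles below 1.
weights = sample["role"].map(population_share / sample_share)

# After weighting, each role's share matches its share in the frame.
weighted_share = weights.groupby(sample["role"]).sum() / weights.sum()
print(weighted_share)
```

Weighting of this kind corrects only for the variables used to build the weights. Other sources of bias in a secondary dataset would need to be assessed separately.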

1.1.5. Sample Size

This research uses secondary data, so the sample size was fixed by the original data
collection rather than calculated for this study.

Sample size: 25,000.
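
To indicate what a sample of this size implies for precision, the sketch below applies the standard margin-of-error formula for a proportion. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the most conservative proportion p = 0.5. These are illustrative assumptions, not values stated in the secondary data source, and the function names are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def required_sample_size(e, p=0.5, z=1.96):
    """Smallest sample size that achieves margin of error e (large population assumed)."""
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)


moe = margin_of_error(25_000)
print(f"Margin of error for n = 25,000: {moe:.4f}")
print(f"Sample needed for a 5% margin: {required_sample_size(0.05)}")
```

Under these assumptions, 25,000 responses give a margin of error of roughly 0.6 percentage points. This figure would be larger if the original survey used a more complex sampling design.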


REFERENCES
Acquisti, A. & Grossklags, J., 2005. Privacy and rationality in individual decision making. IEEE Security & Privacy, 3(1), pp. 26-33.
Barocas, S. & Selbst, A. D., 2016. Big data's disparate impact. California Law Review, 104(3), pp. 671-732.
Cavoukian, A. & Jonas, J., 2012. Privacy by design in the age of big data. Toronto: Information
and Privacy Commissioner of Ontario.
Floridi, L. & Taddeo, M., 2016. What is data ethics? Philosophical Transactions of the Royal
Society A: Mathematical, Physical and Engineering Sciences, 374(2083).
Floridi, L., 2013. The ethics of information. Oxford: Oxford University Press.
IEEE, 2019. Ethically aligned design: A vision for prioritizing human well-being with
autonomous and intelligent systems. s.l.: IEEE Standards Association.
Noble, S. U., 2018. Algorithms of oppression: How search engines reinforce racism. New York:
NYU Press.
