DATA LEAKAGE DETECTION
1. INTRODUCTION
1.1 About the Project
In a business environment, sensitive data must often be handed over to supposedly trusted
parties. For example, a company may have partnerships with other companies that require sharing
customer data. Another enterprise may outsource its data processing, so its data must be given to
various other companies in a trusted manner.
If the distributor sees enough evidence that an agent leaks data, he may stop doing
business with him, or may initiate legal proceedings.
We develop a model for assessing the guilt of agents. We also present algorithms for
distributing objects to agents in a way that improves our chances of identifying a leaker.
Finally, we consider the option of adding ‘fake’ objects to the distributed set. Such objects do
not correspond to real entities but appear realistic to the agents.
In a sense, the fake objects act as a type of watermark for the entire set, without
modifying any individual members. If it turns out that an agent was given one or more fake
objects that were leaked, then the distributor can be more confident that the agent was guilty.
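The watermark analogy can be illustrated with a minimal Python sketch. It assumes that each fake object is given to exactly one agent and that the distributor keeps a record of who received which fake object. The names `agents_implicated` and `fake_allocations`, and the sample identifiers, are hypothetical and not part of the original system.

```python
def agents_implicated(leaked, fake_allocations):
    """Return the agents whose fake objects appear in the leaked set.

    leaked: set of object identifiers found in an unauthorized place.
    fake_allocations: dict mapping agent name -> set of fake object
        identifiers given only to that agent (assumed unique per agent).
    """
    implicated = {}
    for agent, fakes in fake_allocations.items():
        found = fakes & leaked  # fake objects of this agent that leaked
        if found:
            implicated[agent] = found
    return implicated


# Hypothetical example data.
fake_allocations = {
    "agent_a": {"fake-101"},
    "agent_b": {"fake-202", "fake-203"},
}
leaked = {"cust-7", "cust-9", "fake-202"}
result = agents_implicated(leaked, fake_allocations)
```

Here only `agent_b` is implicated, because `fake-202` was given to no one else. The real objects in the set are untouched, which is the sense in which the fake objects mark the whole set rather than its members.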
1.2 Organization Profile
Max Technologies is an emerging IT services and HR consultancy firm located in
Tirupur. Max Technologies was established in 2009 by the young and dynamic
entrepreneur [Link] Prabhakaran, B.E. (ECE), who has strong technical knowledge.
The director of the organization has wide experience in both software and hardware.
Max Technologies provides software development, recruitment, HR management, sales,
support, and IT-enabled services.
Max Technologies has a decade of in-depth knowledge of software development, which
aids in developing job specifications and in analyzing the technical competencies
required of software professionals.
Max Technologies has a dedicated team of well-qualified professionals who understand
client needs and focus on providing recruitment services across all levels.
Max Technologies maintains a good relationship with its customers and provides them with
better services. Its aim is to deliver quality services in a suitable environment so as to
satisfy customer needs and help customers excel in a competitive field.
2. SYSTEM ANALYSIS
Analysis may be defined as the process of dividing a system into parts, identifying each part,
and establishing relationships among the parts. Analysis is a detailed study of the various
operations performed by a system and of the relationships within and outside the system.
Analysis is a continuing activity at all stages of the project. It is the process of studying
a problem to find its best solution: the existing system is studied and its problems are
understood.
Objectives and requirements are defined and the solutions are evaluated. Once analysis is
completed, the analyst has a firm understanding of what is to be done.
Analysis consists of two sub-phases: planning and requirement definition. They include
understanding the customer’s problem, performing a feasibility study, developing a
recommended solution, determining the acceptance criteria, and planning the development
process.
The products of planning are a system definition and project plans. The system definition is
typically expressed in English or some other natural language and may incorporate charts,
figures, graphs, tables, and equations of various kinds.
The exact notation used in the system definition is highly dependent on the problem area.
Obviously, one uses different terminology to describe an accounting system than to describe a
process control system.
2.1 Existing System
Traditionally, leakage detection is handled by watermarking: a unique code is
embedded in each distributed copy. If that copy is later discovered in the hands of an
unauthorized party, the leaker can be identified. Watermarks can be useful in some cases, but
they involve some modification of the original data. Furthermore, watermarks can sometimes be
destroyed if the data recipient is malicious.
In the existing system, generating and adding a watermark to the original document
consumes more time and is very difficult to implement.
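The two drawbacks of watermarking mentioned above can be seen in a small Python sketch. It is a deliberately simple stand-in, assuming a text document and a marker line appended to each copy; the `#wm:` tag and the function names are hypothetical, and real watermarking schemes are considerably more elaborate.

```python
import hashlib

WM_TAG = "#wm:"  # hypothetical marker prefix


def watermark_copy(text, recipient_code):
    """Embed a recipient-specific code by appending a marker line."""
    return text + "\n" + WM_TAG + recipient_code


def extract_watermark(text):
    """Return the embedded code, or None if no marker line is present."""
    last_line = text.rsplit("\n", 1)[-1]
    if last_line.startswith(WM_TAG):
        return last_line[len(WM_TAG):]
    return None


def digest(text):
    """SHA-256 hex digest of the text, used to compare copies."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = "name,balance\nalice,100\nbob,250"
copy_a = watermark_copy(original, "A-001")

# Drawback 1: the distributed copy differs from the original data.
modified = digest(copy_a) != digest(original)

# Drawback 2: a malicious recipient can strip the marker line.
stripped = copy_a.rsplit("\n", 1)[0]
```

After stripping, `extract_watermark(stripped)` finds nothing and the copy is identical to the original, so the leaker can no longer be traced from the watermark alone.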
2.2 Proposed System
In the proposed system, the client sends a file request to the admin, who acts as the
distributor of data. After receiving the request, the admin adds a pseudo polynomic key to the
requested file. The pseudo polynomic key is treated as a fake object. The key is generated
automatically by a pseudorandom key generation function.
After inserting the fake object into the file, the admin sends the file to the requesting client.
Meanwhile, if an intruder obtains the file with the fake object, the intruder cannot open it or
see its contents. At the same time, the admin is alerted by a message indicating the intruder’s
attempt.
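The access check and alert described above might look roughly like the following Python sketch. It assumes the admin keeps a record of which key was issued to which client for which document; the `Admin` class, its method names, and the sample key value are hypothetical. The alert fields mirror the Docuname, Username, and Message columns of tbAlertMessage.

```python
class Admin:
    """Tracks issued keys and records alerts for unauthorized access."""

    def __init__(self):
        self.issued = {}  # pseudo key -> (docuname, username)
        self.alerts = []  # alert records raised for the admin

    def send_file(self, docuname, username, pseudo_key):
        """Register the key for this client and return the file package."""
        self.issued[pseudo_key] = (docuname, username)
        return {"Docuname": docuname, "Pseudo Key": pseudo_key}

    def open_file(self, package, username):
        """Allow access only to the client the package was issued to."""
        record = self.issued.get(package["Pseudo Key"])
        if record == (package["Docuname"], username):
            return True
        self.alerts.append({
            "Docuname": package["Docuname"],
            "Username": username,
            "Message": "Unauthorized attempt to open file",
        })
        return False


admin = Admin()
package = admin.send_file("report.doc", "client1", 48213907)
client_ok = admin.open_file(package, "client1")
intruder_ok = admin.open_file(package, "intruder")
```

The legitimate client opens the file, while the intruder is refused and one alert record is stored for the admin.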
2.3 Objectives of Proposed System
A data distributor has given sensitive data to a set of supposedly trusted agents. Some of
the data is leaked and found in an unauthorized place.
We propose data allocation strategies that improve the probability of identifying leakage.
Our goal is to detect when the distributor’s sensitive data has been leaked by agents and, if
possible, to identify the agents who leaked the data.
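To see why the way data is allocated matters, consider a simple overlap heuristic, sketched here in Python. This is an illustrative stand-in and not the guilt model of this project: it only measures what fraction of the leaked objects each agent had received. The function name and the sample allocations are hypothetical.

```python
def overlap_scores(leaked, allocations):
    """Return, per agent, the fraction of leaked objects that agent held.

    leaked: set of leaked object identifiers.
    allocations: dict mapping agent name -> set of objects given to it.
    """
    if not leaked:
        return {agent: 0.0 for agent in allocations}
    return {
        agent: len(objects & leaked) / len(leaked)
        for agent, objects in allocations.items()
    }


# Hypothetical allocations with little overlap between the agents.
allocations = {
    "agent_a": {"d1", "d2"},
    "agent_b": {"d2", "d3", "d4"},
}
scores = overlap_scores({"d2", "d3"}, allocations)
```

In this example `agent_b` held every leaked object and `agent_a` only half of them. If both agents had received identical sets, their scores would be equal and the leak could not be attributed, which is why an allocation that keeps the agents' sets distinct makes identification easier.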
3. SYSTEM SPECIFICATION
3.1 Hardware Specification
System : Pentium IV 2.4 GHz.
Hard Disk : 40 GB.
Floppy Drive : 1.44 MB.
Monitor : 15” SVGA Colour.
Mouse : Logitech.
RAM : 512 MB.
3.2 Software Specification
Operating system : Windows XP.
Front End : [Link]
Back End : MS-ACCESS
5. SYSTEM DEVELOPMENT
5.1 Module Description
“Data Leakage Detection” helps to detect leaked files. It consists of the
following modules:
Client Module
Admin Module
5.1.1 Client Module
In this module, the client sends a request to the admin to access data or records. After
receiving the response from the admin, the client can view the data or records.
5.1.2 Admin Module
The admin module has the following sub-modules:
Receive request
Generate fake object
Inject fake object
Transfer file
Receive Request
In this module, the admin receives the request from the client to access the data or
records.
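The exchange between the client sending a request and the admin receiving it can be sketched in Python as a simple first-in, first-out queue. This is an assumption about how requests could be held, not a description of the actual implementation; the function names are hypothetical, while the Username and Docuname fields follow the tbClient_Request and tbReceiveRequest tables.

```python
from collections import deque

pending_requests = deque()  # requests waiting for the admin


def send_request(username, docuname):
    """Client side: queue a request for a document."""
    pending_requests.append({"Username": username, "Docuname": docuname})


def receive_request():
    """Admin side: take the oldest pending request, or None if there is none."""
    return pending_requests.popleft() if pending_requests else None


send_request("client1", "report.doc")
first = receive_request()
```

Requests are served in arrival order, and `receive_request()` returns `None` once the queue is empty.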
Generate Fake Objects
In this module, before sending the response to the client, the admin adds a pseudo polynomic
key to the file. The key is generated automatically by a pseudorandom key generation
function.
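The report does not specify how the key generation function works, so the following Python sketch is only one possible reading. It assumes a numeric key (the Pseudo Key field is of type Number) of a fixed length that must be unique, since PseudoKey is the primary key of tbPseudokey. It uses Python's standard `secrets` module as the random source; the function name and the default length of 8 digits are hypothetical.

```python
import secrets


def generate_pseudo_key(existing_keys, digits=8):
    """Return a random numeric key with exactly `digits` digits.

    The key is added to `existing_keys` so it is never issued twice.
    """
    low = 10 ** (digits - 1)  # smallest number with `digits` digits
    span = 9 * low            # count of numbers with `digits` digits
    while True:
        key = low + secrets.randbelow(span)
        if key not in existing_keys:
            existing_keys.add(key)
            return key


issued_keys = set()
key_one = generate_pseudo_key(issued_keys)
key_two = generate_pseudo_key(issued_keys)
```

Retrying on collision keeps keys unique without needing a counter, at the cost of an occasional extra draw when many keys have already been issued.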
Inject Fake Object
In this module, the admin injects the fake object, together with the data or records, into the requested file.
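One way to picture the injection step is as packaging: the fake object travels alongside the data rather than being written into it. The Python sketch below assumes this packaging approach; the function name and the `content` and `fake_object` keys are hypothetical, while the remaining field names follow the tbInjectkey table.

```python
def inject_fake_object(docuname, username, content, pseudo_key, object_name):
    """Bundle the unmodified content with a fake object for one client."""
    return {
        "Docuname": docuname,
        "Username": username,
        "content": content,  # original data, left unchanged
        "fake_object": {
            "Pseudo Key": pseudo_key,
            "Object Name": object_name,
        },
    }


content = "name,balance\nalice,100\nbob,250"
package = inject_fake_object(
    "report.doc", "client1", content, 48213907, "key-report"
)
```

Because the content is carried through untouched, this differs from watermarking, where the distributed copy itself is altered.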
Transfer File
After inserting the fake object into the file, the admin transfers the file to the requesting client.
4.2 Input Design
Once the requirements are frozen in the analysis phase, the requirement specifications are
carried over to the design phase. The design phase involves deciding how to implement the
requirements. System design is the process of identifying the main components of the system
without going into their internal details.
Each component is considered a black box, meaning that for a given input a specified
output is produced, but the details of how the input is processed are not known. In this
system, the input screens are designed to be simple, reliable, and user-friendly.
Objectives Of Input Design
Input design consists of developing specifications and procedures for data preparation
(the steps necessary to put transaction data into a usable form for processing) and data entry
(the activity of putting data into the computer for processing). The five objectives of input
design are:
Controlling the amount of input
Avoiding delay
Avoiding error in data
Avoiding extra steps
Keeping the process simple
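As an illustration of controlling the amount of input and avoiding errors in data, a request form could be checked before it is accepted. The Python sketch below is an assumption about how such a check might look; the function name and the error messages are hypothetical, and the limit of 25 characters is taken from the Text width used in the table definitions.

```python
MAX_WIDTH = 25  # Text width of the Username and Docuname fields


def validate_request(username, docuname):
    """Return a list of error messages; an empty list means valid input."""
    errors = []
    for label, value in (("Username", username), ("Docuname", docuname)):
        cleaned = value.strip()
        if not cleaned:
            errors.append(f"{label} is required")
        elif len(cleaned) > MAX_WIDTH:
            errors.append(f"{label} must be at most {MAX_WIDTH} characters")
    return errors


ok = validate_request("client1", "report.doc")
bad = validate_request("", "x" * 30)
```

Reporting all problems in one pass avoids the extra step of resubmitting the form once per error.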
4.3 Output Design
Output design is the process of converting computer data into hard copy that all users can
understand. When designing output, the system analyst must accomplish the following:
Determine what information to present.
Decide whether to display, print or speak the information and select the output medium.
Arrange the presentation of information in an acceptable format.
Decide how to distribute the output to intended recipients.
Types of output:
Whether the output is a formatted report or a simple listing of the contents of a file, a
computer process will produce the output.
A Document
A Message
Retrieval from a data store
Transmission from a process or system activity
Directly from an output source
4.6 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the "flow" of data through
an information system, modeling its process aspects.
Often they are a preliminary step used to create an overview of the system which can
later be elaborated.
DFDs can also be used for the visualization of data processing (structured design).
A DFD shows what kinds of information will be input to and output from the system,
where the data will come from and go to, and where the data will be stored.
There are only four symbols:
Squares representing external entities, which are sources or destinations of data.
Rounded rectangles representing processes, which take data as input, do
something to it, and output it.
Arrows representing the data flows, which can be either electronic data or
physical items.
Open-ended rectangles representing data stores, including electronic stores such
as databases or XML files and physical stores such as filing cabinets or stacks
of paper.
Database Name: dbleak
Table Name: tbClient_Request
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Table Name: tbdocumentList
Field name      Data type   Width   Description
Abtdocument     Text        25      Document Details
Document List   Text        25      Document List
Table Name: tbReceiveRequest
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Table Name: tbPseudokey
Primary Key: PseudoKey
Field name      Data type   Width   Description
Pseudo Key      Number      25      Pseudo polynomic Generation
Object Name     Text        25      Key Name
Table Name: tbInjectkey
Primary Key: PseudoKey
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Pseudo Key      Number      25      Pseudo polynomic Generation
Object Name     Text        25      Key Name
Table Name: tbTransferfile
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Fake Object     Text        25      Object Name
Table Name: tbClientReceive
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Table Name: tbSendthirdparty
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Third Party     Text        25      Username
Table Name: tbleakIdentify
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Table Name: tbAlertMessage
Field name      Data type   Width   Description
Docuname        Text        25      Document Name
Username        Text        25      Username
Message         Text        50      Alert message
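For readers who want to experiment with these table definitions, three of them can be recreated as follows. This Python sketch uses the standard `sqlite3` module with an in-memory database as a stand-in for the MS-ACCESS back end, so the SQL dialect and types differ from the original. The column names `PseudoKey` and `ObjectName` (without spaces) and the sample row are assumptions made for illustration.

```python
import sqlite3

# Widths from the table definitions are expressed as CHECK constraints.
SCHEMA = """
CREATE TABLE tbClient_Request (
    Docuname TEXT CHECK (length(Docuname) <= 25),
    Username TEXT CHECK (length(Username) <= 25)
);
CREATE TABLE tbPseudokey (
    PseudoKey  INTEGER PRIMARY KEY,
    ObjectName TEXT CHECK (length(ObjectName) <= 25)
);
CREATE TABLE tbAlertMessage (
    Docuname TEXT CHECK (length(Docuname) <= 25),
    Username TEXT CHECK (length(Username) <= 25),
    Message  TEXT CHECK (length(Message) <= 50)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Hypothetical sample row: one issued key and its object name.
conn.execute("INSERT INTO tbPseudokey VALUES (?, ?)", (48213907, "key-report"))
row = conn.execute(
    "SELECT ObjectName FROM tbPseudokey WHERE PseudoKey = ?", (48213907,)
).fetchone()
```

Declaring `PseudoKey` as the primary key means a second insert with the same key is rejected by the database, which matches the requirement that each fake object be unique.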