Facilitating Document Annotation Using Content and Querying Value
Abstract:
A large number of organizations today generate and share textual descriptions of
their products, services, and actions. Such collections of textual data contain a
significant amount of structured information, which remains buried in the
unstructured text. While information extraction algorithms facilitate the extraction
of structured relations, they are often expensive and inaccurate, especially when
operating on top of text that does not contain any instances of the targeted
structured information. We present a novel alternative approach that facilitates
the generation of structured metadata by identifying documents that are likely
to contain information of interest, that is, information that will subsequently be
useful for querying the database. Our approach relies on the idea that humans are
more likely to add the necessary metadata at creation time if prompted by
the interface, and that it is much easier for humans (and/or algorithms) to identify
the metadata when such information actually exists in the document, instead of
naively prompting users to fill in forms with information that is not available in the
document. As a major contribution of this paper, we present algorithms that
identify structured attributes that are likely to appear within the document by
jointly utilizing the content of the text and the query workload. Our experimental
evaluation shows that our approach generates superior results compared to
approaches that rely only on the textual content or only on the query workload to
identify attributes of interest.
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401
Visit: www.finalyearprojects.org Mail to: ieeefinalsemprojects@gmail.com
Architecture:
EXISTING SYSTEM:
Many systems, though, do not even have the basic “attribute-value” annotation
that would make “pay-as-you-go” querying feasible. Existing work on query
forms can be leveraged in creating the CADS adaptive query forms. One line of
prior work proposes an algorithm to extract a query form that represents most of
the queries in the database using the “querability” of the columns, and later
extends that work to discuss form customization. Other work uses the schema
information to auto-complete attribute or value names in query forms. Keyword
queries have also been used to select the most appropriate query forms.
PROPOSED SYSTEM:
In this paper, we propose CADS (Collaborative Adaptive Data Sharing platform),
which is an “annotate-as-you-create” infrastructure that facilitates fielded data
annotation. A key contribution of our system is the direct use of the query
workload to guide the annotation process, in addition to examining the content of
the document. In other words, we prioritize the annotation of documents towards
generating values for the attributes that are most often used by querying users.
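The prioritization described above can be illustrated with a minimal Java sketch. It assumes the query workload is available as a list of past queries, each reduced to the attribute names it references. The class name WorkloadAttributeRanker, its methods, and the sample attributes are hypothetical and are not part of CADS itself; the sketch only shows the idea that attributes used by more queries are suggested first.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks attributes by how often past queries used them. */
final class WorkloadAttributeRanker {

    /** Counts how many times each attribute is referenced across the workload. */
    static Map<String, Integer> countAttributeUse(List<List<String>> workload) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> queryAttributes : workload) {
            for (String attribute : queryAttributes) {
                counts.merge(attribute, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns attributes by descending use count; ties are broken alphabetically. */
    static List<String> rank(List<List<String>> workload) {
        Map<String, Integer> counts = countAttributeUse(workload);
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return ranked;
    }

    public static void main(String[] args) {
        // Assumed sample workload: each inner list holds the attributes of one query.
        List<List<String>> workload = List.of(
                List.of("location", "date"),
                List.of("location"),
                List.of("category", "location"),
                List.of("date"));
        System.out.println(rank(workload)); // prints [location, date, category]
    }
}
```

In this sample, "location" appears in three queries, "date" in two, and "category" in one, so an annotation form built from the ranking would prompt for "location" first.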
Modules :
1. Registration
2. Login
3. Document Upload
4. Search Techniques
5. Download Document
Modules Description:
Registration:
In this module, an Author (Creator) or User must register first;
only then can he/she access the database.
Login:
In this module, any of the above-mentioned persons must
log in by providing their email ID and password.
Document Upload:
In this module, the Owner uploads an unstructured document as a file (along with
its metadata) into the database. With the help of this metadata and the document's
contents, the end user can later download the file by entering matching content or
a matching query.
Search Techniques:
We use two techniques for searching for a document:
1) Content Search, 2) Query Search.
Content Search:
The user supplies content that is present in the desired document. If the
content is found in a document, that document is made available for download;
otherwise it is not.
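As an illustration of the content search described above, the following Java sketch treats it as a case-insensitive substring match over document text held in memory. This is an assumption: the synopsis does not state how matching is implemented or how documents are stored, and the class ContentSearch, its method, and the sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of content search over in-memory document text. */
final class ContentSearch {

    /** Returns the names of documents whose text contains the phrase, ignoring case. */
    static List<String> search(Map<String, String> documents, String phrase) {
        List<String> matches = new ArrayList<>();
        if (phrase == null || phrase.trim().isEmpty()) {
            return matches; // an empty phrase matches nothing
        }
        String needle = phrase.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : documents.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample data: file name mapped to extracted text.
        Map<String, String> documents = new LinkedHashMap<>();
        documents.put("report.txt", "Flood damage reported in Miami on Monday");
        documents.put("memo.txt", "Quarterly budget review");
        documents.put("alert.txt", "Evacuation order for MIAMI beach");

        System.out.println(search(documents, "miami")); // prints [report.txt, alert.txt]
        System.out.println(search(documents, "tokyo")); // prints []
    }
}
```

An empty result corresponds to the "otherwise it is not" case: no document is offered for download.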
Query Search:
The user supplies a query of the kind given in the base paper. If the query
input matches a document, that document is made available for download;
otherwise it is not.
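The query search can be sketched in the same spirit. The Java example below assumes that a query is a set of attribute-value conditions and that each uploaded document carries attribute-value metadata; a document matches when every condition is satisfied. The class QuerySearch, its methods, and the sample attributes are hypothetical, and the exact query form used in the base paper may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of query search over attribute-value annotations. */
final class QuerySearch {

    /** True when the metadata satisfies every attribute-value condition of the query. */
    static boolean matches(Map<String, String> metadata, Map<String, String> query) {
        for (Map.Entry<String, String> condition : query.entrySet()) {
            String actual = metadata.get(condition.getKey());
            if (actual == null || !actual.equalsIgnoreCase(condition.getValue())) {
                return false;
            }
        }
        return true;
    }

    /** Returns the names of annotated documents that satisfy the query. */
    static List<String> search(Map<String, Map<String, String>> annotated,
                               Map<String, String> query) {
        List<String> results = new ArrayList<>();
        if (query.isEmpty()) {
            return results; // no conditions given, so nothing is returned
        }
        for (Map.Entry<String, Map<String, String>> doc : annotated.entrySet()) {
            if (matches(doc.getValue(), query)) {
                results.add(doc.getKey());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample annotations: file name mapped to its metadata.
        Map<String, Map<String, String>> annotated = new LinkedHashMap<>();
        annotated.put("report.txt", Map.of("location", "Miami", "type", "flood"));
        annotated.put("alert.txt", Map.of("location", "Miami", "type", "evacuation"));

        System.out.println(search(annotated, Map.of("location", "miami")));
        System.out.println(search(annotated, Map.of("location", "miami", "type", "flood")));
    }
}
```

A document without a value for a queried attribute never matches, which is why annotating the attributes that queries actually use increases a document's visibility.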
Download Document:
The User downloads a document using the query or content
values given in the base paper. He/She enters the data in the
text boxes; if it is correct, the file is downloaded. Otherwise it is not.
System Configuration:
H/W System Configuration:
Processor - Pentium III
Speed - 1.1 GHz
RAM - 256 MB (min)
Hard Disk - 20 GB
Floppy Drive - 1.44 MB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - SVGA
S/W System Configuration:
Operating System - Windows 95/98/2000/XP
Application Server - Tomcat 5.0/6.x
Front End - HTML, Java, JSP
Scripts - JavaScript
Server Side Script - Java Server Pages
Database - MySQL
Database Connectivity - JDBC
Conclusion:
We proposed adaptive techniques that suggest relevant attributes for
annotating a document while trying to satisfy users' querying needs. Our solution
is based on a probabilistic framework that considers the evidence in the document
content and the query workload. We present two ways to combine these two
pieces of evidence, content value and querying value: a model that treats the two
components as conditionally independent, and a linear weighted model. Experiments
show that, using our techniques, we can suggest attributes that improve the
visibility of the documents with respect to the query workload by up to 50%. That
is, we show that using the query workload can greatly improve the annotation
process and increase the utility of shared data.
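The two ways of combining the evidence can be sketched as follows. This is a simplified illustration, not the paper's exact formulation: it assumes each attribute already has a content value and a querying value in the range 0 to 1, reads the conditionally independent model as a product of the two scores, and reads the linear weighted model as a weighted sum with an assumed weight parameter. The class AttributeScoring, its methods, and the sample numbers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of combining content value (CV) and querying value (QV). */
final class AttributeScoring {

    /** Independence reading: the combined score is the product of CV and QV. */
    static double independentScore(double contentValue, double queryingValue) {
        return contentValue * queryingValue;
    }

    /** Linear reading: weight * CV + (1 - weight) * QV, with weight in [0, 1]. */
    static double linearScore(double contentValue, double queryingValue, double weight) {
        if (weight < 0.0 || weight > 1.0) {
            throw new IllegalArgumentException("weight must be in [0, 1]");
        }
        return weight * contentValue + (1.0 - weight) * queryingValue;
    }

    /** Ranks attributes by descending score; each value array holds {CV, QV}. */
    static List<String> rank(Map<String, double[]> evidence,
                             ToDoubleFunction<double[]> scorer) {
        List<String> names = new ArrayList<>(evidence.keySet());
        names.sort(Comparator.comparingDouble(
                (String name) -> scorer.applyAsDouble(evidence.get(name))).reversed());
        return names;
    }

    public static void main(String[] args) {
        // Assumed sample evidence: attribute mapped to {content value, querying value}.
        Map<String, double[]> evidence = new LinkedHashMap<>();
        evidence.put("location", new double[] {0.9, 0.2});
        evidence.put("date", new double[] {0.5, 0.7});
        evidence.put("author", new double[] {0.2, 0.8});

        // Product model: date is ranked first, then location, then author.
        System.out.println(rank(evidence, s -> independentScore(s[0], s[1])));
        // Linear model weighted towards content (0.9): location moves to the top.
        System.out.println(rank(evidence, s -> linearScore(s[0], s[1], 0.9)));
    }
}
```

The product rewards attributes that score reasonably on both kinds of evidence, while the linear model lets one kind of evidence dominate depending on the chosen weight.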
DOMAIN: WIRELESS NETWORK PROJECTS
