Search Results for "document classification"

Showing 29 open source projects for "document classification"

1

Cleanlab

The standard data-centric AI package for data quality and ML

cleanlab helps you clean data and labels by automatically detecting issues in an ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...

Downloads: 0 This Week

Last Update: 2025-02-27
See Project
2

Apache OpenNLP

Apache OpenNLP is a machine learning-based NLP library that provides tools for text-processing tasks such as tokenization, sentence segmentation, and named entity recognition.

Downloads: 1 This Week

Last Update: 2025-12-06
See Project
3

flair

A very simple framework for state-of-the-art NLP

...Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), sentiment analysis, part-of-speech tagging (PoS), special support for biomedical texts, sense disambiguation and classification, with support for a rapidly growing number of languages. A text embedding library. Flair has simple interfaces that allow you to use and combine different word and document embeddings, including our proposed Flair embeddings and various transformers. A PyTorch NLP framework. Our framework builds directly on PyTorch, making it easy to train your own models and experiment with new approaches using Flair embeddings and classes.

Downloads: 0 This Week

Last Update: 2025-02-05
See Project
4

NeMo Curator

Scalable data preprocessing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and parameter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2025-10-01
See Project
5

DeepSeek AIO

Access and use all DeepSeek AI models in one program.

DeepSeek AIO is a simple program that allows you to interact with all DeepSeek large language models in one place. It supports text-based chats, data analysis, code generation, language translation, and more. The program is designed to make it easy for users to use DeepSeek's AI tools for different purposes without switching between multiple platforms.

Downloads: 24 This Week

Last Update: 2025-11-26
See Project
6

DocWire SDK

Award-winning modern data processing SDK in C++20

DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to...

Downloads: 4 This Week

Last Update: 2025-11-01
See Project
7

e-Dokyumento

e-Dokyumento is a web-based Document Management System (DMS)

e-Dokyumento is an open-source, web-based Document Management System (DMS) that automates the basic office document workflow, such as receiving, filing, routing, and approving, through capturing (scanning), digitizing (OCR reading), storing, tagging, and electronically routing and approving (e-signature) electronic documents. Demo: https://e-dokyumento.herokuapp.com/ https://edokyu.seillig.com/ (refer to Readme.md for the...

2 Reviews

Downloads: 1 This Week

Last Update: 2022-05-14
See Project
8

Hugging Face Transformer

CPU/GPU inference server for Hugging Face transformer models

Optimize and deploy Hugging Face Transformer models in production with a single command line. At Lefebvre Dalloz we run semantic search engines in production in the legal domain (in non-marketing language, a re-ranker), and we based ours on Transformer. In that setup, latency is key to providing a good user experience, and relevancy inference is done online for hundreds of snippets per user query. Most tutorials on Transformer deployment in production are built over PyTorch and FastAPI....

Downloads: 0 This Week

Last Update: 2022-08-22
See Project
9

Interpret-Text

State-of-the-art explainers for text-based machine learning models

A library that incorporates state-of-the-art explainers for text-based machine learning models and visualizes the result with a built-in dashboard. Interpret-Text builds on Interpret, an open source Python package for training interpretable models and helping to explain blackbox machine learning systems. We have added extensions to support text models. Interpret-Text incorporates community-developed interpretability techniques for NLP models and a visualization dashboard to view the results....

Downloads: 0 This Week

Last Update: 2023-12-19
See Project
10

DynaQ

Innovative text document search. See http://dynaq.opendfki.de for details.

The goal of DynaQ is to develop an inquiry system to explore the personal information space, supporting you with the searching paradigm 'orienteering'. DynaQ is a (desktop) search engine with enhanced functionality for file, email and blog search. Look at our GitLab homepage for source code and documentation: http://dynaq.opendfki.de

Downloads: 0 This Week

Last Update: 2021-08-05
See Project
11

fastText

Library for fast text classification and representation

...Such categories can be review scores, spam vs. non-spam, or the language in which the document was typed. Nowadays, the dominant approach to building such classifiers is machine learning, that is, learning classification rules from examples. In order to build such classifiers, we need labeled data, which consists of documents and their corresponding categories (or tags, or labels).

Downloads: 1 This Week

Last Update: 2023-10-10
See Project
12

Document Classification

Document/Text Classification using Naive Bayes model.

Downloads: 0 This Week

Last Update: 2015-08-20
See Project
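
For readers unfamiliar with the technique named in the entry above, here is a minimal sketch of multinomial Naive Bayes text classification with add-one smoothing. It is not taken from the listed project: the function names, the whitespace tokenizer, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect per-class document and word counts from (text, label) pairs."""
    class_doc_counts = Counter()
    class_word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for word in tokenize(text):
            class_word_counts[label][word] += 1
            vocabulary.add(word)
    return class_doc_counts, class_word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_doc_counts, class_word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_doc_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(class_word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            count = class_word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this example only.
training_data = [
    ("win cash prize now", "spam"),
    ("cheap prize offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project meeting notes attached", "ham"),
]
model = train_naive_bayes(training_data)
predicted = classify("cash prize offer", model)
print(predicted)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a single unseen word from zeroing out a class.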
13

20newsgroupClassify in NaiveBayes Matlab

...Mitchell's book (Machine Learning, Tom Mitchell). 2. Please download the data from http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html. Report: a comprehensive report is below: "20 newsgroup Classification problem". Accuracy: 83.625%. Number of Training data: 50% of the total documents, chosen sequentially. Number of Training data: 50% of the remaining documents. Download the code and other necessary files in the Files tab.

Downloads: 0 This Week

Last Update: 2015-10-29
See Project
14

GTkNN

GPU-based Textual kNN (GT-kNN)

The following code is a parallel kNN implementation that uses GPUs for the high-dimensional data in text classification. You can use it to classify documents using kNN or to generate meta-features based on the distances between a query document and its k nearest neighbors.

Downloads: 0 This Week

Last Update: 2015-07-09
See Project
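
The two uses described in the entry above (kNN classification and distance-based meta-features) can be sketched on the CPU in a few lines. This is only an illustration of the idea, not GT-kNN's GPU implementation: the bag-of-words representation, the cosine distance, the function names, and the sample corpus are assumptions.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Represent a document as lowercase word counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def k_nearest(query, labeled_docs, k):
    """Return the k (distance, label) pairs closest to the query, nearest first."""
    q = bag_of_words(query)
    scored = [(cosine_distance(q, bag_of_words(text)), label)
              for text, label in labeled_docs]
    return sorted(scored)[:k]


def knn_classify(query, labeled_docs, k=3):
    """Majority vote over the labels of the k nearest documents."""
    votes = Counter(label for _, label in k_nearest(query, labeled_docs, k))
    return votes.most_common(1)[0][0]


def distance_meta_features(query, labeled_docs, k=3):
    """Distances to the k nearest neighbors, usable as inputs to another model."""
    return [dist for dist, _ in k_nearest(query, labeled_docs, k)]


# Hypothetical corpus, invented for this example only.
corpus = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("market rates and bank shares", "finance"),
    ("team wins football match", "sport"),
    ("football player injured in match", "sport"),
]
label = knn_classify("bank shares and market rates", corpus, k=3)
features = distance_meta_features("bank shares and market rates", corpus, k=3)
print(label, features)
```

The brute-force scan compares the query against every document, which is the step a GPU implementation would parallelize.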
15

Arabic Wikipedia into Named Entity

“Arabic Wikipedia into Named Entity Taxonomy” is a dataset consisting of 4000 Arabic Wikipedia articles classified into a coarse-grained NE taxonomy. This dataset can be used in document classification tasks in relation to NER. To use this corpus, please cite the following publication: F. Alotaibi and M. Lee, "Mapping Arabic Wikipedia into the Named Entities Taxonomy", In Proceedings of COLING 2012: Posters, p43-52, IIT, Mumbai, India, December 8-15. 2012. Author URL: http://www.cs.bham.ac.uk/~fsa081/index.html http://fsalotaibi.kau.edu.sa Email: fsalotaibi {AT} kau.edu.sa fsa081 {AT} cs.bham.ac.uk

Downloads: 0 This Week

Last Update: 2014-08-24
See Project
16

avimmir

(audio, video, image) Multimedia Multimodal Information Retrieval

audio classification; speaker segmentation; speaker clustering; speaker recognition; spoken document retrieval; image retrieval; video retrieval; etc.

Downloads: 0 This Week

Last Update: 2013-11-23
See Project
17

MyNook

A machine learning system for supervised document classification

An open source system for supervised document classification based on statistical machine learning techniques. Unlike state-of-the-art classification techniques, MyNook requires only the title of the document, not the content itself.

Downloads: 0 This Week

Last Update: 2016-10-31
See Project
18

DauroLab

A toolbox for Multiclass Document Classification. However, it can also be used for generic Machine Learning purposes.

Downloads: 0 This Week

Last Update: 2014-06-29
See Project
19

Content based File Organizer

This is a document organizer that learns from user behavior. It uses classification algorithms to prepare label suggestions for files. It also has a search feature that extends user queries with the WordNet dictionary.

Downloads: 0 This Week

Last Update: 2014-12-16
See Project
20

NLP4J Natural Language Processing 4 Java

NLP4J library is a toolset written in Java for Natural Language Processing. This version is oriented to Document Classification and uses Naive Bayes, TF-IDF, etc. There are also pre-processing tools.

Downloads: 0 This Week

Last Update: 2014-07-01
See Project
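
TF-IDF, mentioned in the entry above, weights a term by how often it appears in a document and how rare it is across the collection. The sketch below is in Python rather than Java and is unrelated to NLP4J's own code; it uses one common variant (relative term frequency times log(N / document frequency)), and the function name and sample sentences are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical documents, invented for this example only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
print(weights[0])
```

A word that occurs in every document, such as "the" here, gets weight zero, while a word unique to one document gets the highest weight for its frequency.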
21

Trainable Relation Extraction framework

T-Rex (Trainable Relation Extraction) is a highly configurable machine learning-based Information Extraction from Text framework, which includes tools for document classification, entity extraction and relation extraction.

Downloads: 0 This Week

Last Update: 2013-05-02
See Project
22

Young Researchers' Induction Foundation

Collection of Statistical Language Processing Tools and Modules for Information Retrieval, Document Classification, Vectorization, Pattern Matching, Knowledge/Text Mining related problems.

Downloads: 0 This Week

Last Update: 2013-04-17
See Project
23

Qualiweb

Qualiweb aims to provide semantic web metrics for modeling a website's visitors' needs according to a given taxonomy or document classification. Web metrics provided by Qualiweb give an indication of how successful each of the website's topics has been.

Downloads: 0 This Week

Last Update: 2013-03-19
See Project
24

Judge

JUDGE (Java Utility for Document Genre Eduction) features automatic classification and clustering of documents, optionally as a webservice. The program is written entirely in Java and makes use of the Weka machine learning toolkit.

Downloads: 0 This Week

Last Update: 2015-12-01
See Project
25

WSDC-Web Service Document Classificator

The project aims to build a web service for automatic document classification (as POPFile does for mail).

Downloads: 0 This Week

Last Update: 2013-02-25
See Project
