Showing 6115 open source projects for "web site scraper"

View related business solutions
  • Say goodbye to broken revenue funnels and poor customer experiences Icon
    Say goodbye to broken revenue funnels and poor customer experiences

    Connect and coordinate your data, signals, tools, and people at every step of the customer journey.

    LeanData is a Demand Management solution that supports all go-to-market strategies such as account-based sales development, geo-based territories, and more. LeanData features a visual, intuitive workflow native to Salesforce that enables users to view their entire lead flow in one interface. LeanData allows users to access the drag-and-drop feature to route their leads. LeanData also features an algorithms match that uses multiple fields in Salesforce.
    Learn More
  • The Original Buy Center Software. Icon
    The Original Buy Center Software.

    Never Go To The Auction Again.

    VAN sources private-party vehicles from over 20 platforms and provides all necessary tools to communicate with sellers and manage opportunities. Franchise and Independent dealers can boost their buy center strategies with our advanced tools and an experienced Acquisition Coaching™ team dedicated to your success.
    Learn More
  • 1
    shot-scraper

    shot-scraper

    A command-line utility for taking automated screenshots of websites

    shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    TEAMMATES Developer Web Site

    TEAMMATES Developer Web Site

    This is the project website for the TEAMMATES feedback management tool

    TEAMMATES is a free online tool for managing peer evaluations and other feedback paths of your students. It is provided as a cloud-based service for educators/students and is currently used by hundreds of universities across the world.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Web-Check

    Web-Check

    All-in-one OSINT tool for analysing any website

    Comprehensive, on-demand open source intelligence for any website. Get an insight into the inner-workings of a given website: uncover potential attack vectors, analyse server architecture, view security configurations, and learn what technologies a site is using. Currently the dashboard will show: IP info, SSL chain, DNS records, cookies, headers, domain info, search crawl rules, page map, server location, redirect ledger, open ports, traceroute, DNS security extensions, site performance,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Learn More
  • 5
    Site Kit for WordPress

    Site Kit for WordPress

    Site Kit is a one-stop solution for WordPress users

    Site Kit is a first-party WordPress plugin that brings key Google services into a single dashboard so site owners can see how their content performs and fix issues without leaving wp-admin. After a guided setup and verification flow, it connects properties to Search Console, Analytics, AdSense, PageSpeed Insights, and other services, surfacing the most relevant metrics per page and per site. The plugin focuses on clarity: traffic sources, search queries, top pages, and monetization signals...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    CyberScraper 2077

    CyberScraper 2077

    A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

    CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    JobFunnel

    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Scrape job websites into a single spreadsheet with no duplicates. Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    html-metadata

    html-metadata

    MetaData html scraper and parser for Node.js (supports Promises

    The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    MeshCentral

    MeshCentral

    A complete web-based remote monitoring and management web site

    The open source, multi-platform, self-hosted, feature-packed web site for remote device management. MeshCentral is a full computer management web site. With MeshCentral, you can run your own web server to remotely manage and control computers on a local network or anywhere on the internet. Once you get the server started, create device group and download and install an agent on each computer you want to manage.
    Downloads: 44 This Week
    Last Update:
    See Project
  • AestheticsPro Medical Spa Software Icon
    AestheticsPro Medical Spa Software

    Our new software release will dramatically improve your medspa business performance while enhancing the customer experience

    AestheticsPro is the most complete Aesthetics Software on the market today. HIPAA Cloud Compliant with electronic charting, integrated POS, targeted marketing and results driven reporting; AestheticsPro delivers the tools you need to manage your medical spa business. It is our mission To Provide an All-in-One Cutting Edge Software to the Aesthetics Industry.
    Learn More
  • 10
    ScrapeGraphAI

    ScrapeGraphAI

    Python scraper based on AI

    Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    Fluent UI Web

    Fluent UI Web

    Collection of utilities andcomponents for building web applications

    A collection of UX frameworks for creating beautiful, cross-platform apps that share code, design, and interaction behavior. Build for one platform or for all. Everything you need is here. Build your own apps using the same open source components we do, with accessibility, internationalization, and performance included. From tutorials to a fun collection of API references, find what you need to design and develop your own Fluent experience. From Word and Excel to PowerBI and Teams, many...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    Ulixee Hero

    Ulixee Hero

    The web browser built for scraping

    It's the first modern headless browsers designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS allowing you bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 13
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    blogdown

    blogdown

    Create Blogs and Websites with R Markdown

    blogdown is an R package that enables the creation and maintenance of static websites and blogs using R Markdown and Hugo (or other static-site generators). Developed by Yihui Xie and team, it provides functions to initialize sites, write posts, manage themes, and deploy with minimal fuss. It seamlessly blends R code chunks and web content, ideal for data storytellers and technical bloggers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Heimdall

    Heimdall

    An Application dashboard and launcher

    As the name suggests Heimdall Application Dashboard is a dashboard for all your web applications. It doesn't need to be limited to applications though, you can add links to anything you like. Heimdall is an elegant solution to organize all your web applications. It’s dedicated to this purpose so you won’t lose your links in a sea of bookmarks. Why not use it as your browser start page? It even has the ability to include a search bar using either Google, Bing or DuckDuckGo. ...
    Downloads: 54 This Week
    Last Update:
    See Project
  • 16
    eleventy

    eleventy

    A simpler site generator. Transforms a directory of templates

    A static site generator for modern web development, focusing on flexibility and customization.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    crwlr

    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    WordPress

    WordPress

    Just a mirror of the WordPress subversion repository

    WordPress is one of the world’s most widely used content management systems (CMS), powering blogs, websites, and increasingly web apps. It offers a flexible architecture of themes and plugins, where users can extend functionality or customize layout without touching core code. The administrative dashboard includes post and page editors, media library, user roles, plugin/theme installation, and site settings. Through its REST API and headless mode, WordPress also serves as a backend for decoupled front ends using frameworks like React, Vue, or Gatsby. ...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 19
    Linux command

    Linux command

    Linux command encyclopedia search tool

    Linux command encyclopedia search tool, the content includes Linux command manual, detailed explanation, study, and collection. The current warehouse has collected more than 570 Linux commands. It is a non-profit warehouse. It has generated a web site for easy use. Currently, the site does not have any advertisements. The content includes Linux command manuals, detailed explanations, and learning. Very worthy collection of Linux command quick reference manual. The copyright belongs to the original author, and does not assume any responsibility for any legal issues and risks. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 20
    Certbot

    Certbot

    Get free HTTPS certificates forever from Let's Encrypt

    Certbot is a fully-featured, easy-to-use, extensible client for the Let's Encrypt CA. It fetches a digital certificate from Let’s Encrypt, an open certificate authority launched by the EFF, Mozilla, and others. This certificate then lets browsers verify the identity of web servers and ensures secure communication over the Web. Obtaining and maintaining a certificate is usually such a hassle, but with Certbot and Let’s Encrypt it becomes automated and hassle-free. With just a few simple...
    Downloads: 103 This Week
    Last Update:
    See Project
  • 21
    PT Plugin Plus

    PT Plugin Plus

    Google Chrome extension

    PT Assistant Plus, a browser plug-in (Web Extensions) for Google Chrome and Firefox, is mainly used to assist in downloading seeds from PT stations. PT Assistant Plus is a browser plug-in (Web Extensions), a tool that can improve the efficiency of PT site usage. Applicable to various PT stations, it can make various operations such as downloading seeds easier and faster.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    Decap

    Decap

    A Git-based CMS for Static Site Generators

    Open source content management for your Git workflow. Use Decap CMS with any static site generator for a faster and more flexible web project. Get the speed, security, and scalability of a static site, while still providing a convenient editing interface for content. Content is stored in your Git repository alongside your code for easier versioning, multi-channel publishing, and the option to handle content updates directly in Git.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    Jekyll RDF

    Jekyll RDF

    A Jekyll plugin to include RDF data in your static site

    Transform your RDF Knowledge Graph into static websites and blogs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    rvest

    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like beautiful soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    AMP

    AMP

    Web component framework for building ads, emails, websites and more

    AMP is an open source web component framework that allows you to easily create user-first websites, ads, emails, stories and more. AMP creates fast, smooth-loading web pages that prioritize the user-experience, consistently providing a fast experience across all devices and platforms.
    Downloads: 2 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next