We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. Replicating data at scale, across hundreds of tables, with low latency and at a reasonable infrastructure cost is a hard problem, and most data teams we’ve talked to, have faced data quality issues in their replication processes. The hard truth is that the quality of the replication is the quality of the data. Since copying entire datasets in batch is often infeasible at the modern data scale, businesses rely on the Change Data Capture (CDC) approach of replicating data using a continuous stream of updates.

Features

  • Find mismatches across databases
  • Outputs diff of rows in detail
  • Simple CLI/API to create monitoring and alerts
  • Verify 25M+ rows in <10s, and 1B+ rows in ~5min
  • Verifies across many different databases
  • Works for tables with 10s of billions of rows

Project Samples

Project Activity

See All Activity >

License

MIT License

Follow data-diff

data-diff Web Site

Other Useful Business Software
Our Free Plans just got better! | Auth0 Icon
Our Free Plans just got better! | Auth0

With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
Try free now
Rate This Project
Login To Rate This Project

User Reviews

Be the first to post a review of data-diff!

Additional Project Details

Programming Language

Python

Related Categories

Python Database Software, Python Data Quality Tool

Registered

2022-07-23