Common data types like dates, addresses, phone numbers and tables can have multiple textual representations, and many heavily-used languages, such as SQL, come in several dialects. These variations can cause data to be misinterpreted, leading to silent data corruption, failure of data processing systems, or even security vulnerabilities.
Saggitarius is a new language and system designed to help programmers reason about the format of data. Programmers describe grammatical domains, that is, sets of context-free grammars that capture the many possible representations of a data type. The Saggitarius algorithm then analyzes a user-provided data set and infers which grammar in the domain best matches that data.
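To make the idea of a grammatical domain concrete, here is a toy Python sketch. It is not Saggitarius syntax and not the Saggitarius inference algorithm: it assumes a tiny hypothetical domain of date formats, uses regular expressions as a stand-in for context-free grammars, and simply keeps every candidate that accepts all of the examples rather than ranking candidates to find a best match. The names `FIELDS`, `ORDERS`, `SEPARATORS`, `build_domain` and `consistent_grammars` are invented for illustration.

```python
import re
from itertools import product

# Hypothetical building blocks for a toy "domain" of date formats.
FIELDS = {
    "Y": r"\d{4}",                      # four-digit year
    "M": r"(?:0[1-9]|1[0-2])",          # month 01-12
    "D": r"(?:0[1-9]|[12]\d|3[01])",    # day 01-31
}
ORDERS = ["YMD", "MDY", "DMY"]          # possible field orders
SEPARATORS = ["-", "/", "."]            # possible separators


def build_domain():
    """Enumerate every candidate format: one per (order, separator) pair.

    Regular expressions stand in for grammars here; a real grammatical
    domain would contain context-free grammars.
    """
    domain = {}
    for order, sep in product(ORDERS, SEPARATORS):
        pattern = re.escape(sep).join(FIELDS[f] for f in order)
        domain[(order, sep)] = re.compile(pattern)
    return domain


def consistent_grammars(examples, domain):
    """Return the names of all candidates that accept every example."""
    return [
        name
        for name, grammar in domain.items()
        if all(grammar.fullmatch(example) for example in examples)
    ]


if __name__ == "__main__":
    domain = build_domain()
    print(consistent_grammars(["2023-10-25", "2021-01-07"], domain))
    print(consistent_grammars(["03/04/2021"], domain))
    print(consistent_grammars(["03/04/2021", "25/10/2023"], domain))
```

In this sketch a single example such as `03/04/2021` is consistent with both the month-first and day-first candidates, and adding `25/10/2023` rules out the month-first reading. That mirrors, in miniature, how more data narrows down which member of a domain describes it.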
Anders Miltner, Devon Loehr, Arnold Mong, Kathleen Fisher, and David Walker. Saggitarius: A DSL for Specifying Grammatical Domains. In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages & Applications (OOPSLA '23), October 2023. [ conference version ]
Anders Miltner, Devon Loehr, Arnold Mong, Kathleen Fisher, and David Walker. Linguistic Tools for Managing Grammatical Domains. In The Eighth Workshop on Language-Theoretic Security (LangSec '22), May 2022. [ workshop version ]
This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA) under the SafeDocs program. The views, opinions and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.