Cracks in The Stack: Hidden Vulnerabilities and Licensing Risks in LLM Pre-Training Datasets

Jahanshahi, M. & Mockus, A.
Accepted at the Second International Workshop on Large Language Models for Code (LLM4Code 2025)
Preprint: https://arxiv.org/abs/2501.02628

replication
scripts
.gitignore
README.md
