LociSimiles is a Python package for finding intertextual links in Latin literature using pre-trained language models.
# Load example query and source documents
query_doc = Document("../data/hieronymus_samples.csv")
source_doc = Document("../data/vergil_samples.csv")
# Load the pipeline with pre-trained models
pipeline = ClassificationPipelineWithCandidategeneration(
    classification_name="...",
    embedding_model_name="...",
    device="cpu",
)
# Run the pipeline with the query and source documents
results = pipeline.run(
    query=query_doc,    # Query document
    source=source_doc,  # Source document
    top_k=3,            # Number of top similar candidates to classify
)
pretty_print(results)
LociSimiles provides a command-line tool for running the pipeline directly from the terminal:
locisimiles query.csv source.csv -o results.csv

locisimiles query.csv source.csv -o results.csv \
    --classification-model julian-schelb/PhilBerta-class-latin-intertext-v1 \
    --embedding-model julian-schelb/SPhilBerta-emb-lat-intertext-v1 \
    --top-k 20 \
    --threshold 0.7 \
    --device cuda \
    --verbose
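When the CLI is driven from a script, the same invocation can be assembled as an argument list. The following is a minimal sketch that uses only the flags shown above. The helper name `build_locisimiles_command` is hypothetical and not part of LociSimiles, and the snippet only builds and prints the command rather than running it.

```python
import shlex


def build_locisimiles_command(query_csv, source_csv, output_csv, *,
                              top_k=None, threshold=None,
                              device=None, verbose=False):
    """Assemble an argument list for the locisimiles CLI.

    Hypothetical helper for illustration; it only uses flags that
    appear in the command-line example above.
    """
    cmd = ["locisimiles", query_csv, source_csv, "-o", output_csv]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if device is not None:
        cmd += ["--device", device]
    if verbose:
        cmd.append("--verbose")
    return cmd


command = build_locisimiles_command(
    "query.csv", "source.csv", "results.csv",
    top_k=20, threshold=0.7, device="cuda", verbose=True,
)
print(shlex.join(command))

# To actually run it (requires LociSimiles to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list avoids shell quoting issues when file paths contain spaces.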
- Input/Output:
  - query: Path to query document CSV file (columns: seg_id, text)
  - source: Path to source document CSV file (columns: seg_id, text)
  - -o, --output: Path to output CSV file for results (required)
- Models:
  - --classification-model: HuggingFace model for classification (default: PhilBerta-class-latin-intertext-v1)
  - --embedding-model: HuggingFace model for embeddings (default: SPhilBerta-emb-lat-intertext-v1)
- Pipeline Parameters:
  - -k, --top-k: Number of top candidates to retrieve per query segment (default: 10)
  - -t, --threshold: Classification probability threshold for filtering results (default: 0.5)
- Device:
  - --device: Choose auto, cuda, mps, or cpu (default: auto-detect)
- Other:
  - -v, --verbose: Enable detailed progress output
  - -h, --help: Show help message
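The query and source files are plain CSV files with the columns seg_id and text. As a minimal sketch, such a file could be produced with Python's standard csv module. The helper name `write_segments`, the segment identifiers, and the sample lines are illustrative assumptions. Whether the CLI accepts additional columns or a different quoting style is not stated here.

```python
import csv
import io


def write_segments(handle, segments):
    """Write (seg_id, text) pairs as CSV with a seg_id,text header row.

    Hypothetical helper; the column names follow the CLI description above.
    """
    writer = csv.DictWriter(handle, fieldnames=["seg_id", "text"])
    writer.writeheader()
    for seg_id, text in segments:
        writer.writerow({"seg_id": seg_id, "text": text})


# Illustrative segments; the identifiers are made up for this example.
segments = [
    ("verg_aen_1.1", "Arma virumque cano, Troiae qui primus ab oris"),
    ("verg_aen_1.2", "Italiam fato profugus Laviniaque venit"),
]

# An in-memory buffer keeps the example self-contained; for a real file use
# open("source.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_segments(buffer, segments)
csv_text = buffer.getvalue()
print(csv_text)
```

The csv module quotes any text that contains commas, so Latin lines with punctuation stay in a single column.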
The CLI saves results to a CSV file with the following columns:
- query_id: Query segment identifier
- query_text: Query text content
- source_id: Source segment identifier
- source_text: Source text content
- similarity: Cosine similarity score (0-1)
- probability: Classification confidence (0-1)
- above_threshold: "Yes" if probability ≥ threshold, otherwise "No"
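The output columns listed above make it easy to post-process results with the standard library. The sketch below keeps only rows flagged as above the threshold and sorts them by probability. The helper name `confident_matches` is hypothetical, and the sample rows are invented placeholders rather than real pipeline output.

```python
import csv
import io

# Made-up rows using the documented column names, for illustration only.
sample_results = """\
query_id,query_text,source_id,source_text,similarity,probability,above_threshold
q1,placeholder query text,s1,placeholder source text,0.91,0.83,Yes
q1,placeholder query text,s2,placeholder source text,0.64,0.21,No
q2,placeholder query text,s3,placeholder source text,0.78,0.66,Yes
"""


def confident_matches(handle):
    """Return rows marked above_threshold == "Yes", highest probability first."""
    rows = [row for row in csv.DictReader(handle)
            if row["above_threshold"] == "Yes"]
    return sorted(rows, key=lambda row: float(row["probability"]), reverse=True)


# For a real results file, pass open("results.csv", newline="", encoding="utf-8").
matches = confident_matches(io.StringIO(sample_results))
for row in matches:
    print(row["query_id"], row["source_id"], row["probability"])
```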
Install the optional GUI extra to experiment with a minimal Gradio front end:
pip install locisimiles[gui]
Launch the interface from the command line:
locisimiles-gui