0% found this document useful (0 votes)
45 views22 pages

Beyond Fixed Taxonomies - Zero-Shot Classification and Automated Category Consolidation - by Aimen Louafi - Inside Doctrine - Nov, 2024 - Medium

Uploaded by

陳賢明
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views22 pages

Beyond Fixed Taxonomies - Zero-Shot Classification and Automated Category Consolidation - by Aimen Louafi - Inside Doctrine - Nov, 2024 - Medium

Uploaded by

陳賢明
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Get unlimited access to the best of Medium for less than $1/week. Become a member

Open in app

Beyond FixedSearch
Taxonomies: Zero-shot 44

Classification and Automated Category


Consolidation
Aimen Louafi · Follow
Published in Inside Doctrine
11 min read · 2 days ago

Listen Share More

by Aïmen Louafi and Julien Perrin

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 1/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

In today’s corporate landscape, the analysis of annual/shareholders general


meetings minutes (referred to as AGM) represents a critical yet complex task in
corporate governance and legal compliance. These documents, often containing
hundreds of resolutions across multiple meetings and companies, hold valuable
insights into corporate decision-making, governance practices, and strategic
directions. However, extracting and categorizing this information systematically has
remained a significant challenge — until now.

This blog post presents a systematic approach to analyzing Annual General Meeting
(AGM) resolutions through three key stages. We begin by examining the context and
inherent challenges of processing thousands of diverse legal documents. We then
explore automated classification techniques for categorizing resolutions. Finally, we
address the critical challenge of consolidating similar categories while preserving
important legal distinctions, testing and comparing various approaches to find an
optimal solution.

Understanding the Context


Annual general meeting minutes (AGMs) contain various types of resolutions, from
routine matters like dividend approvals to complex strategic decisions.

They serve multiple purposes:

Legal documentation of corporate decisions

Historical record of governance practices

Reference material for future corporate actions or for drafting

Source of insights for corporate governance research

Here’s a non exhaustive list of potential resolutions that can be found:

Approval of stock option plans for employees

Approval of financial statements

Appointment or reappointment of directors

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 2/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

The ability to effectively classify and analyze resolutions within these documents
can unlock valuable insights for legal professionals, corporate governance
researchers, and compliance officers. Traditional approaches have relied heavily on
manual classification using predefined taxonomies, but this method has proven
increasingly inadequate in the face of evolving corporate practices and the growing
volume of documents.

A Multi-faceted Challenge
1. Volume and Variety
Managing thousands of documents across various companies introduces a
significant challenge. These documents come with diverse writing styles,
formatting, and varying levels of detail and complexity. Additionally, the process of
digitizing physical documents often results in OCR-related errors, further
complicating the task.

2. Categorical Complexity
Classifying these documents presents its own set of obstacles. Categories frequently
overlap — for instance, a single clause might pertain to both financial decisions and
risk management. Moreover, interpretations are often context-dependent, with
corporate practices constantly evolving, resulting in new and emerging clause
types. Variations also arise based on region, industry, or company type, and some
documents don’t explicitly mention categories, ruling out a straightforward
extractive task. Adding to the complexity, the potential number of categories
remains unknown and can fluctuate over time.

This complexity — stemming from the vast volume, variety, and categorical
ambiguity — makes traditional rule-based or classification tasks approaches
inadequate. The overlapping categories, context-dependent interpretations, and
evolving nature of corporate practices demand a more flexible and adaptive
solution. This is where Large Language Models (LLMs) come in. Their ability to
understand nuanced language, generalize across varied contexts, and learn from
vast amounts of unstructured data makes them uniquely suited to tackle these
challenges, enabling more accurate classification and insight extraction from
diverse documents.

Advantages of This Approach


Flexibility: Eliminates the need for maintaining or updating a rigid category
system.

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 3/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Adaptability: Capable of handling new types of resolutions as corporate


governance practices evolve.

Nuanced Understanding: Accurately distinguishes subtle differences between


similar resolutions.

Scalability: Efficiently processes large volumes of resolutions at scale.

Error Correction: Mitigates errors introduced during OCR processing.

Models Evaluated
We conducted an extensive experiment analyzing 1,000,000 AGM resolution clauses
using various LLM models. Our approach utilized a carefully crafted prompt
designed to extract decision types while maintaining consistency and
generalization.

We evaluated various models, both closed ones and open sources one. For open
sourced ones, we leveraged vLLM with run_batch to optimize the throughput and
cost.

We tested three different models, each with distinct characteristics, matching our
costs constraints:

Models tested

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 4/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Sample outputs for the three models

Key Findings
After analyzing the results, GPT-4o-mini emerged as the optimal choice for several
reasons:

Category Consolidation: The model showed superior ability to group similar


decisions under consistent categories, resulting in fewer overall category variations
compared to other models. On the contrary, Mistral-7B yielded very precise
categories, but there were many duplicate ones. LLAMA3–7B yielded more outright
wrong categories.

Discriminative Power: Despite its tendency to consolidate, it maintained excellent


discrimination ability for crucial details, particularly in:

Bylaws modifications

Capital increase procedures

Cost-Effectiveness: While slightly more expensive per token, its better


categorization accuracy and consistency provided better value for the investment.

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 5/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Surprisingly, although it’s a close sourced model, it is on par with serving and
batching requests on open-sources models.

The Problem of Category Consolidation


One of the most significant challenges in using LLMs for zero-shot classification is
managing the diversity of generated categories. After extracting the categories with
an LLM, we ended with a lot of duplicate categories, like :

“Director Appointment”

“Director Nomination”

“New Executive Leadership Selection”

This is because the models might:

Create semantically similar but differently named categories

Generate categories at different levels of abstraction

Produce overlapping or nested classifications

Suggest context-specific categories that need broader alignment

These categories, while technically correct, require post-processing consolidation to


create a coherent and useful taxonomy.

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 6/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

The top 10 most frequent categories we get with the raw LLM output, many duplicates “Délégation de
pouvoir”

Exploring Different Deduplication Approaches


K-Means Clustering
Our first approach utilized K-Means clustering on sentence embeddings generated
from the category names. This method groups categories based on their semantic
similarity in the embedding space.

We explored various embeddings from the MTEB Leaderboard in French (a popular


embedding leaderboard): mostly gte-Qwen2–1.5B-instruct, KaLM-embedding-
multilingual-mini-v1, bilingual-embedding-base.

We processed the 200,000 distinct categories generated by the model by computing


embeddings for each category. Since the embeddings were trained using cosine
similarity but K-means clustering operates on Euclidean distance, we normalized
the vectors using L2 normalization before applying K-means.

However, remember that we have no idea what is the optimal number of clusters
(and there could be new ones emerging). Hence, we need to dynamically compute
https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 7/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

the optimal K number of clusters.

For this, the scientific litterature’s consensus seems to be that the elbow method is a
terrible criteria. So we experimented with using:

Silhouette Score

For a point i :

where:

a(i) = average distance between point i and all other points in its cluster

b(i) = min {average distance between i and all points in other clusters}

Bayesian Information Criterion (BIC)

where:

L = likelihood of the data

k = number of free parameters = K(d + 1)

n = number of data points

K = number of clusters

d = number of dimensions

Akaike Information Criterion (AIC)


https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 8/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

where:

k = number of free parameters = K(d + 1)

L = likelihood of the data

K = number of clusters

d = number of dimensions

Indeed, we have very different optimal number of clusters for elbow and Silhouette (same X-axis scale)

With the three metrics and all embeddings vectors, we ended up with a very high
number of optimal clusters (around 40,000).

Some of the centroids we get from clustering (K=20000)

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 9/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

We encountered several significant challenges:

1. Computational Performance: The high dimensionality of the embedding


vectors combined with the large number of data points made K-means
clustering computationally intensive.

2. Curse of Dimensionality: K-means clustering is known to perform poorly with


high-dimensional data, a phenomenon known as the curse of dimensionality.

3. Cluster Quality: Manual inspection of the clusters revealed mixed results. While
some clusters were coherent, others contained overly diverse elements. This
inconsistency stems from K-means’ inability to incorporate custom thresholds
for precision and recall (most clusters had at least one false positive).

4. Business vs. Statistical Optimization: A crucial insight emerged: the statistically


optimal number of clusters often doesn’t align with business requirements. For
example, one cluster contained resolutions referencing different legal articles
(such as “Modifying XX regarding article R-111 of the commerce code” and
“Modifying XX regarding article R-99 of the commerce code”). While these items
are semantically similar, from a business perspective, they should remain
distinct.

This experience highlighted the importance of balancing mathematical


optimization with domain-specific requirements when designing clustering
solutions.

Latent Dirichlet Allocation (LDA)


While semantically similar sentences can have very different meanings, dense
representations make it hard to extract clear business insights since they aggregate
information into non-human-readable formats. This led us to adopt Latent Dirichlet
Allocation (LDA), a statistical approach that analyzes documents by finding both
word distributions across topics and topic distributions within documents. LDA
effectively separates distinct topics while grouping related words, making the
results easy to interpret through lists of the most probable words per topic.

Our implementation starts with text preprocessing, followed by vocabulary


computation and training of the LDA model.

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 10/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

# Preprocessing
nlp = spacy.load("fr_core_news_sm", disable=["ner"])
docs = nlp.pipe(categories, n_process=-1)
cleaned_categories = [clean_text(doc=doc) for doc in docs]

# Vectorize the text


vectorizer = CountVectorizer(
ngram_range=(1, 1),
stop_words=stop_words,
max_features=10000,
)
X = vectorizer.fit_transform(cleaned_categories)
# Fit the LDA model
lda = LatentDirichletAllocation(n_components=n_topics)
Y = lda.fit_transform(X)

Clusters look quite promising:

Topic #0: [‘société’, ‘siège’, ‘social’, ‘modification’, ‘adresse’, ‘statut’] Modification de


l’emplacement du siège social dans les statuts de la Société. Modification de l’adresse du
siège social

Topic #1: [‘société’, ‘commissaire’, ‘compte’, ‘nomination’, ‘exercice’, ‘mandat’]


Nomination d’une société Commissaire aux Comptes Titulaire. Nomination de la société
commissaire aux comptes pour un mandat de six exercices.

Topic #2: [‘réserver’, ‘salarier’, ‘capital’, ‘décision’, ‘augmentation’, ‘social’] Décision sur
une augmentation de capital réservée aux salariés et accordant un délai à la présidence
pour mettre en place un plan d’épargne entreprise. Délégation à la présidence pour
effectuer une augmentation du capital social réservée aux salariés adhérents d’un plan
d’épargne entreprise.

However, we faced many challenges:

The curse of dimensionality presents a significant challenge for LDA,


particularly due to our limited vocabulary derived from short, domain-specific
sentences. We observed notable performance degradation when scaling beyond
approximately 500 topics — a critical limitation given our need to categorize
diverse clauses into precise categories (bear in mind that the optimal number of
clusters in the previous method was 40,000).

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 11/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Statistical models are highly sensitive to the vocabulary on which they are
based, making careful, extensive analysis essential for optimal performance.
Currently, non-discriminating words frequently appear and are sometimes
plagued by ambiguities, as seen in cases where critical terms like ‘non’ are
missing, leading to opposing meanings.

Topic: [‘réduction’, ‘capital’, ‘social’, ‘perte’, ‘motiver’, ‘non’] Décision de réduction du


capital social non motivée par des pertes avec conditions suspensives. Réduction du capital
social par réabsorption de pertes.

Because topic compatibility with categories is represented by a probability


distribution, assigning a single topic to each category is challenging. A category
may be associated with multiple topics, complicating our goal of deduplication.

Some topics are not relevant business-wise:

Topic : [‘alsace’, ‘triperies’, ‘reunies’, ‘boyauderies’, ‘approbation’, ‘société’] Dissolution


sans liquidation de la société TRIPERIES BOYAUDERIES REUNIES D’ALSACE, suite à
une fusion par voie d’absorption, sans augmentation de capital. Approbation de l’adhésion
de la SEM NovaRhéna au GIE EPL SUD ALSACE.

Although LDA is heavily penalized by the problem’s dimensionality, limiting its


ability to meet our objective, it could still serve as a valuable tool for extracting high-
level concepts within our categories. This could help users identify related
categories or enhance our search engine in future iterations.

Paraphrase Detection
Our analysis revealed that at its core, this was fundamentally a paraphrase
detection challenge rather than a general semantic similarity problem. This
realization was crucial because category consolidation should only occur when two
descriptions are true paraphrases of each other, not merely when they share
semantic proximity.

In the legal domain, this distinction is particularly critical — terms that appear
semantically similar may carry significantly different legal implications. We
therefore prioritized precision over recall in our approach, recognizing that false
positives in deduplication (incorrectly merging distinct categories) could be more
problematic than false negatives (failing to merge true duplicates). This conservative

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 12/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

approach helps preserve the nuanced distinctions in legal terminology, where slight
variations in phrasing often reflect meaningful differences in corporate governance.

For paraphrase detection, we’re back with the MTEB benchmark, but in the Pair
Classification leaderboard. We chose
paraphrase-multilingual-MiniLM-L12-v2, paraphrase-multilingual-mpnet-base-v2,
sentence_croissant_alpha_v0.2 as our candidate models.

However, computing all the cosine similarity pairs can be too time consuming: as
this has a quadratic runtime, it fails to scale to large (10,000 and more) collections of
sentences. Instead, we leveraged the paraphrase_mining module from
SentenceTransformers, which is optimized for this task through chunking.

In the end, we fetch pairs of similar sentences with a similarity score. Using these
scores, we build a graph where edges connect categories exceeding a carefully
calibrated similarity score threshold. Categories within each connected component
of this graph are then consolidated into clusters, with the most frequently occurring
description selected as the representative category.

import networkx as nx
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

sentences = [...]
model = SentenceTransformer("all-MiniLM-L6-v2")
paraphrases = paraphrase_mining(model, sentences)
G = nx.Graph()
# Adding all similarity pairs with score >= threshold
for score, i, j in paraphrases:
if score >= THRESHOLD:
G.add_edge(i, j, weight=score)
clusters = list(nx.connected_components(G))

To ensure accuracy, we conducted a rigorous manual evaluation of the paraphrase


detection results on a representative subset of categories. This evaluation enabled
us to identify an optimal similarity threshold that maximizes precision — ensuring
reliable category consolidation while preventing the merging of legally distinct
categories. The threshold was specifically tuned to balance the competing needs of

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 13/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

maintaining legal precision and achieving meaningful consolidation of truly


duplicate categories.

Some similar sentence retrieved

The paraphrase-multilingual-mpnet-base-v2 model proved superior for our use case,


as it successfully preserved crucial legal distinctions in resolution texts. Unlike the
other tested models, it correctly distinguished between resolutions referencing
different legal articles — a critical requirement for legal document analysis.

With this approach, we successfully consolidated our initial category set by


reducing its size by 35%, limiting redundancy while maintaining the integrity of
legally distinct categories. This consolidation dramatically reduces manual
reconciliation time, improves accuracy in compliance monitoring, simplifies the
search experience and enables automated cross-subsidiary analysis that was
previously impractical.

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 14/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

New top 10 resolutions categories after our deduplication process

Final thoughts
The integration of zero-shot classification with intelligent category deduplication
represents a significant step forward in legal document analysis. As LLMs continue
to evolve, we anticipate further improvements in both accuracy and efficiency.
However, the key lesson from our work remains clear: successful application of AI
in specialized domains requires careful attention to domain-specific requirements
and constraints.

The next steps involve exploring alternative clustering methods, such as


hierarchical clustering, to better group similar categories. Additionally, fine-tuning
the LLM for this specific task will enhance its performance. We also aim to leverage
the LLM directly in the deduplication process, allowing for more accurate and
efficient identification of redundant categories.

Engineering Llm Clustering Lda NLP

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 15/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Follow

Written by Aimen Louafi


0 Followers · Writer for Inside Doctrine

More from Aimen Louafi and Inside Doctrine

Aimen Louafi in Inside Doctrine

Comprehensive Analysis of OCR Solutions for High-Volume French


Documents Processing: Performance…
Imagine a world where navigating through volumes of detailed corporate documents is as easy
as a simple keyword search, or combing through…

Apr 24 301

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 16/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Ben Riou in Inside Doctrine

Upgrading EKS to 1.25


TL; DR: We just completed our Kubernetes Upgrade Campaign on EKS to 1.25. If you want to
reproduce it at home (or at work…), here are a few…

Feb 28, 2023 36 2

Philippe Chadenier in Inside Doctrine

Indexing and aggregating lawyers blogs posts at scale


How we handle thousands of lawyers’ blogs using the News-Please library

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 17/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Oct 9 185

Jérémie Uzan in Inside Doctrine

How We Published Our Design System: A Journey into Continuous


Integration
At Doctrine, our path to publishing our Design System has been an adventure full of lessons. It
wasn’t a decision we made overnight, but…

Oct 22 139

See all from Aimen Louafi

See all from Inside Doctrine

Recommended from Medium

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 18/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Deepak in Top Python Libraries

Building a Python Web Scraper with Data Analysis, Visualization, and


Automation
In today’s data-driven world, the ability to gather, analyze, and present real-time data is
invaluable. This project will guide you through…

6d ago 150 2

León Andrés M. in The Quantastic Journal

AI on the Podium: Revolution or Mistake in Stockholm?


About the controversy of the 2024 Nobel Prize in Physics

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 19/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

5d ago 468 4

Lists

Natural Language Processing


1798 stories · 1408 saves

The New Chatbots: ChatGPT, Bard, and Beyond


12 stories · 496 saves

Leadership
61 stories · 478 saves

Leadership upgrades
7 stories · 109 saves

Tomaz Bratanic in Towards Data Science

Building Knowledge Graphs with LLM Graph Transformer


A deep dive into LangChain’s implementation of graph construction with LLMs

2d ago 364 3

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 20/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Abdur Rahman in Stackademic

Python is No More The King of Data Science


5 Reasons Why Python is Losing Its Crown

Oct 23 3.1K 19

Gabriel Melo in Qantev

Fine-Tuning Donut Transformer For Document Classification


Document classification is a machine learning problem in which, given a document file as input,
one receives its class as output. This task…

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 21/22
2024/11/7 晚上11:04 Beyond Fixed Taxonomies: Zero-shot Classification and Automated Category Consolidation | by Aimen Louafi | Inside Doctr…

Aug 8 189 1

Datadrifters

LitServe: FastAPI on Steroids for Serving AI Models — Tutorial with Llama


3.2 Vision
I recently tried an open-source gem called LitServe, no more wrestling with serving AI models.

3d ago 191

See more recommendations

https://medium.com/doctrine/beyond-fixed-taxonomies-zero-shot-classification-and-automated-category-consolidation-06517c1319ae 22/22

You might also like