AutoGluon: An Open Source AutoML Library

AutoGluon is an open-source library for AutoML. It automates the process of applying machine learning to real-world problems, abstracting away low-level details and manual training steps such as model selection, hyperparameter tuning, and ensembling.

Figure: Key Features of AutoGluon

Key Features of AutoGluon

1. Ease of Use: AutoGluon is designed to abstract the complexities of the machine learning pipeline by providing a high-level interface. This abstraction facilitates rapid prototyping and experimentation while reducing the cognitive load on the user.

2. AutoML for Multiple Tasks: The framework is equipped with task-specific modules for automated learning. These include tabular data modeling (structured datasets), image recognition (using convolutional architectures), and natural language processing (via embeddings and sequence models).

3. Hyperparameter Optimization: AutoGluon can tune hyperparameters with search strategies such as random search and Bayesian optimization, looking for configurations that minimize validation loss.

4. Time and Resource Constraints: AutoGluon incorporates mechanisms for bounded optimization where the training process adheres to user-defined time budgets or hardware limitations.

5. Model Interpretability: AutoGluon reports model-agnostic feature importance scores, which estimate how much each input feature contributes to predictive performance.

6. Leaderboard and Evaluation: Models are scored on their generalization performance over held-out data rather than on training error. The leaderboard ranks them by an objective metric such as accuracy, log-loss, or F1-score.
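
To make features 3 and 4 concrete, here is a minimal sketch of random search that stops when a time or trial budget runs out. It is a toy illustration, not AutoGluon's internal search code: `validation_loss`, the two hyperparameters, and the loss surface are all hypothetical stand-ins for "train a model and score it on validation data".

```python
import random
import time


def validation_loss(config):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    # Made-up loss surface with its minimum at learning_rate=0.1, depth=6.
    return (config["learning_rate"] - 0.1) ** 2 + 0.01 * (config["depth"] - 6) ** 2


def random_search(time_limit_s, max_trials, seed=0):
    """Sample configurations at random until the time or trial budget is used up."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit_s
    best_config, best_loss, trials = None, float("inf"), 0
    while trials < max_trials and time.monotonic() < deadline:
        config = {
            "learning_rate": rng.uniform(0.01, 0.3),
            "depth": rng.randint(2, 10),
        }
        loss = validation_loss(config)
        trials += 1
        if loss < best_loss:  # keep the best configuration seen so far
            best_config, best_loss = config, loss
    return best_config, best_loss, trials


best_config, best_loss, trials = random_search(time_limit_s=1.0, max_trials=200)
print(f"best of {trials} trials: {best_config} (loss={best_loss:.5f})")
```

The search always returns the best configuration found so far, so a tighter budget yields a worse answer rather than no answer. That is the basic idea behind training under a time limit.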

Installation of AutoGluon

First, install the library using the following command:

!pip install autogluon

Now, import the specific functionality you wish to use from the AutoGluon library. For example, the following statement imports the TabularPredictor class.

from autogluon.tabular import TabularPredictor

Workflow of AutoML using AutoGluon

The AutoML workflow in AutoGluon is designed to simplify the machine learning pipeline by automating repetitive and complex steps. The major stages in this workflow include:

Data Preparation and Input: AutoGluon begins with a tabular, image, or text dataset as input. It expects the data to be in a structured form (like a Pandas DataFrame for tabular data). The input data includes the features (independent variables) and the target (dependent variable) for supervised learning.
Automatic Data Preprocessing: AutoGluon automatically handles missing values, encodes categorical variables, applies model-specific steps such as scaling numerical data, and preprocesses text when needed. This reduces the effort needed for manual data wrangling.
Model Selection and Training: It uses a variety of models such as LightGBM, CatBoost, XGBoost, Random Forest, Neural Networks, and ensembles. AutoGluon trains multiple models, optionally with different hyperparameters, and keeps the best-performing ones. Training is scheduled to fit within the available compute resources.
Hyperparameter Optimization: AutoGluon applies multi-fidelity hyperparameter tuning using random search, Bayesian optimization, or scheduling techniques like Hyperband. This improves the model’s performance without excessive computational costs.
Ensembling and Stacking: AutoGluon performs model stacking (multi-layered ensembling) where the outputs of one model feed into another. This improves predictive accuracy by combining the strengths of different models.
Evaluation: After training, it evaluates the models using metrics like accuracy, log loss, ROC-AUC, or custom-defined metrics. It then generates a leaderboard showing the performance of each model.
Prediction and Deployment: The trained model can be used to make predictions on new, unseen data. The final model (often an ensemble) is saved and can be deployed as part of a production pipeline.
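
The ensembling step can be illustrated with the simplest possible case: blending two models' validation predictions with a single weight. This is a sketch under stated assumptions, not AutoGluon's stacking implementation. The targets and predictions are hypothetical, and the weight is chosen by a plain grid search.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length lists."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def blend(pred_a, pred_b, weight_a):
    """Weighted average of two prediction lists."""
    return [weight_a * a + (1 - weight_a) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical validation targets and predictions from two already-trained models.
y_val = [3.0, 5.0, 7.0, 9.0]
pred_model_a = [2.5, 5.5, 6.0, 9.5]
pred_model_b = [3.5, 4.0, 8.0, 8.5]

# Try weights 0.0, 0.1, ..., 1.0 and keep the one with the lowest validation error.
weights = [i / 10 for i in range(11)]
best_weight = min(
    weights, key=lambda w: mse(y_val, blend(pred_model_a, pred_model_b, w))
)
blended_mse = mse(y_val, blend(pred_model_a, pred_model_b, best_weight))

print("model A MSE:", mse(y_val, pred_model_a))
print("model B MSE:", mse(y_val, pred_model_b))
print("blend MSE:  ", blended_mse, "with weight", best_weight)
```

Because the grid includes the weights 0 and 1, the blend can never score worse on the validation data than the better of the two models. Here the two models err in opposite directions, so the blend is clearly better than either.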

Functions of AutoGluon

Figure: Use Cases for AutoGluon Library

AutoGluon provides various functional modules, each tailored to a specific type of machine learning task:

1. Tabular Prediction

Handles structured datasets with rows and columns.
Supports classification, regression, and quantile prediction.
Automates preprocessing, feature engineering, model selection, and training.
Function: TabularPredictor()

2. Image Prediction

Supports image classification.
Leverages pretrained deep learning models (ResNet, EfficientNet, etc.).
Applies automatic augmentation and transfer learning.
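Function: ImagePredictor() (older releases; recent releases are understood to cover image tasks through MultiModalPredictor())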

3. Text Prediction (NLP)

Used for text classification and regression tasks.
Based on pretrained transformer models.
Performs tokenization, fine-tuning, and automated optimization.
Function: TextPredictor() (older releases; recent releases are understood to cover text tasks through MultiModalPredictor())

4. Multimodal Prediction

Handles datasets that contain a mix of text, image, and tabular features.
Unified interface for multiple data modalities.
Function: MultiModalPredictor()

5. Time Series Forecasting

Handles univariate and multivariate time-series prediction.
Supports lag features, seasonal decomposition, and forecasting horizon customization.
Function: TimeSeriesPredictor()

6. Hyperparameter Optimization

Tunes model configurations automatically.
Integrates with tools and schedulers such as Ray, scikit-optimize, and Hyperband.
Allows both random and guided search strategies.

7. Model Export and Deployment

Saves trained predictors and reloads them for inference (.save(), .load(), and .predict()).
Supports real-time inference pipelines.

Implementation on Tabular Data using AutoGluon

Figure: AutoGluon Architecture for Tabular Data

1. Install and Import all dependencies

First, install the AutoGluon library using the pip command.

!pip install autogluon

Now, import all necessary libraries and dependencies.

Python

import pandas as pd
from autogluon.tabular import TabularPredictor
from sklearn.model_selection import train_test_split

2. Load, Pre-process, and Split data

Now, load your dataset and apply data pre-processing to it. AutoGluon handles missing values and type inference automatically, so only light cleaning is needed before splitting the data into train and test sets. This example uses the Telco Customer Churn dataset (WA_Fn-UseC_-Telco-Customer-Churn.csv).

Python

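df = pd.read_csv('WA_Fn-UseC_-Telco-Customer-Churn.csv')
# Drop the identifier column, which carries no predictive signal
df.drop('customerID', axis=1, inplace=True)
# Convert TotalCharges to numbers; values that cannot be parsed become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Fill the resulting gaps with the column median
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())
# Encode the target as 1 (churned) and 0 (stayed)
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})
# Hold out 20% of the rows for testing
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)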
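
The TotalCharges conversion matters because a numeric column that contains even one non-numeric string is read as text. The sketch below reproduces the idea on a hypothetical three-value miniature of that column; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the TotalCharges column: one entry is a blank string.
raw = pd.Series(["29.85", " ", "108.15"], name="TotalCharges")

# errors="coerce" turns unparseable entries into NaN instead of raising an error.
numeric = pd.to_numeric(raw, errors="coerce")

# median() skips NaN by default, so the fill value comes from the valid entries only.
filled = numeric.fillna(numeric.median())

print("unparseable values:", int(numeric.isna().sum()))
print("after filling:", filled.tolist())
```

Without `errors="coerce"`, `pd.to_numeric` would raise a `ValueError` on the blank entry.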

3. AutoML Configuration and Model Training

In this step, we create a TabularPredictor, specifying the label column, evaluation metric, and problem type. We then call its fit function to train multiple models on the training data.

Python

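label = 'Churn'

# Configure the predictor: target column, metric to optimize, and output folder
predictor = TabularPredictor(
    label=label,
    eval_metric='balanced_accuracy',
    problem_type='binary',
    path='AutoML_Churn_Model'
).fit(
    train_data=train_data,
    time_limit=600,            # total training budget in seconds
    presets='best_quality',    # favors accuracy over training and inference speed
    num_bag_folds=5,           # k-fold bagging with 5 folds
    num_stack_levels=2,        # two stacking layers above the base models
    hyperparameters={          # model families to train; {} keeps default settings
        'GBM': {},                       # LightGBM
        'CAT': {'iterations': 1000},     # CatBoost
        'XGB': {'n_estimators': 1000},   # XGBoost
        'NN_TORCH': {},                  # PyTorch neural network
        'RF': {},                        # random forest
        'XT': {},                        # extremely randomized trees
        'KNN': {},                       # k-nearest neighbors
    }
)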

Output

Figure: Model Training and AutoML Configuration
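
The num_bag_folds setting refers to k-fold bagging. A rough sketch of the idea follows, assuming a toy model that simply predicts the mean of its training targets (hypothetical, and far simpler than any real model). Each row is predicted by a model that never saw it, and these out-of-fold predictions are what a stacking layer can safely train on.

```python
def k_fold_indices(n_rows, n_folds):
    """Assign row i to fold i % n_folds (a simple, unshuffled split)."""
    return [[i for i in range(n_rows) if i % n_folds == fold] for fold in range(n_folds)]


def fit_mean_model(targets):
    """Toy 'model': always predicts the mean of the targets it was trained on."""
    mean = sum(targets) / len(targets)
    return lambda: mean


def out_of_fold_predictions(y, n_folds=5):
    """Predict every row with a model trained only on the other folds."""
    oof = [None] * len(y)
    for held_out in k_fold_indices(len(y), n_folds):
        held_out_set = set(held_out)
        train_targets = [y[i] for i in range(len(y)) if i not in held_out_set]
        model = fit_mean_model(train_targets)
        for i in held_out:
            oof[i] = model()
    return oof


y = [float(v) for v in range(10)]  # hypothetical targets 0.0 .. 9.0
oof = out_of_fold_predictions(y, n_folds=5)
print(oof)
```

With 5 folds, five copies of each model are trained, which is one reason bagged presets take longer to fit.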

4. View and Analyze Leaderboard

This part of the implementation shows the model rankings. The leaderboard lists each model's score on the test and validation data for the chosen metric, along with prediction and training times, and helps identify the top-performing models.

Python

leaderboard = predictor.leaderboard(test_data, silent=True)
print("\nLeaderboard: \n", leaderboard)

Output

In the above output, we can analyze the model rankings and proceed accordingly.
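
Since the leaderboard is a pandas DataFrame, it can be sorted and filtered like any other table. The sketch below uses a hypothetical stand-in table: the column names model, score_test, score_val, and fit_time follow the leaderboard output, while the model names and numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for predictor.leaderboard(test_data); all values are made up.
leaderboard = pd.DataFrame(
    {
        "model": ["ensemble", "gbm", "knn"],
        "score_test": [0.78, 0.77, 0.66],
        "score_val": [0.80, 0.79, 0.70],
        "fit_time": [310.0, 42.0, 3.0],
    }
)

# A large gap between validation and test scores hints at overfitting.
leaderboard["val_minus_test"] = leaderboard["score_val"] - leaderboard["score_test"]

# Rank by test score, then look for models that also train quickly.
ranked = leaderboard.sort_values("score_test", ascending=False).reset_index(drop=True)
best_model = ranked.loc[0, "model"]
fast_enough = ranked[ranked["fit_time"] < 60]["model"].tolist()

print("best on test data:", best_model)
print("trained in under 60 s:", fast_enough)
```

Comparing the ranking with training time in this way helps decide whether a slightly weaker but much cheaper model is the better choice for deployment.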

5. Feature Importance Visualization

This call returns a table of importance scores, one per input feature, computed on the data passed in. It is useful for understanding which features drive predictions.

Python

print("\nFeature Importance: ")
print(predictor.feature_importance(test_data))

Output

Figure: Visualizing Feature Importance
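
One common model-agnostic way to compute such scores is permutation importance: shuffle one feature column, re-score the model, and record how much the score drops. The sketch below assumes a hypothetical toy classifier and dataset and is not AutoGluon's own implementation.

```python
import random


def model_predict(row):
    """Toy classifier that only looks at feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0


def accuracy(rows, labels):
    return sum(model_predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(rows, labels, n_features, n_repeats=10, seed=0):
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(rows, labels)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: the label equals feature 0, and feature 1 is unrelated.
rows = [[i % 2, (i * 7) % 5] for i in range(20)]
labels = [r[0] for r in rows]

importances = permutation_importance(rows, labels, n_features=2)
print("importance per feature:", importances)
```

Shuffling the feature the model relies on hurts accuracy, while shuffling the ignored feature changes nothing, so its importance is exactly zero.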

6. Evaluate the best model performance

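Finally, we generate predictions on the test set and evaluate the predictor. The reported metrics depend on the task type, for example accuracy and log-loss for classification or RMSE for regression.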

Python

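# Predicted class labels for the held-out rows (label column removed)
preds = predictor.predict(test_data.drop(columns=[label]))
# Predicted probability of class 1 (churn)
probs = predictor.predict_proba(test_data.drop(columns=[label]))[1]
# Score the predictor on the labeled test set
performance = predictor.evaluate(test_data)
print("Metrics: ")
print(performance)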

Output

Metrics: {'balanced_accuracy': np.float64(0.782238864678543), 'accuracy': 0.7693399574166075, 'mcc': np.float64(0.5095293371050179), 'roc_auc': np.float64(0.8638685602492573), 'f1': 0.6501614639397201, 'precision': 0.5431654676258992, 'recall': 0.8096514745308311}
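
The reported metrics are all functions of one confusion matrix. As a worked check, the sketch below recomputes them from confusion-matrix counts. The counts are an assumption: they were inferred from the reported precision, recall, and accuracy, and were not printed by the original run.

```python
from math import sqrt

# Assumed counts, inferred from the reported metrics (not part of the original output).
tp, fp, fn, tn = 302, 254, 71, 782

precision = tp / (tp + fp)      # share of predicted churners who really churned
recall = tp / (tp + fn)         # share of actual churners that were caught
specificity = tn / (tn + fp)    # share of non-churners correctly left alone
accuracy = (tp + tn) / (tp + fp + fn + tn)
balanced_accuracy = (recall + specificity) / 2
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

for name, value in [
    ("balanced_accuracy", balanced_accuracy),
    ("accuracy", accuracy),
    ("mcc", mcc),
    ("f1", f1),
    ("precision", precision),
    ("recall", recall),
]:
    print(f"{name}: {value:.4f}")
```

Under these counts, recall is high and precision is modest: the model catches most churners at the cost of many false alarms. That trade-off is typical when optimizing balanced accuracy on an imbalanced target.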

7. Save the model

We can now save the predictor. The saved folder contains the trained models and the metadata needed to reload the predictor later and use it for prediction.

Python

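# Write the predictor to the folder given by `path` when it was created
predictor.save()
# Reload it later from the same folder
loaded_predictor = TabularPredictor.load("AutoML_Churn_Model")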

Output

TabularPredictor saved. To load, use: predictor = TabularPredictor.load("/content/AutoML_Churn_Model")

Advantages of AutoGluon

Lets users focus on the data by hiding implementation details.
Can infer the task type (for example, classification or regression) from the label column.
Often produces well-performing models with little manual tuning.
Runs on both CPUs and GPUs.
Makes metrics more reliable through evaluation techniques such as cross-validation.
