MoCLE

This repository contains the implementation of the paper:

MoCLE: Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang†
*Equal contribution †Corresponding Author
arXiv preprint, 2023

Installation

  1. Install LAVIS, the primary codebase on which MoCLE is built, into the current directory.

    conda create -n lavis python=3.8
    conda activate lavis
    git clone https://github.com/salesforce/LAVIS.git
    cd LAVIS
    pip install -e .
  2. Clone the MoCLE repository (into the LAVIS directory, so that the relative paths in the following steps resolve).

    git clone https://github.com/gyhdog99/mocle.git
  3. Build our modified PEFT package.

    cd mocle
    cd peft-main
    pip install -e .
  4. Copy mocle.py and mocle.yaml from this repository into the LAVIS codebase as shown below:

    cd ../
    cp mocle.py ../lavis/models/blip2_models
    cp mocle.yaml ../lavis/configs/models/blip2
  5. Modify ../lavis/models/__init__.py in LAVIS as follows (a sketch of the result follows this list):

    • Add from lavis.models.blip2_models.mocle import MoCLE at the beginning of the file.
    • Add "MoCLE" to the __all__ = [...] list.

Prepare Models

  1. MoCLE is based on Vicuna-7B-v1.1. Download the corresponding LLM checkpoint here.

  2. Set the llm_model argument in ../lavis/configs/models/blip2/mocle.yaml (the file copied in step 4 of Installation) to the local path of the downloaded Vicuna checkpoint.

  3. Download the pre-trained checkpoints of MoCLE:

    # Clusters    Temperature    Main Model    Clustering Model
    16            0.05           c16_t005      c16
    64            0.05           c64_t005      c64
    64            0.10           c64_t010      c64
  4. Set finetuned and kmeans_ckpt in ../lavis/configs/models/blip2/mocle.yaml to the paths of the downloaded main model and clustering model, respectively, and set total_tasks and gates_tmp to match the # Clusters and Temperature of the chosen checkpoint (see the sketch after this list).
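
As a reference for steps 2 and 4, the relevant fields of mocle.yaml might look like the sketch below. The key nesting, file names, and extensions here are illustrative assumptions; the released config file is authoritative. The paths are placeholders for your local downloads.

    # ../lavis/configs/models/blip2/mocle.yaml (relevant fields only; sketch)
    model:
      llm_model: /path/to/vicuna-7b-v1.1   # local Vicuna checkpoint (step 2)
      finetuned: /path/to/c64_t005         # main model weights (step 4)
      kmeans_ckpt: /path/to/c64            # clustering model weights (step 4)
      total_tasks: 64                      # match "# Clusters" of the checkpoint
      gates_tmp: 0.05                      # match "Temperature" of the checkpoint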

Model Inference

  1. Load an image locally

    import torch
    from PIL import Image
    # setup device to use
    device = torch.device("cuda") if torch.cuda.is_available() else "cpu"
    # load sample image
    raw_image = Image.open(".../path_to_images/").convert("RGB")
  2. Load the models

    from lavis.models import load_model_and_preprocess
    # loads MoCLE model
    model, vis_processors, _ = load_model_and_preprocess(name="mocle", model_type="mocle", is_eval=True, device=device)
    # prepare the image
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
  3. Generate

    response = model.generate({"image": image, "prompt": ["Your query about this image"]})
    print(response)
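
Putting the three steps together, a minimal end-to-end script looks like this. It assumes the installation and checkpoint setup above are complete; the image path and prompt are placeholders.

    import torch
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    # set up the device to use
    device = torch.device("cuda") if torch.cuda.is_available() else "cpu"

    # load and preprocess a local image (placeholder path)
    raw_image = Image.open("/path/to/image.jpg").convert("RGB")

    # load MoCLE and its image preprocessors
    model, vis_processors, _ = load_model_and_preprocess(
        name="mocle", model_type="mocle", is_eval=True, device=device
    )
    image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

    # generate and print the model's answer
    response = model.generate({"image": image, "prompt": ["What is shown in this image?"]})
    print(response)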

Model Training

Coming soon.

Acknowledgement

  • LAVIS: The implementation of MoCLE is built upon LAVIS.
  • PEFT: Our mixture of LoRA experts is implemented on top of PEFT.

Citation

If you use MoCLE in your research or applications, please cite it using this BibTeX:

@article{gou2023mixture,
  title={Mixture of Cluster-conditional {LoRA} Experts for Vision-language Instruction Tuning},
  author={Gou, Yunhao and Liu, Zhili and Chen, Kai and Hong, Lanqing and Xu, Hang and Li, Aoxue and Yeung, Dit-Yan and Kwok, James T and Zhang, Yu},
  journal={arXiv preprint arXiv:2312.12379},
  year={2023}
}
