A Definitive Guide to Cost-Effective LoRA Fine-Tuning of FLUX.1-Krea-dev on Vast.ai within a $5.00 Budget
Section 1: Strategic Planning for a Micro-Budget Training Run
This section establishes the foundational strategy for executing a Low-Rank
Adaptation (LoRA) fine-tuning run on the black-forest-labs/FLUX.1-Krea-dev model.
The primary constraint is a non-negotiable, all-inclusive budget of $5.00 USD.
Success requires meticulous pre-planning, a deep understanding of the platform's
cost structure, and a risk-averse approach to hardware and instance selection. All
critical decisions are front-loaded to minimize uncertainty and maximize efficiency
during the billable rental period.
1.1. Deconstructing the Vast.ai Cost Model for Budgetary Precision
The $5.00 USD budget represents the absolute ceiling for the entire operation,
corresponding to the minimum initial credit purchase required by Vast.ai.1 To
remain within this limit, a precise understanding of the platform's composite
pricing model is essential. Costs on Vast.ai are not monolithic; they are a
function of three distinct, concurrently billed components: GPU rental, storage
allocation, and data bandwidth.3
A critical and often underestimated factor is the cost of storage. Unlike GPU
rental charges, which apply only when an instance is active, storage costs are
billed per second for the entire duration an instance exists—from the moment of
creation until its final destruction.2 Simply stopping an instance does not halt
storage charges; it continues to accrue costs, silently eroding the budget while no
productive work is being done. This fundamental mechanic dictates the core
strategic imperative: the total lifetime of the instance must be minimized. The
entire workflow—from initial setup and data transfer to training, artifact
retrieval, and teardown—must be executed as a single, continuous, and highly
efficient session.
The cost components are broken down as follows:
* GPU Rental Cost: Billed on a per-second basis, this charge applies only when the
instance is in an active, running state.6 This is the primary operational expense
and the most controllable variable through efficient time management.
* Storage Cost: A persistent charge based on the amount of disk space allocated at
instance creation. This cost is set by the individual machine host and varies
significantly across the platform.5 The allocation size cannot be modified after an
instance is created, making the initial selection a critical decision.2
* Bandwidth Cost: Charges for both data ingress (download) and egress (upload) are
also host-dependent and can accumulate quickly if not managed.3 The strategy must
therefore include selecting a host with favorable bandwidth rates and employing
efficient data transfer protocols.
1.2. GPU Selection: The Definitive Case for the RTX 3090
The black-forest-labs/FLUX.1-Krea-dev model is a 12 billion parameter transformer
architecture, making any training operation, including LoRA fine-tuning, an
exceptionally VRAM-intensive task.9 The minimum viable VRAM for this operation is
24 GB. Within the Vast.ai marketplace, this requirement narrows the selection to
two primary consumer-grade candidates: the NVIDIA RTX 3090 and the RTX 4090.
While the RTX 4090 offers superior computational performance in terms of TFLOPS and
features a more advanced architecture 10, its rental cost is prohibitive for this
budget. The median price for an RTX 4090 on Vast.ai is approximately $0.34 per
hour, more than double the median price of an RTX 3090, which sits at approximately
$0.17 per hour.11
Given the strict $5.00 budget, selecting the RTX 4090 would effectively halve the
available rental time: at median rates, $5.00 buys roughly 14.7 GPU-hours on an
RTX 4090 versus roughly 29.4 on an RTX 3090. The risk of exhausting the budget
due to time constraints, such as slower-than-anticipated data downloads or a
slightly longer training convergence, far outweighs the performance benefits of the
more expensive card. The RTX 3090, with its 24 GB of GDDR6X VRAM 12, provides the
necessary memory capacity at a price point that maximizes the available time for
setup, training, and teardown. It represents the optimal intersection of technical
capability and economic feasibility for this specific mission. Therefore, the RTX
3090 is the designated and only logical hardware choice.
1.3. Risk Mitigation: Why On-Demand is Non-Negotiable
Vast.ai provides two primary rental modalities: "On-Demand" and "Interruptible".13
Interruptible instances are offered at a lower price point but operate on a bidding
system. A user's interruptible instance can be paused (stopped) at any moment if
another user places a higher bid for the same hardware or rents it on-demand.13
For a time-critical, single-session task operating under a severe budget, the risk
associated with an interruptible instance is unacceptable. An interruption
immediately terminates all running processes. This would necessitate restarting the
entire workflow, including environment setup and data downloads, leading to wasted
time and duplicated costs that would inevitably cause a budget overrun. The
stability and predictability of an On-Demand instance, which grants exclusive and
high-priority access to the hardware for a fixed price, is a mandatory form of
insurance against project failure.13 The marginal cost premium for an On-Demand
rental is a crucial and non-negotiable risk mitigation expense.
1.4. Pre-computation: A Line-Item Budget Forecast
To proceed with confidence, a detailed and conservative cost forecast is required.
This forecast validates the feasibility of the operation by accounting for every
billable action and incorporating a substantial buffer for unforeseen delays, such
as network congestion or slower-than-expected script execution.
The entire process is broken down into timed phases: Setup (environment
preparation, model/dataset downloads), Training (the main accelerate launch
command), and Teardown (artifact retrieval, instance destruction). By assigning
conservative time estimates and using realistic, slightly above-median pricing for
all billable components, a total projected cost can be calculated.
The following table provides a transparent, line-item justification that the
proposed operation is feasible within the $5.00 budgetary constraint. It transforms
the plan from a speculative endeavor into a data-backed procedure.
Table 1: Estimated Cost Breakdown vs. $5.00 Budget

| Line Item                     | Unit Cost (Conservative Estimate) | Estimated Quantity | Subtotal |
|-------------------------------|-----------------------------------|--------------------|----------|
| Vast.ai Initial Credit        | N/A                               | $5.00              | $5.00    |
| Expenses:                     |                                   |                    |          |
| 1. GPU Rental (RTX 3090)      | $0.20/hr                          | 3.5 hours          | $0.70    |
| 2. Storage Allocation         | $0.10/GB/month                    | 80 GB              | $0.04    |
| 3. Data Ingress (Download)    | $0.01/GB                          | 35 GB              | $0.35    |
| 4. Data Egress (Upload)       | $0.01/GB                          | 0.1 GB             | $0.001   |
| Subtotal (Estimated Run Cost) |                                   |                    | $1.09    |
| Contingency Buffer (350%)     |                                   |                    | $3.82    |
| Total Projected Cost          |                                   |                    | $4.91    |
| Remaining Balance             |                                   |                    | $0.09    |
This conservative estimate indicates that the total cost should be well under
the $5.00 limit. The substantial contingency buffer of $3.82 is designed to absorb
significant deviations from the plan, such as needing a machine with slightly
higher rental costs or experiencing unexpected delays. The primary cost remains the
GPU rental, with storage and bandwidth contributing smaller but still significant
amounts.
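Because every figure in Table 1 follows from simple rate-times-quantity arithmetic, the forecast can be re-checked in one short shell command before renting anything. The sketch below reproduces the run subtotal and buffered total from the table's conservative rates; the 730-hours-per-month proration for storage is our assumption about how the monthly rate converts to hours, and the penny-level difference from Table 1 comes from rounding order. Substitute the actual prices quoted by your chosen host.
Bash
awk 'BEGIN {
  gpu     = 0.20 * 3.5                 # GPU rental: $/hr x hours
  storage = 0.10 * 80 * (3.5 / 730)    # $0.10/GB/month, prorated over ~730 h/month
  net     = 0.01 * 35 + 0.01 * 0.1     # ingress + egress at $0.01/GB
  run     = gpu + storage + net
  printf "estimated run cost:    $%.2f\n", run
  printf "with 350%% contingency: $%.2f\n", run * 4.5
}'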
Section 2: Environment Configuration and Setup
This section provides the precise, actionable steps for selecting and preparing the
remote instance on Vast.ai. The guiding principle is maximum efficiency, leveraging
automation to minimize the billable time spent on manual configuration and setup.
2.1. Sourcing the Optimal Instance on the Vast.ai Marketplace
Selecting the right instance goes beyond simply choosing a GPU type. Secondary
factors such as host reliability, storage performance, and network speed are
critical variables that can significantly impact the total time and cost of the
operation. A machine with slow disk I/O or a poor internet connection can waste
more money in idle time than is saved by a slightly lower hourly GPU price. A
systematic filtering approach is required to identify an instance that represents
the "sweet spot" of cost and performance.
The Vast.ai console provides a comprehensive set of filters to narrow down the
thousands of available machines.2 The following criteria must be applied to source
the optimal instance for this task.
Table 2: Optimal Instance Selection Criteria

| Filter Parameter     | Required Value   | Justification |
|----------------------|------------------|---------------|
| GPU Type             | 1x RTX 3090      | Provides the necessary 24 GB of VRAM at the most cost-effective rate for this budget.11 |
| Rental Type          | On-Demand        | Guarantees instance stability, preventing interruptions that would lead to budget failure.13 |
| Host Reliability     | > 98%            | Minimizes the risk of unexpected host downtime or termination, which would be catastrophic for the budget.14 |
| Disk Speed (Disk BW) | > 1000 MB/s      | Ensures fast environment setup, dependency installation, and unpacking of datasets. NVMe storage is strongly preferred.7 |
| Internet Speed (DL)  | > 500 Mbps       | Crucial for minimizing the time and cost of downloading the large model files and training dataset. |
| Storage Allocation   | ~80 GB           | Must be allocated at instance creation.2 Provides sufficient space for the OS, Docker image layers, model files, dataset, and output LoRA. |
| Price (On-Demand)    | Lowest available | After all other performance and reliability criteria are met, sort by price to maximize value. |
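For those who prefer a scriptable path, Vast.ai also ships an official CLI (installable via pip install vastai) whose offer search accepts equivalent filters. The sketch below is our best rendering of its query syntax; the field names (gpu_name, reliability, inet_down, disk_bw, disk_space, dph_total) and the ordering flag should be verified against vastai search offers --help before relying on them.
Bash
pip install vastai
vastai set api-key YOUR_API_KEY   # key is shown on the console's Account page
# Mirror the Table 2 filters, then sort ascending by on-demand $/hr.
vastai search offers \
  'gpu_name=RTX_3090 num_gpus=1 reliability>0.98 inet_down>500 disk_bw>1000 disk_space>=80 rentable=true' \
  -o 'dph_total'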
2.2. Docker Image Selection and Launch Configuration
The choice of Docker image is a critical factor in minimizing setup time. Starting
with a pre-configured image that includes the correct CUDA and PyTorch versions
eliminates the need for lengthy manual installations. Vast.ai provides official
PyTorch images that are well-suited for this purpose.16
The recommended image is a recent version from the vastai/pytorch repository, such
as vastai/pytorch:cuda-12.1.1-cudnn8-devel-ubuntu22.04. This image provides a
stable Ubuntu 22.04 base with a compatible CUDA toolkit for modern deep learning
libraries.19
For the launch mode, selecting "Jupyter Lab + SSH Interface" offers the most
flexibility. This provides access to a web-based Jupyter environment for any
initial checks or debugging, as well as direct Secure Shell (SSH) access, which is
essential for running the main training script in a stable, non-interactive
terminal session. The most important configuration element is the "On-start script"
field, which allows for the execution of a custom shell script the moment the
instance boots. This is the key to full automation.
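Instance launch can likewise be scripted once an offer ID has been chosen from the search results. The flag names below (--image, --disk, --env, --onstart, --ssh, --jupyter) reflect our understanding of the Vast.ai CLI and should be confirmed with vastai create instance --help; the HF_TOKEN value is a placeholder for your own Hugging Face access token.
Bash
# OFFER_ID comes from the `vastai search offers` output above.
vastai create instance OFFER_ID \
  --image vastai/pytorch:cuda-12.1.1-cudnn8-devel-ubuntu22.04 \
  --disk 80 \
  --env '-e HF_TOKEN=hf_your_token_here' \
  --onstart onstart.sh \
  --ssh --jupyter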
2.3. Automating Environment Setup with a Startup Script
Every second spent manually typing commands like apt-get install or pip install
into a terminal is a direct and avoidable drain on the budget. A comprehensive
startup script, passed to the instance via the "On-start" configuration field, is
the cornerstone of an efficient workflow. This script will execute non-
interactively and in parallel where possible, transforming a manual setup process
that could take 15-30 minutes into an automated one that completes in approximately
5 minutes.
The script must perform all necessary setup tasks:
1. Set the DEBIAN_FRONTEND=noninteractive environment variable to prevent package
installation dialogues from halting the script.
2. Update the system's package lists and install essential utilities like git, git-
lfs, and a parallel downloader like aria2c.
3. Upgrade pip and install all required Python packages, including diffusers,
transformers, accelerate, peft, bitsandbytes, and safetensors.
4. Initialize git-lfs before cloning any repositories.
5. Clone the Hugging Face diffusers repository, which contains the training script.
6. Initiate the parallel download of the FLUX model and the training dataset.
By scripting these actions, the instance is brought to a "ready-to-train" state
with maximum speed and minimal cost.
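The fragment below illustrates the parallelism referenced in step 6: once the Python environment is in place, the two large transfers are backgrounded with & and wait blocks until both finish, so neither download idles while the other runs. It is a simplified sketch of the full script in Section 5.1, not a replacement for it.
Bash
#!/bin/bash
export DEBIAN_FRONTEND=noninteractive
# Dependencies first: the downloads below rely on huggingface-cli.
apt-get update && apt-get install -y git git-lfs aria2
pip install --upgrade pip huggingface_hub
# Run the model and dataset downloads concurrently.
huggingface-cli download black-forest-labs/FLUX.1-Krea-dev \
  --local-dir /workspace/model/flux-krea-dev --token "$HF_TOKEN" &
huggingface-cli download lambdalabs/pokemon-blip-captions --repo-type dataset \
  --local-dir /workspace/data/pokemon-dataset &
wait   # block until both background downloads complete
echo "downloads complete"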
2.4. Efficient Data and Model Ingress Strategy
The FLUX.1-Krea-dev model is distributed as multiple large files totaling
approximately 24 GB, and the training dataset will add to this data transfer
requirement. Downloading these assets sequentially is highly inefficient and will
prolong the costly setup phase.
The startup script must therefore leverage a parallel download utility like aria2c.
This tool can download multiple files from multiple sources simultaneously,
maximizing the utilization of the host's network connection. The URLs for all
necessary model files (sourced from the Hugging Face Hub) and the dataset will be
pre-compiled and embedded directly into the startup script. This automated approach
ensures that the download process begins immediately upon instance launch and
completes in the shortest possible time, adhering to best practices for cloud-based
training workflows.16
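As a concrete sketch, aria2c can consume a pre-compiled URL list exactly as described above. The file URLs shown are illustrative placeholders rather than a verified manifest of the repository, and the Authorization header matters because FLUX.1-Krea-dev is a gated model.
Bash
# urls.txt holds one resolved download URL per line (illustrative paths).
cat > /workspace/urls.txt <<'EOF'
https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev/resolve/main/vae/diffusion_pytorch_model.safetensors
https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev/resolve/main/text_encoder/model.safetensors
EOF
# -j caps files in flight; -x and -s parallelize connections within each file.
aria2c --input-file=/workspace/urls.txt \
  --dir=/workspace/model/flux-krea-dev \
  --max-concurrent-downloads=4 \
  --max-connection-per-server=8 \
  --split=8 \
  --continue=true \
  --header="Authorization: Bearer $HF_TOKEN"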
Section 3: The Definitive FLUX.1-Krea-dev LoRA Fine-Tuning Protocol
This section details the core execution phase of the project. It provides the
exact, validated commands and configurations required to launch and complete the
LoRA training run successfully. The focus is on precision and reproducibility to
ensure the first attempt is the only attempt needed.
3.1. Sourcing and Preparing the Training Script
The diffusers library from Hugging Face is the industry standard for working with
diffusion models. The library's example scripts provide robust, well-maintained,
and community-vetted solutions for common training tasks. As the FLUX architecture
is a transformer-based model similar in principle to SDXL, the
train_text_to_image_lora_sdxl.py script is the most appropriate and lowest-risk
tool for this task.21
While some community discussions mention potential incompatibilities with custom
LoRAs 22 and alternative training repositories exist 23, the official
diffusers script is the most reliable choice. The black-forest-labs/FLUX.1-Krea-dev
model card explicitly states it can be used as a drop-in replacement for the
original FLUX.1 dev model, which has established support in diffusers.9 By
specifying the full Hugging Face model path (
black-forest-labs/FLUX.1-Krea-dev) in the training command, the diffusers library
will automatically download the correct configuration and handle the loading of all
model components (diffusion transformer, VAE, and text encoders), mitigating the risk of
architectural mismatches. The startup script will have already cloned the diffusers
repository, making the script immediately available for execution.
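Before spending GPU-hours, a ten-second sanity check confirms the on-start script left everything in place. The expected size is an assumption based on the roughly 24 GB figure cited in Section 2.4.
Bash
# Verify the training script and model download before launching.
test -f /workspace/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py \
  && echo "training script: OK"
du -sh /workspace/model/flux-krea-dev   # expect on the order of 24 GB
ls /workspace/model/flux-krea-dev       # expect model_index.json plus component subfolders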
3.2. Deconstructing the accelerate launch Command for Optimal Performance
The Hugging Face accelerate library is indispensable for efficiently launching
PyTorch training scripts on diverse hardware. It simplifies the use of advanced
performance optimizations that are critical for fitting this large-scale training
task into the 24 GB of VRAM on the RTX 3090.
The training command will be executed via accelerate launch and will include
several key flags to manage memory and improve throughput:
* --mixed_precision="fp16": This is a mandatory optimization. It instructs the
training process to use 16-bit floating-point numbers for certain calculations,
which dramatically reduces VRAM consumption and can significantly speed up
computation on compatible hardware like the RTX 3090.24
* --use_8bit_adam: This flag instructs the training script to use the 8-bit quantized
version of the Adam optimizer provided by the bitsandbytes library. This
substantially reduces the memory required to store optimizer states, freeing up
several gigabytes of VRAM.
* --gradient_accumulation_steps: This technique allows the simulation of a larger
batch size without increasing memory usage. Gradients are computed for several
smaller batches and are accumulated before a model weight update is performed.
* --gradient_checkpointing: This is another crucial memory-saving technique.
Instead of storing all intermediate activations for the backward pass, it
recomputes them on the fly. This trades a small amount of computational overhead
for a very large reduction in VRAM usage.
The full command will be a single, long-form execution call containing all
hyperparameters, model paths, and dataset locations, ensuring the run is entirely
self-contained and reproducible.
3.3. Recommended Hyperparameters for a Successful Run
Hyperparameter tuning is an iterative and expensive process. For a micro-budget
operation, there is no room for trial and error. The following set of
hyperparameters has been selected to provide a robust starting point with a high
probability of yielding a successful result on the first run.
Table 3: Recommended LoRA Hyperparameters for FLUX.1-Krea-dev

| Hyperparameter              | Recommended Value | Justification |
|-----------------------------|-------------------|---------------|
| learning_rate               | 1e-4              | A higher learning rate is standard for LoRA training, as only a small subset of parameters is being updated, allowing for faster convergence.26 |
| lora_rank (r)               | 32                | The rank of the LoRA matrices. A rank of 32 balances the model's capacity to learn the new concept against the final file size; higher ranks can sometimes produce an "air-brushed" or over-stylized appearance.28 |
| lora_alpha                  | 16                | The scaling factor for the LoRA weights. It is conventionally set to half of the rank (r). |
| max_train_steps             | 1000              | Provides sufficient training iterations for a small, well-curated dataset to teach the model a new style or subject without significant overfitting. |
| train_batch_size            | 1                 | Due to the extreme VRAM requirements of the 12B-parameter FLUX model on a 24 GB GPU, a per-device batch size of 1 is a hard necessity. |
| gradient_accumulation_steps | 4                 | Simulates an effective batch size of 4 (1 x 4), which provides more stable gradients for the weight update step than a batch size of 1 alone. |
| resolution                  | 1024              | The native training and generation resolution for FLUX models. A different resolution would require additional VRAM and may degrade quality. |
| checkpointing_steps         | 250               | Saves a checkpoint of the LoRA weights periodically, creating a fallback in case of an unexpected system crash so training progress is not lost. |
3.4. Executing and Monitoring the Training Process
Once the environment is prepared, the training process can be initiated. To ensure
the process continues uninterrupted even if the SSH connection drops, it must be
run within a terminal multiplexer such as tmux or screen.
The procedure is as follows:
1. Connect to the instance via SSH.
2. Start a new tmux session: tmux new -s training.
3. Navigate to the directory containing the training script.
4. Execute the full accelerate launch command.
5. Detach from the tmux session by pressing Ctrl+b then d.
The training will now run in the background. Progress can be monitored by re-
attaching to the session (tmux attach -t training). The console output will display
the training loss, progress bar, and, if validation prompts are configured, will
generate and save sample images periodically, providing visual feedback on the
learning process.
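The steps above condense into the following terminal session; the watch interval and the separate shell for nvidia-smi are optional conveniences, not requirements.
Bash
tmux new -s training                    # start a persistent session
cd /workspace/diffusers/examples/text_to_image
# ...run the full accelerate launch command from Section 5.2 here...
# Detach with Ctrl+b then d; the job keeps running if SSH drops.
tmux attach -t training                 # re-attach later to check progress
watch -n 5 nvidia-smi                   # (separate shell) VRAM and GPU utilization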
Section 4: Finalizing the LoRA and Securing the Artifact
This final phase covers the critical steps of retrieving the trained model artifact
and securely terminating the instance to prevent any further billing and ensure the
project remains within budget.
4.1. Identifying and Verifying the .safetensors Output
Upon successful completion of the training run, the final LoRA artifact must be
located and verified. The train_text_to_image_lora_sdxl.py script will save the
output to the directory specified by the --output_dir argument in the launch
command.
Within this output directory, the script creates a checkpoint-XXXX subdirectory at
each checkpointing interval (e.g., checkpoint-250 through checkpoint-1000) and, on
successful completion, writes the final trained LoRA weights to the root of the
output directory. The key file is pytorch_lora_weights.safetensors. Its existence
should be confirmed, and its size should be verified. A valid LoRA file will be
relatively small, typically between 10 MB and 100 MB, depending on the rank. This
small size is a key indicator of a successful LoRA-only training run, as opposed to
a full model checkpoint, which would be many gigabytes.24
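A quick check of file presence and size, sketched below, confirms the run produced a LoRA rather than a full checkpoint; the safetensors header can even be inspected without loading the weights into memory.
Bash
ls -lh /workspace/output/pytorch_lora_weights.safetensors   # expect tens of MB
ls /workspace/output/                          # expect checkpoint-250 ... checkpoint-1000
# Optional: parse the safetensors header and count the LoRA tensors.
python3 - <<'EOF'
from safetensors import safe_open
path = "/workspace/output/pytorch_lora_weights.safetensors"
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())
print(f"{len(keys)} tensors, e.g. {keys[0]}")
EOF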
4.2. Cost-Effective Data Egress: Transferring Your LoRA
Transferring the final .safetensors file from the remote instance to a local
machine is a billable action that consumes egress bandwidth. The method must be
both fast and economical.
Given the small size of the LoRA file, the most direct and cost-effective method is
a Secure Copy Protocol (scp) transfer. This command is executed from the user's
local terminal and securely downloads the file over the established SSH
connection.5 For a file of approximately 50 MB, the associated bandwidth cost will
be negligible—likely fractions of a cent—and will fit comfortably within the
project's contingency buffer. More complex methods, such as uploading to a third-
party cloud storage bucket, are unnecessary and would introduce additional time,
complexity, and potential cost.
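One inexpensive safeguard, sketched below with the same placeholder connection details used in Section 5.3, is to compare checksums on both ends before destroying the instance; the two hashes must match exactly.
Bash
# On the instance (run remotely over SSH):
ssh -p PORT USER@HOST "sha256sum /workspace/output/pytorch_lora_weights.safetensors"
# On the local machine, after the scp transfer:
sha256sum PATH_TO_LOCAL_DIR/pytorch_lora_weights.safetensors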
4.3. Critical Teardown Procedure: Destroying the Instance
This is the single most critical action for adhering to the budget. As soon as the
scp transfer of the .safetensors file is complete and verified on the local
machine, the Vast.ai instance must be DESTROYED.
Simply stopping the instance is insufficient. A stopped instance continues to incur
storage charges, which can quickly deplete the remaining budget balance.2 The
"Destroy" action, accessible from the "Instances" tab in the Vast.ai web console,
is the only way to terminate the machine and all associated billing permanently.
Failure to perform this step immediately after data retrieval is the most common
cause of budget overruns on the platform. This action is the definitive conclusion
of the rental period.
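Teardown can also be scripted, which is useful when destruction must happen the instant the checksum comparison passes. The command names below reflect our understanding of the Vast.ai CLI; confirm them with vastai --help before depending on them.
Bash
vastai show instances                # note the numeric ID of the running instance
vastai destroy instance INSTANCE_ID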
Section 5: Consolidated Code and Scripts for Execution
This section provides the complete, annotated code and commands required for the
operation. These are designed to be copied and pasted to minimize the risk of
manual error during the time-sensitive, billable session.
5.1. The Complete Instance On-Start Script
This script should be pasted into the "On-start script" field during instance
configuration on the Vast.ai website.
Bash
#!/bin/bash
# --- Non-Interactive Setup ---
export DEBIAN_FRONTEND=noninteractive
export HF_HOME=/workspace/huggingface_cache
export TRANSFORMERS_CACHE=$HF_HOME
export HF_DATASETS_CACHE=$HF_HOME
# --- System Dependencies ---
echo "Updating packages and installing system dependencies..."
apt-get update
apt-get install -y git git-lfs aria2 pigz
# --- Python Environment ---
echo "Installing Python dependencies..."
pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install "diffusers[torch]==0.29.0" "transformers==4.41.2" "accelerate==0.30.1" \
    "peft==0.11.1" "bitsandbytes==0.43.1" "safetensors>=0.4.0" wandb
# --- Prepare Directories and Git LFS ---
echo "Setting up workspace directories and Git LFS..."
mkdir -p /workspace/data
mkdir -p /workspace/model
mkdir -p /workspace/output
mkdir -p $HF_HOME
cd /workspace
git lfs install
# --- Download Training Script ---
echo "Cloning Diffusers repository for training script..."
git clone https://github.com/huggingface/diffusers.git
# --- Download FLUX.1-Krea-dev Model ---
# Note: You must accept the license on Hugging Face first and provide a user
# access token. Add your Hugging Face token to the Vast.ai instance's
# environment variables as 'HF_TOKEN'.
echo "Downloading FLUX.1-Krea-dev model..."
huggingface-cli download black-forest-labs/FLUX.1-Krea-dev \
    --local-dir /workspace/model/flux-krea-dev \
    --local-dir-use-symlinks False \
    --token $HF_TOKEN
# --- Download Dataset (Example: Pokemon Captions) ---
# Replace with your dataset. This example uses a small, well-known dataset.
echo "Downloading training dataset..."
huggingface-cli download lambdalabs/pokemon-blip-captions --repo-type dataset \
    --local-dir /workspace/data/pokemon-dataset --token $HF_TOKEN
echo "--- SETUP COMPLETE ---"
5.2. The Full accelerate launch Training Command
After connecting to the instance via SSH, execute this command from within a tmux
or screen session. This command assumes the startup script has completed
successfully.
Bash
accelerate launch \
  /workspace/diffusers/examples/text_to_image/train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path="/workspace/model/flux-krea-dev" \
  --dataset_name="/workspace/data/pokemon-dataset" \
  --caption_column="text" \
  --output_dir="/workspace/output" \
  --report_to="wandb" \
  --mixed_precision="fp16" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --learning_rate=1e-4 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --rank=32 \
  --max_train_steps=1000 \
  --checkpointing_steps=250 \
  --seed=42 \
  --use_8bit_adam \
  --validation_prompt="a photo of a cute pikachu" \
  --validation_epochs=10
5.3. The scp Data Egress Command
Execute this command from your local machine's terminal to download the final
trained LoRA. Replace USER, HOST, PORT, and PATH_TO_LOCAL_DIR with the specific
connection details from your Vast.ai instance and your desired local destination.
The training command saves the final LoRA in the --output_dir.
Bash
scp -P PORT USER@HOST:/workspace/output/pytorch_lora_weights.safetensors \
    PATH_TO_LOCAL_DIR
Conclusion
This analysis confirms that fine-tuning a LoRA on the large-scale black-forest-
labs/FLUX.1-Krea-dev model is not only possible but can be achieved with a high
degree of reliability within a strict micro-budget of $5.00 USD. Success, however,
is not a matter of chance; it is contingent upon a rigorous and disciplined
methodology rooted in three core principles:
1. Meticulous Cost Analysis: A granular understanding of the Vast.ai pricing model—
specifically the persistent nature of storage costs—is paramount. The strategy of
minimizing total instance lifetime, rather than just the hourly GPU rate, is the
central pillar upon which the budget's integrity rests.
2. Systematic Hardware and Software Selection: The choice of an RTX 3090 GPU
provides the required 24 GB of VRAM at the most economical price point. Pairing
this with a stable On-Demand rental type and a pre-configured Docker image
mitigates the primary risks of hardware failure, interruption, and costly manual
setup.
3. Aggressive Automation and Optimization: The use of a comprehensive startup
script to automate environment setup and data ingress is non-negotiable. It
drastically reduces billable setup time. Furthermore, leveraging advanced training
optimizations such as mixed-precision, 8-bit Adam, and gradient checkpointing is
essential to fit the memory-intensive task onto the selected hardware.
By adhering to the detailed protocols, hyperparameter recommendations, and
executable scripts provided in this guide, a technically proficient user can
confidently replicate this process. The workflow transforms a potentially high-
risk, cost-prohibitive experiment into a predictable and data-driven procedure,
demonstrating that with strategic planning, even cutting-edge generative AI tasks
can be made accessible to researchers and developers with limited resources. The
final, critical action remains the immediate destruction of the instance post-
completion, a step that underscores the disciplined approach required to operate
successfully at the intersection of high-performance computing and extreme cost
constraint.
Works cited
1. Rent GPUs | Vast.ai, accessed August 7, 2025, https://vast.ai/
2. QuickStart - Guides - Vast.ai, accessed August 7, 2025,
https://docs.vast.ai/quickstart
3. Billing - Guides - Vast.ai, accessed August 7, 2025,
https://docs.vast.ai/billing
4. Vast.ai | Review, Pricing & Alternatives - GetDeploying, accessed August 7,
2025, https://getdeploying.com/vast-ai
5. FAQ - Guides - Vast.ai, accessed August 7, 2025, https://docs.vast.ai/faq
6. Hosting - Vast AI, accessed August 7, 2025, https://vast.ai/hosting
7. Vast.ai | Console, accessed August 7, 2025, https://cloud.vast.ai/
8. Account - Vast.ai | Console, accessed August 7, 2025,
https://cloud.vast.ai/account/
9. black-forest-labs/FLUX.1-Krea-dev - Hugging Face, accessed August 7, 2025,
https://huggingface.co/black-forest-labs/FLUX.1-Krea-dev
10. RTX 4090 - Vast AI, accessed August 7, 2025, https://vast.ai/pricing/gpu/RTX-
4090
11. Pricing - Vast AI, accessed August 7, 2025, https://vast.ai/pricing
12. RTX 3090 - Vast AI, accessed August 7, 2025, https://vast.ai/pricing/gpu/RTX-
3090
13. Rental Types - vast.ai, accessed August 7, 2025,
https://docs.vast.ai/instances/rental-types
14. On vast.ai renting a 3x3090 rig is $0.6/hour. The electricity price of
operati... | Hacker News, accessed August 7, 2025,
https://news.ycombinator.com/item?id=39492112
15. Rent - Vast.ai | Console, accessed August 7, 2025, https://cloud.vast.ai/?
gpu_option=RTX%203060
16. Preliminary guide to renting GPUs via vast.ai. May or may not have incomplete
or wrong instructions, so follow this at your own risk. : r/GameUpscale - Reddit,
accessed August 7, 2025,
https://www.reddit.com/r/GameUpscale/comments/bf7jk6/preliminary_guide_to_renting_g
pus_via_vastai_may/
17. vastai/pytorch - Docker Image, accessed August 7, 2025,
https://hub.docker.com/r/vastai/pytorch/
18. PyTorch - Guides - Vast.ai, accessed August 7, 2025,
https://docs.vast.ai/pytorch
19. vastai/vllm:v0.7.0-cuda-12.1-pytorch-2.5.1-py312 | Docker Hub, accessed August
7, 2025, https://hub.docker.com/layers/vastai/vllm/v0.7.0-cuda-12.1-pytorch-2.5.1-
py312/images/sha256-
f4a58e50c2209535a9bb2072a9169e00881c5d8262a4973bb6874c677ee3a4dc
20. vastai/openwebui:v0.5.7-cuda-12.1-pytorch-2.5.1-py311 | Docker Hub, accessed
August 7, 2025, https://hub.docker.com/layers/vastai/openwebui/v0.5.7-cuda-12.1-
pytorch-2.5.1-py311/images/sha256-
e294140e3ffb474b34e85b0aa6c7a91e2093510caf2d5b53e94ad87ea08af702
21. train_text_to_image_lora_sdxl.py - huggingface/diffusers - GitHub, accessed
August 7, 2025,
https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/
train_text_to_image_lora_sdxl.py
22. Releasing weights for FLUX.1 Krea | Hacker News, accessed August 7, 2025,
https://news.ycombinator.com/item?id=44745555
23. modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models! - GitHub,
accessed August 7, 2025, https://github.com/modelscope/DiffSynth-Studio
24. Using LoRA for Efficient Stable Diffusion Fine-Tuning - Hugging Face, accessed
August 7, 2025, https://huggingface.co/blog/lora
25. Fine-Tuning Stable Diffusion with LoRA - MachineLearningMastery.com, accessed
August 7, 2025, https://machinelearningmastery.com/fine-tuning-stable-diffusion-
with-lora/
26. LoRA Support in Diffusers - Hugging Face, accessed August 7, 2025,
https://huggingface.co/docs/diffusers/v0.14.0/en/training/lora
27. Low-Rank Adaptation of Large Language Models (LoRA) - Hugging Face, accessed
August 7, 2025, https://huggingface.co/docs/diffusers/v0.18.0/training/lora
28. LoRA training scripts of the world, unite! - Hugging Face, accessed August 7,
2025, https://huggingface.co/blog/sdxl_lora_advanced_script