
AI Tools + Prompt Engineering for Developers

What It Is (Definition)

LLM (Large Language Model)


A Large Language Model is a type of AI model trained on massive amounts of text data
to understand, generate, and interact with human language.

Think of an LLM as a very smart autocomplete system that doesn't just finish your
sentence — it can summarize articles, write code, answer questions, translate
languages, and more.

It’s called “large” because:

• It has billions (or even trillions) of parameters.

• It’s trained on huge datasets scraped from the internet (books, articles, code, etc.).

Examples:

• OpenAI’s GPT series (e.g., GPT-4)

• Google’s PaLM

• Meta’s LLaMA

• Anthropic’s Claude

How It Works (Mechanics)

At a high level, here’s how LLMs work:

1. Training

• The model is trained on tons of text using self-supervised learning.

• Goal: Predict the next word in a sentence.


Example:

"The capital of France is ___" → "Paris"

2. Architecture

• Most LLMs use a Transformer architecture (introduced in the 2017 “Attention Is All You Need” paper).

• Transformers allow the model to pay attention to different parts of a sentence at once — this is how they understand context so well.

3. Tokens, Not Words

• LLMs don’t see words directly — they see tokens (pieces of words).

• “ChatGPT” → might be split into [‘Chat’, ‘G’, ‘PT’]

4. Inference (Using the Model)

• After training, you can send a prompt (input) to the model.

• It returns a response based on its learned patterns.

Think of it as:
You give it a question (prompt), and it responds with a very educated guess based on everything it has seen during training.

Why It Matters (Real-World Relevance)

LLMs are changing the way developers build software:

Developers use LLMs to:

• Build chatbots, code assistants, and documentation generators

• Generate or autocomplete code snippets

• Parse unstructured text (like emails, logs, PDFs)

• Summarize customer feedback, legal docs, or health records

• Translate or localize app content

• Power AI agents in tools like LangChain, AutoGen, etc.

Software engineers today are expected to:

• Know how to use APIs like OpenAI or Hugging Face

• Understand prompt engineering

• Integrate LLMs in apps using frameworks like LangChain, FastAPI, or Flask
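
For example, a minimal sketch of calling the OpenAI API from Python (assuming the openai package v1+ and an OPENAI_API_KEY environment variable; the model name is just one option):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Summarize what a Python decorator does in one sentence."},
    ],
)
print(response.choices[0].message.content)
```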


Theory of LLMs (Large Language Models)

A Large Language Model (LLM) is a deep neural network, typically based on the
Transformer architecture, trained on a massive corpus of text to model the probability
of a word (token) given a sequence of previous tokens.

In simpler terms:

It learns the structure and patterns of human language by predicting the next token in
a sequence.

At its core, an LLM is a probabilistic model:

P(token_t | token_1, token_2, ..., token_{t-1})

That means the model doesn't "understand" language like humans — it learns
statistical associations.

How It Works (Mechanics – Deep Dive)

Let’s break it into components:

1. Tokenization

Before any text can be processed, it's split into tokens (subword units):

• "ChatGPT is amazing!" → [Chat, G, PT, is, amazing, !]

• Tokens are mapped to numbers via a vocabulary (token-ID mapping).
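
A quick way to see tokenization in practice is OpenAI’s tiktoken library (a sketch; the exact splits depend on the encoding, and cl100k_base is just one common choice):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("ChatGPT is amazing!")
print(ids)                             # a list of integer token IDs
print([enc.decode([i]) for i in ids])  # the text piece each ID maps back to
```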

2. Embedding Layer

Each token is converted into a high-dimensional vector using embedding matrices.

• For example:
"Chat" → [0.22, -0.88, ..., 1.05] (say, 768 dimensions)

• These vectors are learned during training and represent semantic meaning.
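
A minimal sketch of an embedding lookup in PyTorch (the vocabulary size, dimension, and token IDs below are illustrative, not any specific model’s values):

```python
import torch

# Hypothetical sizes: a 50,257-token vocabulary, 768-dimensional vectors.
embedding = torch.nn.Embedding(num_embeddings=50257, embedding_dim=768)

token_ids = torch.tensor([3398, 318])  # made-up token IDs for illustration
vectors = embedding(token_ids)         # one learned vector per token
print(vectors.shape)                   # torch.Size([2, 768])
```
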
3. Transformer Architecture (The Brain)

Core Components:

• Self-Attention:
Allows each token to look at all other tokens in the sequence to understand
context.

• Multi-Head Attention:
Multiple attention mechanisms run in parallel to capture different aspects of
context.

• Positional Encoding:
Since Transformers don’t process text in order like RNNs, this adds info about
token position.

• Feedforward Layers:
After attention, these are regular neural network layers that refine the
representation.

Transformer Block Flow:

Input Tokens → Embeddings → [Multi-Head Attention → Feedforward → Norm & Residual] × N → Output logits

4. Training Objective (Next-Token Prediction)

The model is trained to minimize the loss between predicted tokens and actual tokens
in a sequence.

• Loss Function: Usually cross-entropy

• It learns through backpropagation and gradient descent

Example:

Input: "The sky is"

Target: "blue"

Model guess: "cloudy" → Loss is high

Model guess: "blue" → Loss is low
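
A toy version of this loss computation in PyTorch (the 5-token vocabulary and scores are made up for illustration):

```python
import torch
import torch.nn.functional as F

# Pretend vocabulary of 5 tokens, where ID 2 stands for "blue".
logits = torch.tensor([[1.0, 0.5, 3.0, 0.2, -1.0]])  # raw next-token scores
target = torch.tensor([2])                            # the actual next token

loss = F.cross_entropy(logits, target)
print(loss.item())  # low, because the model already favors token 2 ("blue")
```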


5. Generation & Sampling (Inference Time)

Once trained, we use the model for text generation using sampling methods:

• Greedy Search: Always pick the most probable token

• Top-k Sampling: Randomly pick from top-k likely tokens

• Temperature: Controls randomness in output

• Beam Search: Keeps multiple candidates during generation for more fluent
results
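
A rough sketch of how temperature and top-k sampling combine, written against a plain logits vector (illustrative, not any library’s actual implementation):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 50) -> int:
    # Greedy search would simply be: int(torch.argmax(logits))
    logits = logits / temperature                     # <1.0 sharpens, >1.0 flattens
    values, indices = torch.topk(logits, min(top_k, logits.numel()))
    probs = torch.softmax(values, dim=-1)             # renormalize over the top-k
    choice = torch.multinomial(probs, num_samples=1)  # sample one candidate
    return int(indices[choice])
```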

Why It Matters (Real-World Relevance – Theoretical Lens)

Understanding theory gives you powerful leverage:

• Helps you write better prompts — knowing how tokenization & attention work improves precision.

• Helps debug model issues — hallucinations, length limits, context-window overflows.

• Lets you fine-tune or adapt models using transfer learning or LoRA.

• Helps you optimize for latency, cost, or performance by understanding model size, token limits, etc.

Examples & Use Cases (From Theory to Practice)

1. Context Length Matters (Attention is Limited)

• GPT-4-turbo supports ~128k tokens (~300 pages)

• If you input more than that, earlier tokens are truncated or ignored

• Fix: Use windowing or summarization techniques
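
A minimal windowing sketch using tiktoken (the chunk size and encoding name are assumptions; pick them to match your model):

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 4000, encoding: str = "cl100k_base") -> list[str]:
    # Encode once, then slice the token IDs into fixed-size windows.
    enc = tiktoken.get_encoding(encoding)
    ids = enc.encode(text)
    return [enc.decode(ids[i:i + max_tokens]) for i in range(0, len(ids), max_tokens)]
```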

2. Long Prompts = Longer Attention Computation

• Attention complexity is O(n²) — doubling the prompt length quadruples compute time!

• Tip: Keep prompts short when possible


Developer Tips

Best Practices

• Learn prompt engineering with knowledge of tokenization and embeddings

• Use token counters to avoid hitting limits (e.g., tiktoken)

• Choose the right sampling method for creative vs. factual tasks

Common Misconceptions

• “LLMs understand like humans” → No, they mimic patterns from data

• “Bigger = always better” → Not always; use the right size for your use case

• “Training from scratch is better than using OpenAI/HF” → Hugely expensive and impractical for 99% of developers

Mastering LLM Tools – Overview

Let’s go tool-by-tool and break them down across:

| Tool | Best For | Key Strengths | API / Interface |
|---|---|---|---|
| ChatGPT (OpenAI) | Code, analysis, tool use, API workflows | Plugins, function calling, GPTs, Vision | Web + API (chat + tools) |
| Claude (Anthropic) | Reasoning, summarization, long docs | Huge context (up to 200K+), low hallucinations | Web + Claude API |
| Gemini (Google) | Google ecosystem, images + code | Multimodal, Android integrations | Web + Vertex AI API |

ChatGPT – Power Use Guide

Best Use Cases:

• Code generation, debugging, test writing

• API or CLI assistant

• Function calling & workflows

• File analysis (w/ Pro)

Pro Features (GPT-4 Turbo):

• Custom GPTs with instructions & tools

• Vision: Upload images, diagrams, UI mockups

• Files: Upload codebases, PDFs, CSVs for analysis

• Tools: Code Interpreter, Browser, DALL·E

Dev API Features:

POST [Link]

• Models: gpt-3.5-turbo, gpt-4, gpt-4-turbo

• Function calling (JSON schema)

• Streaming + role-based conversations
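
A minimal function-calling sketch with the v1 Python SDK (the get_weather tool is a made-up example; OPENAI_API_KEY is assumed to be set):

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical function, for illustration only
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Instead of prose, the model returns a structured request to call get_weather(city="Paris").
print(response.choices[0].message.tool_calls)
```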


Prompting Tip:

"You're a senior Python engineer. Improve this FastAPI endpoint for security and
performance."

Claude 3 – Deep Reasoning & Document Expert

Best Use Cases:

• Long document summarization (200K+ tokens!)

• Writing-first tasks (letters, guides, explanations)

• Ingesting PDFs, markdown, CSVs with deep understanding

Claude 3 Models:

• Claude 3 Haiku (fastest)

• Claude 3 Sonnet (default)

• Claude 3 Opus (best reasoning)

Dev API Features:

POST [Link]

• Models: claude-3-opus-20240229, etc.

• File upload + message threading

• Tool use coming soon (early access)
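
A minimal sketch with Anthropic’s Python SDK (assumes the anthropic package and an ANTHROPIC_API_KEY environment variable; max_tokens is required by this API):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the key requirements in this product spec: ..."}],
)
print(message.content[0].text)
```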

Prompting Tip:

“Here’s a product spec. Summarize key requirements, risks, and edge cases.”

Gemini – Multimodal + Google Integration

Best Use Cases:

• Code generation with real-time web context

• Android app workflows (Firebase, Studio)

• Multimodal: Images + PDFs + Text in one prompt


Features:

• 1M token context window (on Gemini 1.5)

• Auto-detects uploaded content (PDF, image, code)

• Good for Google Sheets, Gmail, Calendar integration

Dev API (via Vertex AI):

from vertexai.language_models import ChatModel

• Models: gemini-1.5-pro, gemini-1.0-pro

• Supports tool calling + function execution

• Multi-part input support
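
A minimal sketch (note: SDK surfaces change; at the time of writing, Gemini models are usually reached via vertexai.generative_models, while the language_models import above targets the older PaLM chat models; the project ID and region are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-project-id", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content("Write a Kotlin data class for a Patient with name, age, and id.")
print(response.text)
```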

Prompting Tip:

“You’re an Android Studio tutor. Write a Kotlin UI layout for this screen idea [upload
image].”

Comparison Cheat Sheet

| Feature/Need | ChatGPT | Claude | Gemini |
|---|---|---|---|
| Code generation | Excellent | Good | Good |
| Long doc support | Medium (128K) | Huge (200K+) | (1M in lab) |
| Reasoning | Great | Best | Good |
| Vision/Image support | Vision (for now) | | Multimodal |
| Google integration | | | Sheets, Gmail |
| Tool + function calling | Powerful | (coming) | In Gemini 1.5 |
| Cost efficiency | Turbo is cheap | Haiku fast | API still pricy |


Dev Tips

Use each for what they’re best at:

• Claude = long-form clarity + structured docs

• ChatGPT = building + code + tooling

• Gemini = multimodal + search + Google tools

Test APIs for automation:

• Auto-summarize PRs with Claude

• Generate test cases from schema with ChatGPT

• Create Google Calendar events from voice notes with Gemini

Don’t:

• Mix tools blindly (choose based on task)

• Expect perfect reasoning from free-tier models

• Ignore token limits or rate caps


Prompt Engineering for Developers

1. What It Is (Definition)

Prompt Engineering is the practice of crafting effective inputs (prompts) to large language models (LLMs) to get reliable, relevant, and accurate outputs.

For developers, it's like giving clear function parameters to a smart but fuzzy API.

Think of it as designing the frontend UX for your backend AI brain.

2. How It Works (Mechanics)

An LLM doesn’t “think” — it predicts the next token based on patterns in training data.
Your prompt becomes the full program context, so:

Key Concepts:

• System Prompt: Sets identity/personality (e.g., "You're a senior Python dev").

• User Prompt: The actual instruction or question.

• Few-shot Examples: Including samples to guide output.

• Chain-of-Thought: Ask the model to “think out loud” before answering.

• Output Formatting: Use markdown, JSON, or code fences to control style.

Prompt = Programming

Prompting is like scripting:

System: "You're an AI who writes clean Python tests."

User: "Write a pytest function to test this FastAPI endpoint: ..."
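
Put together, a message list that combines a system role, one few-shot example, and JSON output formatting might look like this (a sketch; the schema and wording are illustrative):

```python
messages = [
    # System prompt: identity plus an output-format constraint.
    {"role": "system",
     "content": 'You write clean Python tests. Reply only with JSON: {"test_name": str, "code": str}.'},
    # Few-shot example: one sample exchange demonstrating the expected format.
    {"role": "user",
     "content": "Write a pytest function for GET /health returning 200."},
    {"role": "assistant",
     "content": '{"test_name": "test_health_ok", "code": "def test_health_ok(client): assert client.get(\'/health\').status_code == 200"}'},
    # The actual request.
    {"role": "user",
     "content": "Write a pytest function to test POST /login returning 200 on success."},
]
```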

3. Why It Matters (Real-World Relevance)

Prompting is how developers turn LLMs into usable tools:

• From code generation to auto-documentation, bug detection, and chat agents

• No fine-tuning required — prompt engineering is a fast, cheap way to prototype

• Used by devs in Copilot, ChatGPT, RAG, LangChain, Agents, API wrappers

It’s the foundation of LLM-as-a-Tool development.


4. Examples & Use Cases

1. Code Generation Prompt

You're a senior Python engineer.

Write a FastAPI route to upload an image and store it in a local folder.

Use Pydantic for validation.

2. Bug Fixing Prompt

Here’s a Python function. It throws a `TypeError` on line 23.

Explain the bug and suggest a fix:

```python
def calculate_total(items: list[int]):
    return sum(item['price'] for item in items)
```

3. Unit Test Writing

Write a pytest function to test this endpoint:

POST /login with JSON {username, password}. Should return 200 on success.

4. Docstring Generator

Add PEP-257 docstrings to all the functions in this Python file:

```python
def add(x, y): return x + y
```

5. Regex Explainer

Explain this regex: `^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$`

Also, convert it into Python code with a usage example.
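
For reference, a direct Python translation of that regex might look like this (a sketch; the test strings are made up):

```python
import re

# At least one uppercase letter, at least one digit, only letters/digits, min length 8.
PASSWORD_RE = re.compile(r"^(?=.*[A-Z])(?=.*\d)[A-Za-z\d]{8,}$")

print(bool(PASSWORD_RE.match("Secret123")))  # True
print(bool(PASSWORD_RE.match("secret123")))  # False: no uppercase letter
```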

5. Developer Tips

Best Practices:

• Be explicit: Assume the model needs context. Add roles, constraints, formats.

• Use markdown/code blocks to structure inputs.

• Give examples (few-shot learning).


• Use step-by-step / chain-of-thought to improve accuracy.

• Use JSON output formatting for structured data.

Common Mistakes:

• Vague prompts like “help me with code”

• Asking for too many tasks in one prompt

• Ignoring context window (cut-off outputs)

• Relying on default chat behavior for precise tasks

Prompt Engineering Tools:

• PromptLayer – Track and version prompts

• LangSmith – Debug and monitor prompt chains

• OpenAI Playground – Experiment easily

6. Bonus Resources

Learn Prompting:

• [Link] – Free, dev-focused course

• OpenAI Cookbook: Prompting Guide

• FlowGPT – Prompt sharing and discovery

Experiment & Test:

• OpenAI Playground

• Prompt Engineering Guide

TL;DR Cheat Sheet for Devs

| Task | Prompt Format Idea |
|---|---|
| Role setup | “You are a senior engineer who...” |
| Structure | “Output JSON in this format: {name, type, summary}” |
| Testing | “Write a test case for...” |
| Tool usage | “Use Python’s re module to…” |
| Code chaining | “First analyze the code, then write improved version” |
| RAG flows | “Summarize this document. Then generate 3 follow-up questions” |


How to Write Effective Prompts: A Simple Structure

1. Context / Background

Provide a brief context or background info so the AI understands the setting or topic.

• Example: "I’m building a web app that tracks fitness routines..."

2. Clear Task / Question

State exactly what you want the AI to do.

• Example: "Explain how to implement user authentication in FastAPI."

3. Constraints / Requirements

Add any specific constraints or details you want included or avoided.

• Example: "Use OAuth2 and include code samples in Python."

4. Format / Output Style

Specify how you want the answer formatted.

• Example: "Give me a step-by-step guide with bullet points."

5. Optional: Examples / References

Provide examples or mention preferred styles for clarity.

• Example: "Use simple language like explaining to a beginner."

Template

[Context / Background]

[Clear Task / Question]

[Constraints / Requirements]

[Format / Output Style]

(Optional) [Examples / References]


Example Prompt Using This Structure

[Role/Persona]

Act as a highly experienced Full Stack Developer with expertise in Spring Boot (Java),
[Link], and JWT-based authentication. You are also skilled in designing scalable
RESTful APIs and follow best practices in clean code and software architecture.

[Task or Goal]

Your goal is to help me design a robust backend architecture for a Hospital Management
System (HMS) that includes user authentication, role-based access control (Admin,
Doctor, Patient), and appointment booking functionality. You also need to recommend
the proper folder structure, best practices, and database schema for scalability.

[Context or Background]

This is a final-year academic project I’m building with a team of 4 members. I am responsible for the backend and JWT-based authentication module. The system will have three types of users – Admin (as a User with role-based access), Doctors, and Patients. Patients can register, book appointments, and view records. Doctors can manage appointments and view patient details. Admin can manage doctor and patient records. We are using MySQL as the database. I want to follow a layered architecture with Controllers, Services, DTOs, Repositories, and Entities. Exception handling and security filters are mandatory.

[Instructions or Constraints]

- Provide detailed guidance on folder structure following industry standards.

- Use simple yet professional language suitable for an academic submission.

- Recommend proper class naming conventions and package names.

- Include code snippets (not the full project) to explain key parts like security
configuration, token filter, user authentication flow, and appointment booking API.

- Mention the use of annotations like `@PreAuthorize` for role-based access.

- Avoid overly advanced enterprise-level patterns that are not needed for a college project.

[Output Format]

Break down the response into the following clear sections:

1. Project Folder Structure (with explanation)

2. Entity and Database Design (with ER Diagram in text format)

3. Key Classes & Code Snippets (Controller, Service, SecurityConfig, Filter)

4. Authentication Flow Explanation (Step-by-step)

5. Best Practices to Follow

6. Conclusion & Next Steps
