{"meta":{"title":"Scaling Copilot SDK deployments","intro":"Design your GitHub Copilot SDK deployment to serve multiple users, handle concurrent sessions, and scale horizontally across infrastructure.","product":"GitHub Copilot","breadcrumbs":[{"href":"/en/copilot","title":"GitHub Copilot"},{"href":"/en/copilot/how-tos","title":"How-tos"},{"href":"/en/copilot/how-tos/copilot-sdk","title":"Copilot SDK"},{"href":"/en/copilot/how-tos/copilot-sdk/set-up-copilot-sdk","title":"Set up Copilot SDK"},{"href":"/en/copilot/how-tos/copilot-sdk/set-up-copilot-sdk/scaling","title":"Scaling"}],"documentType":"article"},"body":"# Scaling Copilot SDK deployments\n\nDesign your GitHub Copilot SDK deployment to serve multiple users, handle concurrent sessions, and scale horizontally across infrastructure.\n\n> \\[!NOTE]\n> Copilot SDK is currently in technical preview. Functionality and availability are subject to change.\n\nConsider the different isolation patterns for CLI sessions, and how you want to manage concurrent sessions and resources, when implementing your application.\n\n**Best for:** Platform developers, SaaS builders, and any deployment serving more than a few concurrent users.\n\n## Session isolation patterns\n\nBefore choosing a pattern, consider three dimensions:\n\n* **Isolation**: Who can see which sessions?\n* **Concurrency**: How many sessions can run simultaneously?\n* **Persistence**: How long do sessions live?\n\n![Diagram showing the three scaling dimensions for Copilot SDK deployments: isolation, concurrency, and persistence.](/assets/images/help/copilot/copilot-sdk/scaling-core-concepts.png)\n\n### Pattern 1: Isolated CLI per user\n\nEach user gets their own CLI server instance. 
This is the strongest isolation—a user's sessions, memory, and processes are completely separated.\n\n![Diagram showing the isolated CLI per user pattern, where each user gets a dedicated CLI server instance.](/assets/images/help/copilot/copilot-sdk/scaling-isolated-cli-pattern.png)\n\n**When to use:**\n\n* Multi-tenant SaaS where data isolation is critical.\n* Users with different authentication credentials.\n* Compliance requirements such as SOC 2 or HIPAA.\n\n```typescript\n// CLI pool manager—one CLI per user\nclass CLIPool {\n    private instances = new Map<string, { client: CopilotClient; port: number }>();\n    private nextPort = 5000;\n\n    async getClientForUser(userId: string, token?: string): Promise<CopilotClient> {\n        if (this.instances.has(userId)) {\n            return this.instances.get(userId)!.client;\n        }\n\n        const port = this.nextPort++;\n\n        // Spawn a dedicated CLI for this user\n        await spawnCLI(port, token);\n\n        const client = new CopilotClient({\n            cliUrl: `localhost:${port}`,\n        });\n\n        this.instances.set(userId, { client, port });\n        return client;\n    }\n\n    async releaseUser(userId: string): Promise<void> {\n        const instance = this.instances.get(userId);\n        if (instance) {\n            await instance.client.stop();\n            this.instances.delete(userId);\n        }\n    }\n}\n```\n\n### Pattern 2: Shared CLI with session isolation\n\nMultiple users share one CLI server but have isolated sessions via unique session IDs. 
This is lighter on resources, but provides weaker isolation.\n\n![Diagram showing the shared CLI pattern, where multiple users share one CLI server with isolated sessions.](/assets/images/help/copilot/copilot-sdk/scaling-shared-cli-pattern.png)\n\n**When to use:**\n\n* Internal tools with trusted users.\n* Resource-constrained environments.\n* Lower isolation requirements.\n\n```typescript\nconst sharedClient = new CopilotClient({\n    cliUrl: \"localhost:4321\",\n});\n\n// Enforce session isolation through naming conventions\nfunction getSessionId(userId: string, purpose: string): string {\n    return `${userId}-${purpose}-${Date.now()}`;\n}\n\n// Access control: ensure users can only access their own sessions\nasync function resumeSessionWithAuth(\n    sessionId: string,\n    currentUserId: string\n): Promise<Session> {\n    // The user ID is read as everything before the first hyphen, so this check\n    // is only reliable when user IDs contain no hyphens (UUIDs, for example, do).\n    // For such IDs, track session ownership in your own data store instead.\n    const [sessionUserId] = sessionId.split(\"-\");\n    if (sessionUserId !== currentUserId) {\n        throw new Error(\"Access denied: session belongs to another user\");\n    }\n    return sharedClient.resumeSession(sessionId);\n}\n```\n\n### Pattern 3: Shared sessions (collaborative)\n\nMultiple users interact with the same session—like a shared chat room with Copilot. This pattern requires application-level session locking.\n\n![Diagram showing the shared sessions pattern, where multiple users interact with the same session through a message queue and session lock.](/assets/images/help/copilot/copilot-sdk/scaling-shared-sessions.png)\n\n**When to use:**\n\n* Team collaboration tools.\n* Shared code review sessions.\n* Pair programming assistants.\n\n> \\[!NOTE]\n> The SDK doesn't provide built-in session locking. 
You must serialize access to prevent concurrent writes to the same session.\n\n```typescript\nimport { randomUUID } from \"node:crypto\";\nimport Redis from \"ioredis\";\n\nconst redis = new Redis();\n\nasync function withSessionLock<T>(\n    sessionId: string,\n    fn: () => Promise<T>,\n    timeoutSec = 300\n): Promise<T> {\n    const lockKey = `session-lock:${sessionId}`;\n    const lockId = randomUUID();\n\n    // Acquire lock: set the key only if it does not exist, with an expiry\n    const acquired = await redis.set(lockKey, lockId, \"EX\", timeoutSec, \"NX\");\n    if (!acquired) {\n        throw new Error(\"Session is in use by another user\");\n    }\n\n    try {\n        return await fn();\n    } finally {\n        // Release the lock atomically, and only if we still own it.\n        // A separate GET followed by DEL could delete a lock that expired\n        // and was re-acquired by another caller in between.\n        await redis.eval(\n            \"if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end\",\n            1,\n            lockKey,\n            lockId\n        );\n    }\n}\n\n// Serialize access to a shared session\napp.post(\"/team-chat\", authMiddleware, async (req, res) => {\n    const result = await withSessionLock(\"team-project-review\", async () => {\n        const session = await client.resumeSession(\"team-project-review\");\n        return session.sendAndWait({ prompt: req.body.message });\n    });\n\n    res.json({ content: result?.data.content });\n});\n```\n\n## Comparison of isolation patterns\n\n|                      | Isolated CLI per user | Shared CLI + session isolation | Shared sessions           |\n| -------------------- | --------------------- | ------------------------------ | ------------------------- |\n| **Isolation**        | Complete              | Logical                        | Shared                    |\n| **Resource usage**   | High (CLI per user)   | Low (one CLI)                  | Low (one CLI and session) |\n| **Complexity**       | Medium                | Low                            | High (requires locking)   |\n| **Auth flexibility** | Per-user tokens       | Service token                  | Service token             |\n| **Best for**         | Multi-tenant SaaS     | 
Internal tools                 | Collaboration             |\n\n## Horizontal scaling\n\n### Multiple CLI servers behind a load balancer\n\nTo serve more concurrent users, run multiple CLI server instances behind a load balancer. Session state must be on **shared storage** so any CLI server can resume any session.\n\n![Diagram showing multiple CLI servers behind a load balancer with shared storage for session state.](/assets/images/help/copilot/copilot-sdk/scaling-multiple-cli-servers.png)\n\n```typescript\n// Route sessions across CLI servers\nclass CLILoadBalancer {\n    private servers: string[];\n    private currentIndex = 0;\n\n    constructor(servers: string[]) {\n        this.servers = servers;\n    }\n\n    // Round-robin selection\n    getNextServer(): string {\n        const server = this.servers[this.currentIndex];\n        this.currentIndex = (this.currentIndex + 1) % this.servers.length;\n        return server;\n    }\n\n    // Sticky sessions: same user always hits same server\n    getServerForUser(userId: string): string {\n        const hash = this.hashCode(userId);\n        return this.servers[hash % this.servers.length];\n    }\n\n    private hashCode(str: string): number {\n        let hash = 0;\n        for (let i = 0; i < str.length; i++) {\n            hash = (hash << 5) - hash + str.charCodeAt(i);\n            hash |= 0;\n        }\n        return Math.abs(hash);\n    }\n}\n\nconst lb = new CLILoadBalancer([\n    \"cli-1:4321\",\n    \"cli-2:4321\",\n    \"cli-3:4321\",\n]);\n\napp.post(\"/chat\", async (req, res) => {\n    const server = lb.getServerForUser(req.user.id);\n    const client = new CopilotClient({ cliUrl: server });\n\n    const session = await client.createSession({\n        sessionId: `user-${req.user.id}-chat`,\n        model: \"gpt-4.1\",\n    });\n\n    const response = await session.sendAndWait({ prompt: req.body.message });\n    res.json({ content: response?.data.content });\n});\n```\n\n### Sticky sessions vs. 
shared storage\n\n![Diagram comparing sticky sessions and shared storage approaches for scaling Copilot SDK deployments.](/assets/images/help/copilot/copilot-sdk/scaling-stick-and-shared-sessions.png)\n\n**Sticky sessions** pin each user to a specific CLI server. No shared storage is needed, but load distribution can be uneven if user traffic varies significantly.\n\n**Shared storage** enables any CLI to handle any session. Load distribution is more even, but requires networked storage for `~/.copilot/session-state/`.\n\n## Vertical scaling\n\n### Tuning a single CLI server\n\nA single CLI server can handle many concurrent sessions. The key is managing session lifecycle to avoid resource exhaustion:\n\n![Diagram showing the resource dimensions for vertical scaling: CPU, memory, disk I/O, and network.](/assets/images/help/copilot/copilot-sdk/scaling-vertical-scaling.png)\n\n```typescript\n// Limit concurrent active sessions\nclass SessionManager {\n    private activeSessions = new Map<string, Session>();\n    private maxConcurrent: number;\n\n    constructor(maxConcurrent = 50) {\n        this.maxConcurrent = maxConcurrent;\n    }\n\n    async getSession(sessionId: string): Promise<Session> {\n        // Return existing active session\n        if (this.activeSessions.has(sessionId)) {\n            return this.activeSessions.get(sessionId)!;\n        }\n\n        // Enforce concurrency limit\n        if (this.activeSessions.size >= this.maxConcurrent) {\n            await this.evictOldestSession();\n        }\n\n        // Create or resume\n        const session = await client.createSession({\n            sessionId,\n            model: \"gpt-4.1\",\n        });\n\n        this.activeSessions.set(sessionId, session);\n        return session;\n    }\n\n    private async evictOldestSession(): Promise<void> {\n        const [oldestId] = this.activeSessions.keys();\n        const session = this.activeSessions.get(oldestId)!;\n        // Session state is persisted 
automatically—safe to disconnect\n        await session.disconnect();\n        this.activeSessions.delete(oldestId);\n    }\n}\n```\n\n## Ephemeral vs. persistent sessions\n\n![Diagram comparing ephemeral sessions and persistent sessions for Copilot SDK deployments.](/assets/images/help/copilot/copilot-sdk/scaling-ephemeral-vs-persistent-sessions.png)\n\n**Ephemeral sessions** are created per request and destroyed after use. They are ideal for one-shot tasks and stateless APIs.\n\n**Persistent sessions** are named, survive restarts, and are resumable. They are ideal for multi-turn chat and long workflows.\n\n### Ephemeral sessions\n\n```typescript\napp.post(\"/api/analyze\", async (req, res) => {\n    const session = await client.createSession({\n        model: \"gpt-4.1\",\n    });\n\n    try {\n        const response = await session.sendAndWait({\n            prompt: req.body.prompt,\n        });\n        res.json({ result: response?.data.content });\n    } finally {\n        await session.disconnect();\n    }\n});\n```\n\n### Persistent sessions\n\n```typescript\n// Start a conversation\napp.post(\"/api/chat/start\", async (req, res) => {\n    const sessionId = `user-${req.user.id}-${Date.now()}`;\n\n    const session = await client.createSession({\n        sessionId,\n        model: \"gpt-4.1\",\n        infiniteSessions: {\n            enabled: true,\n            backgroundCompactionThreshold: 0.80,\n        },\n    });\n\n    res.json({ sessionId });\n});\n\n// Continue the conversation\napp.post(\"/api/chat/message\", async (req, res) => {\n    const session = await client.resumeSession(req.body.sessionId);\n    const response = await session.sendAndWait({ prompt: req.body.message });\n\n    res.json({ content: response?.data.content });\n});\n\n// Clean up when done\napp.post(\"/api/chat/end\", async (req, res) => {\n    await client.deleteSession(req.body.sessionId);\n    res.json({ success: true });\n});\n```\n\n## Container deployments\n\n### Kubernetes 
with persistent storage\n\nThe following example deploys three CLI replicas sharing a `PersistentVolumeClaim` so that any replica can resume any session.\n\n```yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: copilot-cli\nspec:\n  replicas: 3\n  selector:\n    matchLabels:\n      app: copilot-cli\n  template:\n    metadata:\n      labels:\n        app: copilot-cli\n    spec:\n      containers:\n        - name: copilot-cli\n          image: ghcr.io/github/copilot-cli:latest\n          args: [\"--headless\", \"--port\", \"4321\"]\n          env:\n            - name: COPILOT_GITHUB_TOKEN\n              valueFrom:\n                secretKeyRef:\n                  name: copilot-secrets\n                  key: github-token\n          ports:\n            - containerPort: 4321\n          volumeMounts:\n            - name: session-state\n              mountPath: /root/.copilot/session-state\n      volumes:\n        - name: session-state\n          persistentVolumeClaim:\n            claimName: copilot-sessions-pvc\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: copilot-cli\nspec:\n  selector:\n    app: copilot-cli\n  ports:\n    - port: 4321\n      targetPort: 4321\n```\n\n![Diagram showing a Kubernetes deployment with multiple CLI server pods sharing a PersistentVolumeClaim for session state.](/assets/images/help/copilot/copilot-sdk/scaling-container-deployments.png)\n\n## Production checklist\n\n| Concern             | Recommendation                                                        |\n| ------------------- | --------------------------------------------------------------------- |\n| **Session cleanup** | Run periodic cleanup to delete sessions older than your TTL.          |\n| **Health checks**   | Ping the CLI server periodically; restart if unresponsive.            |\n| **Storage**         | Mount persistent volumes for `~/.copilot/session-state/`.             
|\n| **Secrets**         | Use your platform's secret manager (Vault, Kubernetes Secrets, etc.). |\n| **Monitoring**      | Track active session count, response latency, and error rates.        |\n| **Locking**         | Use Redis or similar for shared session access.                       |\n| **Shutdown**        | Drain active sessions before stopping CLI servers.                    |\n\n## Limitations\n\n| Limitation                      | Details                                                    |\n| ------------------------------- | ---------------------------------------------------------- |\n| **No built-in session locking** | Implement application-level locking for concurrent access. |\n| **No built-in load balancing**  | Use an external load balancer or service mesh.             |\n| **Session state is file-based** | Requires a shared filesystem for multi-server setups.      |\n| **30-minute idle timeout**      | Sessions without activity are auto-cleaned by the CLI.     |\n| **CLI is single-process**       | Scale by adding more CLI server instances, not threads.    |\n\n## Next steps\n\n* For core server-side setup, see [Setting up Copilot SDK for backend services](/en/copilot/how-tos/copilot-sdk/set-up-copilot-sdk/backend-services).\n* For multi-user authentication, see [Using GitHub OAuth with Copilot SDK](/en/copilot/how-tos/copilot-sdk/set-up-copilot-sdk/github-oauth).\n* For installation and your first message, see [Getting started with Copilot SDK](/en/copilot/how-tos/copilot-sdk/sdk-getting-started)."}