How to Fix LM Studio Response Cutoffs in Antigravity

You've connected LM Studio to Antigravity, and the setup looks right — the model loads, short prompts work fine. But ask it to refactor a larger file, and the response just stops partway through. Sometimes it freezes for 30 seconds, then dies with a stream error. Sometimes it cuts at exactly the same token count every time, which gives you the sinking feeling that something is configured wrong rather than broken.

I've been running Gemma 4 through LM Studio connected to Antigravity for several months, and this "connected but streaming fails" pattern is its own category of problem — distinct from connection errors, and unfortunately not covered well in most LLM setup guides. Here's what's actually going on and how to fix it.

If you're not yet getting a connection at all, see LM Studio model not visible: connection troubleshooting first. If you're on Ollama instead of LM Studio, the Ollama streaming cutoff fix guide covers your case.

Separating "can't connect" from "disconnects mid-stream"

Before diving into fixes, it helps to know which layer you're actually dealing with. Local LLM integrations can fail at three distinct points.

The connection layer (ports, firewall) is whether Antigravity can reach LM Studio's local server at all. You can verify this by visiting http://localhost:1234/v1/models in your browser — if you see a model list, you're past this layer.

The model load layer (VRAM, RAM) is whether the model is loaded and can produce any output. Try a one-line prompt directly in LM Studio's chat UI to confirm it generates text fluently.

The streaming persistence layer is whether a long-running generation can stay alive until completion. This is the layer causing the mid-stream cutoff. The characteristic symptoms: short prompts succeed, but longer or multi-file requests fail partway through; the failure often occurs after a predictable time interval (30s, 60s, etc.).

Fix 1: Extend LM Studio's server timeout

LM Studio's local server has a request timeout that defaults to a relatively short window. When Antigravity is generating a response involving multiple files or a long refactor, the generation can easily run past this limit.

Open LM Studio and navigate to the "Local Server" tab. While the server is running, click the gear icon ("Server Options") and look for the timeout settings.

// Reference values for LM Studio server settings
// These are configured through the LM Studio UI, not directly via JSON
{
  "server": {
    "request_timeout": 300,   // Default ~60s → extend to 300s or higher
    "stream_timeout": 0,      // 0 = no stream timeout (recommended for heavy use)
    "max_context_length": 8192
  }
}

After changing these settings, stop and restart the server. In Antigravity, go to Settings → Local LLM, disconnect, and reconnect to clear any cached session state.

This single change fixes the issue in most cases. If the cutoffs were happening at roughly the same time each run, this is almost certainly your culprit.

Fix 2: Increase the context length allocation

LM Studio loads models with a default context length that may be lower than you'd expect — often 2048 or 4096 tokens. When Antigravity passes in a large codebase context along with your instruction, it's easy to blow past this limit. The model doesn't error out gracefully; it simply stops generating at the point where context is exhausted.

In LM Studio, go to "My Models," select your model, and open the detailed load options before clicking "Load."

Context Length recommendations for Gemma 4 12B:
  Default:    2048–4096 tokens  ← frequently too low
  Minimum:    8192 tokens       (covers most single-file work)
  Recommended: 32768 tokens     (for multi-file refactoring)

Note: Higher context = more VRAM required. Watch the memory estimate
at the bottom of the load dialog before confirming.

After changing the context length, unload and reload the model. In LM Studio's Server Logs, you should see the new context size reflected in the model parameters during initialization.

A practical test: paste the full content of your largest project file into LM Studio's chat and ask a simple question. If that alone fails, context length is the issue.

Fix 3: Check generation speed for VRAM saturation

When VRAM is over-allocated, generation speed drops sharply. A model that should produce 20–30 tokens/second might fall to 1–2 tokens/second when VRAM is paging to system RAM. At that speed, Antigravity's client-side timeout fires before the model finishes.

Check LM Studio's server logs (Server tab → Logs) during a generation from Antigravity:

# Healthy generation speed — no timeout risk
[INFO] eval time = 2341 ms / 128 tokens → 54.7 tokens/s
 
# VRAM-constrained, at risk of timeout
[INFO] eval time = 89420 ms / 128 tokens → 1.4 tokens/s

Below 5 tokens/second is a warning sign. Below 2 tokens/second, you'll consistently hit timeouts on complex requests. Options for recovery:

Downsize the model. A 12B model running smoothly at 20 t/s is practically more useful than a 27B model crawling at 2 t/s.

Adjust quantization. Lower quantization reduces VRAM consumption and improves speed, with a moderate quality trade-off:

Quantization vs. VRAM for Gemma 4 12B (approximate):
  Q8_0:    ~13.5 GB VRAM  — highest quality
  Q5_K_M:   ~9.0 GB VRAM  — good balance (recommended starting point)
  Q4_K_M:   ~7.5 GB VRAM  — efficient, quality still solid
  Q3_K_M:   ~6.0 GB VRAM  — minimum viable, quality drops noticeably

Reduce GPU Layers if VRAM is shared. If another GPU-intensive application is running alongside Antigravity, reduce GPU Layers from -1 (full offload) to a partial number to leave headroom.

Fix 4: Configure Antigravity's receive timeout

If the first three fixes don't resolve it, you can extend the timeout on Antigravity's side. This is useful for situations where the model generates correctly but Antigravity disconnects from the stream before it finishes.

// .antigravity/settings.json
{
  "localLLM": {
    "provider": "lm-studio",
    "baseUrl": "http://localhost:1234/v1",
    "requestTimeout": 600,
    "streamTimeout": 600,
    "keepAliveInterval": 30
  }
}

requestTimeout and streamTimeout are both in seconds. 600 (10 minutes) is generous, but necessary if you're doing large-scale refactors in one shot. keepAliveInterval sends a keep-alive signal every 30 seconds, preventing the HTTP layer from treating the open stream as idle and closing it.

Even on localhost, Antigravity's HTTP client can decide the connection is inactive and drop it. Setting keepAliveInterval to 30 addresses this specifically.

Suggested order for diagnosis

If you're starting fresh on this problem, work through the causes in this order:

First, check LM Studio's server timeout settings (Fix 1). If you haven't changed them from default, this is the most likely cause and the fastest fix to verify.

Next, check context length allocation (Fix 2). If cutoffs only happen on longer requests but short ones are fine, this is the likely bottleneck.

Then, check generation speed in the logs (Fix 3). If speed is under 5 tokens/second, focus on model size and quantization before touching timeouts.

Finally, if everything else looks right, add Antigravity's timeout configuration (Fix 4) to handle any remaining edge cases.

Multiple causes can compound with each other, especially on machines where RAM or VRAM is shared with other processes. The methodical one-change-at-a-time approach feels slower but saves time overall.

For a broader overview of local LLM settings in Antigravity, the local LLM configuration guide covers the full setup process from scratch.

Understanding why this problem is easy to miss

One reason these streaming cutoffs are frustrating to debug is that LM Studio shows the connection as active right up until the moment it drops. There's no red indicator, no warning in the status bar. Antigravity shows the model name correctly. Everything looks fine until you try something long.

The disconnect also happens silently in ways that mask the root cause. If the cutoff is from a context limit, you'll see output stop mid-sentence, often at a suspiciously consistent point. If it's a timeout, you'll see a delay before the drop. If it's VRAM saturation, the output will produce correctly but increasingly slowly before the client gives up. Learning to read these patterns — and correlating them with the LM Studio server log timestamps — is the most efficient way to find the right fix without burning time on settings that aren't relevant.

One pattern worth calling out: if you're running LM Studio on a machine with a GPU that's also used for display rendering (most consumer setups), macOS and Windows will pull VRAM back from running applications to serve display compositing. This can cause intermittent drops that only happen when you interact with other windows during a long generation. If the cutoffs seem correlated with switching focus or scrolling elsewhere, this is why.

When the issue is actually the model, not the configuration

Occasionally, a specific model file itself causes streaming problems — a corrupted quantization layer, or a checkpoint where specific token ranges produce malformed outputs that the streaming parser rejects. This is rare but worth knowing about.

If you've exhausted the four fixes above and the issue persists only with one specific model file, try redownloading it. LM Studio doesn't automatically verify model checksums after download, so a partially corrupted file can produce symptoms that look like timeout issues but aren't fixable through configuration.

A quick test: download a different model (even a small one, like a 1.5B or 3B quantization), load it, and run the same prompt that was failing. If it streams to completion, your original model file is worth replacing.

For thorough coverage of LM Studio model management and redownload procedures, the LM Studio documentation is the most reliable reference — it's updated faster than third-party guides when the application changes its model storage format.

Looking back

Streaming cutoffs from LM Studio in Antigravity almost always trace back to one of four configuration gaps: server timeout settings that weren't extended from defaults, a context length allocation that's too low for multi-file tasks, VRAM saturation that slows generation to a crawl, or Antigravity's own receive timeout firing before the model finishes.

Work through them in that order, verify with a test prompt at each step, and most cases resolve without needing to change both sides simultaneously. The configuration changes are minimal — mostly single value edits in LM Studio's server options or a few lines in .antigravity/settings.json — and the payoff is reliable long-form generation without unexpected interruptions.