Building with the Gemini API in Python is exciting until your code hits an error message you've never seen before. RESOURCE_EXHAUSTED? SAFETY? INVALID_ARGUMENT? The first time I saw RESOURCE_EXHAUSTED, I spent 30 minutes hunting down memory leaks and checking CPU usage before realizing it had nothing to do with my machine — it was an API quota limit.
This is a problem a lot of developers run into early on: the Gemini API returns errors that look unfamiliar, especially if you're coming from REST-centric APIs where HTTP 400 and 429 tell you most of what you need to know. The Gemini Python SDK uses gRPC under the hood, which adds its own layer of error terminology on top of standard HTTP codes.
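As a rough orientation, the pairing between gRPC status names and HTTP codes can be written as a small lookup table. The pairs below follow the conventional gRPC-to-HTTP mapping as I understand it, and the helper name http_status_for is made up for this sketch rather than taken from any SDK.

```python
# Conventional gRPC status name -> HTTP status code pairs (a sketch, not an SDK API).
GRPC_TO_HTTP = {
    "INVALID_ARGUMENT": 400,
    "UNAUTHENTICATED": 401,
    "PERMISSION_DENIED": 403,
    "NOT_FOUND": 404,
    "RESOURCE_EXHAUSTED": 429,
    "INTERNAL": 500,
    "UNAVAILABLE": 503,
    "DEADLINE_EXCEEDED": 504,
}


def http_status_for(grpc_name):
    """Return the HTTP code usually paired with a gRPC status name, or None if unknown."""
    return GRPC_TO_HTTP.get(grpc_name.strip().upper())
```

Names that are not gRPC statuses at all, such as the SAFETY finish reason, return None here, which is a useful reminder that not every Gemini failure is an HTTP-style error.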
Start Here: Catch Errors by Type
The Gemini API raises exceptions from the google.api_core.exceptions module. The most important habit to develop is catching errors by their specific type rather than using a generic except Exception block. This alone makes debugging dramatically faster — you'll see exactly which exception class was raised instead of hunting through a wall of text.
import google.generativeai as genai
from google.api_core import exceptions as google_exceptions

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-2.5-flash")

try:
    response = model.generate_content("Hello")
    print(response.text)
except google_exceptions.ResourceExhausted as e:
    print(f"Quota exceeded: {e}")
except google_exceptions.InvalidArgument as e:
    print(f"Bad request format: {e}")
except google_exceptions.DeadlineExceeded as e:
    print(f"Request timed out: {e}")
except google_exceptions.PermissionDenied as e:
    print(f"Authentication error: {e}")
except Exception as e:
    print(f"Unexpected error: {type(e).__name__}: {e}")
Once you can see which exception type was raised, you know exactly which section of this guide to read. Set this pattern up in your project early; it'll save you a lot of guesswork.
RESOURCE_EXHAUSTED (429) — Quota Exceeded
When you see RESOURCE_EXHAUSTED or HTTP 429, you've hit the API's usage limits. Gemini API rate limits are measured along several dimensions, and exceeding any one of them triggers this error:
- RPM (Requests Per Minute): How many API calls you can make in a 60-second window
- TPM (Tokens Per Minute): How many input tokens you can send in a 60-second window
- RPD (Requests Per Day): How many API calls you can make per day
During development, hitting the RPM limit is extremely common — especially when running tests in a loop, building agents that make multiple calls per user turn, or rapidly iterating on prompts. The limit resets after 60 seconds, but waiting is frustrating.
The most robust solution is exponential backoff with jitter: wait progressively longer between retries, and add a small random delay to prevent all concurrent requests from firing at exactly the same moment (the "thundering herd" problem).
import time
import random
import google.generativeai as genai
from google.api_core import exceptions as google_exceptions

def generate_with_retry(model, prompt, max_retries=5):
    """generate_content with exponential backoff on quota errors."""
    for attempt in range(max_retries):
        try:
            return model.generate_content(prompt)
        except google_exceptions.ResourceExhausted as e:
            if attempt == max_retries - 1:
                raise  # Give up after the final attempt
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Quota exceeded. Retrying in {wait_time:.1f}s ({attempt + 1}/{max_retries})")
            time.sleep(wait_time)

model = genai.GenerativeModel("gemini-2.5-flash")
response = generate_with_retry(model, "Summarize this document...")
print(response.text)
If you're consistently hitting a daily limit, consider caching responses for identical or near-identical prompts. For persistent quota issues in production, check your current usage on Google AI Studio and evaluate whether a paid tier makes sense for your usage pattern.
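Caching responses for repeated prompts can be sketched with a small wrapper. This is a minimal in-memory version under my own assumptions: make_cached and fake_generate are hypothetical names, the cache is unbounded, and "near-identical" here only means prompts that differ in whitespace.

```python
import hashlib


def make_cached(generate_fn):
    """Wrap generate_fn so prompts that match after whitespace cleanup are only sent once."""
    cache = {}

    def cached(prompt):
        # Collapse whitespace so trivially different prompts share one cache key.
        normalized = " ".join(prompt.split())
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = generate_fn(prompt)
        return cache[key]

    return cached


# Stand-in for the real API call, so the sketch runs without network access.
calls = []

def fake_generate(prompt):
    calls.append(prompt)
    return f"summary of: {prompt}"

cached_generate = make_cached(fake_generate)
first = cached_generate("Summarize this document...")
second = cached_generate("Summarize   this document...")  # served from the cache
```

In a real project the wrapped function would be something like a lambda around model.generate_content, and you would probably want an eviction policy or a persistent store instead of a plain dict.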
INVALID_ARGUMENT (400) — Bad Request Format
INVALID_ARGUMENT (HTTP 400) means the request you sent was malformed in some way. This one is tricky because the root cause varies widely. Here are the patterns I run into most often.
Pattern 1: Sending raw bytes for multimodal input
When working with images, passing raw bytes directly to generate_content() will fail with a confusing error message. You need to use PIL to open the image file, or use genai.upload_file() for larger files.
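If the image is already in memory as bytes, one alternative worth trying is wrapping it in a dict with an explicit MIME type. As far as I know the google.generativeai SDK accepts this inline form, but treat that as an assumption to verify against your installed version. The helper name bytes_to_inline_part is hypothetical.

```python
import mimetypes


def bytes_to_inline_part(data, filename):
    """Wrap raw bytes in a {"mime_type", "data"} dict, guessing the type from the filename."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not determine an image MIME type for {filename!r}")
    return {"mime_type": mime_type, "data": data}


# Only the first bytes of a JPEG header, enough to demonstrate the shape of the result.
part = bytes_to_inline_part(b"\xff\xd8\xff", "image.jpg")
# Assumed usage: model.generate_content(["What's in this image?", part])
```

The point of the wrapper is that the API needs to know what the bytes represent, and bare bytes carry no MIME type.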
# This raises INVALID_ARGUMENT: don't pass raw bytes
with open("image.jpg", "rb") as f:
    image_bytes = f.read()
response = model.generate_content(["What's in this image?", image_bytes])
# → INVALID_ARGUMENT: Request contains an invalid argument.

# This works: use PIL.Image
import PIL.Image
image = PIL.Image.open("image.jpg")
response = model.generate_content(["What's in this image?", image])
print(response.text)
Pattern 2: Exceeding the context window
If your input text is too long, you'll hit this error. Look for token_count or max_tokens in the error message to confirm it. You can either split your content into smaller chunks and process them sequentially, or switch to a model with a larger context window. Context limits vary by model and version, so check the documented input token limit for each model rather than assuming gemini-2.5-pro accepts more than gemini-2.5-flash. Calling model.count_tokens() on your prompt lets you measure it before sending.
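Splitting content into smaller chunks can be sketched as follows. The function name split_into_chunks is hypothetical, and the figure of 4 characters per token is a rough assumption of mine rather than anything the API guarantees, so leave generous headroom.

```python
def split_into_chunks(text, max_tokens=8000, chars_per_token=4):
    """Split text on paragraph boundaries into chunks under an approximate token budget."""
    max_chars = max_tokens * chars_per_token
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single oversized paragraph is hard-split by character count.
        while len(paragraph) > max_chars:
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        current = paragraph
    if current:
        chunks.append(current)
    return chunks


# Tiny budget (40 characters) so the behavior is easy to see.
sample = "\n\n".join(["a" * 30, "b" * 30, "c" * 100])
chunks = split_into_chunks(sample, max_tokens=10, chars_per_token=4)
```

Each chunk can then be sent as its own request, with the partial results combined afterwards.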
Pattern 3: Overly complex JSON output schemas
When using structured output with response_mime_type="application/json", a deeply nested or overly complex response_schema can trigger this error. The error message won't always tell you exactly which part of the schema is the problem. The fastest diagnostic approach is binary search: comment out half the schema, see if it passes, then narrow down from there.
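The binary search idea can be automated with a small helper. This is a sketch under assumptions: find_failing_properties is a hypothetical name, it only bisects the top-level properties of the schema, and passes is a callback you would write yourself (for example, one that sends a request with the candidate schema and returns False on InvalidArgument).

```python
def find_failing_properties(schema, passes):
    """Bisect schema["properties"] to find names whose presence makes passes() return False."""
    properties = schema["properties"]

    def subset(names):
        # Same schema, restricted to the given property names.
        return {
            **schema,
            "properties": {name: properties[name] for name in names},
            "required": [r for r in schema.get("required", []) if r in names],
        }

    def search(names):
        if passes(subset(names)):
            return []
        if len(names) <= 1:
            return list(names)
        mid = len(names) // 2
        culprits = search(names[:mid]) + search(names[mid:])
        # If both halves pass on their own, the combination itself is the problem.
        return culprits or list(names)

    return search(list(properties))


# Stand-in check that rejects any schema containing property "c".
schema = {
    "type": "object",
    "properties": {name: {"type": "string"} for name in ("a", "b", "c", "d")},
    "required": ["a"],
}

def fake_passes(candidate):
    return "c" not in candidate["properties"]

culprits = find_failing_properties(schema, fake_passes)
```

With a real callback every call to passes costs one API request, so this is best run once while debugging rather than left in application code.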
# Simpler schema: more reliable
simple_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "summary": {"type": "string"},
        "keywords": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": ["title", "summary"]
}
response = model.generate_content(
    "Analyze this article: ...",
    generation_config={"response_mime_type": "application/json",
                       "response_schema": simple_schema}
)
SAFETY — Content Filtered
The SAFETY finish reason means Gemini's safety filters blocked either your input prompt or the generated response. This one catches people off guard because it doesn't always raise an exception outright — sometimes you get a response object back, but trying to access response.text then crashes your app with a ValueError that doesn't obviously explain what happened.
response = model.generate_content(prompt)
# This raises ValueError if SAFETY triggered
print(response.text)
# → ValueError: response.text quick accessor only works when the response contains a valid Part...

# Always check finish_reason before accessing text
if response.candidates and response.candidates[0].finish_reason.name == "STOP":
    print(response.text)
elif response.candidates and response.candidates[0].finish_reason.name == "SAFETY":
    print("Blocked by safety filters")
    # Inspect which category triggered it
    for rating in response.candidates[0].safety_ratings:
        if rating.probability.name not in ("NEGLIGIBLE", "LOW"):
            print(f"  Category: {rating.category.name}, Probability: {rating.probability.name}")
else:
    # An empty candidates list can mean the prompt itself was blocked
    finish = response.candidates[0].finish_reason.name if response.candidates else "NO_CANDIDATES"
    print(f"Generation stopped: {finish}")
During development, you can adjust safety thresholds via the safety_settings parameter if you're working in a context where certain content categories are legitimately needed. In production, ensure any threshold adjustments align with your application's use case and Google's usage policies.
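A threshold adjustment might be assembled like this. The category and threshold strings are the enum names as I recall them, and passing a list of dicts to safety_settings is my assumption about the accepted format, so check both against the current documentation. build_safety_settings is a hypothetical helper.

```python
HARM_CATEGORIES = (
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
)
THRESHOLDS = (
    "BLOCK_NONE",
    "BLOCK_ONLY_HIGH",
    "BLOCK_MEDIUM_AND_ABOVE",
    "BLOCK_LOW_AND_ABOVE",
)


def build_safety_settings(default="BLOCK_MEDIUM_AND_ABOVE", overrides=None):
    """Return one {"category", "threshold"} dict per category, applying per-category overrides."""
    overrides = overrides or {}
    unknown = set(overrides) - set(HARM_CATEGORIES)
    if unknown:
        raise ValueError(f"Unknown categories: {sorted(unknown)}")
    settings = []
    for category in HARM_CATEGORIES:
        threshold = overrides.get(category, default)
        if threshold not in THRESHOLDS:
            raise ValueError(f"Unknown threshold: {threshold}")
        settings.append({"category": category, "threshold": threshold})
    return settings


settings = build_safety_settings(
    overrides={"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH"}
)
# Assumed usage: model.generate_content(prompt, safety_settings=settings)
```

Validating the names locally means a typo fails immediately with a readable message instead of surfacing later as an API error.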
DEADLINE_EXCEEDED — Request Timed Out
DEADLINE_EXCEEDED fires when the API doesn't respond within the timeout window. This shows up most often with long text generation or complex multi-step reasoning tasks on gemini-2.5-pro. The default timeout in the google-generativeai client (the package imported as google.generativeai throughout this guide) is around 60 seconds, which is fine for most queries but not enough for generating long documents or running deep analysis.
You can extend the timeout via request_options:
import google.generativeai as genai

genai.configure(api_key="YOUR_GEMINI_API_KEY")
model = genai.GenerativeModel("gemini-2.5-pro")

# For long generation or reasoning tasks, 120-300 seconds is a reasonable range
response = model.generate_content(
    "Refactor the following 500-line codebase and explain every change...",
    request_options={"timeout": 180}  # 3 minutes
)
print(response.text)
A more robust approach for long outputs is streaming. With streaming, you receive tokens as they're generated rather than waiting for the full response. This means you can display partial results immediately, and the connection stays alive as long as tokens are arriving.
# Streaming for long outputs: makes DEADLINE_EXCEEDED less likely
for chunk in model.generate_content(
    "Write a detailed technical article about...",
    stream=True
):
    if chunk.text:
        print(chunk.text, end="", flush=True)
print()  # Final newline after streaming completes
Streaming is the approach I reach for whenever I'm generating output longer than a few paragraphs. It's more resilient and gives a much better user experience in production apps.
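One thing streaming makes possible is keeping whatever text arrived before a failure. The sketch below assumes only that each chunk exposes a text attribute; collect_stream and fake_stream are hypothetical names, and the fake stream stands in for model.generate_content(..., stream=True).

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Accumulate streamed text and return (text, error) so partial output survives a failure."""
    parts = []
    try:
        for chunk in chunks:
            text = getattr(chunk, "text", "") or ""
            if text:
                parts.append(text)
                if on_text:
                    on_text(text)  # e.g. print the piece as soon as it arrives
    except Exception as exc:  # for example, a timeout raised mid-stream
        return "".join(parts), exc
    return "".join(parts), None


def fake_stream():
    # Two chunks arrive, then the connection fails.
    yield SimpleNamespace(text="Hello, ")
    yield SimpleNamespace(text="world")
    raise TimeoutError("simulated deadline")


text, error = collect_stream(fake_stream())
```

Returning the error alongside the text lets the caller decide whether to show the partial result, retry, or both.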
PERMISSION_DENIED — Authentication Issues
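PERMISSION_DENIED means your API key is invalid, expired, or doesn't have the right permissions for what you're trying to do. An outright invalid key can also surface as INVALID_ARGUMENT with an "API key not valid" message, so the same checklist applies there. Work through it when you hit either one: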
- Verify your API key is actually loaded (echo $GEMINI_API_KEY in terminal; if it prints nothing, it's not set)
- Confirm you're using a Google AI Studio key, not a Vertex AI key. They are not interchangeable, and the error message won't always tell you which one you have
- Check that the Gemini API is enabled in your Google AI Studio project settings
- If you recently rotated your key, make sure all running processes picked up the new value
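The first item on the checklist can also be checked from inside Python without printing the secret itself. This is a small sketch; describe_api_key is a hypothetical helper, and showing the last four characters is just one convention for telling keys apart.

```python
import os


def describe_api_key(env=None, name="GEMINI_API_KEY"):
    """Report whether the key is set, showing only its length and last four characters."""
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value.strip():
        return f"{name} is not set"
    if value != value.strip():
        return f"{name} is set but has leading or trailing whitespace"
    return f"{name} is set ({len(value)} chars, ends with ...{value[-4:]})"


# Passing a dict instead of os.environ keeps the example independent of your shell.
status = describe_api_key({"GEMINI_API_KEY": "abcd1234efgh"})
```

Logging this line at startup in each running process is a quick way to confirm that a rotated key was actually picked up.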
Avoid hardcoding your API key directly in your Antigravity project files or in a .env file tracked by git. Reading from an environment variable is the safe pattern because it keeps credentials out of your version history.
import os
import google.generativeai as genai

# Safe pattern: read from environment variable
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise ValueError(
        "GEMINI_API_KEY environment variable is not set. "
        "Export it with: export GEMINI_API_KEY='your-key-here'"
    )
genai.configure(api_key=api_key)
model = genai.GenerativeModel("gemini-2.5-flash")
response = model.generate_content("Hello")
print(response.text)
When You're Stuck: Log the Full Exception
If the error message alone doesn't point you to a clear cause, log the complete exception object. The grpc_status_code attribute gives you a more precise category than the HTTP status code, and the full traceback often contains additional context buried in the exception chain.
import traceback
from google.api_core import exceptions as google_exceptions

try:
    response = model.generate_content(prompt)
    print(response.text)
except google_exceptions.GoogleAPICallError as e:
    # GoogleAPICallError (not the GoogleAPIError base class) carries grpc_status_code and message
    print(f"Exception type: {type(e).__name__}")
    print(f"gRPC status code: {e.grpc_status_code}")
    print(f"Message: {e.message}")
    print(f"Full traceback:\n{traceback.format_exc()}")
except Exception as e:
    # Catch non-API errors (e.g., ValueError from SAFETY finish_reason)
    print(f"Non-API error: {type(e).__name__}: {e}")
    print(traceback.format_exc())
Once you can consistently identify which error type you're dealing with, most Gemini API Python issues become straightforward to resolve. The pattern is always the same: identify the exception class, match it to the cause, and apply the appropriate fix. Most issues fall into one of the five categories covered here.
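The identify, match, fix pattern can be captured in a lookup table keyed by exception class name. This is a sketch that summarizes the sections of this guide; triage is a hypothetical helper, and it matches on names so that it runs without the SDK installed.

```python
# Exception class name -> (likely cause, first thing to try), summarizing this guide.
TRIAGE = {
    "ResourceExhausted": ("quota or rate limit hit", "retry with exponential backoff"),
    "InvalidArgument": ("malformed request", "check input types, input length, and schema"),
    "DeadlineExceeded": ("request timed out", "raise the timeout or stream the response"),
    "PermissionDenied": ("API key or permission problem", "verify the key and project settings"),
    "ValueError": ("possibly a blocked response", "check finish_reason before reading text"),
}


def triage(exc):
    """Return a one-line hint for an exception, matching on class names along its MRO."""
    for cls in type(exc).__mro__:
        if cls.__name__ in TRIAGE:
            cause, fix = TRIAGE[cls.__name__]
            return f"{cls.__name__}: {cause}; {fix}"
    return f"{type(exc).__name__}: not covered here; log the full exception"


# Stand-in class with the same name as the google.api_core exception.
class ResourceExhausted(Exception):
    pass

hint = triage(ResourceExhausted("429"))
```

Walking the MRO means a subclass of a listed exception still gets a useful hint, while anything unrecognized falls through to the advice to log the full exception.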
For broader API authentication, rate limiting, and timeout scenarios, "AI API Rate Limit, Auth, and Timeout Error Fixes" covers complementary patterns worth knowing. If you're moving toward production-grade Python applications with Gemini, "Google Antigravity Python SDK Production Master Guide" goes deep on architecture, retry strategies, and error handling at scale.