Core ML × Antigravity: A Guide to On-Device AI Development
A comprehensive guide to building on-device AI with Core ML and Antigravity. Covers model conversion, Neural Engine optimization, LiteRT comparison, and edge computing implementation.
Setup and context: Why On-Device AI Matters in 2026
On-device AI represents a fundamental shift in how we build intelligent applications. Instead of relying on cloud servers, computation happens directly on the user's device—unlocking four transformative benefits: privacy, latency, offline capability, and cost efficiency.
By 2026, two dominant frameworks have matured: Core ML (Apple's native framework) and LiteRT (Google's cross-platform framework, formerly TensorFlow Lite). Combined with Antigravity's agent-driven development approach, building production-grade on-device AI applications is now faster and more accessible than ever. Antigravity's AI agents can guide you from model selection through implementation, reducing development time by 3x or more.
This article provides a complete technical blueprint for building on-device AI applications using Core ML and Antigravity, with real-world performance benchmarks and implementation patterns.
Target Audience: Intermediate to advanced iOS developers with foundational ML knowledge.
Core ML Fundamentals
What Is Core ML?
Core ML is Apple's unified machine learning framework, introduced in 2017 and optimized for on-device inference on iPhone, iPad, Mac, Apple Watch, and Vision Pro. It can run models on the CPU, the GPU, or Apple's Neural Engine, a specialized ML accelerator built into A-series chips since the A11 Bionic and into every M-series chip (M1/M2/M3 and later).
Why Core ML?
Hardware Acceleration: The Neural Engine is purpose-built for ML inference; Apple rates it in trillions of operations per second (TOPS), for example 15.8 TOPS on M2 Max
Privacy by Default: All computation stays on-device; no cloud transmission
Energy Efficiency: Neural Engine consumes 1/3–1/5 the power of GPU compute
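Framework Agnostic: Convert models from PyTorch and TensorFlow with coremltools, which also includes converters for scikit-learn and XGBoost (direct ONNX conversion is no longer supported in recent coremltools releases)
Xcode Integration: Test predictions directly in Xcode's model preview without deploying to a device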
Neural Engine Performance Characteristics
Apple's published Neural Engine throughput:
M2 Max: 15.8 trillion operations per second (TOPS)
A17 Pro (iPhone 15 Pro): 35 TOPS
Typical inference characteristics:
Inference Latency (ResNet-50): 10–15ms on iPhone 15 Pro
Power Efficiency: 0.5–2 Watts during inference (vs. 5–15W for GPU)
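The power and latency figures combine into energy per inference, which is what ultimately drains the battery. Because 1 W is 1 J/s, watts multiplied by milliseconds gives millijoules. The sketch below (the function name is illustrative) applies that to the ranges quoted above, pairing the low ends and the high ends to get rough bounds:

```swift
import Foundation

/// Energy consumed by one inference, in millijoules.
/// 1 W = 1 J/s, so 1 W sustained for 1 ms is 1 mJ.
func energyPerInferenceMillijoules(watts: Double, latencyMs: Double) -> Double {
    return watts * latencyMs
}

// Rough bounds from the figures above: 0.5–2 W at 10–15 ms on the Neural Engine.
let bestCase = energyPerInferenceMillijoules(watts: 0.5, latencyMs: 10)   // 5 mJ
let worstCase = energyPerInferenceMillijoules(watts: 2.0, latencyMs: 15)  // 30 mJ
print("Neural Engine: \(bestCase)–\(worstCase) mJ per ResNet-50 inference")
```

At the upper bound, 1,000 inferences consume about 30 J. The same arithmetic lets you compare backends once you have measured both power and latency on the same device.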
What Is LiteRT?
LiteRT (rebranded from TensorFlow Lite in 2024) is Google's lightweight ML framework, designed to run on diverse edge devices: Android, iOS, Raspberry Pi, microcontrollers, and even browsers.
Delegates: The Core ML delegate runs supported operations on Apple's Neural Engine; the GPU delegate runs them on the device's GPU (Metal on iOS, OpenGL ES or OpenCL on Android)
TFLite Micro: Ultra-lightweight version for microcontrollers (< 100KB RAM)
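In practice, delegates are tried in order of preference, with the CPU as the fallback when an accelerator is unavailable (for example, the Core ML delegate's initializer returns nil on an unsupported device). The sketch below models that selection in plain Swift; the `Backend` enum and `selectBackend` function are hypothetical and not part of the LiteRT API:

```swift
import Foundation

/// Hypothetical execution backends, mirroring LiteRT's delegate options.
enum Backend: String {
    case coreMLNeuralEngine
    case gpu
    case cpu
}

/// Returns the first backend in `preference` that the device supports.
/// The CPU always works, so it doubles as the final fallback.
func selectBackend(preference: [Backend], isAvailable: (Backend) -> Bool) -> Backend {
    for backend in preference where backend == .cpu || isAvailable(backend) {
        return backend
    }
    return .cpu
}

// A device without a Neural Engine falls through to the GPU.
let chosen = selectBackend(preference: [.coreMLNeuralEngine, .gpu, .cpu],
                           isAvailable: { $0 != .coreMLNeuralEngine })
print(chosen)  // gpu
```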
Core ML vs. LiteRT Performance Comparison
Measured on iPhone 14 Pro (inference time in milliseconds):
Image Classification (ResNet-50):
Core ML (Neural Engine): 12ms
LiteRT (Core ML delegate): 18ms
LiteRT (GPU delegate): 45ms
Natural Language Processing (BERT-base):
Core ML (Neural Engine): 80ms
LiteRT (CPU + Core ML delegate): 130ms
LiteRT (CPU only): 350ms
Batch Inference (100 images):
Core ML (Neural Engine batching): 400ms
LiteRT (CPU thread pool): 580ms
Verdict: Core ML > LiteRT (Core ML delegate) > LiteRT (GPU delegate) > LiteRT (CPU)
For iOS-only applications, Core ML's Neural Engine is the superior choice. For cross-platform deployments, LiteRT with Core ML delegate offers compelling performance.
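The relative gaps are easier to compare as ratios. A short sketch that derives them from the iPhone 14 Pro measurements above (the `speedup` helper is illustrative):

```swift
import Foundation

/// How many times faster `candidateMs` is than `baselineMs`.
func speedup(baselineMs: Double, candidateMs: Double) -> Double {
    return baselineMs / candidateMs
}

// iPhone 14 Pro figures from the comparison above.
let resnetVsDelegate = speedup(baselineMs: 18, candidateMs: 12)   // 1.5x (ResNet-50)
let bertVsDelegate = speedup(baselineMs: 130, candidateMs: 80)    // 1.625x (BERT-base)
let bertVsCPU = speedup(baselineMs: 350, candidateMs: 80)         // 4.375x (BERT-base)

// Per-image cost in the 100-image batch run.
let coreMLPerImageMs = 400.0 / 100   // 4.0 ms
let liteRTPerImageMs = 580.0 / 100   // 5.8 ms
print(resnetVsDelegate, bertVsDelegate, bertVsCPU, coreMLPerImageMs, liteRTPerImageMs)
```

The ratios show the Core ML advantage growing with model size in these measurements: 1.5x on ResNet-50 versus about 1.6x on BERT-base against the delegate path.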
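Step 1: Convert the Model to Core ML
Setting compute_units=ct.ComputeUnit.CPU_AND_NE (with coremltools imported as ct) lets Core ML run supported layers on the Neural Engine and fall back to the CPU for the rest
Quantization (FP32→INT8) can be applied post-conversion for further optimization, for example with coremltools.optimize.coreml.linear_quantize_weights
Pass minimum_deployment_target=ct.target.iOS16 (an enum value, not the string 'iOS16') to coremltools.convert(); a higher target unlocks newer Core ML operations and features, but the model will not load on older OS versions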
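To see what FP32→INT8 quantization does to the numbers, here is a minimal sketch of symmetric, per-tensor linear quantization: weights are scaled so the largest magnitude maps to 127, then rounded. It illustrates the general technique only; it is not coremltools' implementation, which can also use per-channel scales and other modes:

```swift
import Foundation

/// Symmetric per-tensor linear quantization of FP32 weights to INT8:
/// the largest |weight| maps to 127 and everything else scales proportionally.
func quantizeSymmetric(_ weights: [Float]) -> (values: [Int8], scale: Float) {
    let maxAbs = weights.map { abs($0) }.max() ?? 0
    guard maxAbs > 0 else {
        return ([Int8](repeating: 0, count: weights.count), 1)
    }
    let values = weights.map { w -> Int8 in
        // Map [-maxAbs, maxAbs] onto [-127, 127], round to nearest, then clamp.
        let q = (w / maxAbs * 127).rounded()
        return Int8(max(-127, min(127, q)))
    }
    return (values, maxAbs / 127)  // scale: the FP32 size of one INT8 step
}

/// Recovers approximate FP32 weights: value ≈ q × scale.
func dequantize(_ values: [Int8], scale: Float) -> [Float] {
    return values.map { Float($0) * scale }
}

let weights: [Float] = [1.0, -0.5, 0.2]
let (quantized, scale) = quantizeSymmetric(weights)
print(quantized)                            // [127, -64, 25]
print(dequantize(quantized, scale: scale))  // close to the original weights
```

Each weight drops from 4 bytes to 1, so weight storage shrinks about 4x, at the cost of a rounding error of at most half a quantization step per weight.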
Step 2: Integrate into Xcode Project
Create a Models group in Xcode's project navigator
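Drag the converted model into the group (ResNet18.mlpackage for the ML Program format, or ResNet18.mlmodel for the older neural network format) and ensure "Copy items if needed" is checked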
Xcode automatically generates a Swift class ResNet18 with type-safe APIs
Step 3: Swift Inference Implementation
```swift
import SwiftUI
import UIKit
import CoreML

struct ContentView: View {
    @State private var selectedImage: UIImage?
    @State private var predictionResult: String = "Select an image"
    @State private var isLoading = false
    @State private var inferenceTime: Double = 0

    var body: some View {
        VStack(spacing: 20) {
            if let image = selectedImage {
                Image(uiImage: image)
                    .resizable()
                    .scaledToFit()
                    .frame(height: 300)
            } else {
                Rectangle()
                    .fill(Color.gray.opacity(0.3))
                    .frame(height: 300)
                    .overlay(Text("Select Image"))
            }

            VStack(alignment: .leading, spacing: 8) {
                Text(predictionResult)
                    .font(.headline)
                if inferenceTime > 0 {
                    Text("Inference: \(String(format: "%.1f", inferenceTime))ms")
                        .font(.caption)
                        .foregroundColor(.gray)
                }
            }
            .padding()
            .background(Color.blue.opacity(0.1))
            .cornerRadius(8)

            if isLoading {
                ProgressView()
            }

            HStack(spacing: 10) {
                Button("Camera") {
                    // Open camera (picker wiring omitted)
                }
                .buttonStyle(.bordered)
                Button("Photo Library") {
                    // Open photo picker (picker wiring omitted)
                }
                .buttonStyle(.bordered)
            }
            Spacer()
        }
        .padding()
    }
}

// Core ML inference class
final class ImageClassifier {
    // ResNet18 is the class Xcode generates from the model file.
    // The deprecated no-argument init() is avoided in favor of init(configuration:).
    private let model: ResNet18? = {
        let config = MLModelConfiguration()
        config.computeUnits = .all  // let Core ML use the Neural Engine when available
        return try? ResNet18(configuration: config)
    }()

    func classify(_ image: UIImage) -> ClassificationResult? {
        // 1. Preprocess: resize to 224×224 and convert to a CVPixelBuffer.
        //    Normalization is assumed to be baked into the model at conversion time.
        guard let model = model,
              let pixelBuffer = image.resized(to: CGSize(width: 224, height: 224))?
                  .toCVPixelBuffer() else { return nil }

        // 2. Measure inference time
        let startTime = CFAbsoluteTimeGetCurrent()

        // 3. Run Core ML inference
        do {
            // The input/output names ("image", "predictions") depend on how the model was converted.
            let output = try model.prediction(image: pixelBuffer)
            let inferenceTime = (CFAbsoluteTimeGetCurrent() - startTime) * 1000
            let probabilities = output.predictions  // [String: Double], class label → probability

            // 4. Extract top prediction
            if let topPrediction = probabilities.max(by: { $0.value < $1.value }) {
                return ClassificationResult(
                    className: topPrediction.key,
                    confidence: Float(topPrediction.value),
                    inferenceTime: inferenceTime
                )
            }
        } catch {
            print("Core ML Inference Error: \(error.localizedDescription)")
        }
        return nil
    }
}

struct ClassificationResult {
    let className: String
    let confidence: Float
    let inferenceTime: Double
}

// Helper: UIImage → CVPixelBuffer conversion
extension UIImage {
    func resized(to size: CGSize) -> UIImage? {
        let format = UIGraphicsImageRendererFormat()
        format.scale = 1  // 1 point = 1 pixel, so the result really is 224×224 pixels
        let renderer = UIGraphicsImageRenderer(size: size, format: format)
        return renderer.image { _ in
            self.draw(in: CGRect(origin: .zero, size: size))
        }
    }

    func toCVPixelBuffer() -> CVPixelBuffer? {
        let width = Int(size.width)
        let height = Int(size.height)
        let attrs = [
            kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
            kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue
        ] as CFDictionary
        var buffer: CVPixelBuffer?
        let status = CVPixelBufferCreate(
            kCFAllocatorDefault, width, height,
            kCVPixelFormatType_32ARGB, attrs, &buffer
        )
        guard status == kCVReturnSuccess, let pixelBuffer = buffer,
              let cgImage = cgImage else { return nil }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

        guard let context = CGContext(
            data: CVPixelBufferGetBaseAddress(pixelBuffer),
            width: width,
            height: height,
            bitsPerComponent: 8,
            bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
        ) else { return nil }

        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
        return pixelBuffer
    }
}
```
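The classifier above keeps only the top prediction. To display several predictions with confidence percentages, a small helper can sort the probability dictionary. This sketch assumes the `[String: Double]` label-to-probability output that Xcode generates for classifier models; `topK` is an illustrative name and the example probabilities are made up:

```swift
import Foundation

/// Returns the `k` most likely labels from a label → probability dictionary,
/// with each probability formatted as a percentage for display.
func topK(_ probabilities: [String: Double], k: Int) -> [(label: String, percent: String)] {
    return probabilities
        .sorted { $0.value > $1.value }
        .prefix(k)
        .map { (label: $0.key, percent: String(format: "%.1f%%", $0.value * 100)) }
}

let exampleProbabilities = ["tabby cat": 0.72, "tiger cat": 0.18, "lynx": 0.06, "dog": 0.04]
for entry in topK(exampleProbabilities, k: 3) {
    print("\(entry.label): \(entry.percent)")
}
// tabby cat: 72.0%
// tiger cat: 18.0%
// lynx: 6.0%
```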
Step 4: LiteRT with Core ML Delegate
For cross-platform projects using LiteRT:
```swift
import Foundation
import TensorFlowLite

final class ImageClassifierLiteRT {
    private let interpreter: Interpreter

    init?(modelPath: String) {
        // Configure the Core ML delegate. With the default `.neuralEngine`, the
        // initializer returns nil on devices without a Neural Engine; `.all` also
        // accepts devices that lack one.
        var options = CoreMLDelegate.Options()
        options.enabledDevices = .all
        guard let coreMLDelegate = CoreMLDelegate(options: options) else { return nil }

        // Initialize the interpreter with the delegate and allocate its tensors.
        do {
            interpreter = try Interpreter(modelPath: modelPath, delegates: [coreMLDelegate])
            try interpreter.allocateTensors()
        } catch {
            print("LiteRT Init Error: \(error)")
            return nil
        }
    }

    /// `inputData` must hold one 224×224×3 Float32 RGB image, already resized and
    /// normalized the way the model expects (preprocessing is left to the caller).
    func classify(_ inputData: Data) -> ClassificationResult? {
        do {
            let start = CFAbsoluteTimeGetCurrent()
            try interpreter.copy(inputData, toInputAt: 0)  // feed the image into the input tensor
            try interpreter.invoke()
            let inferenceTime = (CFAbsoluteTimeGetCurrent() - start) * 1000

            // Assumes a Float32 output tensor of class scores.
            let output = try interpreter.output(at: 0)
            let probabilities: [Float] = output.data.withUnsafeBytes { buffer in
                Array(buffer.bindMemory(to: Float.self))
            }
            if let maxIndex = probabilities.indices.max(by: { probabilities[$0] < probabilities[$1] }) {
                return ClassificationResult(
                    className: "Class \(maxIndex)",  // map indices to a label file in a real app
                    confidence: probabilities[maxIndex],
                    inferenceTime: inferenceTime
                )
            }
        } catch {
            print("LiteRT Inference Error: \(error)")
        }
        return nil
    }
}
```
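The LiteRT example treats the output tensor as probabilities. Many exported classifiers output raw logits instead; if yours does, apply a softmax before reporting confidence. A minimal, framework-free sketch (`softmax` and `argmax` are written here for illustration):

```swift
import Foundation

/// Converts raw logits to probabilities. Subtracting the largest logit first
/// keeps `exp` from overflowing without changing the result.
func softmax(_ logits: [Float]) -> [Float] {
    guard let maxLogit = logits.max() else { return [] }
    let exps = logits.map { exp($0 - maxLogit) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

/// Index of the largest element, or nil for an empty array.
func argmax(_ values: [Float]) -> Int? {
    return values.indices.max { values[$0] < values[$1] }
}

let logits: [Float] = [2.0, 1.0, 0.1]
let classProbabilities = softmax(logits)
if let best = argmax(classProbabilities) {
    print("Class \(best): \(classProbabilities[best])")  // Class 0: about 0.659
}
```

Softmax preserves the ordering of the logits, so the predicted class does not change; only the reported confidence becomes a true probability.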
Accelerating Development with Antigravity
Leveraging Antigravity's Agent Capabilities
Antigravity excels at automating repetitive ML development tasks. Here are practical examples:
Prompt 1: Model Conversion Pipeline
Create a Python script to convert a PyTorch ResNet-50 model to Core ML format.
Requirements:
- Image preprocessing (resize to 224×224, ImageNet normalization)
- Neural Engine optimization enabled
- Add metadata (author, description, license)
- Display final model file size
- Include error handling for missing dependencies
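For reference, ImageNet normalization scales each 8-bit channel to [0, 1] and then standardizes it with the per-channel ImageNet mean and standard deviation used by torchvision's pretrained models. A per-pixel sketch in plain Swift (the function name is illustrative); in a conversion script this step is usually folded into the model so the app can pass raw pixels:

```swift
import Foundation

// Per-channel ImageNet statistics (RGB order) used by torchvision's pretrained models.
let imageNetMean: [Float] = [0.485, 0.456, 0.406]
let imageNetStd: [Float] = [0.229, 0.224, 0.225]

/// Normalizes one 8-bit RGB pixel: scale each channel to [0, 1],
/// then compute (x - mean) / std for that channel.
func normalizeImageNet(r: UInt8, g: UInt8, b: UInt8) -> [Float] {
    let unit = [r, g, b].map { Float($0) / 255 }
    return (0..<3).map { (unit[$0] - imageNetMean[$0]) / imageNetStd[$0] }
}

// A white pixel: about [2.249, 2.429, 2.64].
print(normalizeImageNet(r: 255, g: 255, b: 255))
```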
Prompt 2: SwiftUI Inference App Scaffold
Generate a complete SwiftUI app that uses a Core ML image classifier.
Features:
- Camera and photo library image selection
- Real-time inference on image selection
- Display predictions with confidence percentages
- Show inference latency
- Include error handling for permission failures
Prompt 3: Performance Optimization Report
My Core ML model runs in 150ms on iPhone 14 Pro, but I need it under 50ms.
Provide optimization strategies with code examples:
1. Quantization techniques (FP32→INT8 conversion)
2. Batch inference implementation
3. Model pruning (removing low-impact layers)
4. GPU vs. Neural Engine delegate selection
5. Benchmarking methodology
Antigravity handles:
Generating boilerplate conversion scripts
Creating type-safe SwiftUI components
Suggesting architectural improvements
Writing comprehensive error handling
Building performance monitoring code
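Prompt 3 asks for a benchmarking methodology. Whatever code the agent produces, it should discard warm-up runs (the first predictions include one-off setup costs) and report percentiles rather than a single timing. A minimal sketch of such a harness, with illustrative names:

```swift
import Foundation
import Dispatch

/// Summary statistics for latency samples, in milliseconds.
struct LatencyStats {
    let median: Double
    let p90: Double
}

/// Nearest-rank percentile (p in 0...100) of an unsorted sample set.
func percentile(_ samples: [Double], _ p: Double) -> Double {
    precondition(!samples.isEmpty, "need at least one sample")
    let sorted = samples.sorted()
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

/// Times `body` after discarding warm-up runs, which absorb one-off costs
/// such as model loading and first-call allocations.
func benchmark(warmUp: Int = 5, iterations: Int = 50, _ body: () -> Void) -> LatencyStats {
    for _ in 0..<warmUp { body() }
    var samples: [Double] = []
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        samples.append(Double(end - start) / 1_000_000)
    }
    return LatencyStats(median: percentile(samples, 50), p90: percentile(samples, 90))
}

let stats = benchmark {
    // Replace with a real call, e.g. classifier.classify(image)
    _ = (0..<10_000).reduce(0, +)
}
print(String(format: "median %.3f ms, p90 %.3f ms", stats.median, stats.p90))
```

Reporting p90 alongside the median exposes occasional slow runs (for example, when work falls back from the Neural Engine to the CPU) that an average would hide.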
Hybrid Architecture: On-Device + Cloud
In production, a pure on-device strategy isn't always sufficient. A hybrid architecture combines the benefits of both:
Typical Flow:
Lightweight Model (device): Fast classification, ~98% accuracy
Confidence Threshold: If confidence > 0.9, return result immediately
Cloud Fallback: Otherwise (confidence ≤ 0.9), send the image to a more powerful cloud model for detailed analysis
Advantages:
Assuming about 80% of requests clear the threshold, those requests complete on-device in about 50ms
Only the remaining 20% incur cloud latency and cost
Graceful offline degradation
Easy A/B testing of model versions
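The trade-off can be estimated with a little arithmetic. In this flow, escalated requests run the on-device model first and then call the cloud, so the average latency is the device latency plus the escalated share times the cloud latency. The sketch below assumes a hypothetical 400 ms cloud round trip, which is not a figure from this article:

```swift
import Foundation

/// Average latency per request for the hybrid flow: every request pays the
/// on-device inference, and the escalated share additionally pays a cloud call.
func expectedLatencyMs(onDeviceShare: Double, deviceMs: Double, cloudMs: Double) -> Double {
    return deviceMs + (1 - onDeviceShare) * cloudMs
}

// 80% answered on-device at 50 ms; the 400 ms cloud latency is a hypothetical value.
let hybridMs = expectedLatencyMs(onDeviceShare: 0.8, deviceMs: 50, cloudMs: 400)
print(String(format: "%.0f ms average", hybridMs))  // 130 ms average
```

Raising the on-device share (a better local model or a lower threshold) reduces both terms that depend on the cloud, which is why the threshold is worth tuning against real traffic.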
Implementation Example
```swift
import UIKit

final class HybridAnalyzer {
    // Matches the 0.9 confidence threshold described in the flow above.
    private let confidenceThreshold: Float = 0.9
    private let onDeviceClassifier = ImageClassifier()
    // CloudAnalysisAPI stands in for your own backend client and is assumed to
    // expose `func analyze(_ image: UIImage) async throws -> AnalysisResult`.
    private let cloudAPI = CloudAnalysisAPI()

    func analyze(_ image: UIImage) async throws -> AnalysisResult {
        // Step 1: Fast on-device inference; fall back to the cloud on error.
        guard let deviceResult = onDeviceClassifier.classify(image) else {
            return try await cloudAPI.analyze(image)
        }

        // Step 2: Decision logic based on confidence.
        if deviceResult.confidence > confidenceThreshold {
            // High confidence: return the device result.
            return AnalysisResult(
                prediction: deviceResult.className,
                confidence: deviceResult.confidence,
                source: .device,
                latency: deviceResult.inferenceTime
            )
        }
        // Low confidence: escalate to the cloud.
        return try await cloudAPI.analyze(image)
    }
}

struct AnalysisResult {
    let prediction: String
    let confidence: Float
    let source: Source
    let latency: Double

    enum Source {
        case device
        case cloud
    }
}
```
Real-World Use Cases
Healthcare: Tumor Detection in Medical Imaging
Requirements:
CT/MRI images stay on the device, which supports HIPAA compliance
Segmentation model identifies suspicious regions
Sub-50ms inference for real-time visualization
Implementation:
Train a lightweight U-Net segmentation model
Convert to Core ML with INT8 quantization
Deploy as SwiftUI overlay on medical imaging app
Positive findings trigger secure upload to cloud for radiologist review
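The decision in the last step, whether a scan counts as a positive finding, can be sketched as post-processing on the segmentation mask: count pixels above a probability threshold, flag the scan when the suspicious area is large enough, and compute a bounding box for the overlay. Both thresholds below are hypothetical placeholders, not clinically validated values, and `FindingDetector` is an illustrative name:

```swift
import Foundation

/// Post-processing for a segmentation model's per-pixel probability mask
/// (row-major, values in 0...1). Thresholds are hypothetical placeholders.
struct FindingDetector {
    var pixelThreshold: Float = 0.5       // a pixel is "suspicious" above this probability
    var minAreaFraction: Double = 0.001   // ignore scans whose flagged area is under 0.1%

    /// Fraction of pixels whose probability exceeds `pixelThreshold`.
    func suspiciousFraction(mask: [Float]) -> Double {
        guard !mask.isEmpty else { return 0 }
        let flagged = mask.filter { $0 > pixelThreshold }.count
        return Double(flagged) / Double(mask.count)
    }

    /// True when the flagged area is large enough to send for radiologist review.
    func isPositive(mask: [Float]) -> Bool {
        return suspiciousFraction(mask: mask) >= minAreaFraction
    }

    /// Bounding box of flagged pixels for drawing the overlay, or nil if none.
    func boundingBox(mask: [Float], width: Int) -> (minX: Int, minY: Int, maxX: Int, maxY: Int)? {
        var box: (minX: Int, minY: Int, maxX: Int, maxY: Int)?
        for (index, probability) in mask.enumerated() where probability > pixelThreshold {
            let x = index % width
            let y = index / width
            if let current = box {
                box = (minX: min(current.minX, x), minY: min(current.minY, y),
                       maxX: max(current.maxX, x), maxY: max(current.maxY, y))
            } else {
                box = (minX: x, minY: y, maxX: x, maxY: y)
            }
        }
        return box
    }
}

// Example: a 4×4 mask with a 2×2 suspicious block in the middle.
let detector = FindingDetector()
let mask: [Float] = [0, 0,    0,    0,
                     0, 0.9,  0.8,  0,
                     0, 0.7,  0.95, 0,
                     0, 0,    0,    0]
print(detector.isPositive(mask: mask))              // true
print(detector.boundingBox(mask: mask, width: 4)!)  // (minX: 1, minY: 1, maxX: 2, maxY: 2)
```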
Conclusion
The combination of Core ML, Neural Engine optimization, and Antigravity's agent-driven development creates a powerful ecosystem for on-device AI:
Speed: Convert models, write inference code, and optimize all within hours
Privacy: Ensure sensitive data never leaves the device
Latency: Achieve sub-50ms inference for responsive UX
Cost: Eliminate expensive cloud API calls for routine inference
Whether building healthcare diagnostics, real-time translation, or IoT analytics, on-device AI is no longer experimental—it's essential for modern applications. Use Antigravity to accelerate your journey from prototype to production.
Antigravity Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.