AWS and Vexcel Optimize Multimodal AI for Searchable Aerial Imagery

A new AWS-Vexcel collaboration tests which mix of embedding models, fusion strategies, and captioning works best for searching aerial imagery in natural language. The findings challenge one-size-fits-all approaches and show that performance depends on the feature being searched.

By Inside AI June 22, 2026

June 23, 2026 (Inside AI) - Aerial imagery holds billions of pixels across 45+ countries, yet turning them into a natural-language-searchable knowledge base has long required manual inspection or custom computer vision models. A new collaboration between AWS and Vexcel, a major geospatial data provider, shows that combining multimodal embeddings, large language model captioning, and vector search on Amazon Bedrock and Amazon OpenSearch Serverless lets teams index imagery once and then query it in plain English.

The work, now evolved into the Vexcel Intelligence product, tackled a core question: what is the optimal combination of embedding model, fusion strategy, captioning, and search method for multi-view aerial imagery? The answer emerged from roughly 100 configurations tested on two benchmark queries in Chicago's Grant Park—"swimming pools" and "roads."

Amazon Nova Multimodal Embeddings delivered the highest average F1 scores: 0.621 for pools and 0.555 for roads. Caption integration proved the single most impactful optimization, boosting best-configuration F1 by 11% for pools and 13% for roads. Yet no single fusion or search method dominated; performance varied sharply by feature type, underscoring the need for modular, evaluable architectures.

Why Geospatial Search Demands a New Playbook

Unlike consumer photo search, each aerial tile comprises seven complementary views: an orthophoto, four oblique angles, a digital surface model, and a digital terrain model. A building's façade might only appear from the south oblique; tree canopy in the DSM can obscure ground features. An embedding model must fuse these perspectives, but how?

Ground truth is another hurdle. Without large labeled datasets, the team used OpenStreetMap to automate evaluation. They also had to define "correct"—tile-level matches (at least one feature present) versus entity-level matches (every pool counted)—which reward different system behaviors.
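
The gap between the two definitions of "correct" is easier to see with numbers. The sketch below is illustrative only: the function names, tile IDs, and pool counts are assumptions made for this example, not the team's evaluation code, and ground truth is assumed to be a per-tile feature count derived from OpenStreetMap.

```python
from typing import Dict, Set


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def tile_level_scores(retrieved: Set[str], truth_counts: Dict[str, int]) -> Dict[str, float]:
    """Tile-level: a retrieved tile is correct if it holds at least one feature."""
    relevant = {tile for tile, n in truth_counts.items() if n > 0}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return {"precision": precision, "recall": recall, "f1": f1(precision, recall)}


def entity_level_recall(retrieved: Set[str], truth_counts: Dict[str, int]) -> float:
    """Entity-level: fraction of all individual features inside retrieved tiles."""
    total = sum(truth_counts.values())
    found = sum(n for tile, n in truth_counts.items() if tile in retrieved)
    return found / total if total else 0.0


# Hypothetical ground truth: number of pools per tile.
truth = {"t1": 5, "t2": 1, "t3": 0, "t4": 1}
retrieved = {"t1", "t3"}

tile = tile_level_scores(retrieved, truth)
entity = entity_level_recall(retrieved, truth)
print(tile, entity)
```

In this toy case the system finds only one of three relevant tiles, so tile-level recall is one third, yet that tile holds five of the seven pools, so entity-level recall is about 0.71. A system that favors dense tiles looks much better under the second definition.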

The modular pipeline, built on Amazon Bedrock and OpenSearch Serverless, let engineers swap embedding models, fusion strategies, and search methods via configuration. This enabled rapid A/B testing across five stages: area-of-interest selection, imagery ingestion, embedding and indexing, search, and evaluation against OpenStreetMap.
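
A configuration-driven design of this kind might look like the sketch below. Everything in it is an assumption for illustration: the class, the field names, and the short labels are hypothetical and are not real Amazon Bedrock model identifiers or the team's actual configuration schema. The grid is also smaller than the roughly 100 configurations the team reports.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import List


@dataclass(frozen=True)
class PipelineConfig:
    """One experiment configuration; all field values are illustrative labels."""
    embedding_model: str
    fusion_strategy: str
    search_method: str
    use_captions: bool


# Hypothetical labels, not real Amazon Bedrock model identifiers.
MODELS = ["nova-mme", "cohere-embed-v4", "titan-mme-g1"]
FUSIONS = ["batch", "attention"]
SEARCHES = ["knn", "image_caption_fusion", "metadata_filter"]


def build_grid() -> List[PipelineConfig]:
    """Enumerate every combination so each can run through the same stages."""
    return [
        PipelineConfig(model, fusion, search, captions)
        for model, fusion, search, captions in product(
            MODELS, FUSIONS, SEARCHES, [False, True]
        )
    ]


grid = build_grid()
baseline = grid[0]
# Swapping one component is a single field change; the rest stays untouched.
variant = replace(baseline, embedding_model="cohere-embed-v4")
print(len(grid), baseline, variant)
```

Because the configuration is immutable and each variant differs from the baseline in exactly one field, any change in the evaluation metrics can be attributed to that one component.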

Experiments Expose Feature-Dependent Tradeoffs

Four experiments revealed actionable insights. First, model choice matters: Amazon Nova Multimodal Embeddings outperformed Cohere Embed v4 and Amazon Titan Multimodal Embeddings G1, especially on distributed features like roads. Second, fusion strategy is feature-dependent: Cohere batch fusion and attention fusion tied for pools at an F1 of 0.638, but for roads attention fusion led at 0.535 while Cohere batch fusion dropped to 0.479.
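
The article does not describe how any fusion strategy is implemented, so the following is only one plausible reading, written as a minimal sketch. It contrasts a uniform average of per-view embeddings with an attention-style fusion that weights each view by a softmax over its dot product with a context vector. The function names, the context vector, and the toy 2-D embeddings are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def mean_fusion(views: List[Vector]) -> Vector:
    """Average the per-view embeddings, then re-normalize."""
    dim = len(views[0])
    avg = [sum(v[i] for v in views) / len(views) for i in range(dim)]
    return l2_normalize(avg)


def attention_fusion(views: List[Vector], context: Vector) -> Vector:
    """Weight each view by a softmax over its dot product with a context vector."""
    scores = [sum(a * b for a, b in zip(v, context)) for v in views]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(views[0])
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return l2_normalize(fused)


# Toy 2-D embeddings standing in for three of the seven views.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
uniform = mean_fusion(views)
weighted = attention_fusion(views, context=[0.0, 5.0])
print(uniform, weighted)
```

With uniform averaging, the two agreeing views dominate the fused vector. With attention-style weights, the single view that matches the context dominates instead, which is one way a feature visible from only one angle could survive fusion.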

Third, captions from Amazon Nova 2 Lite significantly boosted results when combined with image embeddings, but text-only search fell 17% short. Caption vocabulary also affected metadata filtering. Fourth, search method performance diverged: basic k-NN, image+caption fusion, and metadata filtering all hit F1 0.638 for pools, but metadata filtering collapsed to 0.358 for roads due to inconsistent tagging.
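
As a rough illustration of k-NN search over caption-enriched embeddings, the sketch below blends an image embedding with a caption embedding and ranks tiles by cosine similarity. It is an in-memory stand-in, not the Amazon OpenSearch Serverless API. It assumes both embeddings live in the same vector space, and the blending weight alpha, the tile names, and the 2-D vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def combine(image_vec: Vector, caption_vec: Vector, alpha: float = 0.5) -> Vector:
    """Blend image and caption embeddings; alpha is a hypothetical tuning knob."""
    return [alpha * i + (1 - alpha) * c for i, c in zip(image_vec, caption_vec)]


def knn(query: Vector, index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Brute-force k-nearest-neighbor search by cosine similarity."""
    scored = [(tile_id, cosine(query, vec)) for tile_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy tiles: (image embedding, caption embedding) in a shared 2-D space.
tiles = {
    "pool_tile": ([0.9, 0.1], [1.0, 0.0]),
    "road_tile": ([0.2, 0.8], [0.0, 1.0]),
    "mixed_tile": ([0.6, 0.4], [0.5, 0.5]),
}
index = {tile_id: combine(img, cap) for tile_id, (img, cap) in tiles.items()}
results = knn([1.0, 0.0], index, k=2)
print(results)
```

A production vector index would use an approximate nearest-neighbor structure rather than a full scan, but the ranking logic is the same.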

"The optimal search method depends entirely on the feature type," the team noted. They recommend starting with basic k-NN over caption-enriched embeddings for consistency, then adding specialized methods for underperforming query categories. The evaluation framework also computes nDCG and stratified metrics to reveal how systems handle sparse versus dense tiles.

Vexcel Intelligence is now in preview, offering searchable vector embeddings and an API across its global library. The collaboration also delivered an AI-powered code onboarding chat service for Vexcel's engineers. As new models launch on Amazon Bedrock, the pipeline can swap them in with a configuration change and measure their impact right away through the evaluation harness.
