// AEO ASSESSMENT

Recce AEO Assessment Report
by Novastacks AI

reccehq.com | Global / dbt Ecosystem Market

March 1, 2026 | Prepared by Novastacks AI

5.1/10

Good

Site Readiness: 7.1 · LLM Visibility: 3.9

Compared against: datafold.com, elementary-data.com


// 01 EXECUTIVE SUMMARY

Built for AI Discovery. But ChatGPT Doesn't Know the Product Exists.

Recce has done something most SaaS companies haven't: deliberately engineered their site for AI crawlers. llms.txt, llms-full.txt, rich schema across the AI blog, and an open robots.txt with AI crawler comments. The technical foundation is genuinely ahead of its category.

The gap is at the LLM layer. Across 6 ChatGPT queries, including a direct "What is Recce?", Recce was never accurately cited. On the branded query, ChatGPT returned the British military definition of "recce" (reconnaissance). The only query that surfaced the software product was a comparison prompt, and there ChatGPT misdescribed it as an observability tool. The brand name collision is a structural problem that schema markup alone cannot solve.

Recce also competes in a market where Datafold holds a 49× keyword advantage (1,430 vs 29 ranked US keywords). Every category query that defines Recce's use case (data diff, dbt data quality, PR data review) is owned by Datafold or Elementary, and reccehq.com has no top-10 organic position on any non-branded category term. Google's AI Overviews for these queries cite SYNQ, Metaplane, and Panto AI. Recce is absent even though it is built for this exact use case.

The path forward is clear: build the LLM citation chain Recce's site deserves — through third-party placements on Dev.to, Hashnode, and DZone, plus deliberate Reddit brand disambiguation — then migrate the blog from subdomain to main domain to consolidate the link equity Recce is already earning but splitting.

Domain	Ranked Keywords	Est. Traffic (ETV)	#1 Positions	#2-3	#4-10
reccehq.com	29	552	1	0	3
datafold.com	1,430	3,694	16	65	338
elementary-data.com	286	1,271	1	21	55

† Taiwan organic data (location_code: 2158) was minimal across all three domains, which is consistent with the dbt ecosystem being a US/global market. All keyword and ETV data were measured via DataForSEO Labs using the US location (location_code: 2840).

‡ LLM visibility was tested via ChatGPT (GPT-4o) using 6 English query types, with web_search enabled where indicated. Google AIO was tested via DataForSEO SERP with AI Overview extraction. Assessment date: March 1, 2026.
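The 49× headline and the top-3 counts used later in this report follow directly from the table above. The minimal Python sketch below recomputes them from those figures only. The `DomainStats` class is illustrative and is not part of any DataForSEO API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainStats:
    keywords: int  # ranked US keywords
    etv: int       # estimated traffic value
    pos_1: int     # keywords ranking #1
    pos_2_3: int   # keywords ranking #2-3
    pos_4_10: int  # keywords ranking #4-10

    @property
    def top3(self) -> int:
        # Top-3 positions = #1 positions plus #2-3 positions.
        return self.pos_1 + self.pos_2_3

    @property
    def top10(self) -> int:
        return self.pos_1 + self.pos_2_3 + self.pos_4_10


# Figures copied from the DataForSEO Labs table (US market).
STATS = {
    "reccehq.com": DomainStats(29, 552, 1, 0, 3),
    "datafold.com": DomainStats(1430, 3694, 16, 65, 338),
    "elementary-data.com": DomainStats(286, 1271, 1, 21, 55),
}


def keyword_gap(leader: str, prospect: str) -> float:
    """Ratio of ranked keywords: leader divided by prospect."""
    return STATS[leader].keywords / STATS[prospect].keywords


if __name__ == "__main__":
    print(f"gap: {keyword_gap('datafold.com', 'reccehq.com'):.1f}x")
    for name, s in STATS.items():
        print(name, "top-3:", s.top3, "top-10:", s.top10)
```

Run as a script, it prints a gap of 49.3x and top-3 counts of 1, 81, and 22, which match the rounded 49× and the Top-3 column in Section 05.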

ChatGPT Citation Rate

0 / 6

Queries with an accurate Recce citation (the six include a direct branded query)

ChatGPT returned the military definition of 'recce' on a direct product query. With web search enabled, Recce still did not appear in the category query, and it was also absent from the use-case query. The brand name collision with British military slang is suppressing training-data association.

→ Section 02: AI Visibility

Keyword Gap vs Datafold

49×

29 vs 1,430 ranked US keywords — Recce has zero top-10 category positions

Every category query that defines Recce's use case (data diff, dbt PR review, data quality) is owned by Datafold or Elementary. Pages that rank well are the ones most likely to be crawled, retrieved, and repeated in LLM answers, and Recce isn't in that pool.

→ Section 05: Content Competitiveness

Reddit Presence

0 threads

r/dataengineering and r/dbt — Recce's core communities

Searches for 'recce site:reddit.com' return only military and wargaming content. Reddit is a primary LLM training source for developer tools. Zero community presence means zero citation surface in the channels LLMs weight most for tooling recommendations.

→ Section 06: Brand & Positioning

Lighthouse scores were measured via the DataForSEO on-page audit.

// 02 AI VISIBILITY

Zero Citations Across Six Queries — Including a Direct Brand Search

The military 'recce' homonym is actively suppressing LLM brand association.

ChatGPT Query Results

Prompt Type	Query	Mentioned?	Who Was Cited
Branded	What is Recce?	No	Military reconnaissance definition returned; no association with the software product (web search disabled — training data only)
Competitor	What is Datafold?	No	Datafold correctly described as data diff and CI/CD integration tool; Recce not mentioned
Category	Best dbt data testing tools 2025	No	Web search active; Elementary, DQLabs, Great Expectations, Soda cited — Recce absent
Use Case	Tools to review data pipeline changes before merging PRs	No	Jenkins, GitHub Actions, Great Expectations recommended; Recce absent despite owning this exact use case
Comparison	Compare Recce vs Datafold vs Elementary	Yes (inaccurate)	Recce mentioned but misdescribed as an observability/anomaly detection tool rather than a data review agent (training data misclassification)
Long-tail	How to run data diff on dbt PR changes	No	Generic data-diff library and manual column checks recommended; Recce not mentioned (web search disabled)

The brand name collision is the root cause: 'recce' is British slang for reconnaissance with thousands of military uses in LLM training data. Until Recce (the dbt tool) builds enough third-party signal volume to override this, branded ChatGPT queries will return military content. The fix requires deliberate off-site citation building — not on-site optimization.
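The 0 / 6 headline counts accurate citations. The comparison query did mention Recce, but it misdescribed the product. The minimal sketch below tracks "mentioned" and "accurately cited" as separate rates, with the six results above encoded by hand. The data structure is illustrative and is not the tooling used for this assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    prompt_type: str
    mentioned: bool
    accurate: bool  # only meaningful when mentioned is True


# The six ChatGPT results from the table above, encoded by hand.
RESULTS = [
    QueryResult("Branded", False, False),
    QueryResult("Competitor", False, False),
    QueryResult("Category", False, False),
    QueryResult("Use Case", False, False),
    QueryResult("Comparison", True, False),  # mentioned, but described as observability
    QueryResult("Long-tail", False, False),
]


def citation_rates(results: list[QueryResult]) -> tuple[int, int, int]:
    """Return (mentioned, accurately cited, total)."""
    mentioned = sum(r.mentioned for r in results)
    accurate = sum(r.mentioned and r.accurate for r in results)
    return mentioned, accurate, len(results)


if __name__ == "__main__":
    m, a, n = citation_rates(RESULTS)
    print(f"mentioned {m}/{n}, accurately cited {a}/{n}")  # mentioned 1/6, accurately cited 0/6
```

Keeping the two counts apart makes future re-tests comparable. A misdescription is a different failure from an omission, and it calls for a different fix (positioning language rather than citation volume).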

Google AI Overview Results

Query Type	Query	AIO Triggered?	Prospect Rank	Top Results
Branded	What is Recce AI data review agent	Yes	2	reccehq.com #2, docs.reccehq.com #1 — strong branded query ownership in Google AIO
Competitor	Datafold dbt data quality tool	No	N/A	No AIO; datafold.com dominates positions #1-3; Recce absent
Category	Best dbt data testing tools data quality 2025	Yes	N/A	AIO triggered: SYNQ #1, Metaplane #2 — Recce absent from generated answer
Use Case	Tools to review data pipeline changes before merging	Yes	N/A	AIO triggered: Panto AI #1 — Recce absent despite owning this exact use case
Comparison	Recce vs Datafold vs Elementary	No	1	No AIO; blog.reccehq.com ranks #1 — self-authored comparison content is working
Long-tail	How to run data diff on dbt PR changes	Yes	2	AIO triggered; blog.reccehq.com #2 — surface-level visibility on this use case query

Recce owns Google AIO for its own brand name but is absent from the AI Overviews for both category and use-case queries. SYNQ, Metaplane, and Panto AI appear in AIO for queries that define Recce's core use case (pipeline change review, dbt data testing). These tools likely benefit from strong third-party citation chains (G2 listings, community review sites, third-party comparisons). Meanwhile, Recce's self-authored comparison post ranks #1 organically for 'Recce vs Datafold vs Elementary'. That result is proof the content strategy works when it targets the right query.

ChatGPT Citations

0 / 6

Including direct branded query

Google AIO (Category)

0 / 4

Category/use case queries with AIO presence

AIO Branded Rank

#2

reccehq.com on direct branded AIO query

Comparison Rank

#1

blog.reccehq.com for 'Recce vs Datafold vs Elementary'

Citation Surface Analysis

Platform	Presence	Strength	Notable
GitHub	Strong	Strong	DataRecce org, 29 repos; includes AGENTS.md, CLAUDE.md, and a claude-plugin (files aimed at AI coding agents), an AI-native signal ahead of category peers
blog.reccehq.com	Active	Moderate	11 AI blog articles with rich schema (FAQPage, TechArticle, SpeakableSpecification) — on subdomain, splitting authority from main domain
Medium (Dave Flynn)	Yes	Moderate	In the Pipeline series; top article 240+ likes — independent author coverage is a strong LLM training signal
Product Hunt	Yes	Low	Listed; provides a third-party citation anchor but low engagement
YouTube	Yes	Weak	@data-recce channel; 40+ subscribers, 10+ videos — early stage foundation
Reddit (r/dataengineering, r/dbt)	No	Absent	Zero verified presence; brand name conflict floods search results with military/wargaming content — highest priority citation gap
G2 / Capterra	No	Absent	Not listed; G2 is a frequently cited source for developer tooling recommendations, and its absence is a likely contributor to Recce's category AIO absence

Recce has the right technical instincts (llms.txt, AI blog schema, agent-facing files on GitHub), but its citation surface is lopsided. It is strong on owned channels and absent on the third-party platforms LLMs lean on for tool recommendations. Reddit and G2 are the two highest-ROI gaps, and both are free to address.

Full AI Visibility Breakdown

We monitor what ChatGPT, Perplexity, and Google AI say about brands in your category. Our team has built proprietary tracking across every major LLM — the same systems we run for growth-stage companies.

Talk to Our Team to Unlock

30 min | Free | No commitment

// 03 SITE READINESS

Technical Readiness Leads the Category. Schema Coverage Doesn't.

95 performance score and llms.txt implemented. Homepage has zero structured data.

Site Readiness: 7.1/10

Signal	Recce	datafold.com	elementary-data.com
Performance (Lighthouse)	✓ 95/100	Unknown	Unknown
Accessibility	✓ 91/100	Unknown	Unknown
SEO Score	⚠ 85/100	Unknown	Unknown
llms.txt / AI Crawler Optimization	✓ Implemented	✗ Not found	✗ Not found
Schema (Homepage)	✗ None	Unknown	Unknown
Schema (AI Blog)	✓ FAQPage + TechArticle + SpeakableSpec	Unknown	Unknown
Blog on Main Domain	✗ blog.reccehq.com (subdomain)	✓ /blog/	✓ /blog/
Analytics (GA4/GTM)	✗ None detected	✓ GA4 + GTM	✓ GTM
robots.txt (AI crawlers)	✓ Open + AI crawler comment	✓ Open	✓ Open
HTTPS	✓	✓	✓

Recce's GitHub Pages static hosting is a genuine advantage: pages are served as pre-rendered HTML, so crawlers that don't execute JavaScript still receive the full content. The 95 Lighthouse performance score is strong. Competitor scores weren't captured, however, so it can't be ranked within the category. The two critical gaps are the homepage's complete lack of structured data (the AI blog schema hasn't been extended to the product pages) and the total absence of analytics instrumentation.
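The robots.txt row in the table above can be spot-checked per crawler with Python's standard `urllib.robotparser`. The minimal sketch below does this. The user-agent tokens are the publicly documented ones for OpenAI (GPTBot, OAI-SearchBot), Anthropic (ClaudeBot), Perplexity (PerplexityBot), and Google's AI-training control (Google-Extended). The sample file is hypothetical and is not Recce's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Publicly documented user-agent tokens for major AI crawlers / controls.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]

# Hypothetical robots.txt for illustration only (not Recce's live file).
SAMPLE_ROBOTS = """\
# AI crawlers are welcome
User-agent: *
Allow: /
"""


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, for each AI user agent, whether robots.txt lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in AI_USER_AGENTS}


if __name__ == "__main__":
    for agent, allowed in ai_crawler_access(SAMPLE_ROBOTS, "https://reccehq.com/").items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

To audit the live file, call `set_url()` and `read()` on the parser instead of `parse()`. The standard-library parser does not implement every RFC 9309 matching rule (for example, `*` wildcards inside paths), so treat its answers as a sanity check rather than a definitive verdict.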

Performance Score

95 / 100

GitHub Pages static hosting; competitor scores not captured

Accessibility

91 / 100

Above average; minor WCAG gaps remain

SEO Score

85 / 100

Room to improve; Lighthouse does not score structured data, so homepage schema gaps are tracked separately

llms.txt

✓ Live

llms.txt + llms-full.txt — ahead of all competitors
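Under the llms.txt proposal (llmstxt.org), the file is Markdown. It opens with an H1 project name, typically followed by a blockquote summary and H2 sections that list links. The sketch below runs a rough structural check, assuming that convention. The sample content and example.com URLs are hypothetical and are not Recce's file.

```python
import re

# A list item of the form "- [Title](url)" optionally followed by notes.
LINK_ITEM = re.compile(r"^- \[([^\]]+)\]\(([^)\s]+)\)", re.MULTILINE)


def check_llms_txt(text: str) -> dict:
    """Rough structural check against the llms.txt convention."""
    lines = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    return {
        "has_h1_title": bool(lines) and lines[0].startswith("# "),
        "has_blockquote_summary": any(ln.startswith(">") for ln in lines),
        "sections": [ln[3:].strip() for ln in lines if ln.startswith("## ")],
        "link_count": len(LINK_ITEM.findall(text)),
    }


# Hypothetical file for illustration; not Recce's actual llms.txt.
SAMPLE = """\
# Example Project

> One-line summary of what the project does.

## Docs

- [Getting started](https://example.com/start): install and first run
- [CLI reference](https://example.com/cli)
"""

if __name__ == "__main__":
    print(check_llms_txt(SAMPLE))
```

The check only confirms shape, not quality. Whether the linked pages describe Recce as an AI data review agent for dbt matters more than the format itself.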

Site Readiness Analysis

Our senior operators audit every page AI crawlers evaluate — from schema markup to content depth to technical infrastructure. We identify exactly what to fix and in what order.

Talk to Our Team to Unlock

30 min | Free | No commitment

// 04 SITE INFRASTRUCTURE

Three Fixable Issues Capping a 7.1 Site Readiness Score

Schema gaps, a missing analytics stack, and a subdomain blog — all solvable in under 30 days.

Homepage Has Zero Schema Markup — The Highest-Traffic Page Gets No LLM Structure

High

The AI blog pages have excellent schema: FAQPage, TechArticle, SoftwareApplication, SpeakableSpecification. This investment was not extended to the homepage or pricing page. On a SaaS site these are typically the highest-traffic commercial pages, though with no analytics installed Recce can't currently confirm this.

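Google doesn't document structured data as a trigger for AI Overviews. Even so, SoftwareApplication or FAQPage markup on the root URL gives Google and LLM crawlers an explicit, machine-readable statement of what the product is. Recce's homepage is plain HTML with no structured data, so every system reading it must infer the product category from prose alone. Structured product facts should make web-enabled ChatGPT answers more likely to cite and describe Recce correctly, though OpenAI hasn't documented how it weighs markup.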

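Fix: Add SoftwareApplication schema to the homepage with operatingSystem: "Web" and applicationCategory: "DeveloperApplication". That is one of Google's documented values; schema.org also accepts free text such as "DataManagement". Describe the dbt integration in featureList, since schema.org has no dedicated integration property. Add FAQPage schema to the pricing page covering common buyer questions. Estimated effort: 2-4 hours.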
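The sketch below generates the two JSON-LD payloads in Python so they can be templated into the static site. It uses applicationCategory "DeveloperApplication", one of the values Google documents, in place of a free-text category. The property names are standard schema.org vocabulary, while the description and feature strings are placeholders, not Recce's copy. Two caveats apply:

- Google's SoftwareApplication rich result additionally requires `offers` and a rating or review.
- Since 2023, Google shows FAQ rich results only for a small set of authoritative sites.

The value here is therefore machine-readable product facts rather than a SERP feature.

```python
import json


def software_application_jsonld(name: str, url: str, description: str, features: list[str]) -> dict:
    """Homepage SoftwareApplication markup (values passed in are illustrative)."""
    return {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "description": description,
        "applicationCategory": "DeveloperApplication",  # one of Google's documented values
        "operatingSystem": "Web",
        "featureList": features,  # schema.org has no dedicated "integrations" property
    }


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """FAQPage markup: one Question/Answer pair per buyer question."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {"@type": "Question", "name": q, "acceptedAnswer": {"@type": "Answer", "text": a}}
            for q, a in pairs
        ],
    }


def to_script_tag(data: dict) -> str:
    """Serialize as a JSON-LD <script> block; escaping "</" stops the HTML parser closing the tag early."""
    body = json.dumps(data, indent=2, ensure_ascii=False).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    app = software_application_jsonld(
        name="Recce",
        url="https://reccehq.com/",
        description="AI data review agent for dbt pull requests.",  # placeholder copy
        features=["Data diff on dbt PR changes", "Integrates with dbt"],  # placeholder copy
    )
    print(to_script_tag(app))
```

Validating the rendered output with Google's Rich Results Test or the Schema Markup Validator before deploying is a sensible final step.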

Blog on Subdomain Splits Every Backlink Earned by Recce's Content

Medium

blog.reccehq.com ranks #1 for "Recce vs Datafold vs Elementary" and #2 for "how to run data diff on dbt PR changes." Every backlink pointing to these pages benefits blog.reccehq.com — not reccehq.com. The link equity that should be building the main domain's authority is stranded on a subdomain.

Dave Flynn's independent Medium coverage, the Product Hunt listing, and GitHub references all flow to the subdomain whenever they link to blog posts. Search engines and LLM retrieval systems may also treat the subdomain and main domain as separate entities, diluting the consolidated authority signal for Recce as a brand.

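Fix: Migrate the blog to reccehq.com/blog/ and permanently redirect every blog.reccehq.com/* URL to its new path. This is a GitHub Pages repo restructure, but GitHub Pages cannot send server-side 301s. The old subdomain therefore needs either a redirect layer in front of it (for example, CDN redirect rules) or static meta-refresh pages with rel=canonical. Estimated effort: 1-2 days of engineering. Impact is medium-term (6+ months as link equity consolidates).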
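The URL mapping itself is mechanical. The minimal sketch below assumes every post keeps its existing path under /blog/; the slug used is hypothetical. GitHub Pages cannot issue server-side 301s, so `redirect_stub` produces a static fallback page (instant meta refresh plus rel=canonical) for the old subdomain. Google documents an instant meta refresh as a permanent-redirect signal, but a true 301 from a redirect layer remains the cleaner option.

```python
import html
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "blog.reccehq.com"
NEW_HOST = "reccehq.com"
NEW_PREFIX = "/blog"


def migrated_url(old_url: str) -> str:
    """Map a blog.reccehq.com URL to the same path under reccehq.com/blog/."""
    parts = urlsplit(old_url)
    if parts.netloc.lower() != OLD_HOST:
        raise ValueError(f"not a blog URL: {old_url}")
    path = parts.path or "/"
    # Keep the query string; fragments never reach the server, so they are dropped.
    return urlunsplit(("https", NEW_HOST, NEW_PREFIX + path, parts.query, ""))


def redirect_stub(target: str) -> str:
    """Static fallback page for hosts that cannot send a real 301 (e.g. GitHub Pages)."""
    t = html.escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n<html><head>\n"
        f'<link rel="canonical" href="{t}">\n'
        f'<meta http-equiv="refresh" content="0; url={t}">\n'
        f'</head><body><a href="{t}">This page has moved.</a></body></html>\n'
    )


if __name__ == "__main__":
    new = migrated_url("https://blog.reccehq.com/data-diff-guide/")  # hypothetical slug
    print(new)
    print(redirect_stub(new))
```

Generating the full old-to-new map from the blog's sitemap first makes it easy to confirm that no ranking URL (such as the #1 comparison post) is left without a redirect.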

No Analytics Instrumentation — Blind to Conversion Funnel Performance

Medium

Zero Google Analytics or Tag Manager detected on reccehq.com. Datafold and Elementary both instrument GA4 + GTM. Without analytics, there's no visibility into which pages drive trial signups, where the funnel drops, or which content pieces convert to paid customers.

For AEO purposes, GA4's value is measurement, not ranking. Google has said Analytics data is not used as a search ranking signal. GA4 can, however, attribute referral sessions from AI assistants such as chatgpt.com and perplexity.ai, showing whether citation work actually drives visits.

Fix: Add the GTM container snippet to the GitHub Pages site's shared page template, configure a GA4 data stream, and set up conversion events for trial signups and docs visits. Estimated effort: 2-4 hours. This is table stakes for any growth program.
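The "none detected" finding can be re-checked after install by scanning the served HTML for the standard loader URLs. The heuristic sketch below assumes the default GTM and gtag.js snippets. A GA4 tag configured inside a GTM container won't appear in page source, so an empty `ga4` list next to a GTM ID is expected in that setup.

```python
import re

# URLs that appear in Google's default GTM snippet (script loader and noscript iframe).
GTM_LOADERS = ("googletagmanager.com/gtm.js", "googletagmanager.com/ns.html")
GTM_ID = re.compile(r"\bGTM-[A-Z0-9]+\b")
# GA4 IDs appear in the gtag.js loader URL and/or in gtag('config', ...) calls.
GA4_ID_PATTERNS = (
    re.compile(r"gtag/js\?id=(G-[A-Z0-9]+)"),
    re.compile(r"gtag\(\s*['\"]config['\"]\s*,\s*['\"](G-[A-Z0-9]+)['\"]"),
)


def detect_tags(page_html: str) -> dict[str, list[str]]:
    """Return GTM container IDs and GA4 measurement IDs found in raw page HTML."""
    has_gtm_loader = any(loader in page_html for loader in GTM_LOADERS)
    gtm = sorted(set(GTM_ID.findall(page_html))) if has_gtm_loader else []
    ga4 = sorted({m for p in GA4_ID_PATTERNS for m in p.findall(page_html)})
    return {"gtm": gtm, "ga4": ga4}
```

Running it against the homepage, pricing page, and a blog post after deployment confirms that the template change reached every page type.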

Infrastructure Deep-Dive

We don't just identify issues — we implement fixes. Schema markup, redirects, analytics instrumentation — our team builds and operates the technical infrastructure that moves LLM citation scores.

Talk to Our Team to Unlock

30 min | Free | No commitment

// 05 CONTENT COMPETITIVENESS

A 49× Keyword Gap and Zero Category Rankings

Datafold owns every search term that defines Recce's use case.

Every Core Use Case Query Is Owned by Datafold

Critical

Recce has 29 ranked US keywords, almost all of them branded. Datafold ranks for 1,430 keywords, 81 of them in the top 3 (16 at #1, 65 at #2-3). The queries that define Recce's use case show almost no Recce presence:

data diff dbt: no ranking
dbt data quality: no ranking
PR data review automation: no ranking
dbt column lineage: #12, the only ranking on this list
data pipeline change review: no ranking

Pages that rank for these terms are the ones most likely to appear in LLM training data and web-search retrieval. With no organic presence on category queries, Recce is largely missing from the content pool LLMs draw on for tool recommendations. ChatGPT's failure to mention Recce in use-case queries is therefore unsurprising.

Fix: Build a category content cluster over 6 months targeting these exact queries. Start with two landing pages, "Recce for dbt Data Diff" and "How to Review Data Pipeline Changes Before Merging PRs", and support them with AI blog articles. This is a sustained program, not a sprint.

Domain	Ranked Keywords (US)	ETV	Top-3 Positions
reccehq.com	29	552	1
datafold.com	1,430	3,694	81
elementary-data.com	286	1,271	22

Self-Authored Comparison Content Is Working — Scale It

Bright Spot

blog.reccehq.com ranks #1 organically for "Recce vs Datafold vs Elementary", a query Recce created and owns entirely (Google shows no AI Overview for it). This is proof that Recce's content strategy works when it targets the right query type.

The pattern: self-authored comparison content targeting [Recce vs Competitor] queries produces high-intent pages that rank well, and one article already holds a #1 position. Scaling the playbook would build a comparison content moat: 3-5 more articles on adjacent queries such as "Recce vs data-diff library", "Recce vs Great Expectations for dbt teams", and "Recce vs Monte Carlo for small data teams".

Blog Content Blocked From Contributing to Main Domain Authority

Medium

The AI blog produces ranking content (data diff tutorials, dbt PR review guides) and earns external citations. But all of this SEO value accumulates at blog.reccehq.com — a subdomain. The main domain at reccehq.com doesn't benefit from any of this organic traction.

Search engines and LLMs weigh signals at the site level as well as the page level. Every link, ranking, and citation on the blog strengthens blog.reccehq.com in its own right, while reccehq.com, the product domain, stays at 29 keywords. The content is being written; the authority is accruing in the wrong place.

Competitive Content Analysis

We benchmark content against competitors using the signals LLMs prioritize: category authority, structural depth, and third-party corroboration. Then we design content systems that close the gap systematically.

Talk to Our Team to Unlock

30 min | Free | No commitment

// 06 BRAND & POSITIONING

Strong Technical Brand. Weak Language Association.

ChatGPT thinks 'recce' means military reconnaissance. The community content to fix this doesn't exist yet.

Brand Name Collision with Military Slang Is Suppressing All LLM Association

Critical

"Recce" is standard British English for reconnaissance. It appears in thousands of military training documents, wargaming communities, and tactical fiction — all of which are in LLM training data. When ChatGPT processes a query for "What is Recce?", it surfaces the dominant association: military.

This isn't an SEO problem. No amount of on-site optimization will override the volume of military-context "recce" in LLM training corpora. The fix requires off-site citation volume that explicitly associates "Recce" (or "Recce dbt tool" / "Recce data review agent") with the software product:

Dev.to, Hashnode, DZone: technical articles that consistently use "Recce (the AI data review agent for dbt)"
Reddit r/dataengineering, r/dbt: community posts using "Recce" in tool context
G2 reviews: buyer-language descriptions of what Recce does
GitHub Discussions and issues that reference the Recce product by name

Target: 20-30 distinct third-party sources that use "Recce" in a software/data context within 90 days.
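Progress against the 20-30 source target can be tracked with a simple tally. The minimal sketch below assumes you already have each page's URL and text (collection is out of scope here). It uses a hypothetical keyword heuristic to separate software mentions from military ones, counts distinct third-party URLs, and excludes reccehq.com properties.

```python
import re
from urllib.parse import urlsplit

# Hypothetical heuristic: terms that place "Recce" in a software/data context.
SOFTWARE_CONTEXT = ("dbt", "data diff", "data review", "pull request", "lineage")
RECCE_WORD = re.compile(r"\brecce\b", re.IGNORECASE)

TARGET_MIN, TARGET_MAX = 20, 30  # the 90-day target stated above


def qualifying_sources(pages: list[tuple[str, str]]) -> set[str]:
    """Distinct third-party URLs whose text uses 'Recce' in a software context."""
    found = set()
    for url, text in pages:
        parts = urlsplit(url)
        host = parts.netloc.lower().removeprefix("www.")
        if host == "reccehq.com" or host.endswith(".reccehq.com"):
            continue  # owned properties don't count toward third-party volume
        lowered = text.lower()
        if RECCE_WORD.search(text) and any(term in lowered for term in SOFTWARE_CONTEXT):
            found.add(host + parts.path.rstrip("/"))  # normalize trailing slash
    return found
```

A keyword heuristic will misfire on edge cases, such as a wargaming forum thread that happens to mention pull requests. Treat the count as a first pass for manual review rather than a final tally.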

ChatGPT Misidentifies the Product Category When It Does Respond

High

In the direct comparison query ("Compare Recce vs Datafold vs Elementary"), ChatGPT acknowledged Recce's existence but described it as an observability/anomaly detection tool, not a data review agent or dbt PR companion. This points to a training data misclassification: the most prominent descriptions of Recce in LLM training data likely predate the AI data review agent positioning.

Until third-party content using the correct positioning accumulates, ChatGPT will continue to misrepresent Recce's category to prospects doing AI-assisted research. A buyer comparing Recce to Datafold using ChatGPT right now gets an inaccurate product description.

Fix: All third-party content placements should consistently use the phrase "AI data review agent" and explicitly describe the dbt PR use case. The corrected association should take hold as new content is crawled and new models are trained, likely over 6-12 months.
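A lightweight pre-publication check can enforce that consistency across contributors. The sketch below uses hypothetical rules. It looks for the canonical phrase, a dbt PR mention, and the category terms ChatGPT wrongly attached to Recce.

```python
import re

CANONICAL_PHRASE = "ai data review agent"
# Terms from ChatGPT's misclassification; flagged for review, not auto-rejected.
MISCLASSIFICATION_TERMS = ("observability", "anomaly detection")
PR_MENTION = re.compile(r"\b(?:prs?|pull requests?)\b")


def check_positioning(draft: str) -> dict:
    """Flag whether a third-party draft uses Recce's intended positioning."""
    text = draft.lower()
    return {
        "uses_canonical_phrase": CANONICAL_PHRASE in text,
        "mentions_dbt_pr_use_case": "dbt" in text and bool(PR_MENTION.search(text)),
        "review_terms": [t for t in MISCLASSIFICATION_TERMS if t in text],
    }
```

Flagged terms go to a human reviewer rather than blocking publication, since a comparison piece may legitimately mention observability tools when contrasting them with Recce.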

AI Crawler Optimization Is Genuinely Ahead of Category Peers

Bright Spot

Recce's GitHub repository includes AGENTS.md, CLAUDE.md, and a claude-plugin, files that help AI coding agents work with the codebase. This kind of AI-native tooling is rare in this category. Combined with llms.txt and llms-full.txt on the main domain, it means Recce has built infrastructure for AI-native brand discovery before most competitors have considered it.

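The AI blog's use of SpeakableSpecification schema, which marks sections suited to text-to-speech playback by voice assistants, is particularly sophisticated. Google's support for speakable is still in beta and aimed mainly at news content, so its direct payoff today is limited. These are the signals that may matter more as AI crawlers grow more sophisticated.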

The gap: this infrastructure is ahead of the citation chain it needs to be effective. The technical readiness is in place; the third-party volume hasn't caught up yet.

Brand Perception Analysis

AI platforms form opinions about brands based on third-party signals — not your own website. We track sentiment, citation sources, and positioning across LLMs, then build the authority signals that shift how AI describes you.

Talk to Our Team to Unlock

30 min | Free | No commitment

// 07 ROADMAP & IMPACT

90-Day Path From LLM Invisible to Category Cited

Quick schema wins in week 1. Citation chain built over 90 days.

Site Readiness Score

Current: 7.1/10

Projected: 8.5/10

LLM Visibility Score

Current: 3.9/10

Projected: 6.5/10

Horizon 1: Schema + Analytics (0-2 Weeks)

Extends Recce's existing schema investment to the pages that matter most. Installs the analytics foundation every growth program requires.

Add SoftwareApplication + FAQPage JSON-LD schema to the homepage (applicationCategory: DeveloperApplication, with the dbt integration listed in featureList)

Add FAQPage schema to the pricing page covering the top 5 buyer questions

Add BreadcrumbList schema sitewide

Install GA4 + GTM in the GitHub Pages site template and configure conversion events for trial signups and docs visits
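The BreadcrumbList item in this horizon is mechanical to generate: a list of ListItem entries with 1-based positions. The sketch below builds it from a page's trail. The trail and the /pricing/ path are hypothetical.

```python
import json
from urllib.parse import urljoin


def breadcrumb_jsonld(base_url: str, trail: list[tuple[str, str]]) -> dict:
    """BreadcrumbList markup; `trail` is (name, path) pairs from the site root to the current page."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": urljoin(base_url, path)}
            for i, (name, path) in enumerate(trail, start=1)
        ],
    }


if __name__ == "__main__":
    crumbs = breadcrumb_jsonld("https://reccehq.com/", [("Home", "/"), ("Pricing", "/pricing/")])
    print(json.dumps(crumbs, indent=2))
```

On a static site, the trail can be derived from each page's URL path at build time, so one template change covers the whole site.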

Horizon 2: Build the Citation Chain (2-8 Weeks)

Begins overriding the military 'recce' association in LLM training data. Establishes third-party citation volume needed for category AI Overview inclusion.

Publish 3-5 technical articles on Dev.to, Hashnode, and DZone that consistently use the phrasing 'Recce (the AI data review agent for dbt)' to build the LLM software association

Create a G2 listing under the 'Data Quality Tools' and 'DataOps' categories to address a likely cause of Recce's absence from category AI Overviews

Launch targeted presence in r/dataengineering and r/dbt: contribution strategy using 'Recce' in tool context (not promotion — genuine problem-solving)

Scale self-authored comparison content: add 'Recce vs data-diff library', 'Recce vs Great Expectations for dbt', 'Recce vs Monte Carlo for small teams'

Horizon 3: Consolidate Authority (Q2 2026)

Closes the subdomain authority split. Connects the content investments from Horizon 2 to a single, authoritative domain.

Migrate blog from blog.reccehq.com to reccehq.com/blog/ with 301 redirects — consolidates all blog link equity to the main product domain

Publish 6 additional category landing pages, covering at minimum: data diff dbt, dbt data quality, PR data review, dbt column lineage, and data pipeline change review

Submit Recce to Capterra and Slashdot (SourceForge) in DevOps/DataOps categories

Consider a domain change from reccehq.com to recce.dev or recceapp.com at the 6-month mark, weighing brand equity against the homonym cost. A new domain alone won't resolve the collision, since it lives in the brand name itself

Your Prioritized Roadmap

Every engagement starts with a clear plan: what to fix first, expected impact, and measurable milestones. We Discover, Design, Build, and Operate — you get results, not deliverables.

Talk to Our Team to Unlock

30 min | Free | No commitment

// WHO WE ARE

Agentic Marketing Systems, Built by Senior Operators

Novastacks is not an agency selling AI as a buzzword. We are senior marketing operators with decades of experience at Expedia, Tencent, Klook, and Traveloka who built enterprise-grade AI marketing systems from the ground up.

What We Do

▸AEO (Answer Engine Optimization) — Get your brand cited when prospects ask ChatGPT, Perplexity, and Google AI about your category
▸SEO Integration — Traditional search visibility that compounds with AI visibility
▸Custom AI Growth Systems — Agentic workflows, content engines, and data pipelines built for your business
▸Fractional Growth Partner — Senior strategic leadership without the full-time overhead

What You Get From Us

Head-Level Strategy

Solutions designed by operators who've led growth marketing and SEO at the Director/VP level. Not juniors following playbooks.

Agentic Execution

AI-powered workflows that move at machine speed. Audits, content, optimization, and reporting that would take a team weeks, delivered in days.

Flexible Engagement

No bloated retainers. Scope of work tailored to your stage, budget, and goals. Start small, scale when you see results.

Ready to Be Recommended by AI?

Book a 30-minute call with us. We'll walk through this assessment, answer your questions, and map out exactly what it takes to get Recce cited by ChatGPT and Perplexity when dbt teams are evaluating data review tools.

Book Discovery Call

30 min | Free | No commitment
