Recce AEO Assessment Report
by Novastacks AI
reccehq.com | Global / dbt Ecosystem Market
March 01, 2026 | Prepared by Novastacks AI
Site Readiness: 7.1 · LLM Visibility: 3.9
Built for AI Discovery. But ChatGPT Doesn't Know the Product Exists.
Recce has done something most SaaS companies haven't: deliberately engineered their site for AI crawlers. llms.txt, llms-full.txt, rich schema across the AI blog, and an open robots.txt with AI crawler comments. The technical foundation is genuinely ahead of its category.
The gap is at the LLM layer. In 6 ChatGPT queries — including a direct "What is Recce?" — Recce was cited zero times. ChatGPT returned the British military definition of "recce" (reconnaissance) and, when it did acknowledge the software product in a comparison query, described it inaccurately as an observability tool. The brand name collision is a structural problem that no amount of schema markup solves on its own.
Recce also operates in a market where Datafold holds a 49× keyword advantage (1,430 vs 29 US keywords). Every category query (data diff, dbt data quality, PR data review) is owned by Datafold or Elementary. Recce has zero top-10 organic positions on any non-branded term. Google's AI Overview for these category queries cites SYNQ, Metaplane, and Panto AI. Recce is absent despite being built for exactly this use case.
The path forward is clear: build the LLM citation chain Recce's site deserves — through third-party placements on Dev.to, Hashnode, and DZone, plus deliberate Reddit brand disambiguation — then migrate the blog from subdomain to main domain to consolidate the link equity Recce is already earning but splitting.
| Domain | Ranked Keywords | Est. Traffic (ETV) | #1 Positions | #2-3 | #4-10 |
|---|---|---|---|---|---|
| reccehq.com | 29 | 552 | 1 | 0 | 3 |
| datafold.com | 1,430 | 3,694 | 16 | 65 | 338 |
| elementary-data.com | 286 | 1,271 | 1 | 21 | 55 |
† Taiwan organic data (location_code: 2158) was minimal across all three domains — consistent with the dbt ecosystem being a US/global market. All keyword and ETV data measured via DataForSEO Labs using US location (location_code: 2840).
‡ LLM visibility tested via ChatGPT GPT-4o (web_search enabled where indicated) using 6 English query types. Google AIO tested via DataForSEO SERP with AI Overview extraction. Assessment date: March 1, 2026.
ChatGPT Citation Rate
0 / 6: Queries where Recce was cited, including a direct branded query (the single mention, in a comparison query, described Recce inaccurately)
ChatGPT returned the military definition of 'recce' on a direct product query. Even with web search enabled, Recce did not appear in category or use-case queries. The brand name collision with British military slang is suppressing training-data association.
→ Section 02: AI Visibility
Keyword Gap vs Datafold
49×: 29 vs 1,430 ranked US keywords; Recce has zero top-10 category positions
Every category query that defines Recce's use case (data diff, dbt PR review, data quality) is owned by Datafold or Elementary. LLMs, and the web-search layers they use, draw heavily on what ranks, and Recce isn't in that pool.
→ Section 05: Content Competitiveness
Reddit Presence
0 threads: r/dataengineering and r/dbt, Recce's core communities
Searches for 'recce site:reddit.com' return only military and wargaming content. Reddit is a primary LLM training source for developer tools. Zero community presence means zero citation surface in the channels LLMs weight most for tooling recommendations.
→ Section 06: Brand & Positioning
Lighthouse scores measured via DataForSEO on-page audit. Keyword, ETV, and LLM visibility methodology as in the † and ‡ notes above (ChatGPT web_search enabled where indicated). Assessment date: March 1, 2026.
Zero Citations Across Six Queries — Including a Direct Brand Search
The military 'recce' homonym is actively suppressing LLM brand association.
ChatGPT Query Results
| Prompt Type | Query | Mentioned? | Who Was Cited |
|---|---|---|---|
| Branded | What is Recce? | No | Military reconnaissance definition returned; no association with the software product (web search disabled — training data only) |
| Competitor | What is Datafold? | No | Datafold correctly described as data diff and CI/CD integration tool; Recce not mentioned |
| Category | Best dbt data testing tools 2025 | No | Web search active; Elementary, DQLabs, Great Expectations, Soda cited — Recce absent |
| Use Case | Tools to review data pipeline changes before merging PRs | No | Jenkins, GitHub Actions, Great Expectations recommended; Recce absent despite owning this exact use case |
| Comparison | Compare Recce vs Datafold vs Elementary | Yes | Recce mentioned but INACCURATE — described as an observability/anomaly detection tool, not a data review agent (training data misclassification) |
| Long-tail | How to run data diff on dbt PR changes | No | Generic data-diff library and manual column checks recommended; Recce not mentioned (web search disabled) |
The brand name collision is the root cause: 'recce' is British slang for reconnaissance with thousands of military uses in LLM training data. Until Recce (the dbt tool) builds enough third-party signal volume to override this, branded ChatGPT queries will return military content. The fix requires deliberate off-site citation building — not on-site optimization.
Google AI Overview Results
| Query Type | Query | AIO Triggered? | Prospect Rank | Top Results |
|---|---|---|---|---|
| Branded | What is Recce AI data review agent | Yes | 2 | reccehq.com #2, docs.reccehq.com #1 — strong branded query ownership in Google AIO |
| Competitor | Datafold dbt data quality tool | No | N/A | No AIO; datafold.com dominates positions #1-3; Recce absent |
| Category | Best dbt data testing tools data quality 2025 | Yes | N/A | AIO triggered: SYNQ #1, Metaplane #2 — Recce absent from generated answer |
| Use Case | Tools to review data pipeline changes before merging | Yes | N/A | AIO triggered: Panto AI #1 — Recce absent despite owning this exact use case |
| Comparison | Recce vs Datafold vs Elementary | No | 1 | No AIO; blog.reccehq.com ranks #1 — self-authored comparison content is working |
| Long-tail | How to run data diff on dbt PR changes | Yes | 2 | AIO triggered; blog.reccehq.com #2 — surface-level visibility on this use case query |
Recce owns Google AIO for its own brand name but is absent from all category queries. SYNQ, Metaplane, and Panto AI appear in AIO for queries that define Recce's core use case: pipeline change review and dbt data testing. These tools likely rank because they have high-DA citation chains (G2 listings, community review sites, third-party comparisons). Recce's self-authored comparison blog post ranks #1 for 'Recce vs Datafold vs Elementary', evidence that the content strategy works when it targets the right query.
Citation Surface Analysis
| Platform | Presence | Strength | Notable |
|---|---|---|---|
| GitHub | Strong | Strong | DataRecce org, 29 repos; includes AGENTS.md, CLAUDE.md, and a claude-plugin, files written for AI coding agents rather than crawlers, which still put Recce's AI-native posture ahead of category peers |
| blog.reccehq.com | Active | Moderate | 11 AI blog articles with rich schema (FAQPage, TechArticle, SpeakableSpecification) — on subdomain, splitting authority from main domain |
| Medium (Dave Flynn) | Yes | Moderate | In the Pipeline series; top article 240+ likes — independent author coverage is a strong LLM training signal |
| Product Hunt | Yes | Low | Listed; provides a third-party citation anchor but low engagement |
| YouTube | Yes | Weak | @data-recce channel; 40+ subscribers, 10+ videos — early stage foundation |
| Reddit (r/dataengineering, r/dbt) | No | Absent | Zero verified presence; brand name conflict floods search results with military/wargaming content — highest priority citation gap |
| G2 / Capterra | No | Absent | Not listed; G2 is a frequently cited source in LLM answers about developer tooling, and its absence is a likely contributor to Recce's absence from category AIO |
Recce has the right technical instincts (llms.txt, AI blog schema, AI-agent files on GitHub), but the citation surface is lopsided: strong on owned channels, absent on the third-party platforms LLMs weight most for tool recommendations. Reddit and G2 are the two highest-ROI gaps, and both are free to fix.
Technical Readiness Leads the Category. Schema Coverage Doesn't.
95 performance score and llms.txt implemented. Homepage has zero structured data.
| Signal | Recce | datafold.com | elementary-data.com |
|---|---|---|---|
| Performance (Lighthouse) | ✓ 95/100 | Unknown | Unknown |
| Accessibility | ✓ 91/100 | Unknown | Unknown |
| SEO Score | ⚠ 85/100 | Unknown | Unknown |
| llms.txt / AI Crawler Optimization | ✓ Implemented | ✗ Not found | ✗ Not found |
| Schema (Homepage) | ✗ None | Unknown | Unknown |
| Schema (AI Blog) | ✓ FAQPage + TechArticle + SpeakableSpec | Unknown | Unknown |
| Blog on Main Domain | ✗ blog.reccehq.com (subdomain) | ✓ /blog/ | ✓ /blog/ |
| Analytics (GA4/GTM) | ✗ None detected | ✓ GA4 + GTM | ✓ GTM |
| robots.txt (AI crawlers) | ✓ Open + AI crawler comment | ✓ Open | ✓ Open |
| HTTPS | ✓ | ✓ | ✓ |
Recce's static hosting on GitHub Pages is a genuine advantage: pages are served as prebuilt HTML, so AI crawlers receive the full content without executing JavaScript, unlike sites that render content client-side. The 95 Lighthouse performance score is strong in absolute terms, although competitor scores were not measured in this audit. The two critical gaps are the homepage's complete lack of structured data (the AI blog schema hasn't been extended to the product pages) and the total absence of analytics instrumentation.
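The "robots.txt (AI crawlers)" row can be spot-checked with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not the audit's actual method: the list of AI user-agent tokens is our own assumption (the tokens themselves, such as GPTBot and ClaudeBot, are ones AI vendors publish), and the robots.txt text used below is a hypothetical example, not Recce's live file.

```python
from urllib.robotparser import RobotFileParser

# Assumed set of AI crawler / AI-training user-agent tokens to check.
AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot",
             "Google-Extended", "CCBot"]


def ai_crawler_access(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user agent, whether robots.txt allows fetching `url`."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


if __name__ == "__main__":
    # Hypothetical robots.txt: blocks GPTBot, allows everyone else.
    example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
    for agent, allowed in ai_crawler_access(example, "https://reccehq.com/").items():
        print(f"{agent:16} {'allowed' if allowed else 'blocked'}")
```

Pasting in the text of a live robots.txt gives a per-agent allow/block table for any URL on the site.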
Three Fixable Issues Capping a 7.1 Site Readiness Score
Schema gaps, a missing analytics stack, and a subdomain blog — all solvable in under 30 days.
Homepage Has Zero Schema Markup — The Highest-Traffic Page Gets No LLM Structure
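High: The AI blog pages have excellent schema: FAQPage, TechArticle, SoftwareApplication, SpeakableSpecification. This investment was not extended to the homepage or pricing page, typically a SaaS site's highest-traffic pages for organic and direct visits.
Structured data on the root URL gives Google and LLM-powered search a machine-readable statement of what Recce is and who it is for. Recce's homepage is plain HTML with no structured data, so anything a crawler learns about the product there must be inferred from prose. Our working assumption is that ChatGPT's web-enabled responses favor clearly structured pages, and that an unstructured homepage reduces citation confidence.
Fix: Add SoftwareApplication schema to the homepage with operatingSystem: "Web", an applicationCategory value Google recognizes (such as "DeveloperApplication"; the free-text "DataManagement" is valid schema.org but not on Google's list), and the dbt integration stated in description or featureList, since schema.org has no integration property. Add FAQPage schema to the pricing page covering common buyer questions; since 2023 Google shows FAQ rich results only for well-known government and health sites, so the value here is machine readability rather than a SERP feature. Estimated effort: 2-4 hours.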
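A minimal Python sketch of the homepage markup, emitted as a JSON-LD `<script>` tag that a static-site template could include. The description and feature strings are placeholder copy, not Recce's approved messaging. It uses `applicationCategory: "DeveloperApplication"`, one of the values Google's software-app documentation lists, and states the dbt integration through `featureList`, because schema.org defines no integration property.

```python
import json


def software_application_jsonld(name: str, url: str, description: str,
                                features: list[str]) -> str:
    """Build a schema.org SoftwareApplication block wrapped in a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": name,
        "url": url,
        "description": description,
        # A category value from Google's software-app list; schema.org itself accepts free text.
        "applicationCategory": "DeveloperApplication",
        "operatingSystem": "Web",
        # schema.org has no "integrations" property; featureList carries the dbt integration.
        "featureList": features,
    }
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(software_application_jsonld(
        name="Recce",
        url="https://reccehq.com/",
        description="AI data review agent for dbt pull requests",  # placeholder copy
        features=["dbt integration", "Data diff on pull requests"],  # placeholder copy
    ))
```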
Blog on Subdomain Splits Every Backlink Earned by Recce's Content
Medium: blog.reccehq.com ranks #1 for "Recce vs Datafold vs Elementary" and #2 for "how to run data diff on dbt PR changes." Every backlink pointing to these pages benefits blog.reccehq.com, not reccehq.com. The link equity that should be building the main domain's authority is stranded on a subdomain.
Dave Flynn's independent Medium coverage, the Product Hunt listing, and GitHub references all flow to the subdomain when they link to blog posts. LLMs may also treat the subdomain and main domain as separate entities, diluting the consolidated authority signal for Recce as a brand.
Fix: Migrate the blog to reccehq.com/blog/ with 301 redirects from blog.reccehq.com/*. On the content side this is a GitHub Pages repo restructure; because GitHub Pages cannot issue server-side 301s on its own, the redirects need a CDN or reverse proxy with redirect rules in front of the old subdomain (meta-refresh pages with canonical tags are a weaker fallback). Estimated effort: 1-2 days of engineering. Impact is medium-term (6+ months as link equity consolidates).
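The migration is mostly a one-to-one path mapping, sketched below in Python. It assumes every post keeps its slug and simply gains a `/blog` prefix; the post URLs in the example are hypothetical. Since GitHub Pages does not serve 301s itself, a rule list like this would be loaded into whatever CDN or proxy ends up handling the old subdomain.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "blog.reccehq.com"
NEW_HOST = "reccehq.com"
NEW_PREFIX = "/blog"  # target section on the main domain


def redirect_target(old_url: str) -> Optional[str]:
    """Map a blog-subdomain URL to its main-domain equivalent, or None if it isn't one."""
    parts = urlsplit(old_url)
    if parts.hostname != OLD_HOST:
        return None
    path = parts.path or "/"
    # "/" -> "/blog/", "/some-post/" -> "/blog/some-post/"; query and fragment are kept.
    return urlunsplit(("https", NEW_HOST, NEW_PREFIX + path, parts.query, parts.fragment))


def build_redirect_map(old_urls: list[str]) -> list[tuple[str, str, int]]:
    """Produce (source, destination, status) rules, e.g. from the blog's sitemap URLs."""
    rules = []
    for url in old_urls:
        target = redirect_target(url)
        if target is not None:
            rules.append((url, target, 301))
    return rules


if __name__ == "__main__":
    # Hypothetical post URL for illustration.
    for rule in build_redirect_map(["https://blog.reccehq.com/some-post/"]):
        print(rule)
```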
No Analytics Instrumentation — Blind to Conversion Funnel Performance
Medium: Zero Google Analytics or Tag Manager detected on reccehq.com. Datafold runs GA4 + GTM and Elementary runs GTM. Without analytics, there's no visibility into which pages drive trial signups, where the funnel drops, or which content pieces convert to paid customers.
For AEO purposes, GA4's value is measurement rather than ranking: Google has stated that Analytics data is not a search ranking input. What GA4 adds is referral visibility, showing whether visits referred by AI assistants such as ChatGPT and Perplexity grow as the citation work in this report lands.
Fix: Add the GTM container snippet to the site templates that GitHub Pages serves, configure a GA4 stream, and set up conversion events for trial signups and docs visits. Estimated effort: 2-4 hours. This is table stakes for any growth program.
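Once the snippet ships, a post-build check can confirm every page actually carries it. The Python sketch below scans generated HTML for a GTM container ID (`GTM-XXXXXXX`) and a GA4 measurement ID (`G-XXXXXXXXXX`) by pattern. The ID formats are the commonly seen ones, the `_site` output folder is an assumption (it is Jekyll's default), and this is a heuristic, not necessarily how the audit tool detected the missing analytics.

```python
import re
from pathlib import Path

# Common ID formats: GTM container "GTM-XXXXXXX", GA4 measurement ID "G-XXXXXXXXXX".
GTM_ID = re.compile(r"\bGTM-[A-Z0-9]+\b")
GA4_ID = re.compile(r"\bG-[A-Z0-9]{6,}\b")


def detect_tags(html: str) -> dict[str, list[str]]:
    """Return the distinct GTM and GA4 IDs found in one HTML document."""
    return {
        "gtm": sorted(set(GTM_ID.findall(html))),
        "ga4": sorted(set(GA4_ID.findall(html))),
    }


def pages_missing_gtm(site_dir: str = "_site") -> list[str]:
    """List built HTML pages (relative paths) that contain no GTM container ID."""
    root = Path(site_dir)
    missing = []
    for page in root.rglob("*.html"):
        html = page.read_text(encoding="utf-8", errors="ignore")
        if not detect_tags(html)["gtm"]:
            missing.append(str(page.relative_to(root)))
    return sorted(missing)


if __name__ == "__main__":
    site = Path("_site")
    if site.is_dir():
        for path in pages_missing_gtm(str(site)):
            print("missing GTM:", path)
    else:
        print("no _site directory found; build the site first")
```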
A 49× Keyword Gap and Zero Category Rankings
Datafold owns every search term that defines Recce's use case.
Every Core Use Case Query Is Owned by Datafold
Critical: Recce has 29 ranked US keywords, almost entirely branded. Datafold ranks for 1,430 keywords, including 16 at #1 and 65 in positions #2-3. The specific queries that define Recce's use case show no top-10 Recce presence:
- data diff dbt: no ranking
- dbt data quality: no ranking
- PR data review automation: no ranking
- dbt column lineage: ranks #12, the only one of these five queries where Recce appears at all
- data pipeline change review: no ranking
LLMs, and the web-search layers they increasingly rely on, draw heavily on pages that rank for these terms. With no organic presence on category queries, Recce is largely invisible in the content pool LLMs draw from for tool recommendations. ChatGPT's failure to mention Recce in use-case queries isn't surprising: there is little in its training data or live search results to pull from.
Fix: Build a category content cluster targeting these exact queries over 6 months; this is a sustained program, not a sprint. Start with landing pages: "Recce for dbt Data Diff" and "How to Review Data Pipeline Changes Before Merging PRs". Support them with AI blog articles.
| Domain | Ranked Keywords (US) | ETV | Top-3 Positions |
|---|---|---|---|
| reccehq.com | 29 | 552 | 1 |
| datafold.com | 1,430 | 3,694 | 81 |
| elementary-data.com | 286 | 1,271 | 22 |
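The 49× headline is a ratio of ranked-keyword counts. Recomputing it from the table above, alongside the traffic (ETV) ratio, shows the traffic gap is much smaller than the keyword gap, consistent with Recce's small keyword set being almost entirely branded. A minimal Python check using only the figures in the table:

```python
# Figures copied from the US keyword table above (DataForSEO Labs, location_code 2840).
DOMAINS = {
    "reccehq.com": {"keywords": 29, "etv": 552, "top3": 1},
    "datafold.com": {"keywords": 1430, "etv": 3694, "top3": 81},
    "elementary-data.com": {"keywords": 286, "etv": 1271, "top3": 22},
}


def gap(competitor: str, metric: str, base: str = "reccehq.com") -> float:
    """How many times larger `competitor` is than `base` on one metric."""
    return DOMAINS[competitor][metric] / DOMAINS[base][metric]


if __name__ == "__main__":
    for competitor in ("datafold.com", "elementary-data.com"):
        kw = gap(competitor, "keywords")
        etv = gap(competitor, "etv")
        print(f"{competitor}: {kw:.1f}x keywords, {etv:.1f}x ETV")
```

This prints roughly 49.3× keywords vs 6.7× ETV for Datafold, and 9.9× vs 2.3× for Elementary.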
Self-Authored Comparison Content Is Working — Scale It
Bright Spot: blog.reccehq.com ranks #1 for "Recce vs Datafold vs Elementary", a query Recce owns entirely. Google did not trigger an AI Overview for this query during testing, so the win is organic, but it shows that Recce's content strategy works when it targets the right query type.
The pattern: self-authored comparison content targeting [Recce vs Competitor] queries produces high-intent pages that rank well. One article has already earned a #1 ranking. The playbook needs to be scaled: 3-5 more comparison articles targeting adjacent queries ("Recce vs data-diff library", "Recce vs Great Expectations for dbt teams", "Recce vs Monte Carlo for small data teams") would build a comparison content moat.
Blog Content Blocked From Contributing to Main Domain Authority
Medium: The AI blog produces ranking content (data diff tutorials, dbt PR review guides) and earns external citations. But all of this SEO value accumulates at blog.reccehq.com, a subdomain. The main domain at reccehq.com doesn't benefit from any of this organic traction.
Search engines and LLMs tend to aggregate authority signals by host, so every link, ranking, and citation on the blog strengthens blog.reccehq.com in its own right, while reccehq.com, the product domain, stays at 29 keywords. The content is being written; the authority is accruing to the wrong domain.
Strong Technical Brand. Weak Language Association.
ChatGPT thinks 'recce' means military reconnaissance. The community content to fix this doesn't exist yet.
Brand Name Collision with Military Slang Is Suppressing All LLM Association
Critical: "Recce" is standard British English for reconnaissance. It appears in thousands of military training documents, wargaming communities, and tactical fiction, all of which are in LLM training data. When ChatGPT processes a query for "What is Recce?", it surfaces the dominant association: military.
This isn't an SEO problem. No amount of on-site optimization will override the volume of military-context "recce" in LLM training corpora. The fix requires building off-site citation volume that explicitly associates "Recce" (or "Recce dbt tool" / "Recce data review agent") with the software product:
- Dev.to, Hashnode, DZone: technical articles using "Recce (the dbt AI review tool)" consistently
- Reddit r/dataengineering, r/dbt: community posts using "Recce" in tool context
- G2 reviews: buyer-language descriptions of what Recce does
- GitHub Discussions and issues that reference the Recce product by name
Target: 20-30 distinct third-party sources that use "Recce" in a software/data context within 90 days.
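To keep the 20-30 target measurable, "distinct source" needs a working definition. The Python sketch below uses one assumed definition: a source is a unique page on a host outside reccehq.com and its subdomains, with hosts also counted separately so that several posts on one platform show up as such. The URLs in the example are hypothetical.

```python
from urllib.parse import urlsplit

OWNED_DOMAINS = ("reccehq.com",)  # owned properties, including subdomains such as blog.


def is_owned(host: str) -> bool:
    return any(host == d or host.endswith("." + d) for d in OWNED_DOMAINS)


def tally_sources(urls: list[str]) -> tuple[int, int]:
    """Return (distinct third-party pages, distinct third-party hosts)."""
    pages: set[tuple[str, str]] = set()
    hosts: set[str] = set()
    for url in urls:
        parts = urlsplit(url)
        host = (parts.hostname or "").removeprefix("www.")
        if not host or is_owned(host):
            continue
        # Ignore query strings and trailing slashes so tracking links don't double count.
        pages.add((host, parts.path.rstrip("/") or "/"))
        hosts.add(host)
    return len(pages), len(hosts)


if __name__ == "__main__":
    # Hypothetical placements for illustration.
    print(tally_sources([
        "https://dev.to/someone/recce-intro",
        "https://www.reddit.com/r/dbt/comments/example/",
    ]))
```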
ChatGPT Misidentifies the Product Category When It Does Respond
High: In the direct comparison query ("Compare Recce vs Datafold vs Elementary"), ChatGPT acknowledged Recce's existence but described it as an observability/anomaly detection tool, not a data review agent or dbt PR companion. This looks like a training-data misclassification: the most prominent descriptions of Recce in LLM training data are likely early-stage descriptions that predate the AI data review agent positioning.
Until third-party content using the correct positioning accumulates, ChatGPT will continue to misrepresent Recce's category to prospects doing AI-assisted research. A buyer comparing Recce to Datafold using ChatGPT right now gets an inaccurate product description.
Fix: All third-party content placements should consistently use the phrase "AI data review agent" and explicitly describe the dbt PR use case. Expect the LLM association to shift over 6-12 months as new crawls and model updates pick this content up.
AI Crawler Optimization Is Genuinely Ahead of Category Peers
Bright Spot: Recce's GitHub repository includes AGENTS.md, CLAUDE.md, and a claude-plugin, artifacts written for AI coding agents that virtually no SaaS company in this category has shipped. Combined with llms.txt and llms-full.txt on the main domain, Recce has built the infrastructure for AI-native brand discovery before most competitors have considered it.
The AI blog section's use of SpeakableSpecification schema, which marks the passages best suited to text-to-speech playback (a Google feature still in beta and aimed mainly at news content), is forward-looking. These are the kinds of signals likely to matter more as AI crawlers grow more sophisticated.
The gap: this infrastructure is ahead of the citation chain it needs to be effective. The technical readiness is in place; the third-party volume hasn't caught up yet.
90-Day Path From LLM Invisible to Category Cited
Quick schema wins in week 1. Citation chain built over 90 days.
Chart: Site Readiness Score and LLM Visibility Score
Horizon 1: Schema + Analytics (0-2 Weeks)
Extends Recce's existing schema investment to the pages that matter most. Installs the analytics foundation every growth program requires.
Add SoftwareApplication + FAQPage JSON-LD schema to the homepage (a Google-recognized applicationCategory such as DeveloperApplication; dbt integration stated in featureList)
Add FAQPage schema to the pricing page covering the top 5 buyer questions
Add BreadcrumbList schema sitewide
Install GA4 + GTM via GitHub Pages — configure conversion events for trial signups and docs visits
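BreadcrumbList markup is mechanical enough to generate from each page's position in the site. A minimal Python sketch follows; the page names and paths in the example are hypothetical, and in practice this logic would more likely live in the static-site template than in a standalone script.

```python
import json
from urllib.parse import urljoin


def breadcrumb_jsonld(base_url: str, trail: list[tuple[str, str]]) -> dict:
    """Build a schema.org BreadcrumbList from (name, path) pairs, root first."""
    items = []
    for position, (name, path) in enumerate(trail, start=1):
        items.append({
            "@type": "ListItem",
            "position": position,  # 1-based order from the root
            "name": name,
            "item": urljoin(base_url, path),  # absolute URL for each crumb
        })
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


if __name__ == "__main__":
    # Hypothetical trail for a page under the main-domain blog.
    trail = [("Home", "/"), ("Blog", "/blog/"), ("Example Post", "/blog/example-post/")]
    print(json.dumps(breadcrumb_jsonld("https://reccehq.com/", trail), indent=2))
```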
Horizon 2: Build the Citation Chain (2-8 Weeks)
Begins overriding the military 'recce' association in LLM training data. Establishes the third-party citation volume needed for category AI Overview inclusion.
Publish 3-5 technical articles on Dev.to, Hashnode, and DZone — consistently using 'Recce (the dbt AI review tool)' phrasing to build LLM software association
Create a G2 listing under 'Data Quality Tools' and 'DataOps' categories, targeting Recce's absence from AIO on category queries
Launch targeted presence in r/dataengineering and r/dbt: contribution strategy using 'Recce' in tool context (not promotion — genuine problem-solving)
Scale self-authored comparison content: add 'Recce vs data-diff library', 'Recce vs Great Expectations for dbt', 'Recce vs Monte Carlo for small teams'
Horizon 3: Consolidate Authority (Q2 2026)
Closes the subdomain authority split. Connects the content investments from Horizon 2 to a single, authoritative domain.
Migrate blog from blog.reccehq.com to reccehq.com/blog/ with 301 redirects — consolidates all blog link equity to the main product domain
Publish category landing pages, one per target query: data diff dbt, dbt data quality, PR data review, dbt column lineage, data pipeline change review
Submit Recce to Capterra and Slashdot (SourceForge) in DevOps/DataOps categories
Consider domain rename from reccehq.com to recce.dev or recceapp.com — evaluate brand equity vs homonym collision cost at 6-month mark
Agentic Marketing Systems, Built by Senior Operators
Novastacks is not an agency selling AI as a buzzword. We are senior marketing operators with decades of experience at Expedia, Tencent, Klook, and Traveloka who built enterprise-grade AI marketing systems from the ground up.
What We Do
- AEO (Answer Engine Optimization): Get your brand cited when prospects ask ChatGPT, Perplexity, and Google AI about your category
- SEO Integration: Traditional search visibility that compounds with AI visibility
- Custom AI Growth Systems: Agentic workflows, content engines, and data pipelines built for your business
- Fractional Growth Partner: Senior strategic leadership without the full-time overhead
What You Get From Us
Solutions designed by operators who've led growth marketing and SEO at the Director/VP level. Not juniors following playbooks.
AI-powered workflows that move at machine speed. Audits, content, optimization, and reporting that would take a team weeks, delivered in days.
No bloated retainers. Scope of work tailored to your stage, budget, and goals. Start small, scale when you see results.
Ready to Be Recommended by AI?
Book a 30-minute call with us. We'll walk through this assessment, answer your questions, and map out exactly what it takes to get Recce cited by ChatGPT and Perplexity when dbt teams are evaluating data review tools.
Book Discovery Call: 30 min | Free | No commitment