June 2026
Proof, Not Promises: A New Bar for Behavioral AI



By Sophia Lanuzo, Melrose Tia, and Jason Albia

This article surveys the competitive landscape in the AI-driven behavioral simulation space. The central finding: the industry is converging on internal accuracy claims while avoiding the harder question of commercial validity. Predikta's landmark backtesting study with AdSpark, the first of its kind in the behavioral simulation space, directly addresses that gap, and reframes what rigorous validation should look like for every platform in this category.
01 · The Validation Problem
What the Industry Is Not Measuring
The synthetic audience and behavioral simulation market has moved from research curiosity to contested commercial territory in under two years. From well-funded U.S. startups like Aaru to specialized European platforms like Lakmoos, vendors increasingly position validation and predictive accuracy as their primary differentiators. In practice, these claims are grounded in measures of simulation fidelity: how closely does a simulated response match a known human response on a held-out dataset?
That is a meaningful metric — and an important first step. It demonstrates that simulation can reproduce patterns observed in existing human data. But replication alone is not enough. The real test is whether the behavioral signals translate into real-world outcomes: does a higher simulation sentiment score translate to a better-performing ad in the real world, against real audiences, on real platforms?
To our knowledge, Predikta is the first behavioral AI simulation platform to move beyond response replication. It conducted a blind backtest against independently sourced historical campaign data from a third-party agency, measuring whether higher simulation scores are associated with better real-world click-through rate (CTR) and cost-per-click (CPC) outcomes. A simulation score that predicts human responses but cannot predict which ad drives more clicks is an interesting research result; whether it can support real marketing decisions remains an open question. The Predikta × AdSpark backtesting study was designed to close that gap.
02 · Predikta's Evidence Architecture
Two Layers, Two Different Questions
Most platforms in this space have a single layer of evidence: internal benchmarking against their own data. Predikta has deliberately built two distinct and complementary validation layers, each answering a question the other cannot.
L1 — Scientific Foundation: Rigorous Internal Validation
Published as Sentiment Simulation using Generative AI Agents (Tia et al., arXiv:2505.22125, May 2025), the study grounded agents in a nationally representative survey of 2,485 Filipinos. The agents achieved 92% alignment with original survey responses and 81–86% accuracy in predicting ground-truth human sentiment, with high stability across repeated trials (±0.2–0.5% SD) and negligible sensitivity to framing variation (p = 0.9676, Cohen's d = 0.02). Platform benchmarks reached 88% simulation accuracy at the national level and 96% behavioral fidelity in psychographic representation. The question it answers: does the simulation faithfully represent how real people think and feel?
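To make the framing-sensitivity figure concrete, the sketch below computes Cohen's d using the standard pooled-standard-deviation formula. It is an illustration only: the function name, the two framing conditions, and the accuracy values are hypothetical, and the paper's exact test design is not described in this article.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(a, b):
    """Cohen's d for two independent samples, using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Hypothetical accuracy scores from repeated trials under two framings
# (made-up numbers, not data from the study).
neutral_framing = [0.84, 0.85, 0.83, 0.86, 0.84]
loaded_framing = [0.85, 0.84, 0.84, 0.85, 0.83]

d = cohens_d(neutral_framing, loaded_framing)
print(f"Cohen's d = {d:.2f}")
```

By the usual convention, an absolute d below 0.2 is read as a negligible effect, which is the sense in which a value of 0.02 indicates that framing barely moves the agents' outputs.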
L2 — Commercial Validation: Blind External Backtesting with AdSpark
Conducted in partnership with AdSpark and supported by 917Ventures and Brave Connective, the study evaluated real historical campaigns across multiple industries. Predikta scored ads in blind simulations, and those scores were benchmarked against independently sourced KPI data across nearly 20,000 ad pairs. Predikta achieved a 20-percentage-point advantage over random selection in identifying the stronger-performing ad, while ads in the top 20% of Predikta scores delivered 3.4× higher CTRs and 2.6× lower CPCs than lower-scoring ads. The question it answers: can simulated audience responses reliably identify winning campaigns in the real market?
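One way to read the top-20% figures is as ratios between two groups of ads: those ranked in the top fifth by simulation score, and everyone else. The sketch below shows that calculation under stated assumptions. The field names, the pooled (impression-weighted) aggregation, and the sample data are all hypothetical; the study's actual aggregation method is not described here.

```python
def top_quantile_lift(ads, top_fraction=0.2):
    """Compare pooled CTR and CPC of the top-scored ads against the remaining ads."""
    ranked = sorted(ads, key=lambda ad: ad["score"], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    top, rest = ranked[:cutoff], ranked[cutoff:]

    def ctr(group):
        # Pooled click-through rate: total clicks over total impressions.
        return sum(a["clicks"] for a in group) / sum(a["impressions"] for a in group)

    def cpc(group):
        # Pooled cost per click: total spend over total clicks.
        return sum(a["spend"] for a in group) / sum(a["clicks"] for a in group)

    return {
        "ctr_ratio": ctr(top) / ctr(rest),  # > 1 means top ads earn more clicks per impression
        "cpc_ratio": cpc(rest) / cpc(top),  # > 1 means top ads pay less per click
    }


# Made-up campaign records, for illustration only.
sample_ads = [
    {"score": 0.9, "impressions": 1000, "clicks": 40, "spend": 200.0},
    {"score": 0.6, "impressions": 1000, "clicks": 10, "spend": 100.0},
    {"score": 0.5, "impressions": 1000, "clicks": 20, "spend": 200.0},
    {"score": 0.4, "impressions": 1000, "clicks": 10, "spend": 100.0},
    {"score": 0.2, "impressions": 1000, "clicks": 10, "spend": 100.0},
]

lift = top_quantile_lift(sample_ads)
print(lift)
```

Under this reading, a "3.4× higher CTR" corresponds to a `ctr_ratio` of 3.4 and a "2.6× lower CPC" to a `cpc_ratio` of 2.6. Averaging per-ad rates instead of pooling would be an equally plausible choice and can give different numbers.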
The significance of the two-layer structure is its logical completeness. Layer 1 establishes that the behavioral simulation model is accurate. Layer 2 establishes that behavioral accuracy has predictive value in the market. Both questions must be answered before a marketing team can have justified confidence in using simulation scores to make pre-launch decisions — and Predikta is the only platform in this roundup that has answered both.

03 · The Backtesting Study
Design, Results, and Implications
The study was designed to avoid the methodological shortcuts that make internal validation exercises easy to dismiss. Three design choices defined its credibility.
Independent data source: campaign data came directly from AdSpark's own historical records (real campaigns run for real clients, with real performance outcomes already recorded), spanning telecommunications, IT and software services, banking and finance, and others, with Predikta having no involvement in selecting which campaigns were included.
Blind simulation: Predikta's behavioral scores were generated without access to the actual campaign performance outcomes.
Real commercial metrics: performance was evaluated against CTR and CPC rather than proprietary engagement proxies or internal scoring.
+20 pts
Advantage over random chance (50%) in head-to-head ad selection
3.4×
Higher CTR for top-20% Predikta scores vs. lower-scoring ads
2.6×
Lower CPC for top-20% Predikta scores vs. lower-scoring ads
Across nearly 20,000 head-to-head ad comparisons spanning multiple industries, Predikta correctly identified the stronger-performing ad about 70% of the time for conversion ads, based entirely on simulated audience responses compared against real campaign performance. In a setting where random guessing yields 50% accuracy, a 20-percentage-point advantage achieved before any ad was ever deployed is not a marginal result. Ads in the top 20% of Predikta scores delivered 3.4 times the CTR of lower-scoring ads at a cost per click 2.6 times lower. That is commercially significant before a single peso is allocated to a live campaign.
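The head-to-head figure can be understood as pairwise accuracy: for each pair of ads, check whether the ad with the higher simulation score was also the one with the better observed outcome. The sketch below is a minimal illustration under assumptions. It pairs every ad with every other ad, uses CTR as the outcome, and runs on made-up data; how the study actually formed its pairs is not specified in this article.

```python
from itertools import combinations


def pairwise_accuracy(ads):
    """Fraction of ad pairs in which the higher-scored ad also has the higher CTR."""
    correct = total = 0
    for a, b in combinations(ads, 2):
        if a["score"] == b["score"] or a["ctr"] == b["ctr"]:
            continue  # ties carry no ranking information, so skip them
        total += 1
        if (a["score"] > b["score"]) == (a["ctr"] > b["ctr"]):
            correct += 1
    return correct / total if total else float("nan")


# Made-up scores and outcomes, for illustration only.
sample_ads = [
    {"score": 0.9, "ctr": 0.030},
    {"score": 0.7, "ctr": 0.010},
    {"score": 0.5, "ctr": 0.020},
    {"score": 0.3, "ctr": 0.005},
]

accuracy = pairwise_accuracy(sample_ads)
edge_over_chance = accuracy - 0.5  # random guessing picks the winner half the time
print(f"pairwise accuracy = {accuracy:.3f}, edge over chance = {edge_over_chance:.3f}")
```

With this definition, an accuracy of about 0.70 gives an edge of about 0.20 over chance, which is the 20-percentage-point advantage described above.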
The study also identified where the advantage is strongest: Awareness and Conversion campaigns, the two objective types where creative quality has the most direct influence on outcomes. In these campaigns, success depends less on platform optimization or algorithms and more on how audiences think, feel, and respond to the ad message itself, which is precisely the layer Predikta's psychographic simulation is designed to model.
"Testing Predikta against real-world campaign outcomes has been a long-standing priority for our team. Moving beyond replication of survey responses to evaluating Predikta's output against observed campaign performance represents a meaningful development step. Opportunities to conduct this type of commercial validation are rare, and seeing the relationships hold up against observed campaign performance data reinforced confidence in the methodological foundation behind Predikta."
— Sophia Lanuzo, Lead R&D
What This Means for the Industry
Every competitor in the synthetic audience and behavioral simulation space makes accuracy claims. None has published a blind backtesting exercise against independently sourced, real-world campaign performance data. Predikta is the only behavioral simulation platform with a published scientific foundation and an externally validated commercial proof point built on real ad metrics. That is not a marginal differentiator — it is a different category of evidence entirely, and it sets a standard the rest of the industry will now need to respond to.
04 · Research Frontier
Academic Work Shaping the Science
The backtesting results do not stand alone. The broader academic literature on psychographically grounded agent simulation is converging on findings that reinforce what the AdSpark study demonstrates in commercial terms.
Our Foundational Paper (arXiv:2505.22125)
The paper established the behavioral architecture that the backtesting study later tested in a real commercial setting. It instantiated agents from survey data covering personality, values, beliefs, and socio-political attitudes, and showed that agents could maintain stable, psychographically anchored outputs regardless of how scenarios were framed. This is the behavioral property that makes real-world commercial prediction possible: a model that is highly sensitive to stimulus framing cannot produce reliable pre-launch predictions.
Adjacent Research Worth Knowing
LLM Agents Grounded in Self-Reports (arXiv:2411.10109). In a study of 1,052 Americans, agents built from qualitative interview and structured survey data reached 82–86% agreement on held-out social attitude items, compared with 74% for demographics-only agents. This is independent support for grounding agents in rich psychographic data rather than demographic profiling alone.
AgentSociety (arXiv:2502.08691). A large-scale social simulation framework using LLM-driven agents that integrate Maslow's hierarchy of needs, emotional states, motivations, and cognitive processes — signaling where the frontier is heading: motivational agent architectures that model dynamic needs and affective states beyond static profile-based simulation.
Polarization Simulation (Springer, 2025). A 100-agent study grounded in psychometric and demographic data from Serbian social media users shows that psychographically grounded agents can produce coherent ideological clustering and differentiated behavior — adjacent support for the stability property Predikta's paper documents.
Open challenge: a 2025 meta-review notes that LLMs struggle in scenarios that more accurately reflect real-world conditions — particularly around emotional nuance, cultural context, and group dynamics. Predikta's Philippines-specific, survey-grounded dataset directly responds to the cultural-context critique, and the AdSpark backtesting results are evidence that the response is working.

05 · Product Landscape
Who Is Doing What
The space is increasingly converging around the same question: not whether synthetic respondents can be generated, but how their outputs are validated. Most visible players remain Western or globally generalized in their data orientation. To our knowledge, no other platform has published a comparable validation architecture in the Philippines or broader Southeast Asia — one that integrates local survey grounding, population-scale modeling, and externally verified commercial outcomes.
Aaru
The highest-profile funded competitor. Founded March 2024, reported to have raised a Series A at a $1B headline valuation (Redpoint Ventures), backed by Accenture Ventures with anchor partnerships at IPG and EY. Uses multi-agent AI across public and proprietary data; best known for correctly predicting the New York Democratic primary. Its validation is event-prediction based — not tied to ad creative performance — with no peer-reviewed paper and no commercial backtest against CTR, CPC, or conversion data published.
Atypica.AI
An AI consumer research and product strategy platform claiming a 300,000+ persona library built from in-depth interviews, rapid time-to-insight, and 100× cost efficiency versus traditional agencies. Validation rests on internal benchmarks and client testimonials, with no peer-reviewed accuracy figures or external ad-performance backtest published.
Evidenza
The most direct product-use-case competitor to Predikta's Campaign Simulation Lab. Creates audience-specific synthetic personas to validate brand messaging, advertising creatives, and copy variations across demographic and psychographic segments. No published peer-reviewed accuracy metrics or commercial backtesting against live campaign performance found.
Synthetic Users
General-purpose synthetic research participants for interviews, concept tests, surveys, and usability studies, using a multi-agent architecture coordinating multiple LLMs. SOC 2 compliant and strong in UX and product research, but less focused on campaign sentiment or marketing performance forecasting; validation emphasizes qualitative realism over quantitative accuracy against real-world outcomes.
Lakmoos
Differentiates on architecture — neuro-symbolic AI (neural networks combined with symbolic reasoning) rather than pure LLMs — and reports 98%+ similarity scores across 20 client benchmark studies in 2025, with a Belkin case study. Benchmark figures are client-sourced, not peer-reviewed or independently verified against real campaign performance.
Ditto / Deepsona
A cluster of specialized platforms focused on concept validation, pricing, messaging, and go-to-market research. Ditto emphasizes synthetic persona panels and fast qualitative-to-quantitative studies; Deepsona positions itself as a predictive marketing simulation platform. Both operate primarily in Western markets with no published culturally specific models for Southeast Asia.

06 · Competitive Validation Matrix
A Standard the Field Must Now Meet
The comparison below summarizes the validation landscape across five dimensions that matter when deciding whether simulation scores can be trusted for real media-budget decisions.
Public Scientific Evidence
Predikta — Yes. arXiv:2505.22125; 92% profile alignment; 88% national-level simulation accuracy.
Others — No or partial; generally no platform-specific published scientific evidence.
External Commercial Backtest
Predikta — Yes. AdSpark study; ~20,000 ad pairs; blind design.
Others — No or partial; some validation claims or use cases, but no published ad-performance backtests.
Independent Data Source
Predikta — Yes. AdSpark historical campaigns, independently sourced.
Others — Often internal benchmarks, partner datasets, or non-independent sources.
Real Ad Metrics (CTR / CPC)
Predikta — Yes. 3.4× CTR uplift; 2.6× CPC reduction; ~70% head-to-head accuracy.
Others — Public CTR/CPC validation metrics generally not disclosed.
Cultural / Market Grounding
Predikta — Philippines-specific; 2,485 real survey respondents; 70M-population model.
Others — Typically global, Western, or European; audience-based models without comparable survey-grounded population modeling.
At a high level, Predikta is the only platform in this comparison with publicly documented evidence across all five dimensions. Other platforms may offer validation claims, partner integrations, or market-specific capabilities, but publicly available evidence is generally partial, unpublished, or not directly tied to campaign-level advertising outcomes.
Sources & References
[1] Tia, M., Lanuzo, J.S., Baltazar, L.R., Lopez-Relente, M.J., Quiñones, D.M., Albia, J. (2025). Sentiment Simulation using Generative AI Agents. arXiv:2505.22125 [cs.MA]. Netopia AI / UP Diliman / UP Los Baños.
[2] Netopia AI and AdSpark (May 2026). Predikta × AdSpark Backtesting Study — Press Release. Key figures: 3.4× CTR, 2.6× CPC reduction, ~70% head-to-head accuracy, ~20,000 ad pairs, an estimated 12M additional clicks and up to ₱50M in media value.
[3] Greenbook GRIT Report (2025). AI adoption in qualitative research — 72% figure, cited via Perspective AI blog, April 2026.
[4] PyMC Labs (2026). Synthetic Consumers: A Practical Guide. pymc-labs.com. Synthetic data 50% projection by 2027.
[5] TechCrunch (Dec 2025). AI synthetic research startup Aaru raised a Series A at a $1B headline valuation.
[6] Accenture Newsroom (Mar 2025). Accenture Invests in and Collaborates with AI-Powered Agentic Prediction Engine Aaru.
[7] Marketing Dive (Aug 2025). IPG partners with Aaru for AI-powered consumer simulations.
[8] AiMultiple (Mar 2026). Synthetic Users Explained: Top 7 AI User Research Tools.
[9] Ditto (Feb 2026). Synthetic Research Platforms: The 2026 Market Map. askditto.io.
[10] arXiv:2411.10109 (Nov 2024). LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals.
[11] arXiv:2502.08691 (Feb 2025). AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents.
[12] Springer (2025). Agent-Based Simulation of Politicized Topics Using Large Language Models (RecSysLLMsP).
[13] Atypica.AI Blog (Dec 2025). 10 Best AI Market Research Tools in 2025.


