Test Methodology & Science

This page explains how Testyra tests are built, what they measure, how scoring works, and where their limitations are. Tests are reviewed periodically for accuracy and updated when research in relevant areas develops meaningfully. Questions or corrections can be sent through the contact page.

Testyra is maintained by a small team of developers and researchers with backgrounds in cognitive psychology and web development. We are not a clinical institution. That distinction matters and shapes everything on this page.

Design Principles

Every test is built around three requirements:

It must measure something specific. Tests here target defined cognitive processes — rule switching, inhibitory control, digit span, response selection — rather than broad constructs like intelligence as a single number. Specific measurements are more honest and more useful, even when they feel less impressive.
The mechanics must reflect what is being measured. A test claiming to measure sustained attention should require sustained attention to complete. Most Testyra tests are performance based for this reason, though self report formats are used where performance measurement is not appropriate for the construct.
Results should be interpretable without a specialist. Every test includes plain language explanations of what scores may suggest, alongside honest statements of what they cannot tell you.

How Tests Are Validated in Practice

We do not have access to clinical validation infrastructure. What we do instead is more modest and more honest.

Each test goes through internal review against published research on the target construct before launch. We check that the task format has precedent in peer reviewed literature, that the scoring logic reflects how the construct is actually measured in research settings, and that the language describing results does not overclaim what the test can support.

After launch, we monitor user feedback on each test for recurring patterns. Consistent reports that results feel inaccurate or misleading trigger a content review. This is not a substitute for clinical validation, but it is a real quality process, and we think it is worth describing accurately rather than overstating.

Test Categories

Memory and Attention

This category covers working memory capacity, sequential recall, visual memory, sustained attention, and cognitive flexibility. These abilities have been associated with learning outcomes and professional performance across a substantial body of research, though the size and consistency of those associations vary by study design and population. The constructs behind the 9 tests here are among the most straightforward to measure on the platform.

Tests: Adaptability, Attention Span, Attention to Detail, Cognitive Stamina, Focus Meter, Number Memory, Sequence Memory, Visual Memory, Word Memory.

Logic and Problem Solving

This category covers analytical reasoning, argument evaluation, pattern recognition, ethical judgment, risk evaluation, and decision quality. Most tests here are untimed because reasoning quality matters more than speed in most real contexts. Scoring in this category involves more interpretation than speed tests — what counts as the most defensible answer in an ethical dilemma is not the same kind of measurement as a reaction time — and results should be read with that in mind.

Tests: Critical Thinking, Decision Making, Digital Age Ethics, Ethical Dilemmas, Logic Puzzles, Logical Reasoning, Mental Math, Pattern Recognition, Problem Solving, Risk Assessment, Strategic Thinking.

Creativity and Thinking Styles

This is the category where we are most cautious. Creativity and thinking style measurement is genuinely harder than memory or reasoning assessment — the constructs are broader, the scoring is less objective, and the research is more contested. Divergent thinking tests like Alternative Uses have reasonable empirical grounding going back to Guilford's work in the 1950s. Self report assessments like Learning Style and Personality are better treated as structured reflection tools than precision measurements. If results here feel inaccurate, that reaction is worth trusting more than the score.

Tests: Alternative Uses, Creativity, Learning Style, Personality, Spatial Awareness, Story Creation, Word Association.

Speed and Focus

Reaction time, impulse control, choice speed, and visual focus tests produce the most consistent and reproducible scores on the platform when testing conditions are comparable. Reaction time is a relatively clean output with less interpretive uncertainty than reasoning or creativity scores. These tests are also the most sensitive to temporary factors: fatigue, caffeine, time of day, and device quality introduce more variability here than in other categories. A score taken at 11pm after coffee measures something real, but not quite the same thing as a rested baseline.
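To make the point about temporary factors concrete, here is a minimal sketch of how a single distracted trial can shift a reaction time summary. It is an illustration only, not a description of how Testyra computes scores: the function name, the 100 ms anticipation cutoff, and the sample values are all assumptions made for the example.

```python
from statistics import mean, median

# Assumed cutoff: responses faster than this are treated as anticipations,
# meaning the press likely happened before the stimulus was processed.
MIN_PLAUSIBLE_MS = 100


def summarize_reaction_times(times_ms):
    """Return the mean and median of plausible reaction times in milliseconds."""
    valid = [t for t in times_ms if t >= MIN_PLAUSIBLE_MS]
    if not valid:
        raise ValueError("no plausible reaction times to summarize")
    return {
        "mean_ms": mean(valid),
        "median_ms": median(valid),
        "trials": len(valid),
    }


# Hypothetical short run with one distracted trial (905 ms)
# and one anticipation (60 ms).
summary = summarize_reaction_times([212, 230, 198, 905, 221, 60])
print(summary)
```

In this made-up run the median stays close to the typical trial while the mean is pulled upward by the one slow response, which is one reason a single session is better read as a rough indication than a fixed value.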

Tests: Choice Reaction, Go/No-Go, Reaction Time, Reflex, Stroop.

Scoring Approaches

Level based: difficulty increases until performance breaks down. The score is the level reached, which reflects peak performance before errors accumulated. Used by Number Memory, Sequence Memory, Attention to Detail.
Time and accuracy combined: speed and correctness are scored together. Used by Reaction Time, Reflex, Choice Reaction.
Percentage accuracy: a fixed set of scenarios with defensible best answers. Used by Critical Thinking, Decision Making, Risk Assessment. Carries more interpretive uncertainty than speed scores.
Profile based: no correct answer. Produces a descriptive output rather than a performance score. Used by Personality, Learning Style, Ethical Dilemmas.
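
The four approaches above can be sketched in a few lines each. This is a hedged illustration, not Testyra's actual scoring code: every function name, the two-failures stopping rule, and the sample data are assumptions. For the combined time and accuracy case, the sketch uses the inverse efficiency score (mean correct reaction time divided by proportion correct), one common way to combine the two in research; the page does not state which formula the platform uses.

```python
from collections import Counter
from statistics import mean


def level_based_score(attempts, max_failures=2):
    """Highest level passed before `max_failures` failed attempts in a row.

    `attempts` is a sequence of (level, passed) pairs in the order played.
    """
    best_level = 0
    failures_in_a_row = 0
    for level, passed in attempts:
        if passed:
            best_level = max(best_level, level)
            failures_in_a_row = 0
        else:
            failures_in_a_row += 1
            if failures_in_a_row >= max_failures:
                break  # performance has broken down, so the run ends
    return best_level


def inverse_efficiency(trials):
    """Mean correct reaction time / proportion correct (lower is better).

    `trials` is a sequence of (reaction_time_ms, correct) pairs.
    """
    correct_times = [rt for rt, correct in trials if correct]
    if not correct_times:
        raise ValueError("no correct trials to score")
    proportion_correct = len(correct_times) / len(trials)
    return mean(correct_times) / proportion_correct


def percentage_accuracy(answers, answer_key):
    """Percentage of answers that match the designated best answers."""
    if len(answers) != len(answer_key):
        raise ValueError("answers and answer_key must be the same length")
    matches = sum(a == k for a, k in zip(answers, answer_key))
    return 100 * matches / len(answer_key)


def profile_summary(responses):
    """Tally category labels; there is no right answer, only a distribution."""
    return Counter(responses).most_common()


# Hypothetical sample data for each approach.
print(level_based_score([(1, True), (2, True), (3, False), (3, True),
                         (4, False), (4, False), (5, True)]))
print(inverse_efficiency([(300, True), (350, True), (400, False), (250, True)]))
print(percentage_accuracy(["a", "b", "c", "d"], ["a", "b", "d", "d"]))
print(profile_summary(["visual", "verbal", "visual"]))
```

The first three functions return a single number that can be compared across attempts, while the profile summary returns a distribution of labels, which is why profile based results are descriptive rather than a measure of performance.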

What These Tests Are Not

Testyra tests are not clinical assessments and are not diagnostic tools. Nothing here should inform decisions about any cognitive or psychological condition, such as ADHD, learning disabilities, or dementia. A qualified clinician is the appropriate resource for those questions.

Single session scores are affected by factors unrelated to underlying ability. Sleep quality, stress, ambient noise, and format familiarity all introduce variability. Results are best treated as a starting point rather than a fixed measurement.
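
One way to act on this is to look at several sessions together instead of a single result. The sketch below is an illustration under stated assumptions, not a feature of the platform: the function name and the sample scores are hypothetical, and the range shown is simply the average plus or minus one sample standard deviation.

```python
from statistics import mean, stdev


def summarize_sessions(scores):
    """Return the average score and a rough typical range across sessions."""
    if len(scores) < 2:
        raise ValueError("need at least two sessions to estimate variability")
    average = mean(scores)
    spread = stdev(scores)  # sample standard deviation across sessions
    return {"average": average, "low": average - spread, "high": average + spread}


# Hypothetical levels reached in five sessions taken on different days.
result = summarize_sessions([7, 9, 8, 6, 10])
print(result)
```

Reading a result as a range across sessions makes it harder to over-interpret one unusually good or bad attempt.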

Research Basis

The constructs our tests draw on come from established research traditions. Brief context on each:

Baddeley, A.D. & Hitch, G. (1974). Working memory. Psychology of Learning and Motivation, 8, 47–89. The foundational model for how short term memory and active information processing interact. Still the dominant framework in working memory research despite ongoing refinements.

Guilford, J.P. (1956). The structure of intellect. Psychological Bulletin, 53(4), 267–293. Introduced divergent thinking as a measurable construct distinct from IQ. Basis for most creativity assessment frameworks including Torrance's subsequent work.

Kahneman, D. & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291. Demonstrated systematic patterns in how people evaluate risk and make decisions under uncertainty. Basis for behavioral economics and our decision and risk assessment test design.

Donders, F.C. (1868/1969). On the speed of mental processes (W.G. Koster, Trans.). Acta Psychologica, 30, 412–431. Early foundational work, originally published in 1868 and cited here in its 1969 English translation, establishing that mental processing takes measurable time. The conceptual basis for all reaction time research since.

Testyra's specific implementations have not been independently validated against clinical populations.

Last updated: June 2026