The principle: measure before you recommend
Most app reviews are written from a few hours of clicking around. We don't publish a verdict until we've used an app daily for at least four weeks and, where an app makes a measurable claim, tested that claim against a reference. Every quantitative figure on this site links back to a re-runnable dataset, and no point estimate is published without a confidence interval.
The scoring rubric
We score on a continuous 0–10 scale with one decimal of precision, weighting six dimensions:
- Accuracy / does-what-it-claims (30%) — for trackers, measured error against reference; for other apps, whether core claims hold under sustained use.
- Adherence / sustainability (20%) — does the app keep you using it correctly over weeks, not just on day one.
- Core workflow speed (15%) — time-to-complete the primary task (e.g. time-to-log a meal).
- Data quality & depth (15%) — database accuracy, breadth, and the depth of what's tracked.
- Integrations & reliability (10%) — wearable/health-platform sync, crash behaviour, sync correctness.
- Pricing & value, including dark-pattern check (10%) — honest free tier, fair paid tier, no manipulative subscription flows.
The scale is anchored: 9.0+ exceptional and category-defining; 8.0–8.9 strong recommendation (Editor's Picks live here); 7.0–7.9 good for the right person; 6.0–6.9 okay, better options usually exist; 5.0–5.9 mediocre; below 5.0 avoid. We do not award a 10 — no app we have tested is perfect.
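As an illustration, the weighted rubric and the anchored bands can be sketched in a few lines of Python. The weights and band boundaries come from the text above. The dictionary keys, the function names, the assumption that each dimension is itself scored on 0–10, and the use of Python's built-in rounding are illustrative choices, not a description of the site's actual tooling.

```python
# Rubric weights in percent, taken from the list above.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "accuracy": 30,
    "adherence": 20,
    "workflow_speed": 15,
    "data_quality": 15,
    "integrations": 10,
    "pricing_value": 10,
}

# Lower bound of each band on the anchored scale, highest band first.
BANDS = [
    (9.0, "exceptional"),
    (8.0, "strong recommendation"),
    (7.0, "good for the right person"),
    (6.0, "okay"),
    (5.0, "mediocre"),
    (0.0, "avoid"),
]


def overall_score(dimension_scores):
    """Weighted average of six per-dimension scores, to one decimal.

    Assumes each dimension is scored on the same 0-10 scale.
    """
    if set(dimension_scores) != set(WEIGHTS):
        raise ValueError("expected exactly the six rubric dimensions")
    for name, value in dimension_scores.items():
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} score {value} is outside 0-10")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    # Rounding rule is an assumption; the rubric only specifies one decimal.
    return round(total / 100, 1)


def band(score):
    """Map an overall score to its label on the anchored scale."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    raise ValueError("score must be non-negative")


# Made-up per-dimension scores, purely to exercise the functions.
example = {
    "accuracy": 9.0,
    "adherence": 8.0,
    "workflow_speed": 8.0,
    "data_quality": 8.0,
    "integrations": 7.0,
    "pricing_value": 7.0,
}
score = overall_score(example)
label = band(score)
```

With these made-up inputs the weighted total is 810 / 100, so `score` is 8.1 and `label` is "strong recommendation". Integer percent weights are used so the weighted sum is not skewed by floating-point weights.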
The accuracy protocol (calorie & nutrition apps)
For calorie-tracking apps, "accuracy" is the single most consequential and most-misreported number in the category. Our protocol is built to be reproducible:
- Reference standard. A panel of n=40 reference meals spanning packaged foods, home-cooked single-ingredient dishes, multi-component plates, and restaurant-style mixed dishes. Each meal is weighed component by component on a calibrated kitchen scale (±1 g), and its ground-truth energy is computed from USDA FoodData Central entries.
- The measurement. We log each meal in each app using the workflow the app intends (photo, barcode, or search). We record the app's reported kcal and compute the absolute percentage error against the weighed reference: |reported − reference| / reference × 100.
- The headline statistic. We report pooled MAPE (mean absolute percentage error) across the meal panel per app — a single, comparable accuracy number.
- Uncertainty. Each MAPE is reported with a 95% confidence interval computed by bias-corrected and accelerated (BCa) bootstrap resampling over the meal panel. A point estimate without an interval is not a result.
- Reproducibility. Our flagship calorie review is built on the openly licensed Calorie Tracker Lab 2026 benchmark dataset (CC BY 4.0), which we independently re-analysed. Because the dataset is open, anyone can re-run our numbers.
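The measurement, headline statistic, and uncertainty steps above can be sketched with the Python standard library alone. This is a sketch under stated assumptions, not the analysis code behind any published figure: the function names, the resample count, the fixed seed, and the nearest-rank quantile rule are all illustrative choices, and the eight (reported, reference) pairs are invented placeholder values rather than measurements from the meal panel.

```python
import random
from statistics import NormalDist, fmean


def ape(reported_kcal, reference_kcal):
    """Absolute percentage error of one logged meal against its reference."""
    return abs(reported_kcal - reference_kcal) / reference_kcal * 100.0


def bca_interval(errors, n_resamples=5000, level=0.95, seed=0):
    """BCa bootstrap confidence interval for the mean of `errors` (the MAPE).

    Meals are the resampling unit. The jackknife shortcut below is only
    valid because the statistic is a plain mean.
    """
    rng = random.Random(seed)
    nd = NormalDist()
    n = len(errors)
    theta_hat = fmean(errors)

    # Bootstrap replicates: resample meals with replacement, recompute MAPE.
    reps = sorted(fmean(rng.choices(errors, k=n)) for _ in range(n_resamples))

    # Bias correction z0: normal quantile of the share of replicates below
    # the point estimate, clamped so inv_cdf never receives 0 or 1.
    share = sum(r < theta_hat for r in reps) / n_resamples
    share = min(max(share, 1 / (n_resamples + 1)), n_resamples / (n_resamples + 1))
    z0 = nd.inv_cdf(share)

    # Acceleration from leave-one-out (jackknife) means.
    total = sum(errors)
    jack = [(total - e) / (n - 1) for e in errors]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted(p):
        """Shift a nominal percentile by the bias and acceleration terms."""
        z = nd.inv_cdf(p)
        return nd.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))

    def pick(p):
        """Nearest-rank quantile of the sorted replicates (an assumed rule)."""
        idx = round(p * (n_resamples - 1))
        return reps[min(n_resamples - 1, max(0, idx))]

    alpha = (1.0 - level) / 2.0
    return pick(adjusted(alpha)), pick(adjusted(1.0 - alpha))


# Invented (reported kcal, reference kcal) pairs, for illustration only.
pairs = [(520, 500), (310, 300), (450, 480), (700, 640),
         (210, 200), (880, 800), (390, 400), (600, 610)]
errors = [ape(reported, reference) for reported, reference in pairs]
point = fmean(errors)          # pooled MAPE for this toy panel
lo, hi = bca_interval(errors)  # 95% BCa interval around it
```

Fixing the seed makes the interval repeatable from run to run, which matters when the aim is that others can re-run the numbers. SciPy's `scipy.stats.bootstrap` with `method="BCa"` is an off-the-shelf alternative to the hand-rolled version shown here.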
Where an app has no published, independently validated accuracy figure, we say so explicitly and decline to rank it above apps that do. An unvalidated vendor claim is not evidence.
Sustained-use testing
Beyond accuracy, each app is used for a minimum of four weeks of daily logging by at least one reviewer, with attention to adherence drop-off, onboarding friction, and how the app behaves once the novelty fades. For apps whose behaviour depends on an adaptive model (TDEE estimation, AI coaching), we note how long the calibration window runs before the model is meaningfully personalised.
What we don't claim to do
We are not a metabolic ward. Our reference meals are weighed to the gram and anchored to USDA data, but we do not measure individual metabolic response, and we don't run doubly-labelled-water energy-expenditure studies. Where our protocol is the limiting factor, we say so in the review. The strength of our work is reproducibility and disclosure, not access to a clinical laboratory.
Questions about the protocol or the dataset? Write to editorial@independent-app-reviews.org.