Methodology

Technical foundation of ChiralCall: validation strategy, confidence framework, and production performance.

Overview

ChiralCall predicts the historically favored eutomer (the more pharmacologically active enantiomer) within validated compound classes, using a proprietary first-principles computational method. Predictions reflect the dominant stereochemical preference reported in peer-reviewed pharmacological literature — typically the enantiomer with superior potency at the primary therapeutic target.

The underlying classifier is not machine learning. It derives predictions from fundamental stereochemical analysis of molecular topology — no training on known chirality outcomes, no statistical fitting to activity data. A separate calibration layer (the Calibrated Confidence Score, described below) uses logistic regression to estimate per-compound reliability, but the prediction itself is deterministic and first-principles. A full methodology paper is in preparation.

The production system covers 50 compound classes with 98.5% accuracy verified against 1,051 compounds with published eutomer assignments (Wilson 95% CI: 97.7%–99.0%). Every compound class returns a prediction — each one labeled with a confidence tier so you know exactly how much validation data backs it.

Validation Protocol

Every compound class in the production table undergoes systematic leave-one-out (LOO) cross-validation. Each compound in turn is removed from the validation set and the prediction is tested on that held-out compound. Aggregating the held-out results gives an accuracy estimate for each compound class.

Our original validation set was prospectively blind: compounds were selected and locked before any predictions were run. No training-set fitting was possible.

A note on scaffold-split validation

Experienced computational chemists will ask: “Does LOO within a compound class leak signal from close analogs?” This is the right question for QSAR or ML models, where learned features from structurally similar training compounds can inflate held-out accuracy.

ChiralCall's classifier does not learn from activity data. Predictions are derived from first-principles analysis of molecular topology — the same descriptor computation runs identically whether the compound has zero or a hundred validated analogs in the database. There is no feature vector fitted to training outcomes, so scaffold similarity between the held-out compound and remaining compounds does not create information leakage.

The strongest test of this claim is a blinded retrospective on your own internal compounds — which is exactly what the CRO pilot is designed for. If ChiralCall's accuracy holds on your unpublished analogs, the first-principles approach is validated on your chemical matter.

In short: LOO is appropriate here because the classifier is deterministic and does not train on activity outcomes. For organizations seeking additional assurance, we recommend scaffold-family and project-level holdouts using your own compounds.

Confidence Tiers

ChiralCall returns a prediction for every compound class — you always get an answer. The confidence tier tells you how much validation data supports that prediction, so you can decide how to act on it.

Tiers are computed automatically from Wilson 95% confidence intervals on per-class accuracy data. As more compounds are validated in a class, the class is promoted to a higher tier once it meets that tier's criteria.

Tier 1 — Validated production

Highest confidence. Use these predictions to guide synthesis planning and prioritize lead compounds.

  • N ≥ 10: minimum of 10 validated compounds in the class
  • ≥ 90%: leave-one-out cross-validation accuracy of at least 90%
  • CI > 80%: Wilson 95% CI lower bound exceeds 80%

Tier 2 — High accuracy, building sample size

High accuracy on available data, but the sample size is still building. Good for hypothesis generation — confirm experimentally before committing resources.

  • N ≥ 6: minimum of 6 validated compounds in the class
  • ≥ 90%: leave-one-out cross-validation accuracy of at least 90%
  • CI > 54%: Wilson 95% CI lower bound exceeds 54%

Tier 3 — Below validation threshold

You still get a prediction, but it comes with a specific disclaimer explaining why this class hasn't been promoted yet — whether that's limited sample size (N < 6), lower accuracy on available compounds, or scaffold coverage without enough validated examples. Tier 3 classes are actively accumulating data and will be promoted to Tier 2 or Tier 1 as validation compounds are added.
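
The tier rules above can be sketched in a few lines of Python. This is a minimal illustration, not ChiralCall's production code: the function names wilson_interval and assign_tier are hypothetical, and the sketch assumes the three criteria for each tier are combined with a logical AND and that "Wilson 95% CI" means the standard Wilson score interval for a binomial proportion.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(correct: int, total: int, confidence: float = 0.95) -> tuple[float, float]:
    """Standard Wilson score interval for a binomial proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def assign_tier(correct: int, total: int) -> int:
    """Hypothetical helper: apply the Tier 1 / Tier 2 thresholds, else Tier 3.

    Assumes all criteria listed for a tier must hold at the same time.
    """
    if total == 0:
        return 3
    accuracy = correct / total
    lower, _ = wilson_interval(correct, total)
    if total >= 10 and accuracy >= 0.90 and lower > 0.80:
        return 1
    if total >= 6 and accuracy >= 0.90 and lower > 0.54:
        return 2
    return 3


if __name__ == "__main__":
    # Illustrative counts only, not real ChiralCall classes.
    for correct, total in [(20, 20), (10, 10), (5, 5)]:
        lower, upper = wilson_interval(correct, total)
        print(total, round(lower, 3), assign_tier(correct, total))
```

Under this reading, a class with 10 of 10 correct has a Wilson lower bound of about 0.72, so it meets the N and accuracy criteria for Tier 1 but not the CI criterion. The interval narrows as validated compounds are added, which is the mechanism by which a class can move up.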

Calibrated Confidence Score (CCS)

Every prediction includes a Calibrated Confidence Score (CCS): a per-compound estimate of the probability that the prediction is correct, expressed on a 0–100 scale. CCS is not the prediction itself. It is a separate calibration layer that estimates how reliable each individual prediction is likely to be, based on structural features and the compound class's track record.

To be explicit about the two-component architecture: the classifier (first-principles, deterministic, no ML) produces the eutomer prediction. The confidence scorer (logistic regression, trained on 898 validated compounds) estimates that prediction's reliability. These are independent systems — the classifier would produce identical predictions with or without CCS.

How CCS is computed

CCS is produced by a logistic regression model trained on 898 compounds with known eutomer assignments. The model combines two categories of input:

Compound-class accuracy — the historical leave-one-out cross-validation accuracy for the matched compound class. Higher class accuracy increases the CCS score.

Structural complexity features — five proprietary features computed from the SMILES string that characterize the stereochemical environment of each compound. These capture the number of stereocenters, the diversity and distribution of neighboring atom types, and the overall complexity of the chiral environment. No conformer generation is required.

The logistic regression outputs a sigmoid probability, scaled to 0–100 and mapped to four confidence tiers:

  • ≥ 90: Very High
  • ≥ 75: High
  • ≥ 60: Medium
  • < 60: Low

CCS is returned in every API response as confidence_score (numeric, 0–100) and confidence_tier (label). The score is calibrated: compounds scoring ≥90 are correct 97% of the time in our validation set.
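
As a rough sketch of the last step only, the following Python shows a sigmoid output being scaled to 0–100 and mapped to the four labels. The function names are hypothetical, and the logit passed in is a placeholder: the actual regression weights and the five structural features are proprietary and are not reproduced here. Only the field names confidence_score and confidence_tier come from the text above.

```python
from math import exp


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + exp(-x))
    e = exp(x)
    return e / (1.0 + e)


def ccs_from_logit(logit: float) -> float:
    """Scale a sigmoid probability to the 0-100 CCS range."""
    return 100.0 * sigmoid(logit)


def ccs_label(score: float) -> str:
    """Map a 0-100 score to the four labels using the thresholds listed above."""
    if score >= 90:
        return "Very High"
    if score >= 75:
        return "High"
    if score >= 60:
        return "Medium"
    return "Low"


def confidence_fields(logit: float) -> dict:
    """Hypothetical helper: build the two confidence fields of a response."""
    score = ccs_from_logit(logit)
    return {"confidence_score": score, "confidence_tier": ccs_label(score)}


if __name__ == "__main__":
    # Placeholder logits, not outputs of the real model.
    for logit in (3.0, 1.5, 0.5, 0.0):
        print(confidence_fields(logit))
```

The label is derived from the unrounded score, so a value just under a threshold is never bumped into the next band by display rounding.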

Production Performance

  • 1,051: compounds with published eutomer assignments
  • 98.5%: accuracy (Wilson 95% CI: 97.7%–99.0%)
  • 50: compound classes across Tier 1, Tier 2, and Tier 3

These metrics represent validation across 50 compound classes spanning pharmaceuticals and agrochemicals. Each compound class maintains independent accuracy and confidence metrics. Predictions were checked against all 1,051 compounds with published eutomer assignments, with none excluded. The technical validation supplement (PDF) and the raw validation dataset (CSV), which lists every compound, prediction, and outcome, are available for download.
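
Readers who want to check the per-class figures themselves could recompute them from the raw validation dataset along the following lines. This is a sketch under stated assumptions: the column names compound_class, predicted, and actual are hypothetical and may not match the real CSV, and the inline rows are placeholder data rather than ChiralCall results.

```python
import csv
import io
from collections import defaultdict


def per_class_counts(csv_text: str) -> dict[str, tuple[int, int]]:
    """Return {class: (correct, total)} from CSV text.

    Assumes hypothetical columns: compound_class, predicted, actual.
    """
    counts = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(csv_text)):
        tally = counts[row["compound_class"]]
        # A prediction counts as correct when it matches the published assignment.
        tally[0] += int(row["predicted"].strip() == row["actual"].strip())
        tally[1] += 1
    return {name: (c, n) for name, (c, n) in counts.items()}


# Placeholder rows for illustration only.
SAMPLE = """compound_class,predicted,actual
class_a,S,S
class_a,R,R
class_a,S,R
class_b,R,R
"""

if __name__ == "__main__":
    for name, (correct, total) in per_class_counts(SAMPLE).items():
        print(f"{name}: {correct}/{total} = {correct / total:.1%}")
```

Summing the correct and total counts across classes gives the overall accuracy, which can then be compared with the headline figure.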

Wrong Predictions — Fully Disclosed

Across 1,051 validated compounds with published eutomer assignments, ChiralCall returned 18 incorrect predictions (1.5% error rate). We disclose every one of them.

Transparency about failure modes is essential to scientific credibility. For each wrong prediction we publish the input SMILES, the predicted enantiomer, the actual enantiomer, and the confidence score at the time of prediction.

We categorize wrong predictions by compound structural features, for example conformationally flexible macrocycles, fused polycyclic systems with adjacent stereocenters, and heavily substituted dihydropyridines. We do not publish method-level failure analysis. Full data will be available in the upcoming Wrong Predictions Browser.

  • 18: wrong predictions
  • 1.5%: error rate
  • 100%: fully disclosed

Cite This Work

If you use ChiralCall in your research or development, please cite:

ChiralCall, Arroway Sciences. Accessed [DATE]. https://chiralcall.com/

A full methodology paper is in preparation. Academic users are welcome to cite ChiralCall in the meantime using the format above.

Researchers: Free Access with Your .edu Email

100 predictions/month + full API access. No credit card required. Lifetime free for academic use.
