Building A Phase-Drift Classifier With A Two-Qubit Zero-Noise Readout Calibration Loop
Written by Zed Qubit
The bug I chased for a week: “learning” that was really hardware drift
I was experimenting with a tiny hybrid quantum-classical system: a classical model that uses outputs from a 2‑qubit quantum circuit to classify whether an incoming signal was in one phase band or another.
The goal sounded simple: run a parameterized circuit, measure a bitstring, feed it to a classical classifier, repeat.
But what kept happening was stranger:
- Training accuracy would look great for a few runs.
- Then it would collapse suddenly, even though the circuit and data were unchanged.
- The only thing I hadn’t tracked carefully was that real hardware (and even local simulators with realistic noise models) can exhibit phase drift—tiny changes in the effective rotation angles over time.
So I decided to build something very specific: a phase-drift classifier that includes an explicit two-qubit zero-noise readout calibration loop (a small “self-calibration” step that runs before the learning loop). The calibration measures how often the readout flips 0↔1 for each qubit when there is (ideally) no quantum action happening, then it corrects the raw counts before the classical model sees them.
That ended up being the key that made my hybrid pipeline stable.
Key ideas (in plain terms)
What “readout calibration” means
When you measure a qubit, your detector might report the wrong bit sometimes:
- You intended to measure `0`, but it says `1`.
- You intended to measure `1`, but it says `0`.
I’ll call those probabilities confusion rates. Readout calibration estimates them using a known situation (here: preparing each qubit in |0⟩ and |1⟩) and then corrects the measured counts.
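The idea can be sketched for a single qubit with plain NumPy (the confusion rates here are made up for illustration, not measured from any device):

```python
import numpy as np

# Assumed (illustrative) confusion rates for one qubit:
a0 = 0.02  # P(meas=1 | true=0)
a1 = 0.05  # P(meas=0 | true=1)

# Confusion matrix: columns are true states, rows are measured states.
M = np.array([[1 - a0, a1],
              [a0,     1 - a1]])

# Suppose the true state is |0> 70% of the time.
p_true = np.array([0.7, 0.3])
p_meas = M @ p_true  # what the noisy detector reports

# Calibration: invert M to recover the true distribution.
p_corrected = np.linalg.inv(M) @ p_meas
print(p_corrected)  # recovers [0.7, 0.3] up to float error
```

The whole calibration problem is estimating `a0` and `a1` from known preparations; once you have them, the correction is just this matrix inverse.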
What “phase drift” looks like in practice
In a phase-sensitive circuit, small deviations in rotation angles can shift measurement statistics. A classifier trained without calibration can start interpreting those shifts as “signal,” even though they are just hardware drift.
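To make the effect concrete: for the H → phase → H pattern used later in this post, a single qubit's probability of reading `1` is sin²(φ/2), so even a small drift offset visibly shifts the statistics the classifier sees. A minimal NumPy sketch (no hardware or simulator needed):

```python
import numpy as np

def p_one(theta, drift=0.0):
    """P(measure 1) for the circuit H -> P(theta + drift) -> H on one qubit."""
    return np.sin((theta + drift) / 2) ** 2

theta = np.pi / 4
print(p_one(theta))             # ideal statistics for this input
print(p_one(theta, drift=0.1))  # same input, drifted hardware: shifted statistics
```

Nothing about the input changed between the two calls; only the effective angle did. That shift is exactly what an uncalibrated classifier can mistake for signal.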
The exact approach I implemented
- Run a readout calibration:
  - Prepare `|00⟩`, measure counts → estimate `P(meas=1 | true=0)` per qubit.
  - Prepare `|11⟩`, measure counts → estimate `P(meas=0 | true=1)` per qubit.
- Build a correction matrix for each qubit and apply it to each measured shot outcome (via a corrected probability calculation).
- Run the learning circuit for several phase values, measure `00`/`01`/`10`/`11`.
- Train a logistic regression classifier on the corrected features.
- Test across time by injecting slightly different effective phase offsets to simulate drift.
Working code: Qiskit + scikit-learn hybrid loop
Install dependencies
```
pip install qiskit qiskit-aer scikit-learn numpy
```
Full script
```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


# ----------------------------
# 1) Build a 2-qubit feature circuit
# ----------------------------

def feature_circuit(theta, phase_offset=0.0):
    """
    Returns a 2-qubit circuit that encodes theta into a phase rotation.

    - theta: the input parameter we want to classify
    - phase_offset: models phase drift as an extra rotation error
    """
    qc = QuantumCircuit(2, 2)

    # Start in |00>. Apply Hadamards to make measurement sensitive to phase.
    qc.h(0)
    qc.h(1)

    # Encode a "phase band". Each qubit gets a phase rotation whose angle
    # depends on theta; phase_offset simulates drift by shifting the
    # effective angle during a run.
    qc.p(theta + phase_offset, 0)
    qc.p(2.0 * theta + phase_offset, 1)

    # Interfere again so measurement reads out phase differences.
    qc.h(0)
    qc.h(1)

    qc.measure([0, 1], [0, 1])  # qubit i -> classical bit i
    return qc


# ----------------------------
# 2) Readout calibration helpers
# ----------------------------

def counts_to_probabilities(counts, shots):
    """Convert a Qiskit counts dict into a probability dict over 2-bit strings."""
    return {b: counts.get(b, 0) / shots for b in ["00", "01", "10", "11"]}


def calibrate_readout_2q(sim, shots=4000, seed=1234):
    """
    Estimate per-qubit readout confusion rates using two preparations:
    - prepare |00> (true 0 on both qubits)
    - prepare |11> (true 1 on both qubits)

    Returns a tuple:
      (P(meas=1 | true=0) for q0, same for q1,
       P(meas=0 | true=1) for q0, same for q1)
    """
    # Circuit for |00> calibration
    qc00 = QuantumCircuit(2, 2)
    qc00.measure([0, 1], [0, 1])

    # Circuit for |11> calibration
    qc11 = QuantumCircuit(2, 2)
    qc11.x(0)
    qc11.x(1)
    qc11.measure([0, 1], [0, 1])

    tqc00 = transpile(qc00, sim, optimization_level=0)
    tqc11 = transpile(qc11, sim, optimization_level=0)

    counts00 = sim.run(tqc00, shots=shots, seed_simulator=seed).result().get_counts()
    counts11 = sim.run(tqc11, shots=shots, seed_simulator=seed + 1).result().get_counts()

    p00 = counts_to_probabilities(counts00, shots)  # measured distribution given true 00
    p11 = counts_to_probabilities(counts11, shots)  # measured distribution given true 11

    # Qiskit bitstrings are little-endian: the RIGHTMOST character is
    # classical bit 0 (qubit 0), so keys read "q1q0".
    # true=0 => measured=1 on qubit 0 means outcomes whose last bit is '1'.
    p_meas1_given_true0_q0 = p00["01"] + p00["11"]
    p_meas1_given_true0_q1 = p00["10"] + p00["11"]

    # true=1 => measured=0 on qubit 0 means outcomes whose last bit is '0'.
    p_meas0_given_true1_q0 = p11["00"] + p11["10"]
    p_meas0_given_true1_q1 = p11["00"] + p11["01"]

    # These are the per-qubit confusion rates.
    return (p_meas1_given_true0_q0, p_meas1_given_true0_q1,
            p_meas0_given_true1_q0, p_meas0_given_true1_q1)


def correct_probs_2q(probs, calib):
    """
    Apply a simple separable readout correction: for each qubit independently,
    use the estimated confusion rates to invert the readout effect (assuming
    readout errors are approximately independent between qubits).

    probs: dict mapping "00","01","10","11" -> measured probabilities
    calib: (a0q0, a0q1, a1q0, a1q1) where
           a0q? = P(meas=1 | true=0)
           a1q? = P(meas=0 | true=1)
    """
    a0q0, a0q1, a1q0, a1q1 = calib

    def invert_confusion(a0, a1):
        # Confusion matrix in basis [true0, true1] -> [meas0, meas1]:
        # M = [[P(meas0|true0), P(meas0|true1)],
        #      [P(meas1|true0), P(meas1|true1)]]
        M = np.array([[1 - a0, a1],
                      [a0, 1 - a1]], dtype=float)
        # Invert M to recover true probabilities from measured probabilities.
        return np.linalg.inv(M)

    Minv_q0 = invert_confusion(a0q0, a1q0)
    Minv_q1 = invert_confusion(a0q1, a1q1)

    # Vector over measured outcomes in the order 00,01,10,11.
    # With Qiskit's "q1q0" keys, the FIRST bit of each key is qubit 1.
    v_meas = np.array([probs["00"], probs["01"], probs["10"], probs["11"]], dtype=float)

    # Separable correction: TrueVector = (Minv_q1 ⊗ Minv_q0) @ MeasuredVector,
    # where each qubit's Minv maps [meas0,meas1] -> [true0,true1]. Qubit 1's
    # matrix comes first in the kron because it indexes the leading bit.
    K = np.kron(Minv_q1, Minv_q0)
    v_true = K @ v_meas

    # Numerical issues can lead to tiny negative values; clip and renormalize.
    v_true = np.clip(v_true, 0.0, None)
    s = v_true.sum()
    if s > 0:
        v_true = v_true / s

    return {"00": v_true[0], "01": v_true[1], "10": v_true[2], "11": v_true[3]}


# ----------------------------
# 3) Feature extraction from corrected probabilities
# ----------------------------

def probs_to_features(corrected):
    """
    Turn corrected 2-qubit outcome probabilities into a feature vector.
    A practical choice: use expectations of parity-like quantities.

    Features:
    - p00: probability of measuring 00
    - p11: probability of measuring 11
    - parity = p00 + p11 - p01 - p10
    """
    p00 = corrected["00"]
    p11 = corrected["11"]
    parity = (corrected["00"] + corrected["11"]) - (corrected["01"] + corrected["10"])
    return np.array([p00, p11, parity], dtype=float)


# ----------------------------
# 4) Run hybrid learning with calibration
# ----------------------------

def run_experiment():
    shots = 6000
    sim = AerSimulator(noise_model=None)  # local baseline simulator
    # Note: without an explicit noise model the confusion rates come out ~0,
    # but the calibration loop still demonstrates the full pipeline.

    # Readout calibration (the "zero-noise readout calibration loop")
    calib = calibrate_readout_2q(sim, shots=shots, seed=7)

    # Create a dataset: label theta into two phase bands.
    # Label 1 if theta is in [0, pi/2], else 0.
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0, np.pi, size=240)

    X, y = [], []
    for theta in thetas:
        # Sample small drift offsets so the classifier learns robustly
        # only if calibration is applied consistently.
        drift = rng.normal(0.0, 0.06)
        qc = feature_circuit(theta, phase_offset=drift)
        tqc = transpile(qc, sim, optimization_level=0)
        result = sim.run(tqc, shots=shots, seed_simulator=int(theta * 1000) + 3).result()
        counts = result.get_counts()

        measured_probs = counts_to_probabilities(counts, shots)
        corrected_probs = correct_probs_2q(measured_probs, calib)
        X.append(probs_to_features(corrected_probs))
        y.append(1 if 0.0 <= theta <= np.pi / 2 else 0)

    X = np.array(X)
    y = np.array(y)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)

    # Evaluate under a different drift regime to mimic "time passing".
    X_eval, y_eval = [], []
    for _ in range(120):
        theta = rng.uniform(0, np.pi)
        drift = rng.normal(0.0, 0.14)  # larger drift now
        qc = feature_circuit(theta, phase_offset=drift)
        tqc = transpile(qc, sim, optimization_level=0)
        result = sim.run(tqc, shots=shots, seed_simulator=int(theta * 1000) + 99).result()
        counts = result.get_counts()

        measured_probs = counts_to_probabilities(counts, shots)
        corrected_probs = correct_probs_2q(measured_probs, calib)
        X_eval.append(probs_to_features(corrected_probs))
        y_eval.append(1 if 0.0 <= theta <= np.pi / 2 else 0)

    X_eval = np.array(X_eval)
    y_eval = np.array(y_eval)

    y_pred = clf.predict(X_eval)
    acc = accuracy_score(y_eval, y_pred)

    a0q0, a0q1, a1q0, a1q1 = calib
    print("Calibration confusion rates (q0/q1):")
    print(f"  P(meas=1 | true=0): q0={a0q0:.4f}, q1={a0q1:.4f}")
    print(f"  P(meas=0 | true=1): q0={a1q0:.4f}, q1={a1q1:.4f}")
    print(f"\nEvaluation accuracy under larger drift: {acc:.3f}")
    return acc


if __name__ == "__main__":
    run_experiment()
```

One fix worth calling out: Qiskit counts keys are little-endian, so a key like `"01"` means qubit 1 read `0` and qubit 0 read `1`. The confusion-rate sums and the `np.kron` ordering above follow that convention; getting it backwards silently swaps the two qubits' correction matrices.
What each important block is doing (and why it matters)
feature_circuit(theta, phase_offset)
I used two qubits with a very small “phase encoding” trick:
- Hadamards turn the initial `|0⟩` states into superpositions.
- `p(θ)` (a phase rotation) changes phase relative to the measurement basis.
- A second Hadamard converts that phase sensitivity into measurement probabilities.
The phase_offset is my stand-in for drift: it shifts the effective phase during a run.
calibrate_readout_2q(...)
This is the “zero-noise readout calibration loop”:
- I measure with `|00⟩` prepared to estimate how often `0` is misread as `1`.
- I measure with `|11⟩` prepared to estimate how often `1` is misread as `0`.
That produces per-qubit confusion rates.
correct_probs_2q(...)
This is where the correction happens.
- I build a confusion matrix for each qubit.
- Then I invert it to undo the readout distortion.
- Because I assume the readout errors are independent across qubits, I apply a tensor-product inverse (`np.kron`).
Finally, I clip tiny negatives and renormalize so probabilities remain valid.
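That tensor-product inversion can be checked in isolation. A minimal standalone sketch (the confusion rates are illustrative, and the kron order just has to match how the outcome vector is indexed):

```python
import numpy as np

def confusion(a0, a1):
    # Columns: true 0, true 1. Rows: measured 0, measured 1.
    return np.array([[1 - a0, a1],
                     [a0,     1 - a1]])

# Two different per-qubit confusion matrices (made-up rates).
M_a = confusion(0.02, 0.05)
M_b = confusion(0.03, 0.04)

# A known "true" distribution over the four 2-bit outcomes, indexed so the
# first bit belongs to qubit a and the second to qubit b.
v_true = np.array([0.4, 0.1, 0.2, 0.3])
v_meas = np.kron(M_a, M_b) @ v_true  # what the noisy readout produces

# Per-qubit inverses, combined with kron in the same order, undo it exactly:
v_rec = np.kron(np.linalg.inv(M_a), np.linalg.inv(M_b)) @ v_meas
v_rec = np.clip(v_rec, 0.0, None)  # clip tiny negatives, as in the main script
v_rec /= v_rec.sum()
print(np.round(v_rec, 6))  # recovers the true distribution
```

This works because `kron(A, B)⁻¹ = kron(A⁻¹, B⁻¹)`; the clip-and-renormalize step only matters when the rates are estimated from finite shots and the inversion overshoots slightly.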
probs_to_features(...)
Instead of feeding the full 4-outcome distribution into the classifier, I compress it into 3 features:
- `p00`
- `p11`
- a "parity-like" signal: `(p00 + p11) - (p01 + p10)`
This keeps the classifier small and forces it to focus on phase-relevant structure.
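Because the two qubits in `feature_circuit` never interact, the ideal (drift-free, noise-free) parity feature even has a closed form: each qubit's ⟨Z⟩ after H → P(φ) → H is cos(φ), so parity = cos(θ)·cos(2θ). A quick NumPy check under those assumptions:

```python
import numpy as np

def ideal_parity(theta):
    # Per-qubit P(1) for H -> P(phi) -> H is sin^2(phi/2).
    p1_q0 = np.sin(theta / 2) ** 2  # phase theta on qubit 0
    p1_q1 = np.sin(theta) ** 2      # phase 2*theta on qubit 1 -> sin^2(theta)
    # For independent qubits, parity = <Z0><Z1> = (1 - 2*p1_q0) * (1 - 2*p1_q1).
    return (1 - 2 * p1_q0) * (1 - 2 * p1_q1)

theta = 0.7
print(ideal_parity(theta))  # equals cos(theta) * cos(2*theta)
```

Seeing the corrected parity track this curve is a useful sanity check that the calibration is doing its job.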
The one thing I learned the hard way
Without readout correction, my classifier was often “successful” only because the detector bias happened to align with the training drift window. As soon as drift changed, the bias changed too, and the model fell apart.
With calibration in place, the learning loop became much more stable: the classifier started responding to the circuit’s encoded phase behavior rather than detector artifacts.
Conclusion
I built a hybrid quantum-classical phase classifier that runs a two-qubit readout calibration loop up front, corrects measured outcome probabilities using inverted confusion matrices, and only then trains a classical logistic regression on corrected features. The big takeaway from my tinkering is that fault-tolerant thinking starts even in small experiments: if you don’t explicitly calibrate measurement (readout) before learning, the classical model will happily learn the wrong thing—often phase drift plus detector bias—until the environment changes.