Prediction of vasculitic neuropathy using supervised machine learning approaches

Vasculitic neuropathy is an inflammation-driven nerve condition that often goes undiagnosed until irreversible damage occurs. This study developed and validated a supervised machine learning model to predict future onset of vasculitic neuropathy using electronic health record data from 450 cases and 1,800 matched controls. The predictive algorithm analyzed 134 structured features related to diagnoses, medications, lab tests and clinical notes. The selected logistic regression model with L2 regularization achieved an AUC of 0.92 (0.89-0.94 CI) internally, and maintained an AUC of 0.90 (0.84-0.93 CI) in the temporal validation cohort. At the optimal operating threshold, external sensitivity was 0.81 and specificity was 0.79. Within the highest-risk decile, positive predictive value reached 47%. Key features driving predictions included inflammatory markers, neuropathic symptoms and vascular imaging patterns. This methodology demonstrates the feasibility of leveraging machine learning for early detection of impending vasculitic neuropathy prior to confirmatory biopsy, to enable prompt treatment and improved outcomes.


INTRODUCTION
Vasculitic neuropathy is a rare and disabling condition caused by inflammation-driven damage to the small blood vessels supplying the peripheral nervous system. Due to the nonspecific and widely varying symptoms at onset, it often goes undiagnosed until irreversible nerve injury has occurred. Patients may present with complaints ranging from numbness, tingling, and burning pain to dizziness, gastrointestinal problems, muscle weakness, or even paralysis if motor nerves are affected.
By the time a nerve biopsy or angiogram confirms the diagnosis of vasculitis, the patient often has already suffered permanent loss of sensory, motor or autonomic nerve function. Even with prompt treatment, residual deficits persist in over half of cases. The average time from symptom onset to diagnosis can span months due to difficulties recognizing the condition early on. This underscores the tremendous need for noninvasive tools to predict impending onset of vasculitic neuropathy while still in the earliest phases of nerve involvement.
Machine learning methods that can detect predictive patterns hidden within multifaceted patient data hold unique promise towards enabling earlier suspicion of vasculitic neuropathy.
By analyzing trends buried in historical electronic medical records, supervised classification algorithms can potentially identify individuals at highest risk for future development of neuropathy. The overarching aim is to build automated models, using commonly available clinical data, that flag high-probability patients for further vasculitis-specific testing. Instituting appropriate immunotherapy at first suspicion rather than waiting for traditional biopsy results could allow treatment before irreversible nerve destruction [1][2][3][4][5][6][7][8].
Early prediction would both prompt more rapid confirmation of the underlying diagnosis and help prevent permanent neurological deficits. Such predictive models are not intended to replace physician judgement, but rather to place complex arrays of symptoms, exam findings and test results into a clinically actionable framework to guide earlier decision making.
This study therefore set out to develop and validate a machine learning approach, using routine electronic health data, to predict future onset of vasculitic neuropathy before irreparable nerve damage has occurred, which is currently the typical state at diagnosis. Overall, this methodology holds promise to change the diagnostic paradigm for this rare yet devastating condition [9][10][11][12][13].

Study Population
The model development cohort consisted of electronic health record data from patients receiving treatment within three large hospitals from the University of California health system between 2010 and 2020. Cases were defined as those with biopsy-confirmed evidence of vasculitis, including nerve tissue pathology or angiographic demonstration of vascular inflammation, along with physician diagnosis of peripheral neuropathy or other neurological deficits (identified by ICD-9/10 codes). Pre-specified inclusion criteria constrained cases to those with neurological symptoms present for under 6 months at time of diagnosis in order to enrich for early disease. Four matched controls per case were randomly selected from the same set of facilities after confirming absence of any neuropathy or vasculitis diagnoses or related medications. Matching criteria included similar age (+/-5 years), gender, location, and duration of health record history prior to the index date of case diagnosis. In total, 450 cases were identified, with 1,800 controls matched on both demographic and temporal disease course factors [14][15][16][17][18].
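The 1:4 matching step described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the `Patient` fields, the greedy without-replacement strategy, and the `history_tol` tolerance on record duration are all assumptions, since the text specifies only the +/-5 year age window.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    patient_id: str
    age: int
    gender: str
    site: str
    history_years: float


def match_controls(cases, pool, k=4, age_tol=5, history_tol=1.0, seed=0):
    """Greedy 1:k matching without replacement.

    age_tol follows the +/-5 years stated in the text; history_tol is an
    assumed tolerance for duration of health record history.
    """
    rng = random.Random(seed)
    available = list(pool)
    rng.shuffle(available)  # random selection among eligible controls
    matches = {}
    for case in cases:
        eligible = [
            c for c in available
            if abs(c.age - case.age) <= age_tol
            and c.gender == case.gender
            and c.site == case.site
            and abs(c.history_years - case.history_years) <= history_tol
        ]
        chosen = eligible[:k]
        chosen_ids = {c.patient_id for c in chosen}
        # Remove chosen controls so no control serves two cases.
        available = [c for c in available if c.patient_id not in chosen_ids]
        matches[case.patient_id] = chosen
    return matches
```

A case that cannot be given k eligible controls simply receives fewer; a production pipeline would need an explicit policy for such cases (for example, exclusion or relaxed tolerances).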

Feature Selection
The electrodiagnostic studies), and medication history [19].

Machine Learning Model
The full derived dataset was divided into training (80%), validation (10%) and test

Study Population
After applying inclusion criteria, the final cohort consisted of 450 biopsy-confirmed cases of vasculitic neuropathy matched to 1,800 controls without evidence of vasculitis or neuropathy. Cases and controls showed no statistically significant differences in baseline demographics including age, gender, insurance status or median length of available history within the EHR systems. Prevalence of common co-existing conditions was also equivalently distributed amongst groups, including rates of diabetes (32% vs 30%), hypertension (41% vs 39%), hyperlipidemia (18% vs 17%) and cardiovascular disease (12% vs 11%). This achievement of cohort balance on observable confounders helps isolate the exposure-outcome relationship of interest rather than differences due to unrelated patient traits.
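One common way to quantify this kind of balance is the standardized difference between two proportions. The sketch below applies it to the comorbidity rates reported above. The study itself reports only the absence of statistically significant differences, so this metric and the conventional 0.1 cutoff mentioned in the comment are assumptions added for illustration.

```python
import math


def standardized_difference(p_cases, p_controls):
    """Standardized difference for a binary covariate (two proportions)."""
    pooled_var = (p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / 2
    return (p_cases - p_controls) / math.sqrt(pooled_var)


# Prevalences (cases, controls) as reported in the text.
prevalence = {
    "diabetes": (0.32, 0.30),
    "hypertension": (0.41, 0.39),
    "hyperlipidemia": (0.18, 0.17),
    "cardiovascular disease": (0.12, 0.11),
}

smd = {name: standardized_difference(*pair) for name, pair in prevalence.items()}

for name, value in smd.items():
    # Absolute values below roughly 0.1 are often read as adequate balance.
    print(f"{name}: {value:.3f}")
```

All four values fall well below 0.1 under this convention, which is consistent with the balance described in the text.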

Feature Distributions
Of the 134 derived EHR-extracted features comparing cases to controls, select clinically relevant variables exhibited notable differences in distribution. Median erythrocyte sedimentation rate (ESR) was significantly elevated in cases at 52 mm/hr compared to 16 mm/hr for controls. Similarly, median C-reactive protein levels were 5.3 mg/dL in cases vs 1.8 mg/dL in controls. Documented symptoms of paresthesias, numbness, tingling, and burning pain were present in 87% of case histories compared to only 18% of control histories. Evidence of sensory deficits on clinical examination as well as abnormal nerve conduction findings were also substantially enriched within the cases. These distributional divergences align with domain understanding of diagnostic features and risk factors for vasculitic neuropathy.

Model Performance
Of the supervised classification algorithms tested during five-fold internal cross-validation on training data, L2-regularized logistic regression ultimately achieved the highest discrimination for predicting onset of vasculitic neuropathy within 12 months. The receiver operating characteristic curve analyzing model sensitivity across all decision thresholds yielded an AUC of 0.92 with tight confidence bounds between 0.89 and 0.94. At the predefined operating threshold selected to balance sensitivity and specificity based on Youden's index, overall performance metrics on held-out validation data included accuracy of 0.87, sensitivity

External Validation
When deployed on the final unseen test dataset spanning 2016-2020 patient records, the model achieved an AUC of 0.90, maintaining excellent discrimination ability. Again operating at the threshold maximizing Youden's index, test set performance resulted in accuracy of 0.83, sensitivity of 0.81, specificity of 0.79 and F1-score of 0.77. Of 102 patients scoring in the highest risk decile of predicted probabilities, 48 (47%) received biopsy-confirmed diagnoses of vasculitic neuropathy within 12 months, further demonstrating strong prognostic ability [22][23][24][25].
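Threshold selection by Youden's index, J = sensitivity + specificity - 1, can be sketched as below. This is a minimal illustration assuming binary labels (1 = case) and that a patient is flagged when the score is at or above the threshold; the function name and the exhaustive scan over observed scores are choices made here, not details given in the text.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A patient is flagged positive when score >= threshold. Ties in J are
    resolved in favor of the lowest threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = (None, 0.0, 0.0, -1.0)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if j > best[3]:
            best = (t, sens, spec, j)
    return best[:3]
```

For the test-set operating point reported above, J = 0.81 + 0.79 - 1 = 0.60.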

Summary
This study demonstrated the capability of using supervised machine learning approaches to predict future onset of vasculitic neuropathy from electronic health record data. The predictive model combining an array of clinical variables achieved excellent discrimination in internal validation as well as follow-up temporal validation one year later. Operating characteristics enable balancing sensitivity and specificity based on use case thresholds, with the highest risk decile showing almost 50% positive predictive value.
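The positive predictive value within the highest-risk decile can be computed as sketched below. The function is a hypothetical helper, assuming binary outcomes (1 = confirmed case) and that the decile is taken as the top tenth of patients ranked by predicted probability, rounded up.

```python
import math


def top_decile_ppv(y_true, scores):
    """PPV among the 10% of patients with the highest predicted risk."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, math.ceil(len(ranked) / 10))
    top = ranked[:n_top]
    return sum(outcome for _, outcome in top) / n_top


# Check against the counts reported in the text: 48 of 102 patients.
reported_ppv = 48 / 102
print(f"{reported_ppv:.2f}")  # 0.47
```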

Predictors
As expected based on the pathophysiology of the disease, the most heavily weighted predictors included inflammatory markers, presence of neuropathic complaints, and vascular imaging abnormalities. However, no individual feature in isolation was perfectly predictive; rather, the models identified multivariate patterns correlating with future development of vasculitic neuropathy. This highlights the utility of machine learning for combining many individually weak signals, a task that is difficult for unaided human prediction.
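How a logistic regression model turns several such predictors into one risk probability can be sketched as follows. Every number here is hypothetical: the feature names, weights and intercept are invented for illustration and are not the fitted coefficients of the study's model.

```python
import math


def risk_probability(features, weights, intercept):
    """Logistic model: probability = sigmoid(intercept + sum of weight * value)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights (NOT from the study); inputs assumed standardized or binary.
weights = {
    "esr_z": 0.9,
    "crp_z": 0.6,
    "paresthesia": 1.2,
    "vascular_imaging_abnormal": 1.0,
}
intercept = -3.0

# One elevated inflammatory marker alone.
single = risk_probability(
    {"esr_z": 2.0, "crp_z": 0.0, "paresthesia": 0, "vascular_imaging_abnormal": 0},
    weights, intercept,
)

# Several findings present together.
combined = risk_probability(
    {"esr_z": 2.0, "crp_z": 1.5, "paresthesia": 1, "vascular_imaging_abnormal": 1},
    weights, intercept,
)

print(single < 0.5 < combined)
```

Under these made-up weights a single abnormal marker leaves the predicted risk low, while the same marker together with symptoms and imaging findings pushes it high, which is the sense in which the prediction is multivariate. Note that the contributions add on the log-odds scale, so this form of model combines features rather than learning explicit interaction terms.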

Clinical Implications
These models hold potential to prompt earlier suspicion and guide targeted diagnostic testing in settings of routine care delivery. Those deemed higher probability by the algorithm could be followed more closely or receive evaluation for vasculitis even absent a clear neuropathy diagnosis. Instituting immunotherapy at earlier stages may prevent progression to permanent neurological disability that often accompanies this disease despite treatment. As deployed in real-time, predictive models provide a supplementary data-driven perspective to complement clinical judgment and improve outcomes.

Limitations and next steps
While promising, this methodology requires further validation at additional sites along with assessment of real-world clinical impact through controlled trials. Feature sets could be expanded to incorporate more granular neurological exam findings, patient-reported metrics or additional biomarker assays. Deployment within clinical workflows necessitates explainability measures for physician trust and transparency. This pilot study helps establish feasibility of a machine learning approach towards earlier detection of vasculitic neuropathy.

CONCLUSION
This study demonstrates proof-of-concept for using supervised machine learning methodologies to predict individual risk of progression to vasculitic neuropathy based on multivariate patterns in electronic health records. The predictive model combining an array of clinical variables achieved excellent discrimination both internally and upon temporally stratified external validation one year later. With an AUC exceeding 0.90 and near 50% positive predictive value in the highest decile of risk scores, the algorithm shows promise for prompt identification of future neuropathy cases compared to current diagnostic standards.
Once deployed clinically, this approach could guide targeted diagnostic testing and specialist referral in individuals deemed higher probability, even absent clear neurological complaints initially. Timely confirmation of vasculitis as the underlying etiology can facilitate rapid treatment to prevent further irreversible nerve damage. Just a 4-6-week delay from symptom onset to treatment is associated with nearly doubling the likelihood of persistent neurological disability. By flagging at-risk patients who warrant closer monitoring, data-driven predictions may help shrink this diagnostic window.

(10%) splits, maintaining balanced case:control distributions and demographic equivalency across partitions. Multiple supervised classification algorithms were evaluated on the training data, including L2-penalized logistic regression, random forest, gradient boosting machine, and deep neural networks. Cross-validated grid searches optimized hyperparameters for predictive performance, measured by area under the receiver operating characteristic curve. The final model was the one that provided the best discrimination (balancing sensitivity and specificity) on the held-out validation set. Predictions took the form of 12-month risk probability scores for development of biopsy- and clinically confirmed vasculitic neuropathy [20][21].

External Validation
The final model was temporally validated on more recent clinical data from 2016-2020 that were completely withheld from model development and hyperparameter tuning. Predicted risk scores were evaluated against recorded diagnoses of vasculitic neuropathy in the 12 months post-prediction based on the tested EHR data extracts. Preserved discrimination in this cohort would support generalizability of the models to unseen patient populations.
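The partitioning and scoring steps described in this section can be sketched in plain Python. This is an illustration under assumptions, not the study's pipeline: `stratified_split` and `roc_auc` are hypothetical helpers, the split is random within each class (a temporal hold-out would instead partition by index date), and the AUC uses the pairwise-ranking definition.

```python
import random


def stratified_split(labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split indices into train/validation/test, preserving the class ratio."""
    rng = random.Random(seed)
    parts = [[] for _ in fractions]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut1 = round(fractions[0] * len(idx))
        cut2 = cut1 + round(fractions[1] * len(idx))
        parts[0] += idx[:cut1]
        parts[1] += idx[cut1:cut2]
        parts[2] += idx[cut2:]  # remainder goes to the test partition
    return parts


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win. Quadratic in sample size, which is fine for
    a sketch but slow for large cohorts.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Cohort sizes from the text: 450 cases and 1,800 controls.
labels = [1] * 450 + [0] * 1800
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 1800 225 225
```

Each partition keeps the 1:4 case:control ratio, so AUC and threshold-based metrics are computed on comparable class mixes across training, validation and test data.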
85, specificity of 0.83 and F1-score of 0.81. Feature weights were highest for ESR (sed rate), cytokine levels, presence of sensorimotor complaints, and vascular imaging markers, aligning with clinical intuition.