Deep Learning to develop and validate a scalable market-ready arrhythmia detection system

  • Deep Learning Model
  • Result and Validation

Taking Deep Learning in Arrhythmia Detection to a new level

The electrocardiogram (ECG) plays a key role in understanding human health and in guiding targeted treatment. Attempts to automate ECG interpretation with different algorithmic paradigms have existed for decades, more recently using machine learning. However, a robust system that can automate detection of a wide range of arrhythmias in ECGs acquired from a multitude of sources remains a challenge, given the variability in signal quality and the associated artifacts. Our approach takes the application of deep learning in arrhythmia detection to a new dimension: it uses class-discriminative visualization to improve the interpretability and transparency of the deep neural network (DNN) as an additional step in validating the algorithm, in an attempt to mimic "human eye interpretation".

Data Collection


Figure 1: Training and Validation samples(n)

A 14-layer two-dimensional convolutional neural network (2D CNN) was developed to classify cardiac rhythms into 21 categories, using 315,488 single-lead, 10-second ECG strips from 120,063 unique patients. The model was validated on a test set of 5,069 ECGs from 4,780 unique patients, annotated by a panel of five American board-certified CCTs and reviewed by a senior CCT; Grad-CAM analysis was performed on 235 images from the test set to derive a mean IoU (Intersection over Union) score.

Figure 2: Different devices in the external validation set

Data was collected from 4,780 patients using different FDA-approved continuous monitoring devices (Lifesignals, Bittium Faros, Apple Watch) and twelve-lead ECG machines (Philips, BPL, GE, etc.) to ensure the dataset was built from a multitude of sources.

Data pipeline and model

Data Pipeline

To handle the complexities of processing data from diverse sources with different characteristics, such as gain, speed, sampling rate, hardware-specific calibration values (physical and digital min/max), and overall scale, a series of preprocessing steps is performed to generate the input image fed to the deep learning model.

1. Signal standardization for the model: convert ECG data from different hardware-specific scales to a millivolt (mV) signal

2. Suitable lead selection: choose a suitable data channel when channel information is ambiguous or unavailable

3. ECG strip generation and dynamic cropping: transform raw 1D data into 2D images for classification
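Step 1 amounts to a linear rescaling from ADC counts to millivolts. A minimal sketch, assuming EDF-style calibration fields (physical and digital min/max); the function name and example values are hypothetical:

```python
def to_millivolts(raw, phys_min, phys_max, dig_min, dig_max):
    """Linearly map hardware-specific ADC counts to millivolts using
    the device's calibration values (physical/digital min and max)."""
    gain = (phys_max - phys_min) / (dig_max - dig_min)
    return phys_min + (raw - dig_min) * gain

# Example: a 16-bit ADC whose full range spans -5 mV to +5 mV
mv = to_millivolts(0, -5.0, 5.0, -32768, 32767)  # roughly 0 mV
```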

Figure 3: Data Pipeline

Figure 4: Signal Processing – Noise Removal and Baseline Correction

ECG signals are highly susceptible to artifacts, including power-line interference, motion artifacts, muscle noise, and hardware-level issues; such noise is especially severe with cardiac monitoring patches and wearables. For noise removal and baseline correction, a zero-phase, fourth-order Butterworth bandpass filter (0.5–40 Hz) is applied. This removes high-frequency noise and low-frequency baseline drift while keeping the morphological features intact. In Figure 4, the blue signal is the raw sample with significant noise and baseline drift, and the red is the processed signal.
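The described filter can be sketched in a few lines with SciPy; the synthetic strip below (a 1 Hz tone plus slow baseline wander) is purely illustrative:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def denoise_ecg(signal, fs, low=0.5, high=40.0, order=4):
    """Zero-phase 0.5-40 Hz Butterworth bandpass: removes baseline
    drift (< 0.5 Hz) and high-frequency noise (> 40 Hz). filtfilt
    runs the filter forward and backward, so QRS complexes are not
    shifted in time."""
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, signal)

# Synthetic 10 s strip at 250 Hz: in-band 1 Hz component plus
# a 0.2 Hz baseline wander that the filter should suppress
fs = 250
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 1.0 * t) + 2.0 * np.sin(2 * np.pi * 0.2 * t)
clean = denoise_ecg(raw, fs)
```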

Lead Selection

For a standard 12-lead ECG machine or 12-lead Holter device, identifying limb leads is straightforward, since the data always carry appropriate lead information. Continuous monitoring sources such as ECG patches, however, may not follow standard lead-name conventions or standard lead positions, or their lead information may be ambiguous. When data with inadequate or improper lead information is encountered, a suitable-limb-lead detection algorithm in the pipeline reduces false positives, because the deep learning model never encountered precordial leads or look-alike signals during training.

Typical V1, V2, or V3 precordial leads have a very different morphology from other leads: deep S waves with small R peaks and relatively high T-wave amplitude. This peculiar morphology is used to distinguish them. The detection algorithm begins by identifying the QRS peaks on a filtered signal; it then extracts QRS beats and computes the R-to-S amplitude ratio on each individual beat to characterize the overall nature of the signal, and suitable leads are chosen for ECG strip generation.

Figure 5: Beat by beat detection and R/S ratios for typical precordial lead(V1, V2, V3) and limb lead
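A rough sketch of the beat-by-beat R/S ratio check described above; the peak-detection parameters and the decision threshold are assumptions, not the paper's values:

```python
import numpy as np
from scipy.signal import find_peaks

def mean_rs_ratio(ecg, fs):
    """Beat-by-beat R/S amplitude ratio. Precordial-like leads
    (V1-V3) show deep S waves with small R peaks, so their mean
    ratio is low."""
    # R peaks: prominent maxima at least 300 ms apart
    r_peaks, _ = find_peaks(ecg, prominence=0.5 * np.std(ecg),
                            distance=int(0.3 * fs))
    ratios = []
    for r in r_peaks:
        # S wave: deepest point within ~80 ms after the R peak
        seg = ecg[r:r + int(0.08 * fs)]
        s_depth = abs(seg.min()) if seg.size else 0.0
        if s_depth > 0:
            ratios.append(ecg[r] / s_depth)
    return float(np.mean(ratios)) if ratios else float("inf")

def looks_like_limb_lead(ecg, fs, threshold=1.0):
    """Hypothetical decision rule: accept the channel when R peaks
    dominate S waves on average."""
    return mean_rs_ratio(ecg, fs) > threshold
```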

ECG strips and preprocessing

The lead data is segmented into 10-second strips and plotted at standard gain (10 mm/mV) and speed (25 mm/s) over a proper ECG grid background. This ensures the model does not miss any amplitude or temporal details of the signal, mimicking exactly how a cardiologist begins an assessment. After these details are captured, the ECG images are preprocessed to remove the grid background using HSV (Hue, Saturation, Value) thresholding, so the model can focus on signal morphology.

A. ECG strips with a grid background

B. Preprocessed ECG strips

Figure 6: Pre-processing of ECG images
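A minimal NumPy sketch of grid removal by thresholding in HSV space, here using only the value (brightness) channel to keep the dark trace pixels and paint everything else white; the threshold constant is an assumption:

```python
import numpy as np

def strip_grid(rgb, v_max=80):
    """Remove the light red/pink ECG grid: the black signal trace has
    a low HSV 'value' (brightness), while grid and background pixels
    are bright. Keep only dark pixels; paint the rest white."""
    v = rgb.max(axis=-1)          # HSV value channel = max(R, G, B)
    trace = v <= v_max            # dark pixels = signal trace
    out = np.full_like(rgb, 255)  # white background
    out[trace] = rgb[trace]
    return out
```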

Model Architecture

Convolutional neural networks (CNNs) are an extensively used architecture for image analysis and have already demonstrated superior performance in various medical diagnostic and image analysis tasks, so a similar approach was taken to build the deep learning model. The network used for training has an EfficientNet-B3 architecture as a base, followed by a series of custom layers. Dropout regularization is added between the average pooling layer and the fully connected layer to prevent overfitting, and the model output is produced by a softmax activation function.

Hyperparameter optimization was carried out through a series of experiments with different hyperparameter combinations to find the best-performing set. The model classifies ECG data into 21 different arrhythmias. Such fine-grained granularity in arrhythmia categories is important from both a clinical and a modeling perspective: fine-grained class labeling can improve neural network optimization and generalizability, as it pushes the model to learn more features. AUC and F1 scores were used to evaluate the performance of the model.

Figure 7: Model Architecture
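The source does not publish the exact layer definitions or framework; the described stack (EfficientNet-B3 base, average pooling, dropout, 21-way softmax head) might be sketched in Keras as follows, with the input size and dropout rate as assumptions:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB3

def build_model(num_classes=21, input_shape=(299, 299, 3)):
    """Sketch of the described architecture: EfficientNet-B3 base,
    global average pooling, dropout regularization between pooling
    and the fully connected head, and a softmax output."""
    base = EfficientNetB3(include_top=False, weights=None,
                          input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dropout(0.3)(x)  # rate is an assumed value
    out = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(base.input, out)

model = build_model()
```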

Model Validation - an attempt to mimic “Human eye interpretation”.

Figure 8: Grad-CAM visualization of arrhythmias. a) 10-second ECG strip. b) The Grad-CAM visualization of the model. c) Bounding box of the region of interest

 

To understand and visualize the important regions of the ECG image used for prediction, we performed Grad-CAM analysis on a dataset of 235 images.

The Grad-CAM algorithm, which works as a wrapper over the trained model, was applied to the dataset to generate a 299 × 299 image representing a heat map of class activation mappings. Regions with high activation in the heat map were localized by extracting the coordinates of pixels whose R-channel value (in RGB color space) exceeded 230. A bounding box is then drawn around the region of high activation, and its coordinates are mapped back to the original image for better visualization of the result. Over the annotated external dataset, {x number} AFib, {x number} PVC, {x number} ventricular tachycardia, and {x number} concurrent arrhythmia ({contained arrhythmias}) samples were considered, and the Grad-CAM algorithm was applied to visualize the important regions in each image.
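The R-channel thresholding and bounding-box step can be sketched in NumPy; the 230 threshold comes from the text, while the helper name and dummy heat map are hypothetical:

```python
import numpy as np

def high_activation_bbox(heatmap_rgb, r_threshold=230):
    """Locate the region of interest in a Grad-CAM heat map: collect
    all pixels whose red channel exceeds the threshold and return the
    enclosing bounding box as (x_min, y_min, x_max, y_max)."""
    ys, xs = np.where(heatmap_rgb[..., 0] > r_threshold)
    if xs.size == 0:
        return None  # no high-activation region found
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# A 299 x 299 dummy heat map with one "hot" patch
hm = np.zeros((299, 299, 3), dtype=np.uint8)
hm[100:120, 50:90, 0] = 255
box = high_activation_bbox(hm)  # (50, 100, 89, 119)
```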

Results

Arrhythmia                 Recall   PPV    F1 Score   AUC Score
Normal Sinus Rhythm         0.99    0.99     0.99       0.99
3rd Degree AVB              0.85    0.98     0.95       0.92
Paroxysmal SVT              0.88    0.71     0.79       0.93
Atrial flutter              0.91    0.89     0.90       0.92
Atrial fibrillation         0.79    0.91     0.86       0.89
Atrial run                  0.98    0.87     0.92       0.98
Severe bradycardia          0.91    0.79     0.83       0.95
Ventricular tachycardia     0.97    0.91     0.93       0.98
VEB                         0.91    0.84     0.84       0.95
Severe tachycardia          0.90    0.87     0.93       0.95
Pause                       0.97    0.83     0.87       0.97
Atrial bigeminy             0.94    0.88     0.93       0.96
Ventricular trigeminy       0.99    0.89     0.94       0.99
2nd degree type 2           0.90    0.80     0.84       0.95
Multifocal PVC              0.96    0.98     0.98       0.97
2nd degree type 1           0.86    0.85     0.98       0.93
Ventricular bigeminy        0.95    0.97     0.97       0.97
VPE                         0.86    0.90     0.91       0.93
Advanced heart block        0.94    0.76     0.83       0.96
Junctional rhythm           0.81    0.77     0.78       0.90
Artifact                    0.95    0.83     0.91       0.91

Table 1: Scores in the external validation set

The overall model accuracy was 92%, and precision was about 91% for almost all classes except PSVT, junctional rhythm, and advanced heart block, which had an overall precision of 74.6%. Of all the tachycardias, 96% of the files were classified correctly by the model; 90% of the files with various types of AV block were correctly classified, and of 152 artifact samples, 88% were detected correctly. Critical arrhythmias such as atrial fibrillation, atrial flutter, ventricular tachycardia, and PSVT had recalls of 0.79, 0.91, 0.97, and 0.88, respectively. The test data was collected using more than five different hardware sources, and model performance was satisfactory considering the signal quality and artifacts involved during acquisition. The preprocessing steps, backed by the source-independent behaviour of the model, have greatly improved its scalability in practical scenarios.

Whether the input is a direct digital data feed or paper records, Deeprrhythmia™ makes it incredibly easy to get results quickly and with a high degree of accuracy, ensuring the right diagnosis at the right time.

By implementing an end-to-end solution with a robust pipeline that streamlines data from a multitude of sources into a deep learning model, we aimed to automate arrhythmia detection with accuracy greater than that of an average cardiologist. The proposed image classification model can detect 21 different arrhythmias and conduction abnormalities across a diversified set of medical devices, wearables (wristbands and smartwatches), and other continuous monitoring devices (such as Holters and ECG patches) that are well established in the industry today. The signal-to-noise ratio of wearables that collect limb-lead or look-alike signals from the wrist or fingertips must certainly be considered, so an artifact category was also included in the model. Although deep learning has been adopted in healthcare at an exponential pace, the lack of annotated data and validation strategies has limited conclusive interpretation of ECG detection models. So, apart from external validation, we have also used Gradient-weighted Class Activation Mapping (Grad-CAM)[23] to generate visual explanations for model predictions, so that the model no longer remains a black box.