A Modular Framework for the Interpretation of Paper ECGs

Sara Summerton1, Nicola Dinsdale2, Tuija Leinonen3, George Searle4, Matti Kaisti3, David C Wong5

1 University of Manchester, Manchester, UK
2 University of Oxford, Oxford, UK
3 University of Turku, Turku, Finland
4 University College London, London, UK
5 University of Leeds, Leeds, UK

Abstract

Despite advances in digital storage of electrocardiograms (ECGs), paper printouts are still commonplace in clinical practice. The digitization and interpretation of paper ECGs is therefore of high utility. We describe the creation of a modular pipeline to achieve both of these tasks. The solution was created by the Easy Geese for the Digitization and Classification of ECG Images: George B. Moody PhysioNet Challenge 2024.

Methods: The pipeline accepts an image of a 12-lead ECG in any common format. It first extracts the area of interest using YOLO, and then segments the pixels that constitute the ECG signals using a ResUNet. The resulting mask is rotated, and contiguous signal pixels are joined within the area of interest. In the last part of digitization, the signals are scaled, separated by lead, and checked for errors. Finally, the digitized 12-lead signals are input into an SE-ResNet classifier to provide clinical interpretation.

Results: Our ResUNet had a Dice score of 0.997. On the test set, our digitization pipeline had an average signal-to-noise ratio (SNR) of −5.272; our ECG classifier had a macro F-measure of 0.082. This entry was not ranked in the official phase; in the hackathon, we ranked 2/2 and 1/1 on digitization and classification, respectively.

1 Introduction

Despite the prevalence of large digital ECG databases for research, paper ECGs remain a stalwart of standard clinical care. While there have been multiple attempts to digitize images of paper ECGs, these often require user input to convert the raw signal into a clinically meaningful format.
For instance, Fortune et al.'s approach requires a user to identify each ECG lead manually [1], while Santamónica et al.'s solution requires the location of reference pulses and the lead for any rhythm strips [2].

The 2024 George B. Moody PhysioNet Challenge was to digitize and classify ECGs directly from images, without human intervention. The task and dataset are described in detail in [3] and [4] and are provided via PhysioNet [5].

2 Methods

Training images for this challenge were generated from signals from the PTB-XL database using the ECG-Image-Kit generator [6, 7]. The generator provided images in a format similar to a paper ECG. The generator was also modified to provide an ECG image mask: the ECG signals with no grid, text, or other augmentation. Clinical labels were available directly from PTB-XL. Labels were pre-processed into a reduced set of 11 classes for the challenge.

We created additional training examples in two ways. First, we printed the generated images on paper and scanned them to recreate a digital copy. We created multiple digital copies with physical augmentations, including wrinkling and writing on the paper, and taking photos of the paper from different angles. Second, we sourced additional copyright-free scans of paper ECG images from the internet (https://www.ecgguru.com/). For a small subset of these images, we also created ECG masks manually by masking the ECG signal in image editing software.

We developed a modular pipeline that takes in a raw image and outputs a digitized 12-lead ECG signal and a classification of the signal. An overview of the pipeline is shown in Figure 1 and is described in more detail below.

2.1 Determine number of ECG rows (step A.); Image Classification (step B.)

We used a fine-tuned YOLOv7 [8] to predict the bounding boxes of where individual ECG leads are present in an image.
We used the bounding boxes to crop the image to the useful area and to determine the number of rows of ECG signal, which is used in E. Pixel Joining. We also classified the image as real or generated using a ResNet-18 model [9]. This step produced a binary flag that was used in the subsequent steps.

2.2 ECG segmentation (step C.)

The task of binarising the ECG image was treated as a two-class segmentation problem, and approached following the ResUNet method proposed by Li et al. [10], with the blocks at each stage formed of a 3 × 3 convolution, batch normalization and LeakyReLU non-linearity. The model was trained using a combination of Dice and focal loss. The ResUNet was trained on 4000 patches of 256 × 256 px. The model was also trained with augmentation designed to simulate the range of variation likely in the test set: colour jitter, blurring, random noise, rotations and scaling.

Computing in Cardiology 2024; Vol 51 Page 1 ISSN: 2325-887X DOI: 10.22489/CinC.2024.118

Figure 1. ECG digitization and interpretation pipeline.

Given the very different characteristics of the real and simulated ECG images, it was not possible to produce a single model that performed well across both domains. Therefore, separate domain-specific models were trained for the real and simulated data.

2.3 Rotation (step D.)

We considered the simplified scenario in which raw ECG images could be rotated, but were not subject to a more general perspective transform. Our solution assumed that the output mask of C. ECG segmentation was mostly accurate. The optimum rotation of the mask was that which minimized the number of columns containing ECG pixels.

2.4 Pixel Joining (step E.)

The historical approach to ECG digitization, as implemented by [2] among others, assumes that an image contains a contiguous selection of pixels that correspond to an ECG signal, and that at least one ECG pixel is present per column of the image.
Dynamic programming is then employed to find a minimum-cost path across the image. This approach can fail when the image is noisy, or when ECG signals overlap because adjacent leads have large amplitudes.

We developed an alternative simple forward search, depicted in Figure 2. It takes a binary U-Net output mask with labels {background, ECG} as input. We assume that we know the coordinates of an initial ECG pixel in the left-most column (a), and that we have a rough estimate of the baseline (red line). From the initial pixel, all contiguous ECG pixels in the column are selected (b). We then consider the next column and select the ECG pixels with minimum Manhattan distance to the existing set of ECG pixels (c). If there is more than one candidate pixel, we select the pixel closest to the baseline. If there are no candidate pixels in the next column, that column is skipped (d). This process is repeated until the end of the image. Finally, in each column the pixel with maximum distance to the baseline is retained to produce the ECG signal in pixel units (e).

2.5 Lead reconstruction (steps F-I.)

F. Rhythm Lead detection and identification. The rhythm lead is a single ECG lead that is recorded for a full ten seconds, and typically spans a whole ECG recording. Given the challenge assumption of a standard 3 × 4 grid layout of short ECG segments, any additional rows were assumed to be rhythm leads. We assumed that the lead name of each rhythm strip was fully determined by the total number of rhythm leads: lead II if one rhythm strip, leads II and V5 if two, and leads V1, II and V5 if three.

G. Layout detection. Real ECG images can vary in both lead order and lead layout on an image. We used correlations between leads to detect the alternative Cabrera format for lead order. Given the limitations of the generator, this was only applied to real images.

H. Reference Pulse Detection. The reference pulse is a square wave denoting 1 mV in amplitude and 0.2 seconds in duration.
While it is commonly located to the left of the ECG, it can also appear to the right, or may be absent. Detection (1) informs the ECG signal baseline and scaling parameters, and (2) ensures that the pulse is not mistaken for part of the ECG. We detect the location of a reference pulse by searching for square waves of equal width in the same position in at least three lines of the digitized signal.

I. Signal Quality Assessment. Challenges in reconstruction occurred when E. Pixel Joining failed to correctly identify the start or end of a signal, or returned the same set of overlapping pixels for a single signal. To mitigate the impact of errant reconstructions on overall SNR, we returned NaNs for a single lead that both (1) overlapped with another and (2) departed from its baseline for a prolonged period. Furthermore, if rows of the reconstructed signal had different temporal (length) scaling factors, we assumed pixel joining had not been completed successfully, and the entire reconstruction output was set to NaN.

Figure 2. Pixel joining algorithm to extract the ECG signal from the U-Net output mask. (a) Initially, any ECG pixel is selected in the left-most column. (b) All adjoining ECG pixels in the column are added. (c) ECG pixels in the next column are considered; the pixel with minimum distance to the previous column is added. (d) Where no candidate pixels exist, the column is skipped. (e) One pixel per column is selected to generate the ECG signal.

2.6 Vectorization (step J.)

The final digitization stage was to convert the ECG in pixel units into real units of mV and seconds. To do this, we first remove the reference pulse. We then scale the signal, using the provided duration of the ECG signal and the fact that, at the standard paper calibration of 25 mm/s and 10 mm/mV, 0.2 s of length corresponds to 0.5 mV in amplitude (height). Finally, in instances where a reference pulse is available, we set the bottom of the reference pulse to be the baseline of 0 mV. Otherwise, we use the median value (in pixels) of a row of ECG to be 0 mV.

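The scaling in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual code: the function name `pixels_to_signal`, its parameters, and the use of NaN for untraced columns are hypothetical, and the sketch assumes a square grid at the standard calibration of 25 mm/s and 10 mm/mV, so that the pixel length of 0.2 s equals the pixel height of 0.5 mV. It also assumes the reference pulse has already been removed from the trace.

```python
import math
from statistics import median

# Assumed standard paper calibration (25 mm/s, 10 mm/mV):
# one large grid box spans 0.2 s horizontally and 0.5 mV vertically.
SECONDS_PER_LARGE_BOX = 0.2
MILLIVOLTS_PER_LARGE_BOX = 0.5


def pixels_to_signal(rows_px, duration_s, baseline_px=None):
    """Convert one traced lead from pixel rows to (time_s, amplitude_mv).

    rows_px     -- one image row index per column (larger = lower in the
                   image); math.nan marks columns with no traced pixel.
    duration_s  -- known duration of the traced segment, in seconds.
    baseline_px -- row of the bottom of the reference pulse, if one was
                   detected; otherwise the median traced row is used.
    """
    n = len(rows_px)
    px_per_second = n / duration_s
    # Square grid: the same pixel length covers 0.2 s and 0.5 mV.
    px_per_mv = px_per_second * SECONDS_PER_LARGE_BOX / MILLIVOLTS_PER_LARGE_BOX

    if baseline_px is None:
        valid = [r for r in rows_px if not math.isnan(r)]
        if not valid:  # nothing traced: return an all-NaN signal
            return [i / px_per_second for i in range(n)], [math.nan] * n
        baseline_px = median(valid)

    time_s = [i / px_per_second for i in range(n)]
    # Image rows grow downward, so subtract from the baseline row.
    amplitude_mv = [(baseline_px - r) / px_per_mv for r in rows_px]
    return time_s, amplitude_mv
```

For example, a 100-column trace covering 2.5 s gives 40 px per second and 16 px per mV under these assumptions, so a sample 16 rows above the baseline maps to 1 mV.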
2.7 ECG Interpretation (step K.)

We interpreted the resulting 12-lead signal using a modified version of Zhao et al.'s SE-ResNet [11, 12]. This model outputs a set of probabilities corresponding to 11 labels. We retrained the model using data with missingness corresponding to a standard 12-lead recording (i.e. with 2.5 s for most leads, instead of the original 10 s). We trained the model three times with different starting weights, and ensembled the models using the mean probabilities. To account for partially successful digitization, we trained an additional model with missing leads. A flag from I. was used to switch between the two models at inference time.

To convert output probabilities to labels, we selected the label with the highest probability to be the primary class. Secondary classes were then selected if their probability exceeded a heuristic threshold of 0.3, determined by maximizing F-measure on a held-out subset of the training set.

Task            Score              Rank
Digitization    SNR: -5.272        N/A
Classification  F-measure: 0.082   N/A

Table 1. Signal-to-noise ratio (SNR) and macro F-measure of our team's model on the hidden data for the digitization and classification tasks, respectively.

3 Results

The U-Net performed well on generated signals, with a Dice score of 0.997. An example input image with shadow, the corresponding output mask with pixel joining, and the reconstructed signal are shown in Figure 3. From visual inspection, we concluded that the U-Net performed worse on real images, likely due to the very limited training data.

Our scores are given in Table 1. Our team was not ranked during the official phase, but was successfully evaluated in the hackathon.

4 Discussion

We developed an analysis pipeline that automatically interprets a 12-lead ECG from an image. Our approach assumed that digitization was necessary to retrieve and then classify the informative signal.

One significant weakness of our approach was that errors in individual modules could compound. We were particularly reliant on the performance of the ECG segmentation step; minor errors here led to large changes in SNR. Our use of a different segmentation model for real and generated images meant that, for the test set of primarily real images, our pipeline reverted to the less accurate 'real' model, leading to poor overall performance on both tasks. Another shortcoming of our approach, and of the image generator, was that it did not consider perspective transforms of an image, nor local non-linear deformation from paper creases.

In future work, we intend to unify the segmentation models. We will also investigate imputation of unreconstructed data using cross-lead information.

Figure 3. Example output of the digitization process (steps A-J). (a) shows the input image including noise and shadow, with detected areas of interest. (b) shows the U-Net output ECG mask (step C), and the output of E. Pixel Joining in colour. (c) shows the output signal in comparison to the PTB-XL original signal. Unplotted portions of the signal are not shown.

References

[1] Fortune JD, Coppa NE, Haq KT, Patel H, Tereshchenko LG. Digitizing ECG image: a new method and open-source software code. Computer Methods and Programs in Biomedicine 2022;221:106890.
[2] Santamónica AF, Carratalá-Sáez R, Larriba Y, Pérez-Castellanos A, Rueda C. ECGMiner: A flexible software for accurately digitizing ECG. Computer Methods and Programs in Biomedicine 2024;246:108053.
[3] Reyna M, Deepanshi, Weigle J, Koscova Z, Elola A, Seyedi S, Campbell K, Clifford G, Sameni R. Digitization and Classification of ECG Images: The George B. Moody PhysioNet Challenge 2024. In 2024 Computing in Cardiology, volume 51. 2024; 1–4.
[4] Reyna MA, Deepanshi, Weigle J, Koscova Z, Campbell K, Shivashankara KK, Saghafi S, Nikookar S, Motie-Shirazi M, Kiarashi Y, Seyedi S, Clifford GD, Sameni R.
ECG-Image-Database: A Dataset of ECG Images with Real-World Imaging and Scanning Artifacts; A Foundation for Computerized ECG Image Digitization and Analysis, 2024.
[5] Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 2000;101(23):e215–e220.
[6] Wagner P, Strodthoff N, Bousseljot RD, Kreiseler D, Lunze FI, Samek W, Schaeffter T. PTB-XL, a large publicly available electrocardiography dataset. Scientific Data 2020;7(1).
[7] Shivashankara KK, Clifford GD, Reyna MA, Sameni R. ECG-Image-Kit: a synthetic image generation toolbox to facilitate deep learning-based electrocardiogram digitization. https://github.com/alphanumericslab/ecg-image-kit, 2024.
[8] Wang CY, Bochkovskiy A, Liao HYM. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proc. of the IEEE/CVF Conference on Comp. Vision and Pattern Recognition. 2023; 7464–7475.
[9] He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. CoRR 2015;abs/1512.03385.
[10] Li Y, Qu Q, Wang M, Yu L, Wang J, Shen L, He K. Deep learning for digitizing highly noisy paper-based ECG records. Computers in Biology and Medicine 2020;127:104077.
[11] Zhao Z, Fang H, Relton SD, Yan R, Liu Y, Li Z, Qin J, Wong DC. Adaptive lead weighted resnet trained with different duration signals for classifying 12-lead ECGs. In 2020 Computing in Cardiology. IEEE, 2020; 1–4.
[12] Zhao Z, Murphy D, Gifford H, Williams S, Darlington A, Relton SD, Fang H, Wong DC. Analysis of an adaptive lead weighted ResNet for multiclass classification of 12-lead ECGs. Physiological Measurement 2022;43(3):034001.

Address for correspondence:

Sara Summerton
Kilburn Building, Oxford Road, M13 9PL, Manchester, UK
sara.summerton@manchester.ac.uk