MosMedData: data set of 1110 chest CT scans performed during the COVID-19 epidemic

Abstract


With the ongoing COVID-19 pandemic, the decreasing availability of reverse transcription polymerase chain reaction (RT-PCR) testing, and the snowballing growth of medical imaging, especially the number of chest computed tomography (CT) scans being performed, methods that augment and automate image analysis, increase productivity, and minimize human error are of particular importance. The creation of high-quality datasets is essential for the development and validation of artificial intelligence algorithms. Such algorithms have shown sufficient accuracy in diagnosing COVID-19 on medical images. The presented large-scale dataset contains anonymized human CT scans with COVID-19 features as well as normal studies. Some studies were tagged by radiologists using binary pixel masks of regions of interest (e.g., characteristic areas of consolidation and ground-glass opacities). CT data were acquired between March 1, 2020, and April 25, 2020, and provided by municipal hospitals in Moscow, Russia. The presented dataset is licensed under Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0).


BACKGROUND

During the COVID-19 pandemic, most countries encountered a huge increase in the burden on their healthcare systems. More than ever, this situation required the careful use of financial and human resources. Unfortunately, the preventive measures taken in health facilities are not always sufficient to avoid the loss of health workers. The loss of trained specialists in emergency care, radiology, and other fields is of particular concern. Computed tomography (CT) is considered the key tool for diagnosing pneumonia and assessing its progression. In outpatient settings, CT is intended for patients with acute respiratory symptoms and for those initially diagnosed with viral pneumonia who require follow-up and are capable of recovering at home (under observation using telemedicine technologies).

In in-patient facilities, CT is used for making the primary and differential diagnosis, assessing disease progression, and determining whether a patient should be admitted to the intensive care unit or discharged [1,3,4]. The increasing use of CT is placing a heavy burden on the health care system. For example, in Moscow, the network of municipal outpatient CT centers is conducting approximately 90 examinations per CT scanner per day (with up to 163 examinations per day). Therefore, to standardize and streamline clinical decision-making, specialists developed a classification model that, along with other symptoms, evaluates the severity of pulmonary tissue anomalies observed on CT scans (see Table 2). This classification, based on the volume of pulmonary parenchyma lesions in chest CT, makes it possible to predict lethal outcomes in COVID-19 [9]. Professional burnout and the high risk of death among health professionals call for automated image analysis, which can increase productivity and minimize errors [8]. Preliminary data show that artificial intelligence (AI) algorithms can diagnose COVID-19 with sufficient accuracy (sensitivity: 90%, specificity: 96%, AUC: 0.96, overall accuracy: 76.37–98.26%) [6,10].

MATERIALS AND METHODS

Chest CT was performed on 42 CT scanners of the same model, Toshiba Aquilion 64 (Canon Medical Systems, Japan). All examinations were performed according to the standard methods and protocols recommended by the manufacturer (Table 1).

One examination refers to a single patient and includes one three-dimensional reconstruction. The inclusion criteria were as follows: a patient visit to an outpatient clinic reorganized as an Outpatient Computed Tomography Center during the pandemic, and a referral for chest CT from a general practitioner under obligatory health insurance.

 

Table 1. Methods of scanning, reconstructing images, and saving the database

Parameter set

Feature

Meaning and comment

Equipment

CT-scanner

Toshiba Aquilion 64 (Canon Medical Systems, Japan)

Number of slices

64

Patients

Patient positioning

Gantry centered at the thorax

Table height and alignment are adjusted such that the middle clavicular line is in the isocenter

Hands above the head

Instructions for breathing

Patient education and breathing instruction before scanning

Clothing and foreign objects

All foreign objects should be removed from the scan area, including jewelry and chains around the neck.

Underwear is acceptable.

Patients

Localizer/scout

- Was conducted at the chest level to limit the scanning to the lung range.

- Was performed to find additional foreign objects at the scan level that could impair the image quality.

- Breath-hold scan at breathing depth.

Scanning range

The entire volume of the lungs, including 5 cm above and 5 cm below the lungs.

Breathing phase

CT scan with breath-holding at inspiration depth.

Field of view (FOV)

- Not less than 1 cm from the ribs (from 350 to 500 mm).

- The breasts were included in the scanning area, but could be partially excluded from the field of view

Medical staff

Technician

The technician was in the control room and not in contact with the patient. Face-to-face contact with the positioning assistant was minimized for safety reasons.

Positioning assistant

The positioning assistant is a medical officer of the Radiology Department, transferred from among the mammography X-ray technicians to the CT room as additional personnel during the epidemic by order of the Moscow Department of Health.

The assistant was located in the scanner room (assisting with patient and table positioning) and in the corridor (during scanning), and was in contact with the patient.

Scanning protocol and image reconstruction, viewing, and interpretation

Gantry tilt

no

Scan duration

≤ 10 seconds

Contrast enhancement

no

Oral contrast

no

Voltage

120 kV

Current

Automatic tube current modulation system «Sure exp.3D», built into the scanner by the manufacturer. The system automatically adjusted the tube current within the range of 80–500 mA to achieve a noise level of 10 HU for 5.0 mm-thick slices.

XY modulation - on

Rotation speed

0.5 s

Pitch

95.0

Recon process

QDS+

Scanning protocol and image reconstruction, viewing, and interpretation

Number of CT series reconstructed

2 (with pulmonary and soft tissue kernels)

Convolution kernel for soft tissues

FC07 or FC18

Convolution kernel for lungs

FC51

Slice thickness

1.0 mm (same for both kernels)

Increment

0.8 mm (same for both kernels)

Iterative reconstruction

AIDR 3D was available on only 5 scanners; the rest had no iterative reconstruction algorithms and used FBP (filtered back projection).

Software used for CT interpretation

AGFA Enterprise 8.0 Vitrea FX

Maximum Intensity Projections (MIP), Minimum Intensity Projections (MinIP), and Multiplanar Reconstruction (MPR)

Maximum Intensity Projections (MIP), Minimum Intensity Projections (MinIP), and Multiplanar Reconstruction (MPR) were used

Artificial Intelligence Algorithms

They were used, but not for all examinations.

In the case of machine learning, algorithms created an additional image series for the radiologist, highlighting the COVID-19 lung lesion. COVID-19 was shown as red rectangles, attracting the attention of the doctor. In addition, a summarized three-dimensional reconstruction of the lungs with red regions of interest was available. Quantitative information to estimate the degree of lung damage was not presented.

Report turnaround time

from 10 min to 3 hours.

In rare cases 24 hours.

Protocol standardization

The structured report template was formed and regulated in the methodical recommendations, as well as implemented in the Unified Radiological Information Service, used for study reporting in the outpatient clinics.

COVID-19 classification

Classification by the CT0-CT4 scale was used (see Table 2).

Second opinion

For 90% of all CT examinations from outpatient clinics, a second reading was performed.

Effective dose calculation

DLP data from the automatically created DoseReport CT series were used. In the Russian Federation, according to the methodological guidelines (MU 2.6.1.2944-11) «Control of Effective Patient Doses during Medical Radiology», the effective dose is calculated by multiplying DLP by 0.017 (anatomic location-based index).

Dataset

Data acquisition

Unified Radiological Information Service, including AGFA Enterprise 8.0

Initial data collection format

DICOM 3.0

Plane

Axial

Data base

Slice thickness

1.0 mm

Increment

8.0 mm (as every 10th slice is saved)

Export file extension

NIfTI

Software used to annotate lung lesions as binary masks

MedSeg® (© 2020 Artificial Intelligence AS)

Notes: CT — computed tomography; CT-1 – CT-4 — the degree of lung damage based on CT results; RR — respiratory movements rate; FiO2 — oxygen concentration; SpO2 — blood oxygen saturation.
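The effective dose rule given in Table 1 (DLP multiplied by 0.017) can be illustrated with a few lines of Python. This is a minimal sketch rather than the authors' software: the function name `effective_dose_msv` and the example DLP value are hypothetical, and the units (DLP in mGy·cm, result in mSv) are the conventional ones assumed here. Only the coefficient 0.017 comes from the text.

```python
# Chest conversion coefficient quoted in Table 1 (MU 2.6.1.2944-11).
# Assumed unit: mSv per mGy*cm.
CHEST_DLP_COEFFICIENT = 0.017


def effective_dose_msv(dlp_mgy_cm: float, k: float = CHEST_DLP_COEFFICIENT) -> float:
    """Estimate the effective dose as DLP multiplied by a region coefficient."""
    if dlp_mgy_cm < 0:
        raise ValueError("DLP must be non-negative")
    return dlp_mgy_cm * k


if __name__ == "__main__":
    # Hypothetical DLP value, used for illustration only.
    print(f"{effective_dose_msv(200.0):.2f} mSv")
```

Keeping the coefficient as a parameter makes it easy to substitute a different anatomic index if another body region is scanned.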

 

The criteria for exclusion from the study included pregnancy and age under 18 years. Patients with blood oxygenation less than 93%, identified before the CT scan, were removed from the study and sent to be hospitalized by the ambulance service.

The dataset was developed in five stages as discussed below.

DATA COLLECTION

Data collection was conducted in the period from March 1 to April 25, 2020 in the outpatient clinics of Moscow City Health Care (Table 3).

 

Table 2. Lung lesion grading in COVID-19 and routing rules

| Severity | CT category | Clinical data | Decision |
| --- | --- | --- | --- |
| Zero | CT-0. Not consistent with pneumonia (including COVID-19). | | Inform the attending physician. Refer to a specialist. |
| Mild | CT-1. Ground-glass opacities. Pulmonary parenchymal involvement ≤25% OR absence of CT signs in the presence of typical clinical manifestations and relevant epidemiological history. | A. t <38.0 °C; B. RR <20/min; C. SpO2 >95% | Follow-up at home using telemedicine technologies (mandatory telemonitoring) |
| Moderate | CT-2. Ground-glass opacities. Pulmonary parenchymal involvement 25–50%. | A. t >38.5 °C; B. RR 20–30/min; C. SpO2 95% | Follow-up at home by a primary care physician |
| Severe | CT-3. Ground-glass opacities. Pulmonary consolidation. Pulmonary parenchymal involvement of 50–75%. Lung involvement increased in 24–48 hours by 50% with respiratory impairment per the follow-up studies. | One or more signs on the background of fever: A. t >38.5 °C; B. RR ≥30/min; C. SpO2 ≤95%; D. Partial pressure of oxygen (PaO2)/fraction of inspired oxygen (FiO2) ≤300 mmHg (1 mmHg = 0.133 kPa) | Immediate admission to a COVID specialized hospital. In a hospital setting: immediate transfer to the intensive care and resuscitation unit. Emergency computed tomography (if not done before). |
| Critical | CT-4. Diffuse ground-glass opacities with consolidations and reticular changes. Hydrothorax (bilateral, more on the left). Pulmonary parenchymal involvement ≥75%. | Signs of shock, multiple organ failure, and respiratory failure. | Emergency medical care. Immediate admission to a specialized hospital for patients diagnosed with COVID-19. In a hospital setting: immediate transfer to the intensive care and resuscitation unit. Emergency computed tomography (if not done before and when patient status allows for it). |

Notes: CT — computed tomography; CT-1 – CT-4 — the degree of lung damage based on CT results; RR — respiratory movements rate; FiO2 — oxygen concentration; SpO2 — blood oxygen saturation.
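The involvement thresholds in Table 2 can be expressed as a small lookup. The sketch below is illustrative only and rests on stated assumptions: the function name `ct_category` is hypothetical, 0% involvement is treated as CT-0, and the shared boundary at 50% is resolved downward, which the table itself does not specify. It also ignores the clinical signs and the CT-1 clause covering patients without CT signs, so it is not a substitute for the full routing rules.

```python
def ct_category(involvement_percent: float) -> str:
    """Map percent of parenchymal involvement to a CT-0..CT-4 label.

    Assumed boundaries (Table 2 leaves the 50% edge ambiguous):
    0 -> CT-0, (0, 25] -> CT-1, (25, 50] -> CT-2,
    (50, 75) -> CT-3, [75, 100] -> CT-4.
    """
    if not 0 <= involvement_percent <= 100:
        raise ValueError("involvement must be between 0 and 100 percent")
    if involvement_percent == 0:
        return "CT-0"
    if involvement_percent <= 25:
        return "CT-1"
    if involvement_percent <= 50:
        return "CT-2"
    if involvement_percent < 75:
        return "CT-3"
    return "CT-4"


if __name__ == "__main__":
    # Hypothetical involvement values, for illustration only.
    for value in (0, 10, 40, 60, 90):
        print(value, ct_category(value))
```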

 

Table 3: List of medical organizations where CT data was collected

Municipal Hospital (MH) № 19 Department of Health Care of Moscow

MH № 214

MH № 52

MH № 23

MH № 6

Diagnostic Center № 5

MH № 3

MH № 209

MH № 9

MH № 62

Diagnostic Center № 4

MH № 218

MH № 175

MH № 212

MH № 170

MH № 191

MH № 8

M. P. Conchalovsky hospital
(outpatient and in-patient care)

MH № 195

MH № 64

MH № 134

MH № 115

Pediatric Diagnostic Center № 1

MH № 67

Diagnostic Center № 121

MH № 36

MH № 68

Diagnostic Center № 2

MH № 11

MH № 180

MH № 45

MH № 5

MH № 5

MH № 2

Moscow Research and Practical Center for Tuberculosis Control of the South-East Moscow District

MH № 46

MH № 166

Moscow Research and Practical Center for Tuberculosis Control of the Central and West Moscow Districts

MH № 12

MH № 220

MH № 66

Diagnostic Center № 3

 

This dataset (1110 studies) contains anonymized human lung CT scans with signs of COVID-19 (CT1-CT4) and without signs of COVID-19 (CT0) (Figure 1). Sample characteristics: 1110 individuals, of whom 42% were male, 56% female, and 2% other/unknown; age ranged from 18 to 97 years, with a median of 47 years.

 

Figure 1: The order of forming a dataset.

Note: CT — computed tomography.

 

Figure 2: Examples of chest CT scans of patients with varying degrees of COVID-19 severity. Left to right, upper row: axial CT slices of patients with COVID-19 from mild (CT-1) to critical (CT-4) severity. Left to right, lower row: same CT data after tagging.

 

Figure 3: Data storage structure in the dataset.

 

At the first stage, all the examinations (n=1110) were distributed into five categories according to the classification (Table 2). The number of cases by category was as follows: CT-0, 254 (22.8%); CT-1, 684 (61.6%); CT-2, 125 (11.3%); CT-3, 45 (4.1%); and CT-4, 2 (0.2%). At the second stage, each study was saved in NIfTI format and compressed into a Gzip archive. During this process, only every 10th image (Instance) was saved in the final study file.
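The two stages described above can be sketched in Python. The category counts are taken from the text; everything else is an assumption made for illustration: the helper names are hypothetical, and the slice subsampling is assumed to start from the first image, which the article does not state. Shares computed this way may differ from the published percentages in the last digit because of rounding.

```python
# Category counts reported in the article.
CATEGORY_COUNTS = {"CT-0": 254, "CT-1": 684, "CT-2": 125, "CT-3": 45, "CT-4": 2}


def category_shares(counts):
    """Return each category's share of all studies, in percent."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def keep_every_nth(slices, n=10, start=0):
    """Keep every n-th slice; the starting offset is an assumption."""
    return list(slices[start::n])


def effective_increment_mm(increment_mm, n=10):
    """Distance between consecutive kept slices after subsampling."""
    return increment_mm * n


if __name__ == "__main__":
    for name, share in category_shares(CATEGORY_COUNTS).items():
        print(f"{name}: {share:.1f}%")
    # A hypothetical study with 25 slice indices.
    print(keep_every_nth(list(range(25))))
    print(effective_increment_mm(0.8))
```

With the 0.8 mm reconstruction increment from Table 1, keeping every 10th slice gives the 8.0 mm spacing listed for the stored data.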

A small number of the CT scans (n = 50) were annotated by specialists from the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health. During the markup, positive (white) pixels were selected on the corresponding binary pixel mask for each image. The resulting masks were saved in NIfTI format and then compressed into Gzip archives. MedSeg® annotation software (© 2020 Artificial Intelligence AS) was used to create the binary masks.
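As a rough illustration of how a gzip-compressed NIfTI study or mask might be inspected, the sketch below reads the image shape and voxel size directly from the file header using only the Python standard library. It assumes the files follow the single-file NIfTI-1 layout (348-byte header); the function name `read_nifti1_header` is hypothetical, and nothing here is taken from the authors' tooling.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348


def read_nifti1_header(path):
    """Read shape and voxel size from a gzip-compressed NIfTI-1 file."""
    with gzip.open(path, "rb") as f:
        header = f.read(NIFTI1_HEADER_SIZE)
    if len(header) < NIFTI1_HEADER_SIZE:
        raise ValueError("file is too short to hold a NIfTI-1 header")

    # The first field (sizeof_hdr) must equal 348; use it to detect byte order.
    for order in ("<", ">"):
        (sizeof_hdr,) = struct.unpack_from(order + "i", header, 0)
        if sizeof_hdr == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("sizeof_hdr is not 348; not a NIfTI-1 header")

    magic = header[344:348]
    if magic not in (b"n+1\x00", b"ni1\x00"):
        raise ValueError("unexpected NIfTI-1 magic string")

    dim = struct.unpack_from(order + "8h", header, 40)     # dim[0] = number of axes
    pixdim = struct.unpack_from(order + "8f", header, 76)  # pixdim[1..] = voxel sizes
    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError("invalid number of dimensions")
    return {"shape": dim[1:1 + ndim], "voxel_size": pixdim[1:1 + ndim]}
```

In practice, a dedicated library such as nibabel would normally be used to load the voxel data itself; the header-only reader is just a dependency-free way to check dimensions and spacing.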

This software was used to tag only COVID-19 lesions, including ground-glass opacities, consolidation, small vessels, and bronchioles. The density thresholds for tagging were from −700 HU to −130 HU, but they could differ depending on the breathing depth. We excluded large vessels and bronchi, visually unchanged pulmonary parenchyma, motion artifacts (respiratory artifacts due to cough and respiratory failure), gravitational changes (where they could be reliably differentiated), calcifications, and pleural effusion.
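The density window mentioned above can be illustrated with a simple filter that marks pixels whose value falls between −700 HU and −130 HU. This is only a sketch under stated assumptions: the function names are hypothetical, the window edges are treated as inclusive, and the input is a plain nested list of HU values. It is not the annotation procedure itself, since the radiologists also excluded vessels, artifacts, and other structures by eye and adjusted thresholds to the breathing depth.

```python
# Density window quoted in the text, in Hounsfield units.
HU_LOW = -700
HU_HIGH = -130


def in_lesion_window(hu, low=HU_LOW, high=HU_HIGH):
    """Return True if a HU value lies inside the window (edges inclusive, by assumption)."""
    return low <= hu <= high


def candidate_mask(slice_hu, low=HU_LOW, high=HU_HIGH):
    """Build a binary mask (1 = inside the window) for one 2D slice of HU values."""
    return [[1 if in_lesion_window(v, low, high) else 0 for v in row] for row in slice_hu]


if __name__ == "__main__":
    # Hypothetical 2x2 slice, for illustration only.
    print(candidate_mask([[-900, -500], [-130, 40]]))
```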

All chest CT scans used in the dataset passed an independent external audit by radiologists from the Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies of the Moscow Department of Health, whose opinion was accepted as final when assessing the severity of COVID-19 lung damage according to the adopted classification (CT0-CT4). These data were available in the Unified Radiological Information Service (URIS) in a structured form and were used to compile the final table of assessment results. Thus, all the studies were evaluated by at least two specialists. In addition, 50 studies were evaluated by three specialists, as they were annotated using the external MedSeg software.

The data set is intended for training, calibration, and independent evaluation of AI algorithms (computer vision) [7]. AI algorithms (computer vision) for COVID-19 can help in the fight against this disease in the following ways:

  1. Examine patients in outpatient facilities for fast and consistent routing (including those based on CT0-4 criteria).
  2. Prioritize studies with COVID-19 features in a worklist.
  3. Perform a rapid and high-quality assessment of abnormal changes by comparing several studies.
  4. Minimize the risk of errors and missed anomalies.

Currently, there is a wide range of publicly available COVID-19 data sets [2,5]. However, this should not be seen as an obstacle to publishing new ones, since the development of AI algorithms requires large amounts of high-quality clinical information that is representative of real patient populations. In addition, AI algorithms should be tested on new data sets that were not used in the training and calibration stages. The more data available in open sources, the better for developers. Moreover, the available data sets are relatively small and rarely contain additional information such as tags and/or binary masks for regions of interest (ROI).

How to use the dataset

Permanent link: https://mosmed.ai/datasets/covid19_1110. This data set is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) license.

ADDITIONAL INFO

Funding. The study had no sponsorship.

Conflict of interest. The authors declare no conflict of interest regarding this publication.

Authors contribution. S.P. Morozov — concept of research; A.E. Andreychenko — study design, data set formation; I.A. Blokhin — data markup, manuscript editing; P.B. Gelezhe — search for publications on the topic of the article, data markup; A.P. Gonchar — data markup, expert assessment of information; A.E. Nikolaev — data markup, expert assessment of information; N.A. Pavlov, V.Yu. Chernina, V.A. Gombolevsky — manuscript writing, preparing the dataset. All authors made a significant contribution to the study and preparation of the article, read and approved the final version before publication.

Acknowledgements. The authors express their gratitude to all doctors of the Moscow Health Department who are fighting the epidemic.

About the authors

Sergey P. Morozov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: morozov@npcmr.ru
ORCID iD: 0000-0001-6545-6170
SPIN-code: 8542-1720

Russian Federation, Moscow

MD, PhD, Professor

Anna E. Andreychenko

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: a.andreychenko@npcmr.ru
ORCID iD: 0000-0001-6359-0763
SPIN-code: 6625-4186

Russian Federation, Moscow

MD

Ivan A. Blokhin

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: i.blokhin@npcmr.ru
ORCID iD: 0000-0002-2681-9378
SPIN-code: 3306-1387

Russian Federation, Moscow

MD

Pavel B. Gelezhe

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: gelezhe.pavel@gmail.com
ORCID iD: 0000-0003-1072-2202
SPIN-code: 4841-3234

Russian Federation, Moscow

MD, PhD

Anna P. Gonchar

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: a.gonchar@npcmr.ru
ORCID iD: 0000-0001-5161-6540
SPIN-code: 3513-9531

Russian Federation, Moscow

MD

Alexander E. Nikolaev

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: a.e.nikolaev@yandex.ru
ORCID iD: 0000-0001-5151-4579
SPIN-code: 1320-1651

Russian Federation, Moscow

MD

Nikolay A. Pavlov

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: n.pavlov@npcmr.ru
ORCID iD: 0000-0002-4309-1868
SPIN-code: 9960-4160

Russian Federation, Moscow

MD, MPA

Valeria Yu. Chernina

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Email: v.chernina@npcmr.ru
ORCID iD: 0000-0002-0302-293X
SPIN-code: 8896-8051

Russian Federation, Moscow

MD

Victor A. Gombolevskiy

Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies, Department of Health Care of Moscow

Author for correspondence.
Email: g_victor@mail.ru
ORCID iD: 0000-0003-1816-1315
SPIN-code: 6810-3279

Russian Federation, Moscow

MD, PhD, MPH

References

  1. Ai T, Yang Z, Hou H, et al. Correlation of chest CT and RT-PCR testing in Coronavirus Disease 2019 (COVID19) in China: a report of 1014 cases. Radiology. 2020;296(2):E32–E40. doi: 10.1148/radiol.2020200642
  2. Handbook of COVID-19 Prevention and Treatment. Ed. by T. Liang. Zhejiang University School of Medicine; 2020. 68 p.
  3. Huang Z, Zhao S, Li Z, et al. The battle against Coronavirus Disease 2019 (COVID-19): emergency management and infection control in a Radiology Department. J Am Coll Radiol. 2020;17(6):710–716. doi: 10.1016/j.jacr.2020.03.011
  4. Morozov SP, Gombolevskiy VA, Chernina VY, et al. Prediction of lethal outcomes in COVID-19 cases based on the results chest computed tomography. Tuberculosis and Lung Diseases. 2020;98(6):7–14. (In Russ.) doi: 10.21292/2075-1230-2020-98-6-7-14
  5. Morozov S, Guseva E, Ledikhova N, et al. Telemedicine-based system for quality management and peer review in radiology. Insights Imaging. 2018;9(3):337–341. doi: 10.1007/s13244-018-0629-y
  6. Li L, Qin L, Xu Z, et al. Using artificial intelligence to detect COVID-19 and community-acquired pneumonia based on pulmonary CT: evaluation of the diagnostic accuracy. Radiology. 2020;296(2):E65–E71. doi: 10.1148/radiol.2020200905
  7. Ucar F, Korkmaz D. COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images. Med Hypotheses. 2020;140:109761. doi: 10.1016/j.mehy.2020.109761
  8. Vremennye metodicheskie rekomendatsii “Profilaktika, diagnostika i lechenie novoi koronavirusnoi infektsii (COVID-19). Versiya 9” (utv. Ministerstvom zdravookhraneniya RF 26 oktyabrya 2020). Available from: https://base.garant.ru/74810808/
  9. Morozov SP, Protsenko DN, Smetanina SV, editors. Radiation diagnostics of coronavirus disease (COVID-19): organization, methodology, interpretation of results: guidelines. Series “Best practices of radiation and instrumental diagnostics”. Issue 65. Moscow; 2020.
  10. Morozov SP, Vladzymyrskyy AV, Klyashtornyy VG, et al. Clinical acceptance of software based on artificial intelligence technologies (radiology). Series “Best practices in medical imaging”. Moscow; 2019. Issue 57.
  11. Cohen JP, Morrison P, Dao L. COVID-19 Image Data Collection [Internet]. 2020 [cited 2020 Mar 25]. Available from: https://arxiv.org/abs/2003.11597
  12. Jun M, Cheng G, Yixin W, et al. COVID-19 CT lung and infection segmentation dataset. Version 1.0. 2020. doi: 10.5281/zenodo.3757476

Copyright (c) 2021 Morozov S.P., Andreychenko A.E., Blokhin I.A., Gelezhe P.B., Gonchar A.P., Nikolaev A.E., Pavlov N.A., Chernina V.Y., Gombolevskiy V.A.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
