<p>Medical academic journal. ISSN 1608-4101, 2687-1378. Eco-Vector. Article 17737. DOI: 10.17816/MAJ17737</p>
<p>Original Article</p>
<p><strong>SAS Enterprise Guide 6.1 for physicians: correlation analysis</strong></p>
<p>Nikolay S. Bunenkov, postgraduate student, Department of Faculty Surgery; bunenkov2006@gmail.com; https://orcid.org/0000-0003-4331-028X</p>
<p>Gulnara F. Bunenkova, Resident, Department of Hospital Therapy; gulnara533@gmail.com</p>
<p>Vladimir V. Komok, PhD, cardiac surgeon, Department of Cardiac Surgery No. 2; vladimir_komok@mail.ru; https://orcid.org/0000-0002-3834-7566</p>
<p>Oleg A. Grinenko, Doctor of Medical Science, Vice-Rector; klinika@spb-gmu.ru</p>
<p>Alexander S. Nemkov, Doctor of Medical Science, Professor, cardiac surgeon, Chief of Department of Cardiac Surgery No. 2; nemk_as@mail.ru; https://orcid.org/0000-0002-5152-0001</p>
<p>Pavlov First Saint Petersburg State Medical University</p>
<p>Published: 22.06.2020. Vol. 20, No. 1, pp. 51-56. Received: 13.11.2019. Accepted: 31.03.2020.</p>
<p>Copyright © 2020, Bunenkov N.S., Bunenkova G.F., Komok V.V., Grinenko O.A., Nemkov A.S.</p>
<p><strong><em>Objective:</em></strong> to develop an algorithm for correlation analysis of data from the prospective non-randomized clinical trial AMIRICABG (ClinicalTrials.gov Identifier: NCT03050489) using SAS Enterprise Guide 6.1.</p>
<p><strong><em>Materials and methods.</em></strong> Data were collected as part of the prospective non-randomized clinical trial AMIRICABG (ClinicalTrials.gov Identifier: NCT03050489), conducted at the Pavlov First Saint Petersburg State Medical University, Saint Petersburg, Russia, between 2016 and 2019 with 336 patients. A database of clinical, laboratory, and instrumental data was created. Correlation analysis was performed with SAS Enterprise Guide 6.1.</p>
<p><strong><em>Results.</em></strong> An algorithm was developed for correlation analysis of data from the prospective non-randomized clinical trial AMIRICABG (ClinicalTrials.gov Identifier: NCT03050489). This algorithm could be useful to physicians and researchers for data analysis.</p>
<p><strong><em>Conclusion.</em></strong> The presented correlation analysis algorithm could simplify data analysis with SAS Enterprise Guide 6.1 and make it more efficient.</p>
<p><strong>Keywords:</strong> SAS Enterprise Guide 6.1; statistical analysis; statistics; clinical trials; scientific research; correlation analysis; Pearson correlation coefficient; Spearman correlation coefficient.</p>
<h2>Introduction</h2>
<p>One of the aims of scientific research is to establish a relationship between studied parameters; for example, between a marker in blood plasma and a drug concentration. It is widely believed that to identify the relationship between two variables, a correlation should be evaluated. The word "relationship" is often replaced by the word "correlation". However, the presence of a correlation does not imply a direct causal relationship, just as the absence of a correlation does not exclude the existence of a relationship between two variables, including a causal one [3]. One of the questions of the clinical trial at the cardiac surgery center of the Pavlov First Saint Petersburg State Medical University was whether there is a relationship between the duration of aortic cross-clamping during coronary artery bypass grafting (CABG) and the troponin I level after the procedure.</p>
<p>In our example, we used the following variables for analysis: TnIEndOp, the troponin I level at the end of the procedure; TnI1, the troponin I level on day 1 after CABG; and AoClamp, the duration of aortic cross-clamping in minutes. Patients were allocated to three groups: group 1, off-pump CABG; group 2, on-pump CABG; and group 3, CABG with parallel cardiopulmonary bypass.</p>
<p><strong>This study aimed</strong> to develop an algorithm for processing the database of the AMIRICABG prospective nonrandomized clinical trial (ClinicalTrials.gov Identifier: NCT03050489) using the SAS Enterprise Guide 6.1 software package.</p>
<h2>Materials and methods</h2>
<p>The AMIRICABG prospective nonrandomized clinical trial (ClinicalTrials.gov Identifier: NCT03050489) was performed at the Research Center for Cardiovascular Surgery of the Pavlov First Saint Petersburg State Medical University between 2016 and 2019 and enrolled 336 patients with coronary heart disease who had indications for coronary revascularization surgery. A database with the results of clinical, laboratory, and instrumental studies was created. Statistical database processing was performed using SAS Enterprise Guide 6.1 licensed software.</p>
<h2>Results and discussion</h2>
<p>The type of distribution of the studied variables was determined prior to the correlation analysis [1]. If a distribution differs from normal, it can often be transformed to a normal distribution, as described previously [1]. In our case, the distribution of the variable TnIEndOp (level of troponin I at the end of the procedure) became normal after logarithmic transformation. The variables TnI1 (level of troponin I on day 1 after the procedure) and AoClamp (duration of aortic cross-clamping) were not normally distributed. For normally distributed variables, the correlation was calculated using Pearson's correlation coefficient. Spearman's correlation coefficient was used for non-normally distributed variables. It should be noted that Pearson's correlation coefficient identifies a linear relationship. If the relationship is nonlinear, this method may indicate the absence of a correlation even though the variables are related [2].</p>
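<p>As an aside outside SAS, the difference between the two coefficients can be illustrated with a minimal Python sketch. It assumes the textbook definitions (Pearson's coefficient on the raw values, Spearman's coefficient as Pearson's coefficient on average ranks); the data are hypothetical and the function names are introduced here only for illustration.</p>

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data (not trial data): y grows monotonically
# but not linearly with x.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

<p>For such monotonic but nonlinear data, the rank-based coefficient reaches 1, while Pearson's coefficient stays below 1.</p>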
<p>The SAS code presented here can be used to determine the type of distribution and, if the distribution differs from normal, to transform the variable toward a normal distribution.</p>
<p>ods graphics on;</p>
<p>/* Normality is tested with the Kolmogorov-Smirnov or Shapiro-Wilk test; if p &gt; 0.05, the hypothesis of a normal distribution is not rejected. */</p>
<p>/* The table name starts with a digit and contains a space, so it is written as a SAS name literal ('...'n). */</p>
<p>Proc UNIVARIATE DATA=WORK.'20_06_2019 workn'n normaltest plots;</p>
<p>where CPBType=2;</p>
<p>VAR TnIEndOp TnI1 AoClamp;</p>
<p>run;</p>
<p>/* Bringing to the normal distribution using the logarithm */</p>
<p>DATA NEWDATASET; /* Create a new data table */</p>
<p>SET WORK.'20_06_2019 workn'n;</p>
<p>/* Transfer all our data to the new table */</p>
<p>LGTnI1=LOG10(TnI1); /* Logarithm to normalize */</p>
<p>LGTnIEndOp=LOG10(TnIEndOp);</p>
<p>LGAoClamp=LOG(AoClamp);</p>
<p>RUN;</p>
<p>/* TnIEndOp - troponin level at the end of the surgery</p>
<p>TnI1 - troponin level on the 1st day after the surgery</p>
<p>AoClamp - duration of the aortic cross-clamping; */</p>
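<p>Why a logarithm can help is easy to see on a toy example. The Python sketch below is only an illustration outside SAS: the values are hypothetical (not trial data), and sample skewness is used here as a simple stand-in for the formal normality tests reported by PROC UNIVARIATE.</p>

```python
import math


def skewness(values):
    """Population skewness: third central moment over the cubed std. deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed marker values (each value doubles the previous one).
marker = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6]

# The same transformation as LOG10() in the DATA step.
log_marker = [math.log10(v) for v in marker]

print(skewness(marker), skewness(log_marker))
```

<p>The raw values are strongly right-skewed, whereas the log-transformed values are symmetric around their mean. Symmetry alone does not prove normality, so the formal tests are still needed.</p>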
<p>After determining the type of distribution of the variables, we created a scatter plot (Fig. 1). The scatter plot shows whether there was a correlation between the troponin I level at the end of surgery and the level on day 1 after the procedure.</p>
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 1. Scatter plot for correlation analysis of troponin I level at the end of operation and on the 1st postoperative day" href="/files/journals/21/articles/17737/supp/17737-72079-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72079-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 1. Scatter plot for correlation analysis of troponin I level at the end of operation and on the 1st postoperative day</strong></p>
<p></p>
<p>/* We evaluate the nature of the relationship between the level of troponin I at the end of the surgery and the level of troponin I on day 1 after the surgery */</p>
<p>proc sgplot data=NEWDATASET;</p>
<p>WHERE CPBTYPE=2;</p>
<p>title "Scattergram";</p>
<p>scatter x=LGTnIEndOp y=LGTnI1;</p>
<p>ellipse x=LGTnIEndOp y=LGTnI1;</p>
<p>label LGTnIEndOp = "troponin I level at the end of the surgery";</p>
<p>label LGTnI1 = "troponin I level on day 1 after the surgery";</p>
<p>run;</p>
<p>Fig. 1 shows that the points tended to lie along an oblique straight line; therefore, the relationship between the two variables was linear and a linear correlation analysis could be applied.</p>
<p>TITLE "Correlation of the initial values of troponin I at the end of the surgery and on day 1 after the surgery";</p>
<p>proc corr DATA=WORK.'20_06_2019 workn'n pearson spearman kendall hoeffding fisher;</p>
<p>WHERE CPBTYPE=2;</p>
<p>var TnI1;</p>
<p>with TnIEndOp;</p>
<p>run;</p>
<p>Notes.</p>
<p>pearson requests Pearson's correlation coefficient (both variables under study are normally distributed);</p>
<p>spearman requests Spearman's correlation coefficient (one or both of the variables under study have a distribution different from normal);</p>
<p>kendall requests Kendall's correlation coefficient;</p>
<p>hoeffding requests Hoeffding's measure of dependence;</p>
<p>fisher requests the confidence interval of the correlation, based on Fisher's z transformation.</p>
<p>Click the <strong>Run</strong> button (Fig. 2).</p>
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 2. Correlation analysis" href="/files/journals/21/articles/17737/supp/17737-72080-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72080-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 2. Correlation analysis</strong></p>
<p></p>
<p>The resulting table is shown in Fig. 3.</p>
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 3. Correlation analysis results" href="/files/journals/21/articles/17737/supp/17737-72081-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72081-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 3. Correlation analysis results</strong></p>
<p></p>
<p>As one of the variables had a distribution that differed from normal, we took the results from the Spearman's correlation statistics section (see Fig. 3). The correlation coefficient was 0.88 (strong correlation), the 95% confidence interval was 0.81–0.92, and the significance level was <em>p</em> &lt; 0.0001. The coefficient of determination was 0.88<sup>2</sup> = 0.77, so one variable explained 77% of the variability of the other, which indicates a strong relationship between the two variables.</p>
<p>Thus, the level of troponin I on day 1 after the procedure strongly correlated with the level of troponin I at the end of surgery. This result was expected based on the pathophysiology of myocardial ischemia-reperfusion injury.</p>
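<p>The confidence interval and the coefficient of determination from this first analysis can be cross-checked by hand. The Python sketch below uses the textbook Fisher z transformation without the bias adjustment that PROC CORR applies by default, so it will not necessarily match the SAS output to the last digit. The sample size <code>n = 80</code> is a hypothetical value chosen only for illustration, because the number of patients in the analyzed group is not stated in the text.</p>

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a correlation coefficient.

    Textbook Fisher z transformation: z = atanh(r), SE = 1 / sqrt(n - 3).
    No bias adjustment is applied.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# r = 0.88 is the coefficient reported in the text;
# n = 80 is a hypothetical sample size (an assumption, not trial data).
low, high = fisher_ci(0.88, 80)
print(round(low, 2), round(high, 2))

# Coefficient of determination for the reported coefficient.
print(round(0.88 ** 2, 2))
```

<p>The interval is asymmetric around <em>r</em> because the transformation stretches values close to 1. The interval narrows as the sample size grows.</p>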
<p>We performed a correlation analysis to identify the relationship between the duration of aortic cross-clamping (AoClamp) and the troponin I level at the end of surgery (TnIEndOp). The scatter plot was generated as follows:</p>
<p>TITLE "Dependence of troponin I level on the duration of the aortic cross-clamping";</p>
<p>proc sgplot data=NEWDATASET;</p>
<p>WHERE CPBType=2;</p>
<p>title "Scattergram";</p>
<p>scatter x=AoClamp y=LGTnIEndOp;</p>
<p>ellipse x=AoClamp y=LGTnIEndOp;</p>
<p>label AoClamp = "aortic cross-clamping time";</p>
<p>label LGTnIEndOp = "troponin I level at the end of the surgery";</p>
<p>run;</p>
<p>We obtained a scatter plot (Fig. 4) of the troponin I level at the end of surgery against the duration of aortic cross-clamping.</p>
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 4. Scatter plot aorta clamping time troponin I level" href="/files/journals/21/articles/17737/supp/17737-72082-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72082-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 4. Scatter plot of aortic cross-clamping time versus troponin I level at the end of surgery</strong></p>
<p></p>
<p>Many points lie outside the boundaries of the ellipse, it is difficult to draw a line along which most points are located, and the ellipse approaches the shape of a circle (see Fig. 4). The scatter plot thus revealed no correlation between the duration of aortic cross-clamping and the increase of troponin I at the end of the procedure.</p>
<p>We calculated Spearman's correlation coefficient using the following:</p>
<p>TITLE "Correlation of troponin I level at the end of surgery and duration of the aortic cross-clamping";</p>
<p>proc corr DATA=NEWDATASET pearson spearman kendall hoeffding fisher;</p>
<p>where CPBTYPE=2;</p>
<p>var AoClamp;</p>
<p>with LGTnIEndOp;</p>
<p>label LGTnIEndOp = "troponin I level at the end of the surgery";</p>
<p>run;</p>
<p>The result obtained is presented in Fig. 5.</p>
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 5. Correlation analysis aorta clamping time troponin I level" href="/files/journals/21/articles/17737/supp/17737-72083-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72083-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 5. Correlation analysis of aortic cross-clamping time and troponin I level at the end of surgery</strong></p>
<p></p>
<p>The correlation between the duration of aortic cross-clamping and the level of troponin I at the end of the procedure was 0.4 (weak). The 95% confidence interval was 0.19–0.57, which is relatively wide, but the significance level was <em>p</em> = 0.0002 (the correlation is statistically significant). The coefficient of determination was 0.4<sup>2</sup> = 0.16; that is, one variable explained only 16% of the variability of the other, which is extremely low. Thus, we concluded that the duration of aortic cross-clamping does not affect the increase in the level of troponin I at the end of the procedure. Values of <em>r</em> &lt; 0.4 are considered a weak correlation, 0.4 &lt; <em>r</em> &lt; 0.8 a correlation of medium strength, and <em>r</em> &gt; 0.8 a strong correlation.</p>
<p>As mentioned above, the absence of a correlation may indicate the absence of a linear relationship between the two variables, but a nonlinear association is still possible. The scatter plot presented in Fig. 4 tended to be uniform, which indicates the absence of both linear and nonlinear relationships. Because correlation analysis may show no correlation when the dependence is nonlinear, a scatter plot should be created before the correlation analysis. An example of a nonlinear relationship between two variables is presented in Fig. 6.</p>
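<p>The interpretation rules used in this section (cut-offs of 0.4 and 0.8 and the coefficient of determination) can be written down as a short Python sketch. The function names are hypothetical. Counting a value of exactly 0.4 as weak is an assumption made here because the text labels the coefficient of 0.4 as weak.</p>

```python
def correlation_strength(r):
    """Label |r| using the cut-offs quoted in the text (0.4 and 0.8)."""
    size = abs(r)
    if not 0.0 <= size <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if size > 0.8:
        return "strong"
    if size > 0.4:
        return "medium"
    # Assumption: a value of exactly 0.4 is counted as weak.
    return "weak"


def explained_share(r):
    """Coefficient of determination: the square of the correlation coefficient."""
    return r * r


# The two coefficients reported in the text.
for r in (0.88, 0.4):
    print(r, correlation_strength(r), round(explained_share(r), 2))
```

<p>Such labels are conventions rather than statistical tests. They should be read together with the confidence interval and the scatter plot.</p>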
<p></p>
<center>
<div class="preview fancybox" style="text-align: center;"><a title="Fig. 6. Nonlinear relationship" href="/files/journals/21/articles/17737/supp/17737-72084-1-SP.jpg" rel="simplebox"><img style="max-height: 300px; max-width: 300px;" src="/files/journals/21/articles/17737/supp/17737-72084-1-SP.jpg" /></a></div>
</center>
<p><strong>Fig. 6. Nonlinear relationship</strong></p>
<p></p>
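<p>A small numeric illustration, written in Python outside SAS and using hypothetical data, shows how a perfectly deterministic nonlinear relationship can yield a Pearson coefficient of zero. In this sketch <em>y</em> is exactly the square of <em>x</em>, and <em>x</em> is symmetric around zero.</p>

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: y is fully determined by x, but the dependence is a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = pearson(x, y)
print(r)
```

<p>The positive and negative halves of the parabola cancel each other in the covariance, so the coefficient is zero although the relationship is exact. A scatter plot reveals such a pattern immediately.</p>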
<p>Correlation analysis is widely referred to in the Russian literature, but knowledge of biostatistics is necessary to understand the limits of its applicability and to interpret the results [2].</p>
<p>Despite the absence of a correlation, a relationship between the studied variables is not excluded. It may be worth converting one of the variables under study to a binary form and applying logistic regression.</p>
<p>The correlation analysis algorithm is presented in such a way that a researcher who has never worked with SAS Enterprise Guide 6.1 can process a database. One of the advantages of SAS is the ability to calculate the confidence interval for the correlation coefficient: it is sufficient to specify the fisher option in the correlation analysis procedure. In other software products this is not possible, and confidence intervals for the correlation coefficients must be calculated manually [2]. This article discusses the practical application of correlation analysis in the SAS software package; the theoretical aspects are reviewed only briefly, as they are well covered in the literature [2–5].</p>
<p>Thus, SAS Enterprise Guide 6.1 provides a complete set of modern data processing methods necessary for the medical researcher.</p>
<h2>Conclusion</h2>
<ol>
<li>SAS Enterprise Guide 6.1 enables quick and convenient correlation analyses, which makes this software package useful for researchers.</li>
<li>The correlation analysis algorithm developed can be used by researchers to process various databases from scientific studies and clinical trials.</li>
</ol>
<h2>Additional information</h2>
<p><strong>Source of funding.</strong> The study was conducted under the state task on the topic "Assessing the regenerative potential of a patient during heart surgery".</p>
<p><strong>Ethical considerations.</strong> The study protocol was approved by the local Ethics Committee of the Pavlov First Saint Petersburg State Medical University.</p>
<p><strong>Conflict of interests.</strong> The authors declare no conflicts of interest.</p>
<h2>References</h2>
<p>Grjibovski AM. Correlation analysis. Ecology, human. 2008;(9):50-60. (In Russ.)</p>
<p>Grjibovski AM, Ivanov SV, Gorbatova MA. Correlation analysis of data using STATISTICA and SPSS software. Nauka i zdravookhranenie. 2017;(1):7-36. (In Russ.)</p>
<p>Grjibovski AM, Ivanov SV, Gorbatova MA. Ecological (correlation) studies in health sciences. Nauka i zdravookhranenie. 2015;(5):5-18. (In Russ.)</p>
<p>Unguryanu TN, Grjibovski AM. Correlation analysis using STATA. Ecology, human. 2014;9:60-64. (In Russ.)</p>