In silico approach: Anthocyanin derivatives as potential inhibitors of the COVID-19 main protease

In silico approach has been carried out for the determination of drug candidates from anthocyanin derivative as inhibitors of the COVID-19 main protease. Geometry optimization has performed using the DFT/B3LYP/6-31G(d,p) method as an initial step to prepare candidate ligand. The results of molecular docking showed that candidates C5 and C6 had promising results with a grid score smaller than the ligand reference (X77)


Introduction
The new generation coronavirus (SARS-Cov-2) which infects the respiratory tract was first detected in December 2019 in Wuhan, China [1][2][3]. Recorded on 11 March 2020 the World Health Organization (WHO) established the coronavirus outbreak as a COVID-19 pandemic with a total of 118,000 cases from 110 countries [4,5]. The spread of this virus occurs until the world division of various countries has increased very rapidly in the past 9 months. Besides, the latest data in the last 9 months of July 2020 shows that the spread of the COVID-19 virus has increased quite rapidly, especially in countries in the Americas and Europe. Where, WHO has confirmed the grand total of COVID-19 cases in the world on september 09, 2020, showing a total of 27,486,960 cases with total death cases were 894,983 cases [6]. The spread of the virus is increasing and uncontrolled making research 3 groups in various countries make contributions to prevent and develop drugs and vaccines to reduce the increasing number of COVID-19 cases [7].
The SARS-CoV-2 is a single-stranded positive-RNA virus that belongs to the beta-corona virus group but is different from MERS-CoV and SARS-CoV [8]. Where the coronavirus is covered by a positive RNA strand with the largest RNA genome of 30-32 kb [9]. In addition, recent studies have shown that CoV contains at least six ROFs where ORF1 and ORF2 are responsible for producing two pp1a and pp1ab polypeptides [10]. Then the polypeptide is processed by the main protease (M Pro or 3CL Pro ) which is responsible for the coronavirus replication sequence [11]. Therefore, this enzyme is often used as one of the very promising drug targets in inhibiting coronavirus activity [12,13].
The development of drug candidates from natural ingredients has promising potential as an antiviral, including anthocyanin derivatives [14]. Around 17 Anthocyanin derivatives have been found in nature and only 6 major anthocyanin derivatives are widely distributed, namely pelargonidin, delphinidin, petunidin, cyanidin, peonidin, and malvidin [15]. The placement of different functional groups in the basic structure of anthocyanin will determine the activity and characteristics of anthocyanin compounds [16]. Some studies report that anthocyanin in the flavonoid group has biological activity as antiviral [17,18]. Recent studies have shown that flavonoid derivatives with the same basic structure as anthocyanins have the potential as SARS-CoV antivirals against COVID-19 main protease [19]. Several anthocyanidin derivatives that bind to carbohydrate groups such as rhoifolin and pectolinarin have been known to have SARS-Cov 3CL pro inhibitory 4 activity with IC50 values of 27.45 μM and 37.78 μM, respectively [19]. Additionally, anthocyanin-derived compounds that have a hydroxy group in their structure are expected to be able to inhibit the activity of the virus by binding to the target protein at the molecular level. These have been reported by previous studies of adding hydroxy groups to candidates giving effective results against the toxicity and inhibition of SARS-CoV2 [20,21].
This article reports of studies of anthocyanin activity as an anti-SARS-Cov-2 used in silico approach. The study of antiviral activity using the in silico approach is one of the most effective and efficient alternatives in studying and predicting the interaction of drug candidates with protein targets [22][23][24][25][26]. In addition, the in silico approach is able to predict the physicochemical and biological activity of a drug candidate before a wet laboratory test is carried out [27][28][29]. The combination of several computational techniques used such as molecular docking, and molecular dynamic simulation is expected to be able to provide accurate prediction results through complex calculations [30]. Besides, the quantum mechanics approach using density functional theory (DFT) on small molecules especially anthocyanin derivative molecules shows promising results in molecular modeling [31,32]. Therefore, the selection of an in silico approach is able to offer fast, effective, and accurate alternatives in finding a SARS-CoV2 drug or vaccine.

Materials and Method
Computational Resource and Data Set 5 The operating systems and hardware used in this study are Windows operating systems (Intel Core i5-9300H, 2.40 GHz, GPU Nvidia GTX 1650, and 8.0 GB RAM) and Linux operating systems (Intel Core i7-8700, 32 GB RAM, NVidia GPUs GTX 1080 Ti 11GB and SSD 500GB SATA). Where the Windows operating system is used for preparation and analysis. Meanwhile, the Linux operating system is used to perform heavy computational chemical calculations. The software used in this study, namely Gaussian 09W, ChemOffice 2016, Chimera 1.13, Open Bubble GUI, Putty, WinSCP, Dock6, AMBER18, and Discovery Studio 2019. The target protein used in this study as a receptor is COVID-19 main protease (PDB: 6W63) obtained from the Protein Data Bank website (http://www.rcsb.org/structure/6W63). Where the target protein is included in the classification of viral protein with a resolution of 2.10 Å using the XRD method. The candidate compounds modeled in this study were 6 anthocyanin derivatives, namely pelargonidin, delphinidin, petunidin, cyanidin, cyanidin-3-rutinoside, and cyanidin 3,5-Odiglucoside.

Modeling and Preparation of Data Set
Modeling anthocyanin derivatives using the DFT/B3LYP/6-31G(d,p) method to determine the geometry optimization using Gaussian 09W (Gaussian 09) [33]. Anthocyanin geometry optimization using density functional theory [34] with Becke 3-parameter Lee-Yang-Parr [35] hybrid functional based on considerations from previous studies that showed good correlation results with experimental results on spectroscopic modeling and chemical properties [36]. The energy calculation from the process of optimizing the geometry of molecular data sets, namely anthocyanin derivatives aims to obtain optimal 6 geometrical conformation [37]. Meanwhile, receptor preparation and reference ligands are carried out using the ff14SB force field with AM1-BCC ligand charge through the Chimera 1.13 package. Besides, some handlers of the force field and charge aims to prevent charge imbalances in the system when performing molecular dynamic simulations.

Study of Molecular Docking
Molecular docking is done using the dock6 package through the calculation command in parallel using the Linux operating system. The molecular docking stage includes the validation stage and the candidate docking stage into the active side of the receptor. The selection of cluster spheres at the receptor is carried out within a 10 Å radius with a grid spacing of 0.3 in determining the grid box system used. Additionally, the type of conformation used in the docking validation process uses two types of conformation, namely rigid conformation and flexible conformational. The conformation with the smallest grid score will be selected as the reference ligand at the candidate docking stage. Also, the ligand validation stage was stated to comply with the criteria if the reference ligand had RMSD value ≤ 2.0 Å [38]. The candidate docking stage is run with a grid-based score function with a fast calculation which aims to obtain the initial coordinates of each candidate on the active side of the receptor.

Molecular Dynamic Simulation
The initial coordinates of the reference ligands and candidate obtained from the results of molecular docking continued as a basis for building the topology of each complex using tleap. Where the force field used the ff14SB force field and a solvated box used the TIP3P 7 box with a distance of 12 Å. The stages of MD simulation are carried out in several stages such as minimization, heat, density, equilibration, and production. All stages are carried out for 200 ns which is used as the basis for the analysis of several bonding variables between ligands and receptors. Analysis of several variables measured in this study uses output files and trajectory files generated during the simulation process. Besides, the analysis using the MM-GBSA method is also used to calculate the binding free energy of each complex based on the number of frames produced during the simulation when the system has reached stability [39].

Molecular Docking Validation
The molecular docking validation stage is performed using a grid-based cluster sphere selection as a score function ( Figure 1A). The validation process is carried out to determine the active site or ligand-binding site of the receptor [40]. The validation process by redocking the reference ligand (code: X77) aims to find the initial coordinates of the ligand. The results show that the reference ligand pose is very promising with an RMSD value ≤ 2.0 Å for each conformation, namely XRD: yellow, rigid: orange, and green: flexible ( Figure 1B). Based on the value of RMSD shows that flexible conformation has a smaller RMSD value compared to rigid conformation. Thus, flexible conformation is chosen as the reference ligand in determining the initial coordinates of molecular docking.
The active site of the receptor shows that there are amino acid residues responsible for 8 binding to the reference ligand such as Thr26, Leu27, His41, Met49, Leu141, Asn142, Gly143, Cys145, Met165, Glu166, and Pro168 with different types of interactions ( Figure   1C and Figure 1D). interaction between X77 and residues, and (D) Fleksibel conformation: 2D interaction between X77 and residues. 9 The results of molecular docking validation of the two types of conformation also show grid score values that are not too much different from the difference of ~ 0.90 kcal/mol. Several variables are also generated in molecular docking using the dock6 package, namely Van der Waals energy (EVDW), electrostatic energy (Ees), and internal energy repulsive (Einter). However, flexible conformation shows more promising results because it has a smaller grid score compared to rigid conformation (Table 1). Thus, consideration of selection as a reference ligand is getting stronger. This consideration is caused by two key parameters that hold the key to molecular docking success such as a smaller RMSD value and a smaller grid score [41,42].

Molecular Level Interactions of Anthocyanin Derivatives with COVID-19 Main
Protease 10 The anthocyanin derivatives modeled in this study are secondary metabolites and derivative compounds that give a blackish-purple color to some tropical fruits [43,44].
Variations in the structure of different anthocyanin derivatives are expected to be able to provide different activities as antivirals, especially in inhibiting the expression of the COVID-19 main protease ( Figure 2). The hydroxy group contained in each anthocyanin derivative compound is expected to provide good results in binding to amino acid residues on the active site of the receptor as a hydrogen bond donor. Geometry optimization of anthocyanins using the DFT/B3LYP/6-31G(d,p) method was expected to provide optimal geometry results as shown by previous studies [45]. Several variables such as Van der Waals's energy and electrostatic energy play an important role in determining the interaction energy between ligands and receptors expressed by grid scores [46]. The results show that the smaller the Van der Waals energy and electrostatic energy values, the smaller the grid score (Table 2).   have many hydroxy groups that are polar so that they decrease their hydrophobicity, especially on the carbohydrate groups of each candidate [48]. MD simulation results show that temperature, total energy, pressure, and density show good graphs with no significant fluctuation changes ( Figure 6). This indicates that during 15 the simulation time the stability of temperature, total energy, pressure, and density was achieved in each system. So, the stability at each stage shows good results so that it can be continued on the trajectory analysis of each system.  Figure 7A). Where the graph shows the stability of the system in each complex reached during the simulation at trajectories 100 ns until 200 ns. 16 The stability of the RMSD plot is an important parameter for analyzing other parameters by looking at the stability of the system which is characterized by RMSD deviations that are not too significant [49]. Particularly, the C5 system showed very good stability with no significant deviation. Meanwhile, the C6 system began to reach its stability point at trajectory 100 ns until 200 ns with no significant deviation.
Root mean square fluctuation of complex (RMSF) analysis is also performed on each complex by looking at the fluctuations that occur in the backbone receptor ( Figure 7B).
The RMSF of each complex was carried out during the last 20 ns when the system stability was achieved. This aims to save calculation time but not sacrifice calculation accuracy. Fluctuations that occur in each complex show the greatest fluctuations in C5 and the lowest fluctuations occur in C6. This identifies that the bonds that occur between receptors and C6 are more flexible than the bonds between C5 and receptors during simulation time when system stability is achieved.
The interaction between the ligands to the receptors in each system showed a decrease in contact during the initial simulation of 1 ns until 6 ns dramatically ( Figure 7C)  free energy gas (ΔGas), free energy solvent (ΔSolv), and binding free energy (ΔG) [50]. Data shows that each energy component shows a close relationship in its contribution to each complex in the system during the simulation. Where the role of solvents in the system is indeed a very crucial role. This is due to the fact that Generalized Born energy and energy of contribution of nonpolar of solvent contribute to free energy solvent [51]. Meanwhile, Van der Waals energy and electrostatic energy contribute to the value of free energy gas.
Therefore, the contribution of each energy component will affect the final result of total energy or free binding energy which is an important parameter in determining the energy of interaction between ligands and receptors. The results show that the free binding energy of C5 is more promising compared to C6. It can be seen that the binding free energy value of C5: -42.77 ± 0.37 kcal/mol is smaller than C6: -23.52 ± 0.47 kcal/mol (Table 3). Additionally, C5 can be considered as the COVID-19 main protease inhibitor because it has a binding free energy value that is smaller than the value of the binding free energy reference ligand: -42.37 ± 0.41 kcal/mol with a deviation: ~ 0.40 kcal/mol. 19 Energy calculation using the MM-GBSA method can see the energy distribution of each residue through an analysis of the energy decomposition of each complex (Figure 9). The results show that amino acid residues that have values below -1.00 kcal/mol that interact with ligands are residues that are present at the active receptor site [52]. Specifically, C5 has more amino acid residue interactions (12 amino acid residues) with energy values below -1.00 kcal/mol. The energy decomposition results show good suitability with the results of free binding energy. Therefore, the binding free energy value of C5 is smaller than C6 and X77.   Table 4).

The study of the number of H-bonds, H-bonds interactions, and H-bond lifetime is
presented clearly in this article through molecular dynamics simulations ( Figure 10). so fast with a decrease during the simulation on each complex dramatically. This is because the residues analyzed are specific residues. Where the residue analyzed is the residues obtained from the initial coordinates when performing molecular docking.
However, it is hoped that the results of the analysis provide a clearer picture of hydrogen bond properties.

Study of Bioavailability and Drug-Likeness Screening
ADMET prediction (absorption, distribution, metabolism, excretion, and toxicity) used the admetSAR service website (http://lmmd.ecust.edu.cn/admetsar1/predict/). Prediction of several important variables of the candidate's biological activity as a drug in the body needs to be done as preliminary data on bioavailability and drug-likeness [54]. The absorption prediction results show that C5 and C6 do not penetrate the blood-brain barrier which, identifies that the candidate will not affect the central nervous system especially the brain [55]. Besides, variables such as human intestinal absorption show good results can be absorbed well. One important indication in absorption is the solubility of each candidate which shows very good results, namely LogS C5: -2.63 and LogS C6: -2.10.
The hydroxy group in each candidate will be increasing the nature of its solubility. The metabolism stage shows that both candidates are non-inhibitors and non-substrates of cytochrome isoenzymes (CYP). This identified that both candidates were very promising as drugs because they did not inhibit or interfere with the activity of the enzyme which is the main enzyme in the process of metabolism [56]. As a result, C5 and C6 are expected to have no side effects on the body.
Variable toxicity calculated in this study showed good results for each candidate. Where, it can be said that C5 and C6 are not toxic because the results show Weak inhibitor of the human-related gene, non-AMES toxic and non-carcinogens. Thus, over all the results show that each candidate has ADMET properties that are promising to be considered as drug candidates that can be considered (Table S1).

Conclusions
The combination of computational chemistry techniques between molecular docking and molecular dynamic simulation is very promising in predicting drug candidates in silico by stage is a fundamental consideration to be studied more deeply about the bioavailability and drug-likeness properties of C5 and C6 candidates. Over all the results show that each candidate has ADMET properties that promise to be used as a drug candidate that can be considered. The MD simulation studies were also continued to study ligand-receptor interactions and free binding energy during the 200 ns simulation time using the ffSB14 force field. The results show C5 candidate has the binding free energy value lower than the reference ligand with an energy deviation of ~ 0.40 kcal/mol. Overall, the C5 candidate showed good predictive results using the in silico approach and deserves to be considered as a COVID-19 main protease inhibitor to be continued experimentally in the future.