Creation of a groundwater quality index for an aquifer belonging to the São Francisco River Basin



INTRODUCTION
Water is a finite natural resource, indispensable for the existence of terrestrial life, and it has recently undergone intense changes in its qualitative characteristics, which interfere with its availability for different uses (BILGIN; KONANÇ, 2016; NAVEEDULLAH et al., 2016; NEISI et al., 2018).
According to Braga et al. (2015), both the quantity and the quality of water resources are modified by natural or anthropogenic causes. The natural causes include weathering and sediment transport. Among the anthropic factors that contribute to water scarcity are intense consumptive use and the resulting pollution (BILGIN; KONANÇ, 2016; YANG et al., 2015).
Consumptive uses include domestic and industrial supply and agricultural activities, which keep expanding with population and urban growth. The large-scale application of fertilizers and the use of pesticides and insecticides in agricultural production have increased the potential for pollution of watercourses (VAROL; DAVRAZ, 2014; VOZA et al., 2015). Moreover, according to Monica and Choi (2016), such disordered consumptive uses can affect not only the aquatic ecosystem but also human health, since the condition of a water body reflects the activities in its river basin.

Keywords: WQIBHSF. Factorial Analysis. Groundwater.

Abstract
Understanding how and where anthropic activities influence the water quality of an aquifer is extremely important for its monitoring and management; however, the tools available for this purpose are scarce. The objective of this study was to create and spatialize a specific groundwater quality index to evaluate the water quality of an aquifer. The data were standardized and scaled according to their concentration or measurement to create the sub-indexes, and the weight of each variable was established by multivariate statistics. The variables used are secondary data provided by IGAM, collected in 2017. Once created, the index was spatialized in order to visualize the zones under the greatest anthropic influence. The spatialized index showed that the best water quality is found in the northernmost portion of the map. Nitrate was the most important variable in this study. It was therefore possible to conclude that the index is capable of assisting in the monitoring and management of water resources.

All these concerns regarding water quality reflect how essential this renewable resource is on our planet, since parameters can be set for each class of watercourse so that water quality levels are compatible with the classification of the source.
In order to assess the impacts on water quality, it is necessary to analyze its temporal and spatial variations, as well as the physical, chemical and biological processes that occur dynamically in a river basin (BILGIN; KONANÇ, 2016; LOBATO et al., 2015; MONICA; CHOI, 2016).
However, when the physical, chemical and biological parameters are analyzed separately, it is not possible to obtain an overall understanding of the water body, especially for professionals from other fields, so tools capable of analyzing them simultaneously are needed (GOMES et al., 2014).
According to Olsen, Chappell and Loftis (2012), multivariate statistical methods can be applied to data collected over time and at various sites within a river basin to better understand the relationships between monitored parameters.
One such technique, factor analysis (FA), provides a means of analyzing the structure of the correlations among the many variables of a sample, revealing sets of highly correlated variables, called factors, which are treated as representing dimensions within the data (HAIR et al., 2009; MINGOTI, 2013). By uncovering complex structures that cannot be directly observed, FA aims at simplification, thus allowing a better understanding of the analyzed data (LANDIM, 2011).
According to Mingoti (2013), after the factors are identified, their numerical values (factor scores) can be found for each of the variables under analysis. These values normally serve as a basis for other statistical analyses, as well as for composing water quality indices (LIBÂNIO, 2010; MINGOTI, 2013).
There are also Water Quality Indexes (WQI), which, like multivariate statistics, can be used to evaluate pollution reduction programs in watercourses and to provide information that is easily interpreted by society. They are designed to be practical to use, especially for monitoring the quality of water intended for human and animal consumption.
Thus, several authors have adapted the WQI developed in 1970 by a group of researchers coordinated by Robert M. Brown, with the support of the National Sanitation Foundation (NSF) of the United States. Known as the Brown Index (WQINSF), it is composed of the parameters dissolved oxygen, pH, coliforms, specific conductance (TDS), alkalinity, chloride and carbon chloroform extract (CEE). Toledo and Nicolella (2002) included other water quality parameters in the WQI for an urban drainage basin in the interior of São Paulo, such as total and dissolved phosphorus, pH, dissolved oxygen, ammonia, nitrate, electrical conductivity, turbidity, suspended solids and chlorophyll. Davies-Colley and Smith (2001) relied on the Delphi methodology to create a raw water quality index (WQIB) for primary-contact recreation in New Zealand, composed of eight water quality parameters: manganese, cyanobacteria, Escherichia coli, algae, iron, apparent color, pH and turbidity. Moretto et al. (2012) adjusted the WQINSF weights based on CONAMA 357/2005 for the Pardo river basin, in southern Brazil, using multivariate statistical analysis. Naveedullah et al. (2016) developed a WQI to evaluate the Siling reservoir, one of the main reservoirs in Zhejiang Province, China. The index was composed of ten water quality parameters, some of which also belong to the water quality index of the Environmental Company of the State of São Paulo (WQICETESB), plus the ammonium ion. In addition, it distinguishes between types of coliforms and includes chlorophyll. Its distinguishing feature is that the parameter concentrations were normalized based on the data set of each parameter (giving rise to qi), instead of using water quality curves. The weights of the parameters were obtained from literature reviews. Zhao et al. (2016) created a WQI to evaluate a reservoir in China using twelve parameters; the index was calculated from a series of mathematical expressions that include the frequency, amplitude and number of variables analyzed in each parameter.
Jian-Hua, Yue and Hui (2011) assessed the groundwater quality in Pengyang County based on an improved water quality index, in which an information entropy method was introduced to assign a weight to each parameter. To calculate the WQI, a total of 74 groundwater samples were subjected to comprehensive physicochemical analysis. For the WQI, 14 parameters were chosen, including chloride, sulphate, pH, chemical oxygen demand (COD), total dissolved solids (TDS), total hardness (TH), nitrate, ammonia nitrogen, fluoride, total iron (Tf), arsenic, iodine, aluminum, nitrite, metasilicic acid and free carbon dioxide. Vasanthavigar et al. (2010) applied a water quality index to assess groundwater quality in the Thirumanimuttar sub-basin, Tamil Nadu, India, in order to quantify the overall water quality for human consumption. Štambuk-Giljanović (1999) reports the creation of a WQI for both surface waters and groundwater and the results of its application to water evaluation in Dalmatia, Croatia. Soltan (1999) used a WQI to indicate the quality of groundwater from ten artesian wells located near the Dakhla Oasis in the Egyptian Western Desert. Abessi and Meraji (2010) developed a simple methodology based on multivariate analysis to create a groundwater quality index (GWQI), with the aim of identifying the places with the best quality for drinking.
There are few references to groundwater quality indexes. Brazil has the natural groundwater quality index developed by researchers from the Federal University of Bahia, although it still has limitations regarding its component parameters. Therefore, several authors have created or adapted an index specific to their source; for further details, see Oliveira et al. (2006, 2007). Also worth mentioning are the Groundwater Quality Index (IQAS) (Oliveira et al., 2004) and the Groundwater Quality of Use Index, e-IQUAS (Almeida, 2012). Oliveira (2018) tested the water quality index methodology in aquifer areas influenced by industrial organic percolates in the Camaçari region, in the North Recôncavo of Bahia, in order to reveal the concentration of pollutants and contaminants in the groundwater.
However, some indexes are not comprehensive for all aquifers. Therefore, this study aims to create a groundwater quality index for an aquifer belonging to the São Francisco river basin, for use in initial environmental monitoring. The object of this study was the aquifer of the São Francisco river, which today plays an important role in the country's development, owing to the economic activities developed there, energy production and the significant resident population.

Characterization of the area
The wells (stations) used in this study are part of the groundwater quality monitoring carried out since 1997 by the Águas de Minas project of the Instituto Mineiro de Gestão das Águas (IGAM). In this work, the data from the 2017 sampling were used, as they were the most recent available. These stations belong to the SF6 Rio Jequitaí and SF10 Rio Verde Grande sub-basins, both located in the mesoregion of Northern Minas Gerais, in the São Francisco River Hydrographic Basin, within the hydrogeological domain of the Bambuí aquifer (Figure 1).

Figure 1 - Location map of the study area
The climate of the region is hot tropical of the semi-arid subtype, with dry periods of six months or more; the wettest months are December to February, annual rainfall is 800 mm and the mean annual temperature is 26.6 °C (PATRUS et al., 2001).
In the municipalities of Jaíba and Verdelândia, agriculture accounts for about 40% and 50% of the gross domestic product, respectively. The main crop of the region is banana, with mango, lemon and sugarcane also standing out. In livestock, cattle raising predominates, followed by chickens and pigs. Another important activity is plant extraction to produce charcoal. These activities end up affecting the water quality of this source, whose waters are largely used for human and animal consumption.

Compilation of Data
Secondary data from 53 wells, made available by IGAM, were used in this study. The following parameters were considered: Electric Conductivity, Alkalinity, Total Chloride, Total Hardness, Ionized Fluoride, Total Phosphorus, Nitrate, Total Sulphate, Oxygen consumed, Turbidity, Total Calcium, Total Iron, Lithium, Total Potassium, Dissolved Silicon, Dissolved Sodium, Total Sodium and Escherichia coli.

Statistical analysis of data
The most important variables were identified, and their weights determined, in a few steps. First, the water quality data were normalized and arranged in a data matrix $X = (x_{ij})$, where $i = 1, \dots, n$ indexes the samples (72) and $j = 1, \dots, p$ indexes the variables (13). Then the original data matrix was transformed into a correlation matrix $R$ of size $p \times p$, where $p$ is the number of water quality variables.
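As a minimal sketch of the step from the data matrix X to the p x p correlation matrix R, the following Python example computes Pearson correlations between the columns of a small matrix. The study itself used R; the function names and the toy values here are illustrative assumptions, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(data):
    """Build the p x p correlation matrix from an n x p data matrix.

    `data` is a list of n rows (samples), each holding p values (variables).
    """
    columns = list(zip(*data))  # transpose: one tuple per variable
    p = len(columns)
    return [[pearson(columns[a], columns[b]) for b in range(p)] for a in range(p)]


# Hypothetical toy data: 4 samples x 3 variables (not values from the study).
X = [
    [1.0, 2.0, 9.0],
    [2.0, 4.1, 7.0],
    [3.0, 6.2, 4.0],
    [4.0, 7.9, 2.0],
]
R = correlation_matrix(X)
```

The resulting matrix is symmetric with ones on the diagonal, which is the form that FA / PCA then decomposes.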
Through FA/PCA, the original set of observed variables was transformed into a new set of variables, called principal components (PC). According to Hair et al. (2009), the first component explains the maximum share of the total variability of the data, the second is uncorrelated with the first, the third is uncorrelated with the first two, and so on, until the PCs explain more than 70% of the total variance of the data. Only the parameters with factor loadings greater than or equal to 0.7 were used, because a loading of this size indicates that the parameter is highly relevant for the source. The parameters with a factor loading lower than 0.7 were therefore disregarded. The R software was used for the multivariate analyses (R CORE TEAM, 2013).
The variables selected by FA were used to compose the Water Quality Index of the São Francisco River Basin (WQIBHSF); the methodology developed in this work for creating water quality indexes is referred to by the same name.
From the decomposed matrices, the factor scores (or factor weights) were also used to construct the index, as they identify the individual importance of each variable in its composition.

METHODOLOGY FOR THE CREATION OF WQIBHSF
After the water quality parameters were selected by FA, the data were standardized using the mean and standard deviation of each variable's data set (Equation 1) in order to construct the quality sub-indexes (qi), which were then calibrated as a function of their total value so that they ranged from 0 to 100. These results play the same role as the curves established in the WQINSF, which describe the functional relationship between the water quality rating and the value of the parameter. It is important to emphasize that no curves were created: the results were obtained only by transformation of the data and application of principal component analysis. The WQI was created specifically for this case study; after the transformation and the definition of the weights, it follows only the mathematical formulation of the multiplicative aggregation used in the WQINSF. However, the index can be adapted to other studies, of both groundwater and surface water resources, as long as they have a consistent historical series.

$$X_{SC} = \frac{X_{dat} - \mathrm{mean}}{\mathrm{std}} \qquad (1)$$

Where: $X_{SC}$ = value resulting from standardization; $X_{dat}$ = raw value of the variable; mean = mean of the variable's data set; std = standard deviation of the variable's data set.
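The standardization of Equation 1 and the subsequent rescaling to the 0 to 100 range can be sketched as follows. The text does not specify how the rescaling was done, so the min-max rescaling, the inversion for variables where a higher concentration means worse quality, and the use of the sample standard deviation are all assumptions of this sketch; the nitrate values are hypothetical.

```python
import statistics


def standardize(values):
    """Equation 1: (raw value - mean) / standard deviation for one variable."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (assumption)
    if std == 0:
        raise ValueError("variable is constant; cannot standardize")
    return [(v - mean) / std for v in values]


def to_quality_scale(z_scores, higher_is_worse=True):
    """Rescale standardized values to 0-100 sub-indexes (min-max, an assumption).

    With higher_is_worse=True the scale is inverted, so the largest
    concentration receives the lowest quality rating.
    """
    lo, hi = min(z_scores), max(z_scores)
    if hi == lo:
        raise ValueError("all values are equal; cannot rescale")
    scaled = [100.0 * ((z - lo) / (hi - lo)) for z in z_scores]
    return [100.0 - s for s in scaled] if higher_is_worse else scaled


# Hypothetical nitrate concentrations for five wells (not data from the study).
nitrate = [0.5, 1.2, 3.4, 8.9, 2.1]
z = standardize(nitrate)
q = to_quality_scale(z)
```

One consequence of a plain min-max rescaling is that the worst well receives a sub-index of exactly 0, which forces a multiplicative index to 0 as well; a real implementation would probably need a small lower bound.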
For the association of the parameters and the final aggregation, the multiplicative aggregation method was used. This method has been applied in several WQIs (BROWN, 1970; LIBÂNIO, 2010; SPERLING, 2005) and is calculated according to Equation 2, whose values range from 0 to 100.

$$WQI_{BHSF} = \prod_{i=1}^{n} q_i^{w_i} \qquad (2)$$

Where: $WQI_{BHSF}$: Water Quality Index for the aquifer belonging to the São Francisco river basin, between 0 and 100; $q_i$: quality of the i-th parameter, obtained from the standardization of the data based on Equation 1 and from a scaling according to its concentration or measurement, so that the number lies between 0 and 100; $w_i$: factor weight (factor score) of the i-th parameter, a number between 0 and 1 assigned according to its importance for the overall quality; $n$: number of variables used in the calculation of the WQIBHSF.
The index was spatialized with QGIS software, version 2.18.18, in the WGS reference system and the UTM projection, zone 23 South.
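The multiplicative aggregation of Equation 2 is a weighted geometric product of the sub-indexes. The sketch below assumes, as in the NSF formulation, that the weights sum to 1; the function name and the example values are illustrative and not taken from the study.

```python
import math


def wqi_multiplicative(q, w):
    """Equation 2: product of q_i ** w_i over all parameters.

    q: sub-index values, each between 0 and 100.
    w: weights, each between 0 and 1, assumed here to sum to 1.
    """
    if len(q) != len(w):
        raise ValueError("q and w must have the same length")
    if any(not 0.0 <= qi <= 100.0 for qi in q):
        raise ValueError("each q_i must lie between 0 and 100")
    if not math.isclose(sum(w), 1.0, abs_tol=1e-9):
        raise ValueError("weights are assumed to sum to 1")
    return math.prod(qi ** wi for qi, wi in zip(q, w))


# Hypothetical sub-indexes and weights for three parameters.
q_values = [90.0, 60.0, 75.0]
weights = [0.5, 0.3, 0.2]
index = wqi_multiplicative(q_values, weights)
```

Because the weights sum to 1, the result always lies between the smallest and the largest sub-index, and a single sub-index of 0 drives the whole index to 0.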

Extraction of the factors for the parameters that compose the WQI
The number of factors to be extracted was determined by analyzing the percentage of variance explained (MINGOTI, 2013). The first five factors were extracted for the aquifer studied, which together explained 73.79% of the total data variance (Table 1). According to Hair et al. (2009), the variance retained to represent a complete data set should be above 70% in order to be representative. Belkhiri and Narany (2015) also used the percentage of explained variance to extract the principal components in their study of groundwater in the Ain Azel plain, in Algeria, and required two components to explain 85% of the total data variance and thus identify sources of pollution. Yang et al. (2015), in turn, required three principal components to assess and identify sources of pollution in a coastal aquifer in southern China. Both studies used fewer parameters than the present one.
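The retention rule described here (keep factors until the cumulative explained variance exceeds 70%) can be sketched as below. The eigenvalues are hypothetical and chosen only to make the arithmetic easy to follow; they are not the eigenvalues behind Table 1.

```python
def n_factors_for_variance(eigenvalues, threshold=0.70):
    """Return (k, fraction): the smallest number of factors whose cumulative
    share of the total variance exceeds `threshold`, and that share."""
    total = sum(eigenvalues)
    cumulative = 0.0
    ordered = sorted(eigenvalues, reverse=True)
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k, cumulative / total
    # Threshold never exceeded: all factors are needed.
    return len(ordered), cumulative / total


# Hypothetical eigenvalues of a 6 x 6 correlation matrix (they sum to 6).
eigenvalues = [2.4, 1.5, 0.9, 0.6, 0.4, 0.2]
k, explained = n_factors_for_variance(eigenvalues)
```

With these toy values the first two factors explain 65% of the variance, so a third factor is needed to pass the 70% threshold.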
The five factors generated provided a systemic view of the representativeness of each water quality parameter in this study. Table 2 shows the factor scores extracted from the factor analysis for the 18 groundwater quality parameters. After the factor scores were extracted, the score with the highest factor loading was selected for each groundwater quality parameter; these are highlighted in red in Table 2.
After the highest factor scores were selected, they were rescaled to range from 0 to 1, as required by the multiplicative aggregation method established by the WQINSF. The results are shown in Table 3. According to the scores established for each water quality parameter in Table 3, nitrate is the most important parameter for this aquifer, since its factor score is the highest; total sulfate, oxygen consumed, lithium and total calcium follow as parameters of secondary importance.
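The selection of the highest loading per parameter and its rescaling to the 0 to 1 range can be sketched as follows. The text does not state the rescaling formula; dividing by the sum, so that the weights add up to 1 as in the NSF formulation, and comparing loadings by absolute value are assumptions of this sketch. The loadings are hypothetical, not those of Table 2.

```python
def select_and_scale_weights(loadings):
    """Pick the largest absolute loading of each parameter across factors and
    divide by the sum, so the weights lie in 0-1 and add up to 1.

    loadings: dict mapping parameter name -> list of loadings, one per factor.
    """
    top = {name: max(abs(v) for v in values) for name, values in loadings.items()}
    total = sum(top.values())
    return {name: value / total for name, value in top.items()}


# Hypothetical loadings on three factors (not values from the study).
example_loadings = {
    "nitrate": [0.91, 0.10, -0.05],
    "sulfate": [0.20, -0.84, 0.11],
    "calcium": [0.15, 0.30, 0.78],
}
w = select_and_scale_weights(example_loadings)
```

In this toy case nitrate receives the largest weight because its highest loading (0.91) is the largest of the three selected loadings.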
The nitrate concentration is possibly related to agricultural activities in the basin, through the application of nitrogen-rich fertilizers, whereas the other parameters were already expected to be important variables, since they are strongly related to the geological formation of the soil.
Although the weight of E. coli was not among the most statistically significant in this study, this parameter is important for the quality of a water resource because, once present in the water, it can cause problems such as gastroenteritis or urinary tract infections in humans. It was therefore considered pertinent to keep it in this study.
Thus, after the multivariate statistics were applied, the WQIBHSF could be calculated; the results are shown in Table 4.
To obtain an integrated view of the WQIBHSF, the index was spatialized (Figure 2). Figure 2 shows the spatial variability of the WQIBHSF: the best water quality is found in the northernmost part of the map.
Quality tends to decrease toward the central portion of the map, which may be related to agriculture and charcoal extraction in the region. These activities diminish farther south, which gives the water more time to self-purify and consequently results in slightly better quality.

CONCLUSION
Based on this study, it was possible to conclude that multivariate statistical tools can help in the creation of a groundwater quality index, since they reveal part of the interaction among the water quality variables and thus the influence of each variable on the aquifer.
Moreover, the creation of the index was necessary, as there is no index in the literature able to cover the specific parameters of this study. Suitable weights should also be set for each parameter so that the index represents the overall groundwater situation more realistically.
The practicality and ease of use of the method make it possible to develop an index model for each groundwater body, but this requires a consistent and reliable historical series of water quality data.
Finally, based on the spatialized index, it was possible to identify the areas where the water quality is better or worse, which is useful for managing and monitoring an aquifer.