Reduction based on similarity and decision-making

Reduction of attributes in an information system (IS) is a basic step in IS analysis. The original rough set model of Pawlak depends on an equivalence relation, which is a strong constraint. This paper uses similarity classes and similarity degrees to obtain a reduction of an IS and illustrates the approach with an example from biochemistry that yields a quantitative structure activity relationship (QSAR). Moreover, the significance of each attribute and the degrees of membership are computed to reach a decision by using the degree of similarity. The suggested approach improves decision-making and decision accuracy.

knowledge base for X, and indication of symptoms for a fixed disease can be seen through the topology [15]. Different notions of a membership function based on rough sets were introduced and studied in [16], [17], and [18]. A QSAR [19] constructs a mathematical model that relates the biological activity of a set of chemical compounds to a set of their structural descriptors. The main purpose of this paper is to study reduction based on similarity. We compare reduction by similarity with some other types of reduction through several examples. An application to the QSAR of AAs is studied. We introduce an algorithm to reduce a membership function and illustrate it graphically. Some properties of relations and membership functions are investigated. A new method is introduced to study the correlation between attributes and a decision through a similarity relation.
Definition 1 [1] An IS, or approximation space, is a system $(U, A, f)$, where $U$ is a finite universe of objects and $A$ is a set of attributes (features or variables). Each $a \in A$ defines an information function $f_a : U \to V_a$, where $V_a$ is the set of all values of $a$, called the domain of attribute $a$.

Definition 2 [5] For every $B \subseteq A$, the indiscernibility relation on $B$, denoted by $Ind(B)$, is defined as follows: two objects $x_i, x_j \in U$ are indiscernible by the set $B$ if $b(x_i) = b(x_j)$ for every $b \in B$. $Ind(B)$ partitions $U$ into the smallest indiscernible groups of objects, and each equivalence class of $Ind(B)$ is called an elementary set in $B$. The equivalence class of an object $x_i$ in the relation $Ind(B)$ is denoted by $[x_i]$.
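Definition 2 can be illustrated with a minimal sketch that groups objects by their values on a chosen attribute subset $B$. The function name `indiscernibility_classes` and the toy table are hypothetical and are not taken from the paper's tables.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Partition the objects of `table` into the elementary sets of Ind(B).

    `table` maps each object name to a dict {attribute: value};
    `attributes` is the subset B of attributes to compare on.
    """
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on every b in B share one key.
        key = tuple(row[b] for b in attributes)
        classes[key].add(obj)
    return list(classes.values())


# Hypothetical toy table, for illustration only.
table = {
    "x1": {"p": 1, "q": 0},
    "x2": {"p": 1, "q": 0},
    "x3": {"p": 0, "q": 1},
    "x4": {"p": 1, "q": 1},
}

classes_pq = indiscernibility_classes(table, ["p", "q"])
classes_p = indiscernibility_classes(table, ["p"])
```

With $B = \{p, q\}$ the objects x1 and x2 are indiscernible; dropping $q$ merges x4 into their class, which shows how the choice of $B$ controls the granularity of the partition.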
Definition 3 [5] For every $B \subseteq A$, a membership function of an object $x_i \in U$ with respect to $B$ is given by $\mu_X^B(x_i) = \frac{|[x_i] \cap X|}{|[x_i]|}$, for $X \subseteq U$.

The IS is represented in Table 1 with a set of attributes $A = \{p, q, r, s\}$ and a set of objects $U = \{a, b, c, d, e\}$. We study the effect of the attributes on the decision, that is, the correlation between the attributes and the decision, in the following cases.

Original Pawlak method
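In Table 1, we compare the attribute values of the objects. No two objects have the same attribute values. So, we have the classes of objects $C = \{\{a\}, \{b\}, \{c\}, \{d\}, \{e\}\}$ and two decision sets $D_1 = \{a, d, e\}$ and $D_2 = \{b, c\}$. The Pawlak membership is given by $\mu_A(x) = \frac{|[x] \cap A|}{|[x]|}$, for any object $x$ and $A \subseteq U$. Since every class is a singleton, $\mu_{D_1}(x) = 1$ for $x \in \{a, d, e\}$ and $\mu_{D_1}(x) = 0$ for $x \in \{b, c\}$, and conversely for $D_2$. This means that any object correlates with the decision only by 0 or 1.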
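The 0-or-1 behaviour can be checked with a short sketch that applies the Pawlak membership to the classes and decision sets stated above. The helper name `rough_membership` is hypothetical; exact fractions are used so that the values can be compared without rounding.

```python
from fractions import Fraction


def rough_membership(x, classes, target):
    """Pawlak membership |[x] ∩ target| / |[x]| for the class [x] containing x."""
    cls = next(c for c in classes if x in c)
    return Fraction(len(cls & target), len(cls))


# Classes and decision sets as stated in the text for Table 1.
classes = [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}]
D1 = {"a", "d", "e"}
D2 = {"b", "c"}

mu_D1 = {x: rough_membership(x, classes, D1) for x in "abcde"}
mu_D2 = {x: rough_membership(x, classes, D2) for x in "abcde"}
```

Because each class contains a single object, the intersection with a decision set is either the whole class or empty, so no intermediate membership value can occur.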

Pawlak method with coding
The IS in Table 1 can be coded by choosing intervals as in Table 2. The coded IS is given in Table 3. The classes of objects are then obtained from Table 3. This means that any object correlates with the decision only by 0, $\frac{1}{2}$, and 1.

Similarity without degree
The similarity matrix between objects in Table 3 is given by

Uncertain QSAR information system
The basic form of Pawlak's model depends on an equivalence relation. This limits the range of objects and decision rules to which it can be applied, because in some ISs it may not be possible for two different objects to have equal attribute values. For the following problem, we therefore give a method of solution that depends on a similarity relation.

Problem: Model the energy of unfolding of a protein (tryptophan synthase, an alpha unit of the bacteriophage T4 lysozyme), where 19 coded amino acids (AAs) were each introduced into position 49 [20]. The AAs are described in terms of seven attributes: $a_1$ = PIE and $a_2$ = PIF (two measures of the side chain lipophilicity), $a_3$ = DGR = $\Delta G$ of transfer from the protein interior to water, $a_4$ = SAC = surface area, $a_5$ = MR = molecular refractivity, $a_6$ = LAM = the side chain polarity, and $a_7$ = Vol = molecular volume. In [14], the authors used the form of Pawlak [1] to make decision rules. The IS of quantitative attributes $\{a_1, a_2, a_3, a_4, a_5, a_6, a_7\}$ and a decision attribute $\{d\}$ is represented in Table 5. The condition attributes are coded into four qualitative terms (very low, low, high, and very high), whereas the decision attribute is coded into three qualitative terms (low, medium, and high). The qualitative terms of all attributes are coded by integers. The problem is to show that some objects coded with a medium energy of unfolding in [14] would, with respect to an attribute, have a high energy of unfolding.

Algorithm
Step 1: Construct a similarity matrix $M_a = [w_{ij}]$ for each of the 7 attributes $a$. Each $M_a$ is a $19 \times 19$ matrix, where $i \in \{1, 2, 3, \dots, 19\}$ denotes the row and $j \in \{1, 2, 3, \dots, 19\}$ denotes the column.
Step 2: Calculate the similarity degree between objects with respect to attribute $a$ through a relation $R$ on the QSAR IS, given by $d_{ij} = \deg_a(x_i, x_j) = 1 - \frac{|a(x_i) - a(x_j)|}{|a_{\max} - a_{\min}|}$, where $a_{\max}$ and $a_{\min}$ denote the maximum and minimum values of attribute $a$, respectively. It is clear that $R$ is reflexive and symmetric.
Step 3: Classify the data deduced from step 2 by comparing each $d_{ij}$ with a value $d$ chosen by an expert. There are two cases: $d_{ij} > d$ and $d_{ij} < d$.
Step 4: Define a membership function of every object for the similarity matrix in step 3 as follows: for any element $x_k R = C_k \in U/R$ and for $y \in C_k$, the membership function of $y$ with respect to any subset $X$ of $U$ is denoted by $\mu_X^{C_k}(y)$.
Step 5: Define the minimum, maximum, and average weighted membership functions for every $y$.
Step 6: Choose a set $X \subset U$. Evaluate the rough membership $\mu$ for each object $z$ by determining $A = \bigcup_{z \in C_k} C_k$ and calculating $\mu = \frac{|A \cap X|}{|A|}$.
Step 7: Determine the maximum rough membership $\mu$ from the last column of each classification.

Now, we apply the algorithm to the QSAR problem.
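Steps 1 to 3 and Step 6 of the algorithm can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the similarity degree $1 - |a(x_i) - a(x_j)| / |a_{\max} - a_{\min}|$ given in the proof of Proposition 1, it assumes that $x_j$ is placed in the class of $x_i$ when $d_{ij} > d$, and it uses hypothetical attribute values rather than the data of Table 5. The weighted memberships of Steps 4 and 5 and the selection in Step 7 are left out. All function names are hypothetical.

```python
def similarity_matrix(values):
    """Steps 1-2: d_ij = 1 - |a(x_i) - a(x_j)| / |a_max - a_min|."""
    spread = max(values) - min(values)
    if spread == 0:
        # Constant attribute: every pair of objects is fully similar.
        return [[1.0] * len(values) for _ in values]
    return [[1.0 - abs(u - v) / spread for v in values] for u in values]


def similarity_classes(sim, d):
    """Step 3 (assumed rule): the class of x_i holds every x_j with d_ij > d."""
    return [{j for j, deg in enumerate(row) if deg > d} for row in sim]


def rough_membership(z, classes, X):
    """Step 6: A is the union of all classes containing z; mu = |A ∩ X| / |A|."""
    A = set().union(*(c for c in classes if z in c))
    return len(A & X) / len(A)


# Hypothetical values of one attribute for five objects (indices 0..4).
values = [0.0, 0.1, 0.5, 0.9, 1.0]
sim = similarity_matrix(values)
classes = similarity_classes(sim, d=0.55)
X = {3, 4}
memberships = [rough_membership(z, classes, X) for z in range(len(values))]
```

Unlike the equivalence classes of the Pawlak model, the similarity classes here overlap, so an object such as index 2 belongs to several classes and its rough membership is computed over their union.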
(1) Similarity matrix for the attribute $a_1$. Since $a_1(\max) = 1.85$ and $a_1(\min) = -0.77$, we have $a_1(\max) - a_1(\min) = 2.62$, which gives the matrix $M_{a_1}$.
(2) Taking $d = 0.7$, we obtain the similarity classes.
(3) We choose various sets and compare the membership classification through the following two cases.
Case 1: Let $X = \{x_9, x_{10}, x_{12}\}$. Then, we have: $\mu_X(x_9) = 0.44$, $\mu_X^{C_{10}}(x_{10}) = 0.45$.

(2020) 28:22 Page 7 of 12

Now, we evaluate the rough membership for each object, which gives a preferable value for the data. An object may belong to more than one class, so it has three memberships: minimum, maximum, and average. These are shown in Table 6 and Fig. 1.
In [14], the decision rules are coded into three qualitative terms: low, medium, and high. $X$ is the set of objects for which the protein has a high energy of unfolding.
Now, we evaluate the rough membership for each object, which gives a preferable value for the data. This value is evaluated via the minimum, maximum, average, and weighted memberships. This is shown in Table 6 and Fig. 2.
$Y$ is the set of objects for which the protein has a medium energy of unfolding. Table 7 shows that the objects $x_9$, $x_{10}$, and $x_{12}$ have a rough membership value of 1; this means that the protein has a high energy of unfolding. The object $x_{19}$ had a medium energy of unfolding under Pawlak reduction [14], whereas under our procedure $x_{19}$ has a high energy of unfolding. In the same manner, we can take a set $Z$ of objects for which the protein has a low energy of unfolding. Through the correlation between objects and decision, there is at least one object in $Z$ that has a medium or high energy of unfolding. Therefore, the significance of each attribute and the degrees of membership are more precise than those obtained from Pawlak's reduction. The attributes $\{a_2, a_3, a_4, a_5, a_6, a_7\}$ can be evaluated analogously.
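For each attribute, the threshold $d$ translates into a maximum allowed difference between raw attribute values. As a worked illustration for $a_1$, assuming the similarity degree $d_{ij} = 1 - |a_1(x_i) - a_1(x_j)| / |a_1(\max) - a_1(\min)|$ from the proof of Proposition 1 and assuming that two objects are similar when $d_{ij} > d$, the values $a_1(\max) - a_1(\min) = 2.62$ and $d = 0.7$ give:

```latex
\[
d_{ij} > 0.7
\iff \frac{|a_1(x_i) - a_1(x_j)|}{2.62} < 0.3
\iff |a_1(x_i) - a_1(x_j)| < 0.3 \times 2.62 = 0.786 .
\]
```

Under these assumptions, two amino acids are similar with respect to $a_1$ only if their $a_1$ values differ by less than 0.786. The same calculation applies to the other attributes once their ranges are known.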

Some properties on a similarity relation
The given algorithm depends on a similarity matrix, a general binary relation, and a rough membership. So, we study some properties of these notions.

Proposition 1
If $R_1$ and $R_2$ are two different relations, then the degree of similarity is the same.
Proof A similarity measure between $x_i$ and $x_j$ is given by $\deg_a(x_i, x_j) = 1 - \frac{|a(x_i) - a(x_j)|}{|a_{\max} - a_{\min}|}$. Since each of $R_1$ and $R_2$ depends on $d$, we have $|x_i R_1| = \sum_{x_k \in C_k} d_{ij} = |x_i R_2|$. Therefore, the degree of similarity is the same.
where $d_{ij}$ and $d'_{ij}$ are the similarity degrees with respect to $R_1$ and $R_2$, respectively.

Proposition 3 Let $(U, A)$ be an IS, where $U$ is the set of objects and $A$ is the set of attributes.
Then, $\mu_X^{C_k}(x) = \mu_X^{C_k}(y)$, for every two different objects $x, y \in A$, every class $C_k$, and every nonempty set $X \subseteq U$.
Proof Directly from the definition of $\mu$. For an IS $(U, A)$, if $\mu_X^{C_k}(x) = \mu_X^{C_m}(x)$ with $k \neq m$, it is not necessary that $C_k = C_m$. This is evident in the problem studied here.

Conclusion and discussion
Chemical data sets have been analyzed using similarity relations. The results are more precise than those of the original rough set theory. The description of objects differs from that in [14]. For example, the element $x_{19}$ has a high energy of unfolding in our study, whereas in [14] it has a medium energy of unfolding. This opens the way for applying similarity models in ISs, which give a discrete structure and coincide with the classical case. The QSAR model of similarity reduction can be applied to a finite set of objects. The approach used here can be applied to any IS with quantitative or qualitative data. Consequently, such models are very significant in decision-making [21][22][23][24]. The introduced techniques are useful in applications because they open the way for more topological applications to real-life problems.
Abbreviations IS: Information system; QSAR: Quantitative structure activity relationship; AAs: Amino acids