# A Fuzzy Information Retrieval Method Using Fuzzy-Valued Concept Networks

Yih-Jen Horng\*, Shyi-Ming Chen, and Chia-Hoang Lee\*\*

\*Department of Electronic Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, R.O.C.

\*\*Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.

## Abstract

In this paper, we present a fuzzy information retrieval method based on fuzzy-valued concept networks, where the relevant degree between any two concepts in a fuzzy-valued concept network is represented by a fuzzy number of arbitrary shape. There are two kinds of relevant relationships between any two concepts in a fuzzy-valued concept network (i.e., fuzzy positive association and fuzzy negative association). In order to reduce the time of fuzzy inference, relevant matrices and relationship matrices are used to model the fuzzy-valued concept networks. The elements of a relevant matrix represent the relevant degrees between concepts. The elements of a relationship matrix represent the relevant relationships between concepts. Furthermore, we allow users' queries to be represented by fuzzy numbers of arbitrary shape and to use the fuzzy positive association relationship and the fuzzy negative association relationship, in order to increase the flexibility of fuzzy information retrieval systems. We also present a fuzzy information retrieval method based on the network-type fuzzy-valued concept network architecture in the Internet environment.

Keywords: Fuzzy-Valued Concept Network, Fuzzy Information Retrieval, Fuzzy Number, Information Retrieval System, Network-Type Fuzzy-Valued Concept Network.

## 1. Introduction

Most existing information retrieval systems are based on the traditional Boolean logic model [16]. Information retrieval systems based on the Boolean logic model all assume that the documents and the users' queries can be represented exactly by index terms.
This restricts the practical use of these systems, especially where the information is uncertain or fuzzy. In order to overcome the drawbacks of the traditional Boolean logic model, models such as the probability model, the fuzzy set model, and the vector space model have been presented [16]. Since the fuzzy set model can properly represent the inexact and uncertain knowledge of human beings, much research has been devoted to using the fuzzy set model in the design of fuzzy information retrieval systems. Moreover, many information retrieval techniques have been presented, such as [1], [6], [10], [11], and [14], among others.

In [11], Lucarella et al. present an information retrieval method that uses fuzzy concept networks for knowledge representation. A fuzzy concept network consists of nodes and links. Each node in a fuzzy concept network represents a document or a concept (i.e., an index item or a topic of documents). Each link in a fuzzy concept network connects two concepts and is labeled with a real value between zero and one, which represents the relevant degree between the two concepts. Information retrieval systems are then developed by means of fuzzy inference through fuzzy concept networks. Since fuzzy inference through the fuzzy concept network is time consuming, the information retrieval method proposed by Lucarella et al. is not efficient enough. In [3], Chen et al. used concept matrices to model fuzzy concept networks and performed fuzzy inference through concept matrices instead of fuzzy concept networks. Since fuzzy inference through concept matrices can be done more quickly, the fuzzy information retrieval systems can be more efficient.

0-7803-5214-9/98/\$10.00 © 1998 IEEE.

However, the fuzzy concept networks used by Lucarella et al. [11] and Chen et al.
[3] are restricted because the relevant degree between concepts must be a real value between zero and one and the concepts must be linked by a fuzzy positive association relationship. Thus, the information retrieval systems based on this kind of fuzzy concept network are also restricted. If we allow the relevant degree between concepts in a fuzzy concept network to be represented by a fuzzy number of arbitrary shape, and allow the concepts to be linked by either a fuzzy positive association relationship or a fuzzy negative association relationship, then there is room for more flexibility.

In this paper, we present a fuzzy information retrieval method based on fuzzy-valued concept networks. A fuzzy-valued concept network consists of nodes and links. Each node represents a document or a concept, and each link between two nodes is associated with a tuple that represents the relevance between two concepts: the tuple gives the relevant degree and the relevant relationship between the two concepts, respectively. The relevant degree between two nodes can be not only a real number between zero and one, but also a fuzzy number of arbitrary shape. Moreover, the relevant relationship between two concepts can be not only fuzzy positive association, but also fuzzy negative association. In order to reduce the time of fuzzy inference, we use relevant matrices and relationship matrices to model the fuzzy-valued concept network. The elements of a relevant matrix represent the relevant degrees between concepts. The elements of a relationship matrix represent the relevant relationships between concepts. Furthermore, we allow users' queries to be represented by fuzzy numbers of arbitrary shape and to use the fuzzy positive association relationship and the fuzzy negative association relationship, which increases the flexibility of fuzzy information retrieval systems.
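As a rough illustration of the structure just described, the following Python sketch stores each link as a (relevant degree, relevant relationship) tuple and fills a relevant matrix and a relationship matrix from a list of links. The trapezoid encoding `(a, b, c, d)`, the `Link` class, the `build_matrices` helper, the one-directional storage of links, and the use of `None` for entries not given by the experts are all assumptions made for illustration; the paper allows fuzzy numbers of arbitrary shape and does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: a trapezoid (a, b, c, d) with 0 <= a <= b <= c <= d <= 1 stands in
# for a fuzzy number of arbitrary shape. A crisp degree x is (x, x, x, x).
Trapezoid = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Link:
    """Hypothetical container for one link of a fuzzy-valued concept network."""
    source: str
    target: str
    degree: Trapezoid   # relevant degree (a fuzzy number)
    relationship: str   # "P" (fuzzy positive) or "N" (fuzzy negative)


def build_matrices(concepts: List[str], links: List[Link]):
    """Return (relevant matrix, relationship matrix) as nested lists.

    Entries that no link defines stay None (an illustrative choice).
    """
    index = {name: i for i, name in enumerate(concepts)}
    n = len(concepts)
    relevant: List[List[Optional[Trapezoid]]] = [[None] * n for _ in range(n)]
    relationship: List[List[Optional[str]]] = [[None] * n for _ in range(n)]
    for link in links:
        if link.relationship not in ("P", "N"):
            raise ValueError(f"unknown relationship: {link.relationship!r}")
        a, b, c, d = link.degree
        if not 0.0 <= a <= b <= c <= d <= 1.0:
            raise ValueError(f"not a valid trapezoid on [0, 1]: {link.degree!r}")
        i, j = index[link.source], index[link.target]
        relevant[i][j] = link.degree
        relationship[i][j] = link.relationship
    return relevant, relationship


concepts = ["person", "individual", "small", "large"]
links = [
    Link("person", "individual", (0.7, 0.8, 0.9, 1.0), "P"),
    Link("small", "large", (0.6, 0.7, 0.8, 0.9), "N"),
]
V, R = build_matrices(concepts, links)
```

Keeping the degrees and the relationships in two parallel matrices mirrors the paper's split into a relevant matrix and a relationship matrix, so each can later be composed with its own operators.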
Furthermore, because of the growth of the Internet, the documents required by users should not be confined to a single host computer. A smart information retrieval system might help users obtain the required documents from different computers through the Internet when those documents cannot be found on the computer where the users submit their query expressions. Thus, we also extend our fuzzy-valued concept network architecture to the network-type fuzzy-valued concept network and present an information retrieval method for the Internet environment based on the network-type fuzzy-valued concept network.

## 2. Fuzzy-Valued Concept Networks

First, we briefly review the concepts of fuzzy numbers [5] and the concepts of the fuzzy positive association relationship [9] and the fuzzy negative association relationship [9]. A fuzzy number F is a fuzzy set defined on the universe of discourse U that is both convex and normal. A fuzzy set A with membership function $f_A$ is convex if and only if for all $u_1$, $u_2$ in U,

$$f_A(\lambda u_1 + (1 - \lambda) u_2) \ge \min(f_A(u_1), f_A(u_2)), \tag{1}$$

where $\lambda \in [0, 1]$. A fuzzy set A is normal if $f_A(u) = 1$ for at least one $u$ in U. In what follows, "$\wedge$" is the minimum operator and "$\vee$" is the maximum operator.

In the following, we briefly review the concepts of the fuzzy positive association relationship and the fuzzy negative association relationship from [9]:

(1) Fuzzy positive association: It relates concepts that have a similar meaning in some contexts (e.g., person ↔ individual) or that are typically used in the same context (e.g., person ↔ address).

(2) Fuzzy negative association: It relates concepts that are complementary (e.g., male ↔ female), incompatible (e.g., unemployed ↔ freelance), or antonyms (e.g., small ↔ large).

A fuzzy-valued concept network can be represented as FVCN(S, K), where S is a set of nodes, each of which stands for a concept or a document, and K is the set of all links between nodes. If $k \in K$, then k can be represented by the tuple $(\tilde{\mu}, FR)$, where $\tilde{\mu}$ is the degree of linking strength and its value is a fuzzy number; FR is the relationship between the two concepts linked by k, and $FR \in \{P, N\}$, where P stands for the fuzzy positive association relationship and N stands for the fuzzy negative association relationship.

## 3. Relevant Matrices and Relationship Matrices

In the following, we introduce the definitions of relevant matrices and relationship matrices to model fuzzy-valued concept networks. The definitions of the transitive closure of relevant matrices and the transitive closure of relationship matrices are also presented.

Definition 3.1: The relevant matrix V is a fuzzy matrix whose element $\tilde{v}_{ij}$ represents the relevant degree between concept $c_i$ and concept $c_j$ in a fuzzy-valued concept network, where $\tilde{v}_{ij}$ is a fuzzy number. If $\tilde{v}_{ij} = \tilde{0}$, then the relevant degree between concept $c_i$ and concept $c_j$ is not given by the experts in the fuzzy-valued concept network.

Definition: Assume that V is a relevant matrix [...]

For a relationship matrix R with elements $r_{ij}$, the matrix $R^2$ has the form

$$
R^{2} = \begin{bmatrix}
\bigoplus_{i=1,\ldots,n}(r_{1i} \odot r_{i1}) & \bigoplus_{i=1,\ldots,n}(r_{1i} \odot r_{i2}) & \cdots & \bigoplus_{i=1,\ldots,n}(r_{1i} \odot r_{in}) \\
\vdots & \vdots & \ddots & \vdots \\
\bigoplus_{i=1,\ldots,n}(r_{ni} \odot r_{i1}) & \bigoplus_{i=1,\ldots,n}(r_{ni} \odot r_{i2}) & \cdots & \bigoplus_{i=1,\ldots,n}(r_{ni} \odot r_{in})
\end{bmatrix},
$$

where "$\oplus$" is the operator that chooses the fuzzy relationship whose priority is the highest. In this paper, we give the first priority to the fuzzy negative association relationship (N), the second priority to the fuzzy positive association relationship (P), and the lowest priority to the relationship (Z) (i.e., N > P > Z). "$\odot$" is the operator that chooses the combination of two relationships according to Table 1. Then, there exists a positive integer p, $p \le n - 1$, such that $R^{p} = R^{p+1} = R^{p+2} = \cdots$. Let $L = R^{p}$; then L is called the transitive closure of the relationship matrix R.

Table 1. The combination of fuzzy relationships.

## 4. Fuzzy Query Processing for Document Retrieval Based on Fuzzy-Valued Concept Networks

Fuzzy matrices can also be used to represent the links between documents and concepts.
In the following, we introduce the definitions of the document descriptor relevant matrix and the document descriptor relationship matrix.

Definition 4.1: Let D be the set of documents in the fuzzy-valued concept network, $D = \{d_1, d_2, \ldots, d_m\}$, and let C be the set of concepts in the fuzzy-valued concept network, $C = \{c_1, c_2, \ldots, c_n\}$. Then, the document descriptor relevant matrix E is shown as follows:

$$
E = \begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_n \\ \hline
d_1 & \tilde{e}_{11} & \tilde{e}_{12} & \cdots & \tilde{e}_{1n} \\
d_2 & \tilde{e}_{21} & \tilde{e}_{22} & \cdots & \tilde{e}_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_m & \tilde{e}_{m1} & \tilde{e}_{m2} & \cdots & \tilde{e}_{mn}
\end{array}
$$

where m is the number of documents, n is the number of concepts, $\tilde{e}_{ij}$ stands for the relevant degree between document $d_i$ and concept $c_j$, $\tilde{e}_{ij}$ is a fuzzy number, $1 \le i \le m$, and $1 \le j \le n$.

Definition: The document descriptor relationship matrix F is shown as follows:

$$
F = \begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_n \\ \hline
d_1 & f_{11} & f_{12} & \cdots & f_{1n} \\
d_2 & f_{21} & f_{22} & \cdots & f_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
d_m & f_{m1} & f_{m2} & \cdots & f_{mn}
\end{array}
$$

where $f_{ij}$ stands for the fuzzy relationship between document $d_i$ and concept $c_j$, and $f_{ij} \in \{P, N, Z\}$.

However, the experts may forget to set the relevant degrees and relationships between some documents and some concepts. Since the implicit relevant degrees and relationships between concepts can be obtained from the transitive closure T of the relevant matrix V and the transitive closure L of the relationship matrix R, we can use T and L to get the implicit relevant degrees and relationships between documents and concepts. Let $E^{*} = E \otimes T$; then $E^{*}$ includes the implicit relevant degrees between documents and concepts. Let $F^{*} = F \circ L$; then $F^{*}$ includes the implicit relevant relationships between documents and concepts. $E^{*}$ and $F^{*}$ are then used as the basis for similarity measures between queries and documents. Each row of $E^{*}$ can be regarded as a document descriptor relevant vector, and each row of $F^{*}$ can be regarded as a document descriptor relationship vector.
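The relationship-matrix closure of Section 3 and the composition of F with L can be sketched in Python as follows. Only the priority order N > P > Z comes from the text. The contents of Table 1 are not reproduced in this copy of the paper, so `HYPOTHETICAL_TABLE` is a stand-in and not the authors' table. The rule that Z combined with anything gives Z, the reading of Z as "no relationship given", and the assumption that composing F with L uses the same two operators as composing R with itself are likewise illustrative guesses, as are all function names.

```python
from typing import Dict, List, Tuple

Matrix = List[List[str]]

# From the text: N has the highest priority, then P, then Z.
PRIORITY = {"Z": 0, "P": 1, "N": 2}

# Hypothetical stand-in for Table 1 (the real table is not available here).
HYPOTHETICAL_TABLE: Dict[Tuple[str, str], str] = {
    ("P", "P"): "P",
    ("P", "N"): "N",
    ("N", "P"): "N",
    ("N", "N"): "P",
}


def combine(a: str, b: str, table: Dict[Tuple[str, str], str]) -> str:
    """Combine two relationships along a path of two links."""
    if "Z" in (a, b):  # assumption: a missing link breaks the path
        return "Z"
    return table[(a, b)]


def compose(A: Matrix, B: Matrix, table: Dict[Tuple[str, str], str]) -> Matrix:
    """Entry (j, k) is the highest-priority value of combine(A[j][i], B[i][k])."""
    inner, cols = len(B), len(B[0])
    return [
        [
            max(
                (combine(row[i], B[i][k], table) for i in range(inner)),
                key=PRIORITY.__getitem__,
            )
            for k in range(cols)
        ]
        for row in A
    ]


def transitive_closure(R: Matrix, table: Dict[Tuple[str, str], str]) -> Matrix:
    """Compose R with itself until the result stops changing.

    The step bound is a safety net for tables under which the powers of R
    do not settle; the text states that they settle within n - 1 steps.
    """
    current = R
    for _ in range(2 * len(R) * len(R) + 1):
        nxt = compose(current, R, table)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("powers of R did not reach a fixed point")


# Three concepts; each concept is taken to be positively related to itself.
R = [
    ["P", "P", "Z"],
    ["Z", "P", "N"],
    ["Z", "Z", "P"],
]
L = transitive_closure(R, HYPOTHETICAL_TABLE)

# One document explicitly linked only to the first concept.
F = [["P", "Z", "Z"]]
F_star = compose(F, L, HYPOTHETICAL_TABLE)
```

Passing the combination table as a parameter keeps the sketch usable once the actual Table 1 is at hand: only the dictionary needs to change.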
The user's query Q can be represented by a query descriptor relevant vector $\overline{qv}$ and a query descriptor relationship vector $\overline{qr}$ shown as follows:

$$\overline{qv} = \langle \tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n \rangle, \qquad \overline{qr} = \langle y_1, y_2, \ldots, y_n \rangle,$$

where $\tilde{x}_i$ means the relevant degree between the desired documents and concept $c_i$, $\tilde{x}_i$ is a fuzzy number, and $1 \le i \le n$; $y_i$ means the relationship between the desired documents and concept $c_i$, and $y_i \in \{P, N\}$. If $y_i = P$, then the desired documents should contain concept $c_i$; if $y_i = N$, then the desired documents should contain the complement of concept $c_i$. Moreover, if the user does not set the values of $\tilde{x}_i$ and $y_i$, then concept $c_i$ is regarded as neglected by the user, and $\tilde{x}_i$ and $y_i$ are labeled "-". That is, the user "doesn't care" whether the retrieved documents contain concept $c_i$ or not.

Assume that there are two tuples, $(\tilde{A}, B)$ and $(\tilde{C}, D)$, where $\tilde{A}$ and $\tilde{C}$ are fuzzy numbers, $B \in \{P, N, Z\}$, and $D \in \{P, N, Z\}$; then the degree of similarity between $(\tilde{A}, B)$ and $(\tilde{C}, D)$ can be calculated by [...] where $\alpha \in [0, 1]$ and [...]. Let the document descriptor relationship vector of document $d_i$ be $\overline{dr}_i = \langle r_{i1}, r_{i2}, \ldots, r_{in} \rangle$. Then, the degree of satisfaction with which document $d_i$ satisfies the user's query Q can be evaluated by [...] where $\overline{qv}(j)$ is the jth element of the query descriptor relevant vector $\overline{qv}$, $\overline{qr}(j)$ is the jth element of the query descriptor relationship vector $\overline{qr}$, $1 \le j \le n$, $RS(d_i) \in [0, 1]$, and k is the number of concepts not neglected by the user's query. The information retrieval system displays every document whose degree of satisfaction is greater than a threshold value h, where $h \in [0, 1]$, ordered from the document with the highest degree of satisfaction to the one with the lowest.

## 5. Fuzzy Query Processing Using Fuzzy-Valued Concept Networks in the Internet Environment

Given the prevalence of the Internet [4], [18], the information about the documents needed by the user should not be confined to a single host computer.
When the users' queries cannot be satisfied on the local computer, the information retrieval system should extend its search to other computers on the Internet until the required documents are either found or declared non-existent. In this section, we present the network-type fuzzy-valued concept network architecture as the basis for fuzzy information retrieval in the Internet environment. The architecture of the network-type fuzzy-valued concept network is shown in Fig. 2.

Fig. 2. The architecture of the network-type fuzzy-valued concept network.

From Fig. 2, we can see that each host is linked to the Internet by the bold black lines. Each host has a local fuzzy-valued concept network as the knowledge base of its documents and concepts. In substance, the local fuzzy-valued concept networks inside these hosts are the same as the ones introduced in the previous sections. That is, the fuzzy-valued concept networks inside these hosts allow the relevant degrees between concepts to be fuzzy numbers of arbitrary shape, and the relevant relationships between nodes to be the fuzzy positive association relationship or the fuzzy negative association relationship. Since the local fuzzy-valued concept networks inside these hosts are the same as the ones