MJay

A Case Based Reasoning Algorithm for Enterprises’ Integration of Informatization and Industrialization



MJSon 2017. 6. 2. 17:24



Abstract


The integration of informatization and industrialization is the deep integration of the two across many fields, and it is a new path of development for both.

For enterprises, integration is a complicated process, and learning from the experience of other enterprises is necessary to improve efficiency.

However, faced with numerous cases, enterprises find it hard to pick the right ones, because effective recommendation algorithms are lacking in both academic research and practical application.

Based on a study of existing recommendation algorithms, this paper designs a new algorithm by analyzing the structure and characteristics of the cases.

First, the case structure is analyzed, and the case information is classified and graded according to its degree of completeness;

second, AHP is used to assign weights to the different attributes; third, the recommendation algorithm is described; finally, case similarity is calculated and the most similar cases are returned to the user.

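AHP (Analytic Hierarchy Process) is an excellent tool for calculating weights.
I consider it indispensable for expert decision-making. Its scientific and mathematical foundations are solid, so it gives you confidence when you present the results.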

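AHP is a computational model devised by Thomas L., based on the observation that the brain analyzes problems in stages, or hierarchically. It breaks the whole decision process into multiple levels and analyzes and resolves each level in turn until it reaches the final decision. In Korean translation it is sometimes called the "hierarchical analysis process".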

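AHP can also be viewed as a matrix-based weighting method. I will explain its procedure in four steps with an example.
I am currently trying to decide which of three schools (A, B, C) to attend... Interestingly, as with a surprising number of people, I don't know exactly what I really want.
I have to pick exactly one. When I look at one school it seems best, and when I look at another that one seems best! At this point my brain breaks the problem into pieces and approaches it bottom-up.
When I write the pieces down, the factors that determine my satisfaction with a school seem to be academics, social relationships, school life, tuition, career prospects, and location. That means considering 6 factors for each of 3 schools, or 18 combinations in all.
The situation is summarized in Figure 1.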



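Weighing all 18 combinations at once is hard. Comparing just two items at a time, however, is easy. This is called pairwise comparison, and it is how AHP works.
Let's pairwise-compare the three schools on academics. For academics there are three pairwise comparisons: 'Is A better than B?', 'Is A better than C?', and 'Is B better than C?'. Answering each of them is not hard.
Doing this for all 6 factors eventually covers every factor and every school. This is much easier.
Comparing A with B automatically gives the result for B versus A as well. If A is preferred twice as much as B, then B is only 1/2 as good as A; in other words, you simply take the reciprocal. On this principle, the pairwise comparison results are organized into a matrix, as summarized in Figure 2.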
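The reciprocal rule means that only the upper triangle of the matrix has to be elicited. Below is a minimal Python sketch of that idea. It assumes the judgments for one factor arrive as a dictionary of upper-triangle entries. The function name `reciprocal_matrix` and the example judgments are hypothetical and are used only for illustration.

```python
def reciprocal_matrix(n, upper):
    """Build an n x n AHP pairwise comparison matrix (illustrative sketch).

    `upper` maps (i, j) with i < j to the judgment "item i is v times as
    preferable as item j". The diagonal is 1 and every lower-triangle entry
    is the reciprocal of its mirror: a[j][i] = 1 / a[i][j].
    """
    a = [[1.0] * n for _ in range(n)]
    for (i, j), v in upper.items():
        if not i < j:
            raise ValueError("keys must satisfy i < j")
        a[i][j] = float(v)
        a[j][i] = 1.0 / v
    return a

# Hypothetical academic judgments for schools A, B, C (indices 0, 1, 2):
# A is 2x as good as B, A is 4x as good as C, B is 2x as good as C.
academics = reciprocal_matrix(3, {(0, 1): 2, (0, 2): 4, (1, 2): 2})
for row in academics:
    print(row)
```

For 3 schools, only the n(n-1)/2 = 3 questions listed above are needed. The other three off-diagonal entries follow automatically.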

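A pairwise comparison matrix has an appealing property: the relative importance of its items can be derived mathematically by multiplying the matrix by itself.
As shown in Figure 3, the calculation is: (1) multiply the matrix by itself, (2) sum each row of the product, and (3) express each row sum as a proportion of the grand total.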
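The three-step calculation can be sketched in plain Python as follows. This is an illustrative sketch, not the blogger's own code. The matrix reuses hypothetical academic judgments (A is 2x B, A is 4x C, B is 2x C).

```python
def matmul(a, b):
    """Plain-Python product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ahp_weights_by_squaring(a):
    """Priority weights via: square the matrix, sum rows, normalize."""
    sq = matmul(a, a)                      # (1) multiply the matrix by itself
    row_sums = [sum(row) for row in sq]    # (2) sum each row of the product
    total = sum(row_sums)
    return [s / total for s in row_sums]   # (3) each row's share of the total

a = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
print([round(w, 4) for w in ahp_weights_by_squaring(a)])  # [0.5714, 0.2857, 0.1429]
```

This matrix is perfectly consistent, so a single squaring already yields the exact 4:2:1 weights. For an inconsistent matrix, one squaring is only an approximation. Repeating the squaring until the weights stop changing converges to the principal eigenvector, which is how Saaty's method defines the weights.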

1. Introduction 

In China, the integration of informatization and industrialization is the distinctively Chinese path for developing industry and information technology: a mutually reinforcing, win-win process.

As with early information-system construction, the integration process is systematic and complex.

Any deviation can easily lead to failure.

It is therefore useful for enterprises to draw on the successful integration experience of others.

However, enterprises in different industries differ greatly in both their management processes and their production processes.

It is hard to pick the right cases from the numerous cases in a database.

Additionally, a single case, or cases from unrelated industries, cannot meet an enterprise's needs.

Hence, a case-based reasoning algorithm for enterprises' integration of informatization and industrialization is urgently needed.

Enterprises supply keywords describing their own characteristics and needs, and the algorithm calculates the similarity of the cases in the database to them.

The most similar cases are recommended to the enterprise.

Because the data are unstructured and the cases numerous, the accuracy and speed of the similarity calculation are key to the algorithm.

The case retrieval algorithm therefore has high research value.

This paper is organized as follows. 

Section 2 reviews current research in this field and identifies the theories used in this paper.

Section 3 analyzes the characteristics and structure of the cases and classifies the cases according to their degree of completeness.

Section 4 presents the case reasoning process and the design of the reasoning algorithm. Finally, Section 5 tests the algorithm on a data set, and the results show that it is effective and efficient.

2. Literature Review 

Case-based reasoning (CBR) is an important problem-solving method in artificial intelligence. It solves problems by imitating human thought.

It reuses the solutions to old problems to solve new ones.

Because it does not need to store prior domain knowledge, it avoids a difficulty common to general knowledge-based systems: knowledge is hard to acquire.

CBR originated in cognitive science research by Roger Schank at the Yale University AI Laboratory in the 1980s.

In 1982, Schank proposed the dynamic memory theory, centered on the memory organization packet. It is considered the earliest CBR idea in the field of artificial intelligence [1].

The first case-based reasoning system was CYRUS, developed by Janet Kolodner (1983). It was a basic question-answering system holding the travel and meeting records of former U.S. Secretary of State Cyrus Vance.

The case memory model of this system became the basis for later case-based reasoning systems, including MEDIATOR (Simpson, 1985), PERSUADER (Sycara, 1988), CHEF (Hammond, 1989), JULIA (Hinrichs, 1992) and CASEY (Koton, 1988) [2].

Regarding case representation, Doyle [3] at the University of Berlin proposed an XML-based case-based reasoning language.

Wang Yue [4] analyzed how case structure is described and the problems this raises in case reasoning, and proposed a representation method based on simulation and case structure.

Kai Bo Zhou [5] combined XML with object-oriented technology to put forward an object-oriented case representation method.

The case retrieval algorithm is the core of case reasoning. There are three general kinds of case retrieval methods:

the nearest neighbor method,
the inductive index method, and
the knowledge-guided method.

As case reasoning has been applied more widely, scholars in specific areas have proposed new methods, such as case clustering analysis, neural-network-based case retrieval models, and fast case retrieval models combined with rough set theory.

To improve retrieval efficiency, Ma [6] used clustering analysis to classify the cases before case retrieval.

Meng Yanni [7] studied a retrieval model based on neural networks.

Haibo Zhou [8] compared the data types of case attributes and proposed similarity calculation methods for several types of interval data.

For setting case attribute weights, Zhansi Jiang [9] used the similarity deviation method, which assigns attribute weights by minimizing the sum of squared similarity deviations.

Xiaoyan Wei [10] used the analytic hierarchy process (AHP) to study weight setting in a multi-dimensional optimization algorithm.

Gareth [11] studied the application of genetic algorithms to setting case attribute weights.

Gu and Li proposed a case retrieval method called FRAWO. It focuses on computing the similarity of fuzzy and interval case attributes with trapezoid-based fuzzy sets, and it adjusts case weights dynamically with a PULL&PUSH strategy [12].

The basic theory of case-based reasoning algorithms has seen little improvement; innovation has focused on new applications in different fields.

On the practical side, Albert [13] studied the application of case-based reasoning in decision support systems, and Zhang [14] applied case-based reasoning to an auxiliary decision support system for disaster relief.

Some Chinese scholars have studied theoretical frameworks for emergency decision-making support systems based on case-based reasoning.

Yanchao Feng integrated CBR into budget management, budgeting new projects from the information in matching cases [15].

Case-based reasoning has also been integrated into complex medical diagnosis.

3. Case Characteristics Analysis 

3.1. Case Content 

A case has four main parts: the enterprise profile, the construction situation, the construction program, and the achievements and experience.




3.2. Case Characteristic 

The structure of the case includes three parts: 

the first part is the basic information of the case, which is shown to readers after a search; it serves as identification parameters rather than inference parameters;

the second part is the basic parameters, divided into basic information about the enterprise and about the integration.

This information is used as the inference parameters;

the third part is the integration program, including the program itself and information on the supporting enterprises.

This part is what users want to learn from for their own integration projects; it also provides important search parameters.




3.3. Case Classification

Cases are classified into ten levels according to their completeness.

To simplify the similarity calculation, the levels are assigned the values 1, 0.95, 0.9, 0.85, 0.75, 0.7, 0.65, 0.6 and 0.45.

4. The Reasoning Algorithm

4.1. Case Retrieval Process 

The search strategy in this study combines the inductive method with the nearest neighbor algorithm, i.e., an indexed nearest neighbor method.

The search proceeds in two stages.

When there are many cases, inductive retrieval over the directory index is applied until the number of candidate cases falls below a threshold value.

This step amounts to classifying the decision problem and determining which category of cases it belongs to.

Among the remaining candidates, similarity is calculated with the nearest neighbor method.

The cases are sorted by similarity, and the most similar ones are recommended to the user.
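The two-stage search can be sketched in Python as below. This is a hypothetical illustration, not the paper's implementation. It makes the following assumptions:

- The directory index is a tuple path per case, such as (primary industry, secondary industry).
- Stage 1 descends one index level at a time while the candidate count is still at or above the threshold.
- The per-case similarity function is passed in by the caller. `toy_similarity` is only a stand-in.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    case_id: str
    index_path: tuple      # assumed directory-index path, e.g. (primary, secondary)
    attrs: dict = field(default_factory=dict)

def retrieve(cases, query_path, target, similarity, threshold=100, top_k=5):
    """Indexed nearest-neighbor retrieval (hypothetical sketch)."""
    candidates = list(cases)
    # Stage 1 (inductive): descend the directory index one level at a time
    # while the candidate set is not yet smaller than the threshold.
    for depth in range(1, len(query_path) + 1):
        if len(candidates) < threshold:
            break
        candidates = [c for c in candidates
                      if c.index_path[:depth] == query_path[:depth]]
    # Stage 2 (nearest neighbor): rank the survivors by similarity to the target.
    candidates.sort(key=lambda c: similarity(target, c.attrs), reverse=True)
    return candidates[:top_k]

def toy_similarity(x, y):
    """Toy stand-in: fraction of target attributes whose values match exactly."""
    return sum(x[k] == y.get(k) for k in x) / len(x)

library = [
    Case("c1", ("manufacturing", "auto"), {"scale": 4, "erp": True}),
    Case("c2", ("manufacturing", "auto"), {"scale": 2, "erp": True}),
    Case("c3", ("manufacturing", "textile"), {"scale": 4, "erp": False}),
    Case("c4", ("services", "logistics"), {"scale": 4, "erp": True}),
]
hits = retrieve(library, ("manufacturing", "auto"), {"scale": 4, "erp": True},
                toy_similarity, threshold=3, top_k=2)
print([c.case_id for c in hits])
```

Stage 1 keeps the expensive similarity computation confined to a small candidate set. When the library is already below the threshold, stage 1 is skipped and every case is scored.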

Depending on the user's needs, case search comes in two modes:

basic search and advanced search.

Basic search calculates similarity from four attributes: industry category, enterprise scale, integration end time, and case classification.

Advanced search uses eight more attributes: aggregate investment, information business, program origin, hardware and software equipment origin, the kinds of information systems, the number of staff, the suppliers, and related service enterprises.

The process of case-based reasoning is shown in Figure 1:



4.2. Weight Setting 

Because each case attribute contributes differently to the description of a case, the attributes are given different weights.

Key attributes have a large influence on the retrieval result, and secondary attributes have less.

Weighting the attributes this way makes the retrieval result reflect more objectively the degree of similarity between the target problem and the existing cases in the database, which improves retrieval accuracy.

The weights are set from expert evaluations, and the expert opinions are then analyzed with AHP (the analytic hierarchy process).



The steps for assigning weights with AHP are as follows:

Establishing the judgment matrix: assume there are n attributes. The judgment matrix $A = (a_{ij})_{n \times n}$ is built from pairwise comparisons between every two attributes,

where $a_{ij}$ indicates the importance of attribute $i$ relative to attribute $j$, with $a_{ii} = 1$ and $a_{ji} = 1/a_{ij}$.
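The text above lists only the construction of the judgment matrix. Standard AHP practice then derives the weights and checks that the expert judgments are consistent. The Python sketch below shows that follow-up under stated assumptions; it is not the paper's own procedure.

- Weights are approximated by normalizing each column and averaging each row.
- The consistency ratio is CR = CI / RI, where CI = (lambda_max - n) / (n - 1).
- RI uses Saaty's commonly cited random index values.
- The example judgments for four attributes are invented.

```python
# Saaty's random consistency index (RI) for matrix sizes 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def column_normalized_weights(a):
    """Approximate AHP weights: divide each column by its sum, then average each row."""
    n = len(a)
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    return [sum(a[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

def consistency_ratio(a, w):
    """Return CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(a)
    if n <= 2:
        return 0.0  # reciprocal matrices of size 1 or 2 are always consistent
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n  # largest-eigenvalue estimate
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical expert judgments comparing four attributes (values invented).
a = [[1, 3, 5, 7],
     [1/3, 1, 3, 5],
     [1/5, 1/3, 1, 3],
     [1/7, 1/5, 1/3, 1]]
w = column_normalized_weights(a)
cr = consistency_ratio(a, w)
print([round(x, 3) for x in w], round(cr, 3))
```

A common rule of thumb accepts the judgments when CR < 0.1. Otherwise, the experts are asked to revise their comparisons.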



4.3. Case Attribute Classification 

Before discussing the similarity of each attribute, we must first classify the attributes.

Different types of attributes have different definitions of similarity. 

There are three types of attributes: 

(1) Numerical continuous attributes: attribute values are continuous numbers. For example, the finishing time of the integration case, the total investment, and the number of information-related personnel all belong to this type, known as numeric attributes.

One normalization method is to standardize the values so that each attribute has mean 0 and variance 1.

Let the mean of attribute X be m and its standard deviation be S; each value x is then standardized as $x' = (x - m)/S$.
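The z-score standardization x' = (x - m)/S can be sketched in Python as follows. The sketch assumes the population standard deviation, so that the standardized attribute has variance exactly 1. The investment figures are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score standardization: x' = (x - m) / S, giving mean 0 and variance 1."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)     # population standard deviation (assumed choice)
    if s == 0:
        return [0.0 for _ in values]  # constant attribute: nothing to scale
    return [(x - m) / s for x in values]

investment = [120.0, 80.0, 200.0, 40.0]   # hypothetical total investments
z = standardize(investment)
print([round(v, 3) for v in z])
```

Using the sample standard deviation (`statistics.stdev`) instead would give a variance slightly below 1. Either choice preserves the ordering of the values.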



(2) Enumerative attributes:

The feature values have neither an order relation nor a quantitative relation.

Examples include the geographical location of the integration case and the source of hardware and software.

(Note: longitude, latitude.)
Such an attribute takes its values from a set of labels $(Id_1, Id_2, Id_3, \dots)$;

(3) Ordinal attributes: 

The feature values have an order relation but no quantitative relation.

Such attributes are also called hierarchical attributes; the case rating, for example, belongs to this type.



To simplify the similarity calculation, this paper assigns numerical values to the 10 grades of the case rating, converting it into a numerical attribute, so its similarity is calculated with the method for numerical attributes.

4.4. Similarity Calculation between the Attributes

(1) Numerical continuous attributes: since all numerical attribute values in the cases are positive, after standardization the minimum value method can be applied to calculate the similarity between numerical attributes, namely:



(2) Similarity calculation of enumerative attributes:

For enumerative attributes, direct string matching is used to determine the similarity between two attribute values.

In addition, when formulating an integration scheme, a user may make more than one choice; for example, the information systems in a program are unlikely to be only OA or only ERP.

An enterprise may choose several kinds of systems, such as OA, ERP, and OM, in the same program.



Here $x_i = \text{all}$ means that the user filled in “all”, and $y_i = \text{null}$ means that the attribute value is null in the source case.
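One way to score multi-valued enumerative attributes is sketched below in Python. The paper's exact formula is not reproduced in this text, so the sketch rests on these assumptions:

- A user value of "all" scores 1.
- A null (None) source value scores 0.
- Otherwise, both values are treated as sets of strings matched exactly and scored with the Jaccard index.

The function `enum_similarity` is hypothetical.

```python
def enum_similarity(x, y):
    """Similarity of two enumerative attribute values, possibly multi-valued.

    Assumed rules for this sketch (not taken from the paper):
    - x == "all": the user accepts any value, so the similarity is 1;
    - y is None: the source case has no value, so the similarity is 0;
    - otherwise both values are treated as sets of strings, matched exactly,
      and scored with the Jaccard index |x & y| / |x | y|.
    """
    if x == "all":
        return 1.0
    if y is None:
        return 0.0
    xs = {x} if isinstance(x, str) else set(x)
    ys = {y} if isinstance(y, str) else set(y)
    union = xs | ys
    return len(xs & ys) / len(union) if union else 1.0

print(enum_similarity({"OA", "ERP"}, {"OA", "ERP", "OM"}))  # 2 shared of 3 distinct systems
```

The Jaccard index is one reasonable choice because it rewards shared systems and penalizes extra, unmatched ones. Other overlap measures would fit the description equally well.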

In particular, the similarity of the industry classification attribute is calculated case by case:

if the user submits a first-level (primary) industry classification, the similarity is calculated with formula (9);

otherwise, if the user submits a second-level (secondary) industry classification, formula (10) is used. The specific formulas are as follows:



$$\mathrm{Sim}(x_i, y_i) = \begin{cases} 0, & \text{if the primary industries do not match} \\ 0.6, & \text{if the primary industries match but the secondary industries do not} \\ 1, & \text{if the secondary industries match} \end{cases}$$
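The piecewise industry rule can be sketched in Python as below. The sketch makes three assumptions:

- Industries are represented as hypothetical (primary, secondary) tuples.
- The zero-similarity branch is read as the case where the primary industries differ, which is the only reading consistent with the 0.6 branch.
- Only the two-level case is covered.

```python
def industry_similarity(query, case):
    """Piecewise industry similarity for a query that specifies both levels.

    `query` and `case` are hypothetical (primary, secondary) industry tuples.
    Returns 0 if the primary industries differ, 0.6 if the primary industries
    match but the secondary ones differ, and 1 if the secondary ones match too.
    """
    q_primary, q_secondary = query
    c_primary, c_secondary = case
    if q_primary != c_primary:
        return 0.0
    if q_secondary != c_secondary:
        return 0.6
    return 1.0

print(industry_similarity(("manufacturing", "auto"), ("manufacturing", "textile")))
```

Checking the primary level first mirrors the hierarchy: a secondary industry only counts as a match when it sits under the same primary industry.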

(3) Similarity calculation of ordinal attributes

For hierarchical attributes such as enterprise scale, the sizes large, medium, small, and micro are assigned 4, 3, 2, and 1 respectively to simplify the calculation, and the similarity is computed from the distance between these values:
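A distance-based similarity over the assigned scale levels might look like the Python sketch below. The form sim = 1 - |v(a) - v(b)| / (max level - min level) is an assumption, a common normalized-distance choice, and is not confirmed as the paper's formula. The names `SCALE_LEVEL` and `ordinal_similarity` are hypothetical.

```python
# Level values from the text: large 4, medium 3, small 2, micro 1.
SCALE_LEVEL = {"large": 4, "medium": 3, "small": 2, "micro": 1}

def ordinal_similarity(a, b, levels=SCALE_LEVEL):
    """Distance-based similarity for ordinal values (assumed form).

    sim = 1 - |v(a) - v(b)| / (max_level - min_level), so identical sizes
    score 1 and the two extremes (large vs micro) score 0.
    """
    va, vb = levels[a], levels[b]
    span = max(levels.values()) - min(levels.values())
    return 1 - abs(va - vb) / span

print(ordinal_similarity("large", "small"))
```

Dividing by the span keeps every score in [0, 1], so ordinal similarities can be combined directly with the other per-attribute scores.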



4.5. Case Similarity Calculation

In basic search, the target case X consists of 4 attributes, $X = (x_1, x_2, x_3, x_4)$, where $x_i$ is the $i$-th attribute of the target case.
Similarly, $Y = (y_1, y_2, y_3, y_4)$, where $y_i$ is the $i$-th attribute of the source case.

According to the nearest neighbor algorithm, the similarity function between the target case and the source case is:




In the formula, $W_i$ is the weight of the $i$-th attribute.

For advanced search, the target case X consists of 15 attributes, $X = (x_1, x_2, \dots, x_{15})$, where $x_i$ is the $i$-th attribute of the target case; similarly, $Y = (y_1, y_2, \dots, y_{15})$, where $y_i$ is the $i$-th attribute of the source case.

According to the nearest neighbor algorithm, the similarity function between the target case and the source case is:



In the formula, $W_i$ is the weight of the $i$-th attribute.
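The weighted nearest-neighbor aggregation can be sketched in Python as follows. This sketch assumes the standard form Sim(X, Y) = Σ W_i · sim_i(x_i, y_i) / Σ W_i. When the AHP weights already sum to 1, this reduces to a plain weighted sum. The same function covers the 4-attribute and 15-attribute cases. The weights and the exact-match helper in the example are invented.

```python
def case_similarity(target, source, weights, attr_sims):
    """Weighted nearest-neighbor similarity between a target and a source case.

    Assumed form: Sim(X, Y) = sum_i W_i * sim_i(x_i, y_i) / sum_i W_i.
    `attr_sims` holds one per-attribute similarity function per position.
    """
    if not (len(target) == len(source) == len(weights) == len(attr_sims)):
        raise ValueError("all inputs must have one entry per attribute")
    total_w = sum(weights)
    score = sum(w * sim(x, y)
                for w, sim, x, y in zip(weights, attr_sims, target, source))
    return score / total_w

def exact(x, y):
    """Toy per-attribute similarity: 1 on an exact match, else 0."""
    return 1.0 if x == y else 0.0

# Hypothetical basic search: industry, scale level, end year, case level.
weights = [0.4, 0.3, 0.2, 0.1]      # e.g. produced by AHP; values invented
sims = [exact, exact, exact, exact]
print(case_similarity(("auto", 4, 2015, 0.9), ("auto", 3, 2015, 0.9), weights, sims))
```

In practice, each entry of `attr_sims` would be the similarity function matching that attribute's type (numerical, enumerative, or ordinal).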


5. Empirical Study 

The threshold is set to 50, 100, and 150, and the number of cases to 1000, 2000, and 3000. Five searches are run; in each, the five cases with the highest similarity are recommended, and the search times and the average search time are recorded.



6. Conclusion 

Based on users' needs, this paper divides case-based reasoning into basic search and advanced search.

The search process consists of two retrieval stages:

the first stage filters all cases in the case library by the input conditions, and the second stage calculates the similarity between the target case and each case returned by the first stage and outputs the cases whose similarity exceeds the threshold.

The search process, its procedure, and the threshold are also described.

First, to carry out the similarity calculation in the second stage, AHP is used to assign the attribute weights;

second, for the two forms of numerical attributes, the standard normal distribution formula is improved and the similarity calculation method is discussed;

finally, an empirical study is used to test the algorithm.