Induction Via ID3

Expert Systems CIS 718 Li Yao

Paper:

Durkin, J., "Induction Via ID3," AI Expert, Vol. 7, No. 4, pp. 48-53, April 1992.

Other reference material:

Durkin, J., "Designing an Induction Expert System," AI Expert, Vol. 6, No. 12, pp. 29-35, Dec. 1991.

Induction Systems

Extracting knowledge from a human expert has been recognized as the most time-consuming effort in the development of expert systems. It is particularly challenging when the expert is unable to communicate the knowledge or is unaware of the knowledge being used, and even more difficult in applications where no real expert exists. For example, in problems such as weather forecasting or race prediction, knowledgeable individuals predict the outcome of a given event by relying on past events. These inherent problems with the knowledge acquisition process have motivated expert-system designers to search for alternative techniques. One of these techniques is learning from examples, known as induction: the process of reasoning from a given set of facts to general principles or rules.

Several induction algorithms have been developed. The most popular one used today in the design of expert systems is ID3, developed by Quinlan in 1979. ID3 takes a set of examples about a problem and induces a decision tree or a set of rules that captures the decision-making knowledge about the problem. An example is a combination of decision factors, decision factor values and actions specific to that problem. The main features of the ID3 algorithm are as follows:

Chooses the most important issue first – The algorithm uses a heuristic approach to generate the decision tree, placing attributes in the tree's nodes in a way that tends to minimize the search effort needed to locate a solution. In general, ID3 places the most important issues near the root of the decision tree.

No-data result – When the examples do not cover the situation leading to a particular branch of the tree, that branch ends in a "no data" result instead of a predicted outcome.

Excludes irrelevant factors – The algorithm can determine whether an attribute is irrelevant for predicting the final result and drop it from further consideration.
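The heuristic behind "most important issue first" is not spelled out above. In Quinlan's ID3 it is information gain: the entropy of the results minus the expected entropy remaining after splitting the examples on an attribute. The sketch below is a minimal illustration under that assumption; the attributes (outlook, temperature, day) and the four examples are hypothetical, chosen only to show how the measure ranks attributes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples, attribute, target="result"):
    """Entropy of the results minus the weighted entropy left after splitting on attribute."""
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy([ex[target] for ex in examples]) - remainder


# Hypothetical examples: decision factors plus the result for each past case.
examples = [
    {"outlook": "sunny", "temperature": "hot",  "day": "mon", "result": "yes"},
    {"outlook": "sunny", "temperature": "mild", "day": "tue", "result": "yes"},
    {"outlook": "rainy", "temperature": "mild", "day": "mon", "result": "no"},
    {"outlook": "rainy", "temperature": "mild", "day": "tue", "result": "no"},
]

gains = {a: information_gain(examples, a) for a in ("outlook", "temperature", "day")}
print(gains)
print("root attribute:", max(gains, key=gains.get))
```

Here outlook separates the results perfectly (a gain of 1 bit), temperature helps only partially, and day carries no information (a gain of 0). An ID3-style builder would therefore test outlook at the root, and because both outlook branches are already pure, the irrelevant day attribute never enters the tree.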

The ID3 algorithm is a descendant of Hunt's Concept Learning System (CLS) (Hunt, E.B., Marin, J. and Stone, P., "Experiments in Induction," New York: Academic Press). The CLS algorithm begins with an empty decision tree and iteratively builds the tree by adding decision nodes until the tree correctly classifies all of the training examples, "C". The CLS algorithm proceeds as follows:

If all training examples in "C" are positive, create a YES node and stop.

If all training examples in "C" are negative, create a NO node and stop.

Otherwise, select an attribute A with values V1, V2, …, Vn and create a decision node for A.

Partition the training examples in "C" into subsets C1, C2, …, Cn according to their values of A.

Apply the algorithm recursively to each of the subsets Ci.
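The CLS steps can be sketched in Python as a short recursive function. This is an illustration under stated assumptions, not Hunt's original program: the steps do not say how attribute A is chosen, so the sketch simply takes attributes in the order given; the tree is a nested dict; and `domains` (every possible value of each attribute) is a hypothetical input added so that a value with no matching examples produces a "NO DATA" leaf, echoing ID3's no-data result.

```python
def cls(examples, attributes, domains, target="result"):
    """Build a decision tree with Hunt's CLS procedure (illustrative sketch).

    examples   -- list of dicts mapping decision factors (and the target) to values
    attributes -- attributes still available, tried in the order given
    domains    -- every possible value of each attribute (hypothetical input)
    """
    if not examples:
        return "NO DATA"                      # added here: no example reaches this branch
    labels = {ex[target] for ex in examples}
    if labels == {"yes"}:
        return "YES"                          # all examples positive
    if labels == {"no"}:
        return "NO"                           # all examples negative
    if not attributes:
        raise ValueError("inconsistent examples: identical factors, different results")
    a, rest = attributes[0], attributes[1:]   # select attribute A (here simply the next one)
    # Partition by each value of A and apply the procedure recursively to every subset.
    return {a: {v: cls([ex for ex in examples if ex[a] == v], rest, domains, target)
                for v in domains[a]}}


# Hypothetical training set "C".
domains = {"outlook": ["sunny", "overcast", "rainy"], "wind": ["calm", "windy"]}
examples = [
    {"outlook": "sunny", "wind": "calm",  "result": "yes"},
    {"outlook": "sunny", "wind": "windy", "result": "no"},
    {"outlook": "rainy", "wind": "calm",  "result": "no"},
    {"outlook": "rainy", "wind": "windy", "result": "no"},
]
tree = cls(examples, ["outlook", "wind"], domains)
print(tree)
```

The sunny branch is mixed, so CLS splits it again on wind; the rainy branch is all negative and becomes a NO node; and the overcast branch ends in NO DATA because no training example has that outlook.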

ID3 closely follows the CLS algorithm, with several modifications. CLS requires all training examples to be available at step 1, which limits the number of examples it can handle. ID3, on the other hand, can work with subsets of the examples, which lets it solve more complex problems involving a large number of examples. The ID3 algorithm consists of the following major steps:

Select a random subset of size W (the window) from the entire set of training examples.

Apply the CLS algorithm to the window to form a decision tree.

Scan the entire set of examples to find exceptions to the current rule, that is, examples the tree misclassifies.

If there are exceptions, insert some of them into the window and repeat from step 2; otherwise, stop and display the latest rule.

This algorithm iteratively converges to a final rule that captures the concept.
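The windowing loop can be sketched as follows. It reuses a CLS-style builder that takes attributes in a fixed order, as in the CLS steps; the parameters `window_size`, `max_add` (how many exceptions to add per round, standing in for "some of them") and `seed` are hypothetical choices, and the examples are invented. A "NO DATA" leaf reached by an example outside the window counts as an exception.

```python
import random


def cls(examples, attributes, domains, target="result"):
    """Hunt's CLS (sketch): split on attributes in the given order until each leaf is pure."""
    if not examples:
        return "NO DATA"                       # no example reaches this branch
    labels = {ex[target] for ex in examples}
    if len(labels) == 1:
        return labels.pop().upper()            # "YES" or "NO" leaf
    if not attributes:
        raise ValueError("inconsistent examples: identical factors, different results")
    a, rest = attributes[0], attributes[1:]
    return {a: {v: cls([ex for ex in examples if ex[a] == v], rest, domains, target)
                for v in domains[a]}}


def classify(tree, example):
    """Follow the branch matching the example's value at each decision node down to a leaf."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree


def id3_windowing(examples, attributes, domains, window_size, max_add=2, seed=0,
                  target="result"):
    """Windowing loop from the ID3 steps; window_size, max_add and seed are illustrative."""
    rng = random.Random(seed)
    window = rng.sample(examples, window_size)           # step 1: random window of size W
    while True:
        tree = cls(window, attributes, domains, target)  # step 2: apply CLS to the window
        exceptions = [ex for ex in examples              # step 3: scan the entire set
                      if classify(tree, ex) != ex[target].upper()]
        if not exceptions:
            return tree, len(window)                     # step 4: no exceptions, stop
        window.extend(exceptions[:max_add])              # step 4: add some exceptions, repeat


domains = {"outlook": ["sunny", "overcast", "rainy"], "wind": ["calm", "windy"]}
# Hypothetical examples: "yes" only when it is calm and not rainy.
examples = [{"outlook": o, "wind": w,
             "result": "yes" if o != "rainy" and w == "calm" else "no"}
            for o in domains["outlook"] for w in domains["wind"]]

tree, window_used = id3_windowing(examples, ["outlook", "wind"], domains, window_size=2)
print(window_used, "of", len(examples), "examples were needed")
print(tree)
```

Every round adds at least one example not already in the window, so with consistent examples the loop must stop, at worst when the window holds the whole set. On a data set this small the window may grow to cover most of the examples; the approach pays off when the training set is large and a modest window already contains the informative cases.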

Induction offers several advantages to the user.

Induction systems can discover previously unknown rules from a set of examples.

It offers a technique by which the system's knowledge can be acquired directly from examples.

It can produce new knowledge even when the expert is not explicitly aware of the decision-making knowledge.

It can uncover critical decision factors and eliminate irrelevant decision factors.

The disadvantages associated with induction methods are as follows:

It is often difficult to choose good decision factors, yet the system's effectiveness depends on them.

The result is in the form of a decision tree, and for a complex problem it is difficult to understand the decision process by tracing through the tree.

In conclusion, when designing an expert system, an induction approach should be considered if it is difficult to extract the knowledge from an expert, if the expert is unaware of the decision-making knowledge, or if no real expert exists. Induction may be an effective solution if examples of past problems exist. The most common induction algorithm is ID3. It extracts knowledge from a set of past examples, and that knowledge can then be used to make decisions or predictions about future events. The technique relies on a heuristic approach that has proven effective in many applications.