Background Literature Review

1.1 Topic Of Choice

This research is based on the Nursery data set provided by the Centre for Machine Learning and Intelligent Systems. The Nursery database was derived from a hierarchical decision model originally developed to help rank applications for nursery schools. The data set is multivariate, and its attributes are primarily categorical. A nursery school is a social setting, so the attributes describe social characteristics of the applicants. The database was originally used in the 1980s, when excessive enrolment was registered in a number of schools in Ljubljana, Slovenia. During those years most applicants were rejected for various reasons, and this led to many problems because the rejected applicants demanded an explanation for their rejection.

The main criteria used to determine the result of an application were the occupation of the parents and the child’s nursery, the family structure of the applicant, and the family’s financial standing. The social and health picture of the family was also considered important. The attributes were grouped according to these sub-problems. For example, below is the conceptual structure of the nursery application:

NURSERY: Evaluation of applications for nursery schools
    EMPLOY: Employment of parents and child’s nursery
        parents: Parents’ occupation
        has_nurs: Child’s nursery
    STRUCT_FINAN: Family structure and financial standing
        STRUCTURE: Family structure
            form: Form of the family
            children: Number of children
        housing: Housing conditions
        finance: Financial standing of the family
    SOC_HEALTH: Social and health picture of the family
        social: Social conditions
        health: Health conditions

The main values under consideration include:

Parents: usual, pretentious, greatly pretentious

Has_nurs: proper, less proper, improper, critical, very critical

Form: complete, completed, incomplete, foster

Children: 1, 2, 3, more

Housing: convenient, less convenient, critical

Finance: convenient, inconvenient

Social: non-problematic, slightly problematic, problematic

Health: recommended, priority, not recommended

Since the data set was donated to the UCI Machine Learning Repository on 1997-06-01, it has been used over 12960 times. The data set has no missing values. The model was primarily developed within an expert system shell for decision making and has since been used for other purposes pertinent to nursery application and admission.
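As a rough check on the size of the data set, the value lists above can be written as a lookup table, and the number of possible attribute combinations obtained by multiplying the number of values per attribute. This is only a sketch: the names `ATTRIBUTE_VALUES` and `attribute_space_size` are illustrative, and the value spellings follow the list above rather than the abbreviations used in the raw data file.

```python
from math import prod

# Attribute values as listed in Section 1.1 (spellings are illustrative).
ATTRIBUTE_VALUES = {
    "parents": ["usual", "pretentious", "greatly pretentious"],
    "has_nurs": ["proper", "less proper", "improper", "critical", "very critical"],
    "form": ["complete", "completed", "incomplete", "foster"],
    "children": ["1", "2", "3", "more"],
    "housing": ["convenient", "less convenient", "critical"],
    "finance": ["convenient", "inconvenient"],
    "social": ["non-problematic", "slightly problematic", "problematic"],
    "health": ["recommended", "priority", "not recommended"],
}


def attribute_space_size(attribute_values):
    """Number of distinct combinations of attribute values."""
    return prod(len(values) for values in attribute_values.values())


print(attribute_space_size(ATTRIBUTE_VALUES))
```

The product 3 × 5 × 4 × 4 × 3 × 2 × 3 × 3 comes to 12960, so a data set with one instance per combination covers the whole attribute space.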

1.2 Previous Research

According to Olave, Rajkovic, and Bohanec (1989), the data can be very useful when analysing application and admission in the public school system, particularly in relation to expert systems in public administration. The researchers argued that the hierarchical model can also be useful for machine learning by function decomposition. HINT (HIerarchy INduction Tool) can completely reconstruct the hierarchical model into a simpler model that can easily be used to evaluate a simple public system. The original hierarchical model used in the 1980s is as shown below:

Because of the hierarchical structure of the model, the Nursery database is specific about the eight key attributes used. The input attributes are parents, has_nurs, form, children, housing, finance, social, and health. It is also important to note that the Nursery database can be used without the structural information, because the data relate directly to these attributes. According to .., there are a number of underlying concept structures that make the Nursery database useful for testing constructive induction. Michalski (1986, pp. 3-25) used the Nursery database with the structure discovery method.

Many studies have been based on the Nursery database. For example, intelligent systems for the public administration of scarce resources have relied on the data set because it has the same attributes as any social setting. In machine learning, hierarchy induction tools are very useful and are widely used because of their overall resemblance to public administration systems (Curtis, 1962; Luba, 1995, pp. 256-261).

1.3 Statistics

According to Biermann, Fierfield, and Beres (1982, pp. 635-648), the number of instances that completely cover the attribute space is 12960 for all eight attributes. The class distribution based on the above statistics, including the percentages, is as shown below:

Machine learning by function decomposition

Learning by function decomposition is the work of Zupan et al. (1997), who established a general decomposition approach to machine learning. Machine learning systems in artificial intelligence can rely on signature tables to evaluate the data. As with the Nursery data set, the main problem lies in the difficulty of deriving the structure of the concepts. The decomposition approach for incompletely specified switching functions is an improvement because it handles multi-valued switching functions, although each multi-valued variable is first encoded as Boolean variables. One of the main benefits of function decomposition for machine learning identified by Zupan, Bohanec, Bratko, and Demsar (1997, pp. 421-429) is that it is robust and compares well with the C4.5 decision tree inducer. Constructive induction systems mainly use a set of available attributes and predefined constructive operators to derive the attributes they require.
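The core idea of decomposition can be illustrated with a small, completely specified function. The sketch below is not HINT itself: it assumes every attribute combination has a known output and simply groups identical columns of the partition matrix, whereas the approach described above also deals with incompletely specified functions. The function name `decompose` and the toy target function are made up for illustration.

```python
from itertools import product


def decompose(func, free_values, bound_values):
    """Rewrite y = func(a, b) as y = G[(a, H[b])].

    Each bound-set value b defines one column of the partition matrix:
    the outputs of func over all free-set values a. Bound values with
    identical columns are interchangeable, so they share one label of
    the new intermediate concept.
    """
    label_of_column = {}
    H = {}
    for b in bound_values:
        column = tuple(func(a, b) for a in free_values)
        if column not in label_of_column:
            label_of_column[column] = len(label_of_column)
        H[b] = label_of_column[column]

    G = {}
    for column, label in label_of_column.items():
        for a, y in zip(free_values, column):
            G[(a, label)] = y
    return G, H


# Toy target with three-valued attributes: y = min(x1, max(x2, x3)).
levels = (0, 1, 2)
free_values = list(levels)                    # x1
bound_values = list(product(levels, levels))  # (x2, x3)


def target(x1, x23):
    return min(x1, max(x23))


G, H = decompose(target, free_values, bound_values)
print(len(set(H.values())))  # number of values of the intermediate concept
```

In this toy case the nine combinations of x2 and x3 collapse into three values of the intermediate concept, so one table with 27 rows is replaced by two tables with 9 rows each.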

Section 2: Neural Network Design Choice

2.1 Design Choice

According to Ashenhurst (1952, pp. 541-602), the neural network design adopted is determined by the architecture of the network, the structure of the artificial neurons, and the required learning rules. The design provides clear and detailed information about the fundamental network architecture and the predetermined learning rules. It covers both the mathematical analysis of the networks applied to the nursery data and the methods of training them. Such networks are applied to problems in pattern recognition and control systems (Biermann, Fierfield, and Beres, 1982, pp. 635-648).

This project is grounded on performance learning and back propagation. Although back propagation is useful in determining patterns, the use of simple building blocks makes it easier to understand associative, adaptive, and competitive networks. The design choices start with the network architecture:

2.2 Hidden Layer Depth

The project uses five hidden layers. Multiple layers are used because they are more powerful than a single layer. The five-layer network is trained to approximate the target function, given the classification of the data and the number of attributes under consideration. The depth was also chosen with the number of input variables and the required output in mind. The variables used as inputs are parents, has_nurs, form, children, housing, finance, social, and health (Zupan et al., 1997, pp. 421-429).
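To make the layered structure concrete, the following is a minimal sketch of a forward pass through a network with five hidden layers. Everything specific in it is an assumption for illustration: the layer sizes, the tanh hidden units, the softmax output, the five output classes, and the 24 inputs (the figure quoted in Section 2.7). It does not reproduce the NeuroSolutions model.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, inputs):
    """Propagate one input vector through all layers.

    Hidden layers use tanh; the last layer uses softmax so the outputs
    can be read as class scores that sum to one.
    """
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * x for w, x in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [math.tanh(s) for s in sums]
        else:
            peak = max(sums)
            exps = [math.exp(s - peak) for s in sums]
            total = sum(exps)
            activations = [e / total for e in exps]
    return activations


# Hypothetical sizes: 24 inputs, five hidden layers, 5 output classes.
rng = random.Random(0)
sizes = [24, 12, 10, 8, 6, 4, 5]
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
scores = forward(layers, [0.0] * 23 + [1.0])
print(len(layers), len(scores))
```

Six weight layers connect the input to the output here: five feed the hidden layers and one feeds the output layer.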

2.3 The Number Of Processing Elements Per Layer

The design chosen in this project is based on the multilayer perceptron because of the ease of completing the experiment with NeuroSolutions. The most important feature is that NeuroSolutions for Excel makes it easy to train networks and test their classification performance. The project also involves testing the neural network after training, with the test data drawn from the nursery data.

2.4 Genetic Optimisation On: Input Space/No. Of Processing Elements/Learning Rate/Momentum

Optimising the network to predict the outcome of the selection of students in public schools, using an evolutionary artificial neural network for resource allocation, is very challenging because the resource allocation problem is highly non-linear and uncertain. Because genetic algorithms cannot perform direct calculations, NeuroSolutions for Excel provides a system in which the training is carried out within the add-on. In this case the linear threshold cannot be used by the genetic algorithm.

Even though traditional artificial neural networks founded on the back propagation algorithm have a few drawbacks, these can be reduced by fixing the architecture of the network. The derivative-based learning algorithm is also improved to reduce the error and to avoid becoming stuck in local optima. In this project the fundamental algorithm is relatively simple, because the main network parameters, especially the learning rate, make it easy to solve the school allocation problem. Other factors, such as the number of units in each hidden layer and the number of layers, also contribute (Olave, Rajkovic, and Bohanec, 1989).
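The kind of genetic search over learning rate, momentum, and number of processing elements described in this section can be sketched as follows. This is a generic illustration, not the optimiser built into NeuroSolutions: the parameter ranges, the operators, and in particular `placeholder_error` (which stands in for training a network and measuring its error) are all assumptions.

```python
import random


def random_candidate(rng):
    return {
        "learning_rate": rng.uniform(0.01, 1.0),
        "momentum": rng.uniform(0.0, 0.9),
        "hidden_units": rng.randint(2, 20),
    }


def placeholder_error(candidate):
    """Stand-in for the validation error of a trained network.

    A real run would train a network with these settings and return its
    error; this made-up bowl-shaped function only keeps the sketch runnable.
    """
    return ((candidate["learning_rate"] - 0.1) ** 2
            + (candidate["momentum"] - 0.7) ** 2
            + ((candidate["hidden_units"] - 12) / 20) ** 2)


def crossover(a, b, rng):
    """Pick each parameter from one of the two parents at random."""
    return {key: rng.choice((a[key], b[key])) for key in a}


def mutate(candidate, rng):
    """Perturb each parameter slightly, keeping it inside its range."""
    child = dict(candidate)
    child["learning_rate"] = min(1.0, max(0.01, child["learning_rate"] + rng.gauss(0, 0.05)))
    child["momentum"] = min(0.9, max(0.0, child["momentum"] + rng.gauss(0, 0.05)))
    child["hidden_units"] = min(20, max(2, child["hidden_units"] + rng.choice((-1, 0, 1))))
    return child


def evolve(error, generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=error)
        history.append(error(population[0]))
        parents = population[: population_size // 2]  # truncation selection
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(population_size - len(parents))]
        population = parents + children  # parents survive unchanged (elitism)
    population.sort(key=error)
    history.append(error(population[0]))
    return population[0], history


best, history = evolve(placeholder_error)
print(best, history[0], history[-1])
```

Because the better half of each generation is carried over unchanged, the best error never gets worse from one generation to the next; the price is that every candidate costs one full training run, which is why genetic optimisation adds so much to the running time.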

2.5 Neuron Optimization Of The Hidden Layers

From experience, optimising the hidden layers indicates that the optimum number of neurons in the first hidden layer is 12, while the optimum number in the second hidden layer is 2. This progression continues until all values are at the fixed weight matrix. At this stage the genetic operators are applied and the parameter values are analysed at the fixed weight matrix, while the genetic operators themselves are left out of the neural network design. When a genetic algorithm is embedded in the network’s weight space, new neural network architectures are added to the pool. In this way new parameters can be extracted from the existing evolutionary operators into the new neural networks, and the minimum, the average, and the other statistical results are collected for every evolutionary run.

2.6 The sequence of the network is as follows: usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend. For every problematic parent, the children are not recommended. This is done iteratively: applicants with less convenient, slightly inconvenient, or problematic values are likewise not recommended. It is quite clear that many students will not be offered places in the schools because a parent is problematic or inconvenient. This also means that other strategies can be followed to ensure that each school gets the maximum number of students and each student gets the best opportunity of being admitted to a school. The idea is to find an optimal number for both students and schools in consideration of the parents’ behaviour and financial situation.
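A record such as the one quoted above can be split into the eight input attributes and the desired output as follows. The field names are taken from the attribute list in Section 1.1; the function name `parse_record` and the name `class_label` for the last column are illustrative.

```python
INPUT_FIELDS = ["parents", "has_nurs", "form", "children",
                "housing", "finance", "social", "health"]


def parse_record(line):
    """Split one comma-separated record into (inputs, desired output)."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(INPUT_FIELDS) + 1:
        raise ValueError(f"expected {len(INPUT_FIELDS) + 1} fields, got {len(values)}")
    inputs = dict(zip(INPUT_FIELDS, values[:-1]))
    class_label = values[-1]
    return inputs, class_label


record = "usual,proper,complete,1,convenient,convenient,nonprob,recommended,recommend"
inputs, class_label = parse_record(record)
print(inputs["health"], "->", class_label)
```

The first eight fields correspond to the columns tagged as inputs, and the last field to the column tagged as desired.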

Neural network performance

2.7 The Time Taken To Learn

The learning process is easy with NeuroSolutions for Excel, as the add-on runs smoothly. The learning process is completed in a matter of seconds. However, there are instances in which an error prompt is given. The training data sheet was created with the “input”, “desired”, and “training” tags. The network was trained four times without varying the parameters. Then the genetic algorithm was also run. Tagging the data was easy, but training meant arranging the columns and rows into the various variables. The most important thing about NeuroSolutions is its iterative process, which makes it difficult to perform other operations before one operation has completed.

The learning rate was determined by the need to compare the links to the biases and the direction of each link. The design choice is very important because it has cost and quality implications: if cost is minimised at all costs, quality and accuracy are compromised. The main idea is to use a system in which the network is trained optimally. The eight main features are used, and encoding them as inputs to the neural network resulted in 24 inputs in total. For this, the first 20 columns are tagged as inputs while the last columns are tagged as the desired variables. The rows are then tagged and labelled as training data. A five-hidden-layer MLP is used to represent the neural network model.

Training is carried out by running the process on the tagged data. The network is trained 6 times, with 20 epochs each time the run process is carried out, and the train multiple report is produced, which provides a complete summary.

Section 3 – Neural Network Performance

3.1 Network Performance, Limitation, And Optimization

This section describes the desired performance data from NeuroSolutions. It is quite clear that increasing the number of layers increased the overall time taken to train and run the network. For example, with only one layer, the time taken was only 5 seconds. However, as the layers were increased to four and then five, the rate of execution decreased, leading to longer execution times.

Additionally, genetic optimization strongly affects time and accuracy. Whenever evolutionary optimization is carried out, the execution time increases considerably, but the overall accuracy of the neural network improves. This means that, although NeuroSolutions is effective in solving social problems such as public school allocation, other factors must be taken into consideration before drawing conclusions about the optimization problem. For example, improved accuracy requires more time to complete many runs. Time is thus exchanged for quality and accuracy, and the interplay between them leads to an optimal neural network.

3.2 Multi-Layer Perceptron (MLP)

The multi-layer perceptron (MLP) is a feedforward neural network with one or more hidden layers. The network runs in the conventional way, in which the input is converted into output. While the MLP is used for classification, prediction, recognition, and approximation, here it was used to predict the possibility of a student being admitted to a public school. It is trained using the back propagation algorithm. The MLP is used because a single-layer network can only generate a linear decision boundary, which suits linearly separable problems, while the MLP, in which each unit is a perceptron, can generate boundaries for problems that are not linearly separable. The first layer draws the linear boundaries, the second layer combines the boundaries, and the third layer generates arbitrarily complex boundaries.
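The difference between a single-layer and a multi-layer network can be seen on the XOR problem, which is not linearly separable. The sketch below uses hand-picked weights and a step activation rather than trained weights, purely to show how a second layer combines two linear boundaries; none of these values come from the project's network.

```python
def step(value):
    return 1 if value > 0 else 0


def unit(weights, bias, inputs):
    """A single perceptron: weighted sum followed by a step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: each unit draws one linear boundary.
    h_or = unit([1, 1], -0.5, [x1, x2])   # fires when x1 OR x2 is set
    h_and = unit([1, 1], -1.5, [x1, x2])  # fires when x1 AND x2 are set
    # Output layer: combines the two boundaries (OR but not AND).
    return unit([1, -1], -0.5, [h_or, h_and])


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

No single unit of this kind can reproduce XOR on its own, because one straight line cannot separate the two classes; the hidden layer supplies two lines and the output unit selects the region between them.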

Network training continued until the stopping criterion was met. One complete presentation of the training set is an epoch. Batch mode was used in training: the weight change was determined for every pattern in the training set, and the total change was then calculated by summing the individual changes. The advantage of batch learning is that it is relatively faster than sequential-mode learning. It is also easier to analyse theoretically and can be easily parallelised.
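Batch-mode training as described above can be sketched for a single linear neuron: the weight change is computed for every pattern, the changes are summed, and the total is applied once per epoch. The data, learning rate, and function names below are made up for illustration, and the sketch uses the simple delta rule rather than full back propagation.

```python
def predict(weights, bias, inputs):
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def mean_squared_error(weights, bias, patterns):
    return sum((target - predict(weights, bias, inputs)) ** 2
               for inputs, target in patterns) / len(patterns)


def batch_epoch(weights, bias, patterns, learning_rate):
    """One epoch of batch learning with the delta rule."""
    total_dw = [0.0] * len(weights)
    total_db = 0.0
    for inputs, target in patterns:
        error = target - predict(weights, bias, inputs)
        # Per-pattern change, accumulated but not yet applied.
        for i, x in enumerate(inputs):
            total_dw[i] += learning_rate * error * x
        total_db += learning_rate * error
    # Apply the summed change once, after all patterns have been seen.
    new_weights = [w + dw for w, dw in zip(weights, total_dw)]
    return new_weights, bias + total_db


# Toy patterns generated from target = 2*x1 - x2 + 0.5.
patterns = [([0.0, 0.0], 0.5), ([1.0, 0.0], 2.5),
            ([0.0, 1.0], -0.5), ([1.0, 1.0], 1.5)]
weights, bias = [0.0, 0.0], 0.0
errors = []
for _ in range(200):
    errors.append(mean_squared_error(weights, bias, patterns))
    weights, bias = batch_epoch(weights, bias, patterns, learning_rate=0.1)
print(weights, bias, errors[0], errors[-1])
```

Since the per-pattern changes within an epoch do not depend on one another, they could be computed in parallel before being summed, which is the property that makes batch mode easy to parallelise.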

Section 4. Testing

Considering the trade-off between time and accuracy of the neural network, it becomes clear that the extremes of both many hidden layers and very few hidden layers should be avoided. On average, two or three hidden layers are adequate for such a large volume of data. If the nursery data had only two variables, it would have been easy. To avoid errors and improve the accuracy of the neural network, it is important to keep the number of runs and hidden layers small. Both learning and training are therefore important processes that guide the rest of the work. Testing the network includes testing the sensitivity about the mean and finding an optimal way of allocating resources. It is not easy to complete the neural network design using NeuroSolutions for Excel, but it provides good results comparable to the complete version.

Section 5. Conclusion

The aim of the project was to use the Nursery database to predict the optimal space allocation for students in public schools. The MLP was trained in batch mode and then tested. There were many drawbacks to using NeuroSolutions for Excel, given that the data were not comma delimited and the tagging process was long because many variables were under consideration. Nevertheless, despite these drawbacks, the program emerged as the best software, since it allowed the data to be mapped multiple times until the objective was reached. The most important lesson, though, is that few variables lead to inaccuracy, while many variables take a lot of time to train and learn. Therefore, to achieve the best results, it is important to trade off time against quality until the optimal results are reached.

7. Bibliography

Olave, M., Rajkovic, V. and Bohanec, M.: 1989, An application for admission in public school systems, in I. Th. M. Snellen, W. B. H. J. van de Donk and J.-P. Baquiast (eds), Expert Systems in Public Administration, Elsevier Science Publishers (North Holland), 145-160.

Zupan, B., Bohanec, M., Bratko, I. and Demsar, J.: 1997, Machine learning by function decomposition, in D. Fisher (ed.), Proc. ICML-97, 421-429.

Ashenhurst, R. L.: 1952, The decomposition of switching functions, Technical report, Bell Laboratories BL-1 (11), 541-602.

Biermann, A. W., Fierfield, J. and Beres, T.: 1982, Signature table systems and learning, IEEE Trans. Syst. Man Cybern. 12(5), 635-648.

Curtis, H. A.: 1962, A New Approach to the Design of Switching Functions, Van Nostrand, Princeton, N. J.

Luba, T.: 1995, Decomposition of multiple-valued functions, 25th Intl. Symposium on multivalued logic, Bloomington, Indiana, 256-261.

Michalski, R. S.: 1986, Understanding the nature of learning: issues and research directions, in R. Michalski, J. Carbonell and T. Mitchell (eds), Machine Learning: An Artificial Intelligence Approach, Kaufmann, Los Altos, CA, 3-25.

Michie, D.: 1995, Problem decomposition and the learning of skills, in N. Lavrac and S. Wrobel (eds), Machine Learning: ECML-95, Notes in Artificial Intelligence 912, Springer-Verlag, 17-31.

Perkowski, M. A. et al.: 1995, Unified approach to functional decompositions of switching functions, Technical report, Warsaw University of Technology and Eindhoven University of Technology.

Pfahringer, B.: 1994, Controlling constructive induction in CiPF, in F. Bergadano and L. D. Raedt (eds), Machine Learning: ECML-94, Springer-Verlag, 242-256.

Ragavan, H. and Rendell, L.: 1993, Lookahead feature construction for learning hard concepts, Proc. Tenth International Machine Learning Conference, Morgan Kaufmann, 252-259.

Ross, T. D. et al.: 1994, On the Decomposition of Real-valued Functions, 3rd International Workshop on Post-Binary VLSI Systems.

Samuel, A.: 1967, Some studies in machine learning using the game of checkers II: Recent progress, IBM J. Res. Develop. 11, 601-617.

Shapiro, A. D.: 1987, Structured induction in expert systems, Turing Institute Press in association with Addison-Wesley Publishing Company.

