Clustering techniques are data mining techniques used when knowledge about the problem at hand is very limited, for example when no samples with a known classification are available. An entire chapter of the book is dedicated to the k-means algorithm, one of the simplest clustering algorithms, which is widely used in agriculture. Its simplicity has also allowed the development of many variants of the algorithm that can be tailored to different applications. An implementation of the k-means algorithm in MATLAB is given in the book. Another chapter of the book is entirely devoted to biclustering techniques.
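The basic k-means loop alternates two steps. First, every point is assigned to its nearest center. Then every center is moved to the mean of the points assigned to it. The loop stops when no assignment changes. The following C function is a minimal sketch of that loop; it is not the book's MATLAB or C code. The names `kmeans` and `sq_dist` and the argument layout are hypothetical. The sketch assumes that the caller supplies the initial centers (for instance, k points chosen at random) and that data are stored row-major.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>
#include <string.h>

/* Squared Euclidean distance between two d-dimensional points. */
static double sq_dist(const double *a, const double *b, size_t d) {
    double s = 0.0;
    for (size_t j = 0; j < d; j++) {
        double diff = a[j] - b[j];
        s += diff * diff;
    }
    return s;
}

/*
 * Hypothetical basic k-means (Lloyd iterations), a sketch only.
 * data    : n points, row-major (n*d doubles)
 * centers : k*d doubles; initial centers on entry, final centers on exit
 * labels  : n ints; cluster index of each point on exit
 * counts  : k size_t values used as a work buffer
 * Returns the number of center-update steps performed.
 */
int kmeans(const double *data, size_t n, size_t d, double *centers,
           size_t k, int *labels, size_t *counts, int max_iter) {
    assert(k > 0 && k <= n);
    for (size_t i = 0; i < n; i++)
        labels[i] = -1;                       /* nothing assigned yet */
    int iter;
    for (iter = 0; iter < max_iter; iter++) {
        int changed = 0;
        /* Assignment step: each point joins its nearest center. */
        for (size_t i = 0; i < n; i++) {
            double best = DBL_MAX;
            int best_c = 0;
            for (size_t c = 0; c < k; c++) {
                double dist = sq_dist(&data[i * d], &centers[c * d], d);
                if (dist < best) { best = dist; best_c = (int)c; }
            }
            if (labels[i] != best_c) { labels[i] = best_c; changed = 1; }
        }
        if (!changed)
            break;                            /* assignments stable: converged */
        /* Update step: each non-empty cluster's center becomes its mean;
           an empty cluster keeps its previous center. */
        memset(counts, 0, k * sizeof *counts);
        for (size_t i = 0; i < n; i++)
            counts[labels[i]]++;
        for (size_t c = 0; c < k; c++)
            if (counts[c] > 0)
                memset(&centers[c * d], 0, d * sizeof *centers);
        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < d; j++)
                centers[(size_t)labels[i] * d + j] += data[i * d + j];
        for (size_t c = 0; c < k; c++)
            if (counts[c] > 0)
                for (size_t j = 0; j < d; j++)
                    centers[c * d + j] /= (double)counts[c];
    }
    return iter;
}
```

Take the one-dimensional points 1.0, 1.5, 10.0, 10.5 with initial centers 1.0 and 10.0. The first pass assigns the points to clusters 0, 0, 1, 1 and moves the centers to 1.25 and 10.25. The second pass changes no assignment, so the function returns 1. The result depends on the initial centers. This sensitivity is one reason many k-means variants exist.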
Classification techniques form another category of data mining techniques. They can be employed when a training set is available, i.e. a set of samples whose classification is already known, which can be exploited to learn how to classify samples whose classification is unknown. Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and the k-Nearest Neighbor (kNN) method are among the classification techniques presented in the book, each in a dedicated chapter. Because of the simplicity of kNN, a MATLAB implementation of it is also provided. For ANNs and SVMs, existing software is presented instead, and examples of its use are discussed.
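kNN classifies a new sample by majority vote among the k training samples closest to it. The C function below is a minimal sketch of that rule and not the book's MATLAB implementation. The names `knn_classify`, `Neighbor` and `cmp_neighbor` are hypothetical. The sketch assumes Euclidean distance, class labels in `0..n_classes-1`, and ties in the vote broken in favor of the smallest class index.

```c
#include <assert.h>
#include <stdlib.h>

/* (squared distance, training index) pair used to sort neighbors. */
typedef struct { double dist; size_t idx; } Neighbor;

static int cmp_neighbor(const void *a, const void *b) {
    double da = ((const Neighbor *)a)->dist;
    double db = ((const Neighbor *)b)->dist;
    return (da > db) - (da < db);
}

/*
 * Hypothetical k-NN classifier for a single query point (a sketch only).
 * train  : n training points, row-major (n*d doubles)
 * labels : class of each training point, in 0..n_classes-1
 * Returns the predicted class, or -1 if memory allocation fails.
 */
int knn_classify(const double *train, const int *labels, size_t n, size_t d,
                 int n_classes, const double *query, size_t k) {
    assert(k > 0 && k <= n && n_classes > 0);
    Neighbor *nb = malloc(n * sizeof *nb);
    int *votes = calloc((size_t)n_classes, sizeof *votes);
    if (!nb || !votes) { free(nb); free(votes); return -1; }
    /* Distance from the query to every training sample. */
    for (size_t i = 0; i < n; i++) {
        double s = 0.0;
        for (size_t j = 0; j < d; j++) {
            double diff = train[i * d + j] - query[j];
            s += diff * diff;
        }
        nb[i].dist = s;
        nb[i].idx = i;
    }
    /* Sort by distance; the first k entries are the nearest neighbors. */
    qsort(nb, n, sizeof *nb, cmp_neighbor);
    for (size_t m = 0; m < k; m++)
        votes[labels[nb[m].idx]]++;
    /* Majority vote; ties go to the smallest class index. */
    int best = 0;
    for (int c = 1; c < n_classes; c++)
        if (votes[c] > votes[best]) best = c;
    free(nb);
    free(votes);
    return best;
}
```

Sorting all n distances costs O(n log n) per query. That is acceptable for small training sets. A partial selection of the k smallest distances would be cheaper for large ones. When training samples are equally distant from the query, `qsort` does not guarantee their order.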
Applications in Agriculture.
All chapters of the book contain a section in which real-life applications are presented in detail, with a focus on the agricultural field. A wide range of applications is covered. For example, in one application apples are inspected and classified as good or bad for the market. In another, problematic wine fermentations are predicted three days after the process begins, so that enologists can intervene in time to ensure that the fermentation proceeds correctly. Sounds emitted by pigs are also analyzed to detect diseases. Many other applications are included in the book.
Once a data mining technique has been applied and results have been obtained, these results usually need to be validated. Validation methods include the Test Set method, the Leave-one-out method and the k-fold method, and a MATLAB implementation of all of them is provided. Examples are discussed in which data mining techniques are applied to simple problems and the results obtained are then validated with these methods.
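In k-fold validation, the samples are split into k groups (folds). Each fold serves once as the test set, while the remaining folds form the training set. The Leave-one-out method is the special case in which k equals the number of samples. The C code below is a minimal sketch of that procedure and is not the book's MATLAB code. The names `kfold_accuracy`, `fold_of`, `predict_fn`, `Dataset1D` and `predict_1nn` are hypothetical. The sketch assumes contiguous folds of nearly equal size. It uses a simple one-dimensional 1-nearest-neighbor rule only as an example classifier.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classifier interface: predict the label of sample `query`
   using only the training samples listed in train_idx[0..n_train-1]. */
typedef int (*predict_fn)(const size_t *train_idx, size_t n_train,
                          size_t query, const void *ctx);

/* Fold of sample i when n samples are split into k contiguous folds. */
static size_t fold_of(size_t i, size_t n, size_t k) { return i * k / n; }

/*
 * k-fold validation sketch: returns the fraction of samples classified
 * correctly when each fold is used once as the test set.
 * With k == n this is the Leave-one-out method. Returns -1.0 on
 * allocation failure.
 */
double kfold_accuracy(size_t n, size_t k, const int *labels,
                      predict_fn predict, const void *ctx) {
    assert(k >= 2 && k <= n);
    size_t *train_idx = malloc(n * sizeof *train_idx);
    if (!train_idx) return -1.0;
    size_t correct = 0;
    for (size_t f = 0; f < k; f++) {
        /* Training set: every sample outside fold f. */
        size_t n_train = 0;
        for (size_t i = 0; i < n; i++)
            if (fold_of(i, n, k) != f) train_idx[n_train++] = i;
        /* Test set: the samples in fold f. */
        for (size_t i = 0; i < n; i++)
            if (fold_of(i, n, k) == f &&
                predict(train_idx, n_train, i, ctx) == labels[i])
                correct++;
    }
    free(train_idx);
    return (double)correct / (double)n;
}

/* Example data set and 1-nearest-neighbor rule on 1-D samples. */
typedef struct { const double *x; const int *y; } Dataset1D;

int predict_1nn(const size_t *train_idx, size_t n_train, size_t query,
                const void *ctx) {
    const Dataset1D *ds = ctx;
    size_t best = train_idx[0];
    double db = ds->x[best] - ds->x[query];
    for (size_t m = 1; m < n_train; m++) {
        double dt = ds->x[train_idx[m]] - ds->x[query];
        if (dt * dt < db * db) { best = train_idx[m]; db = dt; }
    }
    return ds->y[best];
}
```

Consider the sorted samples 0, 1, 2, 10, 11, 12 with labels 0, 0, 0, 1, 1, 1. Leave-one-out (k = 6) classifies every sample correctly. With k = 2, each contiguous fold contains only one class, so the accuracy drops to 0. This is why samples are usually shuffled before being split into folds. The Test Set method corresponds to a single training/test split rather than a loop over folds.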
Parallel computing makes it possible to exploit the computing power of many processors simultaneously to solve difficult problems. Parallel versions of the data mining techniques discussed in the book are presented. At the time we were preparing the book, we found no data mining applications in agriculture that exploited the parallel computing paradigm. However, we believed it was important to devote a chapter of the book to parallel computing and data mining, in which some possible future applications in agriculture are pointed out.
One of the two appendices of the book is devoted to an application written in the C programming language. The implemented data mining technique is one of the variants of the k-means clustering algorithm. All details of the application are provided: the code of every C function is presented and commented line by line in the text. The aim of this C application is to give the reader a complete example of how a data mining technique can be implemented. The other appendix contains a brief description of the MATLAB environment, to which the reader can refer for the MATLAB code presented throughout the book.
At the end of many chapters, exercises related to the topic discussed in the chapter are presented. All solutions are given in the last chapter of the book.