Abstract: Regression analysis has long been a theoretical pillar of supervised machine learning, as it applies to a wide range of identification and classification problems. Two major approaches have been adopted in the pursuit of robust regressors. The first category comprises regularization techniques, whose principle lies in incorporating both an error term and a penalty term into the cost function. It is exemplified by the ridge regressor; other prominent examples include the RBF approximation networks of Poggio and Girosi and the Least-Squares SVM of Suykens and Vandewalle.
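As a minimal sketch of the regularization category, the ridge regressor adds an L2 penalty to the squared-error cost, which yields the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The example below is illustrative only; the function name and toy data are not from the paper.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression: minimize ||Xw - y||^2 + lam*||w||^2."""
    n_features = X.shape[1]
    # Normal equations with the L2 penalty added to the Gram matrix.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Toy usage: recover the weights of a lightly noisy linear model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=200)
w = ridge_fit(X, y, lam=0.1)
```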
The second category is based on the premise that a regressor's robustness can be enhanced by explicitly accounting for measurement errors in the independent variables. Such formulations are known in statistics as errors-in-variables models and are relatively new to the machine learning community. Building on these models, we have developed an approach named the perturbation-regularized (PR) regressor. (1) It yields a desirable smoothing effect on the regression result. (2) It enhances the robustness of classification results. (3) It facilitates the identification and removal of outliers from the training dataset (a notion closely related to PPDA).
A regressor would certainly yield a better estimate if the original input were directly available; additional estimation error inevitably arises because, under the errors-in-variables model, the input information is only indirectly available. Our PR regression analysis is founded on an effective decoupling between the uses of direct and indirect information. Our main result is a "Two-Projection Theorem", which facilitates the error analysis by dividing the estimation into two stages. More precisely, the first projection reveals the effect of output noise and model-induced error (caused by under-represented regressors). The second projection then leads to a tradeoff analysis between order and error, which guides our choice of a practical order for the kernel regressor (under the Gaussian assumption). By exploiting the orthogonality of Hermite polynomials, the regressor may be expressed as a linear combination of many simple Hermite estimators, each focusing on one (and only one) orthogonal polynomial.
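To illustrate the decomposition into independent Hermite estimators, note that under a standard-Gaussian input the probabilists' Hermite polynomials Heₖ satisfy E[Heⱼ(X)Heₖ(X)] = k!·δⱼₖ, so each coefficient can be estimated separately as cₖ ≈ mean(f(x)·Heₖ(x))/k!. This sketch is an assumption-laden illustration of that orthogonality property, not the paper's PR estimator itself.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def hermite_coeffs(x, fx, order):
    """Estimate coefficients of f in the He_k basis from Gaussian samples.

    Each coefficient is a separate 'simple estimator': orthogonality means
    c_k = E[f(X) He_k(X)] / k!, with no coupling between different k.
    """
    coeffs = []
    for k in range(order + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0                      # select He_k alone
        hk = hermeval(x, basis)             # evaluate He_k at the samples
        coeffs.append(np.mean(fx * hk) / math.factorial(k))
    return np.array(coeffs)

# Example: x^2 = He_0(x) + He_2(x), so the coefficients should be ~[1, 0, 1].
rng = np.random.default_rng(1)
x = rng.normal(size=200_000)
c = hermite_coeffs(x, x**2, order=2)
```

Because the basis is orthogonal, truncating to a lower order simply drops terms without re-fitting the rest, which is what makes the order/error tradeoff analysis tractable.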
Ultimately, the two-projection analysis leads to closed-form answers to two frequently asked questions: "What is the error for a given regressor order?" and "What order should be adopted to achieve a specified error?" Based on simulations on synthetic data (nonlinear inverse system identification), the performances of the ridge and PR regressors are compared, and several examples of the order/error tradeoff are highlighted. The issues raised by outliers also prompt a PPDA classifier, which enhances inference accuracy by removing "anti-support" training vectors. The effectiveness of the proposed methods is demonstrated through simulations on the MIT-BIH ECG dataset.