Predicting Credit Default with the C5.0 Decision Tree Algorithm

There are many situations in which clients need to predict a categorical (non-numerical) outcome. For instance, a bank might want to predict whether a loan will default, based on various attributes of the borrower. Another example is a mail filter that predicts whether an email is regular mail or spam. Or a doctor might want to diagnose a specific disease from a number of symptoms. For problems like these we can use a C5.0 decision tree, a popular classification algorithm originally developed by computer scientist J. Ross Quinlan. Other machine learning techniques, such as topological analysis or neural networks, can perform better in many cases, but the C5.0 algorithm is accurate, practical, easy to interpret and applicable to a wide variety of problems.
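A decision tree such as C5.0 is grown by repeatedly choosing the attribute that best separates the classes. The sketch below, written in Python as an assumption (the analysis itself shows no code), computes the entropy-based gain ratio that Quinlan's C4.5 uses to rank candidate splits and that C5.0 is generally described as building on. The function names `entropy` and `gain_ratio` and the toy borrower records are hypothetical illustrations, not part of any C5.0 library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` into one branch per value of `attribute`."""
    n = len(rows)
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute], []).append(row[target])
    # Information gain: parent entropy minus size-weighted entropy of the branches.
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information: entropy of the attribute values themselves, which
    # penalises attributes that fragment the data into many small branches.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical borrower records, invented for illustration only.
loans = [
    {"job": "Office", "default": 0},
    {"job": "Office", "default": 0},
    {"job": "Sales", "default": 1},
    {"job": "Sales", "default": 1},
    {"job": "Self", "default": 0},
    {"job": "Self", "default": 1},
]
print(round(gain_ratio(loans, "job", "default"), 3))  # 0.421
```

Dividing by the split information is what distinguishes gain ratio from plain information gain: without it, an attribute with a unique value per borrower (such as an ID) would look like a perfect split.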

In the following analysis, I will apply the C5.0 algorithm to predict whether a loan defaults. To do so, I will use a dataset of defaulted and non-defaulted home equity loans provided by Credit Risk Analytics. The dataset contains the target variable “default/non-default” and the predictors “credit amount”, “property”, “amount due on existing mortgage”, “job”, “years at present job”, “number of derogatory reports”, “number of delinquent credit lines”, “oldest credit line (in months)”, “number of credit inquiries”, “number of credit lines” and “debt-to-income ratio”.
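To see how such splits combine into a tree for loan data like this, here is a minimal, simplified sketch in Python. It assumes numeric predictors are split at a single threshold (the midpoint between adjacent observed values, a simplification), grows the tree greedily by gain ratio, and omits pruning, boosting and missing-value handling, all of which the real C5.0 implementation provides. The eight records are invented and are not taken from the Credit Risk Analytics dataset; the field names `delinq`, `debtinc` and `default` are hypothetical stand-ins for "number of delinquent credit lines", "debt-to-income ratio" and "default/non-default".

```python
import math
from collections import Counter

# Invented toy records (NOT from the Credit Risk Analytics dataset).
LOANS = [
    {"delinq": 0, "debtinc": 25.0, "default": 0},
    {"delinq": 0, "debtinc": 30.0, "default": 0},
    {"delinq": 1, "debtinc": 28.0, "default": 0},
    {"delinq": 0, "debtinc": 45.0, "default": 1},
    {"delinq": 2, "debtinc": 33.0, "default": 1},
    {"delinq": 3, "debtinc": 40.0, "default": 1},
    {"delinq": 1, "debtinc": 42.0, "default": 1},
    {"delinq": 0, "debtinc": 20.0, "default": 0},
]
FEATURES = ["delinq", "debtinc"]
TARGET = "default"


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows):
    """Return (gain ratio, feature, threshold) of the best binary split."""
    base = entropy([r[TARGET] for r in rows])
    n = len(rows)
    best = (0.0, None, None)
    for f in FEATURES:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [r[TARGET] for r in rows if r[f] <= t]
            right = [r[TARGET] for r in rows if r[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            # Split information: entropy of the branch sizes (both are non-empty).
            split_info = entropy(["L"] * len(left) + ["R"] * len(right))
            ratio = (base - remainder) / split_info
            if ratio > best[0]:
                best = (ratio, f, t)
    return best


def build_tree(rows, depth=0, max_depth=3):
    """Grow an unpruned tree; each leaf stores the majority class of its rows."""
    labels = [r[TARGET] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return {"leaf": majority}
    _, f, t = best_split(rows)
    if f is None:  # no split with a positive gain ratio
        return {"leaf": majority}
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[f] <= t], depth + 1, max_depth),
        "right": build_tree([r for r in rows if r[f] > t], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf and return its class."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


tree = build_tree(LOANS)
print(tree["feature"], tree["threshold"])  # debtinc 31.5
print(predict(tree, {"delinq": 0, "debtinc": 22.0}))  # 0
print(predict(tree, {"delinq": 1, "debtinc": 50.0}))  # 1
```

On these toy records the root split lands on the debt-to-income ratio, because a threshold of 31.5 separates the defaulted from the non-defaulted loans perfectly; on real data the tree would be deeper and would need pruning to avoid overfitting.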

Advanced Data Systems Analysis (ADSA)

CM is collaborating with an international team of data scientists to develop and implement Advanced Data Systems Analysis (ADSA), an algorithm for ranking and clustering multi-dimensional (multi-sectoral) items. ADSA offers a range of advantages over other tools currently on the market and can be delivered in several forms: as consultancy and research papers, as a cloud-based software application, or as an algorithm integrated into a company’s own data analysis department or automated industrial systems.

Areas of application include financial risk analysis, pricing optimization, product positioning, credit scoring, political risk analysis and emotion scoring. Automated systems, such as manufacturing and production processes, can also be equipped with the ranking and clustering algorithm.