Credit Scoring, Artificial Intelligence and Quantum Machine Learning
COURSE OBJECTIVE
This rigorous course is designed to impart skills necessary for creating and calibrating credit scoring models, including calculating default probabilities (PD) and validating these models. Participants will explore a range of machine learning approaches from traditional methods to quantum and probabilistic techniques, and learn how to leverage artificial intelligence for automating these processes.
Participants will gain proficiency in both conventional and cutting-edge models for credit scoring during the stages of credit admission and monitoring. This includes handling vast datasets to construct comprehensive credit and behavior scoring systems.
The course also delves into advanced data analytics, covering topics like sampling, exploratory analysis, feature engineering, segmentation, and outlier detection.
A variety of machine learning techniques will be discussed—ranging from supervised and unsupervised learning to reinforcement learning—specifically applied to developing tools for credit scoring. Well-established methods like logistic regression and other innovative machine learning techniques such as decision trees, naive Bayes, K-nearest neighbors, LASSO logistic regression, random forests, neural networks, Bayesian networks, Support Vector Machines, and gradient boosting trees will be explored.
The course covers deep learning extensively for building robust credit scoring models suitable for banking applications. This includes neural network architectures such as feedforward, convolutional, recurrent, and generative adversarial networks, alongside Fermac Risk’s proprietary methodology for managing and interpreting deep learning models so that they do not become black boxes.
Instruction on tuning hyperparameters, which are crucial for controlling the learning process and optimizing model performance, will be provided along with techniques like grid search, random search, and Bayesian optimization.
The course provides over 20 distinct credit scoring models built with different methodologies in environments such as R, Python, JupyterLab, TensorFlow, and SAS. The models cover origination, behavior, recovery, income, and churn.
Advanced techniques for calibrating PD under the IRB approach and IFRS 9 are included, ranging from central-tendency adjustment to deep learning models for lifetime PD calibration under IFRS 9.
The curriculum introduces automated machine learning (AutoML), enhancing the ability of risk analysts to develop, scale, and validate high-quality machine learning models efficiently.
Participants will also explore probabilistic machine learning techniques, like Bayesian neural networks, to construct credit scoring models, alongside best practices for model validation, particularly focusing on AI-driven financial tools as per European regulatory standards.
Finally, the course highlights the emerging field of Quantum Machine Learning, discussing its potential to revolutionize financial services through enhanced computational speeds and capabilities using quantum algorithms.
This comprehensive program aims to equip participants with the skills to utilize advanced computing technologies, including quantum and tensor networks, for machine learning calculations, preparing them for significant advancements in the financial sector.
WHO SHOULD ATTEND?
The course is aimed at professionals from financial institutions interested in developing powerful credit scoring models and calibrating their output, as well as model managers in credit risk and data science departments.
To follow the topics, participants need a working knowledge of statistics and mathematics. No knowledge of quantum physics is required to benefit from quantum computing technologies.
AGENDA
CREDIT SCORING
Module 0: Quantum Computing and Algorithms
-
Future of quantum computing in banking
-
Is it necessary to know quantum mechanics?
-
QIS Hardware and Apps
-
quantum operations
-
Qubit representation
-
Measurement
-
Superposition
-
matrix multiplication
-
Qubit operations
-
Multiple Quantum Circuits
-
Entanglement
-
Deutsch Algorithm
-
Quantum Fourier transform and search algorithms
-
Hybrid quantum-classical algorithms
-
Quantum annealing, simulation and optimization of algorithms
-
Quantum machine learning algorithms
-
Exercise 1: Quantum operations
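As a minimal illustration of the qubit representation, matrix multiplication and measurement topics in this module, the sketch below applies a Hadamard gate to the |0> state using plain Python. The helper names (`apply_gate`, `measurement_probabilities`) are hypothetical and not taken from the course material or from any quantum SDK.

```python
import math

# A single-qubit state is a pair of amplitudes for |0> and |1>.
ZERO = (1.0, 0.0)

# Hadamard gate written as a 2x2 real matrix.
S = 1 / math.sqrt(2)
H = ((S, S),
     (S, -S))

def apply_gate(gate, state):
    """Apply a 2x2 gate to a single-qubit state (matrix-vector product)."""
    return tuple(sum(gate[r][c] * state[c] for c in range(2)) for r in range(2))

def measurement_probabilities(state):
    """Born rule: each outcome has probability |amplitude|^2."""
    return tuple(abs(a) ** 2 for a in state)

plus = apply_gate(H, ZERO)                 # equal superposition of |0> and |1>
probs = measurement_probabilities(plus)    # both outcomes equally likely
back = apply_gate(H, plus)                 # applying H twice returns |0>
```

Applying the gate twice recovers the starting state because the Hadamard matrix is its own inverse, which is a quick sanity check for any gate implementation.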
Module 1: Artificial Intelligence for Credit Scoring
-
Big Data Definition
-
Big Data in financial institutions and fintech
-
Big data in Bigtech
-
Data typology
-
structured
-
semi-structured
-
Unstructured Data
-
-
Big data: Volume, Velocity, Variety, Veracity and Value
-
Big Data Size
-
Big data sources
-
transactional data
-
Social media data
-
Credit bureau data
-
Origin of data sources
-
Website data
-
Text Data
-
sensor data
-
RFID and NFC data
-
Data from telecom operators
-
Smart grid data
-
-
banking digitization
-
financial inclusion
-
Regulation in Europe, USA and Latin America
-
Artificial intelligence in banking
-
Artificial intelligence in the credit cycle
Module 2: AI in Credit Scoring
-
AI in Credit Scoring for Banking and Fintech
-
Offline and online credit scoring
-
Design and Construction of Credit Scoring Models
-
Advantages and disadvantages
-
Models to face new financial crises
-
Machine Learning to develop and validate credit scoring
-
Importance of the Bureau Score
-
Credit Scorecard Management
-
Default Probability Estimation PD
Module 3: Machine Learning
-
Definition of Machine Learning
-
Machine Learning Methodology
-
Data Storage
-
Abstraction
-
Generalization
-
Assessment
-
-
Supervised Learning
-
Unsupervised Learning
-
Reinforcement Learning
-
deep learning
-
Typology of Machine Learning Algorithms
-
Steps to Implement an Algorithm
-
information collection
-
Exploratory Analysis
-
Model Training
-
Model Evaluation
-
Model improvements
-
Machine Learning in Credit Scoring Models
-
Quantum Machine Learning
-
Exploratory Data Analysis (EDA) and Feature Engineering
Module 4: Exploratory Data Analysis
-
Data typology
-
transactional data
-
Unstructured data embedded in text documents
-
Social Media Data
-
data sources
-
Data review
-
Target definition
-
Time horizon of the target variable
-
Sampling
-
Random Sampling
-
Stratified Sampling
-
Rebalanced Sampling
-
-
Exploratory Analysis:
-
histograms
-
Q-Q plot
-
Moment analysis
-
boxplot
-
-
Treatment of Missing values
-
Multivariate Imputation Model
-
Advanced Outlier detection and treatment techniques
-
Univariate techniques: winsorizing and trimming
-
Multivariate Technique: Mahalanobis Distance
-
Module 5: Feature Engineering
-
Feature Engineering
-
Data Standardization
-
Variable categorization
-
Equal Interval Binning
-
Equal Frequency Binning
-
Chi-Square Test
-
-
binary coding
-
WOE Coding
-
WOE Definition
-
Univariate Analysis with Target Variable
-
Variable Selection
-
Treatment of Continuous Variables
-
Treatment of Categorical Variables
-
Using Gini
-
Information Value
-
Optimization of continuous variables
-
Optimization of categorical variables
-
-
Exercise 1: Exploratory Analysis in R
-
Exercise 2: Detection and Treatment of Advanced Outliers
-
Exercise 3: Stratified and Random Sampling in R
-
Exercise 4: Multivariate imputation model
-
Exercise 5: Univariate analysis in percentiles in R
-
Exercise 6: Continuous variable optimal univariate analysis in Excel
-
Exercise 7: Estimation of the KS, Gini, and IV of each variable in Excel
-
Exercise 8: Word Cloud analysis of variables in R
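The WOE coding and Information Value topics in this module can be sketched as follows. This is a minimal example, assuming the common definitions WOE = ln(share of goods / share of bads) per bin and IV = sum of (share of goods - share of bads) x WOE; the bin counts are made up for illustration, and the sketch assumes no bin has zero goods or zero bads.

```python
import math

def woe_iv(bins):
    """bins: list of (goods, bads) counts per bin.

    Returns the list of WOE values and the Information Value.
    Assumes every bin has at least one good and one bad.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for goods, bads in bins:
        dist_good = goods / total_good   # share of all goods in this bin
        dist_bad = bads / total_bad      # share of all bads in this bin
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv

# Hypothetical counts for three bins of one variable.
bins = [(100, 50), (300, 30), (600, 20)]
woes, iv = woe_iv(bins)
```

A negative WOE marks a bin with a higher-than-average concentration of bads, a positive WOE marks the opposite, and a WOE of zero means the bin carries no information about the target.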
MACHINE LEARNING
Unsupervised Learning
Module 6: Unsupervised models
-
Hierarchical Clusters
-
K Means
-
standard algorithm
-
Euclidean distance
-
Principal Component Analysis (PCA)
-
Advanced PCA Visualization
-
Eigenvectors and Eigenvalues
-
Exercise 14: Principal components in R and SAS
-
Exercise 15: Segmentation of the data with K-Means in R
Supervised Learning
Module 7: Logistic Regression and LASSO Regression
-
Econometric Models
-
Logit regression
-
probit regression
-
Piecewise Regression
-
survival models
-
-
Machine Learning Models
-
Lasso Regression
-
Ridge Regression
-
-
Model Risk in Logistic Regression
-
Exercise 16: Credit Scoring Logistic Regression in SAS and R
-
Exercise 17: Credit Scoring Lasso Logistic Regression in R
-
Exercise 18: Model Risk Using Confidence Intervals of Logistic Regression Coefficients
Module 8: Trees, KNN and Naive Bayes
-
Decision Trees
-
modeling
-
Advantages and disadvantages
-
Recursion and Partitioning Processes
-
Recursive partitioning tree
-
Pruning Decision tree
-
Conditional inference tree
-
tree display
-
Measurement of decision tree prediction
-
CHAID model
-
Model C5.0
-
-
K-Nearest Neighbors KNN
-
modeling
-
Advantages and disadvantages
-
Euclidean distance
-
Manhattan distance
-
K value selection
-
-
Probabilistic Model: Naive Bayes
-
naive bayes
-
Bayes' theorem
-
Laplace estimator
-
Classification with Naive Bayes
-
Advantages and disadvantages
-
-
Exercise 19: Credit Scoring Decision Tree in SAS and R
-
Exercise 20: Credit Scoring KNN in R and SAS
-
Exercise 21: Credit Scoring Naive Bayes in R
Module 9: Support Vector Machine SVM
-
SVM with dummy variables
-
SVM
-
optimal hyperplane
-
Support Vectors
-
add costs
-
Advantages and disadvantages
-
SVM visualization
-
Tuning SVM
-
kernel trick
-
Exercise 22: Credit Scoring Support Vector Machine in R data 1
-
Exercise 23: Credit Scoring Support Vector Machine in Python data 2
Module 10: Ensemble Learning
-
Ensemble models
-
bagging
-
bagging trees
-
Random Forest
-
Boosting
-
AdaBoost
-
Gradient Boosting Trees
-
Advantages and disadvantages
-
Exercise 24: Credit Scoring Boosting in R
-
Exercise 25: Credit Scoring Bagging in R
-
Exercise 26: Credit Scoring Random Forest, R and Python, data 1 and 2
-
Exercise 27: Credit Scoring Gradient Boosting Trees
MODEL VALIDATION
Module 11: Validation of traditional and Machine Learning models
-
Model validation
-
Validation of machine learning models
-
Regulatory validation of machine learning models in Europe
-
Out of Sample and Out of time validation
-
Checking p-values in regressions
-
R squared, MSE, MAD
-
Residual diagnostics
-
Goodness of Fit Test
-
multicollinearity
-
Binary case confusion matrix
-
Multinomial case confusion matrix
-
Main discriminant power tests
-
confidence intervals
-
Jackknifing with discriminant power test
-
Bootstrapping with discriminant power test
-
Kappa statistic
-
K-Fold Cross Validation
-
Exercise 28: Logistic Regression Goodness-of-Fit Test
-
Exercise 29: Cross validation in SAS
-
Exercise 30: Gini Estimation, Information Value, Brier Score, Lift Curve, CAP, ROC, Divergence in SAS and Excel
-
Exercise 31: Bootstrapping of SAS parameters
-
Exercise 32: Jackknifing in SAS
-
Exercise 33: Gini/ROC Bootstrapping in SAS
-
Exercise 34: Kappa estimation
-
Exercise 35: K-Fold Cross Validation in R
-
Exercise 36: Traffic light out-of-time validation (6-year horizon) of logistic regression and machine learning models
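The discriminant power tests listed in this module (ROC, Gini, K-S) can be illustrated with a short standard-library sketch. It assumes that a higher score means higher risk and that labels are 1 for default and 0 otherwise; the data are invented, and the quadratic pairwise loop is for clarity, not for large portfolios.

```python
def auc(scores, labels):
    """Probability that a random defaulter scores higher than a random
    non-defaulter; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    """Gini coefficient (accuracy ratio) derived from the AUC."""
    return 2 * auc(scores, labels) - 1

def ks_statistic(scores, labels):
    """Largest gap between the cumulative score distributions of
    defaulters and non-defaulters."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best

# Hypothetical risk scores and default flags.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
model_auc = auc(scores, labels)
model_gini = gini(scores, labels)
model_ks = ks_statistic(scores, labels)
```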
Module 12: Stability Testing
-
Model stability index
-
Factor stability index
-
Chi-square test
-
K-S test
-
Exercise 37: Stability tests of models and factors
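A stability index of the kind covered in this module is often computed as a population stability index (PSI). The sketch below assumes the usual formula, PSI = sum over bins of (actual share - expected share) x ln(actual share / expected share), with hypothetical bin counts and no empty bins.

```python
import math

def psi(expected_counts, actual_counts):
    """Population stability index between a reference (expected) and a
    current (actual) distribution over the same bins.
    Assumes no bin is empty in either distribution."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = e / e_total
        a_share = a / a_total
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value

# Hypothetical score-band counts at development time and today.
stable = psi([100, 200, 300], [100, 200, 300])
shifted = psi([50, 50], [25, 75])
```

The same function can be applied to the bins of a single input variable, which gives a factor-level stability check instead of a model-level one.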
DEEP LEARNING
Module 14: Introduction to Deep Learning
-
Definition and concept of deep learning
-
Why use deep learning now?
-
Artificial neural networks
-
Neural network architectures
-
activation function
-
sigmoidal
-
Rectified linear unit
-
Hyperbolic tangent (tanh)
-
Softmax
-
-
feedforward network
-
Multilayer Perceptron
-
Using Tensorflow
-
Using Tensorboard
-
R deep learning
-
Python deep learning
-
Convolutional Neural Networks
-
Use of deep learning in image classification
-
cost function
-
Gradient descent optimization
-
Using deep learning for credit scoring
-
How many hidden layers?
-
How many neurons, 100, 1000?
-
How many epochs and what batch size?
-
What is the best activation function?
-
-
Deep Learning Software: Caffe, H2O, Keras, Microsoft, Matlab, etc.
-
Deployment software: Nvidia and CUDA
-
Hardware, CPU, GPU and cloud environments
-
Advantages and disadvantages of deep learning
Module 15: Deep Learning Feed Forward Neural Networks
-
Single Layer Perceptron
-
Multiple Layer Perceptron
-
Neural network architectures
-
activation function
-
sigmoidal
-
Rectified linear unit (ReLU)
-
ELU
-
SELU
-
Hyperbolic tangent (tanh)
-
Softmax
-
other
-
-
Back propagation
-
Directional derivatives
-
gradients
-
Jacobians
-
Chain rule
-
Optimization and local and global minima
-
-
Exercise 38: Credit Scoring using Deep Learning Feed Forward
Module 16: Deep Learning Convolutional Neural Networks CNN
-
CNN for images
-
Design and architectures
-
convolution operation
-
Gradient descent
-
filters
-
Stride
-
padding
-
Subsampling
-
pooling
-
fully connected
-
Credit Scoring using CNN
-
Recent CNN studies applied to credit risk and scoring
-
Exercise 39: Credit scoring using deep learning CNN
Module 17: Deep Learning Recurrent Neural Networks RNN
-
Natural Language Processing
-
Natural Language Processing (NLP) text classification
-
Long Short-Term Memory (LSTM)
-
Hopfield networks
-
Bidirectional associative memory
-
Gradient descent
-
Global optimization methods
-
RNN and LSTM for credit scoring
-
Unidirectional and bidirectional models
-
Deep Bidirectional Transformers for Language Understanding
-
Exercise 40: Credit Scoring using Deep Learning LSTM
Module 18: Generative Adversarial Networks (GANs)
-
Generative Adversarial Networks (GANs)
-
Fundamental components of the GANs
-
GAN architectures
-
Bidirectional GAN
-
Training generative models
-
Synthetic Data
-
Credit Scoring using GANs
-
Exercise 41: Credit Scoring using GANs
Module 19: Calibrating Machine Learning and Deep Learning
-
Hyperparameter tuning
-
grid search
-
random search
-
Bayesian Optimization
-
Train test split ratio
-
Learning rate in optimization algorithms (e.g. gradient descent)
-
Selection of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or Adam optimizer)
-
Activation function selection in a neural network (NN) layer (e.g., Sigmoid, ReLU, Tanh)
-
Selection of loss, cost and custom function
-
Number of hidden layers in an NN
-
Number of activation units in each layer
-
Dropout rate in an NN (dropout probability)
-
Number of iterations (epochs) in training an NN
-
Number of clusters in a clustering task
-
Kernel or filter size in convolutional layers
-
pooling size
-
batch size
-
Exercise 42: Optimized Credit Scoring with XGBoost, Random Forest and SVM
-
Exercise 43: Optimized Credit Scoring Deep Learning
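Grid search and random search, listed at the start of this module, differ only in how candidate hyperparameter values are generated. The sketch below shows both on a toy problem. The function `validation_loss` is a hypothetical stand-in for training a model and scoring it on a validation set, and the parameter names and ranges are assumptions chosen for illustration.

```python
import itertools
import random

def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for 'train a model, score it on validation data'.
    Its minimum is at learning_rate=0.1, depth=4."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 4) ** 2

def grid_search(grid):
    """Evaluate every combination of the listed values; keep the best."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

def random_search(n_trials, seed=0):
    """Sample each hyperparameter independently; keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
            "depth": rng.randint(1, 8),                 # integer in [1, 8]
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

grid_best = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 6]})
random_best = random_search(50)
```

Grid search cost grows multiplicatively with each added hyperparameter, while random search keeps a fixed budget of trials, which is why it is often preferred when only a few hyperparameters really matter.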
Module 20: Traditional Scorecard Construction
-
scoring assignment
-
Scorecard Classification
-
Scorecard WOE
-
Binary Scorecard
-
Continuous Scorecard
-
-
Scorecard Rescaling
-
Factor and Offset Analysis
-
Scorecard WOE
-
Binary Scorecard
-
-
Reject Inference Techniques
-
cut-off
-
parceling
-
Fuzzy Augmentation
-
Machine Learning
-
-
Advanced Cut Point Techniques
-
Cut-off optimization using ROC curves
-
-
Exercise 44: Building Scorecard in Excel, R and Python
-
Exercise 45: Optimum cut-off point estimation in Excel and model risk by cut-off point selection
-
Exercise 46: Confusion matrix to verify Type 1 and Type 2 Error in Excel with and without variables
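Scorecard rescaling with a factor and an offset is commonly done with the "points to double the odds" (PDO) convention: score = offset + factor x ln(odds), with factor = PDO / ln 2. The sketch below assumes that convention and good-to-bad odds; the anchor values (600 points at odds of 50:1, PDO of 20) are hypothetical.

```python
import math

def scaling(base_score, base_odds, pdo):
    """Return (factor, offset) so that base_odds maps to base_score and
    doubling the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_odds(odds, factor, offset):
    """Map good-to-bad odds to scorecard points."""
    return offset + factor * math.log(odds)

# Hypothetical anchor: 600 points at odds 50:1, 20 points to double the odds.
factor, offset = scaling(base_score=600, base_odds=50, pdo=20)
score_at_anchor = score_from_odds(50, factor, offset)
score_at_double = score_from_odds(100, factor, offset)
```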
QUANTUM MACHINE LEARNING
Module 21: Quantum Credit Scoring
-
What is quantum machine learning?
-
Qubit and Quantum States
-
Quantum machine learning algorithms
-
quantum circuits
-
Quantum K-means
-
Support Vector Machine
-
Quantum Support Vector Machine
-
Variational quantum classifier
-
Training quantum machine learning models
-
Quantum Neural Networks
-
Quantum GAN
-
Quantum Boltzmann machines
-
Quantum machine learning in Credit Risk
-
Quantum machine learning in credit scoring
-
quantum software
-
Exercise 47: Quantum K-means
-
Exercise 48: Quantum Support Vector Machine to develop credit scoring model
-
Exercise 49: Quantum feed forward Neural Networks to develop a credit scoring model
-
Exercise 50: Quantum Convolutional Neural Networks to develop a credit scoring model
Module 22: Tensor Networks for Quantum Machine Learning
-
What are tensor networks?
-
Quantum Entanglement
-
Tensor networks in machine learning
-
Tensor networks in unsupervised models
-
Tensor networks in SVM
-
Tensor networks in NN
-
Tensorization of NNs
-
Application of tensor networks in credit scoring models
-
Exercise 51: Construction of credit scoring using tensor networks
PROBABILISTIC MACHINE LEARNING
Module 23: Probabilistic Machine Learning
-
Introduction to probabilistic machine learning
-
Gaussian models
-
Bayesian Statistics
-
Bayesian logistic regression
-
Kernel family
-
Gaussian processes
-
Gaussian processes for regression
-
-
Hidden Markov Model
-
Markov chain Monte Carlo (MCMC)
-
Metropolis Hastings algorithm
-
-
Machine Learning Probabilistic Model
-
Bayesian Boosting
-
Bayesian Neural Networks
-
Exercise 52: Gaussian process for regression
-
Exercise 53: Credit scoring model using Bayesian Neural Networks
MODEL RISK
Module 24: Model Risk in Credit Scoring
-
Model Risk
-
Model risk in deep learning
-
Model risk in credit scoring
-
black boxes
-
cut-off decision
-
absence of data
-
Model Risk for not updating or recalibrating
-
Ethical concepts of credit scoring
-
Exercise 54: Model risk in credit scoring due to not recalibrating on time
CREDIT SCORING MODELS
Module 25: Credit Scoring Models by Product
-
Origination Credit Scoring
-
Credit Card Score
-
Mortgage Score
-
Consumer loan score
-
Car Score
-
-
Behavior Score (BS)
-
Temporal horizon
-
Panel data information
-
Panel data regression
-
Cox regression
-
Behavior Score with macroeconomic variables
-
transition matrices
-
Behavior Score with transition matrices
-
Transaction Score
-
Machine Learning Models
-
BEHAVIOR SCORE ON CREDIT CARDS
-
-
Exercise 55: Behavior Score Logistic Regression in Python data 2
-
Exercise 56: Behavior Score Support Vector Machines in Python
-
Exercise 57: Behavior Score Random Forest in Python
-
Exercise 58: Behavior Score Gradient Boosting Trees in Python
-
Exercise 59: Behavior Score Deep Learning LSTM in Python
Module 26: Typology of Scoring models
-
Response Score
-
Income score
-
Churn Score
-
Origination Fraud Score
-
Behavior Fraud Score
-
Collection Score
-
Recovery Score
-
Big Data Scoring
-
Exercise 60: Fraud Score with neural networks
-
Exercise 61: Income Score
-
Exercise 62: Collection Score
-
Exercise 63: Recovery Score
-
Exercise 64: Churn Score
CALIBRATION OF PD MODELS
Module 27: Calibration of the Probability of Default PD IRB
-
PD estimation
-
econometric models
-
Machine Learning Models
-
Data requirement
-
Risk drivers and credit scoring criteria
-
Rating philosophy
-
Pool Treatment
-
-
PD Calibration
-
Default Definition
-
Long run average for PD
-
Technical defaults and technical default filters
-
Data requirement
-
One Year Default Rate Calculation
-
Long-Term Default Rate Calculation
-
-
PD Model Risk
-
Conservatism Margin
-
-
PD Calibration Techniques
-
Anchor Point Estimate
-
Mapping from Score to PD
-
Adjustment to the PD Economic Cycle
-
Rating Philosophy
-
-
PD Through-the-Cycle (PD TTC) models
-
PD Point-in-Time (PD PIT) models
-
PD Calibration of Models Using Machine and Deep Learning
-
Exercise 65: PD Calibration in Machine Learning Models
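One simple way to picture a calibration step is to shift a model's PDs from the default rate of the development sample to a long-run central tendency. The sketch below uses a standard Bayes prior-shift formula for that purpose. It is one possible technique, not necessarily the one taught in this module, and the rates used are hypothetical.

```python
def adjust_to_central_tendency(pd_model, sample_rate, central_tendency):
    """Rescale a model PD from the sample default rate to a target
    long-run default rate using a Bayes prior shift.

    pd_model: PD produced by the scoring model on the development sample
    sample_rate: observed default rate in the development sample
    central_tendency: target long-run average default rate
    """
    num = pd_model * central_tendency / sample_rate
    den = num + (1 - pd_model) * (1 - central_tendency) / (1 - sample_rate)
    return num / den

# Hypothetical: the sample default rate was 10%, the long-run average is 3%.
pd_average_obligor = adjust_to_central_tendency(0.10, 0.10, 0.03)
pd_unchanged = adjust_to_central_tendency(0.20, 0.10, 0.10)
pd_low = adjust_to_central_tendency(0.02, 0.10, 0.03)
pd_high = adjust_to_central_tendency(0.40, 0.10, 0.03)
```

The adjustment preserves the rank ordering of obligors: it moves the overall level of the PDs while leaving the discriminatory power of the score untouched.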
Module 28: Machine Learning models to estimate Lifetime PD under IFRS 9
-
Credit scoring models to estimate Lifetime PD
-
PD Lifetime in IFRS 9
-
Impact of COVID-19 on models
-
Climate Risk Impact
-
Inflation impact
-
Impact of rising prices
-
Regression Models
-
Logistic regression
-
Logistic Multinomial Regression
-
Ordinal Probit Regression
-
-
VAR and VEC models
-
Machine Learning Model
-
SVM: Kernel Function Definition
-
Neural Network: definition of hyperparameters and activation function
-
deep learning
-
LSTM
-
-
PD Calibration of Models Using Machine and Deep Learning
-
Exercise 66: PD Lifetime using logistic regression
-
Exercise 67: PD Lifetime using multinomial regression in R
-
Exercise 68: PD Lifetime using SVM in Python
-
Exercise 69: PD Lifetime using Deep Learning in Python
-
Exercise 70: PD Lifetime using Deep Learning LSTM in Python
VALIDATION OF PD MODELS
Module 29: Validation of PD models
-
Definition of PD Backtesting
-
PD Calibration Validation
-
normal test
-
Binomial Test
-
Traffic Light Approach
-
-
Traffic Light Analysis and PD Dashboard
-
PD Stability Test
-
Forecast PD vs. actual PD over time
-
When should we recalibrate or reestimate a credit scoring model?
-
Re-development
-
Re-estimation
-
Model Risk in PD
-
Machine Learning to validate PD models
-
Artificial Intelligence to recalibrate and rebuild models autonomously
-
Exercise 71: Backtesting PD in Excel
-
Exercise 72: Forecasting PD and actual PD in Excel
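The binomial test and traffic light approach listed in this module can be sketched with the standard library. The example computes the one-sided probability of observing at least the realized number of defaults if the estimated PD were correct, assuming independent defaults. The traffic light thresholds (5% and 1%) are illustrative assumptions, not regulatory values.

```python
import math

def binomial_p_value(n_obligors, n_defaults, pd_estimate):
    """P(X >= n_defaults) for X ~ Binomial(n_obligors, pd_estimate).
    A small value suggests the PD estimate is too low."""
    return sum(
        math.comb(n_obligors, k)
        * pd_estimate ** k
        * (1 - pd_estimate) ** (n_obligors - k)
        for k in range(n_defaults, n_obligors + 1)
    )

def traffic_light(p_value, amber=0.05, red=0.01):
    """Map a p-value to a colour; thresholds are hypothetical."""
    if p_value < red:
        return "red"
    if p_value < amber:
        return "amber"
    return "green"

# Hypothetical rating grade: 1,000 obligors, PD estimate 2%, 35 observed defaults.
p_value = binomial_p_value(1000, 35, 0.02)
colour = traffic_light(p_value)
```

Because real defaults are correlated, the independence assumption tends to make this test too strict, which is one reason the colours are usually read as warning signals and not as pass or fail verdicts.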
AUTOMATION OF CREDIT SCORING AND PD WITH AI
Module 30: Automation of Credit Scoring and PD Modeling
-
What is modeling automation?
-
What is automated?
-
Automation of machine learning processes
-
Optimizers and Evaluators
-
Modeling Automation Workflow Components
-
Summary
-
Processing
-
Feature engineering
-
Model generation
-
Assessment
-
-
Hyperparameter optimization
-
Reconstruction or recalibration of credit scoring
-
Credit Scoring Modeling
-
Main milestones
-
Evaluation and optimization
-
Possible Issues
-
-
PD calibration modeling
-
Evaluation and optimization
-
backtesting
-
Discriminating Power
-
Stability Tests
-
-
Global evaluation of modeling automation
-
Implementation of modeling automation in banking
-
Technological requirements
-
available tools
-
Benefits and possible ROI estimation
-
Main Issues
-
Model Risk
-
Genetic algorithms
-
Exercise 73: Automation of modeling, hyperparameter optimization and validation for credit scoring
-
Exercise 74: Automation of PD modeling and validation