Risk Control Models: A Risk Early-Warning Model

A friend of mine recently interviewed for an algorithm role at a bank. The first round was a take-home test: the company sends out a problem set, the candidate completes it at home, and then walks through it in the interview. The market really does seem soft this year; it is rare for a senior algorithm role to include such a round. I looked at the data and found it quite interesting, so this post walks through the modeling pipeline and shows how WOE and logistic regression (LR) are combined in practice.

Why use WOE

  • Handling missing values: give nulls their own bin, so a variable whose effective coverage is only 30% can still be used
  • Handling outliers: give outliers their own bin to make the variable more robust. For example, age is entered by users and may contain values such as 200; these cases can be folded into the age>60 bin.
  • Business interpretability: the business is used to reading a variable linearly (as x grows, y grows), but x and y are often related non-linearly; the WOE transform bridges that gap
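As a toy illustration of how binning plus WOE works (the bin counts below are hypothetical, not from this dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical bin counts: good = non-fraud, bad = fraud per bin
df = pd.DataFrame({
    'bin': ['age<=30', '30<age<=60', 'age>60 (incl. outliers)', 'missing'],
    'good': [400, 450, 100, 50],
    'bad':  [20, 10, 5, 15],
})
df['perc_good'] = df['good'] / df['good'].sum()
df['perc_bad'] = df['bad'] / df['bad'].sum()
df['woe'] = np.log(df['perc_good'] / df['perc_bad'])
# each bin's contribution to the Information Value (IV) used for feature selection
df['iv'] = (df['perc_good'] - df['perc_bad']) * df['woe']
print(df[['bin', 'woe', 'iv']])
print('total IV:', df['iv'].sum())
```

A bin where the good and bad shares are equal gets WOE 0; bins where they diverge (including the 'missing' bin) get large absolute WOE, which is exactly how nulls and outliers become usable signal.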

Modeling pipeline

The pipeline:

Load data -> EDA -> Feature generation -> Model establishment -> Release online

EDA

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from scipy.stats import chi2_contingency
pd.set_option('display.max_columns', None)
sns.set_theme(style="darkgrid")
sns.set(rc = {'figure.figsize':(20,15)})
# `dataset` is assumed to be loaded earlier, e.g. dataset = pd.read_csv(...)
dataset.head()
accountNumber customerId creditLimit availableMoney transactionDateTime transactionAmount merchantName acqCountry merchantCountryCode posEntryMode posConditionCode merchantCategoryCode currentExpDate accountOpenDate dateOfLastAddressChange cardCVV enteredCVV cardLast4Digits transactionType echoBuffer currentBalance merchantCity merchantState merchantZip cardPresent posOnPremises recurringAuthInd expirationDateKeyInMatch isFraud transactionDate transactionHour transactionMonth
0 737265056 737265056 5000.0 5000.0 2016-08-13 14:27:32 98.55 Uber US US 02 01 rideshare 06/2023 2015-03-14 2015-03-14 414 414 1803 PURCHASE NaN 0.0 NaN NaN NaN False NaN NaN False False 2016-08-13 14 8
1 737265056 737265056 5000.0 5000.0 2016-10-11 05:05:54 74.51 AMC #191138 US US 09 01 entertainment 02/2024 2015-03-14 2015-03-14 486 486 767 PURCHASE NaN 0.0 NaN NaN NaN True NaN NaN False False 2016-10-11 5 10
2 737265056 737265056 5000.0 5000.0 2016-11-08 09:18:39 7.47 Play Store US US 09 01 mobileapps 08/2025 2015-03-14 2015-03-14 486 486 767 PURCHASE NaN 0.0 NaN NaN NaN False NaN NaN False False 2016-11-08 9 11
3 737265056 737265056 5000.0 5000.0 2016-12-10 02:14:50 7.47 Play Store US US 09 01 mobileapps 08/2025 2015-03-14 2015-03-14 486 486 767 PURCHASE NaN 0.0 NaN NaN NaN False NaN NaN False False 2016-12-10 2 12
4 830329091 830329091 5000.0 5000.0 2016-03-24 21:04:46 71.18 Tim Hortons #947751 US US 02 01 fastfood 10/2029 2015-08-06 2015-08-06 885 885 3143 PURCHASE NaN 0.0 NaN NaN NaN True NaN NaN False False 2016-03-24 21 3
fraud = dataset['isFraud'].value_counts().to_frame()
fraud['pct'] = fraud['isFraud']/fraud['isFraud'].sum()
display(fraud)
isFraud pct
False 773946 0.98421
True 12417 0.01579

This is an extremely imbalanced dataset. For unbalanced data there are two common approaches, upsampling and downsampling; they deserve their own post, so this one will not go into them.
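Purely as a sketch of the deferred topic, a minimal downsampling helper (a hypothetical function, reusing this dataset's isFraud column) might look like:

```python
import pandas as pd

def downsample(df, label_col='isFraud', ratio=5, seed=123):
    """Keep every positive row and at most `ratio` negatives per positive."""
    pos = df[df[label_col] == True]
    neg = df[df[label_col] == False]
    neg = neg.sample(n=min(len(neg), ratio * len(pos)), random_state=seed)
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed)  # shuffle rows

# toy demo: 5 frauds + 100 normals -> 5 frauds + 25 normals
toy = pd.DataFrame({'isFraud': [True] * 5 + [False] * 100, 'amt': range(105)})
balanced = downsample(toy)
print(balanced['isFraud'].value_counts())
```

Downsampling discards information from the majority class, which is why the ratio is usually kept well above 1:1.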

Card transactions are sometimes charged multiple times because of network issues and the like, leaving duplicate records in the dataset; these need to be removed.

dataset = dataset[~(dataset['transactionType'].isin(['REVERSAL']))]
mult_swipe = dataset[dataset.duplicated(keep='first',subset=['customerId','transactionDate','transactionAmount','cardLast4Digits','transactionHour'])]
print('multi-swipe transaction number:{0},amount:{1}'.format(len(mult_swipe),sum(mult_swipe['transactionAmount'])))
dataset = dataset[~(dataset.index.isin(mult_swipe.index))]
multi-swipe transaction number:7565,amount:1076660.0299999956

First, look at how transactions are distributed over time.

sns.set_theme(style="darkgrid")

plt.figure(figsize=(15, 8))
sns.barplot(data=dataset, x='transactionMonth', y='transactionAmount', estimator=sum)

plt.figure(figsize=(15, 8))
sns.barplot(data=dataset, x='transactionMonth', y='transactionAmount', estimator=len)

Transaction amount and transaction count are roughly flat across months.

plt.figure(figsize = (15,8))
sns.boxplot(data = dataset,x='transactionMonth',y='transactionAmount',notch=True,showcaps=True,flierprops={'marker':'x'},medianprops={'color':'coral'})

The boxplot shows that the outliers are mainly on the high-amount side.

Next, deal with columns that are mostly null.

data_null = dataset.isnull().sum().to_frame('null_num')
data_null['total'] = len(dataset)
data_null['pct'] = data_null['null_num'] / data_null['total']
data_null
null_num total pct
accountNumber 0 758495 0.000000
customerId 0 758495 0.000000
creditLimit 0 758495 0.000000
availableMoney 0 758495 0.000000
transactionDateTime 0 758495 0.000000
transactionAmount 0 758495 0.000000
merchantName 0 758495 0.000000
acqCountry 4401 758495 0.005802
merchantCountryCode 703 758495 0.000927
posEntryMode 3904 758495 0.005147
posConditionCode 396 758495 0.000522
merchantCategoryCode 0 758495 0.000000
currentExpDate 0 758495 0.000000
accountOpenDate 0 758495 0.000000
dateOfLastAddressChange 0 758495 0.000000
cardCVV 0 758495 0.000000
enteredCVV 0 758495 0.000000
cardLast4Digits 0 758495 0.000000
transactionType 690 758495 0.000910
echoBuffer 758495 758495 1.000000
currentBalance 0 758495 0.000000
merchantCity 758495 758495 1.000000
merchantState 758495 758495 1.000000
merchantZip 758495 758495 1.000000
cardPresent 0 758495 0.000000
posOnPremises 758495 758495 1.000000
recurringAuthInd 758495 758495 1.000000
expirationDateKeyInMatch 0 758495 0.000000
isFraud 0 758495 0.000000
transactionDate 0 758495 0.000000
transactionHour 0 758495 0.000000
transactionMonth 0 758495 0.000000

Remove the columns whose null percentage is >= 0.5.

data_df = dataset[data_null[data_null['pct']<0.5].index.tolist()]
data_df = data_df[['customerId', 'creditLimit', 'availableMoney','transactionDateTime','transactionAmount', 'merchantName','acqCountry', 'merchantCountryCode', 'posEntryMode', 'posConditionCode','merchantCategoryCode','accountOpenDate','dateOfLastAddressChange', 'cardCVV', 'enteredCVV', 'cardLast4Digits','transactionType', 'currentBalance','expirationDateKeyInMatch', 'isFraud', 'transactionDate','transactionHour','transactionMonth']]

Next, compare how fraud and normal transactions are distributed over time.

fig, axes = plt.subplots(3, 4, figsize=(20, 15))
for month in range(1, 13):
    ax = axes[(month - 1) // 4, (month - 1) % 4]
    sns.histplot(ax=ax,
                 data=data_df[(data_df['transactionMonth'] == month) & (data_df['isFraud'] == True)],
                 x='transactionAmount', binwidth=100, stat='probability').set(title=str(month))
plt.tight_layout()

fig, axes = plt.subplots(3, 4, figsize=(20, 15))
for month in range(1, 13):
    ax = axes[(month - 1) // 4, (month - 1) % 4]
    sns.histplot(ax=ax,
                 data=data_df[(data_df['transactionMonth'] == month) & (data_df['isFraud'] == False)],
                 x='transactionAmount', binwidth=100, stat='probability').set(title=str(month))
plt.tight_layout()

For fraud transactions, the 0-100 and 100-200 amount buckets dominate. For normal transactions, the 0-100 bucket is clearly larger than all others, and Q4's share is slightly higher than the other quarters.

plt.figure(figsize=(15, 8))
# transactionTime_ is a derived time-of-day bucket (its construction is not shown)
g = sns.FacetGrid(data_df, col='transactionTime_', row='isFraud', margin_titles=True)
g.map(sns.histplot, 'transactionAmount', stat='probability', binwidth=50)
plt.tight_layout()

Looking at these proportions, normal transactions are insensitive to the time of day, with no visible difference. Fraud transactions in the afternoon bucket are distributed differently from the other buckets, with 100-200 taking the largest share.

# The aggregation producing cus_df is not shown in the original; a likely reconstruction:
cus_df = data_df.groupby('customerId', as_index=False).agg(
    {'transactionAmount': 'sum', 'transactionDate': 'count'})
cus_df = cus_df.sort_values('transactionAmount', ascending=False).reset_index(drop=True)
cus_df['cumpct'] = cus_df['transactionAmount'].cumsum() / cus_df['transactionAmount'].sum()
cus_df['cus_pct'] = (cus_df.index + 1) / len(cus_df)
cus_df.head()
customerId transactionAmount transactionDate cumpct
0 380680241 4589985.93 31554 0.044210
1 882815134 1842601.52 12665 0.061958
2 570884863 1514931.43 10452 0.076549
3 246251253 1425588.84 9806 0.090280
4 369308035 1012414.42 6928 0.100032
sns.lineplot(x='cus_pct',y='cumpct',data=cus_df )

Roughly 80% of the transaction amount is contributed by 20% of the customers.
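This 80/20 reading comes straight off the cumulative curve; a toy check with made-up amounts (interpolating at the 20% customer mark) looks like:

```python
import numpy as np
import pandas as pd

# Toy skewed spend: a few heavy customers, many light ones
amounts = pd.Series([5000, 3000, 2000, 500, 200, 100, 100, 50, 30, 20]).sort_values(ascending=False)
cumpct = amounts.cumsum() / amounts.sum()                 # cumulative share of amount
cus_pct = np.arange(1, len(amounts) + 1) / len(amounts)   # cumulative share of customers
# share of amount contributed by the top 20% of customers
top20_share = np.interp(0.2, cus_pct, cumpct)
print(round(top20_share, 3))
```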

# TA (average ticket size per customer) is computed below as transactionAmount / Frequency
sns.histplot(data=cus_df, x='TA', stat='count', binwidth=10).set(title='TA')

The average ticket size is concentrated around 150.

df_ttl = df[['customerId', 'creditLimit','transactionAmount','acqCountry','merchantCategoryCode','transactionType','currentBalance','transactionMonth','transactionTime_','days_tranz_open','merchantName_', 'CVV', 'days_address', 'isFraud']]
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV
trained_cols = ['customerId', 'creditLimit', 'transactionAmount', 'acqCountry', 'merchantCategoryCode',
                'transactionType', 'currentBalance', 'transactionMonth', 'transactionTime_',
                'days_tranz_open', 'merchantName_', 'CVV', 'days_address', 'label']
# 'label' is assumed to be the 0/1 form of isFraud (its construction is not shown)
s_train_X, s_test_X, s_train_y, s_test_y = train_test_split(
    df_ttl[trained_cols], df_ttl['label'], train_size=0.8, random_state=123)
cus_df = s_train_X.groupby('customerId',as_index=False).agg({'transactionAmount':'sum','acqCountry':'count'})
cus_df = cus_df.rename(columns = {'acqCountry':'Frequency'})
cus_df['TA'] = cus_df['transactionAmount']/cus_df['Frequency']
s_train_X = s_train_X.merge(cus_df[['customerId','Frequency','TA']],on='customerId',how='left')
s_test_X = s_test_X.merge(cus_df[['customerId','Frequency','TA']],on='customerId',how='left')
s_train_X['TA_Tranz'] = s_train_X['transactionAmount'] - s_train_X['TA']
s_test_X['TA_Tranz'] = s_test_X['transactionAmount'] - s_test_X['TA']

Feature generation

# `Analysis` (the base class providing group_by_feature) is not shown in the original.
class WOE(Analysis):
    @staticmethod
    def __perc_share(df, group_name):
        return df[group_name] / df[group_name].sum()

    def __calculate_perc_share(self, feat):
        df = self.group_by_feature(feat)
        df['perc_good'] = self.__perc_share(df, 'good')
        df['perc_bad'] = self.__perc_share(df, 'bad')
        df['perc_diff'] = df['perc_good'] - df['perc_bad']
        return df

    def calculate_woe(self, feat):
        df = self.__calculate_perc_share(feat)
        df['woe'] = np.log(df['perc_good'] / df['perc_bad'])
        df['woe'] = df['woe'].replace([np.inf, -np.inf], np.nan).fillna(0)
        return df

class CategoricalFeature():
    def __init__(self, df, feature):
        self.df = df
        self.feature = feature

    @property
    def _df_woe(self):
        df_woe = self.df.copy()
        df_woe['bin'] = df_woe[self.feature].fillna('missing')
        return df_woe[['bin', 'label']]

def draw_woe(woe_df):
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.barplot(x=woe_df.columns[0], y=woe_df.columns[-2], data=woe_df,
                palette=sns.cubehelix_palette(len(woe_df), start=0.5, rot=0.75, reverse=True))
    ax.set_title('WOE visualization for: ' + woe_df.columns[0])  # fixed: was an undefined `feature`
    plt.xticks(rotation=30)
    plt.show()

def print_iv(woe_df):
    iv = woe_df['iv'].sum()
    if iv < 0.02:
        interpre = 'useless'
    elif iv < 0.1:
        interpre = 'weak'
    elif iv < 0.3:
        interpre = 'medium'
    elif iv < 0.5:
        interpre = 'strong'
    else:
        interpre = 'toogood'
    return iv, interpre

Categorical features

feature_cat = ['creditLimit', 'acqCountry', 'transactionType', 'transactionMonth',
               'transactionTime_', 'CVV', 'merchantName_', 'merchantCategoryCode']
iv_dic = {'feature': [], 'iv': [], 'interpretation': []}
# The loop that runs calculate_woe / print_iv per feature and appends to iv_dic
# is omitted in the original; its result is the table below.
iv_df = pd.DataFrame(iv_dic)
display(iv_df)
feature iv interpretation
0 creditLimit 0.019578 useless
1 acqCountry 0.000674 useless
2 transactionType 0.016641 useless
3 transactionMonth 0.003295 useless
4 transactionTime_ 0.000690 useless
5 CVV 0.004968 useless
6 merchantName_ 0.766754 toogood
7 merchantCategoryCode 0.222249 medium
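The per-feature loop that fills this table is not shown (nor is the Analysis base class), so here is a standalone, illustrative reimplementation of the same WOE/IV computation on toy data (function names are mine, not the original's):

```python
import numpy as np
import pandas as pd

def woe_iv(df, feature, label='label'):
    """WOE table and total IV for one categorical feature (label: 1 = bad/fraud)."""
    g = df.groupby(df[feature].fillna('missing'))[label].agg(['count', 'sum'])
    g['bad'] = g['sum']
    g['good'] = g['count'] - g['bad']
    g['perc_good'] = g['good'] / g['good'].sum()
    g['perc_bad'] = g['bad'] / g['bad'].sum()
    g['woe'] = np.log(g['perc_good'] / g['perc_bad']).replace([np.inf, -np.inf], 0).fillna(0)
    g['iv'] = (g['perc_good'] - g['perc_bad']) * g['woe']
    return g[['good', 'bad', 'woe', 'iv']], g['iv'].sum()

def interpret_iv(iv):
    # same cut-offs as print_iv above
    if iv < 0.02:
        return 'useless'
    elif iv < 0.1:
        return 'weak'
    elif iv < 0.3:
        return 'medium'
    elif iv < 0.5:
        return 'strong'
    return 'toogood'

# toy demo: feature 'cat' separates the classes fairly well
toy = pd.DataFrame({'cat': ['a'] * 50 + ['b'] * 50,
                    'label': [0] * 45 + [1] * 5 + [0] * 30 + [1] * 20})
tbl, iv = woe_iv(toy, 'cat')
print(tbl)
print('IV =', round(iv, 3), '->', interpret_iv(iv))
```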

Continuous features

import scipy.stats as stats
feature_conti = ['transactionAmount', 'currentBalance', 'days_tranz_open', 'days_address',
                 'Frequency', 'TA', 'TA_Tranz']
iv_con_dic = {'feature': [], 'iv': [], 'interpretation': []}
# Continuous features are binned first (binning code omitted in the original),
# then the same WOE/IV loop fills iv_con_dic.
iv_con_df = pd.DataFrame(iv_con_dic)
iv_con_df
feature iv interpretation
0 transactionAmount 0.377574 strong
1 currentBalance 0.003375 useless
2 days_tranz_open 0.000420 useless
3 days_address 0.003841 useless
4 Frequency 0.023392 weak
5 TA 0.022107 weak
6 TA_Tranz 0.322500 strong
draw_woe(woe_df)

Taking TA_Tranz (the difference between a single transaction and the customer's average ticket) as an example: when that difference exceeds 10, the transaction leans toward fraud.

col_model = iv_df[~iv_df['interpretation'].isin(['useless'])]['feature'].tolist()
col_model.extend(iv_con_df[~iv_con_df['interpretation'].isin(['useless'])]['feature'].tolist())
trained_col = ['woe' + col for col in col_model]
trained_col
['woemerchantName_','woemerchantCategoryCode','woetransactionAmount','woeFrequency','woeTA','woeTA_Tranz']

Modeling proceeds with these six features.
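The step that replaces raw values with their learned WOE (producing the 'woe'+col columns) is also elided; a minimal sketch with a hypothetical lookup table learned on the training split:

```python
import pandas as pd

# Hypothetical WOE values for one feature, learned on the training split
woe_map = {'mobileapps': -0.8, 'rideshare': 0.1, 'entertainment': 0.3}

frame = pd.DataFrame({'merchantCategoryCode':
                      ['rideshare', 'mobileapps', 'entertainment', 'never_seen']})
# map each category to its WOE; unseen/missing categories fall back to 0 (neutral evidence)
frame['woemerchantCategoryCode'] = frame['merchantCategoryCode'].map(woe_map).fillna(0.0)
print(frame)
```

After this encoding, every input to the LR is numeric and already on a log-odds scale, which is what makes the WOE + LR combination interpretable.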

Model

Use GridSearchCV to sweep the hyperparameters.

lr_paras = [{
    'penalty': ['l2'],
    'C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100],
    'class_weight': [{0: 0.1, 1: 0.9}, {0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7},
                     {0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4},
                     {0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2}, {0: 0.9, 1: 0.1}],
    'solver': ['liblinear'],
    'multi_class': ['ovr'],
}]
modelLR = GridSearchCV(LogisticRegression(tol=1e-6), lr_paras, cv=5, verbose=1)
modelLR.fit(s_train_X,s_train_y)
Fitting 5 folds for each of 81 candidates, totalling 405 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 405 out of 405 | elapsed:  7.0min finished
GridSearchCV(cv=5, estimator=LogisticRegression(tol=1e-06),
             param_grid=[{'C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100],
                          'class_weight': [{0: 0.1, 1: 0.9}, {0: 0.2, 1: 0.8},
                                           {0: 0.3, 1: 0.7}, {0: 0.4, 1: 0.6},
                                           {0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4},
                                           {0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2},
                                           {0: 0.9, 1: 0.1}],
                          'multi_class': ['ovr'], 'penalty': ['l2'],
                          'solver': ['liblinear']}],
             verbose=1)
# LR_ and lr_score are helper functions from the original notebook (definitions not shown)
coef_, intercept_ = LR_(modelLR.best_estimator_, s_train_X, s_train_y)
s_test_X.drop(columns='pred', inplace=True)
s_train_X.drop(columns='pred', inplace=True)
s_y_pred_prob = lr_score(s_test_X, coef_, intercept_)
s_y_pred_prob_train = lr_score(s_train_X, coef_, intercept_)

LR outputs a probability; sweep a grid of thresholds to find the best cut-off.

from sklearn.metrics import f1_score  # not imported earlier

thre = np.linspace(0, 0.5, 50)
score_dic = {'thre': [], 'score': []}
for item in thre:
    s_y_pred_train = [1 if i >= item else 0 for i in s_y_pred_prob_train]
    score_dic['thre'].append(item)
    score_dic['score'].append(f1_score(s_train_y, s_y_pred_train, average='macro'))
# pick the threshold with the best macro F1 (this selection step is implicit in the original)
thresh = score_dic['thre'][int(np.argmax(score_dic['score']))]
s_y_pred_train = [1 if i >= thresh else 0 for i in s_y_pred_prob_train]
s_y_pred = [1 if i >= thresh else 0 for i in s_y_pred_prob]
print('Training set performance')
print(metrics.confusion_matrix(s_train_y, s_y_pred_train))
print(metrics.classification_report(s_train_y, s_y_pred_train))
print('Test set performance')
print(metrics.confusion_matrix(s_test_y, s_y_pred))
print(metrics.classification_report(s_test_y, s_y_pred))
Training set performance
[[576091  14082]
 [  7969   1162]]
              precision    recall  f1-score   support

           0       0.99      0.98      0.98    590173
           1       0.08      0.13      0.10      9131

    accuracy                           0.96    599304
   macro avg       0.53      0.55      0.54    599304
weighted avg       0.97      0.96      0.97    599304

Test set performance
[[144053   3385]
 [  2091    297]]
              precision    recall  f1-score   support

           0       0.99      0.98      0.98    147438
           1       0.08      0.12      0.10      2388

    accuracy                           0.96    149826
   macro avg       0.53      0.55      0.54    149826
weighted avg       0.97      0.96      0.97    149826
fpr, tpr, _ = metrics.roc_curve(s_test_y, s_test_X_.pred)
auc = metrics.roc_auc_score(s_test_y, s_test_X_.pred)
plt.figure(figsize=(8, 6))
# keyword args: positional x/y were removed in newer seaborn
sns.lineplot(x=fpr, y=tpr, label='Model AUC %0.2f' % auc, color='palevioletred', lw=2)
plt.plot([0, 1], [0, 1], color='lightgrey', lw=1.5, linestyle='--')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate',fontsize=12)
plt.ylabel('True Positive Rate',fontsize=12)
plt.title('ROC - Test Set',fontsize=13)
plt.legend(loc="lower right",fontsize=12)
plt.rc_context({'axes.edgecolor':'darkgrey','xtick.color':'black','ytick.color':'black','figure.facecolor':'white'})
plt.show()

Although the AUC reaches 0.75, the macro F1 is only 0.54, so the model is actually quite weak. This is mainly the effect of the extreme class imbalance, and it also shows how important it is to choose suitable metrics for such extremely imbalanced datasets.
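A synthetic illustration of that last point: a degenerate classifier that always predicts the majority class scores near-perfect accuracy but poor macro F1, which is why accuracy is the wrong headline metric here.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.01).astype(int)   # ~1% positive class, like this dataset
y_pred = np.zeros_like(y_true)                     # 'model' that always predicts negative

print('accuracy:', accuracy_score(y_true, y_pred))                              # looks excellent
print('macro F1:', f1_score(y_true, y_pred, average='macro', zero_division=0))  # exposes the failure
```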
