Risk Control Models: A Risk Early-Warning Model
A friend recently interviewed for an algorithm role at a bank, and the first round was a take-home test: the company sends out a problem set, the candidate completes it at home, and then walks through it in the interview. The market this year really does seem soft — it's rare for a senior algorithm role to include a round like this. The data turned out to be quite interesting, so this post walks through the modeling pipeline and shows how WOE and logistic regression work together.
The value of WOE
- Handling missing values: binning lets nulls be treated as their own bin, so a field with only 30% effective coverage can still be put to use.
- Handling outliers: binning isolates outliers and makes a variable more robust. For example, age is user-entered and may contain values like 200; such cases can be folded into the age > 60 bin.
- Business interpretability: business users expect a variable to act linearly — as x grows, y grows. In practice x and y are often nonlinearly related, and a WOE transform can linearize that relationship.
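The bullet points above lean on the standard WOE/IV definitions, which can be sketched in a few lines (the bin labels and counts below are invented purely for illustration; 'good' means non-fraud, 'bad' means fraud):

```python
import numpy as np
import pandas as pd

# Hypothetical per-bin counts for one feature
bins = pd.DataFrame({
    'bin':  ['low', 'mid', 'high'],
    'good': [800, 150, 50],
    'bad':  [20, 30, 50],
})

# WOE_i = ln( (good_i / total_good) / (bad_i / total_bad) )
bins['perc_good'] = bins['good'] / bins['good'].sum()
bins['perc_bad'] = bins['bad'] / bins['bad'].sum()
bins['woe'] = np.log(bins['perc_good'] / bins['perc_bad'])

# IV = sum_i (perc_good_i - perc_bad_i) * WOE_i
iv = ((bins['perc_good'] - bins['perc_bad']) * bins['woe']).sum()
print(bins[['bin', 'woe']].round(3))
print('IV =', round(iv, 3))
```

A bin where good and bad shares are equal gets WOE = 0; the further a bin's WOE is from 0, the more that bin separates the classes, and IV aggregates that separation over all bins.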
Modeling pipeline
The pipeline is:
Load data -> EDA -> Feature Generation -> Model Establishment -> Release Online
EDA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from scipy.stats import chi2_contingency
pd.set_option('display.max_columns', None)
sns.set_theme(style="darkgrid")
sns.set(rc = {'figure.figsize':(20,15)})
dataset.head()
| | accountNumber | customerId | creditLimit | availableMoney | transactionDateTime | transactionAmount | merchantName | acqCountry | merchantCountryCode | posEntryMode | posConditionCode | merchantCategoryCode | currentExpDate | accountOpenDate | dateOfLastAddressChange | cardCVV | enteredCVV | cardLast4Digits | transactionType | echoBuffer | currentBalance | merchantCity | merchantState | merchantZip | cardPresent | posOnPremises | recurringAuthInd | expirationDateKeyInMatch | isFraud | transactionDate | transactionHour | transactionMonth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 737265056 | 737265056 | 5000.0 | 5000.0 | 2016-08-13 14:27:32 | 98.55 | Uber | US | US | 02 | 01 | rideshare | 06/2023 | 2015-03-14 | 2015-03-14 | 414 | 414 | 1803 | PURCHASE | NaN | 0.0 | NaN | NaN | NaN | False | NaN | NaN | False | False | 2016-08-13 | 14 | 8 |
1 | 737265056 | 737265056 | 5000.0 | 5000.0 | 2016-10-11 05:05:54 | 74.51 | AMC #191138 | US | US | 09 | 01 | entertainment | 02/2024 | 2015-03-14 | 2015-03-14 | 486 | 486 | 767 | PURCHASE | NaN | 0.0 | NaN | NaN | NaN | True | NaN | NaN | False | False | 2016-10-11 | 5 | 10 |
2 | 737265056 | 737265056 | 5000.0 | 5000.0 | 2016-11-08 09:18:39 | 7.47 | Play Store | US | US | 09 | 01 | mobileapps | 08/2025 | 2015-03-14 | 2015-03-14 | 486 | 486 | 767 | PURCHASE | NaN | 0.0 | NaN | NaN | NaN | False | NaN | NaN | False | False | 2016-11-08 | 9 | 11 |
3 | 737265056 | 737265056 | 5000.0 | 5000.0 | 2016-12-10 02:14:50 | 7.47 | Play Store | US | US | 09 | 01 | mobileapps | 08/2025 | 2015-03-14 | 2015-03-14 | 486 | 486 | 767 | PURCHASE | NaN | 0.0 | NaN | NaN | NaN | False | NaN | NaN | False | False | 2016-12-10 | 2 | 12 |
4 | 830329091 | 830329091 | 5000.0 | 5000.0 | 2016-03-24 21:04:46 | 71.18 | Tim Hortons #947751 | US | US | 02 | 01 | fastfood | 10/2029 | 2015-08-06 | 2015-08-06 | 885 | 885 | 3143 | PURCHASE | NaN | 0.0 | NaN | NaN | NaN | True | NaN | NaN | False | False | 2016-03-24 | 21 | 3 |
fraud = dataset['isFraud'].value_counts().to_frame()
fraud['pct'] = fraud['isFraud']/fraud['isFraud'].sum()
display(fraud)
| | isFraud | pct |
|---|---|---|
False | 773946 | 0.98421 |
True | 12417 | 0.01579 |
This is an extremely imbalanced dataset. For imbalanced data there are two standard remedies: upsampling and downsampling. Those deserve their own post, so they are not covered in depth here.
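Though out of scope for this post, a basic downsampling pass can be sketched as follows (the 5:1 negative-to-positive ratio and the helper name are illustrative choices, not from the original):

```python
import pandas as pd

def downsample(df, label_col='isFraud', ratio=5, seed=123):
    """Keep every positive row and randomly sample `ratio` negatives per positive."""
    pos = df[df[label_col] == True]
    n_neg = min(len(pos) * ratio, int((df[label_col] == False).sum()))
    neg = df[df[label_col] == False].sample(n=n_neg, random_state=seed)
    # concatenate and shuffle so the classes are interleaved
    return pd.concat([pos, neg]).sample(frac=1, random_state=seed)

# usage: balanced = downsample(dataset)
```

Downsampling throws information away, so for this dataset (~1.6% positives) it mainly makes sense when the negative class is large enough that the sampled subset still covers its variety.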
With card swipes, network glitches can cause a charge to be submitted multiple times, which leaves duplicate records in the dataset; these need to be removed.
dataset = dataset[~(dataset['transactionType'].isin(['REVERSAL']))]
mult_swipe = dataset[dataset.duplicated(keep='first',subset=['customerId','transactionDate','transactionAmount','cardLast4Digits','transactionHour'])]
print('multi-swipe transaction number:{0},amount:{1}'.format(len(mult_swipe),sum(mult_swipe['transactionAmount'])))
dataset = dataset[~(dataset.index.isin(mult_swipe.index))]
multi-swipe transaction number:7565,amount:1076660.0299999956
First, look at the distribution over time.
sns.set_theme(style="darkgrid")
sns.set(rc = {'figure.figsize':(20,15)})

# total transaction amount per month
plt.figure(figsize = (15,8))
sns.barplot(data = dataset, x='transactionMonth',y='transactionAmount',estimator=sum)

# transaction count per month
plt.figure(figsize = (15,8))
sns.barplot(data = dataset, x='transactionMonth',y='transactionAmount',estimator=len)
Transaction amount and transaction count are roughly flat across months.
plt.figure(figsize = (15,8))
sns.boxplot(data = dataset,x='transactionMonth',y='transactionAmount',notch=True,showcaps=True,flierprops={'marker':'x'},medianprops={'color':'coral'})
The boxplot shows that the outliers are mostly on the high-amount side.
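What the boxplot shows visually can be quantified with the standard 1.5×IQR whisker rule (a sketch; the rule is the usual boxplot convention, not a threshold the original commits to):

```python
import pandas as pd

def iqr_outlier_mask(s: pd.Series, k: float = 1.5) -> pd.Series:
    """True where a value falls outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# usage: dataset[iqr_outlier_mask(dataset['transactionAmount'])]
```

Since WOE binning later absorbs extreme values into edge bins anyway, a mask like this is mainly useful for eyeballing how many rows the tail contains.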
Next, deal with the fields that are mostly null. The `data_null` summary below was presumably built along these lines (the exact code is not in the original):
data_null = dataset.isnull().sum().to_frame('null_num')
data_null['total'] = len(dataset)
data_null['pct'] = data_null['null_num'] / data_null['total']
data_null
| | null_num | total | pct |
|---|---|---|---|
accountNumber | 0 | 758495 | 0.000000 |
customerId | 0 | 758495 | 0.000000 |
creditLimit | 0 | 758495 | 0.000000 |
availableMoney | 0 | 758495 | 0.000000 |
transactionDateTime | 0 | 758495 | 0.000000 |
transactionAmount | 0 | 758495 | 0.000000 |
merchantName | 0 | 758495 | 0.000000 |
acqCountry | 4401 | 758495 | 0.005802 |
merchantCountryCode | 703 | 758495 | 0.000927 |
posEntryMode | 3904 | 758495 | 0.005147 |
posConditionCode | 396 | 758495 | 0.000522 |
merchantCategoryCode | 0 | 758495 | 0.000000 |
currentExpDate | 0 | 758495 | 0.000000 |
accountOpenDate | 0 | 758495 | 0.000000 |
dateOfLastAddressChange | 0 | 758495 | 0.000000 |
cardCVV | 0 | 758495 | 0.000000 |
enteredCVV | 0 | 758495 | 0.000000 |
cardLast4Digits | 0 | 758495 | 0.000000 |
transactionType | 690 | 758495 | 0.000910 |
echoBuffer | 758495 | 758495 | 1.000000 |
currentBalance | 0 | 758495 | 0.000000 |
merchantCity | 758495 | 758495 | 1.000000 |
merchantState | 758495 | 758495 | 1.000000 |
merchantZip | 758495 | 758495 | 1.000000 |
cardPresent | 0 | 758495 | 0.000000 |
posOnPremises | 758495 | 758495 | 1.000000 |
recurringAuthInd | 758495 | 758495 | 1.000000 |
expirationDateKeyInMatch | 0 | 758495 | 0.000000 |
isFraud | 0 | 758495 | 0.000000 |
transactionDate | 0 | 758495 | 0.000000 |
transactionHour | 0 | 758495 | 0.000000 |
transactionMonth | 0 | 758495 | 0.000000 |
Remove the columns whose null percentage is 0.5 or higher.
data_df = dataset[data_null[data_null['pct']<0.5].index.tolist()]
data_df = data_df[['customerId', 'creditLimit', 'availableMoney','transactionDateTime','transactionAmount', 'merchantName','acqCountry', 'merchantCountryCode', 'posEntryMode', 'posConditionCode','merchantCategoryCode','accountOpenDate','dateOfLastAddressChange', 'cardCVV', 'enteredCVV', 'cardLast4Digits','transactionType', 'currentBalance','expirationDateKeyInMatch', 'isFraud', 'transactionDate','transactionHour','transactionMonth']]
Next, compare how fraud and normal transactions are distributed over time.
# fraud transactions: amount distribution per month
fig, axes = plt.subplots(3,4)
for month in range(1, 13):
    ax = axes[(month - 1) // 4, (month - 1) % 4]
    sns.histplot(ax=ax,
                 data=data_df[(data_df['transactionMonth'] == month) & (data_df['isFraud'] == True)],
                 x='transactionAmount', binwidth=100, stat='probability').set(title=str(month))
plt.tight_layout()

# normal transactions: amount distribution per month
fig, axes = plt.subplots(3,4)
for month in range(1, 13):
    ax = axes[(month - 1) // 4, (month - 1) % 4]
    sns.histplot(ax=ax,
                 data=data_df[(data_df['transactionMonth'] == month) & (data_df['isFraud'] == False)],
                 x='transactionAmount', binwidth=100, stat='probability').set(title=str(month))
plt.tight_layout()
For fraud transactions, the 0-100 and 100-200 buckets account for the largest shares. For normal transactions, 0-100 clearly dominates, and Q4's share is slightly higher than the other quarters'.
plt.figure(figsize = (15,8))
g = sns.FacetGrid(data_df,col='transactionTime_',row='isFraud',margin_titles=True)
g.map(sns.histplot,'transactionAmount',stat='probability',binwidth = 50)
plt.tight_layout()
Looking at these proportions, normal transactions are insensitive to time of day — no visible difference. Fraud transactions in the afternoon are distributed differently from the other periods, with the 100-200 bucket taking the largest share.
# `cus_df` aggregates spend per customer; the original omits its construction,
# so the lines below are a reconstruction inferred from the displayed columns
cus_df = dataset.groupby('customerId', as_index=False).agg({'transactionAmount': 'sum', 'transactionDate': 'count'})
cus_df = cus_df.sort_values('transactionAmount', ascending=False).reset_index(drop=True)
cus_df['cumpct'] = cus_df['transactionAmount'].cumsum() / cus_df['transactionAmount'].sum()
cus_df['cus_pct'] = (cus_df.index + 1) / len(cus_df)
cus_df.head()
| | customerId | transactionAmount | transactionDate | cumpct |
|---|---|---|---|---|
0 | 380680241 | 4589985.93 | 31554 | 0.044210 |
1 | 882815134 | 1842601.52 | 12665 | 0.061958 |
2 | 570884863 | 1514931.43 | 10452 | 0.076549 |
3 | 246251253 | 1425588.84 | 9806 | 0.090280 |
4 | 369308035 | 1012414.42 | 6928 | 0.100032 |
sns.lineplot(x='cus_pct',y='cumpct',data=cus_df )
Roughly 80% of transaction volume comes from 20% of customers.
sns.histplot(data = cus_df, x='TA',stat='count',binwidth=10).set(title='TA')
The average ticket size (`TA`) clusters around 150.
df_ttl = df[['customerId', 'creditLimit','transactionAmount','acqCountry','merchantCategoryCode','transactionType','currentBalance','transactionMonth','transactionTime_','days_tranz_open','merchantName_', 'CVV', 'days_address', 'isFraud']]
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import roc_curve
from sklearn.model_selection import GridSearchCV
# the binary label used downstream (the original leaves this step implicit)
df_ttl['label'] = df_ttl['isFraud'].astype(int)
trained_cols = ['customerId', 'creditLimit','transactionAmount','acqCountry','merchantCategoryCode','transactionType','currentBalance','transactionMonth','transactionTime_','days_tranz_open','merchantName_', 'CVV', 'days_address','label']
s_train_X, s_test_X, s_train_y, s_test_y = train_test_split(
    df_ttl[trained_cols], df_ttl['label'], train_size=0.8, random_state=123)
cus_df = s_train_X.groupby('customerId',as_index=False).agg({'transactionAmount':'sum','acqCountry':'count'})
cus_df = cus_df.rename(columns = {'acqCountry':'Frequency'})
cus_df['TA'] = cus_df['transactionAmount']/cus_df['Frequency']
s_train_X = s_train_X.merge(cus_df[['customerId','Frequency','TA']],on='customerId',how='left')
s_test_X = s_test_X.merge(cus_df[['customerId','Frequency','TA']],on='customerId',how='left')
s_train_X['TA_Tranz'] = s_train_X['transactionAmount'] - s_train_X['TA']
s_test_X['TA_Tranz'] = s_test_X['transactionAmount'] - s_test_X['TA']
Feature Generation
# `Analysis` (which provides group_by_feature) is not shown in this post
class WOE(Analysis):
    @staticmethod
    def __perc_share(df, group_name):
        return df[group_name] / df[group_name].sum()

    def __calculate_perc_share(self, feat):
        df = self.group_by_feature(feat)
        df['perc_good'] = self.__perc_share(df, 'good')
        df['perc_bad'] = self.__perc_share(df, 'bad')
        df['perc_diff'] = df['perc_good'] - df['perc_bad']
        return df

    def calculate_woe(self, feat):
        df = self.__calculate_perc_share(feat)
        df['woe'] = np.log(df['perc_good'] / df['perc_bad'])
        df['woe'] = df['woe'].replace([np.inf, -np.inf], np.nan).fillna(0)
        return df
class CategoricalFeature():
    def __init__(self, df, feature):
        self.df = df
        self.feature = feature

    @property
    def _df_woe(self):
        df_woe = self.df.copy()
        df_woe['bin'] = df_woe[self.feature].fillna('missing')
        return df_woe[['bin', 'label']]
def draw_woe(woe_df):
    fig, ax = plt.subplots(figsize=(10, 6))
    sns.barplot(x=woe_df.columns[0], y=woe_df.columns[-2], data=woe_df,
                palette=sns.cubehelix_palette(len(woe_df), start=0.5, rot=0.75, reverse=True))
    # the original referenced an undefined `feature` here; use the bin column name instead
    ax.set_title('WOE visualization for: ' + woe_df.columns[0])
    plt.xticks(rotation=30)
    plt.show()
def print_iv(woe_df):
    iv = woe_df['iv'].sum()
    if iv < 0.02:
        interpre = 'useless'
    elif iv < 0.1:
        interpre = 'weak'
    elif iv < 0.3:
        interpre = 'medium'
    elif iv < 0.5:
        interpre = 'strong'
    else:
        interpre = 'toogood'
    return iv, interpre
Categorical features
feature_cat = ['creditLimit','acqCountry', 'transactionType', 'transactionMonth', 'transactionTime_', 'CVV','merchantName_','merchantCategoryCode']
# feature_cat = ['creditLimit']
iv_dic = {}
iv_dic['feature'] = []
iv_dic['iv'] = []
iv_dic['interpretation'] = []
# (the loop over feature_cat that builds each WOE table via calculate_woe,
#  scores it with print_iv, and appends to iv_dic is omitted in the original)
iv_df = pd.DataFrame(iv_dic)
display(iv_df)
| | feature | iv | interpretation |
|---|---|---|---|
0 | creditLimit | 0.019578 | useless |
1 | acqCountry | 0.000674 | useless |
2 | transactionType | 0.016641 | useless |
3 | transactionMonth | 0.003295 | useless |
4 | transactionTime_ | 0.000690 | useless |
5 | CVV | 0.004968 | useless |
6 | merchantName_ | 0.766754 | toogood |
7 | merchantCategoryCode | 0.222249 | medium |
Continuous features
import scipy.stats as stats
feature_conti = ['transactionAmount','currentBalance','days_tranz_open','days_address', 'Frequency', 'TA','TA_Tranz']
# feature_conti = ['Frequency']
iv_con_dic = {}
iv_con_dic['feature'] = []
iv_con_dic['iv'] = []
iv_con_dic['interpretation'] = []
iv_con_df = pd.DataFrame(iv_con_dic)
iv_con_df
| | feature | iv | interpretation |
|---|---|---|---|
0 | transactionAmount | 0.377574 | strong |
1 | currentBalance | 0.003375 | useless |
2 | days_tranz_open | 0.000420 | useless |
3 | days_address | 0.003841 | useless |
4 | Frequency | 0.023392 | weak |
5 | TA | 0.022107 | weak |
6 | TA_Tranz | 0.322500 | strong |
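The original does not show the continuous-feature counterpart of `CategoricalFeature`; a minimal quantile-binning sketch in the same spirit is below (using `pd.qcut` is an assumption about the binning strategy, and `label` is assumed to be 0/1 with 1 = fraud):

```python
import numpy as np
import pandas as pd

def woe_table_continuous(df, feature, label='label', n_bins=10):
    """Quantile-bin a continuous feature and compute per-bin WOE and IV terms."""
    tmp = df[[feature, label]].copy()
    tmp['bin'] = pd.qcut(tmp[feature], q=n_bins, duplicates='drop')
    grp = tmp.groupby('bin', observed=True)[label].agg(['count', 'sum'])
    grp['bad'] = grp['sum']                      # label == 1 -> fraud
    grp['good'] = grp['count'] - grp['sum']
    grp['perc_good'] = grp['good'] / grp['good'].sum()
    grp['perc_bad'] = grp['bad'] / grp['bad'].sum()
    grp['woe'] = np.log(grp['perc_good'] / grp['perc_bad']).replace([np.inf, -np.inf], 0)
    grp['iv'] = (grp['perc_good'] - grp['perc_bad']) * grp['woe']
    return grp[['good', 'bad', 'woe', 'iv']]
```

Each row's `iv` term is non-negative by construction, so summing the column gives the feature's IV, which is what the interpretation table above buckets into useless/weak/medium/strong.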
draw_woe(woe_df)
As an example, take the difference between a single transaction's amount and the customer's average ticket (`TA_Tranz`): when the difference exceeds 10, the transaction skews toward fraud.
col_model = iv_df[~(iv_df['interpretation'].isin(['useless']))]['feature'].values.tolist()
col_model.extend(iv_con_df[~(iv_con_df['interpretation'].isin(['useless']))]['feature'].values.tolist())
trained_col = []
for col in col_model:trained_col.append('woe'+col)
trained_col
['woemerchantName_','woemerchantCategoryCode','woetransactionAmount','woeFrequency','woeTA','woeTA_Tranz']
The modeling below uses these six features.
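Before fitting, each raw value has to be replaced by its bin's WOE to produce the `woe*` columns listed above. A minimal mapping step, assuming a per-feature {bin: woe} lookup dict (the dict contents in the usage comment are hypothetical):

```python
import pandas as pd

def apply_woe(df, feature, woe_map, default=0.0):
    """Replace raw categorical values with their bin's WOE; unseen bins get `default`."""
    return df[feature].fillna('missing').map(woe_map).fillna(default).rename('woe' + feature)

# usage (hypothetical WOE values):
# merchant_woe = {'online_gifts': -1.2, 'fastfood': 0.3, 'missing': 0.0}
# s_train_X['woemerchantName_'] = apply_woe(s_train_X, 'merchantName_', merchant_woe)
```

The lookup must be built on the training split only and then reused unchanged on the test split, otherwise the WOE values leak label information from the test set.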
Model
Use GridSearchCV to sweep the hyperparameters.
lr_paras = [{'penalty': ['l2'],
             'C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100],
             'class_weight': [{0: 0.1, 1: 0.9}, {0: 0.2, 1: 0.8}, {0: 0.3, 1: 0.7},
                              {0: 0.4, 1: 0.6}, {0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4},
                              {0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2}, {0: 0.9, 1: 0.1}],
             'solver': ['liblinear'],
             'multi_class': ['ovr']}]
modelLR = GridSearchCV(LogisticRegression(tol=1e-6), lr_paras, cv=5, verbose=1)
modelLR.fit(s_train_X,s_train_y)
Fitting 5 folds for each of 81 candidates, totalling 405 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 405 out of 405 | elapsed: 7.0min finished
GridSearchCV(cv=5, estimator=LogisticRegression(tol=1e-06),
             param_grid=[{'C': [0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100],
                          'class_weight': [{0: 0.1, 1: 0.9}, {0: 0.2, 1: 0.8},
                                           {0: 0.3, 1: 0.7}, {0: 0.4, 1: 0.6},
                                           {0: 0.5, 1: 0.5}, {0: 0.6, 1: 0.4},
                                           {0: 0.7, 1: 0.3}, {0: 0.8, 1: 0.2},
                                           {0: 0.9, 1: 0.1}],
                          'multi_class': ['ovr'], 'penalty': ['l2'],
                          'solver': ['liblinear']}],
             verbose=1)
coef_,intercept_ = LR_(modelLR.best_estimator_,s_train_X,s_train_y)
s_test_X.drop(columns='pred',inplace=True)
s_train_X.drop(columns='pred',inplace=True)
s_y_pred_prob = lr_score(s_test_X, coef_, intercept_)
s_y_pred_prob_train = lr_score(s_train_X, coef_, intercept_)
LR outputs probabilities; sweep thresholds in small steps to find the optimal one.
thre = np.linspace(0,0.5,50)
score_dic = {}
score_dic['thre'] = []
score_dic['score'] = []
from sklearn.metrics import f1_score  # not imported earlier in the original

for item in thre:
    s_y_pred_train = [1 if i >= item else 0 for i in s_y_pred_prob_train]
    score_f1 = f1_score(s_train_y, s_y_pred_train, average='macro')
    score_dic['thre'].append(item)
    score_dic['score'].append(score_f1)

# pick the threshold with the best macro-F1 (this selection step is implicit in the original)
thresh = score_dic['thre'][int(np.argmax(score_dic['score']))]
s_y_pred_train = [1 if i >= thresh else 0 for i in s_y_pred_prob_train]
s_y_pred = [1 if i >= thresh else 0 for i in s_y_pred_prob]
print('Training set performance')
print(metrics.confusion_matrix(s_train_y, s_y_pred_train))
print(metrics.classification_report(s_train_y, s_y_pred_train))
print('Test set performance')
print(metrics.confusion_matrix(s_test_y, s_y_pred))
print(metrics.classification_report(s_test_y, s_y_pred))
Training set performance
[[576091  14082]
 [  7969   1162]]
              precision    recall  f1-score   support
           0       0.99      0.98      0.98    590173
           1       0.08      0.13      0.10      9131
    accuracy                           0.96    599304
   macro avg       0.53      0.55      0.54    599304
weighted avg       0.97      0.96      0.97    599304

Test set performance
[[144053   3385]
 [  2091    297]]
              precision    recall  f1-score   support
           0       0.99      0.98      0.98    147438
           1       0.08      0.12      0.10      2388
    accuracy                           0.96    149826
   macro avg       0.53      0.55      0.54    149826
weighted avg       0.97      0.96      0.97    149826
fpr, tpr, _ = metrics.roc_curve(s_test_y, s_test_X_.pred)
auc = metrics.roc_auc_score(s_test_y, s_test_X_.pred)
plt.figure(figsize=(8,6))
sns.lineplot(x=fpr, y=tpr, label='Model AUC %0.2f' % auc, color='palevioletred', lw=2)
plt.plot([0, 1], [0, 1], color='lightgrey', lw=1.5, linestyle='--')
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate',fontsize=12)
plt.ylabel('True Positive Rate',fontsize=12)
plt.title('ROC - Test Set',fontsize=13)
plt.legend(loc="lower right",fontsize=12)
plt.rc_context({'axes.edgecolor':'darkgrey','xtick.color':'black','ytick.color':'black','figure.facecolor':'white'})
plt.show()
Although the AUC reaches 0.75, the macro F1 score is only 0.54 — the model is actually quite poor. This is mostly the effect of the extreme class imbalance, and it underlines how important it is to choose appropriate metrics for such a dataset.
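The AUC-vs-F1 gap above is typical: on heavily imbalanced data, ROC-AUC can look respectable while precision-oriented metrics stay poor, which is why average precision (PR-AUC) is often the more honest summary. A small demonstration with synthetic scores (the numbers here are made up, not outputs of the model above):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)
n = 10000
y = (rng.random(n) < 0.016).astype(int)   # ~1.6% positives, like this dataset
# a weak scorer: positives shifted slightly upward
scores = rng.normal(size=n) + 0.8 * y

print('ROC-AUC:', round(roc_auc_score(y, scores), 3))
print('PR-AUC :', round(average_precision_score(y, scores), 3))
```

The ROC-AUC comes out clearly above chance while the PR-AUC stays close to the 1.6% base rate, so reporting PR-AUC (or macro F1, as above) gives a far more sobering picture of a fraud model.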