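As Ant Group was preparing to go public, Jack Ma gave a speech on the Bund in Shanghai. His core argument came down to one point: in the era of the global digital economy there is one, and only one, financial advantage, namely pure credit built on consumer big data!

We might call this data credit. It is more reliable than collateral, safer than a guarantee, and smarter than regulation. It is a forward-looking property right and the core collateral asset behind digital currency, and it determines the direction, speed, and scale of credit creation in the digital-currency era. In short, whoever holds data credit controls the right to issue digital currency!

Judging data credit relies on financial risk-control models. More precisely, whoever masters risk-model knowledge holds the right to issue digital currency! The credit scorecard is the most common risk-control model. Built on a linear algorithm and binary classification with the sigmoid function, it can automatically predict the probability that a customer is bad and quantify the effect of each variable, which helps senior management make decisions.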

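You are welcome to study the video tutorial series on Python credit scorecard modeling (code included).
Link: https://edu.csdn.net/course/detail/30611

Next, I will go through part 5 of the credit scorecard series.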

Credit Scorecards – Logistic Regression (part 5 of 7)

A Primer on Logistic Regression – Are you Happy?

Logistic regression for happiness- by Roopam

A few years ago, my wife and I took a couple of weeks’ vacation to England and Scotland. Just before we boarded the British Airways plane, an air hostess informed us that we had been upgraded to business class. Jolly good! What a wonderful start to the vacation. Once we got onto the plane, we got another tempting offer for a further upgrade to first class. However, this time there was a catch – just one seat was available. Now that was a shame; of course, we could not take this offer. The business class seats were fabulous before the first class offer came – by the way, all free upgrades. This is the situation behavioral economists describe as relativity and anchoring – in plain English, comparison. Anchoring or comparison is at the root of pricing strategies in business and also of all human sorrow. However, eventually the vacation mood took over and we enjoyed business class thoroughly. Humans are phenomenally good at adjusting to a situation in the end and enjoying it as well. You will find some of the happiest faces among people in the most difficult situations. Here is a quote by Henry Miller: “I have no money, no resources, no hopes. I am the happiest man alive”. Human behavior is full of anomalies – full of puzzles. The following is an example to strengthen this thesis.

Source: flicker.com

Lennon, McCartney, Harrison, and Best are the members of the most famous band ever on the planet – the Beatles. OK, I know you have spotted the error. By now you must have uttered the right names: John Lennon, Paul McCartney, George Harrison, and Ringo Starr, not Pete Best. Actually, Ringo Starr was the replacement for Pete Best, the original regular drummer for the Beatles. Pete must have been devastated seeing his partners rising to glory while he was left behind. Wrong; search for him on Google – he is the happiest Beatle of all. Now that is counterintuitive. I guess we do not have a clue what makes us happy.

As promised in a previous article, in this article I will attempt to explore happiness using logistic regression – the technique extensively used in scorecard development.

Logistic Regression – An Experiment

I am a thorough empiricist – a proponent of fact-based management. Hence, let me design a quick and dirty experiment* to generate data to evaluate happiness. The idea is to identify the factors / variables that influence our overall happiness. Let me present a representative list of factors for a working adult living in a city:

Now, throw in some other factors to the above list, such as a random act of kindness or an unplanned visit to a friend. As you can see, the above list can easily be expanded (recall the article on variable selection, part 3). This is a representative list, and you will have to create your own to figure out the factors that influence your level of happiness.

The second part of the experiment is to collect data. This is like maintaining a diary, only this one will be in Microsoft Excel. Every night before sleeping, you could assess your day and fill in numbers in the spreadsheet along with your overall level of happiness for the day (as shown in the figure below).

*I am calling this a quick and dirty experiment for the following reasons: (1) it is not a well-thought-out experiment but was created more to illustrate how logistic regression works; (2) the observer and the observed are the same in this experiment, which might create a challenge for objective measurement.

After a couple of years of data collection, you will have enough observations to create a model – a logistic regression model in this case. We are trying to model the feeling of happiness (column B) with the other columns (C to I) in the above data set. If we plot B on the Y-axis and the additive combination of C to I (we’ll call it Z) on the X-axis, it will look something like the plot shown below.

The idea behind logistic regression is to optimize Z in such a way that we get the best possible distinction between happy and sad faces, as achieved in the plot above. This is a curve-fitting problem with the sigmoid function (the curve in violet) as the choice of function.

I would recommend using the dates of observations (column A) in our model; this might reveal an interesting influence of the seasons on our mood.
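
The curve-fitting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the diary data are simulated, only three made-up factors are used instead of the full columns C to I, the "true" weights are invented, and the fit uses plain gradient descent rather than whatever routine a statistics package would apply.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued Z to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Fit intercept and weights by gradient descent on the average log-loss."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X1 @ w)                 # current predicted probabilities
        grad = X1.T @ (p - y) / len(y)      # gradient of the average log-loss
        w -= lr * grad
    return w

# Hypothetical diary: 730 days (about two years), three invented daily factors
rng = np.random.default_rng(0)
X = rng.normal(size=(730, 3))
true_w = np.array([-0.5, 1.5, -1.0, 0.8])   # invented intercept + factor weights
z_true = true_w[0] + X @ true_w[1:]
y = (rng.random(730) < sigmoid(z_true)).astype(float)  # 1 = happy, 0 = sad

w = fit_logistic(X, y)
Z = w[0] + X @ w[1:]        # the additive combination plotted on the X-axis
p_happy = sigmoid(Z)        # the fitted sigmoid curve evaluated for each day
accuracy = np.mean((p_happy >= 0.5) == (y == 1))

print("fitted weights:", np.round(w, 2))
print("training accuracy:", round(float(accuracy), 3))
```

Plotting `y` against `Z` and overlaying `p_happy` should give a picture similar in spirit to the happy and sad faces separated by the sigmoid curve.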

Applications in Banking and Finance

This is exactly what we do in the case of analytical scorecards such as credit scorecards, behavioral scorecards, fraud scorecards, or buying propensity models. Just replace happy and sad faces with …

• Good and Bad borrowers
• Fraud and genuine cases
• Buyers and non-buyers

… for the respective cases and you have the model. If you remember, in the previous article (part 4) I showed a simple credit scorecard model:

Credit Score = Age + Loan to Value Ratio (LTV) + Instalment (EMI) to Income Ratio (IIR)

A straightforward transformation of the sigmoid function will help us arrive at the above equation of the line. This is the final link to arrive at the desired scorecard.
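
One way to write out that transformation is sketched below. The notation is an assumption made here for illustration and does not come from the article: p is the probability of the positive class (for example, a good borrower), Z is the additive combination of variables, and the coefficients \beta_i with inputs x_1, x_2, x_3 standing for age, LTV, and IIR are hypothetical.

```latex
p = \frac{1}{1 + e^{-Z}}
\quad\Longrightarrow\quad
\frac{p}{1 - p} = e^{Z}
\quad\Longrightarrow\quad
\ln\!\left(\frac{p}{1 - p}\right) = Z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
```

Read this way, the log-odds are a straight-line function of the variables, so each term can be treated as one variable's additive contribution to the score.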

Variable Transformation in Credit Scorecards

The Swordsmith – by Roopam

I loved the movie Kill Bill, both parts. In the first part, I enjoyed the scene where Uma Thurman’s character goes to Japan to get a sword from Hattori Hanzō, the legendary swordsmith. After learning about her motive, he agrees to make his finest sword for her. Then Quentin Tarantino, the director of the movie, briefly shows the process of making the sword. Hattori Hanzō transforms a regular piece of iron into a fabulous sword – what a craftsman. This is fairly similar to how analysts transform the sigmoid function into the linear equation. The difference is that analysts use mathematical tools rather than hammers and are not as legendary as Hattori Hanzō.

Reject Inference

Reject inference is a distinguishing aspect of credit or application scorecards that sets them apart from all other classification models. For application scorecards, the development sample is biased because there is no performance data for rejected loans. Reject inference is a way to rectify this shortcoming and remove the bias from the sample. We will discuss reject inference in detail in a later article on YOU CANalytics.

Sign-off Note

Now that we have our scorecard ready, the next task is to validate its predictive power. This is precisely what we will do in the next article. See you soon.

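You are welcome to study the course series on Python financial risk-control scorecard models and data analysis, which covers logistic regression, scorecards, tree models (XGBoost, LightGBM, CatBoost), neural network algorithms, credit-user data analysis, and user profiling.
Link: https://edu.csdn.net/combo/detail/1927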
