Several Important Parameters of Random Forest

2024-06-28 19:16:16

Translated from: https://www.analyticsvidhya.com/blog/2015/06/tuning-random-forest-model/

There are primarily 3 parameters which can be tuned to improve the predictive power of the model:

Note: random forest has 3 particularly important parameters with a large impact on the results: max_features, n_estimators, and min_samples_leaf.

1.a. max_features:

This is the maximum number of features Random Forest is allowed to try when looking for the best split at each node of an individual tree. There are multiple options available in Python to assign maximum features. Here are a few of them:

Auto/None : This simply takes all the features at every split, so no restriction is placed on the individual tree. (In scikit-learn, None always means all features; the meaning of "auto" has depended on the estimator and the version, and recent releases no longer accept it.)
sqrt : This option takes the square root of the total number of features at each split. For instance, if there are 100 variables in total, only 10 of them are considered at each split. "log2" is another similar option for max_features.
0.2 : This option allows the random forest to consider 20% of the variables at each split. Any value of the form "0.x" can be assigned to consider that fraction of the features, for example 0.3 for 30%.
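The options above can be made concrete with a small sketch. The helper name `resolve_max_features` is hypothetical and is not part of any library; it only illustrates, under the assumption that fractions and roots are rounded down, how each setting turns into a number of candidate features.

```python
import math

def resolve_max_features(option, n_features):
    """Hypothetical helper: how many features are tried at each split."""
    if option is None:
        # No restriction: every feature is a candidate.
        return n_features
    if option == "sqrt":
        return max(1, int(math.sqrt(n_features)))
    if option == "log2":
        return max(1, int(math.log2(n_features)))
    if isinstance(option, float):
        # A float is read as a fraction of all features (assumed rounded down).
        return max(1, int(option * n_features))
    if isinstance(option, int):
        # An int is read as an absolute count.
        return option
    raise ValueError(f"unsupported max_features option: {option!r}")

for opt in (None, "sqrt", "log2", 0.2):
    print(opt, "->", resolve_max_features(opt, 100))
```

With 100 features, "sqrt" yields 10 candidates, "log2" yields 6, and 0.2 yields 20.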

How does “max_features” impact performance and speed?

Increasing max_features generally improves the performance of the model, since each node now has more options to consider. However, this is not necessarily true, because it also decreases the diversity of the individual trees, which is the USP of random forest. What is certain is that increasing max_features slows the algorithm down. Hence, you need to strike the right balance and choose the optimal max_features.

1.b. n_estimators :

This is the number of trees you want to build before taking the majority vote or the average of their predictions. A higher number of trees gives you better performance but makes your code slower, and the gains level off as more trees are added. You should choose as high a value as your processor can handle, because this makes your predictions stronger and more stable.
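As a minimal sketch of the aggregation step described here, the two functions below combine per-tree predictions by majority vote (classification) and by averaging (regression). The function names and the sample predictions are made up for illustration; real libraries may aggregate differently (scikit-learn's classifier, for example, averages class probabilities rather than counting hard votes).

```python
from collections import Counter
from statistics import mean

def majority_vote(tree_predictions):
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_prediction(tree_predictions):
    """Regression: return the mean of the per-tree predictions."""
    return mean(tree_predictions)

# Illustrative outputs of five trees for one sample (made-up values).
class_votes = ["spam", "ham", "spam", "spam", "ham"]
value_votes = [2.0, 3.0, 4.0]

print(majority_vote(class_votes))
print(average_prediction(value_votes))
```

Each additional tree adds one more entry to these lists, which is why a larger n_estimators tends to make the combined prediction more stable at the cost of more computation.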

1.c. min_samples_leaf :

If you have built a decision tree before, you can appreciate the importance of the minimum leaf size. A leaf is an end node of a decision tree. A smaller leaf makes the model more prone to capturing noise in the training data. Generally I prefer a minimum leaf size of more than 50. However, you should try multiple leaf sizes to find the optimal one for your use case.

Note: if min_samples_leaf is too small, the model overfits easily and learns noise.
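One way to picture the effect of the minimum leaf size is to count which splits it still permits. The sketch below is only an illustration with hypothetical helper names: it assumes the samples at a node are sorted along one feature and that a split at position i sends the first i samples to the left child and the rest to the right.

```python
def split_allowed(n_left, n_right, min_samples_leaf):
    """A split is kept only if both children hold enough samples."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf

def valid_split_points(n_samples, min_samples_leaf):
    """List the split positions that respect the minimum leaf size."""
    return [
        i
        for i in range(1, n_samples)
        if split_allowed(i, n_samples - i, min_samples_leaf)
    ]

# A node holding 10 samples, under increasingly strict leaf sizes.
for leaf_size in (1, 4, 6):
    print(leaf_size, valid_split_points(10, leaf_size))
```

With a leaf size of 1 every position is allowed, so the tree can isolate single noisy samples; with a leaf size of 6 a node of 10 samples can no longer be split at all.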
