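Linux Kernel Study Notes: The malloc Function

The malloc function implemented in Linux 0.11 allocates memory for the kernel itself, not for user programs: the source tree contains no malloc-like system call. From Linux 0.98 on, to avoid confusion with the malloc used by user programs, the kernel's malloc was renamed kmalloc (and free_s was renamed kfree_s).

The comment at the top of malloc.c explains the function quite clearly: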

malloc.c — a general purpose kernel memory allocator for Linux.

Written by Theodore Ts’o (tytso@mit.edu), 11/29/91

This routine is written to be as fast as possible, so that it can be called from the interrupt level.

Limitations: maximum size of memory we can allocate using this routine is 4k, the size of a page in Linux.

The general game plan is that each page (called a bucket) will only hold objects of a given size. When all of the object on a page are released, the page can be returned to the general free pool. When malloc() is called, it looks for the smallest bucket size which will fulfill its request, and allocate a piece of memory from that bucket pool.

Each bucket has as its control block a bucket descriptor which keeps track of how many objects are in use on that page, and the free list for that page. Like the buckets themselves, bucket descriptors are stored on pages requested from get_free_page(). However, unlike buckets, pages devoted to bucket descriptor pages are never released back to the system. Fortunately, a system should probably only need 1 or 2 bucket descriptor pages, since a page can hold 256 bucket descriptors (which corresponds to 1 megabyte worth of bucket pages.) If the kernel is using that much allocated memory, it’s probably doing something wrong. :-)

Note: malloc() and free() both call get_free_page() and free_page() in sections of code where interrupts are turned off, to allow malloc() and free() to be safely called from an interrupt routine. (We will probably need this functionality when networking code, particularily things like NFS, is added to Linux.) However, this presumes that get_free_page() and free_page() are interrupt-level safe, which they may not be once paging is added. If this is the case, we will need to modify malloc() to keep a few unused pages “pre-allocated” so that it can safely draw upon those pages if it is called from an interrupt routine.

Another concern is that get_free_page() should not sleep; if it does, the code is carefully ordered so as to avoid any race conditions. The catch is that if malloc() is called re-entrantly, there is a chance that unecessary pages will be grabbed from the system. Except for the pages for the bucket descriptor page, the extra pages will eventually get released back to the system, though, so it isn’t all that bad.

In short, the comment makes the following points. The allocator is meant to be fast enough to be called from interrupt level. A single request can be at most 4 KB, the size of one page. Each page, called a bucket, holds objects of one size only and goes back to the general free pool once all of its objects have been released. Every bucket is controlled by a bucket descriptor; descriptors live on pages of their own that are never returned to the system, and one such page holds 256 descriptors, enough to manage 1 MB of bucket pages. Finally, malloc() and free() call get_free_page() and free_page() with interrupts disabled, which assumes that those two functions are themselves safe at interrupt level; if get_free_page() sleeps, the code is ordered so that no race arises, at the cost of occasionally grabbing a page it did not need.
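The numbers quoted in the comment can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and the 16-byte descriptor size that lib/malloc.c notes for 32-bit i386; the macro DESC_SIZE and both function names are made up for illustration and do not appear in the kernel.

```c
#define PAGE_SIZE 4096u   /* page size on i386, as in Linux 0.11 */
#define DESC_SIZE 16u     /* assumed sizeof(struct bucket_desc) on 32-bit i386 */

/* Number of bucket descriptors that fit in one descriptor page. */
static unsigned int descs_per_page(void)
{
    return PAGE_SIZE / DESC_SIZE;
}

/* Bytes of bucket pages that a single descriptor page can manage. */
static unsigned long managed_bytes(void)
{
    return (unsigned long) descs_per_page() * PAGE_SIZE;
}
```

Under these assumptions descs_per_page() gives 256 and managed_bytes() gives 1 MB, matching the comment.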

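Data structures

malloc manages allocated memory with buckets. The basic idea is to route each request through a bucket directory according to its size. For example, a request larger than 32 bytes but no larger than 64 bytes is served from the bucket descriptor list pointed to by the third entry of the bucket directory.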
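The size-to-bucket mapping can be sketched on its own. The table below copies the sizes from bucket_dir in lib/malloc.c; the helper bucket_index is hypothetical and only mirrors the search loop that malloc performs, returning -1 where the kernel would panic.

```c
/* Bucket sizes from bucket_dir in lib/malloc.c; 0 marks the end of the list. */
static const unsigned int bucket_sizes[] = {
    16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 0
};

/*
 * Hypothetical helper (not in the kernel source): return the zero-based
 * index of the first directory entry large enough for len bytes, or -1
 * if len exceeds one page.
 */
static int bucket_index(unsigned int len)
{
    for (int i = 0; bucket_sizes[i] != 0; i++)
        if (bucket_sizes[i] >= len)
            return i;
    return -1;
}
```

A 33-byte and a 64-byte request both yield index 2, the third entry, while a 65-byte request moves on to the 128-byte bucket.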

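malloc relies on three data structures: the bucket descriptor, the bucket directory entry, and the bucket directory. The bucket directory is an array of directory entries, and each entry points to a linked list of bucket descriptors. The code is as follows: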

/* lib/malloc.c */

/* Bucket descriptor */
struct bucket_desc {    /* 16 bytes */
    void                *page;          /* page managed by this descriptor */
    struct bucket_desc  *next;          /* next descriptor in the chain */
    void                *freeptr;       /* first free object on the page */
    unsigned short      refcnt;         /* objects currently in use */
    unsigned short      bucket_size;    /* object size for this bucket */
};

/* Bucket directory entry */
struct _bucket_dir {    /* 8 bytes */
    int                 size;           /* object size served by this entry */
    struct bucket_desc  *chain;         /* descriptors for buckets of this size */
};

/*
 * The following is the where we store a pointer to the first bucket
 * descriptor for a given size.
 *
 * If it turns out that the Linux kernel allocates a lot of objects of a
 * specific size, then we may want to add that specific size to this list,
 * since that will allow the memory to be allocated more efficiently.
 * However, since an entire page must be dedicated to each specific size
 * on this list, some amount of temperance must be exercised here.
 *
 * Note that this list *must* be kept in order.
 */
/* Bucket directory */
struct _bucket_dir bucket_dir[] = {
    { 16,   (struct bucket_desc *) 0},
    { 32,   (struct bucket_desc *) 0},
    { 64,   (struct bucket_desc *) 0},
    { 128,  (struct bucket_desc *) 0},
    { 256,  (struct bucket_desc *) 0},
    { 512,  (struct bucket_desc *) 0},
    { 1024, (struct bucket_desc *) 0},
    { 2048, (struct bucket_desc *) 0},
    { 4096, (struct bucket_desc *) 0},
    { 0,    (struct bucket_desc *) 0}};   /* End of list marker */

Image source: 《Linux 内核完全注释》 (modified)

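The malloc function

Initializing the bucket descriptor list

The first time malloc is called, and again whenever the list of free bucket descriptors runs empty, the bucket descriptor list has to be initialized. The code is as follows: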

/* lib/malloc.c */

/*
 * This contains a linked list of free bucket descriptor blocks
 */
struct bucket_desc *free_bucket_desc = (struct bucket_desc *) 0;

/*
 * This routine initializes a bucket description page.
 */
static inline void init_bucket_desc()
{
    struct bucket_desc *bdesc, *first;
    int i;

    first = bdesc = (struct bucket_desc *) get_free_page();
    if (!bdesc)
        panic("Out of memory in init_bucket_desc()");
    for (i = PAGE_SIZE/sizeof(struct bucket_desc); i > 1; i--) {
        bdesc->next = bdesc+1;
        bdesc++;
    }
    /*
     * This is done last, to avoid race conditions in case
     * get_free_page() sleeps and this routine gets called again....
     */
    bdesc->next = free_bucket_desc;
    free_bucket_desc = first;
}

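The code above can be pictured as follows:

Image source: 《Linux 内核完全注释》 (modified)

Note that this list of free bucket descriptors is shared by all bucket directory entries rather than belonging to any single one. Each directory entry does have its own "local" chain of descriptors, but every descriptor on such a chain was taken from the global free list.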
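To see the shared free list grow, the routine can be imitated in user space. This is only a sketch under stated assumptions: malloc() from the C library stands in for get_free_page(), failure is reported through the return value instead of panic(), and the names add_descriptor_page and free_desc_count are invented for the example. On a 64-bit host the struct is larger than the kernel's 16 bytes, so fewer descriptors fit in a page.

```c
#include <stdlib.h>

#define PAGE_SIZE 4096

struct bucket_desc {
    void               *page;
    struct bucket_desc *next;
    void               *freeptr;
    unsigned short      refcnt;
    unsigned short      bucket_size;
};

/* Global free list, shared by every bucket directory entry. */
static struct bucket_desc *free_bucket_desc = NULL;

/*
 * User-space stand-in for init_bucket_desc(): carve one page into
 * descriptors, chain them together, and put them in front of whatever
 * is already on the free list. Returns the number of descriptors
 * added, or 0 if the page could not be obtained.
 */
static size_t add_descriptor_page(void)
{
    size_t n = PAGE_SIZE / sizeof(struct bucket_desc);
    struct bucket_desc *first = malloc(PAGE_SIZE);

    if (!first)
        return 0;
    for (size_t i = 0; i + 1 < n; i++)
        first[i].next = &first[i + 1];
    /* Link the new page in last, as the kernel routine does. */
    first[n - 1].next = free_bucket_desc;
    free_bucket_desc = first;
    return n;
}

/* Count the descriptors currently on the global free list. */
static size_t free_desc_count(void)
{
    size_t count = 0;

    for (struct bucket_desc *d = free_bucket_desc; d; d = d->next)
        count++;
    return count;
}
```

Calling add_descriptor_page() twice leaves both pages' descriptors on the single list, which is the sense in which the list is global.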

The malloc function

malloc is defined as follows:

/* lib/malloc.c */

void *malloc(unsigned int len)
{
    struct _bucket_dir  *bdir;
    struct bucket_desc  *bdesc;
    void                *retval;

    /*
     * First we search the bucket_dir to find the right bucket change
     * for this request.
     */
    for (bdir = bucket_dir; bdir->size; bdir++)
        if (bdir->size >= len)
            break;
    if (!bdir->size) {
        printk("malloc called with impossibly large argument (%d)\n",
            len);
        panic("malloc: bad arg");
    }
    /*
     * Now we search for a bucket descriptor which has free space
     */
    cli();  /* Avoid race conditions */
    for (bdesc = bdir->chain; bdesc; bdesc = bdesc->next)
        if (bdesc->freeptr)
            break;
    /*
     * If we didn't find a bucket with free space, then we'll
     * allocate a new one.
     */
    if (!bdesc) {
        char    *cp;
        int     i;

        if (!free_bucket_desc)
            init_bucket_desc();
        bdesc = free_bucket_desc;
        free_bucket_desc = bdesc->next;
        bdesc->refcnt = 0;
        bdesc->bucket_size = bdir->size;
        /* Relies on the old GCC cast-as-lvalue extension; modern compilers reject it. */
        bdesc->page = bdesc->freeptr = (void *) cp = get_free_page();
        if (!cp)
            panic("Out of memory in kernel malloc()");
        /* Set up the chain of free objects */
        for (i=PAGE_SIZE/bdir->size; i > 1; i--) {
            /* Key step: each free object stores the address of the next one. */
            *((char **) cp) = cp + bdir->size;
            cp += bdir->size;
        }
        *((char **) cp) = 0;
        bdesc->next = bdir->chain; /* OK, link it in! */
        bdir->chain = bdesc;
    }
    retval = (void *) bdesc->freeptr;
    bdesc->freeptr = *((void **) retval);   /* advance to the next free object */
    bdesc->refcnt++;
    sti();  /* OK, we're safe again */
    return(retval);
}
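The least obvious part of malloc is the free list threaded through the page itself: a free object needs no separate bookkeeping because its first bytes hold the address of the next free object. The following user-space sketch isolates that idea. The names carve_page and pop_object are invented for the example, and the caller is assumed to supply a suitably aligned page and an object size that divides PAGE_SIZE and is at least the size of a pointer.

```c
#include <stddef.h>

#define PAGE_SIZE 4096

/*
 * Thread a free list through a raw page. The first bytes of every free
 * object store the address of the next free object; the last object
 * stores NULL. Returns the head of the list, which is the page itself.
 */
static void *carve_page(char *page, size_t obj_size)
{
    char *cp = page;

    for (size_t i = PAGE_SIZE / obj_size; i > 1; i--) {
        *((void **) cp) = cp + obj_size;
        cp += obj_size;
    }
    *((void **) cp) = NULL;
    return page;
}

/*
 * Pop one object off the list, as the tail of malloc() does with
 * bdesc->freeptr. Returns NULL once the page is exhausted.
 */
static void *pop_object(void **freeptr)
{
    void *obj = *freeptr;

    if (obj)
        *freeptr = *((void **) obj);
    return obj;
}
```

With 64-byte objects a page yields 64 allocations: the first pop returns the start of the page, the second returns the address 64 bytes further on, and so on until the list head becomes NULL.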

The free_s function

free_s is defined as follows:

/* lib/malloc.c */

/*
 * Here is the free routine.  If you know the size of the object that you
 * are freeing, then free_s() will use that information to speed up the
 * search for the bucket descriptor.
 *
 * We will #define a macro so that "free(x)" is becomes "free_s(x, 0)"
 */
void free_s(void *obj, int size)
{
    void                *page;
    struct _bucket_dir  *bdir;
    struct bucket_desc  *bdesc, *prev;

    /* Calculate what page this object lives in */
    page = (void *) ((unsigned long) obj & 0xfffff000);
    /* Now search the buckets looking for that page */
    for (bdir = bucket_dir; bdir->size; bdir++) {
        prev = 0;
        /* If size is zero then this conditional is always false */
        if (bdir->size < size)
            continue;
        for (bdesc = bdir->chain; bdesc; bdesc = bdesc->next) {
            if (bdesc->page == page)
                goto found;
            prev = bdesc;
        }
    }
    panic("Bad address passed to kernel free_s()");
found:
    cli(); /* To avoid race conditions */
    *((void **)obj) = bdesc->freeptr;
    bdesc->freeptr = obj;
    bdesc->refcnt--;
    /* A reference count of 0 means every object on this page is free. */
    if (bdesc->refcnt == 0) {
        /*
         * We need to make sure that prev is still accurate.  It
         * may not be, if someone rudely interrupted us....
         */
        if ((prev && (prev->next != bdesc)) ||
            (!prev && (bdir->chain != bdesc)))
            for (prev = bdir->chain; prev; prev = prev->next)
                if (prev->next == bdesc)
                    break;
        if (prev)
            prev->next = bdesc->next;
        else {
            if (bdir->chain != bdesc)
                panic("malloc bucket chains corrupted");
            bdir->chain = bdesc->next;
        }
        free_page((unsigned long) bdesc->page);
        bdesc->next = free_bucket_desc;
        free_bucket_desc = bdesc;
    }
    sti();
    return;
}
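Two small steps carry most of free_s: masking an address down to its page, and pushing the freed object back onto the head of the page's free list. The sketch below shows both in portable user-space C. The names page_of and push_object are hypothetical, and uintptr_t replaces the kernel's 32-bit unsigned long so that the mask also works on 64-bit hosts.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * Round an address down to the start of its page. On 32-bit i386 this
 * is the same as masking with 0xfffff000, as free_s() does.
 */
static uintptr_t page_of(uintptr_t addr)
{
    return addr & ~(uintptr_t) (PAGE_SIZE - 1);
}

/*
 * Push obj onto the head of a free list: the object's first bytes are
 * overwritten with the old head, then the head is moved to the object.
 * This mirrors the two assignments that follow the found: label.
 */
static void push_object(void **freeptr, void *obj)
{
    *((void **) obj) = *freeptr;
    *freeptr = obj;
}
```

Because the list is last-in, first-out, the object freed most recently is the next one malloc hands out from that page.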

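Modern malloc implementations

A modern Linux system has kmalloc for kernel code (the function called malloc in Linux 0.11) and a separate malloc for user programs.

Here we mean the malloc for user programs. Its implementation is fairly complex and is best studied from dedicated reference material.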
