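A malloc Memory Allocation Problem

Problem description

I recently ran into an out-of-bounds write that only sometimes causes a segmentation fault. The setup: allocate a block with malloc, then use one byte more than was requested, for example request 52 bytes and use 53, or request 50 bytes and use 51. What I observed is that requesting 52 bytes and using 53 always crashes the program, while requesting 50 bytes and using 51 does not. Both are out-of-bounds accesses, so why do they behave differently?

Investigation

To find out, I ran an experiment to check whether the amount of memory requested matches the amount that is actually usable. malloc_usable_size reports the usable size of an allocated block.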

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>

/* Print the bytes from startAddr up to (not including) endAddr, four per line. */
void printAddrData1Byte(void* startAddr, void* endAddr)
{
    printf("print startAddr = %p to endAddr = %p data\n", startAddr, endAddr);
    char* pMove = (char*)startAddr;
    int i = 0;
    while (((char*)endAddr - pMove) != 0) {
        printf("%x  ", (unsigned char)*pMove);
        pMove += 1;
        i++;
        if (!(i % 4))
            printf("\n");
    }
}

int main()
{
    char *p  = (char *)malloc(0);
    char *p1 = (char *)malloc(13);
    char *p2 = (char *)malloc(21);
    char *p3 = (char *)malloc(29);
    char *p4 = (char *)malloc(37);

    /* malloc_usable_size returns size_t, so print it with %zu rather than %d. */
    printf("p size %zu\n", malloc_usable_size(p));
    printf("p1 size %zu\n", malloc_usable_size(p1));
    printf("p2 size %zu\n", malloc_usable_size(p2));
    printf("p3 size %zu\n", malloc_usable_size(p3));
    printf("p4 size %zu\n", malloc_usable_size(p4));

    printf("p addr is %p\n", (void *)p);
    printf("p1 addr is %p\n", (void *)p1);
    printf("p2 addr is %p\n", (void *)p2);
    printf("p3 addr is %p\n", (void *)p3);
    printf("p4 addr is %p\n", (void *)p4);

    free(p);
    free(p1);
    free(p2);
    free(p3);
    free(p4);
    return 0;
}

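Test results:

The results show that a request for 0 bytes yields 12 usable bytes, a request for 13 yields 20, a request for 21 yields 28, and so on. On a 32-bit system malloc hands out memory in steps of 8 bytes: it does not allocate exactly the number of bytes requested, and in most cases you get a few more. The usable sizes seen here (12, 20, 28, ...) are all 4 plus a multiple of 8, so the usable size equals the requested size only when the request itself has that form, such as 12, 20 or 52.

This explains the behavior. Requesting 52 bytes and using 53 crashes because the write really does go out of bounds: the usable size is exactly 52, so the 53rd byte lands on the allocator's bookkeeping data that follows the chunk. Requesting 50 bytes and using 51 stays within the memory the allocator actually handed out, because the usable size is 52, so no segmentation fault occurs. Writing past the requested size is still a bug; it just happens to go unnoticed here.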
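The rounding seen in the measurements can be written down as a small formula. The sketch below is a model inferred from the numbers above, not glibc's actual code: `model_usable_size` and its `size_sz` parameter are hypothetical names, and it assumes the alignment is twice the width of the size field and the smallest chunk is four times that width.

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of a chunk's usable size.
 * size_sz is the width of the size field in bytes: 4 models the 32-bit
 * system used in this article, 8 models a typical 64-bit build.
 * Assumptions: alignment = 2 * size_sz, minimum chunk = 4 * size_sz.
 */
size_t model_usable_size(size_t request, size_t size_sz)
{
    size_t alignment = 2 * size_sz;
    size_t min_chunk = 4 * size_sz;
    size_t chunk = align_up(request + size_sz, alignment);

    if (chunk < min_chunk)
        chunk = min_chunk;
    return chunk - size_sz;   /* everything except the size field */
}
```

Under these assumptions, model_usable_size(50, 4) and model_usable_size(52, 4) both give 52, while model_usable_size(53, 4) gives 60, which is consistent with the 50/51 versus 52/53 behavior described above.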

malloc_chunk

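Looking at the addresses in the test output, you may notice that 0x9104008 and 0x9104018 are 16 bytes apart, yet the usable size of p is only 12. Where do the extra 4 bytes go?

With that question in mind, let's look at the malloc source code, which defines the following malloc_chunk structure: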

struct malloc_chunk {

  INTERNAL_SIZE_T      prev_size;  /* Size of previous chunk (if free).  */
  INTERNAL_SIZE_T      size;       /* Size in bytes, including overhead. */

  struct malloc_chunk* fd;         /* double links -- used only if free. */
  struct malloc_chunk* bk;

  /* Only used for large blocks: pointer to next larger size.  */
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk, if allocated            | |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             User data starts here...                          .
            .                                                               .
            .             (malloc_usable_size() bytes)                      .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of chunk                                     |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, (always on an even word
    boundary, i.e. aligned to 2 * sizeof(size_t)) so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned (double-word alignment).

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Size of previous chunk                            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Forward pointer to next chunk in list             |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Back pointer to previous chunk in list            |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |             Unused space (may be 0 bytes long)                .
            .                                                               .
            .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
        trailing size field since there is no next contiguous chunk
        that would have to index off it. After initialization, `top'
        is forced to always exist.  If it would become less than
        MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
        bit M (IS_MMAPPED) set in their size fields.  Because they are
        allocated one-by-one, each must contain its own trailing size field.
*/

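When a chunk is free, it needs at least the four fields prev_size, size, fd and bk, so MINSIZE is the amount of memory those four fields occupy. When the previous chunk is in use, this chunk's prev_size field is not needed for bookkeeping and is used by the previous chunk to hold user data; a chunk's own fd and bk are likewise used as data storage while it is allocated. An in-use chunk therefore only has to maintain its size field. MIN_CHUNK_SIZE is the smallest chunk malloc will create. So on a 32-bit system even malloc(0) produces a chunk of 4 * sizeof(size_t) = 16 bytes; subtracting the 4-byte size field leaves 12 bytes usable by the user, which matches the measurement above. prev_size, fd and bk only come into play when the chunk concerned is free.

So the extra 4 bytes mentioned above are the size field. But is size used only to store the size of the chunk? Not quite: its low bits also carry flags, which the chunk layout makes easier to see.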

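A few points are worth spelling out:

The chunk pointer points to the start of the chunk; the mem pointer points to the start of the user memory.
P=0 means the previous chunk is free; only then is prev_size valid.
P=1 means the previous chunk is in use and prev_size is not valid.
P is mainly used when merging chunks; the first chunk ptmalloc allocates always has P set to 1, which keeps the program from referencing memory that does not exist.
M=1 means the chunk was allocated from an mmap-mapped region; M=0 means it was allocated from the heap.
A=0 means the chunk belongs to the main arena; A=1 means it belongs to a non-main arena.
A free chunk has no M state, only A and P.
When a chunk is free, the area that used to hold user data stores up to four pointers. fd points to the next free chunk and bk points to the previous free chunk; malloc uses these two pointers to link chunks of similar size into a doubly linked list. A free chunk in a large bin has two more pointers, fd_nextsize and bk_nextsize, which speed up the search for the closest-fitting free chunk in that bin. The chunk lists themselves are organized through bins or fastbins.

Put another way, a chunk can be seen as having a header and a footer that both store the size of the chunk; because the footer lies in the region of the next chunk, it becomes that chunk's prev_size. While a chunk is in use, every field except size is used to store data, which increases the chunk's payload. Computer Systems: A Programmer's Perspective describes the same idea: the header and footer both hold the size of the current block, allocated blocks no longer need a footer, and the footer is needed only when the preceding block is free.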

Verification

Is that really how it works? I wrote the following test program:

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <string.h>

/* Print the bytes from startAddr up to (not including) endAddr, four per line. */
void printAddrData1Byte(void* startAddr, void* endAddr)
{
    printf("print startAddr = %p to endAddr = %p data\n", startAddr, endAddr);
    char* pMove = (char*)startAddr;
    int i = 0;
    while (((char*)endAddr - pMove) != 0) {
        printf("%x  ", (unsigned char)*pMove);
        pMove += 1;
        i++;
        if (!(i % 4))
            printf("\n");
    }
}

int main()
{
    /* The size field sits immediately before the user pointer and is
       sizeof(size_t) bytes wide (4 on the 32-bit system used here). */
    const size_t hdr = sizeof(size_t);

    char *p  = (char *)malloc(0);
    char *p1 = (char *)malloc(13);
    char *p2 = (char *)malloc(21);
    char *p3 = (char *)malloc(29);
    char *p4 = (char *)malloc(37);
    char *p5 = (char *)malloc(132 * 1024);

    /* Dump the size field of each chunk. */
    printAddrData1Byte(p - hdr, p);
    printAddrData1Byte(p1 - hdr, p1);
    printAddrData1Byte(p2 - hdr, p2);
    printAddrData1Byte(p3 - hdr, p3);
    printAddrData1Byte(p4 - hdr, p4);
    printAddrData1Byte(p5 - hdr, p5);

    /* malloc_usable_size returns size_t, so print it with %zu rather than %d. */
    printf("p size %zu\n", malloc_usable_size(p));
    printf("p1 size %zu\n", malloc_usable_size(p1));
    printf("p2 size %zu\n", malloc_usable_size(p2));
    printf("p3 size %zu\n", malloc_usable_size(p3));
    printf("p4 size %zu\n", malloc_usable_size(p4));
    printf("p5 size %zu\n", malloc_usable_size(p5));

    printf("p addr is %p\n", (void *)p);
    printf("p1 addr is %p\n", (void *)p1);
    printf("p2 addr is %p\n", (void *)p2);
    printf("p3 addr is %p\n", (void *)p3);
    printf("p4 addr is %p\n", (void *)p4);
    printf("p5 addr is %p\n", (void *)p5);

    free(p);
    free(p1);
    free(p2);
    free(p3);
    free(p4);
    free(p5);
    return 0;
}

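Test results:

The results show that when the usable size is 12, the first byte of the header is 0x11, which is 0001 0001 in binary; when the usable size is 20, the first byte is 0x19, which is 0001 1001; and so on. The low three bits do not change, which matches the statement above that the low three bits are the A, M and P flags. Here only P is set, meaning the previous chunk is in use, and the remaining bits give the chunk size (0x10 = 16 and 0x18 = 24, each 4 bytes more than the usable size). The experiment agrees with the source.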
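Splitting a header value into its size and flag bits is a matter of masking. The following is a minimal sketch: the three macro names follow the flag names used in glibc's malloc.c, while `struct chunk_header` and `decode_size_field` are hypothetical helpers introduced only for this illustration.

```c
#include <stddef.h>

/* Flag bits kept in the low three bits of the size field. */
#define PREV_INUSE     0x1u  /* P: previous chunk is in use */
#define IS_MMAPPED     0x2u  /* M: chunk was obtained with mmap */
#define NON_MAIN_ARENA 0x4u  /* A: chunk belongs to a non-main arena */

/* Hypothetical helper type holding the decoded fields. */
struct chunk_header {
    size_t size;  /* chunk size with the flag bits masked off */
    int p;
    int m;
    int a;
};

/* Decode a raw size-field value such as 0x11 or 0x19. */
struct chunk_header decode_size_field(size_t raw)
{
    struct chunk_header h;

    h.size = raw & ~(size_t)0x7;
    h.p = (raw & PREV_INUSE) != 0;
    h.m = (raw & IS_MMAPPED) != 0;
    h.a = (raw & NON_MAIN_ARENA) != 0;
    return h;
}
```

Decoding 0x11 this way gives size 16 with P=1, M=0, A=0, and 0x19 gives size 24 with the same flags, matching the bytes dumped by the test program.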

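Allocations larger than 128 KB use mmap

You may also have noticed that p5 requested 132 KB, that its M flag is set to 1, and that its address is unlike the others: it is not adjacent to p4.

As mentioned earlier, M=1 marks a chunk allocated from an mmap-mapped region. So when does malloc allocate that way? The source below gives the answer: requests of at least mmap_threshold bytes are served with mmap, and the default minimum value of the threshold is 128 KB.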

  The maximum overhead wastage (i.e., number of extra bytes
  allocated than were requested in malloc) is less than or equal
  to the minimum size, except for requests >= mmap_threshold that
  are serviced via mmap(), where the worst case wastage is 2 *
  sizeof(size_t) bytes plus the remainder from a system page (the
  minimal mmap unit); typically 4096 or 8192 bytes.

/*
  MMAP_THRESHOLD_MAX and _MIN are the bounds on the dynamically
  adjusted MMAP_THRESHOLD.
*/

#ifndef DEFAULT_MMAP_THRESHOLD_MIN
#define DEFAULT_MMAP_THRESHOLD_MIN (128 * 1024)
#endif

#ifndef DEFAULT_MMAP_THRESHOLD_MAX
  /* For 32-bit platforms we cannot increase the maximum mmap
     threshold much because it is also the minimum value for the
     maximum heap size and its alignment.  Going above 512k (i.e., 1M
     for new heaps) wastes too much address space.  */
# if __WORDSIZE == 32
#  define DEFAULT_MMAP_THRESHOLD_MAX (512 * 1024)
# else
#  define DEFAULT_MMAP_THRESHOLD_MAX (4 * 1024 * 1024 * sizeof(long))
# endif
#endif

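How memory allocation works

You may then ask what is different about allocating from an mmap-mapped region.

From the operating system's point of view, a process can obtain memory in two ways, through two system calls: brk and mmap (leaving shared memory aside). brk pushes _edata, the pointer to the highest address of the data segment (.data), toward higher addresses. Strictly speaking the pointer that brk moves is the program break, the top of the heap; this article calls it _edata throughout. mmap finds a free range of virtual memory in the process's virtual address space (between the heap and the stack, in what is called the file mapping region).

Both methods allocate virtual memory only; no physical memory is allocated. The first access to the newly allocated virtual address range triggers a page fault, at which point the operating system allocates physical memory and sets up the mapping between virtual and physical memory.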
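One way to observe this lazy allocation is to count page faults around the first touch of freshly mapped memory. The sketch below assumes a Linux-like system with mmap and getrusage; `minor_faults` and `faults_from_touching` are hypothetical helper names, the page size is passed in by the caller (4096 is a common value), and the exact count depends on the kernel and its configuration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Current number of minor page faults of this process, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map `bytes` of anonymous memory, write one byte per page, and return
 * how many minor page faults occurred while touching it (-1 on failure).
 */
long faults_from_touching(size_t bytes, size_t page_size)
{
    void *raw = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    volatile char *mem = raw;
    long before, after;

    if (raw == MAP_FAILED)
        return -1;

    before = minor_faults();          /* mapping exists, nothing touched yet */
    for (size_t off = 0; off < bytes; off += page_size)
        mem[off] = 1;                 /* first write to each page */
    after = minor_faults();

    munmap(raw, bytes);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_from_touching(1024 * 1024, 4096) maps 1 MB and touches 256 pages; on a typical system the returned count is on the order of the number of pages touched, though features such as transparent huge pages can make it much smaller.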

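The standard C library provides malloc and free to allocate and release memory. Underneath, these two functions are implemented with the brk, mmap and munmap system calls.

The following example illustrates how allocation works:

Case 1: malloc of less than 128 KB

A malloc of less than 128 KB uses brk and pushes _edata toward higher addresses. Only virtual address space is allocated, with no physical memory behind it (and therefore nothing is initialized); the first read or write triggers a page fault in the kernel, and only then does the kernel allocate the physical memory and map it into the virtual address space, as shown in the figure below:

When the process starts, the initial layout of its (virtual) address space is as shown in (1) of the figure. Memory-mapped files (for example libc-2.2.93.so and other data files) sit between the heap and the stack; they are omitted here for simplicity. The _edata pointer (defined in glibc) points to the highest address of the data segment.
After the process calls A=malloc(30K), the address space looks like (2). malloc invokes the brk system call and pushes _edata up by 30K, which completes the virtual memory allocation. You may ask: is moving _edata by 30K really all there is to it? It is. _edata+30K only allocates virtual addresses; A has no physical pages behind it yet. When the process first reads or writes A, a page fault occurs, and only then does the kernel allocate the physical pages for A. In other words, if A is allocated with malloc and never accessed, the physical pages for A are never allocated.
After the process calls B=malloc(40K), the address space looks like (3).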
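The movement of the heap top can be observed with sbrk(0), which returns the current program break. The sketch below assumes a glibc-style allocator on Linux; `break_growth_for` is a hypothetical helper name. The result is often 0, either because the allocator already holds spare heap space from an earlier extension or because the request was served by mmap.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk under strict -std= modes */
#endif
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return how many bytes the program break moved across one malloc(n).
 * The block is freed again before returning.
 */
long break_growth_for(size_t n)
{
    intptr_t before = (intptr_t)sbrk(0);   /* break before the request */
    void *block = malloc(n);
    intptr_t after = (intptr_t)sbrk(0);    /* break after the request */

    /* Touch the block through a volatile pointer so the compiler cannot
       optimize the malloc/free pair away. */
    if (block != NULL)
        ((volatile char *)block)[0] = 1;
    free(block);
    return (long)(after - before);
}
```

When the break does move, it usually moves by more than the requested size, since the allocator extends the heap in larger steps and then carves later small requests out of the spare space.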

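Case 2: malloc of more than 128 KB

A malloc of more than 128 KB uses mmap, which finds a free region between the heap and the stack (an independent mapping, initialized to 0), as shown in the figure below:

After the process calls C=malloc(200K), the address space looks like (4). By default, when a request is larger than 128 KB (adjustable through the M_MMAP_THRESHOLD option), malloc does not push _edata; it uses the mmap system call to allocate a region of virtual memory between the heap and the stack. The main reason is that memory obtained with brk can only be returned to the system after the memory above it has been released (for example, A cannot be returned before B is; this is how memory fragmentation arises, and when the heap gets trimmed is covered below), whereas memory obtained with mmap can be released on its own. There are other advantages and disadvantages as well; readers who want the details can read the malloc code in glibc.
After the process calls D=malloc(100K), the address space looks like (5).
After the process calls free(C), the virtual memory and the physical memory behind C are released together.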
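What malloc does for a large request can be imitated by calling mmap and munmap directly. This is a minimal sketch, assuming a Linux-like system; `map_check_release` is a hypothetical helper, and it is not how glibc itself is written. It also checks the claim that a fresh anonymous mapping reads as zero.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `bytes` of anonymous memory, check that it is zero-filled, and
 * unmap it again.
 * Returns 1 if every byte was zero, 0 if not, -1 if mmap or munmap failed.
 */
int map_check_release(size_t bytes)
{
    unsigned char *mem = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int all_zero = 1;

    if (mem == MAP_FAILED)
        return -1;

    for (size_t i = 0; i < bytes; i++) {
        if (mem[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    /* The mapping is independent of the heap, so it can be returned to
       the system immediately, regardless of other allocations. */
    if (munmap(mem, bytes) != 0)
        return -1;
    return all_zero;
}
```

For example, map_check_release(200 * 1024) mirrors the C=malloc(200K) followed by free(C) steps: the region is obtained and given back without touching the brk heap.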

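After the process calls free(B), as shown in (7): neither the virtual memory nor the physical memory behind B is released, because there is only one _edata pointer; if it were pulled back, what would happen to D? B can be reused, though: if a 40K request arrives at this point, malloc will very likely hand B back out.
After the process calls free(D), as shown in (8): B and D are merged into one 140K free block.
By default, when the free memory at the highest addresses exceeds 128 KB (adjustable through the M_TRIM_THRESHOLD option), malloc performs a trim. The free in the previous step finds more than 128 KB of free memory at the top of the heap, so the heap is trimmed, giving (9).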
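The free, merge and trim steps can be replayed with a toy model of the brk heap. This is only an illustration of the rule described above, not glibc's algorithm: all names (`toy_heap`, `toy_alloc`, `toy_free`, `toy_break`, `article_scenario_kib`) are hypothetical, blocks are only ever appended at the top, and adjacent free blocks at the top are treated as one free run. C is left out because it lives in an mmap region rather than in the heap.

```c
#include <stddef.h>

#define KIB 1024u
#define TOY_MAX_BLOCKS 8
#define TOY_TRIM_THRESHOLD (128 * KIB)   /* default quoted in the text */

struct toy_heap {
    size_t size[TOY_MAX_BLOCKS];  /* block sizes in address order */
    int    used[TOY_MAX_BLOCKS];  /* 1 = allocated, 0 = free */
    int    count;                 /* blocks below the toy "break" */
};

/* Append a block at the top of the heap (models brk growing). */
static int toy_alloc(struct toy_heap *h, size_t n)
{
    if (h->count == TOY_MAX_BLOCKS)
        return -1;
    h->size[h->count] = n;
    h->used[h->count] = 1;
    return h->count++;
}

/* Distance from the heap start to the toy "break". */
static size_t toy_break(const struct toy_heap *h)
{
    size_t total = 0;

    for (int i = 0; i < h->count; i++)
        total += h->size[i];
    return total;
}

/* Free block i; if the free run at the top exceeds the threshold, trim it. */
static void toy_free(struct toy_heap *h, int i)
{
    size_t top_free = 0;
    int first_free = h->count;

    h->used[i] = 0;
    while (first_free > 0 && !h->used[first_free - 1]) {
        first_free--;
        top_free += h->size[first_free];
    }
    if (top_free > TOY_TRIM_THRESHOLD)
        h->count = first_free;    /* models pulling the break back down */
}

/* Replay A=30K, B=40K, D=100K, optionally free B and D; return break in KiB. */
size_t article_scenario_kib(int free_b, int free_d)
{
    struct toy_heap h = { .count = 0 };
    int b, d;

    toy_alloc(&h, 30 * KIB);      /* A */
    b = toy_alloc(&h, 40 * KIB);  /* B */
    d = toy_alloc(&h, 100 * KIB); /* D */
    if (free_b)
        toy_free(&h, b);
    if (free_d)
        toy_free(&h, d);
    return toy_break(&h) / KIB;
}
```

In this model article_scenario_kib(1, 0) stays at 170, because B is free but D above it is still in use, while article_scenario_kib(1, 1) drops to 30, because the 140K free run at the top exceeds the 128K threshold and is trimmed.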

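Since heap memory obtained through brk and sbrk cannot be released directly, why not allocate everything with mmap and release it with munmap?

Fragments inside the heap cannot be returned directly, which can look like a "memory leak". So why doesn't malloc implement everything with mmap (memory allocated with mmap is freed through munmap and is truly released), instead of using mmap only for large blocks above 128 KB?

The interfaces a process uses to request and release address space from the OS, sbrk, mmap and munmap, are all system calls, and calling them frequently is expensive. In addition, once memory obtained with mmap has been released with munmap, requesting it again causes more page faults. For example, allocating 1 MB with mmap causes a large number of page faults on first use (1M/4K = 256 of them); after munmap, allocating 1 MB again causes the same page faults all over again. Page faults are handled by the kernel and consume a good deal of kernel-mode CPU time. Also, using mmap for small allocations fragments the address space further and increases the kernel's management burden.

The heap, by contrast, is one contiguous region, and fragments inside it are not returned to the OS. If a fragment can be reused, accessing that memory again will most likely need no system call and no page fault, which greatly reduces CPU cost. glibc's malloc therefore takes the behavioral differences and trade-offs between sbrk and mmap into account: by default it uses mmap only for large blocks (128 KB and above), and the threshold can be changed with mallopt(M_MMAP_THRESHOLD, <SIZE>).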
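Changing the threshold takes a single mallopt call. The sketch below assumes glibc, where mallopt is declared in malloc.h and returns 1 on success and 0 on error; `set_mmap_threshold` is a hypothetical wrapper name.

```c
#include <malloc.h>
#include <stddef.h>

/*
 * Ask the allocator to serve requests of at least `bytes` bytes with mmap.
 * Returns 1 on success and 0 on error, as mallopt does.
 */
int set_mmap_threshold(size_t bytes)
{
    /* mallopt takes the new value as an int. */
    return mallopt(M_MMAP_THRESHOLD, (int)bytes);
}
```

For example, set_mmap_threshold(64 * 1024) lowers the threshold to 64 KB. According to the mallopt manual page, setting this parameter explicitly also turns off glibc's dynamic adjustment of the threshold, so the value stays where it was set.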

References:

https://blog.csdn.net/Hungxum/article/details/92062666?d=1568702329780

https://blog.csdn.net/yusiguyuan/article/details/39496057