Contents

  • Posing the problem
  • Structure design
    • The actual entry
  • Basic operations
      • Re-encoding
      • Parsing the data
      • Reallocating space
      • Inserting the data

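Posing the problem

Have you used Python lists? The data structure that can hold values of any type and supports random access.
If you haven't, well, there's not much I can do about that.

In principle such a list can be built on top of an array or a linked list. (CPython's list, for the record, is a dynamic array of object pointers.)
The Redis list looked at here, however, uses neither a plain linked list nor a plain array as its underlying implementation, and the reasons are fairly obvious. An array is awkward: a two-dimensional one? A flexible one? How would you even write that? A linked list would work: a generic linked list with a void* in the data field can do the job of a list. But its drawback is just as obvious: it easily leads to memory fragmentation.

Against this backdrop, and guided by the principle of "save wherever you can", please design a data structure.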


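Structure design

One thing to note in this figure: the right-hand side does not record the "size of the current element".

The figure is pretty detailed, which saves me from explaining every field. Nice.

Everything else is spelled out clearly in the comment at the top of the file (ziplist.c).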

/* The ziplist is a specially encoded dually linked list that is designed
 * to be very memory efficient. It stores both strings and integer values,
 * where integers are encoded as actual integers instead of a series of
 * characters. It allows push and pop operations on either side of the list
 * in O(1) time. However, because every operation requires a reallocation of
 * the memory used by the ziplist, the actual complexity is related to the
 * amount of memory used by the ziplist.
 *
 * ----------------------------------------------------------------------------
 *
 * ZIPLIST OVERALL LAYOUT
 * ======================
 *
 * The general layout of the ziplist is as follows:
 *
 * <zlbytes> <zltail> <zllen> <entry> <entry> ... <entry> <zlend>
 *
 * NOTE: all fields are stored in little endian, if not specified otherwise.
 *
 * <uint32_t zlbytes> is an unsigned integer to hold the number of bytes that
 * the ziplist occupies, including the four bytes of the zlbytes field itself.
 * This value needs to be stored to be able to resize the entire structure
 * without the need to traverse it first.
 *
 * <uint32_t zltail> is the offset to the last entry in the list. This allows
 * a pop operation on the far side of the list without the need for full
 * traversal.
 *
 * <uint16_t zllen> is the number of entries. When there are more than
 * 2^16-2 entries, this value is set to 2^16-1 and we need to traverse the
 * entire list to know how many items it holds.
 *
 * <uint8_t zlend> is a special entry representing the end of the ziplist.
 * Is encoded as a single byte equal to 255. No other normal entry starts
 * with a byte set to the value of 255.
 *
 * ZIPLIST ENTRIES
 * ===============
 *
 * Every entry in the ziplist is prefixed by metadata that contains two pieces
 * of information. First, the length of the previous entry is stored to be
 * able to traverse the list from back to front. Second, the entry encoding is
 * provided. It represents the entry type, integer or string, and in the case
 * of strings it also represents the length of the string payload.
 * So a complete entry is stored like this:
 *
 * <prevlen> <encoding> <entry-data>
 *
 * Sometimes the encoding represents the entry itself, like for small integers
 * as we'll see later. In such a case the <entry-data> part is missing, and we
 * could have just:
 *
 * <prevlen> <encoding>
 *
 * The length of the previous entry, <prevlen>, is encoded in the following way:
 * If this length is smaller than 254 bytes, it will only consume a single
 * byte representing the length as an unsinged 8 bit integer. When the length
 * is greater than or equal to 254, it will consume 5 bytes. The first byte is
 * set to 254 (FE) to indicate a larger value is following. The remaining 4
 * bytes take the length of the previous entry as value.
 *
 * So practically an entry is encoded in the following way:
 *
 * <prevlen from 0 to 253> <encoding> <entry>
 *
 * Or alternatively if the previous entry length is greater than 253 bytes
 * the following encoding is used:
 *
 * 0xFE <4 bytes unsigned little endian prevlen> <encoding> <entry>
 *
 * The encoding field of the entry depends on the content of the
 * entry. When the entry is a string, the first 2 bits of the encoding first
 * byte will hold the type of encoding used to store the length of the string,
 * followed by the actual length of the string. When the entry is an integer
 * the first 2 bits are both set to 1. The following 2 bits are used to specify
 * what kind of integer will be stored after this header. An overview of the
 * different types and encodings is as follows. The first byte is always enough
 * to determine the kind of entry.
 *
 * |00pppppp| - 1 byte
 *      String value with length less than or equal to 63 bytes (6 bits).
 *      "pppppp" represents the unsigned 6 bit length.
 * |01pppppp|qqqqqqqq| - 2 bytes
 *      String value with length less than or equal to 16383 bytes (14 bits).
 *      IMPORTANT: The 14 bit number is stored in big endian.
 * |10000000|qqqqqqqq|rrrrrrrr|ssssssss|tttttttt| - 5 bytes
 *      String value with length greater than or equal to 16384 bytes.
 *      Only the 4 bytes following the first byte represents the length
 *      up to 2^32-1. The 6 lower bits of the first byte are not used and
 *      are set to zero.
 *      IMPORTANT: The 32 bit number is stored in big endian.
 * |11000000| - 3 bytes
 *      Integer encoded as int16_t (2 bytes).
 * |11010000| - 5 bytes
 *      Integer encoded as int32_t (4 bytes).
 * |11100000| - 9 bytes
 *      Integer encoded as int64_t (8 bytes).
 * |11110000| - 4 bytes
 *      Integer encoded as 24 bit signed (3 bytes).
 * |11111110| - 2 bytes
 *      Integer encoded as 8 bit signed (1 byte).
 * |1111xxxx| - (with xxxx between 0000 and 1101) immediate 4 bit integer.
 *      Unsigned integer from 0 to 12. The encoded value is actually from
 *      1 to 13 because 0000 and 1111 can not be used, so 1 should be
 *      subtracted from the encoded 4 bit value to obtain the right value.
 * |11111111| - End of ziplist special entry.
 *
 * Like for the ziplist header, all the integers are represented in little
 * endian byte order, even when this code is compiled in big endian systems.
 *
 * EXAMPLES OF ACTUAL ZIPLISTS
 * ===========================
 *
 * The following is a ziplist containing the two elements representing
 * the strings "2" and "5". It is composed of 15 bytes, that we visually
 * split into sections:
 *
 *  [0f 00 00 00] [0c 00 00 00] [02 00] [00 f3] [02 f6] [ff]
 *        |             |          |       |       |     |
 *     zlbytes        zltail    entries   "2"     "5"   end
 *
 * The first 4 bytes represent the number 15, that is the number of bytes
 * the whole ziplist is composed of. The second 4 bytes are the offset
 * at which the last ziplist entry is found, that is 12, in fact the
 * last entry, that is "5", is at offset 12 inside the ziplist.
 * The next 16 bit integer represents the number of elements inside the
 * ziplist, its value is 2 since there are just two elements inside.
 * Finally "00 f3" is the first entry representing the number 2. It is
 * composed of the previous entry length, which is zero because this is
 * our first entry, and the byte F3 which corresponds to the encoding
 * |1111xxxx| with xxxx between 0001 and 1101. We need to remove the "F"
 * higher order bits 1111, and subtract 1 from the "3", so the entry value
 * is "2". The next entry has a prevlen of 02, since the first entry is
 * composed of exactly two bytes. The entry itself, F6, is encoded exactly
 * like the first entry, and 6-1 = 5, so the value of the entry is 5.
 * Finally the special entry FF signals the end of the ziplist.
 *
 * Adding another element to the above string with the value "Hello World"
 * allows us to show how the ziplist encodes small strings. We'll just show
 * the hex dump of the entry itself. Imagine the bytes as following the
 * entry that stores "5" in the ziplist above:
 *
 * [02] [0b] [48 65 6c 6c 6f 20 57 6f 72 6c 64]
 *
 * The first byte, 02, is the length of the previous entry. The next
 * byte represents the encoding in the pattern |00pppppp| that means
 * that the entry is a string of length <pppppp>, so 0B means that
 * an 11 bytes string follows. From the third byte (48) to the last (64)
 * there are just the ASCII characters for "Hello World".
 *
 * ----------------------------------------------------------------------------
 *
 * Copyright (c) 2009-2012, Pieter Noordhuis <pcnoordhuis at gmail dot com>
 * Copyright (c) 2009-2017, Salvatore Sanfilippo <antirez at gmail dot com>
 * All rights reserved.
 */
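To make the layout concrete, here is a minimal sketch that decodes the header of the 15-byte example from the comment above. The names zl_header, read_le32, read_le16, zl_parse_header and zl_decode_imm4 are made up for this illustration and are not part of ziplist.c; the sketch only assumes the layout exactly as the comment describes it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not the Redis API. */
typedef struct {
    uint32_t zlbytes; /* total bytes occupied by the ziplist */
    uint32_t zltail;  /* offset of the last entry */
    uint16_t zllen;   /* number of entries (2^16-1 means "traverse to count") */
} zl_header;

/* Read little-endian values byte by byte, so host endianness does not matter. */
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parse <zlbytes> <zltail> <zllen>. Returns 0 on success, -1 if the buffer
 * is too short to hold the 10 header bytes plus the end byte. */
static int zl_parse_header(const unsigned char *zl, size_t n, zl_header *out) {
    if (n < 11) return -1;
    out->zlbytes = read_le32(zl);
    out->zltail = read_le32(zl + 4);
    out->zllen = read_le16(zl + 8);
    return 0;
}

/* Decode a 4-bit immediate integer: encoding bytes 0xF1..0xFD hold 0..12.
 * Returns 0 on success, -1 if the byte is not an immediate encoding. */
static int zl_decode_imm4(unsigned char enc, int *value) {
    if (enc < 0xF1 || enc > 0xFD) return -1;
    *value = (enc & 0x0F) - 1;
    return 0;
}
```

Fed the bytes 0f 00 00 00 0c 00 00 00 02 00 00 f3 02 f6 ff, the header comes out as zlbytes 15, zltail 12 and zllen 2, and the encoding bytes f3 and f6 decode to 2 and 5, matching the walkthrough in the comment.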

Done reading? Next come the basic operations. For any data structure, those boil down to insert, delete, lookup and update.

The actual entry

typedef struct zlentry {
    unsigned int prevrawlensize; /* Bytes used to encode the previous entry len*/
    unsigned int prevrawlen;     /* Previous entry len. */
    unsigned int lensize;        /* Bytes used to encode this entry type/len.
                                    For example strings have a 1, 2 or 5 bytes
                                    header. Integers always use a single byte.*/
    unsigned int len;            /* Bytes used to represent the actual entry.
                                    For strings this is just the string length
                                    while for integers it is 1, 2, 3, 4, 8 or
                                    0 (for 4 bit immediate) depending on the
                                    number range. */
    unsigned int headersize;     /* prevrawlensize + lensize. */
    unsigned char encoding;      /* Set to ZIP_STR_* or ZIP_INT_* depending on
                                    the entry encoding. However for 4 bits
                                    immediate integers this can assume a range
                                    of values and must be range-checked. */
    unsigned char *p;            /* Pointer to the very start of the entry, that
                                    is, this points to prev-entry-len field. */
} zlentry;
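As a rough illustration of where prevrawlensize, prevrawlen, lensize and len come from, the sketch below decodes the two header fields of an entry following the rules in the file comment. The function names sketch_decode_prevlen and sketch_decode_strlen are invented for this post; the real code uses macros such as ZIP_DECODE_PREVLEN, whose exact definitions are not reproduced here.

```c
#include <stdint.h>

/* Decode the <prevlen> field at p: one byte if the value is below 254,
 * otherwise the marker 0xFE followed by 4 little-endian bytes. */
static void sketch_decode_prevlen(const unsigned char *p,
                                  unsigned int *prevrawlensize,
                                  unsigned int *prevrawlen) {
    if (p[0] < 254) {
        *prevrawlensize = 1;
        *prevrawlen = p[0];
    } else {
        *prevrawlensize = 5;
        *prevrawlen = (unsigned int)p[1] | ((unsigned int)p[2] << 8) |
                      ((unsigned int)p[3] << 16) | ((unsigned int)p[4] << 24);
    }
}

/* Decode a string <encoding> at p. The top two bits select the form;
 * the 14-bit and 32-bit lengths are stored big endian.
 * Returns 0 for a string encoding, -1 for 11xxxxxx (integer or end marker). */
static int sketch_decode_strlen(const unsigned char *p,
                                unsigned int *lensize, unsigned int *len) {
    switch (p[0] & 0xC0) {
    case 0x00: /* |00pppppp| */
        *lensize = 1;
        *len = p[0] & 0x3F;
        return 0;
    case 0x40: /* |01pppppp|qqqqqqqq| */
        *lensize = 2;
        *len = ((unsigned int)(p[0] & 0x3F) << 8) | p[1];
        return 0;
    case 0x80: /* |10000000| + 4 bytes */
        *lensize = 5;
        *len = ((unsigned int)p[1] << 24) | ((unsigned int)p[2] << 16) |
               ((unsigned int)p[3] << 8) | (unsigned int)p[4];
        return 0;
    default:
        return -1;
    }
}
```

For the "Hello World" entry from the comment ([02] [0b] ...), this gives prevrawlensize 1, prevrawlen 2, lensize 1 and len 11, so headersize would be 2.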

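Basic operations

I think this figure is worth showing once more:

One thing to note in this figure: the right-hand side does not record the "size of the current element".

The function that actually performs the insertion is this one:

Honestly, it makes my scalp tingle a bit. So in a moment we'll use the usual trick and take it apart step by step.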

/* Insert item at "p". */
unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) {
    size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen;
    unsigned int prevlensize, prevlen = 0;
    size_t offset;
    int nextdiff = 0;
    unsigned char encoding = 0;
    long long value = 123456789; /* initialized to avoid warning. Using a value
                                    that is easy to see if for some reason
                                    we use it uninitialized. */
    zlentry tail;

    /* Find out prevlen for the entry that is inserted. */
    if (p[0] != ZIP_END) {
        ZIP_DECODE_PREVLEN(p, prevlensize, prevlen);
    } else {
        unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl);
        if (ptail[0] != ZIP_END) {
            prevlen = zipRawEntryLength(ptail);
        }
    }

    /* See if the entry can be encoded */
    if (zipTryEncoding(s,slen,&value,&encoding)) {
        /* 'encoding' is set to the appropriate integer encoding */
        reqlen = zipIntSize(encoding);
    } else {
        /* 'encoding' is untouched, however zipStoreEntryEncoding will use the
         * string length to figure out how to encode it. */
        reqlen = slen;
    }
    /* We need space for both the length of the previous entry and
     * the length of the payload. */
    reqlen += zipStorePrevEntryLength(NULL,prevlen);
    reqlen += zipStoreEntryEncoding(NULL,encoding,slen);

    /* When the insert position is not equal to the tail, we need to
     * make sure that the next entry can hold this entry's length in
     * its prevlen field. */
    int forcelarge = 0;
    nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0;
    if (nextdiff == -4 && reqlen < 4) {
        nextdiff = 0;
        forcelarge = 1;
    }

    /* Store offset because a realloc may change the address of zl. */
    offset = p-zl;
    zl = ziplistResize(zl,curlen+reqlen+nextdiff);
    p = zl+offset;

    /* Apply memory move when necessary and update tail offset. */
    if (p[0] != ZIP_END) {
        /* Subtract one because of the ZIP_END bytes */
        memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff);

        /* Encode this entry's raw length in the next entry. */
        if (forcelarge)
            zipStorePrevEntryLengthLarge(p+reqlen,reqlen);
        else
            zipStorePrevEntryLength(p+reqlen,reqlen);

        /* Update offset for tail */
        ZIPLIST_TAIL_OFFSET(zl) =
            intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen);

        /* When the tail contains more than one entry, we need to take
         * "nextdiff" in account as well. Otherwise, a change in the
         * size of prevlen doesn't have an effect on the *tail* offset. */
        zipEntry(p+reqlen, &tail);
        if (p[reqlen+tail.headersize+tail.len] != ZIP_END) {
            ZIPLIST_TAIL_OFFSET(zl) =
                intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff);
        }
    } else {
        /* This element will be the new tail. */
        ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl);
    }

    /* When nextdiff != 0, the raw length of the next entry has changed, so
     * we need to cascade the update throughout the ziplist */
    if (nextdiff != 0) {
        offset = p-zl;
        zl = __ziplistCascadeUpdate(zl,p+reqlen);
        p = zl+offset;
    }

    /* Write the entry */
    p += zipStorePrevEntryLength(p,prevlen);
    p += zipStoreEntryEncoding(p,encoding,slen);
    if (ZIP_IS_STR(encoding)) {
        memcpy(p,s,slen);
    } else {
        zipSaveInteger(p,value,encoding);
    }
    ZIPLIST_INCR_LENGTH(zl,1);
    return zl;
}

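How many steps does it take to insert into a linked list?
1. Walk to the position
2. Insert the node
3. Stitch the links back together

So what is special about this "list"? It is compact, and there are really only two data types: an entry is either an integer or a string. So its steps are:
1. Re-encode the data
2. Parse the data and allocate space
3. Insert the data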


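Re-encoding

What is re-encoding? To insert an element you have to work out "the size of the previous element, the element's own size, and the element's encoding", and then write them in together with the data. That is what gets encoded.

There are only three possible insert positions: head, middle and tail.
Head: the previous-element size is 0, because nothing comes before it.
Middle: the new entry takes over the "previous-element size" recorded by the entry currently sitting at the insert position; afterwards, the new entry's own size becomes the "previous-element size" seen by that following entry.
Tail: add up the three fields of the current last entry (prevlen, encoding and data) to get its total size, which is what zipRawEntryLength(ptail) does in the code above.

I won't walk through exactly how the re-encoding is done; this post is already long enough.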
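For readers who still want a rough idea, the prevlen part of the encoding can be sketched in a few lines. This is a simplified stand-in written from the rules in the file comment, not the actual zipStorePrevEntryLength; the name sketch_store_prevlen is hypothetical. Like the real function, it accepts a NULL destination and then only reports how many bytes would be needed.

```c
#include <stddef.h>
#include <stdint.h>

/* Write <prevlen> at p, or only report its size when p is NULL.
 * Values below 254 take 1 byte; larger values take 0xFE plus
 * 4 little-endian bytes. Returns the number of bytes used. */
static unsigned int sketch_store_prevlen(unsigned char *p, uint32_t len) {
    if (len < 254) {
        if (p != NULL) p[0] = (unsigned char)len;
        return 1;
    }
    if (p != NULL) {
        p[0] = 0xFE;
        p[1] = (unsigned char)(len & 0xFF);
        p[2] = (unsigned char)((len >> 8) & 0xFF);
        p[3] = (unsigned char)((len >> 16) & 0xFF);
        p[4] = (unsigned char)((len >> 24) & 0xFF);
    }
    return 5;
}
```

Calling it with NULL first and with a real pointer later mirrors how __ziplistInsert uses zipStorePrevEntryLength: once to size the entry, once to write it.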


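Parsing the data

Next comes parsing the data.
First try to parse the data as an integer. If that works, it is stored with one of the ziplist integer encodings; if it fails, it is stored with a ziplist string (byte array) encoding.

After parsing, the numeric value is in value and the encoding is in encoding. If parsing succeeded, the number of bytes the integer occupies also has to be computed. The variable reqlen first holds the space needed for the element's payload; add the sizes of the other two fields (prevlen and encoding) and you get the space needed for the whole entry.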
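The size calculation can be sketched as follows. This is a simplified approximation, not the real zipTryEncoding or zipIntSize: the parser sketch_parse_ll is a hand-written strict decimal parser (an assumption about what "can be parsed as an integer" means here), and all function names are invented for illustration. The integer widths and string header sizes follow the table in the file comment.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Strictly parse s[0..slen) as a decimal long long: optional '-', digits,
 * no leading zeros, no whitespace. Returns 1 on success, 0 otherwise. */
static int sketch_parse_ll(const unsigned char *s, size_t slen, long long *out) {
    size_t i = 0;
    int neg = 0;
    unsigned long long v = 0;
    if (slen == 0 || slen > 20) return 0;
    if (s[0] == '-') {
        if (slen == 1) return 0;
        neg = 1;
        i = 1;
    }
    if (s[i] == '0' && (neg || slen - i > 1)) return 0; /* "-0", "007" */
    for (; i < slen; i++) {
        unsigned int d;
        if (s[i] < '0' || s[i] > '9') return 0;
        d = (unsigned int)(s[i] - '0');
        if (v > (ULLONG_MAX - d) / 10) return 0; /* would overflow */
        v = v * 10 + d;
    }
    if (neg) {
        if (v > (unsigned long long)LLONG_MAX + 1ULL) return 0;
        *out = (v == (unsigned long long)LLONG_MAX + 1ULL) ? LLONG_MIN
                                                           : -(long long)v;
    } else {
        if (v > (unsigned long long)LLONG_MAX) return 0;
        *out = (long long)v;
    }
    return 1;
}

/* Payload bytes for an integer: 0 for the 4-bit immediate (0..12),
 * then 1, 2, 3, 4 or 8 depending on the range. */
static unsigned int sketch_int_payload_size(long long v) {
    if (v >= 0 && v <= 12) return 0;
    if (v >= INT8_MIN && v <= INT8_MAX) return 1;
    if (v >= INT16_MIN && v <= INT16_MAX) return 2;
    if (v >= -8388608LL && v <= 8388607LL) return 3;
    if (v >= INT32_MIN && v <= INT32_MAX) return 4;
    return 8;
}

/* Total bytes for a new entry: prevlen field + encoding field + payload. */
static size_t sketch_reqlen(const unsigned char *s, size_t slen, size_t prevlen) {
    long long v;
    size_t reqlen = (prevlen < 254) ? 1 : 5;
    if (sketch_parse_ll(s, slen, &v)) {
        reqlen += 1 + sketch_int_payload_size(v); /* 1-byte integer header */
    } else {
        reqlen += (slen <= 63) ? 1 : (slen <= 16383) ? 2 : 5;
        reqlen += slen;
    }
    return reqlen;
}
```

With this sketch, "5" after an empty list needs 2 bytes and "Hello World" after a 2-byte entry needs 13 bytes, which agrees with the hex dumps in the file comment.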


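Reallocating space

Judging by the comments, what, there might not be room to fit it in?

Let's take a look.

The allocation here is not simply "allocate as many bytes as the new data takes". If you didn't read the English comment above carefully, well, you may want to go back and read how an entry is laid out. In particular this part: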

/*
* The length of the previous entry, <prevlen>, is encoded in the following way:
* If this length is smaller than 254 bytes, it will only consume a single
* byte representing the length as an unsinged 8 bit integer. When the length
* is greater than or equal to 254, it will consume 5 bytes. The first byte is
* set to 254 (FE) to indicate a larger value is following. The remaining 4
* bytes take the length of the previous entry as value.
*/

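So this prevlen is the uncertain factor. The prevlen fields of two neighboring entries might have been 1 and 1 bytes wide, and after an entry is inserted between them the widths become 1, 1 and 5; or they might have been 1 and 5, or 5 and 1. In short, you can't know in advance.

So after inserting at the position of entryX, the prevlen field of the following entry (entryX+1) may keep its width, may grow by four bytes, or may shrink by four bytes. Nobody can say beforehand, so you have to measure it. That is all the code does here: it measures (this is nextdiff in the function above).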
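The measurement itself is small. Below is a sketch of what zipPrevLenByteDiff computes, reconstructed from how __ziplistInsert uses its result rather than copied from ziplist.c; the names sketch_prevlen_size and sketch_prevlen_byte_diff are hypothetical.

```c
#include <stddef.h>

/* Bytes needed to encode a given prevlen value: 1 below 254, else 5. */
static int sketch_prevlen_size(size_t len) {
    return (len < 254) ? 1 : 5;
}

/* How much the <prevlen> field of the entry at `next` must grow (+4),
 * shrink (-4) or stay (0) so that it can hold `new_prevlen`. */
static int sketch_prevlen_byte_diff(const unsigned char *next,
                                    size_t new_prevlen) {
    int current = (next[0] < 254) ? 1 : 5; /* width currently in use */
    return sketch_prevlen_size(new_prevlen) - current;
}
```

Note that in the quoted __ziplistInsert a result of -4 is not always applied: when reqlen is below 4 the code sets nextdiff back to 0 and keeps the 5-byte form via forcelarge, storing the small value in the wide field.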


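Inserting the data

How is the data inserted? Remember, this really is not a linked list; it is a list stored in one contiguous block.
So it is done the array way: grow the block, shift everything after the insert position, then write the new entry into the gap. Yes.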
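Stripped of all the ziplist bookkeeping, "the array way" amounts to a realloc followed by a memmove and a memcpy. The sketch below shows just that on a plain byte buffer; sketch_buf_insert is a made-up helper, and it deliberately ignores prevlen updates, the tail offset and the end marker that __ziplistInsert has to maintain.

```c
#include <stdlib.h>
#include <string.h>

/* Insert n bytes from src at offset `off` into a heap buffer that currently
 * holds `len` bytes. Returns the (possibly moved) buffer, or NULL on failure,
 * in which case the original buffer is left untouched. */
static unsigned char *sketch_buf_insert(unsigned char *buf, size_t len,
                                        size_t off, const unsigned char *src,
                                        size_t n) {
    unsigned char *grown;
    if (off > len) return NULL;
    if (n == 0) return buf;
    grown = realloc(buf, len + n);
    if (grown == NULL) return NULL;
    /* Shift the tail right to open a gap; memmove handles the overlap. */
    memmove(grown + off + n, grown + off, len - off);
    memcpy(grown + off, src, n);
    return grown;
}
```

As in the real function, the caller has to keep an offset rather than a pointer across the call, because realloc may move the block.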

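Sounds like a hassle? In practice it hurts less than you'd think. Redis does let you push at the head (LPUSH) and insert in the middle (LINSERT), but a ziplist is only used while the list is small, so the block that has to be shifted stays small too.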

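A side note: when inserting large amounts of data, an array does not necessarily lose to a linked list. For appending at the tail, the array beats the linked list by a wide margin (only for tail appends, of course). I ran a series of experiments on this in a blog post two months ago.


I'll skip deletion: it is the inverse of insertion, and I haven't covered deletion anywhere in this series. Note though that deleting here unavoidably copies a lot of data. (What if we don't really delete and only set a deleted flag? That saves time, but over time it fragments memory. One could add a function that compacts memory periodically, say after a third of the blocks have been reused, or when memory runs short. As far as I recall, some STL memory-pool allocators work along similar lines.)


There isn't much to say about lookup either. The typical use case is looking up by key, and the ziplist is just the value; the difference is that this value is a whole sequence.
So besides forward and backward traversal, range queries are also provided. Nothing difficult.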
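Backward traversal is where the zltail and prevlen fields pay off. The sketch below walks a well-formed ziplist from the last entry to the first and records each entry's offset. It is an illustration based on the layout in the file comment, with a made-up name (sketch_walk_backward); it assumes the 10-byte header described there and that an empty list has its tail offset pointing at the end byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk entries from tail to head. Stores up to `cap` entry offsets in `out`
 * and returns the number of entries visited. Assumes a well-formed ziplist. */
static size_t sketch_walk_backward(const unsigned char *zl, size_t *out,
                                   size_t cap) {
    const size_t header = 10; /* zlbytes (4) + zltail (4) + zllen (2) */
    size_t off = (size_t)zl[4] | ((size_t)zl[5] << 8) |
                 ((size_t)zl[6] << 16) | ((size_t)zl[7] << 24);
    size_t n = 0;

    if (zl[off] == 0xFF) return 0; /* empty list: tail is the end marker */
    for (;;) {
        size_t prevlen;
        if (n < cap) out[n] = off;
        n++;
        if (zl[off] < 254) {
            prevlen = zl[off];
        } else {
            prevlen = (size_t)zl[off + 1] | ((size_t)zl[off + 2] << 8) |
                      ((size_t)zl[off + 3] << 16) | ((size_t)zl[off + 4] << 24);
        }
        /* prevlen == 0 marks the first entry; the second check guards
         * against stepping back into the header on malformed input. */
        if (prevlen == 0 || off < header + prevlen) break;
        off -= prevlen;
    }
    return n;
}
```

On the 15-byte example from the file comment this visits offset 12 (the entry "5") and then offset 10 (the entry "2").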
