
1. Course website: 6.830/6.814: Database Systems




SimpleDB consists of:

  • Classes that represent fields, tuples, and tuple schemas; (fields, tuples (i.e., records), and schemas)
  • Classes that apply predicates and conditions to tuples; (describing conditions on tuples)
  • One or more access methods (e.g., heap files) that store relations on disk and provide a way to iterate through tuples
    of those relations; (accessing tuples)
  • A collection of operator classes (e.g., select, join, insert, delete, etc.) that process tuples; (CRUD)
  • A buffer pool that caches active tuples and pages in memory and handles concurrency control and transactions (neither
    of which you need to worry about for this lab); and, (the buffer pool, plus concurrency control and transactions)
  • A catalog that stores information about available tables and their schemas.

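The course has 6 labs in total, and the code written across them adds up to a simple database. The main topics are: how the database is organized (fields, tuples, schemas, the buffer pool, and so on), the CRUD operations that every "SQL boy" loves, query optimization, transactions and concurrency control, and crash and failure recovery. I have just finished the first lab, so I am writing these notes to make later review easier.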



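1. Tuple: in database terms, a row of a table with n columns is an n-tuple, so a Tuple has multiple fields. Put simply, one record is one tuple, and in this lab it is an instance of the Tuple class. A Tuple consists of: a. TupleDesc: the description (schema) of the tuple; b. fields: the type and value of each field in the record; c. RecordId: the location of the record on disk.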


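3. Catalog: conceptually, a Catalog is the level of abstraction just below the Database. A Database can hold multiple Catalogs, a Catalog multiple Schemas, and a Schema multiple tables. SimpleDB does not really distinguish these three concepts, and MySQL likewise uses Schema to mean a whole database containing multiple tables. This lab implements a Table class and uses a HashMap in the Catalog class to map each table id to its table.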

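4. BufferPool: the basic unit of the BufferPool is the Page. Data pages are read from disk (represented here by DbFile) into the BufferPool, and all CRUD operations on the database are carried out on Pages in the BufferPool (which is why dirty pages and crash recovery exist). By default the BufferPool caches 50 Pages, and each Page is 4096 bytes (4 KB) by default. Lab 1 mainly implements the getPage method: look up the Page in the BufferPool and, if it is not there, read it from disk and store it in the BufferPool. Once the BufferPool is full an eviction policy is needed; a later lab implements it, probably with LRU.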


6. SeqScan: the implementation of a full-table sequential scan, equivalent to select * from my_table.




BufferPool: where CRUD operations actually take place; the buffer pool reads data in from disk one page at a time.




Tuples in SimpleDB are quite basic. They consist of a collection of Field objects, one per field in the Tuple. Field is an interface that different data types (e.g., integer, string) implement. Tuple objects are created by the underlying access methods (e.g., heap files, or B-trees), as described in the next section. Tuples also have a type (or schema), called a tuple descriptor, represented by a TupleDesc object. This object consists of a collection of Type objects, one per field in the tuple, each of which describes the type of the corresponding field.
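
To make the relationship between Field, Tuple, and TupleDesc more concrete, here is a minimal sketch. It is a simplified stand-in rather than the actual SimpleDB code: every name prefixed with Mini is hypothetical, and the type check in setField is my own assumption about a reasonable behavior.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for SimpleDB's Type, Field, TupleDesc and Tuple.
enum MiniType { INT_TYPE, STRING_TYPE }

// Each data type implements the same field interface.
interface MiniField {
    MiniType getType();
}

final class MiniIntField implements MiniField {
    final int value;
    MiniIntField(int value) { this.value = value; }
    public MiniType getType() { return MiniType.INT_TYPE; }
}

final class MiniStringField implements MiniField {
    final String value;
    MiniStringField(String value) { this.value = value; }
    public MiniType getType() { return MiniType.STRING_TYPE; }
}

// The schema: one type per field in the tuple.
final class MiniTupleDesc {
    private final List<MiniType> types;
    MiniTupleDesc(MiniType... types) { this.types = Arrays.asList(types.clone()); }
    int numFields() { return types.size(); }
    MiniType getFieldType(int i) { return types.get(i); }
}

// A record: one field object per position in the schema.
final class MiniTuple {
    private final MiniTupleDesc desc;
    private final MiniField[] fields;

    MiniTuple(MiniTupleDesc desc) {
        this.desc = desc;
        this.fields = new MiniField[desc.numFields()];
    }

    MiniTupleDesc getTupleDesc() { return desc; }

    // Assumption: reject a field whose type does not match the schema.
    void setField(int i, MiniField f) {
        if (f.getType() != desc.getFieldType(i)) {
            throw new IllegalArgumentException("field " + i + " must be " + desc.getFieldType(i));
        }
        fields[i] = f;
    }

    MiniField getField(int i) { return fields[i]; }
}
```

The point of the split is that the schema is stored once and shared, while each tuple only carries its own field values.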








The catalog (class Catalog in SimpleDB) consists of a list of the tables and schemas of the tables that are currently
in the database. You will need to support the ability to add a new table, as well as getting information about a
particular table. Associated with each table is a TupleDesc object that allows operators to determine the types and
number of fields in a table.

The global catalog is a single instance of Catalog that is allocated for the entire SimpleDB process. The global
catalog can be retrieved via the method Database.getCatalog(), and the same goes for the global buffer pool
(using Database.getBufferPool()).

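Exercise 2 is mainly about implementing the Catalog.java class. The key is to implement a Table class inside Catalog that holds the information for one table. Since a Catalog holds many tables, a map inside Catalog can store the mapping from table id to Table. For convenience, I also created a second map from table name to table id.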
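
The two-map layout described above could look roughly like the sketch below. This is not the code from my submission: MiniCatalog and its members are hypothetical names, the schema is a plain String instead of a TupleDesc so that the example compiles on its own, and the rule that a later table replaces an earlier one with the same name or id is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch of a catalog backed by two maps.
final class MiniCatalog {
    // Everything the catalog knows about one table.
    static final class Table {
        final int id;
        final String name;
        final String schema;    // stand-in for a TupleDesc
        final String pkeyField;

        Table(int id, String name, String schema, String pkeyField) {
            this.id = id;
            this.name = name;
            this.schema = schema;
            this.pkeyField = pkeyField;
        }
    }

    private final Map<Integer, Table> tablesById = new HashMap<>();
    private final Map<String, Integer> idsByName = new HashMap<>();

    // Assumption: on a name or id conflict, the table added last wins.
    void addTable(int id, String name, String schema, String pkeyField) {
        Integer oldId = idsByName.remove(name);
        if (oldId != null) {
            tablesById.remove(oldId);
        }
        Table oldTable = tablesById.remove(id);
        if (oldTable != null) {
            idsByName.remove(oldTable.name);
        }
        tablesById.put(id, new Table(id, name, schema, pkeyField));
        idsByName.put(name, id);
    }

    int getTableId(String name) {
        Integer id = idsByName.get(name);
        if (id == null) {
            throw new NoSuchElementException("no table named " + name);
        }
        return id;
    }

    Table getTable(int id) {
        Table t = tablesById.get(id);
        if (t == null) {
            throw new NoSuchElementException("no table with id " + id);
        }
        return t;
    }
}
```

Keeping the name map as name to id (rather than name to Table) means there is a single place where Table objects live, so the two maps cannot disagree about a table's contents.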


The buffer pool (class BufferPool in SimpleDB) is responsible for caching pages in memory that have been recently read from disk. All operators read and write pages from various files on disk through the buffer pool. It consists of a fixed number of pages, defined by the numPages parameter to the BufferPool constructor. In later labs, you will implement an eviction policy. For this lab, you only need to implement the constructor and the BufferPool.getPage() method used by the SeqScan operator. The BufferPool should store up to numPages pages. For this lab, if more than numPages requests are made for different pages, then instead of implementing an eviction policy, you may throw a DbException. In future labs you will be required to implement an eviction policy.
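
The lookup-then-load behavior of getPage can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real BufferPool: MiniBufferPool is a hypothetical name, pages are plain byte arrays keyed by an int page number, the disk is modeled as a function, and IllegalStateException stands in for DbException. Transactions and permissions are left out entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of a fixed-capacity page cache without eviction.
final class MiniBufferPool {
    private final int numPages;
    private final Map<Integer, byte[]> cache = new HashMap<>();
    private final IntFunction<byte[]> disk; // stand-in for reading a page from a DbFile

    MiniBufferPool(int numPages, IntFunction<byte[]> disk) {
        this.numPages = numPages;
        this.disk = disk;
    }

    byte[] getPage(int pageNo) {
        byte[] page = cache.get(pageNo);
        if (page != null) {
            return page; // cache hit: no disk access
        }
        if (cache.size() >= numPages) {
            // Lab 1 behavior: fail instead of evicting.
            throw new IllegalStateException("buffer pool is full and has no eviction policy");
        }
        page = disk.apply(pageNo); // cache miss: read from disk, then remember the page
        cache.put(pageNo, page);
        return page;
    }
}
```

For example, `new MiniBufferPool(50, n -> new byte[4096])` would model the default configuration mentioned earlier, with a fake disk that returns empty pages.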



A HeapFile object is arranged into a set of pages, each of which consists of a fixed number of bytes for storing
tuples, (defined by the constant BufferPool.DEFAULT_PAGE_SIZE), including a header. In SimpleDB, there is
one HeapFile object for each table in the database (one HeapFile corresponds to one table). Each page in a HeapFile is arranged as a set of slots, each of
which can hold one tuple (tuples for a given table in SimpleDB are all of the same size). In addition to these slots,
each page has a header that consists of a bitmap with one bit per tuple slot. If the bit corresponding to a particular
tuple is 1, it indicates that the tuple is valid; if it is 0, the tuple is invalid (e.g., has been deleted or was never
initialized.) Pages of HeapFile objects are of type HeapPage which implements the Page interface. Pages are
stored in the buffer pool but are read and written by the HeapFile class.
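
Reading and writing the header bitmap comes down to a little bit arithmetic. The sketch below is illustrative only: SlotBitmap and its methods are hypothetical names, and it assumes that slot i is stored in byte i / 8 at bit i % 8, counted from the least significant bit. Check that ordering against the lab handout before relying on it.

```java
// Hypothetical helper for the one-bit-per-slot header described above.
final class SlotBitmap {
    // Assumption: slot i lives in byte i / 8, at bit i % 8 counted from the least significant bit.
    static boolean isSlotUsed(byte[] header, int slot) {
        return ((header[slot / 8] >> (slot % 8)) & 1) == 1;
    }

    // Sets the bit to 1 (tuple valid) or clears it to 0 (deleted or never initialized).
    static void markSlotUsed(byte[] header, int slot, boolean used) {
        int mask = 1 << (slot % 8);
        if (used) {
            header[slot / 8] |= mask;
        } else {
            header[slot / 8] &= ~mask;
        }
    }

    // Counts slots whose bit is 0 among the first numSlots slots.
    static int countEmptySlots(byte[] header, int numSlots) {
        int empty = 0;
        for (int i = 0; i < numSlots; i++) {
            if (!isSlotUsed(header, i)) {
                empty++;
            }
        }
        return empty;
    }
}
```

A page iterator can then walk the slots in order and skip every slot for which isSlotUsed returns false.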


  • src/java/simpledb/storage/HeapPageId.java
  • src/java/simpledb/storage/RecordId.java
  • src/java/simpledb/storage/HeapPage.java



_tuples per page_ = floor((_page size_ * 8) / (_tuple size_ * 8 + 1))

headerBytes = ceiling(tupsPerPage/8)
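
The "+ 1" in the denominator is the one header bit that each tuple costs on top of its own tuple size * 8 bits. Here is a small sketch of both formulas with a worked example; HeapPageLayout and its method names are hypothetical, and the 8-byte tuple (for example, two 4-byte integers) is just an illustrative size.

```java
// Hypothetical helper that evaluates the two page layout formulas above.
final class HeapPageLayout {
    // floor((page size * 8) / (tuple size * 8 + 1)); integer division floors for positive values.
    static int tuplesPerPage(int pageSizeBytes, int tupleSizeBytes) {
        return (pageSizeBytes * 8) / (tupleSizeBytes * 8 + 1);
    }

    // ceiling(tupsPerPage / 8), computed with integer arithmetic.
    static int headerBytes(int tupsPerPage) {
        return (tupsPerPage + 7) / 8;
    }

    public static void main(String[] args) {
        int tups = tuplesPerPage(4096, 8);  // 32768 / 65, rounded down
        int header = headerBytes(tups);
        System.out.println(tups + " tuples, " + header + " header bytes");
        // prints "504 tuples, 63 header bytes"
    }
}
```

As a sanity check, 504 tuples * 8 bytes + 63 header bytes = 4095 bytes, which fits in a 4096-byte page, while 505 tuples would need 4040 + 64 = 4104 bytes and would not.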








    /**
     * Iterator over every tuple in a HeapFile.
     * Pages are fetched one at a time through BufferPool.getPage(); reading all
     * tuples of the HeapFile at once could cause an OOM.
     */
    private static class HeapFileIterator implements DbFileIterator {
        private final TransactionId tid;
        private final HeapFile file;
        private Iterator<Tuple> it; // null while the iterator is closed
        private int pageNo;

        public HeapFileIterator(TransactionId tid, HeapFile file) {
            this.tid = tid;
            this.file = file;
        }

        @Override
        public void open() throws DbException, TransactionAbortedException {
            pageNo = 0;
            it = getTupleIterator(pageNo);
        }

        /**
         * Reads the HeapPage with the given pageNo from the buffer pool (or from
         * disk on a miss) and returns an iterator over its tuples.
         *
         * @param pageNo page number within the file
         * @return iterator over the tuples of that page
         * @throws TransactionAbortedException
         * @throws DbException if pageNo is out of range or the page cannot be read
         */
        private Iterator<Tuple> getTupleIterator(int pageNo)
                throws TransactionAbortedException, DbException {
            if (pageNo < 0 || pageNo >= file.numPages()) {
                throw new DbException("get iterator fail!!! pageNo #" + pageNo + "# is invalid!");
            }
            HeapPageId pid = new HeapPageId(file.getId(), pageNo);
            HeapPage page = (HeapPage) Database.getBufferPool().getPage(tid, pid, Permissions.READ_ONLY);
            if (page == null) {
                throw new DbException("get iterator fail! pageNo #" + pageNo + "# is invalid!");
            }
            return page.iterator();
        }

        @Override
        public boolean hasNext() throws DbException, TransactionAbortedException {
            // The file must have been opened first.
            if (it == null) return false;
            // Skip exhausted or empty pages until a tuple is found or the file ends.
            while (!it.hasNext() && pageNo < file.numPages() - 1) {
                pageNo++;
                it = getTupleIterator(pageNo);
            }
            return it.hasNext();
        }

        @Override
        public Tuple next() throws DbException, TransactionAbortedException, NoSuchElementException {
            if (it == null) throw new NoSuchElementException("file not open!");
            // hasNext() moves to the next non-empty page when the current one is exhausted.
            if (!hasNext()) throw new NoSuchElementException("no more tuples!");
            return it.next();
        }

        @Override
        public void rewind() throws DbException, TransactionAbortedException {
            close();
            open();
        }

        @Override
        public void close() {
            it = null;
        }
    }











Report written: 10.03, 14:00 to 16:00

