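While using McPAT to estimate the power and area of a core modeled in gem5, I ran into a few problems. This post records them together with the fixes I found.
My directory layout is as follows: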

1. Directory layout

gem5/
|__other files
|
|__path                              # simulation output and McPAT-related content
|  |__out                            # simulation output
|  |  |__test
|  |     |__config.ini
|  |     |__config.json
|  |     |__stats.txt
|  |     |__mcpat-out.xml            # generated later
|  |     |__mcpat.log                # generated later
|  |
|  |__gem5tomcpat
|  |  |__template-xeon.xml           # XML template
|  |  |__GEM5ToMcPAT.py              # Python script that generates mcpat-out.xml and mcpat.log
|  |  |__other files
|  |
|  |__MCPAT
|     |__mcpat
|        |__mcpat                    # executable
|        |__other files

2. Setup steps

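Create the directories as shown above, or use your own layout and adjust the paths in the code to match. The contents of out are created automatically by each gem5 simulation run; you only need to pass the corresponding option, for example: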

-d path/out/test

This creates a test directory under out. The files config.ini, config.json, and stats.txt are the simulation results.
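Before running the converter, it can help to confirm that a gem5 output directory really holds the three files the later steps rely on. The following is a minimal sketch; the helper name missing_gem5_outputs is hypothetical, and path/out/test is simply the example directory from above.

```python
import os

# Files that gem5 writes into the directory given with -d.
REQUIRED = ("config.ini", "config.json", "stats.txt")


def missing_gem5_outputs(out_dir):
    """Return the required gem5 output files that are absent from out_dir."""
    present = set(os.listdir(out_dir)) if os.path.isdir(out_dir) else set()
    return [name for name in REQUIRED if name not in present]


if __name__ == "__main__":
    # An empty list means the directory is ready for the conversion step.
    print(missing_gem5_outputs("path/out/test"))
```

An empty result means all three files are present; anything else lists what the simulation run did not produce.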
The mcpat directory under MCPAT has to be downloaded from the McPAT website and built. If the build fails, you may need to install:

sudo apt-get install gcc-multilib
sudo apt-get install g++-multilib

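The important part is the contents of gem5tomcpat. There are two main problems:
1. Because gem5 has changed, the template-xeon.xml from the gem5tomcpat site and from the other material I could find online does not work with current gem5 versions. It has to be fixed entry by entry, which is tedious.
2. The parameters currently have to be tuned by hand to get areas that match those reported in the relevant papers reasonably well.
To address both points, I adjusted template-xeon.xml and modified and extended GEM5ToMcPAT.py so that mcpat.log is generated in one pass, which simplifies the workflow. One problem remains: the relative area of each component in mcpat.log roughly matches the papers, but at the same technology node the total is about 10 times larger, and I have not found a fix yet. Note: this applies to models derived from DerivO3CPU! MinorCPU and the other CPU models lack many of these parameters or name them differently, so you will have to adjust the template yourself.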
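To compare the areas in mcpat.log against a paper component by component, a small parser is enough. The sketch below assumes the log prints a component heading ending in a colon followed by a line of the form "Area = <number> mm^2"; that format, the function names, and the reference dictionary you would pass in are all assumptions for illustration, not part of the script in this post.

```python
import os
import re

# Assumed log format: a heading such as "Core:" followed by "Area = 1.23 mm^2".
AREA_RE = re.compile(r"^\s*Area\s*=\s*([0-9.eE+-]+)\s*mm\^2")
HEADING_RE = re.compile(r"^\s*([A-Za-z][^:=]*):")


def parse_areas(lines):
    """Map each component heading to the first area reported beneath it."""
    areas = {}
    current = None
    for line in lines:
        area = AREA_RE.match(line)
        if area and current is not None:
            areas.setdefault(current, float(area.group(1)))
            continue
        heading = HEADING_RE.match(line)
        if heading:
            current = heading.group(1).strip()
    return areas


def area_ratios(measured, reference):
    """Return measured/reference for every component present in both."""
    return {
        name: measured[name] / ref
        for name, ref in reference.items()
        if name in measured and ref
    }


if __name__ == "__main__":
    log_path = "path/out/test/mcpat.log"
    if os.path.exists(log_path):
        with open(log_path) as log:
            print(parse_areas(log))
```

If every ratio comes out close to the same constant, the discrepancy is a global scale factor rather than a per-component modeling difference, which narrows down where to look.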

The code for template-xeon.xml and GEM5ToMcPAT.py follows.
template-xeon.xml (updated 2022/3/15; the version I uploaded earlier was the wrong file):

<component id="root" name="root">
<component id="system" name="system">
<!-- McPAT will skip the components if number is set to 0  -->
<param name="number_of_cores" value="1"/>
<param name="number_of_L1Directories" value="0"/>
<param name="number_of_L2Directories" value="0"/>
<param name="number_of_L2s" value="1"/>
<!--  This number means how many L2 clusters in each cluster there can be multiple banks/ports  -->
<param name="Private_L2" value="0"/>
<!-- 1 Private, 0 shared/coherent  -->
<param name="number_of_L3s" value="0"/>
<!--  This number means how many L3 clusters  -->
<param name="number_of_NoCs" value="0"/>
<param name="homogeneous_cores" value="1"/>
<!-- 1 means homo  -->
<param name="homogeneous_L2s" value="1"/>
<param name="homogeneous_L1Directories" value="1"/>
<param name="homogeneous_L2Directories" value="1"/>
<param name="homogeneous_L3s" value="1"/>
<param name="homogeneous_ccs" value="1"/>
<!-- cache coherence hardware  -->
<param name="homogeneous_NoCs" value="1"/>
<param name="core_tech_node" value="45"/>
<!--  nm  -->
<param name="target_core_clockrate" value="1e-6/config.system.cpu_clk_domain.clock"/>
<!-- MHz  -->
<param name="temperature" value="380"/>
<!--  Kelvin  -->
<param name="number_cache_levels" value="1"/>
<param name="interconnect_projection_type" value="1"/>
<!-- 0: aggressive wire technology; 1: conservative wire technology  -->
<param name="device_type" value="1"/>
<!-- 0: HP(High Performance Type); 1: LSTP(Low standby power) 2: LOP (Low Operating Power)   -->
<param name="longer_channel_device" value="1"/>
<!--  0 no use; 1 use when possible  -->
<param name="power_gating" value="0"/>
<!--  0 not enabled; 1 enabled  -->
<param name="machine_bits" value="64"/>
<param name="Embedded" value="1"/>
<param name="virtual_address_width" value="32"/>
<param name="physical_address_width" value="40"/>
<param name="virtual_memory_page_size" value="4096"/>
<!--  address width determines the tag_width in Cache, LSQ and buffers in cache controller default value is machine_bits, if not set  -->
<stat name="total_cycles" value="stats.system.cpu.numCycles"/>
<stat name="idle_cycles" value="stats.system.cpu.idleCycles"/>
<stat name="busy_cycles" value="stats.system.cpu.numCycles - stats.system.cpu.idleCycles"/>
<!-- This page size(B) is complete different from the page size in Main memo section. this page size is the size of virtual memory from OS/Archi perspective; the page size in Main memo section is the actual physical line in a DRAM bank   -->
<!--  *********************** cores *******************  -->
<component id="system.core0" name="core0">
<!--  Core property  -->
<param name="clock_rate" value="1e-6/config.system.cpu_clk_domain.clock"/>
<param name="vdd" value="1.25"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="opt_local" value="0"/>
<!--  for cores with unknown timing, set to 0 to force off the opt flag  -->
<param name="instruction_length" value="32"/>
<param name="opcode_width" value="7"/>
<param name="x86" value="0"/>
<param name="micro_opcode_width" value="8"/>
<param name="machine_type" value="0"/>
<!--  inorder/OoO; 1 inorder; 0 OOO -->
<param name="number_hardware_threads" value="config.system.cpu.numThreads"/>
<!--  number_instruction_fetch_ports(icache ports) is always 1 in single-thread processor,it only may be more than one in SMT processors. BTB ports always equals to fetch ports since branch information in consecutive branch instructions in the same fetch group can be read out from BTB once. -->
<param name="fetch_width" value="config.system.cpu.fetchWidth"/>
<!--  fetch_width determines the size of cachelines of L1 cache block  -->
<param name="number_instruction_fetch_ports" value="1"/>
<param name="decode_width" value="config.system.cpu.decodeWidth"/>
<!--  decode_width determines the number of ports of the renaming table (both RAM and CAM) scheme  -->
<param name="issue_width" value="config.system.cpu.issueWidth"/>
<param name="peak_issue_width" value="config.system.cpu.issueWidth"/>
<!--  issue_width determines the number of ports of Issue window and other logic as in the complexity effective processors paper; issue_width==dispatch_width  -->
<param name="commit_width" value="config.system.cpu.commitWidth"/>
<!--  commit_width determines the number of ports of register files  -->
<param name="fp_issue_width" value="2"/>
<param name="prediction_width" value="1"/>
<!--  number of branch instructions can be predicted simultaneously -->
<!--  Current version of McPAT does not distinguish int and floating point pipelines. These parameters are reserved for future use. -->
<param name="pipelines_per_core" value="1,1"/>
<!-- integer_pipeline and floating_pipelines, if the floating_pipelines is 0, then the pipeline is shared -->
<param name="pipeline_depth" value="7,7"/>
<!--  pipeline depth of int and fp, if pipeline is shared, the second number is the average cycles of fp ops  -->
<!--  issue and exe unit -->
<param name="ALU_per_core" value="1"/>
<!--  contains an adder, a shifter, and a logical unit  -->
<param name="MUL_per_core" value="1"/>
<!--  For MUL and Div  -->
<param name="FPU_per_core" value="1"/>
<!--  buffer between IF and ID stage  -->
<param name="instruction_buffer_size" value="32"/>
<!--  buffer between ID and sche/exe stage  -->
<param name="decoded_stream_buffer_size" value="16"/>
<param name="instruction_window_scheme" value="0"/>
<!--  0 PHYREG based, 1 RSBASED -->
<!--  McPAT support 2 types of OoO cores, RS based and physical reg based -->
<param name="instruction_window_size" value="24"/>
<param name="fp_instruction_window_size" value="8"/>
<!--  the instruction issue Q as in Alpha 21264; The RS as in Intel P6  -->
<param name="ROB_size" value="config.system.cpu.numROBEntries"/>
<!--  each in-flight instruction has an entry in ROB  -->
<!--  registers  -->
<param name="archi_Regs_IRF_size" value="16"/>
<!--  X86-64 has 16GPR  -->
<param name="archi_Regs_FRF_size" value="32"/>
<!--  MMX + XMM  -->
<!--   if OoO processor, phy_reg number is needed for renaming logic, renaming logic is for both integer and floating point insts.   -->
<param name="phy_Regs_IRF_size" value="256"/>
<param name="phy_Regs_FRF_size" value="256"/>
<!--  rename logic  -->
<param name="rename_scheme" value="0"/>
<!--  can be RAM based(0) or CAM based(1) rename scheme RAM-based scheme will have free list, status table;CAM-based scheme have the valid bit in the data field of the CAM both RAM and CAM need RAM-based checkpoint table, checkpoint_depth=# of in_flight instructions;Detailed RAT Implementation see TR  -->
<param name="register_windows_size" value="0"/>
<!--  how many windows in the windowed register file, sun processors;no register windowing is used when this number is 0  -->
<!--  In OoO cores, loads and stores can be issued whether inorder(Pentium Pro) or (OoO)out-of-order(Alpha),They will always try to execute out-of-order though.  -->
<param name="LSU_order" value="outoforder"/>
<param name="store_buffer_size" value="config.system.cpu.SQEntries"/>
<!--  By default, in-order cores do not have load buffers  -->
<param name="load_buffer_size" value="config.system.cpu.LQEntries"/>
<!--  number of ports refer to sustain-able concurrent memory accesses  -->
<param name="memory_ports" value="2"/>
<!--  max_allowed_in_flight_memo_instructions determines the # of ports of load and store buffer as well as the ports of Dcache which is connected to LSU  -->
<!--  dual-pumped Dcache can be used to save the extra read/write ports  -->
<param name="RAS_size" value="config.system.cpu.branchPred.RASSize"/>
<!--  general stats, defines simulation periods;require total, idle, and busy cycles for sanity check   -->
<!--  please note: if target architecture is X86, then all the instructions refer to (fused) micro-ops  -->
<stat name="total_instructions" value="stats.system.cpu.numInsts"/>
<stat name="int_instructions" value="stats.system.cpu.intAluAccesses"/>
<stat name="fp_instructions" value="stats.system.cpu.fpAluAccesses"/>
<stat name="branch_instructions" value="stats.system.cpu.numBranches"/>
<stat name="branch_mispredictions" value="stats.system.cpu.branchMispredicts"/>
<stat name="load_instructions" value="stats.system.cpu.numLoadInsts"/>
<stat name="store_instructions" value="stats.system.cpu.numStoreInsts"/>
<stat name="committed_instructions" value="stats.system.cpu.committedInsts"/>
<stat name="committed_int_instructions" value="stats.system.cpu.commit.integer"/>
<stat name="committed_fp_instructions" value="stats.system.cpu.commit.floating"/>
<stat name="pipeline_duty_cycle" value="1"/>
<!-- <=1, runtime_ipc/peak_ipc; averaged for all cores if homogeneous  -->
<!--  the following cycle stats are used for heterogeneous cores only, please ignore them if homogeneous cores  -->
<stat name="total_cycles" value="stats.system.cpu.numCycles"/>
<stat name="idle_cycles" value="stats.system.cpu.idleCycles"/>
<stat name="busy_cycles" value="stats.system.cpu.numCycles - stats.system.cpu.idleCycles"/>
<!--  instruction buffer stats  -->
<!--  ROB stats, both RS and Phy based OoOs have ROB. The performance simulator should capture the difference on accesses, otherwise, McPAT has to guess based on number of committed instructions.  -->
<stat name="ROB_reads" value="stats.system.cpu.rob.reads"/>
<stat name="ROB_writes" value="stats.system.cpu.rob.writes"/>
<!--  RAT accesses  -->
<stat name="rename_reads" value="stats.system.cpu.rename.intLookups"/>
<!-- lookup in renaming logic  -->
<stat name="rename_writes" value="int(stats.system.cpu.rename.renamedOperands * stats.system.cpu.rename.intLookups / stats.system.cpu.rename.lookups)"/>
<!-- update dest regs. renaming logic  -->
<stat name="fp_rename_reads" value="stats.system.cpu.rename.fpLookups"/>
<stat name="fp_rename_writes" value="int(stats.system.cpu.rename.renamedOperands * stats.system.cpu.rename.fpLookups / stats.system.cpu.rename.lookups)"/>
<!--  decode and rename stage use this, should be total ic - nop  -->
<!--  Inst window stats  -->
<stat name="inst_window_reads" value="stats.system.cpu.intInstQueueReads"/>
<stat name="inst_window_writes" value="stats.system.cpu.intInstQueueWrites"/>
<stat name="inst_window_wakeup_accesses" value="stats.system.cpu.intInstQueueWakeupAccesses"/>
<stat name="fp_inst_window_reads" value="stats.system.cpu.fpInstQueueReads"/>
<stat name="fp_inst_window_writes" value="stats.system.cpu.fpInstQueueWrites"/>
<stat name="fp_inst_window_wakeup_accesses" value="stats.system.cpu.fpInstQueueWakeupAccesses"/>
<!--   RF accesses  -->
<stat name="int_regfile_reads" value="stats.system.cpu.intRegfileReads"/>
<stat name="float_regfile_reads" value="stats.system.cpu.fpRegfileReads"/>
<stat name="int_regfile_writes" value="stats.system.cpu.intRegfileWrites"/>
<stat name="float_regfile_writes" value="stats.system.cpu.fpRegfileWrites"/>
<!--  accesses to the working reg  -->
<stat name="function_calls" value="stats.system.cpu.commit.functionCalls"/>
<stat name="context_switches" value="stats.system.cpu.workload.numSyscalls"/>
<!--  Number of Windows switches (number of function calls and returns) -->
<!--  Alu stats by default, the processor has one FPU that includes the divider and multiplier. The fpu accesses should include accesses to multiplier and divider   -->
<stat name="ialu_accesses" value="stats.system.cpu.intAluAccesses"/>
<stat name="fpu_accesses" value="stats.system.cpu.fpAluAccesses"/>
<stat name="mul_accesses" value="0"/>
<stat name="cdb_alu_accesses" value="0"/>
<stat name="cdb_mul_accesses" value="0"/>
<stat name="cdb_fpu_accesses" value="0"/>
<!--  multiple cycle accesses should be counted multiple times, otherwise, McPAT can use internal counter for different floating point instructions to get final accesses. But that needs detailed info for floating point inst mix  -->
<!--   currently the performance simulator should make sure all the numbers are final numbers, including the explicit read/write accesses, and the implicit accesses such as replacements and etc. Future versions of McPAT may be able to reason the implicit access based on param and stats of last level cache. The same rule applies to all cache access stats too!   -->
<!--  following is AF for max power computation. Do not change them, unless you understand them -->
<stat name="IFU_duty_cycle" value="0.25"/>
<!-- depends on Icache line size and instruction issue rate  -->
<stat name="LSU_duty_cycle" value="0.25"/>
<stat name="MemManU_I_duty_cycle" value="0.25"/>
<stat name="MemManU_D_duty_cycle" value="0.25"/>
<stat name="ALU_duty_cycle" value="1"/>
<stat name="MUL_duty_cycle" value="0.3"/>
<stat name="FPU_duty_cycle" value="0.3"/>
<stat name="ALU_cdb_duty_cycle" value="1"/>
<stat name="MUL_cdb_duty_cycle" value="0.3"/>
<stat name="FPU_cdb_duty_cycle" value="0.3"/>
<param name="number_of_BPT" value="2"/>
<component id="system.core0.predictor" name="PBT">
<!--  branch predictor; tournament predictor see Alpha implementation  -->
<param name="local_predictor_size" value="10,3"/>
<param name="local_predictor_entries" value="1024"/>
<param name="global_predictor_entries" value="4096"/>
<param name="global_predictor_bits" value="2"/>
<param name="chooser_predictor_entries" value="4096"/>
<param name="chooser_predictor_bits" value="2"/>
<!--  These parameters can be combined like below in next version<param name="load_predictor" value="10,3,1024"/><param name="global_predictor" value="4096,2"/><param name="predictor_chooser" value="4096,2"/>-->
</component>
<component id="system.core0.itlb" name="itlb">
<param name="number_entries" value="config.system.cpu.mmu.itb.size"/>
<stat name="total_accesses" value="stats.system.cpu.itb_walker_cache.tags.tagAccesses"/>
<stat name="total_misses" value="stats.system.cpu.mmu.itb.misses"/>
<stat name="conflicts" value="0"/>
<!--  there is no write requests to itlb although writes happen to itlb after miss, which is actually a replacement  -->
</component>
<component id="system.core0.icache" name="icache">
<!--  there is no write requests to itlb although writes happen to it after miss, which is actually a replacement  -->
<param name="icache_config" value="config.system.cpu.icache.size,config.system.cpu.icache.tags.block_size,config.system.cpu.icache.assoc,1,1,config.system.cpu.icache.response_latency,config.system.cpu.icache.tags.block_size,0"/>
<!--  the parameters are capacity,block_width, associativity, bank, throughput w.r.t. core clock, latency w.r.t. core clock,output_width, cache policy,   -->
<!--  cache_policy;//0 no write or write-though with non-write allocate;1 write-back with write-allocate  -->
<param name="buffer_sizes" value="config.system.cpu.icache.mshrs,config.system.cpu.icache.mshrs,config.system.cpu.icache.mshrs,config.system.cpu.icache.mshrs"/>
<!--  cache controller buffer sizes: miss_buffer_size(MSHR),fill_buffer_size,prefetch_buffer_size,wb_buffer_size -->
<stat name="read_accesses" value="stats.system.cpu.icache.ReadReq.accesses::total"/>
<stat name="read_misses" value="stats.system.cpu.icache.ReadReq.misses::total"/>
<stat name="conflicts" value="stats.system.cpu.icache.replacements"/>
</component>
<component id="system.core0.dtlb" name="dtlb">
<param name="number_entries" value="config.system.cpu.mmu.dtb.size"/>
<!-- dual threads -->
<stat name="total_accesses" value="stats.system.cpu.dtb_walker_cache.tags.tagAccesses"/>
<stat name="total_misses" value="stats.system.cpu.mmu.dtb.misses"/>
<stat name="conflicts" value="0"/>
</component>
<component id="system.core0.dcache" name="dcache">
<!--  all the buffer related are optional  -->
<param name="dcache_config" value="config.system.cpu.dcache.size,config.system.cpu.dcache.tags.block_size,config.system.cpu.dcache.assoc,1,1,config.system.cpu.dcache.response_latency,config.system.cpu.dcache.tags.block_size,0"/>
<param name="buffer_sizes" value="config.system.cpu.dcache.mshrs,config.system.cpu.dcache.mshrs,config.system.cpu.dcache.mshrs,config.system.cpu.dcache.mshrs"/>
<!--  cache controller buffer sizes: miss_buffer_size(MSHR),fill_buffer_size,prefetch_buffer_size,wb_buffer_size -->
<stat name="read_accesses" value="stats.system.cpu.dcache.ReadReq.accesses::total"/>
<stat name="write_accesses" value="stats.system.cpu.dcache.WriteReq.hits::total"/>
<stat name="read_misses" value="stats.system.cpu.dcache.ReadReq.misses::total"/>
<stat name="write_misses" value="stats.system.cpu.dcache.WriteReq.misses::total"/>
<stat name="conflicts" value="stats.system.cpu.dcache.replacements"/>
</component>
<param name="number_of_BTB" value="2"/>
<component id="system.core0.BTB" name="BTB">
<!--  all the buffer related are optional  -->
<param name="BTB_config" value="5120,4,2,1, 1,3"/>
<!-- should be 4096 + 1024  -->
<!--  the parameters are capacity,block_width,associativity,bank, throughput w.r.t. core clock, latency w.r.t. core clock, -->
<stat name="read_accesses" value="stats.system.cpu.branchPred.BTBLookups"/>
<!-- See IFU code for guideline  -->
<stat name="write_accesses" value="stats.system.cpu.commit.branches"/>
</component>
</component>
<component id="system.L1Directory0" name="L1Directory0">
<param name="Directory_type" value="1"/>
<!-- 0 cam based shadowed tag. 1 directory cache  -->
<param name="Dir_config" value="4096,2,0,1,100,100, 8"/>
<!--  the parameters are capacity,block_width, associativity,bank, throughput w.r.t. core clock, latency w.r.t. core clock, -->
<param name="buffer_sizes" value="8, 8, 8, 8"/>
<!--  all the buffer related are optional  -->
<param name="clockrate" value="1e-6/config.system.cpu_clk_domain.clock"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="ports" value="1,1,1"/>
<!--  number of r, w, and rw search ports  -->
<param name="device_type" value="0"/>
<!--  although there are multiple access types, Performance simulator needs to cast them into reads or writes, e.g. the invalidates can be considered as writes  -->
<stat name="read_accesses" value="800000"/>
<stat name="write_accesses" value="27276"/>
<stat name="read_misses" value="1632"/>
<stat name="write_misses" value="183"/>
<stat name="conflicts" value="20"/>
</component>
<component id="system.L2Directory0" name="L2Directory0">
<param name="Directory_type" value="1"/>
<!-- 0 cam based shadowed tag. 1 directory cache  -->
<param name="Dir_config" value="524288,16,8,1,2, 100"/>
<!--  the parameters are capacity,block_width, associativity,bank, throughput w.r.t. core clock, latency w.r.t. core clock, -->
<param name="buffer_sizes" value="8, 8, 8, 8"/>
<!--  all the buffer related are optional  -->
<param name="clockrate" value="1e-6/config.system.cpu_clk_domain.clock"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="ports" value="1,1,1"/>
<!--  number of r, w, and rw search ports  -->
<param name="device_type" value="0"/>
<!--  although there are multiple access types, Performance simulator needs to cast them into reads or writes, e.g. the invalidates can be considered as writes  -->
<stat name="read_accesses" value="58824"/>
<stat name="write_accesses" value="27276"/>
<stat name="read_misses" value="1632"/>
<stat name="write_misses" value="183"/>
<stat name="conflicts" value="100"/>
</component>
<component id="system.L20" name="L20">
<!--  all the buffer related are optional  -->
<param name="L2_config" value="config.system.l2.size,config.system.cache_line_size,config.system.l2.assoc,config.system.l2.assoc, 2, config.system.l2.response_latency,16,1"/>
<!--  the parameters are capacity,block_width, associativity, bank, throughput w.r.t. core clock, latency w.r.t. core clock,output_width, cache policy  -->
<param name="buffer_sizes" value="config.system.l2.mshrs|16, config.system.l2.write_buffers|16, 16, 16"/>
<!--  cache controller buffer sizes: miss_buffer_size(MSHR),fill_buffer_size,prefetch_buffer_size,wb_buffer_size -->
<param name="clockrate" value="1500"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="ports" value="1,1,1"/>
<!--  number of r, w, and rw ports  -->
<param name="device_type" value="0"/>
<!-- <stat name="read_accesses" value="stats.system.l2.ReadReq_accesses"/> -->
<!-- <stat name="write_accesses" value="stats.system.l2.ReadExReq_accesses"/> -->
<!-- <stat name="read_misses" value="stats.system.l2.ReadReq_misses"/> -->
<!-- <stat name="write_misses" value="stats.system.l2.ReadExReq_misses"/> -->
<!-- <stat name="conflicts" value="stats.system.l2.replacements"/>   -->
<stat name="duty_cycle" value="0.5"/>
</component>
<!-- ********************************************************************** -->
<component id="system.L30" name="L30">
<param name="L3_config" value="16777216,64,16, 16, 16, 100,1"/>
<!--  the parameters are capacity,block_width, associativity,bank, throughput w.r.t. core clock, latency w.r.t. core clock, -->
<param name="clockrate" value="850"/>
<param name="ports" value="1,1,1"/>
<!--  number of r, w, and rw ports  -->
<param name="device_type" value="0"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="buffer_sizes" value="16, 16, 16, 16"/>
<!--  cache controller buffer sizes: miss_buffer_size(MSHR),fill_buffer_size,prefetch_buffer_size,wb_buffer_size -->
<stat name="read_accesses" value="11824"/>
<stat name="write_accesses" value="11276"/>
<stat name="read_misses" value="1632"/>
<stat name="write_misses" value="183"/>
<stat name="conflicts" value="0"/>
<stat name="duty_cycle" value="1"/>
</component>
<!-- ********************************************************************** -->
<component id="system.NoC0" name="noc0">
<param name="clockrate" value="1e-6/config.system.clk_domain.clock"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="type" value="0"/>
<!-- 0:bus, 1:NoC , for bus no matter how many nodes sharing the bus, at each time only one node can send req  -->
<param name="horizontal_nodes" value="1"/>
<param name="vertical_nodes" value="1"/>
<param name="has_global_link" value="0"/>
<!--  1 has global link, 0 does not have global link  -->
<param name="link_throughput" value="1"/>
<!-- w.r.t clock  -->
<param name="link_latency" value="1"/>
<!-- w.r.t clock  -->
<!--  throughput >= latency  -->
<!--  Router architecture  -->
<param name="input_ports" value="1"/>
<param name="output_ports" value="1"/>
<!--  For bus the I/O ports should be 1  -->
<param name="flit_bits" value="256"/>
<param name="chip_coverage" value="1"/>
<!--  When multiple NOC present, one NOC will cover part of the whole chip. chip_coverage <=1  -->
<param name="link_routing_over_percentage" value="0.5"/>
<!--  Links can route over other components or occupy whole area. By default, 50% of the NoC global links routes over other components  -->
<stat name="total_accesses" value="100000"/>
<!--  This is the number of total accesses within the whole network not for each router  -->
<stat name="duty_cycle" value="1"/>
</component>
<!-- ********************************************************************** -->
<component id="system.mc" name="mc">
<!--  Memory controllers are for DDR(2,3...) DIMMs  -->
<!--  current version of McPAT uses published values for base parameters of memory controller; improvements on MC will be added in later versions.  -->
<param name="type" value="0"/>
<!--  1: low power; 0 high performance  -->
<param name="mc_clock" value="200"/>
<!-- DIMM IO bus clock rate MHz -->
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="peak_transfer_rate" value="3200"/>
<!-- MB/S -->
<param name="block_size" value="64"/>
<!-- B -->
<param name="number_mcs" value="0"/>
<!--  current McPAT only supports homogeneous memory controllers  -->
<param name="memory_channels_per_mc" value="1"/>
<param name="number_ranks" value="2"/>
<param name="withPHY" value="0"/>
<!--  # of ranks of each channel -->
<param name="req_window_size_per_channel" value="32"/>
<param name="IO_buffer_size_per_channel" value="32"/>
<param name="databus_width" value="128"/>
<param name="addressbus_width" value="51"/>
<!--  McPAT will add the control bus width to the address bus width automatically  -->
<stat name="memory_accesses" value="33333"/>
<stat name="memory_reads" value="16667"/>
<stat name="memory_writes" value="16667"/>
<!--  McPAT does not track individual mc, instead, it takes the total accesses and calculate the average power per MC or per channel. This is sufficient for most application. Further track down can be easily added in later versions.  -->
</component>
<!-- ********************************************************************** -->
<component id="system.niu" name="niu">
<!--  On chip 10Gb Ethernet NIC, including XAUI Phy and MAC controller   -->
<!--  For a minimum IP packet size of 84B at 10Gb/s, a new packet arrives every 67.2ns. the low bound of clock rate of a 10Gb MAC is 150Mhz  -->
<param name="type" value="0"/>
<!--  1: low power; 0 high performance  -->
<param name="clockrate" value="350"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="number_units" value="0"/>
<!--  unlike PCIe and memory controllers, each Ethernet controller only have one port  -->
<stat name="duty_cycle" value="1.0"/>
<!--  achievable max load <= 1.0  -->
<stat name="total_load_perc" value="0.7"/>
<!--  ratio of total achieved load to total achieve-able bandwidth   -->
<!--  McPAT does not track individual nic, instead, it takes the total accesses and calculate the average power per nic or per channel. This is sufficient for most application.  -->
</component>
<!-- ********************************************************************** -->
<component id="system.pcie" name="pcie">
<!--  On chip PCIe controller, including Phy -->
<!--  For a minimum PCIe packet size of 84B at 8Gb/s per lane (PCIe 3.0), a new packet arrives every 84ns. the low bound of clock rate of a PCIe per lane logic is 120Mhz  -->
<param name="type" value="0"/>
<!--  1: low power; 0 high performance  -->
<param name="withPHY" value="1"/>
<param name="clockrate" value="350"/>
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<param name="number_units" value="0"/>
<param name="num_channels" value="8"/>
<!--  2 ,4 ,8 ,16 ,32  -->
<stat name="duty_cycle" value="1.0"/>
<!--  achievable max load <= 1.0  -->
<stat name="total_load_perc" value="0.7"/>
<!--  Percentage of total achieved load to total achieve-able bandwidth   -->
<!--  McPAT does not track individual pcie controllers, instead, it takes the total accesses and calculate the average power per pcie controller or per channel. This is sufficient for most application.  -->
</component>
<!-- ********************************************************************** -->
<component id="system.flashc" name="flashc">
<param name="number_flashcs" value="0"/>
<param name="type" value="1"/>
<!--  1: low power; 0 high performance  -->
<param name="withPHY" value="1"/>
<param name="peak_transfer_rate" value="200"/>
<!-- Per controller sustain-able peak rate MB/S  -->
<param name="vdd" value="0"/>
<!--  0 means using ITRS default vdd  -->
<param name="power_gating_vcc" value="-1"/>
<!--  "-1" means using default power gating virtual power supply voltage constrained by technology and computed automatically  -->
<stat name="duty_cycle" value="1.0"/>
<!--  achievable max load <= 1.0  -->
<stat name="total_load_perc" value="0.7"/>
<!--  Percentage of total achieved load to total achieve-able bandwidth   -->
<!--  McPAT does not track individual flash controller, instead, it takes the total accesses and calculate the average power per fc or per channel. This is sufficient for most application  -->
</component>
<!-- ********************************************************************** -->
</component>
</component>

Change the constant values to suit your needs. For the full list of supported parameters, see MCPAT/mcpat/XML_Parse.cc.
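The clock-rate entries in the template are arithmetic expressions over gem5 config values, and the template comments give the unit as MHz. The sketch below shows the unit conversion under two assumptions that this post does not state: that config.system.cpu_clk_domain.clock is a clock period expressed in ticks, and that gem5 runs with its default resolution of 1 tick = 1 ps. The function name ticks_to_mhz is hypothetical.

```python
def ticks_to_mhz(period_ticks, ticks_per_second=1e12):
    """Convert a clock period in ticks to a frequency in MHz.

    Assumes the default gem5 tick resolution (1e12 ticks per second)
    unless ticks_per_second is overridden.
    """
    frequency_hz = ticks_per_second / period_ticks
    return frequency_hz / 1e6


if __name__ == "__main__":
    # A 500-tick period at 1 ps per tick is 0.5 ns.
    print(ticks_to_mhz(500))
```

If those assumptions hold for your gem5 build, it is worth opening the generated mcpat-out.xml and checking that the evaluated clock values fall in the MHz range you expect for your core.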
GEM5ToMcPAT.py:

#!/usr/bin/python
from optparse import OptionParser
import sys
import re
import json
import types
import math
from xml.etree import ElementTree as ET
import os
outname = "mcpat-out.xml"
#This is a wrapper over xml parser so that
#comments are preserved.
#source: http://effbot.org/zone/element-pi.htm
# NOTE: ET.XMLTreeBuilder and the private _parser/_target attributes belong to
# the Python 2.7 / ElementTree 1.2.x API and are not available in recent
# Python 3 releases, so run this script with Python 2.7.
class PIParser(ET.XMLTreeBuilder):

    def __init__(self):
        ET.XMLTreeBuilder.__init__(self)
        # assumes ElementTree 1.2.X
        self._parser.CommentHandler = self.handle_comment
        self._parser.ProcessingInstructionHandler = self.handle_pi
        self._target.start("document", {})

    def close(self):
        self._target.end("document")
        return ET.XMLTreeBuilder.close(self)

    def handle_comment(self, data):
        self._target.start(ET.Comment, {})
        self._target.data(data)
        self._target.end(ET.Comment)

    def handle_pi(self, target, data):
        self._target.start(ET.PI, {})
        self._target.data(target + " " + data)
        self._target.end(ET.PI)


def parse(source):
    return ET.parse(source, PIParser())


def main():
    global opts
    # usage = "usage: %prog [options] <gem5 stats file> <gem5 config file (json)> <mcpat template file>"
    # parser = OptionParser(usage=usage)
    # parser.add_option("-q", "--quiet",
    #     action="store_false", dest="verbose", default=True,
    #     help="don't print status messages to stdout")
    # parser.add_option("-o", "--out", type="string",
    #     action="store", dest="out", default="mcpat-out.xml",
    #     help="output file (input to McPAT)")

    # The part below is my addition. It generates mcpat-out.xml and mcpat.log
    # automatically for every run directory under ../out. Existing results are
    # never overwritten: to regenerate one, delete or rename the old file first.
    # A directory containing a file named not.txt is skipped.
    for root, dirs, files in os.walk("../out"):
        if ('config.json' in files and 'stats.txt' in files
                and outname not in files and 'not.txt' not in files):
            opts = {"verbose": True, "out": root + '/' + outname}
            args = [root + '/stats.txt', root + '/config.json', 'template-xeon.xml']
            print(root)
            runoa(opts, args)
            print(root)

    # Run McPAT on every mcpat-out.xml that has no mcpat.log yet.
    # The path is case-sensitive on Linux: adjust it to match your own
    # directory name (the layout in section 1 spells it MCPAT).
    for root, dirs, files in os.walk("../out"):
        if outname in files and 'mcpat.log' not in files:
            opts = {"verbose": True, "out": root + '/' + outname}
            command = ("../McPAT/mcpat/mcpat -infile  " + opts["out"]
                       + " -print_level 5 >  " + root + "/mcpat.log")
            os.system(command)


def runoa(opts, args):
    # args = ['../out/bsrandinst2000unit16delay/stats.txt', '../out/bsrandinst2000unit16delay/config.json', 'template-xeon.xml']
    # # (opts, args) = parser.parse_args()
    # if len(args) != 3:
    #     parser.print_help()
    #     sys.exit(1)
    readStatsFile(args[0])
    readConfigFile(args[1])
    readMcpatFile(args[2])
    dumpMcpatOut(opts["out"])


def dumpMcpatOut(outFile):
    rootElem = templateMcpat.getroot()
    configMatch = re.compile(r'config\.([a-zA-Z0-9_:\.]+)')
    #replace params with values from the GEM5 config file
    for param in rootElem.iter('param'):
        name = param.attrib['name']
        value = param.attrib['value']
        if 'config' in value:
            allConfs = configMatch.findall(value)
            for conf in allConfs:
                confValue = getConfValue(conf)
                value = re.sub("config." + conf, str(confValue), value)
            if "," in value:
                exprs = re.split(',', value)
                for i in range(len(exprs)):
                    exprs[i] = str(eval(exprs[i]))
                param.attrib['value'] = ','.join(exprs)
            else:
                param.attrib['value'] = str(eval(str(value)))

    #replace stats with values from the GEM5 stats file
    statRe = re.compile(r'stats\.([a-zA-Z0-9_:\.]+)')
    for stat in rootElem.iter('stat'):
        name = stat.attrib['name']
        value = stat.attrib['value']
        if 'stats' in value:
            allStats = statRe.findall(value)
            expr = value
            for i in range(len(allStats)):
                if allStats[i] in stats:
                    expr = re.sub('stats.%s' % allStats[i], stats[allStats[i]], expr)
                else:
                    print("***WARNING: %s does not exist in stats***" % allStats[i])
                    print("\t Please use the right stats in your McPAT template file")
            if 'config' not in expr and 'stats' not in expr:
                stat.attrib['value'] = str(eval(expr))

    #Write out the xml file
    if opts["verbose"]:
        print("Writing input to McPAT in: %s" % outFile)
    templateMcpat.write(outFile)


def getConfValue(confStr):
    spltConf = re.split(r'\.', confStr)
    currConf = config
    currHierarchy = ""
    for x in spltConf:
        currHierarchy += x
        if x not in currConf:
            if isinstance(currConf, list):
                #this is mostly for system.cpu* as system.cpu is an array
                #This could be made better
                for i in range(len(currConf)):
                    if x not in currConf[i]:
                        print(i)
                        #print "%s does not exist in config" % currHierarchy
                    else:
                        currConf = currConf[i][x]
                        # Stop at the first element that has the key; going on
                        # would index into the value that was just selected.
                        break
            else:
                print("***WARNING: %s does not exist in config.***" % currHierarchy)
                print("\t Please use the right config param in your McPAT template file")
        else:
            currConf = currConf[x]
        currHierarchy += "."
    if type(currConf) == list:
        return currConf[0]
    return currConf


def readStatsFile(statsFile):
    global stats
    stats = {}
    if opts["verbose"]:
        print("Reading GEM5 stats from: %s" % statsFile)
    F = open(statsFile)
    ignores = re.compile(r'^---|^$')
    statLine = re.compile(r'([a-zA-Z0-9_\.:-]+)\s+([-+]?[0-9]+\.[0-9]+|[-+]?[0-9]+| nan|inf)')
    count = 0
    for line in F:
        #ignore empty lines and lines starting with "---"
        if not ignores.match(line):
            count += 1
            match = statLine.match(line)
            if match is None:
                # skip lines that are not "<name> <value>" pairs
                continue
            statKind = match.group(1)
            # strip() because the " nan" alternative captures a leading space,
            # which would otherwise defeat the comparison below
            statValue = match.group(2).strip()
            if statValue == 'nan':
                # print ("\tWarning (stats): %s is nan. Setting it to 0" % statKind)
                statValue = '0'
            stats[statKind] = statValue
    F.close()


def readConfigFile(configFile):
    global config
    if opts["verbose"]:
        print("Reading config from: %s" % configFile)
    F = open(configFile)
    config = json.load(F)
    #print config
    #print config["system"]["membus"]
    #print config["system"]["cpu"][0]["clock"]
    F.close()


def readMcpatFile(templateFile):
    global templateMcpat
    if opts["verbose"]:
        print("Reading McPAT template from: %s" % templateFile)
    templateMcpat = parse(templateFile)
    #print dir(templateMcpat)


if __name__ == '__main__':
    main()
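To make the substitution step in dumpMcpatOut easier to follow, here is a minimal, self-contained sketch of the same idea: `config.*` and `stats.*` placeholders in a template value are replaced with numbers and the resulting arithmetic is evaluated. The `config` and `stats` dictionaries, the values in them, and the helper names `lookup_config` and `resolve` are hypothetical stand-ins for illustration, not part of GEM5ToMcPAT.py.

```python
import re

# Hypothetical miniature stand-ins for the parsed config.json and stats.txt.
config = {"system": {"cpu": [{"fetchWidth": 8}]}}
stats = {"system.cpu.numCycles": "1000", "system.cpu.idleCycles": "250"}

CONFIG_RE = re.compile(r"config\.([a-zA-Z0-9_:\.]+)")
STATS_RE = re.compile(r"stats\.([a-zA-Z0-9_:\.]+)")


def lookup_config(path):
    """Walk the nested config; a list (e.g. system.cpu) resolves to its first element."""
    node = config
    for key in path.split("."):
        if isinstance(node, list):
            node = node[0]
        node = node[key]
    return node


def resolve(value):
    """Replace config.* / stats.* placeholders, then evaluate the arithmetic."""
    for path in CONFIG_RE.findall(value):
        value = value.replace("config." + path, str(lookup_config(path)))
    for name in STATS_RE.findall(value):
        value = value.replace("stats." + name, stats[name])
    # eval mirrors the script above; only use it on templates you trust.
    return str(eval(value))


print(resolve("config.system.cpu.fetchWidth"))
print(resolve("stats.system.cpu.numCycles - stats.system.cpu.idleCycles"))
```

When a placeholder in template-xeon.xml does not match the current gem5 version, this is the step that fails, so checking the names in config.json and stats.txt against the template is the first thing to try.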

3. Usage

Once the directories are in place, all that is left is to run a simulation in gem5, for example:

./build/RISCV/gem5.opt -d path/out/***** configs/example/se.py --num-cpus=1 --sys-clock='2.2GHz' --caches --l2cache --cpu-type=MinorCPU -c=tests/***** --l1d_size='32kB' --l1i_size='32kB' --l2_size='512kB' --l1d_assoc=8 --l1i_assoc=8 --l2_assoc=8

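Replace each ***** with your own output directory name and the path to your test program. Then run GEM5ToMcPAT.py once; I simply click Run in PyCharm. The script uses relative paths (../out and template-xeon.xml), so it has to be run with gem5tomcpat as the working directory. Keep in mind that the template above targets models based on DerivO3CPU, so a run with --cpu-type=MinorCPU as in the command above requires adjusting the template parameters accordingly.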
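Since the point of the flow is to compare area and power against published numbers, a small helper for pulling the headline figures out of mcpat.log can save some scrolling. The sketch below assumes that the log contains summary lines of the form `Area = <number> mm^2` and `Peak Power = <number> W`; the sample text uses made-up numbers, and the function name `first_summary` is hypothetical.

```python
import re

# Illustrative excerpt with made-up numbers; assumed to resemble the
# summary lines at the top of a McPAT log.
SAMPLE_LOG = """\
Processor:
  Area = 49.5 mm^2
  Peak Power = 36.9 W
  Runtime Dynamic = 28.2 W
"""

LINE_RE = re.compile(
    r"^\s*(Area|Peak Power|Runtime Dynamic)\s*=\s*([-+0-9.eE]+)\s*(\S+)"
)


def first_summary(text):
    """Return {metric: (value, unit)} for the first occurrence of each metric."""
    found = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(1) not in found:
            found[match.group(1)] = (float(match.group(2)), match.group(3))
    return found


summary = first_summary(SAMPLE_LOG)
for metric, (value, unit) in summary.items():
    print("%s: %s %s" % (metric, value, unit))
```

To use it on a real run, read the file (for example path/out/test/mcpat.log) and pass its contents to `first_summary` instead of `SAMPLE_LOG`. Only the first match per metric is kept, on the assumption that the whole-processor summary comes before the per-component breakdown.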
