Computer Organization/Architecture 计算机组织/架构/结构 重要观念和笔记(陆续更新中,2024/04/17周三,已更新)

前情提要:我的说法比较白话,希望可以更好理解其中一些观念,这篇会以中文为主,专有名词还是用英文,好吧应该会中英穿插,自己学的时候感觉听中文会吸收比较快,也可能是我英文比较烂的关系 ̄□ ̄||欢迎点赞收藏评论给建议,感谢~

  • 2024/04/17周三,已更新

问1: 为什么要学计算机架构Computer Architecture(CA)?

答:
In broadest definition,CA是allow us using manufacturing technologies来efficiently execution information processing application的abstraction/implementation layer;CA主要指ISA和microarchitecture;本质上是software和hardware的contract契约及interface,包括software怎么看到hardware,hardware有哪些部分是software visible的,它们有哪些互动interaction等。
在这里插入图片描述

问2: Architecture/ISA vs Microarchitecture/Organization?

答:
1. 观念: 清楚什么是ISA定义的,什么是Microarchitecture定义的即可。
2. ISA定义的: input/output;data types/sizes;operations(instructions and how they work);execution semantics(interrupt);programmer visible state(memory and register)。
3. Microarchitecture定义的: implement ISA for some metrics(speed,energy,cost)的tradeoffs;e.g., pipline number and pipline depth,cache size,silicon area,bus widths,ALU widths,exe ordering,peak power。
4. Microarchitecture像是implement ISA的choice,ISA一样的computer可以有很多不一样的microarchitecture,取决于computer要用在embeding space还是high performance space。

问3: pipline哪一个cycle哪一条instruction在什么stage一定要清楚?

例题: 来自《Computer Architecture a Quantitative Approach》第六版,Page C-71,Problem C.1 a,b,c,d,e,f,g
在这里插入图片描述

a. Data hazards are caused by data dependences in the code. Whether a dependency causes a hazard depends on the machine implementation (i.e., number of pipeline stages). List all of the data dependences in the code above. Record the register, source instruction, and destination instruction; for example, there is a data dependency for register x1 from the ld to the addi.

b. Show the timing of this instruction sequence for the 5-stage RISC pipeline without any forwarding or bypassing hardware but assuming that a register read and a write in the same clock cycle “forwards” through the register file, as between the add and or shown in Figure C.5. Use a pipeline timing chart like that in Figure C.8. Assume that the branch is handled by flushing the pipeline. If all memory references take 1 cycle, how many cycles does this loop take to execute?

c. Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.8. Assume that the branch is handled by predicting it as not taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

d. Show the timing of this instruction sequence for the 5-stage RISC pipeline with full forwarding and bypassing hardware, as shown in Figure C.6. Use a pipeline timing chart like that shown in Figure C.8. Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

e. High-performance processors have very deep pipelines—more than 15 stages. Imagine that you have a 10-stage pipeline in which every stage of the 5-stage pipeline has been split in two. The only catch is that, for data forwarding, data are forwarded from the end of a pair of stages to the beginning of the two stages where they are needed. For example, data are forwarded from the output of the second execute stage to the input of the first execute stage, still causing a 1-cycle delay. Show the timing of this instruction sequence for the 10-stage RISC pipeline with full forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure C.8 (but with stages labeled IF1, IF2, ID1, etc.). Assume that the branch is handled by predicting it as taken. If all memory references take 1 cycle, how many cycles does this loop take to execute?

f. Assume that in the 5-stage pipeline, the longest stage requires 0.8 ns, and the pipeline register delay is 0.1 ns. What is the clock cycle time of the 5-stage pipeline? If the 10-stage pipeline splits all stages in half, what is the cycle time of the 10-stage machine?

g. Using your answers from parts (d) and (e), determine the cycles per instruction (CPI) for the loop on a 5-stage pipeline and a 10-stage pipeline. Make sure you count only from when the first instruction reaches the write-back stage to the end. Do not count the start-up of the first instruction. Using the clock cycle time calculated in part (f), calculate the average instruction execute time for each machine.

答:
a.
在这里插入图片描述
b.
Forwarding is performed only via the register file. Branch outcomes and targets are not known until the end of the execute stage. All instructions introduced to the pipeline prior to this point are flushed.
在这里插入图片描述
Since the initial value of x3 is x2+396 and equal instances of the loop add 4 to x2, the total number of iterations is 99. It takes 16 cycles between loop instances. The last loop takes two addition cycles since this latency cannot be overlapped with additional loop instances. The total number of cycles is 16×98+18 =1584.

c.
在这里插入图片描述
Assumes branch resolved in decode stage and no delay slots. Branch outcomes and targets are known now at the end of decode. Resolving branch in decode requires zero detect after bypass. The total number of cycles is 8×98+11=795.

d.
在这里插入图片描述
Assumes branch resolved in decode stage and no delay slots, and early pre-decode to determine and fetch target of branch in fetch stage. The total number of cycles is 7×98+11=697.

e.
在这里插入图片描述
The total number of cycles is 12×98+21=1197.

f.
5-stage: 0.8+0.1=0.9(ns); 10-stage: 0.8/2+0.1=0.5(ns)

g.
5-stage:
The 5th cycle to the 11th cycle took a total of 7 cycles and involved the execution of 6 instructions.
CPI = 7(cycles) / 6 (instructions) = 1.16
Average Instruction Execution Time = 1.16×0.9=1.044

10-stage:
The 10th cycle to the 21st cycle took a total of 12 cycles and involved the execution of 6 instructions.
CPI = 12(cycles) / 6 (instructions) = 2
Average Instruction Execution Time = 2×0.5=1

  • 2024/00/00周x,未更新

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:/a/553650.html

如若内容造成侵权/违法违规/事实不符,请联系我们进行投诉反馈qq邮箱809451989@qq.com,一经查实,立即删除!

相关文章

每日算法4/17

1552. 两球之间的磁力 题目 在代号为 C-137 的地球上,Rick 发现如果他将两个球放在他新发明的篮子里,它们之间会形成特殊形式的磁力。Rick 有 n 个空的篮子,第 i 个篮子的位置在 position[i] ,Morty 想把 m 个球放到这些篮子里&…

点击广告就能日赚收益1000+?开发一款看广告赚收益的APP靠谱吗?

APP对接广告变现是开发者获得收益的重要方式之一,对一些体量较小的APP来说,甚至是唯一的收益来源。开发者是否可以单独开发一款全是广告的APP,拿出一部分的广告收益给点击者,类似在快手极速版里看广告获得金币一个原理&#xff0c…

聚观早报 | 小度推出DuerOS X;问界新M5开启预定

聚观早报每日整理最值得关注的行业重点事件,帮助大家及时了解最新行业动态,每日读报,就读聚观365资讯简报。 整理丨Cutie 4月18日消息 小度推出DuerOS X 问界新M5开启预定 库克访问印尼 方程豹产品矩阵正式发布 苹果折叠屏iPhone新专利…

ESP-IDF下载与安装完整流程

本文主要看参考官网说明,如下: Windows 平台工具链的标准设置 - ESP32 - — ESP-IDF 编程指南 latest 文档 (espressif.com) 一、概述 ESP-IDF需要安装一些必备工具,才能围绕ESP32构建固件,包括: PythonGit交叉编译…

steam加速器哪个好 2024最新steam好用加速器推荐

steam加速器哪个好 2024最新steam好用加速器推荐 对于热爱游戏的玩家们来说,steam这个游戏平台对于大家来说肯定不陌生,但是由于平台服务器的原因,我们在大陆登录平台经常会出现卡顿掉线等原因,那么今天小编就为大家带来可以流畅…

Redis入门到通关之解决Redis缓存一致性问题

文章目录 ☃️概述☃️数据库和缓存不一致采用什么方案☃️代码实现☃️其他 ☃️概述 由于我们的 缓存的数据源来自于数据库, 而数据库的 数据是会发生变化的, 因此,如果当数据库中 数据发生变化,而缓存却没有同步, 此时就会有 一致性问题存在, 其后果是: 用户使用缓存中的过…

python-使用bottle时间简易服务器

python-使用bottle时间简易服务器 背景调试读取文本所有内容字段解释json字符串解析追加写入文件 整理后整理后写入文件方法将目录下所有文本的内容批量追加到一个文本搜索字符串方法实现简易服务器通过浏览器访问 背景 202310.txt内容是一段json字符串,目的是通过…

MySQL连接查询之等值连接

连接查询 又称多表查询,当查询结果来自多张数据表的时候,就需要用到连接查询。 建一个新库包含两张表实验(ta:id,age字段,tb:id,name,ta_id): …

基于SpringBoot的“人职匹配推荐系统”的设计与实现(源码+数据库+文档+PPT)

基于SpringBoot的“人职匹配推荐系统”的设计与实现(源码数据库文档PPT) 开发语言:Java 数据库:MySQL 技术:SpringBoot 工具:IDEA/Ecilpse、Navicat、Maven 系统展示 网上商城购物系统结构图 管理员登录界面图 个…

WebKit内核游览器

WebKit内核游览器 基础概念游览器引擎Chromium 浏览器架构Webkit 资源加载这里就不得不提到http超文本传输协议这个概念了: 游览器多线程HTML 解析总结 基础概念 百度百科介绍 WebKit 是一个开源的浏览器引擎,与之相对应的引擎有Gecko(Mozil…

CTFHUB RCE作业

题目地址:CTFHub 完成情况如图: 知识点: preg_match_all 函数 正则匹配函数 int preg_match_all ( string $pattern , string $subject [, array &$matches [, int $flags PREG_PATTERN_ORDER [, int $offset 0 ]]] )搜索 subject 中…

rust学习(BorrowMut异常)

现象: 编译没有问题,运行时出现: 代码: pub fn do_test() {let v Arc::new(RefCell::new(100));let v1 v.try_borrow_mut().unwrap();let v2 v.try_borrow_mut().unwrap(); } 原因: 一个cell貌似不能同时被借用…

实际案例分享:如何利用上位机优化生产过程

在现代工业生产中,利用上位机进行生产过程优化已成为一种常见的做法。通过实时监控、数据采集和远程控制,上位机可以帮助企业提高生产效率、降低成本,并实现生产过程的自动化和智能化。本文将通过一个实际案例来介绍如何利用上位机优化生产过…

Nature 哈佛新型超材料Metafluid粘度、透明度、弹性可变,可用于编程液压机器人

液体都有“智能”、可编程了? 最近,一种被称为“智能"液体的多功能可编程的新型超材料——Metafluid,登上了Nature。 它由哈佛大学SEAS的研究团队研发,据说可自由调节弹性、光学特性、粘度。 甚至能够在牛顿流体和非牛顿流…

RHCE第二次作业

一.配置server主机要求如下: 1.server主机的主机名称为 ntp_server.example.com 2.server主机的IP为: 172.25.254.100 3.server主机的时间为1984-11-11 11:11:11 4.配置server主机的时间同步服务要求可以被所有人使用 二.设定cli…

【Python】异常处理结构

文章目录 1.python异常2.try_except异常处理结构3.try... 多个except异常处理4.try_except_else异常处理结构5.try_except_finally异常处理结构6.常见报错类型 在运行代码时,总是遇到各种异常,且出现异常时,脚本就会自动的的停止运行&#xf…

网络编程day5

目录 使用select实现TCP并发服务器 使用poll实现TCP客户端 UDP实现网络聊天室 服务器 ser.h main.c func_ser.c makefile 客户端 cli.h main.c func_cli.c makfile 思维导图 使用select实现TCP并发服务器 #include <myhead.h> int main(int argc, const c…

Create AI大会|人人皆可成为开发者,探索人工智能新趋势

在数字化浪潮的推动下&#xff0c;人工智能&#xff08;AI&#xff09;技术正以前所未有的速度渗透到我们生活的方方面面。2024年4月16日&#xff0c;备受瞩目的Create 2024百度AI开发者大会在深圳国际会展中心&#xff08;宝安&#xff09;盛大开幕。大会以“创造未来”为主题…

47.基于SpringBoot + Vue实现的前后端分离-校园外卖服务系统(项目 + 论文)

项目介绍 本站是一个B/S模式系统&#xff0c;采用SpringBoot Vue框架&#xff0c;MYSQL数据库设计开发&#xff0c;充分保证系统的稳定性。系统具有界面清晰、操作简单&#xff0c;功能齐全的特点&#xff0c;使得基于SpringBoot Vue技术的校园外卖服务系统设计与实现管理工作…

人才测评的方法有哪些?

人才测评是企业在筛选人才的时候必然会使用的策略&#xff0c;为了节省企业HR在招聘时的成本&#xff0c;又极大提高了人才和岗位的匹配度&#xff0c;从企业发展和员工个人发展来看&#xff0c;起到了双赢的作用&#xff0c;在线人才测评是现代企业招聘&#xff0c;人才选拔&a…