Computer Architecture

System Evaluation Metrics

Cost Metrics

The cost of a chip includes:

  • Design cost: non-recurring engineering (NRE), can be amortized well if there is high volume;
  • Manufacturing cost: depends on area;
    • Manufacturing Semiconductor Chips: Ingot → Wafer → Die (unpackaged chip) → Chip
    • To measure the efficiency of semiconductor manufacturing, we use the metric yield: the fraction of dies on a wafer that work correctly.
  • Testing cost: depends on yield and test time;
  • Packaging cost: depends on die size, number of pins, power delivery, ...

The cost of a system includes:

  • Power cost;
  • Cooling cost;
  • Total Cost of Ownership (TCO) of datacenters:
    • Capital expenses (CAPEX): facilities, assembly & installation, compute, storage,
      networking, software, …
    • Operational expenses (OPEX): energy, rent, maintenance, employee salaries, …
  • System availability: Downtime is expensive and results in a direct loss of revenue. Redundancy (adding backup components) improves availability but also increases the initial capital cost.

Performance Metrics

Performance metrics:

  • Latency: time to complete a task;
  • Throughput: tasks completed per unit time;

Improving latency often improves throughput as well, but not vice versa. For example, inter-task parallelization (running independent tasks in parallel) improves throughput but not the latency of any single task, while intra-task parallelization (splitting one task across processors) improves both.

Buffering, queuing, and batching improve throughput but may hurt latency, which creates a tradeoff between the two.
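
As a rough illustration of the batching tradeoff, the sketch below uses a hypothetical cost model (an assumption, not taken from any real system): each batch pays a fixed overhead plus a per-item cost, and every item in a batch finishes when the whole batch finishes.

```python
def batch_latency(batch_size, fixed_overhead_s, per_item_s):
    """Time until a batch (and hence every item in it) completes."""
    return fixed_overhead_s + batch_size * per_item_s


def batch_throughput(batch_size, fixed_overhead_s, per_item_s):
    """Items completed per second when batches run back to back."""
    return batch_size / batch_latency(batch_size, fixed_overhead_s, per_item_s)


# Hypothetical numbers: 10 ms fixed overhead per batch, 1 ms per item.
OVERHEAD_S = 0.010
PER_ITEM_S = 0.001

for b in (1, 10, 100):
    lat = batch_latency(b, OVERHEAD_S, PER_ITEM_S)
    thr = batch_throughput(b, OVERHEAD_S, PER_ITEM_S)
    print(f"batch={b:3d}  latency={lat * 1e3:6.1f} ms  throughput={thr:7.1f} items/s")
```

Under this model, larger batches amortize the fixed overhead (higher throughput) while each item waits longer for its batch to finish (higher latency).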

Digital systems (e.g., processors) operate using a constant-rate clock:

  • Clock cycle time (CCT): duration of a clock cycle;
  • Clock frequency (rate): cycles per second.

To compute the execution time of a program, we first compute the number of instructions (IC), which is fixed for a given program. Then we compute the average number of cycles per instruction (CPI), which depends on the system architecture and implementation. All together, we have

\[\text{Execution Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Cycle}}= \text{IC} \times \text{CPI} \times \text{CCT}. \]

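Roughly speaking, the software (program and compiler) and the ISA determine IC, the ISA and the microarchitecture determine CPI, and the microarchitecture and circuit technology determine CCT.

So far we have only discussed processor performance. What about memory? Its effect can be reflected in CPI. Alternatively, if computation and memory traffic overlap perfectly, the runtime is bounded by the slower of the two: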
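
The execution-time equation can be tried out with a few lines of Python. This is a minimal sketch; the instruction count, CPI values, and clock rate are made-up numbers chosen only for illustration.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Execution time in seconds: IC x CPI x CCT, where CCT = 1 / clock frequency."""
    cycle_time_s = 1.0 / clock_hz
    return instruction_count * cpi * cycle_time_s


# Hypothetical program: 2 billion instructions on a 2 GHz clock.
base = execution_time(2e9, cpi=1.5, clock_hz=2e9)
# Same program and clock on a hypothetical design that lowers CPI to 1.2.
improved = execution_time(2e9, cpi=1.2, clock_hz=2e9)

print(f"base:     {base:.3f} s")
print(f"improved: {improved:.3f} s")
print(f"speedup:  {base / improved:.3f}x")
```

Because the three factors multiply, an improvement in one only pays off if it does not worsen the others by a larger factor.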

\[\text{Runtime}=\max(\text{#ops}/\text{processor throughput},\text{#bytes}/\text{memory bandwidth}). \]

Denoting the operational intensity (OI) as \(\frac{\text{#ops}}{\text{#bytes}}\), we have

\[\begin{align*} \text{Perf}=&\ \text{#ops}/\text{Runtime}\\ =&\ \min(\text{processor throughput},\text{memory bandwidth}\times\text{operational intensity}). \end{align*} \]

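Plotting performance against operational intensity gives the roofline model of a given system: a sloped, memory-bound segment whose slope is the memory bandwidth, capped by a flat, compute-bound roof at the processor throughput.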
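
A minimal Python sketch of the roofline bound above. The machine parameters `PEAK_OPS_PER_S` and `BANDWIDTH_BYTES_PER_S` are hypothetical values, and `ridge_point` is a helper name introduced here for the operational intensity at which the two bounds meet.

```python
def attainable_performance(peak_ops_per_s, bandwidth_bytes_per_s, operational_intensity):
    """Roofline bound: min(processor throughput, memory bandwidth x OI)."""
    return min(peak_ops_per_s, bandwidth_bytes_per_s * operational_intensity)


def ridge_point(peak_ops_per_s, bandwidth_bytes_per_s):
    """OI (ops/byte) at which a kernel stops being memory-bound."""
    return peak_ops_per_s / bandwidth_bytes_per_s


# Hypothetical machine: 1 Tops/s peak compute, 100 GB/s memory bandwidth.
PEAK_OPS_PER_S = 1e12
BANDWIDTH_BYTES_PER_S = 1e11

ridge = ridge_point(PEAK_OPS_PER_S, BANDWIDTH_BYTES_PER_S)
for oi in (0.5, 2.0, 10.0, 50.0):
    perf = attainable_performance(PEAK_OPS_PER_S, BANDWIDTH_BYTES_PER_S, oi)
    bound = "memory-bound" if oi < ridge else "compute-bound"
    print(f"OI={oi:5.1f} ops/byte -> {perf:.2e} ops/s ({bound})")
```

Kernels to the left of the ridge point benefit from more bandwidth or higher operational intensity; kernels to the right only benefit from more compute throughput.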

Power and Energy Metrics

Dynamic/active power: \(C\times V_{dd}^2\times f_{0\to 1}=\alpha C V_{dd}^2 f\), where \(C\) is the capacitance being switched, \(V_{dd}\) is the supply voltage, \(f_{0\to 1}\) is the frequency of 0-to-1 transitions, \(\alpha\) is the activity factor (the fraction of capacitance being switched), and \(f\) is the clock frequency.

Static/leakage power: \(V_{dd}I_{leak}\), where \(I_{leak}\) is the leakage current.

Therefore, total power is

\[\text{Power}=\alpha C V_{dd}^2 f + V_{dd} I_{leak}. \]

The energy consumed over a period of time is

\[\text{Energy}=\text{Power}\times \text{Time}. \]
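
The power and energy formulas translate directly into code. The sketch below uses made-up chip parameters (activity factor, switched capacitance, leakage current) purely to show how the terms combine.

```python
def dynamic_power(alpha, capacitance_f, vdd, freq_hz):
    """Dynamic power in watts: alpha * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd ** 2 * freq_hz


def static_power(vdd, i_leak_a):
    """Static (leakage) power in watts: Vdd * I_leak."""
    return vdd * i_leak_a


def energy(power_w, time_s):
    """Energy in joules: power * time."""
    return power_w * time_s


# Hypothetical chip: alpha = 0.1, C = 20 nF, f = 2 GHz, I_leak = 1 A.
ALPHA, CAP_F, FREQ_HZ, I_LEAK_A = 0.1, 2e-8, 2e9, 1.0

for vdd in (1.0, 0.8):
    p_dyn = dynamic_power(ALPHA, CAP_F, vdd, FREQ_HZ)
    p_stat = static_power(vdd, I_LEAK_A)
    total = p_dyn + p_stat
    print(f"Vdd={vdd:.1f} V: dynamic={p_dyn:.2f} W, static={p_stat:.2f} W, "
          f"energy over 10 s={energy(total, 10.0):.1f} J")
```

Lowering the supply voltage reduces dynamic power quadratically but leakage power only linearly (this sketch holds the leakage current fixed, which is itself a simplification).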

Limiting factors of power, energy, and power density:

  • Power is limited by infrastructure, e.g., power supply;
  • Power density is limited by thermal dissipation, e.g., fans, liquid cooling;
  • Energy is limited by battery capacity or electrical bill.

Power scaling:

  • Dennard scaling (1974-2005): If the feature size scales by \(1/S\), the supply voltage and current can also scale by \(1/S\), so power density stays constant;
  • Post-Dennard scaling (2006-now): Power limits performance scaling (power wall), so we need to slow down frequency scaling or reduce chip utilization.
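
A short derivation sketch for Dennard scaling, under the classical idealized assumptions (not stated above) that the switched capacitance scales by \(1/S\), the clock frequency scales by \(S\), and leakage is negligible. The dynamic power \(P\) and area \(A\) of a transistor then scale as

\[P' = \alpha\,\frac{C}{S}\left(\frac{V_{dd}}{S}\right)^2 (S f) = \frac{\alpha C V_{dd}^2 f}{S^2}=\frac{P}{S^2},\qquad A' = \frac{A}{S^2}, \]

so the power density \(P'/A' = P/A\) is unchanged. When \(V_{dd}\) can no longer scale by \(1/S\), the \(V_{dd}^2\) term stops shrinking and power density grows with transistor density and frequency, which is the power wall.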

Normalize performance to power:

\[\text{Energy Efficiency}=\frac{\text{Performance}}{\text{Power}}=\frac{\text{Operations}/\text{Time}}{\text{Energy}/\text{Time}}=1/\frac{\text{Energy}}{\text{Operations}}. \]

For a given task, choose the design that best trades off performance against energy.

Scalability

Scalability measures the speedup achieved by using \(N\) processors compared to using just \(1\) processor.

Two settings to evaluate scalability:

  • Strong scaling: speedup on \(N\) processors with fixed total workload size
  • Weak scaling: speedup on \(N\) processors with fixed per-processor workload size

How to balance the workload?

  • Static load balancing: to partition input as evenly as possible
  • Dynamic load balancing, e.g., work dispatch, work stealing
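
The difference between the two strategies can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: static balancing splits tasks into contiguous chunks of near-equal count (without knowing their costs), and dynamic balancing is modeled as a central dispatcher that hands the next task to whichever worker becomes free first. The task costs are made up.

```python
import heapq


def static_makespan(task_costs, n_workers):
    """Split tasks into contiguous, near-equal-count chunks; return the slowest worker's time."""
    n = len(task_costs)
    loads = []
    for w in range(n_workers):
        start = w * n // n_workers
        end = (w + 1) * n // n_workers
        loads.append(sum(task_costs[start:end]))
    return max(loads)


def dynamic_makespan(task_costs, n_workers):
    """Work dispatch: each task goes to the worker that becomes free earliest."""
    free_at = [0] * n_workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    for cost in task_costs:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + cost)
    return max(free_at)


# Hypothetical workload: two expensive tasks followed by six cheap ones.
costs = [8, 8, 1, 1, 1, 1, 1, 1]
print("static :", static_makespan(costs, 2))
print("dynamic:", dynamic_makespan(costs, 2))
```

With uneven task costs, the even-count static split leaves one worker idle while the other finishes, whereas dynamic dispatch keeps both busy; with uniform costs the two give the same result and static partitioning avoids the dispatch overhead.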

Suppose an optimization accelerates a fraction \(f\) of a program by a factor of \(S\). Then the overall speedup is given by Amdahl's Law:

\[\text{Speedup}=\frac{1}{(1-f)+\frac{f}{S}}. \]
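
Amdahl's Law is easy to explore numerically. In the sketch below, the fraction and speedup values are illustrative only.

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the runtime is accelerated by a factor s."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - f) + f / s)


# Accelerating 90% of a program by increasingly large factors.
for s in (2, 10, 100, 1_000_000):
    print(f"f=0.9, S={s:>7}: overall speedup = {amdahl_speedup(0.9, s):.3f}")
```

Even as \(S\) grows without bound, the speedup approaches \(1/(1-f)\), so the unaccelerated fraction sets the ceiling (10x for \(f=0.9\)).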

Benchmark

A benchmark is a carefully selected program used to measure performance, and a benchmark suite is a collection of benchmarks.

To report average performance over a benchmark suite, we may use three types of means: arithmetic (for absolute quantities such as execution times), harmonic (for rates such as operations per second), and geometric (for ratios such as speedups normalized to a reference machine).
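
The three means are available in Python's standard `statistics` module. The sketch below uses made-up measurements for two benchmarks that each perform the same amount of work (100 operations), and checks that the harmonic mean of the per-benchmark rates equals the aggregate rate (total work divided by total time).

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical suite of two benchmarks, each doing 100 operations.
OPS_PER_BENCHMARK = 100.0
times_s = [2.0, 4.0]                                # absolute execution times
rates = [OPS_PER_BENCHMARK / t for t in times_s]    # 50 and 25 ops/s
speedups = [2.0, 8.0]                               # ratios vs. a reference machine

avg_time = mean(times_s)                  # arithmetic mean of absolute times
avg_rate = harmonic_mean(rates)           # harmonic mean of rates
avg_speedup = geometric_mean(speedups)    # geometric mean of ratios

# With equal work per benchmark, the harmonic mean of the rates matches the aggregate rate.
aggregate_rate = (OPS_PER_BENCHMARK * len(times_s)) / sum(times_s)

print(f"average time:    {avg_time:.2f} s")
print(f"average rate:    {avg_rate:.2f} ops/s (aggregate: {aggregate_rate:.2f})")
print(f"average speedup: {avg_speedup:.2f}x")
```

The arithmetic mean of the two rates would be 37.5 ops/s, which overstates the aggregate rate of about 33.3 ops/s because the slower benchmark runs for longer.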
