用戶:AnthonyDonlon/archives/張量處理單元

**Tensor processing unit (TPU)**
推出年份	May 2016
設計公司	Google
體系結構類型	Neural network; Machine learning

張量處理單元（英文：Tensor Processing Unit，簡稱：TPU），也稱張量處理器，是 Google 開發的專用集成電路（ASIC），專門用於加速機器學習。^[1]自 2015 年起，谷歌就已經開始在內部使用 TPU，並於 2018 年將 TPU 提供給第三方使用，既將部分 TPU 作為其雲基礎架構的一部分，也將部分小型版本的 TPU 用於銷售。

總覽

2016 年 5 月，Google 在 Google I/O 上宣佈了張量處理單元，並表示 TPU 已經在其數據中心內部使用了超過一年。^[2]^[3]該晶片是專門為 Google 的 TensorFlow 框架（一個符號數學庫，用於機器學習應用程式，如神經網絡）設計的。^[4]不過，截至 2017 年，Google 也將 CPU 和 GPU 用於其他類型的機器學習。^[2]其他供應商也設計了自己的 AI 加速器，並針對嵌入式和機械人市場。

Google 的 TPU 是專有的，一些 TPU 的型號已經上市。在 2018 年 2 月 12 日，紐約時報報道稱 Google 將「允許其他公司通過其雲計算服務購買對這些晶片的訪問權」。^[5]Google 曾稱，它們已用於 AlphaGo 與李世乭的人機圍棋對戰^[3]以及 AlphaZero 系統中。Google還使用 TPU 進行 Google 街景中的文本處理，並且能夠在不到五天的時間內找到 Google 街景數據庫中的所有文本。在 Google 相冊中，單個 TPU 每天可以處理超過1億張照片。TPU 也被用在 Google 用來提供搜索結果的 RankBrain 中。^[6]

與圖形處理單元（GPU）相比，TPU 被設計用於進行大量的低精度計算（如 8 位的低精度）^[7]，每焦耳功耗下的輸入/輸出操作更多，但缺少用於光柵化/紋理映射的硬件。^[3]

根據 Norman Jouppi（英語：Norman Jouppi）的說法，TPU 可以安裝在散熱器組件中，從而可以安裝在數據中心機架上的硬盤驅動器插槽中。^[2]

產品

	TPUv1	TPUv2	TPUv3
推出時間	2016年	2017年	2018年
製程	28 nm	20 nm?	12 nm?
裸晶尺寸/mm²	331	？	？
片上儲存/MiB	28	？	？
時鐘速度/MHz	700	？	？
內存/GB	8GB DDR3	16GB HBM	32GB HBM
熱設計功耗/W	40	200	250
TOPS	23	45	90

第一代 TPU

第一代TPU是一個 8 位矩陣乘法的引擎，使用複雜指令集，並由主機通過 PCIe 3.0 總線驅動。它採用28 nm工藝製造，裸晶尺寸小於 331 mm²，時鐘速度為 700 MHz，熱設計功耗為 28–40 W。它有28 MiB 的片上存儲和 4 MiB 的 32位累加器，取 8 位乘法器的 256×256 脈動陣列的計算結果。^[8]TPU 還封裝了 8 GiB 的雙通道 2133 MHz DDR3 SDRAM，帶寬達到 34 GB/s。^[9]TPU 的指令向主機進行數據的收發，執行矩陣乘法和卷積運算，並應用激活函數。^[8]

第二代 TPU

第二代 TPU 於 2017 年 5 月發佈。^[10]Google 表示，第一代 TPU 的設計受到了內存帶寬的限制，因此在第二代設計中使用 16 GB 的高帶寬內存，可將帶寬提升到 600 GB/s，性能從而可達到 45 TFLOPS。^[9]TPU 晶片隨後被排列成性能為 180 TFLOPS 的四晶片模塊^[10]，並將其中的 64 個這樣的模塊組裝成 256 晶片的 Pod，性能達到 11.5 PFLOPS。^[10]值得注意的是，第一代 TPU 只能進行整數運算，但第二代 TPU 還可以進行浮點運算。這使得第二代 TPU 對於機器學習模型的訓練和推理都非常有用。谷歌表示，這些第二代TPU將可在 Google 計算引擎上使用，以用於 TensorFlow 應用程式中。^[11]

第三代 TPU

第三代 TPU 於 2018 年 5 月 8 日發佈。^[12]谷歌宣佈第三代 TPU 的性能是第二代的兩倍，並將部署在晶片數量是上一代的四倍的 Pod 中。^[13]^[14]與部署的第二代 TPU 相比，這使每個 Pod 的性能提高了8倍（每個 Pod 中最多裝有 1,024 個晶片）。

參見

視覺處理單元，一種專門用於視覺處理的小型設備
TrueNorth，一種用於模擬脈衝神經網絡，而非計算低精度張量的小型設備
神經處理單元
認知計算
Tensor Core（英語：Tensor Core）, 英偉達公司提出的類似的架構

參考文獻

^ 云张量处理单元 (TPU) | Cloud TPU. Google Cloud. [2020-07-20] （中文（中國大陸））.
^ ^2.0 ^2.1 ^2.2 Google's Tensor Processing Unit explained: this is what the future of computing looks like. TechRadar. [2017-01-19] （英語）.
^ ^3.0 ^3.1 ^3.2 Jouppi, Norm. Google supercharges machine learning tasks with TPU custom chip. Google Cloud Platform Blog. May 18, 2016 [2017-01-22] （美國英語）.
^ "TensorFlow: Open source machine learning" "It is machine learning software being used for various kinds of perceptual and language understanding tasks" — Jeffrey Dean, minute 0:47 / 2:17 from Youtube clip
^ Google Makes Its Special A.I. Chips Available to Others. The New York Times. [2018-02-12] （英語）.
^ Google's Tensor Processing Unit could advance Moore's Law 7 years into the future. PCWorld. [2017-01-19] （英語）.
^ Armasu, Lucian. Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated). Tom's Hardware. 2016-05-19 [2016-06-26].
^ ^8.0 ^8.1 Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh; Boden, Nan; Borchers, Al; Boyle, Rick; Cantin, Pierre-luc; Chao, Clifford; Clark, Chris; Coriell, Jeremy; Daley, Mike; Dau, Matt; Dean, Jeffrey; Gelb, Ben; Ghaemmaghami, Tara Vazir; Gottipati, Rajendra; Gulland, William; Hagmann, Robert; Ho, C. Richard; Hogberg, Doug; Hu, John; Hundt, Robert; Hurt, Dan; Ibarz, Julian; Jaffey, Aaron; Jaworski, Alek; Kaplan, Alexander; Khaitan, Harshit; Koch, Andy; Kumar, Naveen; Lacy, Steve; Laudon, James; Law, James; Le, Diemthu; Leary, Chris; Liu, Zhuyuan; Lucke, Kyle; Lundin, Alan; MacKean, Gordon; Maggiore, Adriana; Mahony, Maire; Miller, Kieran; Nagarajan, Rahul; Narayanaswami, Ravi; Ni, Ray; Nix, Kathy; Norrie, Thomas; Omernick, Mark; Penukonda, Narayana; Phelps, Andy; Ross, Jonathan; Ross, Matt; Salek, Amir; Samadiani, Emad; Severn, Chris; Sizikov, Gregory; Snelham, Matthew; Souter, Jed; Steinberg, Dan; Swing, Andy; Tan, Mercedes; Thorson, Gregory; Tian, Bo; Toma, Horia; Tuttle, Erick; Vasudevan, Vijay; Walter, Richard; Wang, Walter; Wilcox, Eric; Yoon, Doe Hyun. In-Datacenter Performance Analysis of a Tensor Processing Unit™. Toronto, Canada. June 26, 2017. arXiv:1704.04760  .
^ ^9.0 ^9.1 Kennedy, Patrick. Case Study on the Google TPU and GDDR5 from Hot Chips 29. Serve The Home. 22 August 2017 [23 August 2017].
^ ^10.0 ^10.1 ^10.2 Bright, Peter. Google brings 45 teraflops tensor flow processors to its compute cloud. Ars Technica. 17 May 2017 [30 May 2017].
^ Kennedy, Patrick. Google Cloud TPU Details Revealed. Serve The Home. 17 May 2017 [30 May 2017].
^ Frumusanu, Andre. Google I/O Opening Keynote Live-Blog. 8 May 2018 [9 May 2018].
^ Feldman, Michael. Google Offers Glimpse of Third-Generation TPU Processor. Top 500. 11 May 2018 [14 May 2018].
^ Teich, Paul. Tearing Apart Google's TPU 3.0 AI Coprocessor. The Next Platform. 10 May 2018 [14 May 2018].

外部連結

Cloud Tensor Processing Units (TPUs) (Google Cloud 上的文檔)
Photo of Google's TPU chip and board
Photo of Google's TPU v2 board
Photo of Google's TPU v3 board
Photo of Google's TPU v2 pod

[1] 云张量处理单元 (TPU) | Cloud TPU. Google Cloud. [2020-07-20] （中文（中國大陸））.

[:0-2] 2.0 ^2.1 ^2.2 Google's Tensor Processing Unit explained: this is what the future of computing looks like. TechRadar. [2017-01-19] （英語）.

[GCP_blog_2016-3] 3.0 ^3.1 ^3.2 Jouppi, Norm. Google supercharges machine learning tasks with TPU custom chip. Google Cloud Platform Blog. May 18, 2016 [2017-01-22] （美國英語）.

[YoutubeClip-4] "TensorFlow: Open source machine learning" "It is machine learning software being used for various kinds of perceptual and language understanding tasks" — Jeffrey Dean, minute 0:47 / 2:17 from Youtube clip

[5] Google Makes Its Special A.I. Chips Available to Others. The New York Times. [2018-02-12] （英語）.

[6] Google's Tensor Processing Unit could advance Moore's Law 7 years into the future. PCWorld. [2017-01-19] （英語）.

[7] Armasu, Lucian. Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated). Tom's Hardware. 2016-05-19 [2016-06-26].

[InDatacenterPerformanceAnalysisOfATensorProcessingUnit-2017-8] 8.0 ^8.1 Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh; Boden, Nan; Borchers, Al; Boyle, Rick; Cantin, Pierre-luc; Chao, Clifford; Clark, Chris; Coriell, Jeremy; Daley, Mike; Dau, Matt; Dean, Jeffrey; Gelb, Ben; Ghaemmaghami, Tara Vazir; Gottipati, Rajendra; Gulland, William; Hagmann, Robert; Ho, C. Richard; Hogberg, Doug; Hu, John; Hundt, Robert; Hurt, Dan; Ibarz, Julian; Jaffey, Aaron; Jaworski, Alek; Kaplan, Alexander; Khaitan, Harshit; Koch, Andy; Kumar, Naveen; Lacy, Steve; Laudon, James; Law, James; Le, Diemthu; Leary, Chris; Liu, Zhuyuan; Lucke, Kyle; Lundin, Alan; MacKean, Gordon; Maggiore, Adriana; Mahony, Maire; Miller, Kieran; Nagarajan, Rahul; Narayanaswami, Ravi; Ni, Ray; Nix, Kathy; Norrie, Thomas; Omernick, Mark; Penukonda, Narayana; Phelps, Andy; Ross, Jonathan; Ross, Matt; Salek, Amir; Samadiani, Emad; Severn, Chris; Sizikov, Gregory; Snelham, Matthew; Souter, Jed; Steinberg, Dan; Swing, Andy; Tan, Mercedes; Thorson, Gregory; Tian, Bo; Toma, Horia; Tuttle, Erick; Vasudevan, Vijay; Walter, Richard; Wang, Walter; Wilcox, Eric; Yoon, Doe Hyun. In-Datacenter Performance Analysis of a Tensor Processing Unit™. Toronto, Canada. June 26, 2017. arXiv:1704.04760  .

[TPU_memory-9] 9.0 ^9.1 Kennedy, Patrick. Case Study on the Google TPU and GDDR5 from Hot Chips 29. Serve The Home. 22 August 2017 [23 August 2017].

[TFP_v2-10] 10.0 ^10.1 ^10.2 Bright, Peter. Google brings 45 teraflops tensor flow processors to its compute cloud. Ars Technica. 17 May 2017 [30 May 2017].

[11] Kennedy, Patrick. Google Cloud TPU Details Revealed. Serve The Home. 17 May 2017 [30 May 2017].

[12] Frumusanu, Andre. Google I/O Opening Keynote Live-Blog. 8 May 2018 [9 May 2018].

[13] Feldman, Michael. Google Offers Glimpse of Third-Generation TPU Processor. Top 500. 11 May 2018 [14 May 2018].

[14] Teich, Paul. Tearing Apart Google's TPU 3.0 AI Coprocessor. The Next Platform. 10 May 2018 [14 May 2018].

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]