The Computer Architecture Seminar Series brings to the UT campus leading
researchers from universities and companies to describe their research to
UT's faculty, graduate students, and computer engineers and scientists from
the Austin community, who are welcome to attend at no charge. The individual
talks are hosted by computer-architecture faculty members in both
the ECE and CS departments at UT. Seminars deal with problems in computer
architecture, microarchitecture, compiler technology, and advances in both
hardware and software that impact the field of computer architecture.
The series has been running uninterrupted for almost 20 years, sponsored
by companies that have design centers in Austin.
Innovative Applications and Technology Pivots – A Perfect Storm in Computing 8/30/2016
Since the early 2000s, we have been experiencing two very important developments in computing. One is that a tremendous amount of resources has been invested in innovative applications such as first-principles-based models, deep learning, and cognitive computing. Many application domains are questioning the conventional "it is too expensive" thinking that led to inaccuracies and missed opportunities. The other is that the industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. Today, most of the top supercomputers in the world are heterogeneous parallel computing systems. New standards such as the Heterogeneous System Architecture (HSA) are emerging to facilitate software development. Much has been, and still needs to be, learned about algorithms, languages, compilers, and hardware architecture in these movements. What are the applications that continue to drive technology development? How hard is it to program these systems today? How will we program these systems in the future? How will innovations in memory and storage devices present further opportunities and challenges? What is the impact on the long-term software engineering cost of applications? In this talk, I will present some research opportunities and challenges brought about by this perfect storm.
Toward Extreme-Scale Manycore Architectures 9/06/2016
As transistor sizes continue to scale, we are about to witness stunning levels of chip integration, with 1,000 (simple) cores on a single die, and increasing levels of die stacking. Transistors may not be much faster, but there will be many more of them. In these architectures, energy and power will be the main constraint, efficient communication and synchronization a major challenge, and programmability an unknown.
In this context, this talk presents some of the technologies that we will need to deploy to exploit these architectures. Cores need to operate flexibly at a range of voltages, and techniques for efficient energy use such as power gating and voltage speculation need to be widespread. To enable data sharing, we need to rethink synchronization and fence hardware for scalability. Hardware extensions to ease programming will provide a competitive edge. A combination of all of these techniques, together with additional disruptive technologies, will be needed.
Josep Torrellas is the Saburo Muroga Professor of Computer Science at the University of Illinois at Urbana-Champaign. He leads the Center for Programmable Extreme-Scale Computing, a center focused on architectures for extreme energy and power efficiency. He has been the director of the Intel-Illinois Parallelism Center (I2PC), a center created by Intel to advance parallel computing. He has made contributions to parallel computer architecture in the areas of shared memory multiprocessor organizations, cache hierarchies and coherence protocols, thread-level speculation, and hardware and software reliability. He is a Fellow of IEEE and ACM. He received the 2015 IEEE CS Technical Achievement Award.
Processors for the Data Center and Cloud of the Future
Current-day data centers and IaaS clouds (e.g. Amazon EC2, MS Azure, Google GCE) use microprocessors that are very similar to or the same as
those used in small servers and desktops. This work rethinks the design of microprocessors specifically for data center use along with how
microprocessors are affected by the novel economic models that have been popularized by IaaS clouds. This talk will describe several
architectural changes including how a processor can be decomposed into sub-components (e.g. ALU, Cache, Fetch Unit) that can be individually
rented in IaaS clouds, how running similar programs can be taken advantage of in the data center, how architectural features such as the
flavor of memory bandwidth (bursty vs. bulk) can be provisioned and sold in the data center, and novel memory architectures that enable the
creation of cache-coherence sub-domains across the data center.
This work has not only been simulated, but many of the discussed ideas have been implemented in one of the largest academic processors ever
built, the Princeton Piton Processor. Piton is a 25-core manycore built in IBM's 32nm process technology, containing over 460 million
transistors. This talk will discuss Piton along with what it takes to tape out a complex microprocessor in an academic setting. Lastly, Piton
has recently been open-sourced as the OpenPiton (http://www.openpiton.org) project, an expandable manycore
platform that includes RTL, thousands of tests, and implementation scripts. The talk will conclude by discussing how OpenPiton is able to
contribute to the burgeoning field of open-source hardware.
Datacenter Computers: Modern Challenges in CPU Design 2/21/2017
Computers used as datacenter servers have usage patterns that differ substantially from those of desktop or laptop computers. We discuss four key differences in usage and their first-order implications for designing computers that are particularly well-suited as servers: data movement, thousands of transactions per second, program isolation, and measurement underpinnings.
Maintaining high-bandwidth data movement requires coordinated design decisions throughout the memory system, instruction-issue system, and even instruction set. Serving thousands of transactions per second requires continuous attention to all sources of delay – causes of long-latency transactions. Unrelated programs running on shared hardware produce delay through undesired interference; isolating programs from one another needs further hardware help. And finally, when running datacenter servers as a business it is vital to be able to observe and hence decrease inefficiencies across dozens of layers of software and thousands of interacting servers. There are myriad open research problems related to these issues.
Dick Sites most recently was a Visiting Professor teaching a graduate course on Datacenter Software at the National University of Singapore.
Before that he was a Senior Staff Engineer at Google for 12 years. He previously worked at Adobe Systems, Digital Equipment Corporation, Hewlett-Packard, Burroughs, and IBM. His accomplishments include co-architecting the DEC Alpha computers and building various computer performance monitoring and tracing tools at the above companies. He also taught Computer Science for four years at UC San Diego in the 1970s. His work at Google included understanding CPU, disk, and network performance anomalies, disk and network hardware design, web-page language detection, and downloadable Google Translate dictionaries. Dr. Sites holds a PhD degree in Computer Science from Stanford and a BS degree in Mathematics from MIT. He also attended the Master's program in Computer Science at UNC Chapel Hill in 1969-70. He holds 38 patents and is a member of the U.S. National Academy of Engineering.
A Convergent Architecture for Big Data, Machine Learning and Real-Time Computing 2/28/2017
In the quest for more intelligent consumer devices, machine learning lets appliances understand what is happening around the computer and what is asked of it, while big data provides the history and context of the environment. But devices must also react to be useful, and for many applications the reaction needs to happen on a human timescale to be valuable. For example, an advertisement beacon must beam a discount coupon to the shopper's cellphone within a few hundred milliseconds or the shopper will walk past. Today, many people prefer to use large shared data centers remotely accessed through the public internet for big data analytics and machine learning because this is the most cost-effective and energy-efficient way to do large-scale computing. But integrating real-time computing with big data and machine learning may make that impractical, because exchanging messages through the internet may itself consume a substantial fraction of a second, leaving almost no time for computing if you want to guarantee an application response time of a few hundred milliseconds. In this talk I propose a flash-based parallel computer using large numbers of low-power processor chips with vector units. Such a system is very much smaller, cheaper, and lower power than one with equal memory capacity and instruction throughput made entirely with DRAM, x86 processors, and GPUs. It is small enough to install locally in retail and office locations or on mobile platforms such as trucks and ships, and inexpensive enough that it need not be a shared computing resource. Yet because it primarily uses flash memory, which is extremely dense, the storage capacity can be as big as or bigger than that of any DRAM-based in-memory big data analytics server.
Energy-Efficient Hardware for Embedded Vision and Deep Convolutional Neural Networks 5/1/2017
Visual object detection and recognition are needed for a wide range of applications including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy or latency concerns. In this talk, we will describe how joint algorithm and hardware design can be used to reduce the energy consumption of object detection and recognition while delivering real-time and robust performance. We will discuss several energy-efficient techniques that exploit sparsity, reduce data movement and storage costs, and show how they can be applied to popular forms of object detection and recognition, including those that use deep convolutional neural nets. We will present results from recently fabricated ASICs (e.g. our deep CNN accelerator named “Eyeriss”) that demonstrate these techniques in real-time computer vision systems.
Vivienne Sze is an Assistant Professor at MIT in the Electrical Engineering and Computer Science Department. Her research interests include energy-aware signal processing algorithms, and low-power circuit and system design for multimedia applications. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI, where she developed algorithms and hardware for the latest video coding standard, H.265/HEVC. She is a co-editor of the book "High Efficiency Video Coding (HEVC): Algorithms and Architectures" (Springer, 2014).
Dr. Sze received the B.A.Sc. degree from the University of Toronto in 2004, and the S.M. and Ph.D. degrees from MIT in 2006 and 2010, respectively. In 2011, she was awarded the Jin-Au Kong Outstanding Doctoral Thesis Prize in electrical engineering at MIT for her thesis on "Parallel Algorithms and Architectures for Low Power Video Decoding". She is a recipient of the 2016 AFOSR Young Investigator Award, the 2016 3M Non-tenured Faculty Award, the 2014 DARPA Young Faculty Award, and the 2007 DAC/ISSCC Student Design Contest Award, and a co-recipient of the 2008 A-SSCC Outstanding Design Award.
For more information about research in the Energy-Efficient Multimedia Systems Group at MIT visit: