This video belongs to the openHPI course Future of Computing - IBM POWER9 and beyond. Do you want to see more?
- 00:01Welcome to this part
- 00:03of the lecture series.
- 00:05My name is Arni Ingimundarson and I want to
- 00:07introduce you to the POWER processor
- 00:09microarchitecture and give you a
- 00:12few glimpses of the history of
- 00:15the POWER processor.
- 00:19A few words about me.
- 00:21As I said, my name is Arni Ingimundarson and I studied
- 00:23electrical engineering at the Technical University
- 00:26of Darmstadt, specializing
- 00:28in computer systems and formal verification.
- 00:33After 11 years at Texas Instruments, developing
- 00:36ultra-low-power and secure
- 00:38microcontrollers, I have been
- 00:40with IBM since 2015, developing
- 00:43arithmetic logic units and working with
- 00:46the SAP HANA on POWER team,
- 00:50porting and supporting HANA on the POWER
- 00:52architecture.
- 00:58I want to split my
- 01:02subject into four parts.
- 01:04In this video, I will give you an overview of the
- 01:06DLX microarchitecture.
- 01:09In the second video, I will show
- 01:12you the POWER4 and POWER5
- 01:15microarchitectures.
- 01:17In the third video, I will give you an overview of the
- 01:19POWER7 architecture, including
- 01:23touching on the subject of symmetric multiprocessing,
- 01:26and in the last lecture, I will give you an overview of
- 01:28POWER8 and POWER9.
- 01:35The DLX processor architecture
- 01:37was designed by Hennessy and
- 01:40Patterson.
- 01:42Its architecture is used mainly in the academic
- 01:45world for teaching.
- 01:47It is a simple 32-bit RISC
- 01:50architecture based on the MIPS architecture.
- 01:55It has a 32-bit fixed-width
- 01:58instruction set.
- 02:00It has three instruction types, and
- 02:03the instructions are processed by
- 02:06the processor in a five-stage pipeline.
- 02:15The DLX architecture
- 02:20has three instruction
- 02:22types, and they show
- 02:25a significant and typical
- 02:29way in which RISC instruction
- 02:32sets are usually defined.
- 02:33We see a homogeneous setup, with the
- 02:36opcode always in the same position
- 02:39for all three types.
- 02:41We also see that the bits
- 02:44defining the source register are in
- 02:47the same position for every type, and then we have
- 02:50varying fields for the rest.
- 02:52The three types begin
- 02:56with the I-type instruction, which is used
- 02:58for load and store instructions, conditional branches
- 03:01and jump-to-register instructions.
- 03:04And here we see that we
- 03:07have a field defining the source
- 03:09register and the destination register, which
- 03:12is unused, for example, for jump instructions.
- 03:17The R-type instruction
- 03:20format is used for all instructions
- 03:23with register-to-register operations.
- 03:26We have an opcode,
- 03:29we have two source registers
- 03:32defined, and we have one destination register.
- 03:36The function field at the end
- 03:39further defines
- 03:42the opcode, for example,
- 03:46the specific type of arithmetic logic
- 03:48unit operation.
- 03:50And the third and last instruction type
- 03:54is the J-type instruction, which is used for
- 03:57jump and jump-and-link instructions.
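The three formats described above can be sketched as simple bit-field extraction. This is a minimal sketch in Python; the field widths follow the standard DLX definition (6-bit opcode, 5-bit register fields, 16-bit I-type immediate, 11-bit R-type function, 26-bit J-type offset), and the helper names are illustrative.

```python
# Decoding the three DLX instruction formats from a 32-bit word.
# Note that the opcode and the first source register sit in the
# same bit positions for every format.

def decode_i_type(word):
    """I-type: opcode | rs1 | rd | 16-bit immediate."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1":    (word >> 21) & 0x1F,
        "rd":     (word >> 16) & 0x1F,
        "imm16":  word & 0xFFFF,
    }

def decode_r_type(word):
    """R-type: opcode | rs1 | rs2 | rd | 11-bit function."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs1":    (word >> 21) & 0x1F,
        "rs2":    (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "func":   word & 0x7FF,
    }

def decode_j_type(word):
    """J-type: opcode | 26-bit offset."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "offset": word & 0x03FF_FFFF,
    }
```

Because the opcode field is always in the same place, a decoder can read it first and then select which of the three field layouts applies.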
- 04:05The difference between CISC and
- 04:09RISC architectures
- 04:12lies mainly in how the instruction words are
- 04:15defined.
- 04:16On the RISC side, we usually have a simpler set of
- 04:19instructions, and we have specific
- 04:21instructions for memory access.
- 04:26While on the CISC side, you can
- 04:30often have special
- 04:33operands of an instruction referencing
- 04:36memory directly.
- 04:38One advantage of the RISC-type instruction set
- 04:42is its fixed
- 04:46bit width, which means that
- 04:48with every window of
- 04:51the instruction fetch, a fixed
- 04:54number of instructions is fetched,
- 04:57while on the CISC side you have variable-length
- 05:00instructions.
- 05:02So you have a little bit more effort
- 05:05on the decoder side to decode each instruction.
- 05:10Today, most modern processors,
- 05:13at least internally, use RISC-like
- 05:17micro-instructions inside, which are not visible to
- 05:19the outside.
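The fetch-window advantage can be illustrated with a small sketch, assuming a 16-byte fetch window; the length rule for the variable-length case is made up for illustration.

```python
# Contrasting instruction fetch for fixed-width vs. variable-length
# encodings. With DLX's fixed 4-byte instructions, a fetch window can be
# sliced into instructions without decoding anything; with variable-length
# (CISC-style) encodings, an instruction's length must be decoded before
# the next instruction's start is known.

def slice_fixed(window: bytes, width: int = 4):
    # Fixed width: instruction boundaries are known in advance.
    return [window[i:i + width] for i in range(0, len(window), width)]

def slice_variable(window: bytes, length_of):
    # Variable length: must inspect each opcode byte to find the next start.
    out, i = [], 0
    while i < len(window):
        n = length_of(window[i])
        out.append(window[i:i + n])
        i += n
    return out

window = bytes(range(16))           # one 16-byte fetch window
print(len(slice_fixed(window)))     # always 4 instructions per window
print(len(slice_variable(window, lambda b: 1 + (b % 3))))
```

The fixed-width count is the same for every window, while the variable-length count depends on the window's contents, which is exactly the extra decoder effort mentioned above.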
- 05:26The five-stage pipeline of the DLX
- 05:29microprocessor, or microarchitecture,
- 05:32is a
- 05:35good example of a pipeline for a RISC
- 05:38microcontroller or microprocessor.
- 05:43We start with an instruction fetch
- 05:45where the instructions are loaded from the memory.
- 05:50When the instruction word has been
- 05:53fetched, we need to decode it to
- 05:56decide what to do.
- 05:57In the third stage, we execute the instruction.
- 06:02And in the fourth stage, we
- 06:05execute memory accesses,
- 06:08which do not apply to all instructions.
- 06:12And in the fifth cycle, the write-back cycle,
- 06:16the results of the operation are written back into
- 06:18the register file.
- 06:21And this applies both to memory as well
- 06:24as register-to-register operations: the
- 06:27results are written in the write-back cycle.
- 06:33Now, as you can see,
- 06:36you have five cycles executing
- 06:39one instruction, and
- 06:42since each stage is usually
- 06:44a separate logic function,
- 06:47we can have
- 06:51what is called pipelining of instructions.
- 06:53So as soon as the first instruction has been fetched
- 06:58and is in the decode cycle, we
- 07:01can, in parallel to the decode
- 07:04cycle of the first instruction,
- 07:05fetch the next instruction.
- 07:10And this can be extended
- 07:13to all five stages,
- 07:17so that we can have five instructions
- 07:20in flight and in
- 07:23parallel.
- 07:25Whether pipelining is implemented depends
- 07:28on the implementation of the architecture:
- 07:31there are implementations available that do not
- 07:34parallelize or pipeline
- 07:37instructions, but there are quite
- 07:40a few that do.
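When pipelining is implemented, the gain follows from a simple cycle count: with k stages and n instructions, an ideal stall-free pipeline finishes in k + (n - 1) cycles instead of k * n. A minimal sketch:

```python
# Cycle counts for an ideal five-stage pipeline with no stalls: without
# pipelining, each instruction occupies all stages before the next one
# starts; with pipelining, a new instruction completes every cycle once
# the pipe has filled.

def cycles_unpipelined(n_instructions: int, stages: int = 5) -> int:
    # Each instruction takes `stages` cycles, back to back.
    return n_instructions * stages

def cycles_pipelined(n_instructions: int, stages: int = 5) -> int:
    # The first instruction takes `stages` cycles; each further
    # instruction finishes one cycle after the previous one.
    return stages + (n_instructions - 1)

print(cycles_unpipelined(5))  # -> 25
print(cycles_pipelined(5))    # -> 9, with five instructions in flight
```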
- 07:47At this point, I want
- 07:50to show you in a little bit more detail how
- 07:55such a pipeline is
- 07:58implemented in the hardware at a high level,
- 08:01and in this diagram I'm
- 08:04showing the five
- 08:07pipeline stages.
- 08:10These symbols here
- 08:13signify registers, or latches,
- 08:16in the hardware, which
- 08:20store the data
- 08:24and output constant data over the whole clock
- 08:26cycle. And they are updated
- 08:30with every clock cycle.
- 08:32So what happens in these systems
- 08:36is that for the instruction
- 08:39fetch we have, for example, the
- 08:42instruction address in this register,
- 08:45and the instruction address is then used to index
- 08:49into the memory.
- 08:52Given this address, the memory will
- 08:56output the corresponding
- 08:59word stored at that address.
- 09:02And with the next rising
- 09:04edge of the clock, over the next clock cycle, we
- 09:06store the result in this register here.
- 09:10And in parallel to that, the
- 09:13instruction address is being incremented
- 09:17by the size of one
- 09:20instruction word.
- 09:22And in the DLX case, we have
- 09:26a 32-bit instruction, that is, four bytes per
- 09:28instruction.
- 09:31In the second
- 09:33stage, the instruction decode stage,
- 09:36we have two operations
- 09:40happening in parallel:
- 09:41we are decoding the instruction, what
- 09:44we need to do, as well
- 09:47as reading all the registers
- 09:50referenced by that instruction word.
- 09:55In the third cycle,
- 09:57we take the data that has
- 10:00been stored in this register,
- 10:03or latch, and
- 10:06we select the correct operands
- 10:10depending on how the instruction was defined,
- 10:13and we run them through the ALU, the arithmetic
- 10:15logic unit, which performs additions,
- 10:18subtractions and so on, depending on which
- 10:21instruction types, or instruction functions,
- 10:24have been implemented.
- 10:29In the fourth cycle, we have a memory access
- 10:32cycle, so for
- 10:35load instructions, we are only
- 10:38specifying the address here, which will give us
- 10:40the data that was stored at that
- 10:43address. For
- 10:46store instructions, we have both the address as
- 10:49well as the data to be stored,
- 10:52and the last cycle shows us
- 10:55that we have a path back to the register
- 10:58file, where the updated
- 11:01results are stored.
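The five stages just walked through can be sketched as a toy, single-instruction pass through the datapath. The machine state, instruction tuples, and stage helpers below are illustrative, not the real DLX encoding.

```python
# A toy walk of one instruction through the five DLX stages on a made-up
# machine state (register file + word-addressed memory).

regs = [0] * 32        # register file
mem = {0: 7, 4: 35}    # memory: address -> stored word
pc = 0                 # instruction address register

def fetch(program, pc):
    # IF: read the instruction word and increment the PC by one
    # instruction word (4 bytes for 32-bit instructions)
    return program[pc // 4], pc + 4

def decode(instr):
    # ID: decode the fields and read the referenced register in parallel
    op, rd, rs, imm = instr
    return op, rd, regs[rs], imm

def execute(op, x, y):
    # EX: the ALU computes the result or the effective memory address
    return x + y

def memory_access(op, value):
    # MEM: only load/store instructions touch memory in this stage
    return mem[value] if op == "lw" else value

def write_back(rd, value):
    # WB: the result is written back into the register file
    regs[rd] = value

program = [("lw", 1, 0, 4)]          # r1 <- mem[r0 + 4]
instr, pc = fetch(program, pc)
op, rd, x, y = decode(instr)
write_back(rd, memory_access(op, execute(op, x, y)))
print(regs[1])                       # -> 35, loaded from address 0 + 4
```

Note that the load exercises all five stages, while a register-to-register operation would simply pass its ALU result through the memory stage unchanged.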
- 11:05And just as a reminder,
- 11:08looking at how the pipeline works, these
- 11:11five stages can be pipelined.
- 11:13And we see everything working in parallel.
- 11:19I would like to point out one
- 11:22minor thing here, which is actually
- 11:25quite typical for high-level block
- 11:28diagrams of
- 11:32computer architectures, and many other things.
- 11:35Usually, they show the bare minimum
- 11:38of what you want to show, which implies
- 11:41that there are a lot of other things that are
- 11:42hidden.
- 11:44And one thing that I want to point out that is
- 11:47hidden is that
- 11:51the branch decision,
- 11:53or branch execution, happens also
- 11:56in the execution cycle.
- 11:59And if we have a pipelined architecture,
- 12:04let's assume
- 12:07that the second instruction
- 12:10is a branch
- 12:13instruction.
- 12:15We see that there are two more instructions that are
- 12:18being fetched, or worked on,
- 12:21until this second instruction is
- 12:25in the execution phase, where we actually have an
- 12:27updated address, or instruction
- 12:30pointer, and
- 12:32this path is missing from this diagram.
- 12:39And
- 12:44that is a quite critical point when
- 12:46reading block diagrams of microarchitectures:
- 12:49there are so many things that are implied
- 12:52in the drawings that we need to think about.
- 12:56And so please
- 12:59keep that in mind when reading block diagrams: even
- 13:02if things seem hard to understand,
- 13:05that is often because something is missing from
- 13:07the diagram.
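The hidden branch path just discussed implies a concrete cost: if the branch outcome is only known in the execute stage, the instructions fetched behind a taken branch are on the wrong path and must be discarded. A minimal sketch of that penalty, with illustrative stage numbering and no branch prediction assumed:

```python
# If the branch outcome is only known in the execute stage (stage 3 of
# the five: IF, ID, EX, MEM, WB), the two instructions fetched behind a
# taken branch are on the wrong path and must be flushed.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]
BRANCH_RESOLVED_IN = STAGES.index("EX") + 1   # stage 3

def taken_branch_penalty(resolve_stage: int = BRANCH_RESOLVED_IN) -> int:
    # One wrong-path instruction enters the pipeline in every cycle
    # between fetching the branch and resolving it.
    return resolve_stage - 1

print(taken_branch_penalty())  # -> 2 instructions flushed per taken branch
```

This is why deeper pipelines pay more for each taken branch, and why later designs add branch prediction to hide this cost.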
- 13:12So,
- 13:17before we finish this
- 13:20video, I want to give you a
- 13:23short overview of the history of the
- 13:26POWER processors, that is, of the releases
- 13:29of the different versions.
- 13:33The POWER4 processor, which we will talk about
- 13:35in the next video,
- 13:37is the consolidation of a longer
- 13:40history within IBM, with collaborations with
- 13:42other companies,
- 13:46which resulted in the POWER4
- 13:50chips. And
- 13:53IBM has been steadily developing
- 13:56and investing in the POWER architecture until we
- 13:59arrived at POWER9,
- 14:01released in 2017.
- 14:08And as I said, we will touch upon
- 14:10part of those in the next videos.
- 14:17I have here a short list
- 14:20of material for further
- 14:22reading, which
- 14:25I can highly recommend.
- 14:29And with that, I
- 14:32want to thank you for your attention this time and
- 14:34hope to see you in the next video.