Performance Monitoring Events — AMD Family 11h Processors

This section describes processor performance monitor events available for performance analysis and tuning for AMD Family 11h processors. The AMD Family 11h processors provide four 48-bit performance counters per available core, which allows four types of events to be monitored simultaneously. The performance counters are not guaranteed to be fully accurate and should be used as a relative measure of performance to assist in application tuning. Unlisted event numbers are reserved and their results are undefined.
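Because the counters are 48 bits wide, software that samples a free-running counter has to allow for wraparound when taking the difference between two reads. The following is a minimal sketch, assuming the raw counter values have already been obtained by some means not shown here and that at most one wrap occurs between the two reads. The function name is hypothetical.

```python
COUNTER_BITS = 48                      # counter width stated in this section
COUNTER_MODULUS = 1 << COUNTER_BITS


def counter_delta(start: int, end: int) -> int:
    """Events counted between two raw reads, allowing for one wraparound.

    Assumption: the counter wrapped at most once between the two reads.
    """
    return (end - start) % COUNTER_MODULUS
```

Taking the difference modulo 2^48 gives the same result as a plain subtraction when no wrap occurred, so callers do not need a separate code path.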

Conventions, Definitions, and Special Notes

The Event Select value is used to select the event to be monitored. The Unit Mask is used to further qualify the event selected by the Event Select value. The Mask Value given here is an index and corresponds to the actual 8-bit Unit Mask as specified in the following table.

Mask value Unit Mask
0 0x01
1 0x02
2 0x04
3 0x08
4 0x10
5 0x20
6 0x40
7 0x80

Unless otherwise stated, the Unit Mask values shown may be combined to select any desired combination of the sub-events for a given event. For events where no Unit Mask table is shown, the Unit Mask is not applicable and the results are undefined.
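The mapping in the table above is a single-bit shift, and combining sub-events is a bitwise OR of the individual masks. A minimal sketch of that conversion follows; the function name is hypothetical and not part of any tool described here.

```python
from typing import Iterable


def unit_mask(mask_values: Iterable[int]) -> int:
    """Combine Mask Value indices (0..7) into one 8-bit Unit Mask."""
    mask = 0
    for value in mask_values:
        if not 0 <= value <= 7:
            raise ValueError(f"mask value {value} is outside 0..7")
        mask |= 1 << value             # index n selects bit n (0 -> 0x01, 7 -> 0x80)
    return mask
```

For example, selecting mask values 1 through 4 of an event yields a Unit Mask of 0x1E.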

Speculative vs. Retired events: Several events may include speculative activity, meaning the events may be associated with false-path instructions that are ultimately discarded due to a branch misprediction. Events associated with "Retire" reflect actual program execution. For events where the distinction may matter, these are explicitly labeled as one or the other.

Dual-core operation: In AMD64 dual-core processors, each core has its own set of event counters. However, the cores share the event-select logic for events in the shared Northbridge logic. One core can therefore overwrite a Northbridge event select (including the unit mask) that was previously set up by the other core, changing the event that the first core thinks it is counting.

Note: This conflict between cores occurs between corresponding event counters, e.g., PMC0 vs. PMC0. So both cores cannot simultaneously monitor different Northbridge events using the same counter. When using the performance counters simultaneously in both cores, care must be taken to avoid this conflict, such as by having one core monitor the desired Northbridge events and the other core either monitor events internal to itself, or not use the corresponding event counters.

For detailed information, refer to the BIOS and Kernel Developer's Guide for AMD Athlon™ 64 and AMD Opteron™ Processors, order# 26094.

Floating point events

Event Select 0x000 Dispatched FPU ops

Abbreviation: FPU ops

The number of operations (uops) dispatched to the FPU execution pipelines. This event reflects how busy the FPU pipelines are. This includes all operations done by x87, MMX® and SSE instructions, including moves. Each increment represents a one-cycle dispatch event; packed 128-bit SSE operations count as two ops; scalar operations count as one. Speculative. (See also event CBh). Note: Since this event includes non-numeric operations it is not suitable for measuring MFLOPs.

Value Unit mask description
0 Add pipe ops excluding junk ops
1 Multiply pipe ops excluding junk ops
2 Store pipe ops excluding junk ops
3 Add pipe load ops
4 Multiply pipe load ops
5 Store pipe load ops

Event Select 0x001 Cycles in which the FPU is empty

Abbreviation: No FPU op cycles

The number of cycles in which the FPU is empty.

Event Select 0x002 Dispatched fast flag FPU operations

Abbreviation: Fast flag FPU ops

The number of FPU operations that use the fast flag interface (e.g. FCOMI, COMISS, COMISD, UCOMISS, UCOMISD). This event is a speculative event.

Load/store and TLB events

Event Select 0x020 Segment register loads

Abbreviation: Seg reg loads

The number of segment register loads performed.

Value Unit mask description
0 ES
1 CS
2 SS
3 DS
4 FS
5 GS
6 HS

Event Select 0x021 Pipeline restart due to self-modifying code

Abbreviation: Restart self-mod code

The number of pipeline restarts that were caused by self-modifying code (a store that hits any instruction that's been fetched for execution beyond the instruction doing the store).

Event Select 0x022 Pipeline restart due to probe hit

Abbreviation: Restart probe hit

The number of pipeline restarts caused by an invalidating probe hitting on a speculative out-of-order load.

Event Select 0x023 LS buffer 2 full

Abbreviation: LS2 buffer full

The number of cycles that the LS2 buffer is full. This buffer holds stores waiting to retire as well as requests that missed the data cache and are waiting on a refill. This condition will stall further data cache accesses, although such stalls may be overlapped by independent instruction execution.

Event Select 0x024 Locked operations

Abbreviation: Locked ops

This event covers locked operations performed and their execution time. The execution time represented by the cycle counts is typically overlapped to a large extent with other instructions. The non-speculative cycles event is suitable for event-based profiling of lock operations that tend to miss in the cache.

Value Unit mask description
0 Number of locked instructions executed
1 Number of cycles spent in speculative phase
2 Number of cycles spent in non-speculative phase

Data cache events

Event Select 0x040 Data cache accesses

Abbreviation: DC accesses

The number of accesses to the data cache for load and store references. This may include certain microcode scratchpad accesses, although these are generally rare. Each increment represents an eight-byte access, although the instruction may only be accessing a portion of that. This event is a speculative event.

Event Select 0x041 Data cache misses

Abbreviation: DC misses

The number of data cache references which missed in the data cache. This event is a speculative event.

Except in the case of streaming stores, only the first miss for a given line is included - access attempts by other instructions while the refill is still pending are not included in this event. So in the absence of streaming stores, each event reflects one 64-byte cache line refill, and counts of this event are the same as, or very close to, the combined count for event 42h.

Streaming stores however will cause this event for every such store, since the target memory is not refilled into the cache. Hence this event should not be used as an indication of data cache refill activity - event 42h should be used for such measurements. (See event 65h for an indication of streaming store activity.) A large difference between events 41h (with all UNIT_MASK bits set) and 42h would be due mainly to streaming store activity.
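As a rough illustration of the comparison described above, the gap between the two counts can be computed directly. This is a sketch only: it assumes both counts were collected over the same interval, and the function name is hypothetical. The result is an indicator, not an exact count of streaming stores.

```python
def streaming_store_excess(dc_misses: int, dc_refills: int) -> int:
    """Misses (event 41h) not matched by a refill (event 42h), floored at zero.

    Assumption: both counts cover the same measurement interval.
    """
    return max(dc_misses - dc_refills, 0)
```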

Event Select 0x042 Data cache refills from L2 or system

Abbreviation: DC refills L2/sys

The number of data cache refills satisfied from the L2 cache (and/or the system), per the UNIT_MASK. UNIT_MASK bits 4:1 allow a breakdown of refills from the L2 by coherency state. UNIT_MASK bit 0 reflects refills which missed in the L2, and provides the same measure as the combined sub-events of event 43h. Each increment reflects a 64-byte transfer. This event is a speculative event.

Value Unit mask description
0 Refill from system
1 Shared-state line from L2
2 Exclusive-state line from L2
3 Owned-state line from L2
4 Modified-state line from L2

Event Select 0x043 Data cache refills from system

Abbreviation: DC refills sys

The number of L1 cache refills satisfied from the system (system memory or another cache), as opposed to the L2. The UNIT_MASK selects lines in one or more specific coherency states. Each increment reflects a 64-byte transfer. This event is a speculative event.

Value Unit mask description
0 Invalid
1 Shared
2 Exclusive
3 Owned
4 Modified

Event Select 0x044 Data cache lines evicted

Abbreviation: DC evicted

The number of L1 data cache lines written to the L2 cache or system memory, having been displaced by L1 refills. The UNIT_MASK may be used to count only victims in specific coherency states. Each increment represents a 64-byte transfer. This event is a speculative event.

In most cases, L1 victims are moved to the L2 cache, displacing an older cache line there. Lines brought into the data cache by PrefetchNTA instructions, however, are evicted directly to system memory (if dirty) or invalidated (if clean). There is no provision for measuring this component by itself. The Invalid case (UNIT_MASK value 01h) reflects the replacement of lines that would have been invalidated by probes for write operations from another processor or DMA activity.

Value Unit mask description
0 Invalid
1 Shared
2 Exclusive
3 Owned
4 Modified

Event Select 0x045 L1 DTLB miss and L2 DTLB hit

Abbreviation: DTLB L1M L2H

The number of data cache accesses that miss in the L1 DTLB and hit in the L2 DTLB. This event is a speculative event.

Event Select 0x046 L1 DTLB miss and L2 DTLB miss

Abbreviation: DTLB L1M L2M

The number of data cache accesses that miss in both the L1 and L2 DTLBs. This event is a speculative event.

Event Select 0x047 Misaligned accesses

Abbreviation: Misalign access

The number of data cache accesses that are misaligned. These are accesses which cross an eight-byte boundary. They incur an extra cache access (reflected in event 40h), and an extra cycle of latency on reads. This event is a speculative event.

Event Select 0x048 Microarchitectural late cancel of an access

Abbreviation: Late cancel

Event Select 0x049 Microarchitectural early cancel of an access

Abbreviation: Early cancel

Event Select 0x04A Single-bit ECC errors recorded by scrubber

Abbreviation: 1-bit ECC errors

The number of single-bit errors corrected by either of the error detection/correction mechanisms in the data cache.

Value Unit mask description
0 Scrubber error
1 Piggyback scrubber errors

Event Select 0x04B Prefetch instructions dispatched

Abbreviation: Prefetch inst

The number of prefetch instructions dispatched by the decoder. Such instructions may or may not cause a cache line transfer. All Dcache and L2 accesses, hits and misses by prefetch instructions, except for prefetch instructions that collide with an outstanding hardware prefetch, are included in these events. This event is a speculative event.

Value Unit mask description
0 Load (Prefetch, PrefetchT0/T1/T2)
1 Store (PrefetchW)
2 NTA (PrefetchNTA)

Event Select 0x04C DCACHE misses by locked instructions

Abbreviation: DC misses locked inst

The number of data cache misses incurred by locked instructions. (The total number of locked instructions may be obtained from event 24h.)

Such misses may be satisfied from the L2 or system memory, but there is no provision for distinguishing between the two. When used for event-based profiling, this event will tend to occur very close to the offending instructions. (See also event 24h.) This event is also included in the basic Dcache miss event (event 41h).

Value Unit mask description
1 Data cache misses by locked instructions

L2 cache and system interface events

Event Select 0x065 Number of memory type requests

Abbreviation: Mem type req

These events reflect accesses to uncacheable (UC) or write-combining (WC) memory regions (as defined by MTRR or PAT settings) and Streaming Store activity to WB memory. Both the WC and Streaming Store events reflect Write Combining buffer flushes, not individual store instructions. WC buffer flushes typically consist of one 64-byte write to the system for each flush (assuming software typically fills a buffer before it gets flushed). A partially-filled buffer will require two or more smaller writes to the system. The WC event reflects flushes of WC buffers that were filled by stores to WC memory or streaming stores to WB memory. The Streaming Store event reflects only flushes due to streaming stores (which are typically only to WB memory). The difference between counts of these two events reflects the true amount of write events to WC memory.

Value Unit mask description
0 Requests to non-cacheable (UC) memory
1 Requests to write-combining (WC) memory or WC buffer flushes to WB memory
7 Streaming store (SS) requests

Event Select 0x067 Data prefetcher

Abbreviation: Data prefetcher

These events reflect requests made by the data prefetcher. UNIT_MASK bit 1 counts total prefetch requests, while bit 0 counts requests where the target block is found in the L2 or data cache. The difference between the two represents actual data read (in units of 64-byte cache lines) from the system by the prefetcher. This is also included in the count of event 7Fh, UNIT_MASK bit 0 (combined with other L2 fill events).

Value Unit mask description
0 Cancelled prefetches
1 Prefetch attempts
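The difference described for this event converts to bytes by multiplying by the 64-byte line size. A minimal sketch, assuming the two sub-events were counted separately over the same interval (the function name is hypothetical):

```python
CACHE_LINE_BYTES = 64                  # line size used throughout this section


def prefetch_bytes_from_system(attempts: int, cancelled: int) -> int:
    """Bytes read from the system by the data prefetcher.

    attempts  : event 67h, mask value 1 (prefetch attempts)
    cancelled : event 67h, mask value 0 (cancelled prefetches)
    """
    if cancelled > attempts:
        raise ValueError("cancelled prefetches exceed prefetch attempts")
    return (attempts - cancelled) * CACHE_LINE_BYTES
```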

Event Select 0x06C System read responses by coherency state

Abbreviation: Sys read resp

The number of responses from the system for cache refill requests. The UNIT_MASK may be used to select specific cache coherency states. Each increment represents one 64-byte cache line transferred from the system (DRAM or another cache, including another core on the same node) to the data cache, instruction cache or L2 cache (for data prefetcher and TLB table walks). Modified-state responses may be for Dcache store miss refills, PrefetchW software prefetches, hardware prefetches for a store-miss stream, or Change-to-Dirty requests that get a dirty (Owned) probe hit in another cache. Exclusive responses may be for any Icache refill, Dcache load miss refill, other software prefetches, hardware prefetches for a load-miss stream, or TLB table walks that miss in the L2 cache; Shared responses may be for any of those that hit a clean line in another cache.

Value Unit mask description
0 Exclusive
1 Modified
2 Shared
4 Data error

Event Select 0x06D Quadwords written to system

Abbreviation: Quad written to sys

The number of quadword (8-byte) data transfers from the processor to the system. These may be part of a 64-byte cache line writeback or a 64-byte dirty probe hit response, each of which would cause eight increments; or a partial or complete Write Combining buffer flush (Sized Write), which could cause from one to eight increments.

Value Unit mask description
0 Quadword write transfer
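Since each increment of this event is one 8-byte transfer, a count converts directly to bytes written. A small sketch with a hypothetical function name:

```python
QUADWORD_BYTES = 8                     # one increment of event 6Dh


def bytes_written_to_system(quadword_count: int) -> int:
    """Total bytes transferred from the processor to the system."""
    return quadword_count * QUADWORD_BYTES
```

Eight increments correspond to one full 64-byte cache line writeback or dirty probe hit response.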

Event Select 0x07D Requests to L2 cache

Abbreviation: L2 requests

The number of requests to the L2 cache for Icache or Dcache fills, or page table lookups for the TLB. These events reflect only read requests to the L2; writes to the L2 are indicated by event 7Fh. These include some amount of retries associated with address or resource conflicts. Such retries tend to occur more as the L2 gets busier, and in certain extreme cases (such as large block moves that overflow the L2) these extra requests can dominate the event count.

These extra requests are not a direct indication of performance impact - they simply reflect opportunistic accesses that don't complete. But because of this, they are not a good indication of actual cache line movement. The Icache and Dcache miss and refill events (81h, 82h, 83h, 41h, 42h, 43h) provide a more accurate indication of this, and are the preferred way to measure such traffic.

Value Unit mask description
0 IC fill
1 DC fill
2 TLB fill (page table walks)
3 Tag snoop request
4 Cancelled request

Event Select 0x07E L2 cache misses

Abbreviation: L2 misses

The number of requests that miss in the L2 cache. This may include some amount of speculative activity, as well as some amount of retried requests as described in event 7Dh. The IC-fill-miss and DC-fill-miss events tend to mirror the Icache and Dcache refill-from-system events (83h and 43h, respectively), and tend to include more speculative activity than those events.

Value Unit mask description
0 IC fill
1 DC fill (includes possible replays)
2 TLB page table walk

Event Select 0x07F L2 fill/writeback

Abbreviation: L2 fill/write

The number of lines written into the L2 cache due to victim writebacks from the Icache or Dcache, TLB page table walks and the hardware data prefetcher (UNIT_MASK bit 0); or writebacks of dirty lines from the L2 to the system (UNIT_MASK bit 1). Each increment represents a 64-byte cache line transfer.

Note: Victim writebacks from the Dcache may be measured separately using event 44h. However this is not quite the same as the Dcache component of event 7Fh, the main difference being PrefetchNTA lines. When these are evicted from the Dcache due to replacement, they are written out to system memory (if dirty) or simply invalidated (if clean), rather than being moved to the L2 cache.

Value Unit mask description
0 L2 fills (victims from L1 caches, TLB page table walks and data prefetches)
1 L2 writebacks to system

Instruction cache events

Event Select 0x080 Instruction cache fetches

Abbreviation: IC fetches

The number of instruction cache accesses by the instruction fetcher. Each access is an aligned 16-byte read, from which a varying number of instructions may be decoded.

Event Select 0x081 Instruction cache misses

Abbreviation: IC misses

The number of instruction fetches that miss in the instruction cache. This is typically equal to or very close to the sum of events 82h and 83h. Each miss results in a 64-byte cache line refill.

Event Select 0x082 Instruction cache refills from L2

Abbreviation: IC refills from L2

The number of instruction cache refills satisfied from the L2 cache. Each increment represents one 64-byte cache line transfer.

Event Select 0x083 Instruction cache refills from system

Abbreviation: IC refills from sys

The number of instruction cache refills from system memory (or another cache). Each increment represents one 64-byte cache line transfer.

Event Select 0x084 L1 ITLB miss and L2 ITLB hit

Abbreviation: ITLB L1M L2H

The number of instruction fetches that miss in the L1 ITLB but hit in the L2 ITLB.

Event Select 0x085 L1 ITLB miss and L2 ITLB miss

Abbreviation: ITLB L1M L2M

The number of instruction fetches that miss in both the L1 and L2 ITLBs.

Event Select 0x086 Pipeline restart due to instruction stream probe

Abbreviation: Restart i-stream probe

The number of pipeline restarts caused by invalidating probes that hit on the instruction stream currently being executed. This would happen if the active instruction stream was being modified by another processor in an MP system - typically a highly unlikely event.

Event Select 0x087 Instruction fetch stall

Abbreviation: Inst fetch stall

The number of cycles the instruction fetcher is stalled. This may be for a variety of reasons such as branch predictor updates, unconditional branch bubbles, far jumps and cache misses, among others. May be overlapped by instruction dispatch stalls or instruction execution, such that these stalls don't necessarily impact performance.

Event Select 0x088 Return stack hits

Abbreviation: RET stack hits

The number of near return instructions (RET or RET Iw) that get their return address from the return address stack (i.e. where the stack has not gone empty). This may include cases where the address is incorrect (return mispredicts). This may also include speculatively executed false-path returns. Return mispredicts are typically caused by the return address stack underflowing, however they may also be caused by an imbalance in calls vs. returns, such as doing a call but then popping the return address off the stack.

Note: This event cannot be reliably compared with events C9h and CAh (such as to calculate percentage of return mispredicts due to an empty return address stack), since it may include speculatively executed false-path returns that are not included in those retire-time events.

Event Select 0x089 Return stack overflows

Abbreviation: RET stack overflows

The number of (near) call instructions that cause the return address stack to overflow. When this happens, the oldest entry is discarded. This count may include speculatively executed calls.

Execution unit events

Event Select 0x026 Retired CLFLUSH instructions

Abbreviation: Ret CLFLUSH inst

The number of CLFLUSH instructions retired.

Event Select 0x027 Retired CPUID instructions

Abbreviation: Ret CPUID inst

The number of CPUID instructions retired.

Event Select 0x076 CPU clocks not halted (cycles)

Abbreviation: CPU clocks

The number of clocks that the CPU is not in a halted state (due to STPCLK or a HALT instruction). Note: this event allows system idle time to be automatically factored out from IPC (or CPI) measurements, provided the OS halts the CPU when going idle. If the OS goes into an idle loop rather than halting, such calculations will be influenced by the IPC of the idle loop.

Event Select 0x0C0 Retired instructions

Abbreviation: Ret inst

The number of instructions retired (execution completed and architectural state updated). This count includes exceptions and interrupts - each exception or interrupt is counted as one instruction.
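IPC and CPI follow from the counts of event C0h (retired instructions) and event 76h (CPU clocks not halted) taken over the same interval. A minimal sketch; the function names are hypothetical, and the counts are assumed to have been collected simultaneously on the same core.

```python
def ipc(retired_instructions: int, cpu_clocks: int) -> float:
    """Instructions per clock: event C0h divided by event 76h."""
    if cpu_clocks <= 0:
        raise ValueError("cpu_clocks must be positive")
    return retired_instructions / cpu_clocks


def cpi(retired_instructions: int, cpu_clocks: int) -> float:
    """Clocks per instruction: event 76h divided by event C0h."""
    if retired_instructions <= 0:
        raise ValueError("retired_instructions must be positive")
    return cpu_clocks / retired_instructions
```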

Event Select 0x0C1 Retired uops

Abbreviation: Ret uops

The number of micro-ops retired. This includes all processor activity (instructions, exceptions, interrupts, microcode assists, etc.).

Event Select 0x0C2 Retired branch instructions

Abbreviation: Ret branch

The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts.

Event Select 0x0C3 Retired mispredicted branch instructions

Abbreviation: Ret misp branch

The number of branch instructions retired, of any type, that were not correctly predicted. This includes those for which prediction is not attempted (far control transfers, exceptions and interrupts).
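A common derived figure is the fraction of retired branches that were mispredicted, using events C3h and C2h. The sketch below is illustrative and the function name is hypothetical. Because event C3h also counts transfers for which no prediction is attempted, the ratio somewhat overstates the error rate of the predictor itself.

```python
def branch_mispredict_ratio(mispredicted: int, retired_branches: int) -> float:
    """Event C3h divided by event C2h; 0.0 when no branches retired."""
    if retired_branches == 0:
        return 0.0
    return mispredicted / retired_branches
```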

Event Select 0x0C4 Retired taken branch instructions

Abbreviation: Ret taken branch

The number of taken branches that were retired. This includes all types of architectural control flow changes, including exceptions and interrupts.

Event Select 0x0C5 Retired taken branch instructions mispredicted

Abbreviation: Ret taken branch misp

The number of retired taken branch instructions that were mispredicted.

Event Select 0x0C6 Retired far control transfers

Abbreviation: Ret far xfers

The number of far control transfers retired including far call/jump/return, IRET, SYSCALL and SYSRET, plus exceptions and interrupts. Far control transfers are not subject to branch prediction.

Event Select 0x0C7 Retired branch resyncs

Abbreviation: Ret branch resyncs

The number of resync branches. These reflect pipeline restarts due to certain microcode assists and events such as writes to the active instruction stream, among other things. Each occurrence reflects a restart penalty similar to a branch mispredict. Relatively rare.

Event Select 0x0C8 Retired near returns

Abbreviation: Ret near RET

The number of near return instructions (RET or RET Iw) retired.

Event Select 0x0C9 Retired near returns mispredicted

Abbreviation: Ret near RET misp

The number of near returns retired that were not correctly predicted by the return address predictor. Each such mispredict incurs the same penalty as a mispredicted conditional branch instruction.

Event Select 0x0CA Retired indirect branches mispredicted

Abbreviation: Ret ind branch misp

The number of indirect branch instructions retired where the target address was not correctly predicted.

Event Select 0x0CB Retired MMX/FP instructions

Abbreviation: Ret MMX/FP inst

The number of MMX®, SSE or x87 instructions retired. The UNIT_MASK allows the selection of the individual classes of instructions as given in the table. Each increment represents one complete instruction.

Note: Since this event includes non-numeric instructions it is not suitable for measuring MFLOPS.

Value Unit mask description
0 x87 instructions
1 MMX and 3DNow instructions
2 Packed SSE and SSE2 instructions
3 Scalar SSE and SSE2 instructions

Event Select 0x0CC Retired fastpath double op instructions

Abbreviation: Ret fastpath double op

Value Unit mask description
0 With low op in position 0
1 With low op in position 1
2 With low op in position 2

Event Select 0x0CD Interrupts-masked cycles

Abbreviation: Int-masked cycles

The number of processor cycles where interrupts are masked (EFLAGS.IF = 0). Using edge-counting with this event will give the number of times IF is cleared; dividing the cycle-count value by this value gives the average length of time that interrupts are disabled on each instance. Compare the edge count with event CFh to determine how often interrupts are disabled for interrupt handling vs. other reasons (e.g. critical sections).

Event Select 0x0CE Interrupts-masked cycles with interrupt pending

Abbreviation: Int-masked pending

The number of processor cycles where interrupts are masked (EFLAGS.IF = 0) and an interrupt is pending. Using edge-counting with this event and comparing the resulting count with the edge count for event CDh gives the proportion of interrupts for which handling is delayed due to prior interrupts being serviced, critical sections, etc. The cycle count value gives the total amount of time for such delays. The cycle count divided by the edge count gives the average length of each such delay.
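The ratios described for events CDh and CEh can be computed once the cycle counts and edge counts are in hand. The following is a sketch under the assumption that all four values were measured over the same interval; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InterruptMaskCounts:
    masked_cycles: int     # event CDh, cycle count
    masked_edges: int      # event CDh, edge count
    pending_cycles: int    # event CEh, cycle count
    pending_edges: int     # event CEh, edge count


def _ratio(numerator: int, denominator: int) -> float:
    """Division that returns 0.0 instead of failing on an empty sample."""
    return numerator / denominator if denominator else 0.0


def summarize(counts: InterruptMaskCounts) -> dict:
    """Derived figures described in the text for events CDh and CEh."""
    return {
        # average cycles interrupts stay masked per masking instance
        "avg_masked_cycles": _ratio(counts.masked_cycles, counts.masked_edges),
        # CEh edge count compared with CDh edge count
        "delayed_ratio": _ratio(counts.pending_edges, counts.masked_edges),
        # average cycles an interrupt waits while masked
        "avg_delay_cycles": _ratio(counts.pending_cycles, counts.pending_edges),
    }
```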

Event Select 0x0CF Interrupts taken

Abbreviation: Int taken

The number of hardware interrupts taken. This does not include software interrupts (INT n instruction).

Event Select 0x0D0 Decoder empty

Abbreviation: Decoder empty

The number of processor cycles where the decoder has nothing to dispatch (typically waiting on an instruction fetch that missed the Icache, or for the target fetch after a branch mispredict).

Event Select 0x0D1 Dispatch stalls

Abbreviation: Dispatch stalls

The number of processor cycles where the decoder is stalled for any reason (has one or more instructions ready but can't dispatch them due to resource limitations in execution). This is the combined effect of events D2h - DAh, some of which may overlap; this event reflects the net stall cycles. The more common stall conditions (events D5h, D6h, D7h, D8h, and to a lesser extent D2h) may overlap considerably. The occurrence of these stalls is highly dependent on the nature of the code being executed (instruction mix, memory reference patterns, etc.).

Event Select 0x0D2 Dispatch stall for branch abort to retire

Abbreviation: Stall branch abort

The number of processor cycles the decoder is stalled waiting for the pipe to drain after a mispredicted branch. This stall occurs if the corrected target instruction reaches the dispatch stage before the pipe has emptied. See also event D1h.

Event Select 0x0D3 Dispatch stall for serialization

Abbreviation: Stall serialization

The number of processor cycles the decoder is stalled due to a serializing operation, which waits for the execution pipeline to drain. Relatively rare; mainly associated with system instructions. See also event D1h.

Event Select 0x0D4 Dispatch stall for segment load

Abbreviation: Stall seg load

The number of processor cycles the decoder is stalled due to a segment load instruction being encountered while execution of a previous segment load operation is still pending. Relatively rare except in 16-bit code. See also event D1h.

Event Select 0x0D5 Dispatch stall for reorder buffer full

Abbreviation: Stall reorder full

The number of processor cycles the decoder is stalled because the reorder buffer is full. May occur simultaneously with certain other stall conditions; see event D1h.

Event Select 0x0D6 Dispatch stall for reservation station full

Abbreviation: Stall res station full

The number of processor cycles the decoder is stalled because a required integer unit reservation station is full. May occur simultaneously with certain other stall conditions; see event D1h.

Event Select 0x0D7 Dispatch stall for FPU full

Abbreviation: Stall FPU full

The number of processor cycles the decoder is stalled because the scheduler for the Floating Point Unit is full. This condition can be caused by a lack of parallelism in FP-intensive code, or by cache misses on FP operand loads (which could also show up as event D8h instead, depending on the nature of the instruction sequences). May occur simultaneously with certain other stall conditions; see event D1h.

Event Select 0x0D8 Dispatch stall for LS full

Abbreviation: Stall LS full

The number of processor cycles the decoder is stalled because the Load/Store Unit is full. This generally occurs due to heavy cache miss activity. May occur simultaneously with certain other stall conditions; see event D1h.

Event Select 0x0D9 Dispatch stall waiting for all quiet

Abbreviation: Stall waiting quiet

The number of processor cycles the decoder is stalled waiting for all outstanding requests to the system to be resolved. Relatively rare; associated with certain system instructions and types of interrupts. May partially overlap certain other stall conditions; see event D1h.

Event Select 0x0DA Dispatch stall for far control transfer or resync to retire

Abbreviation: Stall far/resync

The number of processor cycles the decoder is stalled waiting for the execution pipeline to drain before dispatching the target instructions of a far control transfer or a Resync (an instruction stream restart associated with certain microcode assists). Relatively rare; does not overlap with other stall conditions. See also event D1h.

Event Select 0x0DB FPU exceptions

Abbreviation: FPU except

The number of floating point unit exceptions for microcode assists. The UNIT_MASK may be used to isolate specific types of exceptions.

Value Unit mask description
0 x87 reclass microfaults
1 SSE retype microfaults
2 SSE reclass microfaults
3 SSE and x87 microtraps

Event Select 0x0DC DR0 breakpoint matches

Abbreviation: DR0 matches

The number of matches on the address in breakpoint register DR0, per the breakpoint type specified in DR7. The breakpoint does not have to be enabled. Each instruction breakpoint match incurs an overhead of about 120 cycles; load/store breakpoint matches do not incur any overhead.

Event Select 0x0DD DR1 breakpoint matches

Abbreviation: DR1 matches

The number of matches on the address in breakpoint register DR1. See notes for event DCh.

Event Select 0x0DE DR2 breakpoint matches

Abbreviation: DR2 matches

The number of matches on the address in breakpoint register DR2. See notes for event DCh.

Event Select 0x0DF DR3 breakpoint matches

Abbreviation: DR3 matches

The number of matches on the address in breakpoint register DR3. See notes for event DCh.

Memory controller events

Event Select 0x0E0 DRAM accesses

Abbreviation: DRAM accesses

The number of memory accesses performed by the local DRAM controller. The UNIT_MASK may be used to isolate the different DRAM page access cases. Page miss cases incur an extra latency to open a page; page conflict cases incur both page-close and page-open penalties. These penalties may be overlapped by DRAM accesses for other requests and don't necessarily represent lost DRAM bandwidth. The associated penalties are as follows:

Page miss: Trcd (DRAM RAS-to-CAS delay)

Page conflict: Trp + Trcd (DRAM row-precharge time plus RAS-to-CAS delay)

Each DRAM access represents one 64-byte block of data transferred if the DRAM is configured for 64-byte granularity, or one 32-byte block if the DRAM is configured for 32-byte granularity. (The latter is only applicable to single-channel DRAM systems, which may be configured either way.)

Value Unit mask description
0 DCT0 page hit
1 DCT0 page miss
2 DCT0 page conflict
3 DCT1 page hit
4 DCT1 page miss
5 DCT1 page conflict
6 Write request
7 Read request
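The access counts and penalties above can be turned into bytes transferred and an upper bound on added latency. This is a sketch only: the function names are hypothetical, and expressing Trcd and Trp in memory clocks is an assumption made here for illustration. Since the penalties may be overlapped by other DRAM accesses, the latency figure is a ceiling rather than a measurement.

```python
def dram_bytes(accesses: int, granularity_bytes: int = 64) -> int:
    """Bytes moved by the DRAM controller for a given access count."""
    if granularity_bytes not in (32, 64):
        raise ValueError("granularity must be 32 or 64 bytes")
    return accesses * granularity_bytes


def page_penalty_clocks(page_misses: int, page_conflicts: int,
                        trcd: int, trp: int) -> int:
    """Upper bound on extra latency from page misses and page conflicts.

    Assumption: trcd and trp are given in memory clocks.
    Page miss costs Trcd; page conflict costs Trp + Trcd.
    """
    return page_misses * trcd + page_conflicts * (trp + trcd)
```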

Event Select 0x0E1 DRAM controller page table events

Abbreviation: Page table overflows

The number of page table overflows in the local DRAM controller. This table maintains information about which DRAM pages are open. An overflow occurs when a request for a new page arrives when the maximum number of pages are already open. Each occurrence reflects an access latency penalty equivalent to a page conflict.

Value Unit mask description
0 DCT page table overflow
1 Number of stale table entry hits (hit on a page closed too soon)
2 Page table idle cycle limit incremented
3 Page table idle cycle limit decremented

Event Select 0x0E3 Memory controller turnarounds

Abbreviation: Turnarounds

The number of turnarounds on the local DRAM data bus. The UNIT_MASK may be used to isolate the different cases. These represent lost DRAM bandwidth, which may be calculated as follows (in bytes per occurrence):

DIMM turnaround: DRAM_width_in_bytes * 2 edges_per_memclk * 2

R/W turnaround: DRAM_width_in_bytes * 2 edges_per_memclk * 1

W/R turnaround: DRAM_width_in_bytes * 2 edges_per_memclk * (Tcl-1)

where DRAM_width_in_bytes is 8 or 16 (for single- or dual-channel systems, respectively), and Tcl is the CAS latency of the DRAM in memory system clock cycles (for example, the memory clock for DDR-400, or PC3200 DIMMs, would be 200 MHz).

Value Unit mask description
0 DIMM (chip select) turnaround
1 Read to write turnaround
2 Write to read turnaround
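
The lost-bandwidth formulas for this event can be sketched in Python as follows. This is a minimal sketch that assumes the three per-occurrence formulas correspond, in order, to unit mask values 0 (DIMM), 1 (read to write), and 2 (write to read); the function and parameter names are hypothetical.

```python
# Sketch only: function and parameter names are illustrative assumptions.
EDGES_PER_MEMCLK = 2  # from the formulas: 2 edges_per_memclk


def lost_bytes(dimm, read_to_write, write_to_read, dram_width_bytes, tcl):
    """Estimate DRAM bandwidth lost to turnarounds, in bytes.

    Assumes the per-occurrence multipliers 2, 1, and (Tcl-1) apply to
    DIMM, read-to-write, and write-to-read turnarounds respectively.
    """
    if dram_width_bytes not in (8, 16):
        raise ValueError("DRAM width is 8 (single-channel) or 16 (dual-channel)")
    bytes_per_memclk = dram_width_bytes * EDGES_PER_MEMCLK
    return (dimm * bytes_per_memclk * 2
            + read_to_write * bytes_per_memclk * 1
            + write_to_read * bytes_per_memclk * (tcl - 1))
```

The three counts would come from three separate measurements (or three counters), one per unit mask value, since a combined unit mask would not allow the different multipliers to be applied.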

Event Select 0x0E4 Memory controller RBD queue events

Abbreviation: XXXX

Value Unit mask description
2 F2x[1,0]94[DcqBypassMax] counter reached

Event Select 0x0E8 Thermal Status

Abbreviation: Thermal/ECC errors

Value Unit mask description
0 Revision A: Reserved, Revision B: Number of clocks MEMHOT_L is asserted
2 Number of times the HTC transitions from inactive to active
5 Number of clocks HTC P-state is inactive
6 Number of clocks HTC P-state is active
7 PROCHOT_L asserted by an external source and P-state change occurred

Event Select 0x0E9 CPU/IO requests to memory/IO

Abbreviation: CPU/IO req mem/IO

These events reflect request flow between units and nodes, as selected by the UNIT_MASK. The UNIT_MASK is divided into two fields: request type (CPU or I/O access to I/O or memory) and source/target location (local vs. remote). One or more request types must be enabled via bits 3:0, and at least one source and one target location must be selected via bits 7:4. Each event reflects a request of the selected type(s) going from the selected source(s) to the selected target(s).

Not all possible paths are supported. The following table shows the UNIT_MASK values that are valid for each request type. Any of the mask values shown may be logically ORed to combine the events. For instance, local CPU requests to both local and remote nodes would be A8h | 98h = B8h. Any CPU to any I/O would be A4h | 94h | 64h = F4h (but remote CPU to remote I/O requests would not be included).

Request type UNIT_MASK value
CPU to memory A8h
CPU to IO A4h
IO to memory A2h
IO to IO A1h

Note: It is not possible to tell from these events how much data is going in which direction, as there is no distinction between reads and writes. Also, particularly for I/O, the requests may be for varying amounts of data, anywhere from one to sixty-four bytes. Event E5h provides an indication of 32- and 64-byte read and write transfers for such requests (although from the target point of view). For a direct measure of the amount and direction of data flowing between nodes, use events F6h, F7h and F8h.

Value Unit mask description
0 I/O to I/O
1 I/O to memory
2 CPU to I/O
3 CPU to memory
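
The ORing of UNIT_MASK values for this event can be checked with a short Python sketch. The constant and function names are hypothetical; the request-type bits come from the unit mask table for this event, and the upper-nibble values (A, 9, 6) are taken as given from the examples in the text, since the individual source/target bit assignments are not listed here.

```python
# Sketch only: constant and function names are illustrative assumptions.

# Request-type bits 3:0, from the unit mask table for event 0x0E9.
IO_TO_IO = 1 << 0
IO_TO_MEM = 1 << 1
CPU_TO_IO = 1 << 2
CPU_TO_MEM = 1 << 3


def combine(*masks):
    """OR together complete UNIT_MASK values for event 0x0E9.

    Each input must already contain a request type (bits 3:0) and a
    source/target selection (bits 7:4), as the event requires.
    """
    combined = 0
    for mask in masks:
        if not 0 <= mask <= 0xFF:
            raise ValueError("UNIT_MASK is an 8-bit value")
        if mask & 0x0F == 0 or mask & 0xF0 == 0:
            raise ValueError("need a request type and a source/target selection")
        combined |= mask
    return combined


# The two combinations worked out in the text.
local_cpu_to_any_memory = combine(0xA8, 0x98)   # B8h
any_cpu_to_any_io = combine(0xA4, 0x94, 0x64)   # F4h
```

Because the combined value is a plain bitwise OR, it may also select source/target pairings that are not supported paths; those simply contribute no counts, as with the remote CPU to remote I/O case noted above.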

Event Select 0x0EA Cache block commands

Abbreviation: Cache block cmd

The number of requests made to the system for cache line transfers or coherency state changes, by request type. Each increment represents one cache line transfer, except for Change-to-Dirty. If a Change-to-Dirty request hits a line in another processor's cache that is in the Owned state, it causes a cache line transfer; otherwise, no data transfer is associated with Change-to-Dirty requests.

Value Unit mask description
0 Victim block (writeback)
2 Read block (DCache load miss refill)
3 Read block shared (ICache refill)
4 Read block modified (DCache store miss refill)
5 Change to Dirty (first store to clean block in cache)

Event Select 0x0EB Sized commands

Abbreviation: Sized cmd

The number of Sized Read/Write commands handled by the System Request Interface (local processor and Hostbridge interface to the system). These commands may originate from the processor or Hostbridge. Typical uses of the various Sized Read/Write commands are given in the UNIT_MASK table. See also event E5h, which covers commonly-used block sizes for these requests, and event ECh, which provides a separate measure of Hostbridge accesses.

Value Unit mask description
0 NonPosted SzWr byte (1-32 bytes)
1 NonPosted SzWr DWORD (1-16 DWORDs)
2 Posted SzWr byte (1-32 bytes)
3 Posted SzWr DWORD (1-16 DWORDs)
4 SzRd byte (4 bytes)
5 SzRd DWORD (1-16 DWORDs)

Event Select 0x0EC Probe responses and upstream requests

Abbreviation: Probe resp/up req

This covers two unrelated sets of events: cache probe results, and requests received by the Hostbridge from devices on non-coherent links.

Probe results: These events reflect the results of probes sent from a memory controller to local caches. They provide an indication of the degree to which data and code are shared between processors (or moved between processors due to process migration). The dirty-hit events indicate the transfer of a 64-byte cache line to the requestor (for a read or cache refill) or the target memory (for a write). The system bandwidth used by these, in bytes per unit of time, may be calculated as 64 times the event count, divided by the elapsed time. Sized writes to memory that cover a full cache line do not incur this cache line transfer; they simply invalidate the line and are reported as clean hits. Cache line transfers will occur for Change-to-Dirty requests that hit cache lines in the Owned state. (Such cache lines are counted as Modified-state refills for event 6Ch, System Read Responses.)

Upstream requests: The upstream read and write events reflect requests originating from a device on a local IO link. The two read events allow display refresh traffic in a UMA system to be measured separately from other DMA activity. Display refresh traffic is typically dominated by 64-byte transfers. Non-display-related DMA accesses may be anywhere from 1 to 64 bytes in size, but may be dominated by a particular size such as 32 or 64 bytes, depending on the nature of the devices.

Value Unit mask description
0 Probe miss
1 Probe hit clean
2 Probe hit dirty without memory cancel
3 Probe hit dirty with memory cancel
4 Upstream display refresh/ISOC reads
5 Upstream non-display refresh reads
6 Upstream ISOC writes
7 Upstream non-ISOC writes
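
The dirty-hit bandwidth calculation described under "Probe results" can be sketched in Python as follows. The function name, parameter names, and the choice of seconds as the time unit are assumptions for illustration; the count is assumed to have been collected with both dirty-hit mask values (2 and 3) selected.

```python
# Sketch only: names and the time unit are illustrative assumptions.
CACHE_LINE_BYTES = 64

# Unit Mask selecting both dirty-hit sub-events (mask values 2 and 3).
PROBE_HIT_DIRTY_MASK = (1 << 2) | (1 << 3)


def dirty_hit_bandwidth(dirty_hit_count, elapsed_seconds):
    """Bytes per second moved by probe dirty hits: 64 * count / time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return CACHE_LINE_BYTES * dirty_hit_count / elapsed_seconds
```

As with the other counters, the result is best treated as a relative measure when comparing runs rather than as an exact bandwidth figure.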

Event Select 0x0EE DEV events

Abbreviation: DEV events

Value Unit mask description
4 DEV hit
5 DEV miss
6 DEV error

Event Select 0x1F0 Memory controller requests

Abbreviation: MCT requests

Value Unit mask description
3 32 bytes sized writes
4 64 bytes sized writes
5 32 bytes sized reads
6 64 bytes sized reads

Crossbar events

Event Select 0x1E9 Sideband signals and special cycles

Abbreviation: Sideband signals

Value Unit mask description
0 HALT
1 STOPGRANT
2 SHUTDOWN
3 WBINVD
4 INVD

Event Select 0x1EA Interrupt events

Abbreviation: Int events

Value Unit mask description
0 Fixed
1 LPA
2 SMI
3 NMI
4 INIT
5 STARTUP
6 INT
7 EOI

Link events

Event Select 0x0F6 HyperTransport™ link 0 transmit bandwidth

Abbreviation: HT0 bandwidth

Value Unit mask description
0 Command DWORD sent
1 Address DWORD sent
2 Data DWORD sent
3 Buffer release DWORD sent
4 NOP DWORD sent (idle)
5 Per packet CRC sent