ARM Memory Management Unit
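[Figure: forming the first-level descriptor address. Virtual address fields: Table Index (12 bits) and Section Index. The Translation Base (18 bits) occupies bits 31:14 of the descriptor address, the Table Index occupies bits 13:2, and bits 1:0 are 0 0.]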
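The address formation shown in the figure above can be sketched in a few lines. This is a minimal illustration only: the function name and the example values are hypothetical, and the Table Index is assumed to be virtual address bits 31:20, as the translation figures in this chapter indicate.

```python
def first_level_descriptor_address(translation_base: int, virtual_address: int) -> int:
    """Form the address of a first-level descriptor.

    Bits 31:14 come from the Translation Base, bits 13:2 from the
    Table Index (assumed to be virtual address bits 31:20), and
    bits 1:0 are always 00.
    """
    base = translation_base & 0xFFFFC000            # keep bits 31:14 only
    table_index = (virtual_address >> 20) & 0xFFF   # 12-bit Table Index
    return base | (table_index << 2)


# Hypothetical example values, chosen only for illustration.
print(hex(first_level_descriptor_address(0x0004C000, 0x12345678)))
```

Because the Table Index is shifted left by two, each first-level descriptor occupies one word and the table is indexed by the top 12 bits of the virtual address.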
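[Figure: first-level descriptor format. Bit boundaries: 31, 20 19, 12 11 10 9. Low-order fields: Domain, 1, C, B, followed by type bits 1 0; a further row ends in type bits 1 1.]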
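[Figure: section translation. The Virtual Address is split into a Table Index (bits 31:20, 12 bits) and a Section Index (bits 19:0). The Translation Base (bits 31:14, 18 bits) and the Table Index (bits 13:2) form the address of the first-level descriptor, with bits 1:0 = 0 0. The section descriptor holds the section base in bits 31:20 together with the AP, Domain, C and B fields and type bits 1 0. The Physical Address is the section base (bits 31:20) followed by the Section Index (bits 19:0).]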
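As a minimal sketch of the section translation shown above, the physical address can be formed by joining the section base from the descriptor with the Section Index from the virtual address. The function name and example values are hypothetical; only the bit positions are taken from the figure.

```python
def translate_section(section_descriptor: int, virtual_address: int) -> int:
    """Return the physical address for an access mapped by a section descriptor."""
    if section_descriptor & 0b11 != 0b10:
        raise ValueError("descriptor type bits 1:0 are not 1 0 (section)")
    section_base = section_descriptor & 0xFFF00000   # descriptor bits 31:20
    section_index = virtual_address & 0x000FFFFF     # virtual address bits 19:0
    return section_base | section_index


# Hypothetical descriptor and virtual address, for illustration only.
print(hex(translate_section(0xABC00C1E, 0x12345678)))
```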
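[Figure: second-level descriptor formats]

Fields (high-order to low-order)                          Bits 1:0
(no fields shown)                                         0 0
Large Page Base Address, ap3, ap2, ap1, ap0, C, B         0 1
Small Page Base Address, ap3, ap2, ap1, ap0, C, B         1 0
(no fields shown)                                         1 1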
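[Figure: small page translation. The Virtual Address is split into a Table Index (12 bits), an L2 Table Index (8 bits) and a Page Index (12 bits). The Translation Base (bits 31:14, 18 bits) and the Table Index (bits 13:2) form the address of the first-level descriptor, with bits 1:0 = 0 0. The first-level descriptor holds the page table base in bits 31:10, the Domain, and type bits 0 1; combined with the L2 Table Index (bits 9:2) it forms the address of the second-level descriptor, with bits 1:0 = 0 0. The Physical Address is the page base (bits 31:12) followed by the Page Index (bits 11:0).]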
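The two-step lookup for a small page can be sketched as follows. This is an illustration under stated assumptions rather than a definitive model: the function names and example values are hypothetical, and the L2 Table Index is assumed to be virtual address bits 19:12, which follows from the 12/8/12 field widths in the figure.

```python
def second_level_descriptor_address(first_level_descriptor: int, virtual_address: int) -> int:
    """Form the address of the second-level descriptor."""
    if first_level_descriptor & 0b11 != 0b01:
        raise ValueError("first-level descriptor type bits are not 0 1 (page table)")
    page_table_base = first_level_descriptor & 0xFFFFFC00   # descriptor bits 31:10
    l2_table_index = (virtual_address >> 12) & 0xFF         # virtual address bits 19:12
    return page_table_base | (l2_table_index << 2)          # bits 1:0 are 00


def small_page_physical_address(second_level_descriptor: int, virtual_address: int) -> int:
    """Join the small page base (bits 31:12) with the Page Index (bits 11:0)."""
    if second_level_descriptor & 0b11 != 0b10:
        raise ValueError("second-level descriptor type bits are not 1 0 (small page)")
    return (second_level_descriptor & 0xFFFFF000) | (virtual_address & 0x00000FFF)


# Hypothetical descriptors and virtual address, for illustration only.
print(hex(second_level_descriptor_address(0x00081C01, 0x12345678)))
print(hex(small_page_physical_address(0xDEADB00E, 0x12345678)))
```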
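[Figure: large page translation. The Virtual Address is split into a Table Index (12 bits), an L2 Table Index (8 bits) and a Page Index. The Translation Base (bits 31:14, 18 bits) and the Table Index (bits 13:2) form the address of the first-level descriptor, with bits 1:0 = 0 0. The first-level descriptor holds the page table base in bits 31:10, the Domain, and type bits 0 1; combined with the L2 Table Index (bits 9:2) it forms the address of the second-level descriptor, with bits 1:0 = 0 0. The Physical Address is the page base (bits 31:16) followed by the Page Index (bits 15:0).]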
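For a large page only the final step differs from the small page case: the page base supplies bits 31:16 and the Page Index supplies bits 15:0. A minimal sketch, with a hypothetical function name and example values:

```python
def large_page_physical_address(second_level_descriptor: int, virtual_address: int) -> int:
    """Join the large page base (bits 31:16) with the Page Index (bits 15:0)."""
    if second_level_descriptor & 0b11 != 0b01:
        raise ValueError("second-level descriptor type bits are not 0 1 (large page)")
    page_base = second_level_descriptor & 0xFFFF0000   # descriptor bits 31:16
    page_index = virtual_address & 0x0000FFFF          # virtual address bits 15:0
    return page_base | page_index


# Hypothetical descriptor and virtual address, for illustration only.
print(hex(large_page_physical_address(0xCAFE0001, 0x12345678)))
```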
Priority   Source                                          Status   Domain[3:0]   FAR
highest    Terminal Exception                              0b0010   invalid       VA of start of cache line being written-back
           Vector Exception                                0b0000   invalid       VA of access causing abort
           Alignment                                       0b00x1   invalid       VA of access causing abort
           External Abort on Translation   First level     0b1100   invalid       VA of access causing abort
                                           Second level    0b1110   valid         VA of access causing abort
           Translation                     Section         0b0101   invalid       VA of access causing abort
                                           Page            0b0111   valid         VA of access causing abort
           Domain                          Section         0b1001   valid         VA of access causing abort
                                           Page            0b1011   valid         VA of access causing abort
           Permission                      Section         0b1101   valid         VA of access causing abort
                                           Page            0b1111   valid         VA of access causing abort
           External Abort on linefetch     Section         0b0100   valid         VA of start of cache line being loaded
                                           Page            0b0110   valid         VA of start of cache line being loaded
           External Abort on non-linefetch Section         0b1000   valid         VA of access causing abort
lowest                                     Page            0b1010   valid         VA of access causing abort
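The status encodings in the table above lend themselves to a simple lookup. The following sketch is illustrative only; the names are hypothetical, and the handling of the 0b00x1 Alignment encoding assumes that x is a don't-care bit.

```python
# 4-bit status value -> (fault source, Domain[3:0] field valid?)
FAULT_SOURCES = {
    0b0010: ("Terminal Exception", False),
    0b0000: ("Vector Exception", False),
    0b1100: ("External Abort on Translation (First level)", False),
    0b1110: ("External Abort on Translation (Second level)", True),
    0b0101: ("Translation (Section)", False),
    0b0111: ("Translation (Page)", True),
    0b1001: ("Domain (Section)", True),
    0b1011: ("Domain (Page)", True),
    0b1101: ("Permission (Section)", True),
    0b1111: ("Permission (Page)", True),
    0b0100: ("External Abort on linefetch (Section)", True),
    0b0110: ("External Abort on linefetch (Page)", True),
    0b1000: ("External Abort on non-linefetch (Section)", True),
    0b1010: ("External Abort on non-linefetch (Page)", True),
}


def decode_fault_status(status: int):
    """Return (source, domain_valid) for a 4-bit fault status value."""
    status &= 0xF
    if status & 0b1101 == 0b0001:   # 0b00x1: bit 1 is treated as don't-care
        return ("Alignment", False)
    return FAULT_SOURCES[status]


print(decode_fault_status(0b1101))
```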
[Figure 8-8: Domain Access Control Register format. The 32-bit register holds one 2-bit field for each of domains 15 to 0.]

Table 8-7: Interpreting access bits in Domain Access Control Register defines how the bits within each domain are interpreted to specify the access permissions.

Table 8-7: Interpreting access bits in Domain Access Control Register
Value   Meaning     Notes
00      No Access   Any access will generate a Domain Fault.
01      Client      Accesses are checked against the access permission bits in the Section or Page descriptor.
10      Reserved    Reserved. Currently behaves like the no access mode.
11      Manager     Accesses are NOT checked against the access permission bits, so a Permission fault cannot be generated.
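A short sketch of how the 2-bit field for one domain might be extracted and interpreted is given here. The function name is hypothetical, and the layout is assumed to place domain n in bits 2n+1:2n of the register.

```python
ACCESS_MEANING = {
    0b00: "No Access",
    0b01: "Client",
    0b10: "Reserved",
    0b11: "Manager",
}


def domain_access(domain_access_control: int, domain: int) -> str:
    """Return the access type of one domain (0 to 15).

    Assumes domain n occupies bits 2n+1:2n of the register.
    """
    if not 0 <= domain <= 15:
        raise ValueError("domain must be in the range 0 to 15")
    field = (domain_access_control >> (2 * domain)) & 0b11
    return ACCESS_MEANING[field]


# Hypothetical register value: domain 0 is Client, domain 1 is Manager.
print(domain_access(0x0000000D, 0), domain_access(0x0000000D, 1))
```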
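[Figure 8-9: Sequence for checking faults. Starting from the Virtual Address, the checks are made in order: a 26-bit data access to the vectors gives a Vector Fault; a misaligned access gives an Alignment Fault; an invalid descriptor (section or page) gives a translation fault; a domain set to no access (00) or reserved (10) gives a domain fault; for a client (01) domain, an access permission violation gives a permission fault. An access that passes every check yields the Physical Address.]
Figure 8-9: Sequence for checking faults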
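The order of the checks in Figure 8-9 can be sketched as a chain of early returns. This is a simplified illustration, not a model of the hardware: the function and parameter names are hypothetical, section and page paths are merged into one, and the outcome of the access permission check is passed in as a flag because the permission encoding is not reproduced here.

```python
def check_access(vector_fault: bool, misaligned: bool, descriptor_valid: bool,
                 domain_access: int, permission_ok: bool) -> str:
    """Return the first fault raised for an access, or 'OK' if none is."""
    if vector_fault:                    # 26-bit data access to the vectors
        return "Vector Fault"
    if misaligned:
        return "Alignment Fault"
    if not descriptor_valid:
        return "Translation Fault"
    if domain_access in (0b00, 0b10):   # no access, or reserved (behaves like no access)
        return "Domain Fault"
    if domain_access == 0b01 and not permission_ok:   # client: permissions are checked
        return "Permission Fault"
    return "OK"                         # manager (11) skips the permission check


print(check_access(False, False, True, 0b01, False))
```

Ordering the tests this way means that an earlier fault hides a later one: a misaligned access to an invalid descriptor is reported as an Alignment Fault only.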
Care must be taken if the translated address differs from the untranslated address, as several instructions following the enabling of the MMU may have been fetched using flat translation, and enabling the MMU may be considered as a branch with delayed execution. A similar situation occurs when the MMU is disabled. Consider the following code sequence:
    MOV   R1, #0x1
    MCR   15,0,R1,0,0    ; Enable MMU
    Fetch Flat
    Fetch Flat
    Fetch Translated
To disable the MMU:
1 Disable Branch prediction, if it is enabled, by using the code sequence given in 6.3.3 Turning off Branch Prediction.
2 Disable the WB by clearing bit 3 in the Control Register.
3 Disable the IDC by clearing bit 2 in the Control Register.
4 Disable the MMU by clearing bit 0 in the Control Register.
Disabling of all three functions described in steps 2, 3 and 4 may be done simultaneously.
Note that if the MMU is enabled, then disabled and subsequently re-enabled, the contents of the TLB will have been preserved. If these are now invalid, the TLB should be flushed before re-enabling the MMU.
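Steps 2, 3 and 4 above amount to clearing three bits of the Control Register value in a single write. The sketch below only computes the new register value; the constant and function names are hypothetical, and reading or writing the real register is outside its scope.

```python
MMU_ENABLE_BIT = 1 << 0   # bit 0: MMU
IDC_ENABLE_BIT = 1 << 2   # bit 2: IDC
WB_ENABLE_BIT = 1 << 3    # bit 3: WB


def control_value_with_mmu_idc_wb_disabled(control: int) -> int:
    """Return the Control Register value with bits 0, 2 and 3 cleared."""
    mask = MMU_ENABLE_BIT | IDC_ENABLE_BIT | WB_ENABLE_BIT
    return control & ~mask & 0xFFFFFFFF   # keep the result within 32 bits


# Hypothetical current value, for illustration only.
print(hex(control_value_with_mmu_idc_wb_disabled(0x0000008F)))
```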
9
Write Buffer
This chapter describes the Write Buffer (WB).
9.1 Cacheable and Bufferable bits
9.2 Write Buffer Operation
The ARM810 write buffer is provided to improve system performance. It can buffer up to 8 words of data, and 4 independent addresses. It may be enabled or disabled via the W bit (bit 3) in the ARM810 Control Register, and the buffer is disabled and flushed on reset. The operation of the write buffer is further controlled by the C and B bits, which are stored in the Memory Management Page Tables. For this reason, in order to use the write buffer, the MMU must be enabled. The two functions may however be enabled simultaneously, with a single write to the Control Register. For a write to use the write buffer, both the W bit in the Control Register and either the C or B bit in the corresponding page table must be set. It is not possible to abort buffered writes externally; the abort pin will be ignored. Areas of memory which may generate aborts should be marked as unbufferable in the MMU page tables.
9.1 Cacheable and Bufferable bits
These bits control whether a write operation may or may not use the write buffer. Typically main memory will be cacheable and bufferable, and I/O space unbufferable. The C and B bits can be configured for both pages and sections. This is described in section 8.11 Cacheable and Bufferable Status of Memory Regions on page 8-147.
9.2 Write Buffer Operation
If the write buffer is enabled and the processor performs a write to a bufferable area, the data is placed in the write buffer at FCLK (MCLK if running with fastbus extension) speeds and the CPU continues execution. The write buffer then performs the external write in parallel. If however the write buffer is full (either because there are already 8 words of data in the buffer, or because there is no slot for the new address) then the processor is stalled until there is sufficient space in the buffer.
9.2.3 Read-lock-write
The write phase of a read-lock-write sequence is treated as an Unbuffered write, even if it is marked as buffered.
Note:
A single write requires one address slot and one data slot in the write buffer; a sequential write of n words requires one address slot and n data slots. The total of 8 data slots in the buffer may be used as required. So for instance there could be 3 non-sequential writes and one sequential write of 5 words in the buffer, and the processor could continue as normal: a 5th write or a 6th word in the 4th write would stall the processor until the first write had completed.
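The slot accounting described in this note can be checked with a small model. This is a sketch only: the names are hypothetical, and each buffered write is represented simply by its word count.

```python
ADDRESS_SLOTS = 4   # independent addresses the buffer can hold
DATA_SLOTS = 8      # words of data the buffer can hold


def write_fits(buffered_writes: list, new_write_words: int) -> bool:
    """Return True if a new write of new_write_words words can be buffered.

    buffered_writes lists the word count of each write already in the
    buffer; every write uses one address slot and one data slot per word.
    """
    addresses_needed = len(buffered_writes) + 1
    data_needed = sum(buffered_writes) + new_write_words
    return addresses_needed <= ADDRESS_SLOTS and data_needed <= DATA_SLOTS


print(write_fits([1, 1, 1], 5))      # 3 single writes, then a 5-word sequential write
print(write_fits([1, 1, 1, 5], 1))   # a 5th write
print(write_fits([1, 1, 1], 6))      # a 6th word in the 4th write
```

When `write_fits` returns False the processor would stall until an earlier write has completed and freed its slots.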
10
Coprocessors
This chapter describes use of coprocessors with the ARM810.
10.1 Overview
10.1 Overview
The ARM810 has no external coprocessor interface, so it is not possible to add external coprocessors to ARM810. ARM810 has an internal coprocessor, called the System Control Coprocessor, designated as coprocessor number 15. The System Control Coprocessor is used to control the configuration of the device, including the endianness setting, enabling of the Cache, MMU, Write Buffer and Branch Prediction, and the control of the Cache and MMU. The System Control Coprocessor is documented in detail in Chapter 5, Configuration, and in the chapters on those parts of the ARM810 it controls: Chapter 6, The Prefetch Unit, Chapter 7, Instruction and Data Cache (IDC), Chapter 8, Memory Management Unit, and Chapter 9, Write Buffer.
11
ARM810 Clocking
This chapter describes the bus interface clocking:
11.1 The Bus Clock
11.2 The Processor Clock
11.3 Generation of the Fast Clock
11.4 Forced Processor Clock from the Bus Clock
11.5 Low Power Idle and Sleep
The ARM810 uses two clock signals:
• bus clock
• fast clock
These clocks are derived from external inputs to the processor, with configurations defined by external pins and the on-chip programmable registers. The fast clock can be selected from three sources:
• bus clock
• on-chip PLL
• external reference clock
When the fast clock is sourced from the bus clock, operation is equivalent to ARM710a's Fastbus mode. When the fast clock is sourced from the external reference clock, operation is equivalent to ARM710a's Standard bus mode. The following sections explain how these clocks are made and describe their expected usage. In particular, note the addition of a clock multiplier (PLL) in this design.
11.1 The Bus Clock
The external bus clock is used to cycle the external bus interface. This clock is sourced directly from external input pins of the device. See Figure 11-1: Generating the external bus interface clock.
11.2 The Processor Clock
The processor clock is used to cycle the internals of the processor; see Figure 11-2: Generating the Processor Clock. The processor clock can be sourced by one of two input clock signals to the synchroniser:
• bus clock
• fast clock
Synchronous operation
If the S bit is HIGH, there must be a tightly defined relationship between the bus clock and the fast clock (if this relationship is not obeyed, then the S bit should be set LOW). With the S bit HIGH, the synchroniser will not perform any synchronisation, and the bus clock may only make transitions on the falling edge of the fast clock. Please refer to Section 15.2 for the timing requirements.

Asynchronous operation
If the S bit is LOW, there is no defined relationship between the bus clock and the fast clock: they are asynchronous. The synchroniser introduces a synchronisation penalty whenever the internal core clock switches between the two input clocks (bus clock (M) and fast clock (F)). This penalty is symmetric, and varies between nothing and a whole period of the clock to which the core is synchronising. For example, when changing from the fast clock to the bus clock, the average synchronisation penalty is half a bus clock period, and when changing from the bus clock to the fast clock, it is half a fast clock period.
11.3 Generation of the Fast Clock
The fast clock input to the synchroniser can be selected from three sources. These are all configured internally using Coprocessor 15, Register 15, bits 2 and 3: F0 and F1. See 11.5 Low Power Idle and Sleep on page 11-10 for further details. During RESET, the bus clock is selected as the initial source for the fast clock.
11.3.2 Fast clock from the output of the PLL
This configuration (F0=1, F1=1) makes the output of the PLL clock multiplier the source for the fast clock (see Figure 11-4: Fast clock from the output of the PLL). When operating in this configuration, the S bit must be set LOW for asynchronous operation (S=0).
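[Figure 11-4: Fast clock from the output of the PLL. Settings: F0 = 1, F1 = 1, S = 0. REFCLK passes through the REFCLK Prescaler (divide by 1, 2, 4 or 8, selected by REFCLKCFG[1:0]) to give PLLCLKIn, which feeds the PLL. PLL inputs: PLLRANGE, PLLCFG[6:0], PLLSLEEP.]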
The fast clock output frequency is defined according to the following equation:

    fFastClock = fPLLCLKIn * M/2

where:
    fFastClock   is the frequency of the fast clock output
    fPLLCLKIn    is the frequency of PLLCLKIn, which is the frequency of REFCLK divided by 1, 2, 4 or 8
    M            is the value of the PLLCFG bus if interpreted as normal unsigned binary representation

M is defined for the range M = 5, 6, 7, ..., 127. Values of M less than 5 are invalid. The output frequency range of the PLL must reside between certain limits. These limits are determined by the PLLRANGE pin, as shown in Table 11-2: Output frequency range.

Table 11-2: Output frequency range
PLLRANGE   Min Fast Clock (MHz)   Max Fast Clock (MHz)
LOW        45                     100
HIGH       22.5                   50
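As a worked example of the equation and limits above, the following sketch computes the fast clock frequency and rejects settings outside the stated ranges. The function and parameter names are hypothetical, and the REFCLK values used are illustrative only.

```python
def fast_clock_mhz(refclk_mhz: float, prescale: int, m: int, pllrange_high: bool) -> float:
    """Return fFastClock = fPLLCLKIn * M/2 in MHz, checking the stated limits."""
    if prescale not in (1, 2, 4, 8):
        raise ValueError("REFCLK can only be divided by 1, 2, 4 or 8")
    if not 5 <= m <= 127:
        raise ValueError("M must be in the range 5 to 127")
    pllclkin_mhz = refclk_mhz / prescale
    fast_mhz = pllclkin_mhz * m / 2
    # Limits from Table 11-2: PLLRANGE HIGH gives 22.5 to 50 MHz, LOW gives 45 to 100 MHz.
    low_limit, high_limit = (22.5, 50.0) if pllrange_high else (45.0, 100.0)
    if not low_limit <= fast_mhz <= high_limit:
        raise ValueError("fast clock is outside the range selected by PLLRANGE")
    return fast_mhz


# Illustrative settings: 16 MHz REFCLK divided by 2, M = 20, PLLRANGE LOW.
print(fast_clock_mhz(16.0, 2, 20, pllrange_high=False))
```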
11.4 Forced Processor Clock from the Bus Clock
Coprocessor 15, Register 15, bit 0 (the D bit) is used to override the internal request for external bus signal to the synchroniser (see Figure 11-2: Generating the Processor Clock on page 11-4) and force the processor clock to be sourced from the bus clock. At RESET, the D bit is set LOW, and so the processor clock is sourced from the bus clock until this bit is changed. Once the fast clock source has been configured, and is sufficiently stable, the D bit should be set HIGH so the processor runs from the fast clock when not accessing the external bus.
11.5 Low Power Idle and Sleep
The D bit (see Section 11.4) can be employed to provide a clean transition into low-power idle or sleep mode. As the ARM810 is a fully static processor, stopping its clock when it has no work to do provides an ideal way to minimise power consumption and provide a fast start-up when it needs to operate again: all state is simply frozen and does not need to be restored.

The easiest means of stopping the processor (and associated system) is to stop the bus clock. To allow the system to be stopped with the processor state at a precisely defined point in program execution, the processor clock must be sourced from the bus clock and the bus clock stopped. This can be achieved by setting the D bit LOW (writing 0 to Coprocessor 15, Register 15, bit 0) and then stopping the bus clock externally.

If the fast clock is being generated by the PLL clock multiplier and the PLL is left running while the bus clock is stopped, then after restarting the bus clock the D bit can be set HIGH and the processor clock sourced from an already locked PLL. This could be implemented in the system for a fast wake-up-from-sleep interrupt response (though more power is consumed if the PLL is running continuously whilst the rest of the system is stopped).

The PLL itself can be placed in Sleep mode (using the PLLSLEEP external input), where it stops running and therefore stops consuming power. On wake-up, the PLL will take time to lock, and the system must take this into account - more details to be advised in future.