FreeMiNT's new low level block cache
====================================
last update: 1998-11-23
Author: Frank Naumann <fnaumann@cs.uni-magdeburg.de>
notes:
I. Introduction
---------------
FreeMiNT 1.15 has a new global block cache. It's currently
used by NEWFATFS and MinixFS 0.70.
The cache is global and does most things automatically.
It's very easy to support and also reduces
programming overhead. For example, I added block
cache support to MinixFS. For this I completely removed
the existing cache management in MinixFS and replaced
most of the calls that read/write buffered blocks. This
reduced the binary size from 39 kb to 26 kb. The
cache management is also very efficient and speeds up some
operations on MinixFS (I ran some tests comparing
MinixFS 0.60 and MinixFS 0.70).
The cache size can be increased at boot time with
the configuration keyword "CACHE=<size in kb>" in the MiNT
configuration file.
For example: "CACHE=500" sets the cache to a size of
500 kb (if enough memory is available).
Default cache size is 100 kb. It's recommended to increase
the cache if you use many MinixFS 0.70 and NEWFATFS
partitions. Currently, the cache is first allocated from
TT-RAM and then from ST-RAM.
The cache is currently static. If the cache becomes
dynamic in the future, every xfs that supports the
new cache management will remain compatible and will
benefit from any improvements.
Note for removable media: the cache automatically locks the
drive if there are unwritten sectors in the cache.
II. Definition
--------------
call conventions:
- all arguments are on the stack
- return value is stored in d0
(cdecl call)
return value conventions:
- negative return values are ATARI error codes
- E_OK for success
type conventions:
char 8 bit signed
unsigned char 8 bit unsigned
short 16 bit signed integer
unsigned short 16 bit unsigned integer
long 32 bit signed integer
unsigned long 32 bit unsigned integer
llong 64 bit signed integer
ullong 64 bit unsigned integer
with:
typedef struct { long hi; unsigned long low; } llong;
typedef struct { unsigned long hi; unsigned long low; } ullong;
III. interface description
--------------------------
1. introduction
===============
For the interface you need include/block_IO.h and some of the updated
FreeMiNT header files.
The kernel structure that is passed to a loadable XFS is extended
with a pointer to the block_IO functions.
See MinixFS 0.70 for an example (kernel.h, main.c). The
pointer is only valid from FreeMiNT 1.15.0 on. An XFS must
check this before it dereferences the pointer.
The block_IO interface is a structure that contains various
data fields and function pointers:
typedef struct
{
    short   version;            /* buffer cache version */
    short   reserved;           /* reserved for future */

    long    (*config)           (const ushort drv, const long config, const long mode);

    /* config: */
# define BIO_WP         1       /* configuring writeprotect feature */
# define BIO_WB         2       /* configuring writeback mode */
# define BIO_MAX_BLOCK  10      /* maximum cacheable blocksize */
# define BIO_DEBUGLOG   100     /* only for debugging, kernel internal */
# define BIO_DEBUG_T    101     /* only for debugging, kernel internal */

    /* DI management */
    DI *    (*get_di)           (ushort drv);
    DI *    (*res_di)           (ushort drv);
    void    (*free_di)          (DI *di);

    /* physical/logical calculation init */
    void    (*set_pshift)       (DI *di, ulong physical);
    void    (*set_lshift)       (DI *di, ulong logical);

    /* cached block I/O */
    UNIT *  (*lookup)           (DI *di, long sector, long blocksize);
    UNIT *  (*getunit)          (DI *di, long sector, long blocksize);
    UNIT *  (*read)             (DI *di, long sector, long blocksize);
    long    (*write)            (UNIT *u);
    long    (*l_read)           (DI *di, long sector, long blocks, long blocksize, void *buf);
    long    (*l_write)          (DI *di, long sector, long blocks, long blocksize, void *buf);

    /* optional feature */
    void    (*pre_read)         (DI *di, long *sector, long blocks, long blocksize);

    /* synchronization */
    void    (*lock)             (UNIT *u);
    void    (*unlock)           (UNIT *u);

    /* update functions */
    void    (*mark_modified)    (UNIT *u);
    void    (*sync_drv)         (DI *di);

    /* cache management */
    long    (*validate)         (DI *di, long maxblocksize);
    void    (*invalidate)       (DI *di);

    long    res[6];             /* reserved for future */
} BIO;
The first thing to do is to check the block_IO version number. It's not
guaranteed that later versions are fully compatible.
This description refers to version 3 of the block_IO
interface.
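The two checks described above (kernel version first, then the
block_IO version) could look roughly like the following sketch. The
structure kerinfo_sketch, its member names and the function name are
hypothetical stand-ins, because the real kernel structure is not
reproduced in this text. Rejecting every version other than 3 is a
conservative choice made for this example only.

```c
#include <stddef.h>

#define E_OK    0
#define EINVFN  (-32)           /* Atari error code: invalid function */

/* Reduced stand-in, only the members this sketch needs. */
typedef struct { short version; short reserved; } BIO;

struct kerinfo_sketch           /* hypothetical name and layout */
{
    short maj_version;
    short min_version;
    BIO  *block_IO;             /* valid since FreeMiNT 1.15.0 */
};

#define BIO_VERSION_WANTED 3    /* the version this text describes */

static BIO *bio;                /* used by the rest of the xfs */

long init_block_IO_sketch(const struct kerinfo_sketch *k)
{
    /* 1. kernel too old: the pointer must not be dereferenced */
    if (k->maj_version < 1
        || (k->maj_version == 1 && k->min_version < 15))
        return EINVFN;

    /* 2. defensive NULL check (an extra precaution, not required by the text) */
    if (k->block_IO == NULL)
        return EINVFN;

    /* 3. later versions are not guaranteed to be compatible */
    if (k->block_IO->version != BIO_VERSION_WANTED)
        return EINVFN;

    bio = k->block_IO;
    return E_OK;
}
```

An xfs would call such a function once during initialization and
refuse to install itself if the result is not E_OK.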
--------------------------------------------------------------
The interface is designed to make your life easier. For example, it
automatically routes all calls through XHDI or the BIOS. It's also
possible to cache non-BIOS devices. The block_IO module maps logical
sizes to physical sizes automatically. Simply call set_lshift() to
specify the logical format.
Conditions of use:
------------------
- the xfs only calls the block_IO functions for data I/O
- the xfs is fully reentrant
- the xfs doesn't modify data structures of the block_IO
module
- logical/physical translation only works for logical >= physical
All communication with the block_IO module goes through a so-called
device identifier or DI:

typedef struct di DI;

/* device identifier */
struct di
{
    DI      *next;          /* internal: next in linked list */
    UNIT    **table;        /* internal: unit hash table */
    UNIT    *wb_queue;      /* internal: writeback queue */

    const ushort drv;       /* internal: BIOS device number (unique) */
    ushort  major;          /* XHDI */
    ushort  minor;          /* XHDI */
    ushort  mode;           /* internal: some flags */

# define BIO_WP_MODE    0x01    /* writeprotect bit (soft/hard) */
# define BIO_WB_MODE    0x02    /* writeback bit (soft) */
# define BIO_REMOVABLE  0x04    /* removable media */
# define BIO_LRECNO     0x10    /* lrecno supported */

    ulong   start;          /* physical start sector */
    ulong   size;           /* physical sectors */
    ulong   pssize;         /* internal: physical sector size */

    ushort  pshift;         /* internal: size to count calculation */
    ushort  lshift;         /* internal: logical to physical recno calculation */

    long    (*rwabs)(DI *di, ushort rw, void *buf, ulong size, ulong lrecno);
    long    (*dskchng)(DI *di);

    ushort  valid;          /* internal: DI valid */
    ushort  lock;           /* internal: DI in use */

    char    id[4];          /* partition id (GEM, BGM, RAW, \0D6, ...) */
    ushort  key;            /* XHDI key */
    char    res[18];        /* reserved for future */
};
2. DI handling
==============
The first thing to do is to get a DI. This is best placed in the root function of the xfs.
There are three functions for DI handling:
get_di():
---------
- looks up the DI for the given drive (accessed through XHDI/BIOS)
and locks it
return: - a valid DI
- NULL if this DI is locked or not accessible through XHDI/BIOS
res_di():
---------
- reserves the DI; unlike get_di() it does nothing
except lock the DI
- used for non-BIOS devices
- the xfs *must* fill in some data fields:
start, size, pssize, rwabs, dskchng
- set_pshift() and set_lshift() must also be called for a successful
initialization
return: - valid DI
- NULL if this DI is already locked (in use).
free_di():
----------
- unlock this DI, after this call the DI becomes invalid
and can't be used anymore
return: nothing
NOTE:
-----
After get/res_di() the DI for this device becomes locked
and is not returned again by get/res_di() until it is unlocked
with free_di().
After get_di() logical to physical mapping is set to 1:1.
If you work with logical sizes you must call set_lshift to adjust the mapping.
After res_di() pssize is set to 512 and logical = physical.
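To illustrate the DI life cycle described in this section, here is a
minimal sketch of a root function that claims a DI, selects a
1024 byte logical size and later releases the DI again. The mock_*
functions, the mounted[] table, the *_sketch names and the reduced
DI/BIO declarations are assumptions made only to keep the example
self-contained. A real xfs takes the declarations from block_IO.h and
uses the pointer passed by the kernel.

```c
#include <stddef.h>

#define E_OK    0
#define EDRIVE  (-46)           /* Atari error code: invalid drive */

enum { NDRV = 4 };              /* small drive table for the example */

/* Reduced stand-ins: only the members this sketch touches. */
typedef struct di { unsigned short drv, lshift, lock; } DI;
typedef struct
{
    DI  *(*get_di)     (unsigned short drv);
    void (*free_di)    (DI *di);
    void (*set_lshift) (DI *di, unsigned long logical);
} BIO;

/* --- mock kernel side, for illustration only --- */
static DI mock_table[NDRV];

static DI *mock_get_di(unsigned short drv)
{
    if (drv >= NDRV || mock_table[drv].lock)
        return NULL;                    /* not accessible or already locked */
    mock_table[drv].drv = drv;
    mock_table[drv].lshift = 0;         /* 1:1 mapping after get_di() */
    mock_table[drv].lock = 1;
    return &mock_table[drv];
}

static void mock_free_di(DI *di) { di->lock = 0; }

static void mock_set_lshift(DI *di, unsigned long logical)
{
    di->lshift = 0;                     /* assumes 512 byte physical sectors */
    while ((512UL << di->lshift) < logical)
        di->lshift++;
}

static BIO mock_bio = { mock_get_di, mock_free_di, mock_set_lshift };
static BIO *bio = &mock_bio;

/* --- xfs side --- */
static DI *mounted[NDRV];

long xfs_root_sketch(unsigned short drv)
{
    DI *di = bio->get_di(drv);

    if (di == NULL)
        return EDRIVE;
    bio->set_lshift(di, 1024);          /* this fs works in 1024 byte blocks */
    mounted[drv] = di;
    return E_OK;
}

void xfs_unmount_sketch(unsigned short drv)
{
    if (drv < NDRV && mounted[drv] != NULL)
    {
        bio->free_di(mounted[drv]);     /* the DI must not be used afterwards */
        mounted[drv] = NULL;
    }
}
```

A second xfs_root_sketch() call for the same drive fails in this mock
until xfs_unmount_sketch() has released the DI, which mirrors the
locking rule in the note above.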
3. logical/physical translation
===============================
set_pshift():
-------------
- sets physical sector size and adjusts shift values
(shift values are used for fast calculations)
It's not recommended to use this function in combination with get_di()
because the physical sector size is automatically determined through XHDI.
It will also create problems with the XHDI/BIOS rwabs() wrapper.
set_lshift():
-------------
- sets logical sector size and adjusts shift values
If you always work with groups of sectors you can specify
this size.
This is useful, for example, for TOS FAT filesystems that work with
logical sector sizes and clusters. It is also used by
MinixFS, which always works with blocks of 1024 bytes.
After this call, all block_IO calls automatically map
the given logical parameters to physical parameters.
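The arithmetic behind such a shift value can be sketched as follows.
This only illustrates the idea that power-of-two sizes allow record
numbers to be converted with a shift. It is not taken from the
block_IO source, and all function names are made up for the example.

```c
#include <assert.h>

/* log2 of a power-of-two sector size, e.g. 512 -> 9 */
static unsigned short size_to_shift(unsigned long size)
{
    unsigned short shift = 0;

    assert(size != 0 && (size & (size - 1)) == 0);
    while ((size >>= 1) != 0)
        shift++;
    return shift;
}

/* shift that converts a logical record number into a physical one */
unsigned short lshift_for(unsigned long logical, unsigned long physical)
{
    assert(logical >= physical);        /* see "Conditions of use" */
    return (unsigned short) (size_to_shift(logical) - size_to_shift(physical));
}

/* logical record number -> first physical record number */
unsigned long logical_to_physical(unsigned long lrecno, unsigned short lshift)
{
    return lrecno << lshift;
}
```

With 1024 byte logical blocks on 512 byte physical sectors the shift
is 1, so logical block 7 starts at physical sector 14.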
NOTE:
-----
pshift/lshift in the DI structure are very sensitive and important values.
A mistake here will directly cause problems on the
corresponding device, for example
badly written sectors.
start/size/pssize/pshift/lshift in the DI structure are also used for
validation, cache consistency and so on. If you control
those variables yourself (non-BIOS device -> res_di())
they must be correct.
Never set pshift/lshift directly, always use the
corresponding functions set_pshift() and set_lshift().
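For the non-BIOS case, the initialization order (res_di(), fill in
the data fields, then set_pshift() and set_lshift()) might look like
the following sketch of a small RAM disk. Everything prefixed with
mock_ or ram_ and every *_sketch name is hypothetical. The sketch also
assumes that the rwabs() size argument is in bytes, that lrecno is a
physical record number, that bit 0 of rw selects writing and that
dskchng() returns 0 for "not changed"; none of this is stated in this
text.

```c
#include <stddef.h>
#include <string.h>

#define E_OK    0
#define ESECNF  (-8)            /* Atari error code: sector not found */

typedef struct di DI;
struct di                       /* reduced: only the members used here */
{
    unsigned long  start, size, pssize;
    unsigned short pshift, lshift, lock;
    long (*rwabs)   (DI *di, unsigned short rw, void *buf,
                     unsigned long size, unsigned long lrecno);
    long (*dskchng) (DI *di);
};

/* --- mock kernel side, for illustration only --- */
static DI mock_di;

static unsigned short log2_sketch(unsigned long v)
{
    unsigned short s = 0;
    while (v > 1) { v >>= 1; s++; }
    return s;
}

static DI *mock_res_di(unsigned short drv)
{
    (void) drv;
    if (mock_di.lock)
        return NULL;                    /* already in use */
    mock_di.lock = 1;
    mock_di.pssize = 512;               /* default after res_di() */
    return &mock_di;
}

static void mock_set_pshift(DI *di, unsigned long physical)
{
    di->pssize = physical;
    di->pshift = log2_sketch(physical);
}

static void mock_set_lshift(DI *di, unsigned long logical)
{
    di->lshift = (unsigned short) (log2_sketch(logical) - di->pshift);
}

/* --- xfs side: a 16 sector RAM disk --- */
enum { RAM_SECTORS = 16, RAM_SSIZE = 512 };
static char ram[RAM_SECTORS * RAM_SSIZE];

static long ram_rwabs(DI *di, unsigned short rw, void *buf,
                      unsigned long size, unsigned long lrecno)
{
    if (lrecno >= di->size || size > (di->size - lrecno) * di->pssize)
        return ESECNF;
    if (rw & 1)
        memcpy(ram + lrecno * di->pssize, buf, size);   /* write */
    else
        memcpy(buf, ram + lrecno * di->pssize, size);   /* read */
    return E_OK;
}

static long ram_dskchng(DI *di) { (void) di; return 0; }

DI *attach_ramdisk_sketch(unsigned short drv)
{
    DI *di = mock_res_di(drv);          /* bio->res_di in a real xfs */

    if (di == NULL)
        return NULL;
    di->start   = 0;
    di->size    = RAM_SECTORS;
    di->pssize  = RAM_SSIZE;
    di->rwabs   = ram_rwabs;
    di->dskchng = ram_dskchng;
    mock_set_pshift(di, RAM_SSIZE);     /* bio->set_pshift in a real xfs */
    mock_set_lshift(di, 1024);          /* bio->set_lshift in a real xfs */
    return di;
}
```

Note that the sketch never writes pshift or lshift itself on the xfs
side; both values are only changed through the two set functions.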
4. reading and writing
======================
lookup():
---------
- checks if a block is in the cache
return: - a ptr to the UNIT
- NULL if the UNIT is not in cache
getunit():
----------
- allocates a new cache UNIT for the given start sector
- useful for write only data
- checks with lookup() if the UNIT is already in the cache
return: - a ptr to the new UNIT, the data area is not cleared
- NULL if no free cache UNIT is found or any other error
read():
-------
- same as getunit() but also reads the corresponding block
into the UNIT
- checks with lookup() if the UNIT is already in the cache
return: - a ptr to the new UNIT
- NULL for any error (read error, no free UNIT in cache)
write():
--------
- mark this UNIT as dirty in writeback mode
- write this UNIT back in writethrough mode
return: - E_OK or the Rwabs error number
l_read():
---------
- large read; reads a block directly to the buffer
- only useful for large blocks (to reduce I/O overhead)
- block_IO automatically syncs large transfers with existing
cached units (cache consistency)
return: - E_OK or Rwabs error number
l_write():
----------
- large write; writes a block directly from the buffer
- mostly useful for large blocks (to reduce I/O overhead)
- cache consistency is also guaranteed
- small blocks will automatically be buffered
return: - E_OK or Rwabs error number
pre_read():
-----------
- not implemented at the moment
NOTE:
-----
read/write/l_read/l_write/pre_read can block the active
application until the transfer is done (background DMA).
That's why your xfs must be reentrant.
A UNIT is only valid until the *next* block_IO call. It's possible to lock
UNITs to keep them valid for longer. An interrupt handler must never call
the block_IO module. A task switch never occurs while we are in kernel mode.
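A typical read-modify-write sequence built on read() and write() is
sketched below. The UNIT layout is not given in this text, so the
members data and sector are assumptions, as are the one-slot mock
cache, the fake disk[] array and the function poke_block_sketch().
The mock behaves like writethrough mode.

```c
#include <stddef.h>
#include <string.h>

#define E_OK    0
#define EREADF  (-11)           /* Atari error code: read fault */

enum { BLOCK = 1024, NBLOCKS = 4 };

typedef struct di { int unused; } DI;
typedef struct { char *data; long sector; } UNIT;   /* assumed members */
typedef struct
{
    UNIT *(*read)  (DI *di, long sector, long blocksize);
    long  (*write) (UNIT *u);
} BIO;

/* --- mock kernel side: fake disk, one cache slot --- */
static char disk[NBLOCKS][BLOCK];
static char slot_data[BLOCK];
static UNIT slot = { slot_data, -1 };

static UNIT *mock_read(DI *di, long sector, long blocksize)
{
    (void) di;
    if (sector < 0 || sector >= NBLOCKS || blocksize != BLOCK)
        return NULL;
    if (slot.sector != sector)          /* cache miss: load the block */
    {
        memcpy(slot_data, disk[sector], BLOCK);
        slot.sector = sector;
    }
    return &slot;
}

static long mock_write(UNIT *u)
{
    memcpy(disk[u->sector], u->data, BLOCK);
    return E_OK;
}

static BIO mock_bio = { mock_read, mock_write };
static BIO *bio = &mock_bio;

/* --- xfs side: change one byte of a block --- */
long poke_block_sketch(DI *di, long block, long offset, char value)
{
    UNIT *u;

    if (offset < 0 || offset >= BLOCK)
        return EREADF;
    u = bio->read(di, block, BLOCK);
    if (u == NULL)
        return EREADF;
    u->data[offset] = value;    /* u is used before any other block_IO call */
    return bio->write(u);
}
```

The UNIT pointer is not kept after the function returns, because it
is only guaranteed to be valid until the next block_IO call.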
5. synchronization
==================
lock():
-------
- increments the lock counter for the UNIT
unlock():
---------
- decrements the lock counter
NOTE:
-----
A locked UNIT is never invalidated. This is useful for open directories
and similar cases where pointer references remain. But be careful, this
slows down the search algorithm. The cache also runs out of
free UNITS if there are a lot of locked UNITS. A locked
UNIT must be unlocked eventually, otherwise its memory is lost.
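The pairing of lock() and unlock() can be sketched with a plain
counter. The UNIT member lock, the directory handle DIRH and all
function names below are assumptions for illustration; the real UNIT
layout is not part of this text.

```c
#include <stddef.h>

typedef struct { short lock; } UNIT;        /* assumed member */

/* --- mock kernel side --- */
static void mock_lock(UNIT *u)   { u->lock++; }
static void mock_unlock(UNIT *u) { if (u->lock > 0) u->lock--; }

/* a unit may only be invalidated when nobody holds a lock on it */
static int can_invalidate(const UNIT *u) { return u->lock == 0; }

/* --- xfs side: hypothetical directory handle that pins its UNIT --- */
typedef struct { UNIT *u; } DIRH;

void dir_open_sketch(DIRH *d, UNIT *u)
{
    mock_lock(u);               /* bio->lock in a real xfs */
    d->u = u;
}

void dir_close_sketch(DIRH *d)
{
    if (d->u != NULL)
    {
        mock_unlock(d->u);      /* bio->unlock in a real xfs */
        d->u = NULL;            /* guards against a double unlock */
    }
}
```

Every dir_open_sketch() is matched by exactly one effective unlock,
so the counter returns to zero and the UNIT becomes reusable again.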
6. update
=========
mark_modified():
----------------
- marks a UNIT as modified; this action inserts the UNIT in
the writeback queue but doesn't writeback anything
- if the UNIT is already marked no action is performed
return: nothing, always successful
sync_drv():
-----------
- writes back all dirty UNITS of the specified DI
return: nothing, always successful
NOTE:
-----
It's strongly recommended to first mark all modified UNITS
as dirty and then write back all with sync_drv(). There is
a write back optimization that will reduce a lot of I/O
overhead in this case.
It's also strongly recommended to use the inline function bio_MARK_MODIFIED()
instead of bio_mark_modified(). The inline function
first checks if the UNIT is already marked and calls
mark_modified() only if the UNIT is clean. This avoids
unnecessary function calls, which is useful in writeback mode.
Supporting user configurable Writeback mode is very easy.
The only thing to do is to use the inline function
bio_SYNC_DRV() instead of sync_drv().
bio_SYNC_DRV() checks if this drive is in WriteThrough
mode. If so, it calls sync_drv(); otherwise nothing happens
(= WriteBack). Dcntl(V_CNTR_WB) must also be supported.
Dcntl(V_CNTR_WB) only calls config() to change the
writeback bit. Take a look at the MinixFS source for an
example.
sync_drv() can also block the active application.
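The behaviour of the two inline helpers described above can be
sketched like this. The real bio_MARK_MODIFIED() and bio_SYNC_DRV()
are defined in the FreeMiNT headers and may look different. The
argument lists, the UNIT member dirty and the call counters are
assumptions made for this example.

```c
#define BIO_WB_MODE 0x02        /* writeback bit, as in the DI structure */

typedef struct di { unsigned short mode; } DI;  /* reduced */
typedef struct { int dirty; } UNIT;             /* assumed member */
typedef struct
{
    void (*mark_modified) (UNIT *u);
    void (*sync_drv)      (DI *di);
} BIO;

/* --- mock kernel side with call counters --- */
static int calls_mark, calls_sync;

static void mock_mark_modified(UNIT *u) { u->dirty = 1; calls_mark++; }
static void mock_sync_drv(DI *di)       { (void) di; calls_sync++; }

static BIO mock_bio = { mock_mark_modified, mock_sync_drv };
static BIO *bio = &mock_bio;

/* call mark_modified() only for a clean UNIT */
static inline void bio_MARK_MODIFIED_sketch(BIO *b, UNIT *u)
{
    if (!u->dirty)
        b->mark_modified(u);
}

/* sync only in writethrough mode; in writeback mode do nothing */
static inline void bio_SYNC_DRV_sketch(BIO *b, DI *di)
{
    if (!(di->mode & BIO_WB_MODE))
        b->sync_drv(di);
}
```

An xfs that marks all changed UNITS first and then calls the sync
helper once per operation gets both behaviours from the same code
path, depending only on the writeback bit of the drive.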
7. cache management
===================
validate():
-----------
- checks the given block size against the internal maximum
block size limit
return: - E_OK if the block size is supported
- ENSMEM if the block size is larger than the internal limit
invalidate():
-------------
- invalidates all cache UNITS for the given DI
NOTE:
-----
invalidate() does not free the DI, it only removes all
cache UNITS of this DI.
invalidate() also removes all modified UNITS. Those UNITS
are never written back by invalidate().
8. helper
=========
config():
---------
- internal configuration and information:
returns the maximum block size if config = BIO_MAX_BLOCK (10)
changes the WriteBack mode of the given drv to mode
(ENABLED/DISABLED) if config = BIO_WB (2)
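A mount-time block size check that combines config(BIO_MAX_BLOCK)
with validate() might look like the sketch below. The limit of 8192
bytes, the mock_* functions, the function check_blocksize_sketch()
and the value 0 passed as the mode argument are assumptions made for
this example, not facts about the kernel.

```c
#define E_OK            0
#define ENSMEM          (-39)   /* Atari error code: insufficient memory */
#define BIO_MAX_BLOCK   10

typedef struct di { int unused; } DI;
typedef struct
{
    long (*config)   (const unsigned short drv, const long config, const long mode);
    long (*validate) (DI *di, long maxblocksize);
} BIO;

/* --- mock kernel side; 8192 is an arbitrary limit for the example --- */
static const long mock_limit = 8192;

static long mock_config(const unsigned short drv, const long config, const long mode)
{
    (void) drv;
    (void) mode;
    return (config == BIO_MAX_BLOCK) ? mock_limit : E_OK;
}

static long mock_validate(DI *di, long maxblocksize)
{
    (void) di;
    return (maxblocksize <= mock_limit) ? E_OK : ENSMEM;
}

static BIO mock_bio = { mock_config, mock_validate };
static BIO *bio = &mock_bio;

/* --- xfs side: refuse to mount when the block size is too large --- */
long check_blocksize_sketch(DI *di, unsigned short drv, long blocksize)
{
    long max = bio->config(drv, BIO_MAX_BLOCK, 0);

    if (blocksize > max)
        return ENSMEM;          /* could also be reported to the user */
    return bio->validate(di, blocksize);
}
```

The config() query is only needed when the xfs wants to know the
limit itself, for example to print it; validate() alone is enough to
decide whether a block size can be cached.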