Evaluation of object formats:

RDF does the job; that's I'm using now. Very simple & compact, low overhead.
RDF allows me to compile modules under Unix or DOS, using NASM.


XCOM (at first glance) is almost as simple as RDF, more extensible, and
designed for a "no-kernel" OS, which is sorta what Retro is.  Whether or not
Retro uses it internally, it looks like a good interchange format for Retro,
Clementine, any other Tunes prototypes, UniOS, BRiX, and similar OS's. Also,
it supports cross-platform object migration.


Also, I may develop my own object format, to be used internally. This is
especially important, once I flesh out my object system and I really start
to diverge from the traditional software concepts. This format should be
even simpler & more compact than RDF or XCOM. It should also support
debugging information (XCOM does, but RDF doesn't).


#1. New module format:
---------------------- 

Goals:
	Compact
	Easy & efficient to link
	Hold a heirarchy of sub-modules

Module header:
	dd	length of module
	dd	CRC/checksum

* Each of the following tables has a header:
	dd	length of table (not including header?)

String Table (ST) - for symbol imports & exports
	db	type - determines what ST:data contains:
		  - Name string (null-terminated)
		  - Persistent Store address (i.e., disk database entry)
		  - Distributed Persistent Store address (w/ net addr)
	db	data - 

Name Table (NT) - list of pointers to all external names
	dd	index into ST

Address Table (AT) - addresses of external symbols
		*TEMPORARY* - created in memory during linking
	dd	address

Code Link Table (CLT) - for external calls (requires relative relocation)
	dd	location to link, relative to DATA
	dd	index into NT

Data Link Table (DLT) - for external memory references (absolute relocation)
	dd	location to link, relative to DATA
	dd	index into NT

Relocation Table (RT) - for internal memory references
	dd	location to link, relative to DATA

Sub-Module Table (SMT) - sub-module export info
	dd	type (function, atomic types, objects, whatever..)
	dd	location, relative to DATA
	dd	size
	dd	name (index into ST)
	dd	parent (index into SMT, or 0 for the main module)

Code & Data (DATA) -
	raw form
		Perhaps code & data should be separated, to allow better
		memory protection? (but no self-modifying code :)


Decoding the module format:
	1. Integrity check

	2. Move code & data, and account for moving it

	3. Import: For each name in NT, lookup the corresponding entry in ST
	   (load it from persistent storage, if necessary), and store the
	   address in AT.

	4. Link code. Perform relative relocation.

	5. Link data. Perform absolute relocation, that is, replace each
	   location with the object's address.

	6. Relocation - adjust internal memory references for new location

	7. Export: Add information from the SMT to the operating system's
	   module tables.


It should be possible to combine or separate modules, as desired.

What if a module is written back to storage & removed from RAM, and another
module in RAM is still referencing it? It should generate a page fault, so
the OS can reload it.

Assessment: This module format would be suitable for transporting a module
to other machines, but the native format should have the modules more
closely integrated with the persistent storage system.


#2. Another object/module format, for persistent storage:
---------------------------------------------------------

In this format, all the 'header information' is contained in the operating
system's object tables, which it uses the manage persistent storage as well
as linking. Only the code/data is separate.

Definitions:
	An _object_ is ONE function or piece of data (variable or constant).

	A _module_ is a set of objects.
		or
	A _module_ is an object and its sub-objects combined

All the objects in a module are stored next to each other in RAM and on
disk; this prevents the Store Manager from having to deal with every little
detail. Objects that naturally belong together should go in the same module.
Actually, the module itself is an object, and the objects within it are its
"children". (Should there be a provision for including non-child objects in
modules?)

In RAM, modules are aligned on 4K pages; on disk, modules are aligned on
512-byte sectors.

Object Tables:

	1. Fixed-length records (stored in the Object Database, ODB):
		Name (index into ST)
		Size
		Type (raw binary, various atomic types, etc...)
		Flags (in RAM, on DISK, module?, relink?, ...)
		Location & size of variable-length records
		RAM location (contiguous in paged memory space)
		DISK location (including a sector table..)
		Parent

	2. Variable-length records (stored in a binary object or file?):
		ST, NT, CLT, DLT, RT (as in previous module format)
		List of sub-objects

	3. Data (stored at the location(s) pointed to by the ODB entry)
		Code and/or data


	Object tables are objects, so they can be swapped out to disk when
	not in use. A heirarchical arrangement will help keep related data
	together, so it can all be swapped out at once.

	(Re-)linking is required when an object is
		a) compiled from source code,
		b) is imported from another machine,
		c) swapped in from disk (if anything changed location)

Normally, *every* variable is counted as an object, with its name stored in
the object tables; this is useful debugging information (like on a Lisp
Machine). To save space, internal object names can be stripped, but
preferably not as a rule, as in Unix -- only for floppy disks or machines
with *very* limited storage.


* Change: The Object Database needs to be hierarchical!
Wait, what about addressing...???
* Local addressing?
aaargh...

We could just keep entire the ODB in RAM at all times (and on disk, for
persistence).. after all, the fxied-length entries will only be about 32
bytes each (that's 128 per page)..

* Change: On further thought, the RDF/XCOM style of link table (where
various types of variable-length records are jumbled together) is better for
a reflective system, because it's easier to add records to an
already-compiled object... and we'll be doing that a lot under a reflective
system.



#3. Persistent Storage, as implemented in Retro 7
-------------------------------------------------

kernel/boot/odb.inc & link.inc
Still not quite settled.

Ideas from Distributed Persistent Storage (below) should help simplify
object addressing, even in non-distributed systems.

Another issue: Versioning!!



Murky waters: Distributed Persistent Storage
--------------------------------------------

This ties in with Distributed Processing.. but you can have one without the
other. (DPS is a prerequisite for *elegant* DP, though)

Addressing is gonna be a bitch! It might not be so difficult if all the
machines on the network have fixed addresses, but that isn't the case with
adaptable, self-managing networks (like my proposed radio network).  I think
I'll need to allow object addresses (in the ODB) of arbitrary length and
format.  I can use a few flags to specify what format it is: RAM, DISK,
numeric IP, mnemonic IP, or some as-yet undreamt of format.  32-bit formats
(RAM, numeric IP..) can fit nicely in the ODB's 'address' field.  Other
formats can allocate space in the string table, with 'address' pointing
there.

	RAM		32-bit linear address (on x86, anyway)
	DISK		drive#, sector# (32b*512B/sect=2TB, ok for PC's)
	IP		32-bit
	IP		string ptr.
	IPX		??
	ethernet	32-bit?
	*redundant*	List of addresses (w/ type/flags for each)

While DPS could be used in the near future on LAN's/clusters, I'm designing
it to meet the needs of *future* networks and *future* parallel computers
(which I'm also working on, as a commercial venture).  My solution won't be
expedient for existing distributed systems; hopefully it will at least be
efficient.


First, I'll experiment with simple fixed-address, single-level LAN's.  That
should be relatively easy.

I'm planning good Internet support... Internet DPS would really simplify
things like installing/upgrading software, managing software development
(what we use CVS for these days)... it would make a great replacement for
the WWW, too :-).  It'll use public-key authentication to identify machines
(like SSH does), with optional encryption.



#4. Improvements
----------------

Each object has two parts: the *Object Header* (what I've been calling the
ODB Entry) and the *Object Data*.  The header contains supervisory
information; it resides in supervisor space (PL 0).  The object data
contains the object itself, which may be code or data; it resides in user
space (PL 3).  Examples of Header information: type/flags, name, size,
address, linking records, parent, children, module...

The Object Headers need not be in an array; they could be in a list, heap,
tree, graph, whatever, all scattered throughout memory.  It doesn't matter
how big each Header.

Abandon memory-mapped schemes; use a high-level explicit scheme.

Explicit buffering: Programs must explicitly request access to an object;
the object will be loaded into a buffer, linked to the program, and remain
buffered until the program ends (or until it explicitly releases the
object).  Actually, the way an object is accessed (direct, buffered, message
passing, network..) is pretty much unknown to the program.  When an object
is written to, it must either be locked (so other programs can't write to
it) or indirectly accessed (through a server that handles multiple write
requests).

1. Interface (high-level, independent of implementation)
	* Request object (specify r/w)
	* Release object
   The programmer shouldn't have to specify these; request/releases should
   be automatically generated by the compiler/interpreter.

2. Implementation

   Note: Objects could be in a heterarchy (i.e., a graph). Therefore, a
   minimum spanning tree could be useful for searching all the objects..

- Pointer swizzling: Pointers to an object header should always trigger a
page fault (because the pages containing header information are never
mapped); the fault handler loads the object into local RAM (if it's not
already there) and changes the pointer to the object's actual address.  It
also keeps track of what pointers are pointing to the object, and if the
object is later removed from RAM, those pointers are changed to point back
to the object's header.


A Kernel Optimization:
	Have a "movable" flag for every object.  Movable objects may be
moved around in memory at any time, for GC purposes, etc.  Immovable objects
stay in the same RAM location from the time the machine is turned on to the
time it's turned off; GC won't touch them.  This allows direct references to
be made where speed is critical.
	Immovable objects are intended mainly for the kernel.  Currently all
objects are immovable and there is no GC.
	Actually, "immovable" objects can be moved or deleted, but only if
all direct references are accounted for.  
	This only applies to local objects in RAM.  An ODB entry for an
external object will always be marked "movable".

WAIT!! Just LOCK any objects that must be accessed directly... won't that
solve everything?


Object ID's:
	OID's can't change when programs are referencing them.  If we're
using the Object Header address as the OID, headers must stay in the same
place.  Allocate a chunk of linear memory for object headers (32 bytes
each).  Store the used parts of this to disk.  Try to group related object
headers in the same page.
	It would help if the OS had a way of knowing when all references to
an object have been "forgotten"... then, the memory occupied by that
object's header can be used for a new object's header.


Small objects:
	It would be inefficient if the object manager had to handle every
single little variable.  A program should have all its small internal
objects (integers, etc) wrapped in larger objects, which are handled by
persistent storage.  The language handles access to these internal objects.


----------------------------------------------------------------------------
                 Comprehensive MM/GC/PS Design for Retro #9
                                   6/5/99
----------------------------------------------------------------------------


                                System Layout
                                ~~~~~~~~~~~~~
high	|
level	| Programs           Dictionaries -- Forth, other languages, etc.
	|          \       /
Store	|            PS/GC
	|          /   |   \
storage	|       MM   Disks   Net -- distributed PS/GC
devices	|      /       |
	|   RAM   IDE/SCSI/etc.


                           PS - Persistent Storage
                           ~~~~~~~~~~~~~~~~~~~~~~~
Object header (ohdr):
	addr		RAM address (0 if not present in RAM)
	extaddr		Address in external storage
			extaddr points to a variable-length record which
			specifies where the object is stored - on disk, on
			another machine, or redundantly (a list of extaddrs)
	size
	flags

Object headers are allocated in a separate region of memory, 16-byte
aligned.  Extaddrs are also allocated and garbage-collected separately, on
disk and in RAM.  All this information is stored locally.  Both of these
regions are memory-mapped to disk, like virtual memory.

Note: Object ID's (OID's) - Hash table: input=OID, output=ohdr

Interface:
	new ( size flags -- obj )
	load(obj)	load object into RAM from external storage
	flush(obj)	write changed RAM pages of object to external storage
			Fault-tolerance: Keep data on disk consistent!
	unload(obj)	flush object & unload from RAM (keep in external storage)
	delete(obj)	completely delete an object

Storage device abstraction:
	type		RAM, Disk, distributed PS...
	total size
	free space
	transfer speed, recent average (i.e., in last 5 minutes)
	latency (ping), recent average


                           MM - Memory Management
                           ~~~~~~~~~~~~~~~~~~~~~~
Structures:
	free physical pages (ordered by size) - Heap?
	free logical pages (ordered by address) - Radix tree (Trie)? RBtree?
	used logical pages (ordered by address) - "
	page tables (CR3, PDEs, PTEs)
		Page flags (in each PTE):
		D dirty
		A accessed
		(3 more flags available for use)

Interface:
	alloc ( size -- addr )
	free ( addr -- )
	realloc ( addr1 size -- addr2 )

Algorithms:
	Heap search for alloc: best-fit with a limited # of loops?

Graphical display program, to observe allocation patterns in practice.
Expect to change MM routines a lot!


                                    Disks
                                    ~~~~~
Interface:
	load ( blocklist addr -- )
	save ( addr -- blocklist )


                           GC - Garbage Collection
                           ~~~~~~~~~~~~~~~~~~~~~~~
Tasks:
	"Collapsing" adjacent blocks of free storage
	Disk defragmenting

Exclusion:
	Cooperate with the scheduler? Allow GC to complete a cycle and call
	task_switch when done. (other interrupts could pose a problem, though)


                                Dictionaries
                                ~~~~~~~~~~~~

Structure: Patricia tree? Hash table?
	name
	type/flags
	linkages

Interface:
	insert ( object name flags linkages -- )
	search ( name -- object flags )
	change ( object flags -- )
