159 lines
5.3 KiB
Markdown
159 lines
5.3 KiB
Markdown
# State Encoding and Decoding
|
|
|
|
The state package implements the encoding and decoding of data structures for
|
|
`go_stateify`. This package is designed for use cases other than the standard
|
|
encoding packages, e.g. `gob` and `json`. Principally:
|
|
|
|
* This package operates on complex object graphs and accurately serializes and
|
|
restores all relationships. That is, you can have things like: intrusive
|
|
pointers, cycles, and pointer chains of arbitrary depths. These are not
|
|
handled appropriately by existing encoders. This is not an implementation
|
|
flaw: the formats themselves are not capable of representing these graphs,
|
|
as they can only generate directed trees.
|
|
|
|
* This package allows installing order-dependent load callbacks and then
|
|
resolves that graph at load time, with cycle detection. Similarly, there is
|
|
no analogous feature possible in the standard encoders.
|
|
|
|
* This package handles the resolution of interfaces, based on a registered
|
|
type name. For interface objects type information is saved in the serialized
|
|
format. This is generally true for `gob` as well, but it works differently.
|
|
|
|
Here's an overview of how encoding and decoding works.
|
|
|
|
## Encoding
|
|
|
|
Encoding produces a `statefile`, which contains a list of chunks of the form
|
|
`(header, payload)`. The payload can either be some raw data, or a series of
|
|
encoded wire objects representing some object graph. All encoded objects are
|
|
defined in the `wire` subpackage.
|
|
|
|
Encoding of an object graph begins with `encodeState.Save`.
|
|
|
|
### 1. Memory Map & Encoding
|
|
|
|
To discover relationships between potentially interdependent data structures
|
|
(for example, a struct may contain pointers to members of other data
|
|
structures), the encoder first walks the object graph and constructs a memory
|
|
map of the objects in the input graph. As this walk progresses, objects are
|
|
queued in the `pending` list and items are placed on the `deferred` list as they
|
|
are discovered. No single object will be encoded multiple times, but the
|
|
discovered relationships between objects may change as more parts of the overall
|
|
object graph are discovered.
|
|
|
|
The encoder starts at the root object and recursively visits all reachable
|
|
objects, recording the address ranges containing the underlying data for each
|
|
object. This is stored as a segment set (`addrSet`), mapping address ranges to
|
|
the of the object occupying the range; see `encodeState.values`. Note that there
|
|
is special handling for zero-sized types and map objects during this process.
|
|
|
|
Additionally, the encoder assigns each object a unique identifier which is used
|
|
to indicate relationships between objects in the statefile; see `objectID` in
|
|
`encode.go`.
|
|
|
|
### 2. Type Serialization
|
|
|
|
The enoder will subsequently serialize all information about discovered types,
|
|
including field names. These are used during decoding to reconcile these types
|
|
with other internally registered types.
|
|
|
|
### 3. Object Serialization
|
|
|
|
With a full address map, and all objects correctly encoded, all object encodings
|
|
are serialized. The assigned `objectID`s aren't explicitly encoded in the
|
|
statefile. The order of object messages in the stream determine their IDs.
|
|
|
|
### Example
|
|
|
|
Given the following data structure definitions:
|
|
|
|
```go
|
|
type system struct {
|
|
o *outer
|
|
i *inner
|
|
}
|
|
|
|
type outer struct {
|
|
a int64
|
|
cn *container
|
|
}
|
|
|
|
type container struct {
|
|
n uint64
|
|
elem *inner
|
|
}
|
|
|
|
type inner struct {
|
|
c container
|
|
x, y uint64
|
|
}
|
|
```
|
|
|
|
Initialized like this:
|
|
|
|
```go
|
|
o := outer{
|
|
a: 10,
|
|
cn: nil,
|
|
}
|
|
i := inner{
|
|
x: 20,
|
|
y: 30,
|
|
c: container{},
|
|
}
|
|
s := system{
|
|
o: &o,
|
|
i: &i,
|
|
}
|
|
|
|
o.cn = &i.c
|
|
o.cn.elem = &i
|
|
|
|
```
|
|
|
|
Encoding will produce an object stream like this:
|
|
|
|
```
|
|
g0r1 = struct{
|
|
i: g0r3,
|
|
o: g0r2,
|
|
}
|
|
g0r2 = struct{
|
|
a: 10,
|
|
cn: g0r3.c,
|
|
}
|
|
g0r3 = struct{
|
|
c: struct{
|
|
elem: g0r3,
|
|
n: 0u,
|
|
},
|
|
x: 20u,
|
|
y: 30u,
|
|
}
|
|
```
|
|
|
|
Note how `g0r3.c` is correctly encoded as the underlying `container` object for
|
|
`inner.c`, and how the pointer from `outer.cn` points to it, despite `system.i`
|
|
being discovered after the pointer to it in `system.o.cn`. Also note that
|
|
decoding isn't strictly reliant on the order of encoded object stream, as long
|
|
as the relationship between objects are correctly encoded.
|
|
|
|
## Decoding
|
|
|
|
Decoding reads the statefile and reconstructs the object graph. Decoding begins
|
|
in `decodeState.Load`. Decoding is performed in a single pass over the object
|
|
stream in the statefile, and a subsequent pass over all deserialized objects is
|
|
done to fire off all loading callbacks in the correctly defined order. Note that
|
|
introducing cycles is possible here, but these are detected and an error will be
|
|
returned.
|
|
|
|
Decoding is relatively straight forward. For most primitive values, the decoder
|
|
constructs an appropriate object and fills it with the values encoded in the
|
|
statefile. Pointers need special handling, as they must point to a value
|
|
allocated elsewhere. When values are constructed, the decoder indexes them by
|
|
their `objectID`s in `decodeState.objectsByID`. The target of pointers are
|
|
resolved by searching for the target in this index by their `objectID`; see
|
|
`decodeState.register`. For pointers to values inside another value (fields in a
|
|
pointer, elements of an array), the decoder uses the accessor path to walk to
|
|
the appropriate location; see `walkChild`.
|