gvisor/pkg/state
Jamie Liu 227fd9f1b0 //pkg/state fixes for VFS2.
- When encodeState.resolve() determines that the resolved reflect.Value is
  contained by a previously-resolved object, set wire.Ref.Type to the
  containing object's type (existing.obj.Type()) rather than the contained
  value's type (obj.Type()).

- When encodeState.resolve() determines that the resolved reflect.Value
  contains a previously-resolved object, handle cases where the new object
  contains *multiple* previously-resolved objects. (This may cause
  previously-allocated object IDs to become unused; to facilitate this, change
  encodeState.pending to a map, and change the wire format to prefix each
  object with its object ID.)

- Add encodeState.encodedStructs to avoid redundant encoding of structs, since
  deduplication of objects via encodeState.resolve() doesn't work for objects
  instantiated by StateSave() and passed to SaveValue() (i.e. fields tagged
  `state:".(whatever)"`).

- Make unexported array fields deserializable via slices that refer to them by
  casting away their unexportedness in decodeState.decodeObject().

Updates #1663

PiperOrigin-RevId: 338727687
2020-10-23 12:53:20 -07:00
pretty //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
statefile Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
tests //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
wire Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
BUILD //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
README.md Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
decode.go //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
decode_unsafe.go //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
encode.go //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
encode_unsafe.go Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
state.go //pkg/state fixes for VFS2. 2020-10-23 12:53:20 -07:00
state_norace.go Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
state_race.go Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
stats.go Support for saving pointers to fields in the state package. 2020-06-23 23:34:06 -07:00
types.go Add basic stateify annotations. 2020-09-24 10:13:04 -07:00

README.md

State Encoding and Decoding

The state package implements the encoding and decoding of data structures for go_stateify. It targets use cases that the standard encoding packages, e.g. gob and json, do not cover. Principally:

  • This package operates on complex object graphs and accurately serializes and restores all relationships. That is, you can have things like: intrusive pointers, cycles, and pointer chains of arbitrary depths. These are not handled appropriately by existing encoders. This is not an implementation flaw: the formats themselves are not capable of representing these graphs, as they can only generate directed trees.

  • This package allows installing order-dependent load callbacks and then resolves that graph at load time, with cycle detection. The standard encoders have no analogous feature.

  • This package handles the resolution of interfaces, based on a registered type name. For interface objects, type information is saved in the serialized format. This is generally true for gob as well, but it works differently.
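To make the first point concrete, here is a minimal sketch of how a tree-shaped format loses pointer relationships. It uses only the standard encoding/json package; the types Leaf and Pair and the helper roundTrip are hypothetical names invented for this illustration and are not part of the state package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Leaf and Pair are hypothetical types used only for this illustration.
type Leaf struct {
	N int
}

type Pair struct {
	A *Leaf
	B *Leaf
}

// roundTrip encodes p as JSON and decodes the result into a fresh Pair.
func roundTrip(p Pair) (Pair, error) {
	var out Pair
	data, err := json.Marshal(p)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	shared := &Leaf{N: 1}
	in := Pair{A: shared, B: shared}

	out, err := roundTrip(in)
	if err != nil {
		panic(err)
	}

	fmt.Println(in.A == in.B)   // true: both fields alias one Leaf
	fmt.Println(out.A == out.B) // false: the decoder allocated two Leafs
}
```

The values survive the round trip, but the fact that A and B referred to the same object does not. Preserving that kind of relationship is what the state package is for.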

Here's an overview of how encoding and decoding works.

Encoding

Encoding produces a statefile, which contains a list of chunks of the form (header, payload). The payload can either be some raw data, or a series of encoded wire objects representing some object graph. All encoded objects are defined in the wire subpackage.

Encoding of an object graph begins with encodeState.Save.

1. Memory Map & Encoding

To discover relationships between potentially interdependent data structures (for example, a struct may contain pointers to members of other data structures), the encoder first walks the object graph and constructs a memory map of the objects in the input graph. As this walk progresses, objects are queued in the pending list and items are placed on the deferred list as they are discovered. No single object will be encoded multiple times, but the discovered relationships between objects may change as more parts of the overall object graph are discovered.

The encoder starts at the root object and recursively visits all reachable objects, recording the address ranges containing the underlying data for each object. This is stored as a segment set (addrSet), mapping each address range to the object occupying it; see encodeState.values. Note that there is special handling for zero-sized types and map objects during this process.
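The idea of recording address ranges can be sketched as follows. This is not the encoder's addrSet; addrRange, rangeOf and contains are hypothetical helpers that only show how comparing ranges reveals that one object lives inside another.

```go
package main

import (
	"fmt"
	"reflect"
)

type container struct {
	n    uint64
	elem *inner
}

type inner struct {
	c    container
	x, y uint64
}

// addrRange is a hypothetical half-open range [start, end) of addresses.
type addrRange struct {
	start, end uintptr
}

// rangeOf returns the address range occupied by the value ptr points to.
// ptr must be a non-nil pointer.
func rangeOf(ptr interface{}) addrRange {
	v := reflect.ValueOf(ptr)
	start := v.Pointer()
	return addrRange{start: start, end: start + v.Type().Elem().Size()}
}

// contains reports whether o lies entirely within r.
func (r addrRange) contains(o addrRange) bool {
	return r.start <= o.start && o.end <= r.end
}

func main() {
	var i inner
	whole, part := rangeOf(&i), rangeOf(&i.c)

	fmt.Println(whole.contains(part)) // true: i.c is stored inside i
	fmt.Println(part.contains(whole)) // false: i is larger than i.c
}
```

A pointer to i.c discovered before i itself would first look like a standalone object; once the range for i is recorded, the containment check shows that it is really a part of i.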

Additionally, the encoder assigns each object a unique identifier which is used to indicate relationships between objects in the statefile; see objectID in encode.go.

2. Type Serialization

The encoder will subsequently serialize all information about discovered types, including field names. These are used during decoding to reconcile these types with other internally registered types.
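As a rough sketch of what reconciling by name and field names could look like, the following keeps a registry of types keyed by name and compares encoded field names against the registered struct. The names registry, registerType, fieldNames and matches, and the key "main.inner", are assumptions made for this illustration; the package's actual registration and reconciliation logic is more involved.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

type inner struct {
	x, y uint64
}

// registry is a hypothetical map from a type name to its reflect.Type.
var registry = map[string]reflect.Type{}

// registerType records the type of sample under name.
func registerType(name string, sample interface{}) {
	registry[name] = reflect.TypeOf(sample)
}

// fieldNames returns the sorted field names of a registered struct type.
func fieldNames(name string) ([]string, bool) {
	t, ok := registry[name]
	if !ok || t.Kind() != reflect.Struct {
		return nil, false
	}
	names := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names = append(names, t.Field(i).Name)
	}
	sort.Strings(names)
	return names, true
}

// matches reports whether the encoded field names agree with the
// registered type, ignoring order.
func matches(name string, encoded []string) bool {
	names, ok := fieldNames(name)
	if !ok || len(names) != len(encoded) {
		return false
	}
	sorted := append([]string(nil), encoded...)
	sort.Strings(sorted)
	for i := range names {
		if names[i] != sorted[i] {
			return false
		}
	}
	return true
}

func main() {
	registerType("main.inner", inner{})
	fmt.Println(matches("main.inner", []string{"y", "x"})) // true
	fmt.Println(matches("main.inner", []string{"x", "z"})) // false
}
```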

3. Object Serialization

Once the address map is complete and all objects are correctly encoded, all object encodings are serialized. The assigned objectIDs aren't explicitly encoded in the statefile. The order of object messages in the stream determines their IDs.

Example

Given the following data structure definitions:

type system struct {
    o *outer
    i *inner
}

type outer struct {
    a  int64
    cn *container
}

type container struct {
    n    uint64
    elem *inner
}

type inner struct {
    c    container
    x, y uint64
}

Initialized like this:

o := outer{
    a: 10,
    cn: nil,
}
i := inner{
    x: 20,
    y: 30,
    c: container{},
}
s := system{
    o: &o,
    i: &i,
}

o.cn = &i.c
o.cn.elem = &i

Encoding will produce an object stream like this:

g0r1 = struct{
     i: g0r3,
     o: g0r2,
}
g0r2 = struct{
     a: 10,
     cn: g0r3.c,
}
g0r3 = struct{
     c: struct{
             elem: g0r3,
             n: 0u,
     },
     x: 20u,
     y: 30u,
}

Note how g0r3.c is correctly encoded as the underlying container object for inner.c, and how the pointer from outer.cn points to it, despite system.i being discovered after the pointer to it in system.o.cn. Also note that decoding isn't strictly reliant on the order of the encoded object stream, as long as the relationships between objects are correctly encoded.
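A reference such as g0r3.c names a containing object plus a path into it. The sketch below shows one way such a path component could be derived, by matching a pointer's address and type against the field offsets of a containing struct. fieldFor is a hypothetical helper that handles only a single level of struct fields; it is not the encoder's resolve logic.

```go
package main

import (
	"fmt"
	"reflect"
)

type container struct {
	n    uint64
	elem *inner
}

type inner struct {
	c    container
	x, y uint64
}

// fieldFor returns the name of the field of *base that target points to.
// Both arguments must be non-nil pointers and base must point to a struct.
// Hypothetical helper: only direct fields are considered.
func fieldFor(base, target interface{}) (string, bool) {
	bv := reflect.ValueOf(base).Elem()
	start := bv.UnsafeAddr()

	tv := reflect.ValueOf(target)
	addr := tv.Pointer()

	for i := 0; i < bv.NumField(); i++ {
		f := bv.Type().Field(i)
		// The field matches when both its address and its type agree
		// with the pointer being resolved.
		if start+f.Offset == addr && f.Type == tv.Type().Elem() {
			return f.Name, true
		}
	}
	return "", false
}

func main() {
	var i inner
	name, ok := fieldFor(&i, &i.c)
	fmt.Println(name, ok) // c true
}
```

The type check matters: &i and &i.c share an address because c is the first field, so the address alone cannot distinguish a pointer to the whole object from a pointer to its first member.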

Decoding

Decoding reads the statefile and reconstructs the object graph. Decoding begins in decodeState.Load. Decoding is performed in a single pass over the object stream in the statefile, and a subsequent pass over all deserialized objects is done to fire off all loading callbacks in the correctly defined order. Note that introducing cycles is possible here, but these are detected and an error will be returned.
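The ordering of load callbacks with cycle detection can be illustrated with a depth-first topological sort. This is a generic sketch rather than the decoder's implementation; loadOrder, errCycle and the use of plain integer IDs are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// errCycle is a hypothetical error reported when callbacks depend on
// each other in a cycle.
var errCycle = errors.New("dependency cycle between load callbacks")

// loadOrder returns the IDs in an order where every ID appears after
// all IDs it depends on, or errCycle if no such order exists.
func loadOrder(deps map[int][]int) ([]int, error) {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[int]int)
	var order []int

	var visit func(id int) error
	visit = func(id int) error {
		switch state[id] {
		case done:
			return nil
		case inProgress:
			// Reached an ID that is still being visited: a cycle.
			return errCycle
		}
		state[id] = inProgress
		for _, d := range deps[id] {
			if err := visit(d); err != nil {
				return err
			}
		}
		state[id] = done
		order = append(order, id)
		return nil
	}

	// Visit IDs in sorted order so the result is deterministic.
	ids := make([]int, 0, len(deps))
	for id := range deps {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	for _, id := range ids {
		if err := visit(id); err != nil {
			return nil, err
		}
	}
	return order, nil
}

func main() {
	order, err := loadOrder(map[int][]int{1: {2, 3}, 2: {3}, 3: nil})
	fmt.Println(order, err) // [3 2 1] <nil>

	_, err = loadOrder(map[int][]int{1: {2}, 2: {1}})
	fmt.Println(err == errCycle) // true
}
```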

Decoding is relatively straightforward. For most primitive values, the decoder constructs an appropriate object and fills it with the values encoded in the statefile. Pointers need special handling, as they must point to a value allocated elsewhere. When values are constructed, the decoder indexes them by their objectIDs in decodeState.objectsByID. The targets of pointers are resolved by searching for the target in this index by its objectID; see decodeState.register. For pointers to values inside another value (fields in a struct, elements of an array), the decoder uses the accessor path to walk to the appropriate location; see walkChild.