Custom DataGrids
Sentimentron edited this page 2014-09-30 14:07:04 -07:00

The default DenseInstances type may not meet your application's needs. Fortunately, DenseInstances is just one implementation of the FixedDataGrid interface, so you can write your own implementation and use it in its place.

Interfaces

Here's the DataGrid interface in GoLearn 0.1:

type DataGrid interface {
    // Retrieves a given Attribute's specification
    GetAttribute(Attribute) (AttributeSpec, error)
    // Retrieves details of every Attribute
    AllAttributes() []Attribute
    // Marks an Attribute as a class Attribute
    AddClassAttribute(Attribute) error
    // Unmarks an Attribute as a class Attribute
    RemoveClassAttribute(Attribute) error
    // Returns details of all class Attributes
    AllClassAttributes() []Attribute
    // Gets the bytes at a given position or nil
    Get(AttributeSpec, int) []byte
    // Convenience function for iteration.
    MapOverRows([]AttributeSpec, func([][]byte, int) (bool, error)) error
}

FixedDataGrid adds a few extra methods.

type FixedDataGrid interface {
    DataGrid
    // Returns a string representation of a given row
    RowString(int) string
    // Returns the number of Attributes and rows currently allocated
    Size() (int, int)
}

Refer to the automatically generated, up-to-date documentation for more recent versions of GoLearn.

Functional description

GetAttribute

Attribute implementations in GoLearn describe features of the machine learning problem. As of GoLearn 0.1, the implementations that exist as part of base are CategoricalAttribute and FloatAttribute (both 64-bit), as well as BinaryAttribute. An AttributeSpec structure links an Attribute to an implementation-specific idea of where the data underlying that Attribute is located in memory. DenseInstances, for example, uses it to store the column offset. DataGrid implementations outside of base can't add fields to an AttributeSpec directly, but they can:

  • Maintain local map[AttributeSpec]int structures to offer fast resolution.
  • Extend AttributeSpec to add additional fields (untested).

When deciding which AttributeSpec to return, implementations should use strict equality (via Attribute.Equals). Otherwise, subtle mismatches (such as CategoricalAttributes with corrupted orderings) might cause odd behaviour.
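The lookup described above can be sketched roughly as follows. This is a minimal illustration rather than GoLearn's own code: Attribute, AttributeSpec, floatAttr, columnGrid and newColumnGrid are all hypothetical stand-ins defined locally so the example runs on its own, and the real interfaces have more methods.

```go
package main

import (
	"errors"
	"fmt"
)

// Attribute is a hypothetical stand-in for GoLearn's Attribute interface,
// reduced to the two methods this sketch needs.
type Attribute interface {
	GetName() string
	Equals(Attribute) bool
}

// floatAttr is a toy Attribute: two values are equal when their names match.
type floatAttr struct{ name string }

func (f *floatAttr) GetName() string { return f.name }

func (f *floatAttr) Equals(other Attribute) bool {
	o, ok := other.(*floatAttr)
	return ok && o.name == f.name
}

// AttributeSpec is a stand-in for the real structure, whose fields a grid
// outside of base cannot set. It is comparable, so it can key a map.
type AttributeSpec struct{ attr Attribute }

// columnGrid is a hypothetical custom grid.
type columnGrid struct {
	attrs   []Attribute           // kept in a fixed order
	offsets map[AttributeSpec]int // local resolution: spec -> column offset
}

func newColumnGrid(attrs ...Attribute) *columnGrid {
	g := &columnGrid{attrs: attrs, offsets: make(map[AttributeSpec]int)}
	for i, a := range attrs {
		g.offsets[AttributeSpec{attr: a}] = i
	}
	return g
}

// GetAttribute compares with Equals rather than ==, so an equal Attribute
// built elsewhere resolves to the spec of the Attribute the grid owns.
func (g *columnGrid) GetAttribute(a Attribute) (AttributeSpec, error) {
	for _, own := range g.attrs {
		if own.Equals(a) {
			return AttributeSpec{attr: own}, nil
		}
	}
	return AttributeSpec{}, errors.New("attribute not found: " + a.GetName())
}

func main() {
	g := newColumnGrid(&floatAttr{"width"}, &floatAttr{"height"})
	// A distinct pointer that is nevertheless an equal Attribute.
	spec, err := g.GetAttribute(&floatAttr{"height"})
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("column offset:", g.offsets[spec])
}
```

Returning the spec built from the grid's own Attribute value, not the caller's, is what keeps the map lookup reliable here.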

AllAttributes

Simply returns a copy of all of the available Attributes. This is used for determining compatibility with other DataGrid implementations, and is usually a precursor to GetAttribute calls. The Attributes should be returned in the same order on every call.
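A small sketch of the copy-and-fixed-order behaviour, assuming the grid keeps its Attributes in a slice. The Attribute interface, namedAttr and orderedGrid are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// Attribute is a hypothetical, cut-down stand-in for GoLearn's interface.
type Attribute interface{ GetName() string }

type namedAttr struct{ name string }

func (n *namedAttr) GetName() string { return n.name }

// orderedGrid is a hypothetical grid whose slice order is the fixed
// order in which Attributes are reported.
type orderedGrid struct{ attrs []Attribute }

// AllAttributes returns a fresh slice, so callers can reorder or
// overwrite the result without disturbing the grid.
func (g *orderedGrid) AllAttributes() []Attribute {
	out := make([]Attribute, len(g.attrs))
	copy(out, g.attrs)
	return out
}

func main() {
	g := &orderedGrid{attrs: []Attribute{&namedAttr{"width"}, &namedAttr{"height"}}}
	got := g.AllAttributes()
	got[0] = &namedAttr{"tampered"} // only the caller's copy changes
	for _, a := range g.AllAttributes() {
		fmt.Println(a.GetName())
	}
}
```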

AddClassAttribute

Each DataGrid implementation keeps track of which Attributes are designated class variables. Normally, this is done using a map[Attribute]bool structure.

RemoveClassAttribute

A call to this method means that the argument should no longer appear in calls to AllClassAttributes.

AllClassAttributes

This method returns every Attribute designated as a class Attribute via previous calls to AddClassAttribute.
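The three class-Attribute methods fit together roughly like this. The sketch assumes the map[Attribute]bool approach mentioned above; the Attribute interface, namedAttr, classGrid and the lookup helper are hypothetical stand-ins so the example is self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// Attribute is a hypothetical, cut-down stand-in for GoLearn's interface.
type Attribute interface {
	GetName() string
	Equals(Attribute) bool
}

type namedAttr struct{ name string }

func (n *namedAttr) GetName() string { return n.name }

func (n *namedAttr) Equals(other Attribute) bool {
	o, ok := other.(*namedAttr)
	return ok && o.name == n.name
}

// classGrid is a hypothetical grid that tracks class Attributes in a map
// keyed by the Attribute values it owns.
type classGrid struct {
	attrs   []Attribute
	classes map[Attribute]bool
}

// lookup finds the grid's own copy of an Attribute using Equals.
func (g *classGrid) lookup(a Attribute) (Attribute, error) {
	for _, own := range g.attrs {
		if own.Equals(a) {
			return own, nil
		}
	}
	return nil, errors.New("unknown attribute: " + a.GetName())
}

func (g *classGrid) AddClassAttribute(a Attribute) error {
	own, err := g.lookup(a)
	if err != nil {
		return err
	}
	g.classes[own] = true
	return nil
}

func (g *classGrid) RemoveClassAttribute(a Attribute) error {
	own, err := g.lookup(a)
	if err != nil {
		return err
	}
	delete(g.classes, own)
	return nil
}

// AllClassAttributes walks attrs rather than the map, because Go's map
// iteration order is not stable between calls.
func (g *classGrid) AllClassAttributes() []Attribute {
	out := make([]Attribute, 0, len(g.classes))
	for _, a := range g.attrs {
		if g.classes[a] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	g := &classGrid{
		attrs:   []Attribute{&namedAttr{"width"}, &namedAttr{"species"}},
		classes: make(map[Attribute]bool),
	}
	if err := g.AddClassAttribute(&namedAttr{"species"}); err != nil {
		fmt.Println("add failed:", err)
	}
	fmt.Println(len(g.AllClassAttributes()))
	if err := g.RemoveClassAttribute(&namedAttr{"species"}); err != nil {
		fmt.Println("remove failed:", err)
	}
	fmt.Println(len(g.AllClassAttributes()))
}
```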

Get

This method takes an AttributeSpec and a row number and returns a slice of bytes (which can be converted to another value using Attribute-specific methods). At least one byte should be returned.
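As an illustration of the byte-slice contract, here is a sketch of a grid that stores 64-bit floats and hands them back as 8-byte slices. Several things are assumptions: packFloat, unpackFloat and floatColumns are hypothetical names, Get takes a plain column index in place of an AttributeSpec, and the little-endian layout is chosen for the example only and may not match what GoLearn's Attributes expect.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// packFloat and unpackFloat are hypothetical helpers. The little-endian
// layout is an assumption of this sketch.
func packFloat(v float64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, math.Float64bits(v))
	return buf
}

func unpackFloat(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

// floatColumns is a hypothetical column store: one packed buffer per
// column, 8 bytes per row.
type floatColumns struct {
	cols [][]byte
	rows int
}

func newFloatColumns(columns, rows int) *floatColumns {
	g := &floatColumns{cols: make([][]byte, columns), rows: rows}
	for i := range g.cols {
		g.cols[i] = make([]byte, rows*8)
	}
	return g
}

// Set writes a value; it assumes column and row are in range.
func (g *floatColumns) Set(column, row int, v float64) {
	copy(g.cols[column][row*8:], packFloat(v))
}

// Get returns the bytes at a given position, or nil when out of range.
func (g *floatColumns) Get(column, row int) []byte {
	if column < 0 || column >= len(g.cols) || row < 0 || row >= g.rows {
		return nil
	}
	return g.cols[column][row*8 : (row+1)*8]
}

func main() {
	g := newFloatColumns(2, 3)
	g.Set(1, 2, 1.5)
	fmt.Println(unpackFloat(g.Get(1, 2)))
	fmt.Println(g.Get(5, 0) == nil)
}
```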

MapOverRows

This allows algorithms to iterate over all the rows in the DataGrid in whichever order is convenient for the underlying implementation. The first argument is a slice of AttributeSpec structures describing which fields are needed. The second argument is a function which itself takes two arguments: a slice of byte slices holding the binary value of each requested Attribute on a given row, and the row number. The function returns a boolean indicating whether the inner algorithm has terminated, and an optional error if the inner algorithm terminated with an error.
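A rough sketch of how an implementation might drive the callback. The names rowGrid and its data layout are hypothetical, and plain column indices stand in for AttributeSpec values. One detail is an assumption worth checking against the GoLearn version in use: this sketch treats a returned true as "keep going" and false as "stop early", which is how DenseInstances is understood to behave.

```go
package main

import "fmt"

// rowGrid is a hypothetical grid storing data[row][column] as raw bytes.
type rowGrid struct {
	data [][][]byte
}

// MapOverRows calls fn once per row with only the requested columns.
// Assumption: fn returns true to continue and false to stop early.
func (g *rowGrid) MapOverRows(columns []int, fn func([][]byte, int) (bool, error)) error {
	buf := make([][]byte, len(columns)) // reused between rows
	for row := range g.data {
		for i, c := range columns {
			buf[i] = g.data[row][c]
		}
		ok, err := fn(buf, row)
		if err != nil {
			return err // the inner algorithm failed
		}
		if !ok {
			break // the inner algorithm has finished
		}
	}
	return nil
}

func main() {
	g := &rowGrid{data: [][][]byte{
		{{1}, {10}},
		{{2}, {20}},
		{{3}, {30}},
	}}
	sum := 0
	err := g.MapOverRows([]int{1}, func(row [][]byte, rowNo int) (bool, error) {
		sum += int(row[0][0])
		return rowNo < 1, nil // stop once the second row has been seen
	})
	fmt.Println(sum, err)
}
```

Because the buffer is reused between rows in this sketch, a callback that needs to keep a row's values should copy them before returning.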

RowString

FixedDataGrid adds a RowString method for easier inspection. The argument is the row number to be printed.

Size

Size returns the current dimensions of the FixedDataGrid. The first value returned is the number of Attributes; the second is the number of rows.