Created Custom DataGrids (markdown)
parent
b09b02548a
commit
1784b6b546
@ -0,0 +1,68 @@
The default `DenseInstances` type may not meet your application's needs. Fortunately, `DenseInstances` implements the `FixedDataGrid` interface, which makes it straightforward to substitute an implementation of your own.

## Interfaces

Here's the `DataGrid` interface as of GoLearn 0.1:

```go
type DataGrid interface {
	// Retrieves a given Attribute's specification
	GetAttribute(Attribute) (AttributeSpec, error)
	// Retrieves details of every Attribute
	AllAttributes() []Attribute
	// Marks an Attribute as a class Attribute
	AddClassAttribute(Attribute) error
	// Unmarks an Attribute as a class Attribute
	RemoveClassAttribute(Attribute) error
	// Returns details of all class Attributes
	AllClassAttributes() []Attribute
	// Gets the bytes at a given position or nil
	Get(AttributeSpec, int) []byte
	// Convenience function for iteration.
	MapOverRows([]AttributeSpec, func([][]byte, int) (bool, error)) error
}
```

`FixedDataGrid` adds two extra methods:

```go
type FixedDataGrid interface {
	DataGrid
	// Returns a string representation of a given row
	RowString(int) string
	// Returns the number of Attributes and rows currently allocated
	Size() (int, int)
}
```

[For more recent versions of GoLearn, refer to the automatically generated documentation.](https://godoc.org/github.com/sjwhitworth/golearn/base#DataGrid)

## Functional description

### `GetAttribute`

`Attribute` implementations in GoLearn describe features of the machine learning problem. As of GoLearn 0.1, the implementations in `base` are `CategoricalAttribute` and `FloatAttribute` (both 64-bit), as well as `BinaryAttribute`. An `AttributeSpec` structure links an `Attribute` to an implementation-specific notion of where that `Attribute`'s underlying data is located in memory; `DenseInstances`, for example, uses it to store the column offset. `DataGrid` implementations outside of `base` can't add fields to `AttributeSpec` itself, but they can:

* Maintain local `map[AttributeSpec]int` structures to offer fast resolution.
* Extend `AttributeSpec` in a type of their own to carry additional fields (untested).

When deciding which `AttributeSpec` to return, implementations should use strict equality (via `Attribute.Equals`); otherwise odd problems (such as `CategoricalAttribute`s ending up with corrupted orderings) may occur.

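The lookup described above can be sketched roughly as follows. This is a minimal illustration and not GoLearn code: `Attribute`, `FloatAttr`, `AttributeSpec` and `ColumnGrid` are hypothetical stand-ins declared locally (the real `AttributeSpec` fields belong to `base` and are not reproduced here). The sketch resolves an `Attribute` with a strict `Equals` check and keeps a local `map[AttributeSpec]int` side table.

```go
package main

import (
	"errors"
	"fmt"
)

// Attribute is a hypothetical stand-in for GoLearn's Attribute interface,
// reduced to the two methods this sketch needs.
type Attribute interface {
	GetName() string
	Equals(Attribute) bool
}

// FloatAttr is an illustrative Attribute with a name and a precision.
type FloatAttr struct {
	Name      string
	Precision int
}

func (f *FloatAttr) GetName() string { return f.Name }

// Equals is strict: same concrete type, same name, same precision.
func (f *FloatAttr) Equals(other Attribute) bool {
	o, ok := other.(*FloatAttr)
	return ok && o.Name == f.Name && o.Precision == f.Precision
}

// AttributeSpec is a local stand-in; its fields are assumptions.
type AttributeSpec struct {
	position int
	attr     Attribute
}

// ColumnGrid is a hypothetical column-oriented grid.
type ColumnGrid struct {
	attrs   []Attribute
	columns map[AttributeSpec]int // local side table: spec -> column index
}

// AddAttribute registers a new column and records it in the side table.
func (g *ColumnGrid) AddAttribute(a Attribute) AttributeSpec {
	spec := AttributeSpec{position: len(g.attrs), attr: a}
	g.attrs = append(g.attrs, a)
	g.columns[spec] = spec.position
	return spec
}

// GetAttribute returns the spec of the stored Attribute that strictly
// equals the requested one, or an error if there is none.
func (g *ColumnGrid) GetAttribute(want Attribute) (AttributeSpec, error) {
	for i, a := range g.attrs {
		if a.Equals(want) {
			return AttributeSpec{position: i, attr: a}, nil
		}
	}
	return AttributeSpec{}, errors.New("attribute not found: " + want.GetName())
}

func main() {
	g := &ColumnGrid{columns: make(map[AttributeSpec]int)}
	g.AddAttribute(&FloatAttr{"width", 2})
	g.AddAttribute(&FloatAttr{"height", 2})

	spec, err := g.GetAttribute(&FloatAttr{"height", 2})
	fmt.Println(spec.position, g.columns[spec], err)

	// Same name but a different precision: strict equality rejects it.
	_, err = g.GetAttribute(&FloatAttr{"height", 4})
	fmt.Println(err)
}
```

Keeping the extra bookkeeping in a map owned by the grid avoids touching `AttributeSpec` at all, which is why the first option in the list above is the safer of the two.
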
### `AllAttributes`

Returns a copy of all of the available `Attribute`s. This is used for determining compatibility with other `DataGrid` implementations, and is usually a precursor to `GetAttribute` calls. The `Attribute`s should be returned in a fixed order.

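A minimal sketch of the "copy, in a fixed order" behaviour, using hypothetical local types (`Attribute`, `named`, `ListGrid`) rather than anything from GoLearn. Returning a fresh slice means callers can't reorder or overwrite the grid's own list.

```go
package main

import "fmt"

// Attribute is a hypothetical stand-in for GoLearn's Attribute interface.
type Attribute interface{ GetName() string }

// named is an illustrative Attribute identified only by its name.
type named string

func (n named) GetName() string { return string(n) }

// ListGrid is a hypothetical grid holding Attributes in column order.
type ListGrid struct{ attrs []Attribute }

// AllAttributes returns a copy of the Attribute list in column order.
func (g *ListGrid) AllAttributes() []Attribute {
	out := make([]Attribute, len(g.attrs))
	copy(out, g.attrs)
	return out
}

func main() {
	g := &ListGrid{attrs: []Attribute{named("width"), named("height")}}

	got := g.AllAttributes()
	got[0] = named("tampered") // only the caller's copy changes

	fmt.Println(g.AllAttributes()[0].GetName()) // prints "width"
}
```
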
### `AddClassAttribute`

This method marks the argument as a class `Attribute`. Each `DataGrid` implementation keeps track of which `Attribute`s are designated as class variables. Normally, this is done using a `map[Attribute]bool` structure.

### `RemoveClassAttribute`

After a call to this method, the argument should no longer appear in the results of `AllClassAttributes`.

### `AllClassAttributes`

This method returns every `Attribute` designated as a class `Attribute` via previous calls to `AddClassAttribute`.

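The three class-related methods fit together roughly as in the sketch below. All types here (`Attribute`, `named`, `ClassGrid`) are hypothetical stand-ins. Two choices are assumptions of the sketch and not requirements stated in this document: unknown `Attribute`s are rejected with an error, and `AllClassAttributes` walks the column list so the result has a stable order (Go map iteration order is not defined).

```go
package main

import (
	"errors"
	"fmt"
)

// Attribute is a hypothetical stand-in for GoLearn's Attribute interface.
type Attribute interface{ GetName() string }

// named is an illustrative Attribute identified only by its name.
type named string

func (n named) GetName() string { return string(n) }

// ClassGrid is a hypothetical grid that only tracks Attributes.
type ClassGrid struct {
	attrs   []Attribute        // fixed column order
	classes map[Attribute]bool // class membership
}

// has reports whether a belongs to the grid. It compares with == for
// brevity; a real implementation would use Attribute.Equals.
func (g *ClassGrid) has(a Attribute) bool {
	for _, x := range g.attrs {
		if x == a {
			return true
		}
	}
	return false
}

// AddClassAttribute marks a as a class Attribute.
func (g *ClassGrid) AddClassAttribute(a Attribute) error {
	if !g.has(a) {
		return errors.New("unknown attribute: " + a.GetName())
	}
	g.classes[a] = true
	return nil
}

// RemoveClassAttribute unmarks a as a class Attribute.
func (g *ClassGrid) RemoveClassAttribute(a Attribute) error {
	if !g.has(a) {
		return errors.New("unknown attribute: " + a.GetName())
	}
	delete(g.classes, a)
	return nil
}

// AllClassAttributes returns the class Attributes in column order.
func (g *ClassGrid) AllClassAttributes() []Attribute {
	var out []Attribute
	for _, a := range g.attrs {
		if g.classes[a] {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	g := &ClassGrid{
		attrs:   []Attribute{named("width"), named("species")},
		classes: make(map[Attribute]bool),
	}

	err := g.AddClassAttribute(named("species"))
	fmt.Println(err, len(g.AllClassAttributes()))

	err = g.RemoveClassAttribute(named("species"))
	fmt.Println(err, len(g.AllClassAttributes()))

	// "colour" is not a column of this grid, so the call fails.
	fmt.Println(g.AddClassAttribute(named("colour")))
}
```
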
### `Get`

This method takes an `AttributeSpec` and a row number and returns a slice of bytes (which can be converted to another value using `Attribute`-specific methods). At least one byte should be returned.

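As an illustration, here is a rough sketch of a `Get` over fixed-width columns. `AttributeSpec`, `ByteGrid`, `packFloat` and `unpackFloat` are hypothetical and declared locally. The 8-byte little-endian float encoding is an assumption of this sketch, not a statement about how GoLearn stores values; in practice the conversion should go through the `Attribute`-specific methods mentioned above. Out-of-range positions return `nil`, following the interface comment.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// AttributeSpec is a local stand-in; the column field is an assumption.
type AttributeSpec struct{ column int }

// ByteGrid is a hypothetical grid storing one byte buffer per column,
// with 8 bytes per row.
type ByteGrid struct {
	rows int
	data [][]byte
}

// Get returns the 8 bytes stored at (column, row), or nil if the
// position is out of range.
func (g *ByteGrid) Get(as AttributeSpec, row int) []byte {
	if as.column < 0 || as.column >= len(g.data) || row < 0 || row >= g.rows {
		return nil
	}
	return g.data[as.column][row*8 : (row+1)*8]
}

// packFloat encodes v as 8 little-endian bytes (encoding is assumed).
func packFloat(v float64) []byte {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, math.Float64bits(v))
	return buf
}

// unpackFloat reverses packFloat.
func unpackFloat(b []byte) float64 {
	return math.Float64frombits(binary.LittleEndian.Uint64(b))
}

func main() {
	column := append(packFloat(1.5), packFloat(-3.25)...)
	g := &ByteGrid{rows: 2, data: [][]byte{column}}
	spec := AttributeSpec{column: 0}

	fmt.Println(unpackFloat(g.Get(spec, 1))) // prints -3.25
	fmt.Println(g.Get(spec, 5) == nil)       // prints true
}
```
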
### `MapOverRows`

This allows algorithms to iterate over all the rows in the `DataGrid` in whichever order is convenient for the underlying implementation. The first argument is a slice of `AttributeSpec` structures describing which fields are needed. The second argument is a function which takes two arguments: a slice of byte slices containing the raw bytes for each requested `AttributeSpec` on a given row, and the row number. The function returns a boolean signalling whether the inner algorithm has terminated, and an optional error if the inner algorithm terminated with an error.

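A minimal sketch of one way to implement `MapOverRows`, using hypothetical local types (`AttributeSpec`, `RowGrid`) and a hypothetical sentinel error `errStop`. The description above doesn't pin down the polarity of the boolean, so the sketch assumes that returning `false` stops the iteration and `true` continues it; confirm this against the implementation you need to stay compatible with.

```go
package main

import (
	"errors"
	"fmt"
)

// errStop is a hypothetical sentinel used to show error propagation.
var errStop = errors.New("callback failed")

// AttributeSpec is a local stand-in; the column field is an assumption.
type AttributeSpec struct{ column int }

// RowGrid is a hypothetical row-oriented grid: cells[row][column].
type RowGrid struct {
	cells [][][]byte
}

// MapOverRows calls fn once per row with the bytes of the requested
// columns. It stops early if fn returns false (assumed polarity) and
// returns the first error fn reports.
func (g *RowGrid) MapOverRows(specs []AttributeSpec, fn func([][]byte, int) (bool, error)) error {
	buf := make([][]byte, len(specs)) // reused between rows
	for row := range g.cells {
		for i, s := range specs {
			buf[i] = g.cells[row][s.column]
		}
		keepGoing, err := fn(buf, row)
		if err != nil {
			return err
		}
		if !keepGoing {
			break
		}
	}
	return nil
}

func main() {
	g := &RowGrid{cells: [][][]byte{
		{{1}, {10}},
		{{2}, {20}},
		{{3}, {30}},
	}}

	// Sum the second column, stopping after the second row.
	sum := 0
	err := g.MapOverRows([]AttributeSpec{{column: 1}}, func(vals [][]byte, row int) (bool, error) {
		sum += int(vals[0][0])
		return row < 1, nil
	})
	fmt.Println(sum, err) // prints "30 <nil>"

	// An error from the callback is returned to the caller.
	err = g.MapOverRows(nil, func([][]byte, int) (bool, error) {
		return false, errStop
	})
	fmt.Println(err == errStop) // prints true
}
```

Because the row buffer is reused, a callback that wants to keep the bytes beyond its own invocation would need to copy them.
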
### `RowString`

`FixedDataGrid` adds a `RowString` method for easier inspection. The argument is the row number to be printed.

### `Size`

`Size` returns the current dimensions of the `FixedDataGrid`. The first value returned is the number of `Attribute`s; the second is the number of rows.
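The two `FixedDataGrid` additions can be sketched together. `TableGrid` is a hypothetical type that stores plain `float64` values instead of bytes to keep the example short, and the row format produced by `RowString` (two decimal places, space-separated) is an arbitrary choice for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// TableGrid is a hypothetical fixed-size grid of float64 values.
type TableGrid struct {
	names []string    // one name per Attribute
	rows  [][]float64 // rows[row][column]
}

// Size returns the number of Attributes, then the number of rows.
func (g *TableGrid) Size() (int, int) {
	return len(g.names), len(g.rows)
}

// RowString formats one row for inspection.
func (g *TableGrid) RowString(row int) string {
	parts := make([]string, len(g.rows[row]))
	for i, v := range g.rows[row] {
		parts[i] = fmt.Sprintf("%.2f", v)
	}
	return strings.Join(parts, " ")
}

func main() {
	g := &TableGrid{
		names: []string{"width", "height"},
		rows:  [][]float64{{1, 2}, {3.5, 4.25}, {5, 6}},
	}

	attrs, rows := g.Size()
	fmt.Println(attrs, rows)    // prints "2 3"
	fmt.Println(g.RowString(1)) // prints "3.50 4.25"
}
```
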