Documentation
¶
Overview ¶
Package marisa contains bindings for the marisa-trie library.
Example (Load) ¶
trie, err := marisa.Open("words.dat")
if err != nil {
panic(err)
}
fmt.Println(trie)
Example (Marshal) ¶
var trie marisa.Trie
if err := trie.Build(slices.Values(testdata.Words), marisa.Config{}); err != nil {
panic(err)
}
buf, err := trie.MarshalBinary()
if err != nil {
panic(err)
}
fmt.Println(trie.Size(), trie.DiskSize(), len(buf))
if err := trie.UnmarshalBinary(buf); err != nil {
panic(err)
}
fmt.Println(trie.Size())
Output: 466550 1413352 1413352 466550
Example (Save) ¶
var trie marisa.Trie
if err := trie.Build(slices.Values(testdata.Words), marisa.Config{}); err != nil {
panic(err)
}
f, err := os.Create("words.dat")
if err != nil {
panic(err)
}
defer f.Close()
if _, err := trie.WriteTo(f); err != nil {
panic(err)
}
if err := f.Close(); err != nil {
panic(err)
}
Index ¶
- Constants
- func Initialize() error
- type CacheLevel
- type Config
- type Key
- type NodeOrder
- type TailMode
- type Trie
- func (t *Trie) AppendBinary(b []byte) ([]byte, error)
- func (t *Trie) Build(keys iter.Seq[string], cfg Config) error
- func (t *Trie) BuildWeights(keys iter.Seq2[string, float32], cfg Config) error
- func (t *Trie) CommonPrefixSearch(query string, limit int) ([]Key, error)
- func (t *Trie) CommonPrefixSearchSeq(query string) func(*error) iter.Seq2[uint32, string]
- func (t *Trie) DiskSize() uint32
- func (t *Trie) Dump(limit int) ([]Key, error)
- func (t *Trie) DumpSeq() func(*error) iter.Seq2[uint32, string]
- func (t *Trie) Lookup(key string) (uint32, bool, error)
- func (t *Trie) MapFile(f *os.File, offset int64, length int64) error
- func (t *Trie) MarshalBinary() ([]byte, error)
- func (t *Trie) NodeOrder() NodeOrder
- func (t *Trie) NumNodes() uint32
- func (t *Trie) NumTries() uint32
- func (t *Trie) PredictiveSearch(query string, limit int) ([]Key, error)
- func (t *Trie) PredictiveSearchSeq(query string) func(*error) iter.Seq2[uint32, string]
- func (t *Trie) ReadFrom(r io.Reader) (int64, error)
- func (t *Trie) ReverseLookup(id uint32) (string, bool, error)
- func (t *Trie) Size() uint32
- func (t *Trie) String() string
- func (t *Trie) TailMode() TailMode
- func (t *Trie) TotalSize() uint32
- func (t *Trie) UnmarshalBinary(b []byte) error
- func (t *Trie) WriteTo(w io.Writer) (int64, error)
Examples ¶
Constants ¶
const ( MinNumTries = int(_MARISA_MIN_NUM_TRIES) MaxNumTries = int(_MARISA_MAX_NUM_TRIES) )
NumTries specifies the number of tries to use. Usually, more tries make a dictionary space-efficient but time-inefficient.
The limits are an implementation detail.
Variables ¶
This section is empty.
Functions ¶
func Initialize ¶
func Initialize() error
Initialize compiles the wasm binary.
This is called implicitly when Trie is used for the first time.
Types ¶
type CacheLevel ¶
type CacheLevel int
CacheLevel specifies the cache size. A larger cache enables faster search but takes a more space.
const ( HugeCache CacheLevel = iota + 1 LargeCache NormalCache SmallCache TinyCache )
func (CacheLevel) String ¶ added in v0.0.2
func (c CacheLevel) String() string
type Config ¶
type Config struct {
NumTries int
CacheLevel CacheLevel
TailMode TailMode
NodeOrder NodeOrder
}
Config specifies options for a dictionary. Any unspecified options will be set to their default.
type NodeOrder ¶
type NodeOrder int
NodeOrder specifies the arrangement of nodes, which affects the time cost of matching and the order of predictive search.
const ( // LabelOrder arranges nodes in ascending label order. LabelOrder is useful // if an application needs to predict keys in label order. LabelOrder NodeOrder = iota + 1 // WeightOrder arranges nodes in descending weight order. WeightOrder is // generally a better choice because it enables faster matching. WeightOrder )
type TailMode ¶
type TailMode int
TailMode specifies the kind of TAIL implementation.
const ( // TextTail merges last labels as zero-terminated strings. So, it is // available if and only if the last labels do not contain a NULL character. // If TextTail is specified and a NULL character exists in the last labels, // the setting is automatically switched to MARISA_BINARY_TAIL. TextTail TailMode = iota + 1 // BinaryTail also merges last labels but as byte sequences. It uses a bit // vector to detect the end of a sequence, instead of NULL characters. So, // BinaryTail requires a larger space if the average length of labels is // greater than 8. BinaryTail )
type Trie ¶
type Trie struct {
// contains filtered or unexported fields
}
Trie is a read-only in-memory little-endian MARISA dictionary.
At the moment, it must not be used concurrently. This restriction may be lifted in the future. It's okay to nest iterators or call methods from within one.
On 64-bit systems, the maximum dictionary size is 4GiB. On 32-bit systems, the maximum dictionary size is 2 GiB. Note that if you build/load the same trie twice, it needs twice the amount of memory since it swaps it at the end.
Example (Query) ¶
var trie marisa.Trie
if err := trie.Build(slices.Values(testdata.Words), marisa.Config{}); err != nil {
panic(err)
}
id, ok, err := trie.Lookup("iterate")
if err != nil {
panic(err)
}
if !ok {
fmt.Println("not found")
}
key, ok, err := trie.ReverseLookup(id)
if err != nil {
panic(err)
}
if !ok {
fmt.Println("not found")
}
fmt.Println("l", id, key)
for id, key := range trie.PredictiveSearchSeq("iterat")(&err) {
fmt.Println("p", id, key)
}
if err != nil {
panic(err)
}
for id, key := range trie.CommonPrefixSearchSeq("iterated")(&err) {
fmt.Println("c", id, key)
}
if err != nil {
panic(err)
}
Output: l 262491 iterate p 352923 iterative p 413344 iteratively p 413345 iterativeness p 352924 iteration p 413346 iterations p 352925 iterating p 262491 iterate p 352926 iterated p 352927 iterately p 352928 iterates p 262492 iterator p 352929 iterator's p 352930 iterators c 4 i c 2235 ite c 17192 iter c 262491 iterate c 352926 iterated
func Load ¶
Load is shorthand for initializing a dictionary with Trie.ReadFrom.
func New ¶
New is shorthand for initializing a dictionary with Trie.UnmarshalBinary. Using Load with a bytes.Reader may result in a more optimal in-memory layout.
func (*Trie) Build ¶
Build builds a dictionary out of the specified set of keys, with a weight of 1 for each.
Example ¶
keys := []string{
"a",
"a/b",
"a/b/c",
"b",
"b/c",
"c",
}
var trie marisa.Trie
if err := trie.Build(slices.Values(keys), marisa.Config{}); err != nil {
panic(err)
}
fmt.Println(&trie)
Output: marisa.Trie(size=6 io_size=4088 total_size=3403 num_tries=3 num_nodes=7 tail_mode=text node_order=weight)
func (*Trie) BuildWeights ¶
BuildWeights builds a dictionary out of the specified set of keys and weights. If a key is specified multiple times, the weights are accumulated.
Example ¶
keys := map[string]float32{
"a": 1.0,
"a/b": 1.0,
"a/b/c": 1.0,
"b": 1.0,
"b/c": 5.0,
"c": 3.5,
}
// note: weights are cumulative and include the weights of all children
cfg := marisa.Config{
NodeOrder: marisa.WeightOrder,
}
var trie marisa.Trie
if err := trie.BuildWeights(maps.All(keys), cfg); err != nil {
panic(err)
}
fmt.Println(&trie)
var err error
for _, key := range trie.DumpSeq()(&err) {
fmt.Println(key)
}
if err != nil {
panic(err)
}
Output: marisa.Trie(size=6 io_size=4088 total_size=3403 num_tries=3 num_nodes=7 tail_mode=text node_order=weight) b b/c c a a/b a/b/c
func (*Trie) CommonPrefixSearch ¶
CommonPrefixSearchSeq returns keys which equal any prefix of the query string. If the limit is -1, all keys are returned.
func (*Trie) CommonPrefixSearchSeq ¶
CommonPrefixSearchSeq returns keys which equal any prefix of the query string.
func (*Trie) MapFile ¶
MapFile mmaps a file and loads the dictionary from it. On error, the trie is left unchanged. If not supported by the current platform, an error matching errors.ErrUnsupported is returned.
func (*Trie) MarshalBinary ¶
MarshalBinary serializes the dictionary.
func (*Trie) NodeOrder ¶ added in v0.0.2
NodeOrder returns the tail mode of the dictionary. If unknown, it returns zero.
func (*Trie) PredictiveSearch ¶
PredictiveSearch returns keys starting with a query string. If the limit is -1, all keys are returned.
func (*Trie) PredictiveSearchSeq ¶
PredictiveSearch returns keys starting with a query string.
func (*Trie) ReadFrom ¶
ReadFrom reads a dictionary from r. On success, it will have read exactly the size of the dictionary. On error, the trie is left unchanged.
func (*Trie) ReverseLookup ¶
ReverseLookup gets a key by its ID.
func (*Trie) Size ¶
Size returns the number of keys in the dictionary. Key are numbered from 0 to Size-1.
func (*Trie) TailMode ¶ added in v0.0.2
TailMode returns the tail mode of the dictionary. If unknown, it returns zero.
func (*Trie) UnmarshalBinary ¶
UnmarshalBinary copies b and maps the trie directly from it. This is faster than Trie.ReadFrom, but may have a less optimal memory layout. On error, the trie is left unchanged.
Directories
¶
| Path | Synopsis |
|---|---|
|
cmd
|
|
|
marisa-benchmark
command
Command marisa-benchmark is a Go re-implementation of the same command.
|
Command marisa-benchmark is a Go re-implementation of the same command. |
|
marisa-build
command
Command marisa-build is a Go re-implementation of the same command.
|
Command marisa-build is a Go re-implementation of the same command. |
|
marisa-common-prefix-search
command
Command marisa-common-prefix-search is a Go re-implementation of the same command.
|
Command marisa-common-prefix-search is a Go re-implementation of the same command. |
|
marisa-dump
command
Command marisa-dump is a Go re-implementation of the same command.
|
Command marisa-dump is a Go re-implementation of the same command. |
|
marisa-lookup
command
Command marisa-lookup is a Go re-implementation of the same command.
|
Command marisa-lookup is a Go re-implementation of the same command. |
|
marisa-predictive-search
command
Command marisa-predictive-search is a Go re-implementation of the same command.
|
Command marisa-predictive-search is a Go re-implementation of the same command. |
|
marisa-reverse-lookup
command
Command marisa-reverse-lookup is a Go re-implementation of the same command.
|
Command marisa-reverse-lookup is a Go re-implementation of the same command. |
|
cxxerr
Package cxxerr wraps C++ exceptions.
|
Package cxxerr wraps C++ exceptions. |
|
walloc
Package walloc implements allocators for wazero.
|
Package walloc implements allocators for wazero. |
|
wexcept
Package wexcept throws and catches Go and C++ exceptions.
|
Package wexcept throws and catches Go and C++ exceptions. |
|
wexport
Package wexport contains type-safe wrappers for exporting host functions.
|
Package wexport contains type-safe wrappers for exporting host functions. |
|
wwrap
Package wwrap wraps wazero modules.
|
Package wwrap wraps wazero modules. |