0.3.0 • Published 11 years ago

fm-index.jsx v0.3.0

Weekly downloads
4
License
-
Repository
github
Last release
11 years ago

fm-index.jsx

Synopsis

FM-index is the fastest full text search algorithm using a compressed index file. This is FM-index for JSX/JS/AMD/Common.js.

Motivation

FM-index is the alternate search algorithm of an inverse index algorithm. FM-index has the following advantages:

  1. It doesn't need to split word (like N-gram). It is good for CJK languages.
  2. It can recreate original document from the index file
  3. Index file is compressed.
  4. Easy to control the performance and the index file size.

Code Example

Use from JSX

import "fm-index.jsx";

class _Main {
    static function main(argv : string[]) : void
    {
        var fm = new FMIndex();
        fm.push("hello");
        fm.push("world");
        this.fm.build(5);
        console.log(this.fm.search('world')); // -> [5]
    }
}

Use from node.js

var FMIndex = require('fm-index.common.js').FMIndex;

Use from require.js

// use fm-index.amd.js
define(['fm-index.jsx'], function (fmindex) {

    var fmindex = fmindex.FMIndex();
    // Write simple usage here!
});

Use via standard JSX function

<script src="fm-index.js}}" type="text/javascript"></script>
<script type="text/javascript">
window.onload = function () {
    var FMIndex = JSX.require("src/fm-index.js").FMIndex;
});
</script>

Use via global variables

<script src="fm-index.global.js}}" type="text/javascript"></script>
<script type="text/javascript">
window.onload = function () {
    var fmindex = new FMIndex();
});
</script>

Installation

$ npm install fm-index.jsx

You should add the following modules to package.json if you want to use from JSX:

  • burrows-wheeler-transform.jsx
  • wavelet-matrix.jsx
  • binary-io.jsx
  • binary-support.jsx

API Reference

class FMIndex()

Constructor.

FMIndex.push(str : string) : void

Append string.

FMIndex.contentSize()

Return total length of pushed string. It is available before build().

FMIndex.build(ddic : int, maxChar : int = 65535) : void

Build search index. ddic is a cache density. (1 / ddic) * 100 % is a actual cache rate. If ddic == 1, densty = 100%, it provides maximum speed but it use match memory and storage. Initial recommendation value is 50.

maxChar is a maximum character code. If you reduce this, you can save memory.

FMIndex.size()

Return contetn size. It is available after build().

FMIndex.search(keyword : string) : int[]

Return position list that includes keyword.

FMIndex.getSubstring(pos : int, len : int) : string

Return original document content

FMIndex.dump(output : BinaryOutput) : void

Export bit-vector.

FMIndex.load(input : BinaryInput) : void

Import bit-vector.

Development

JSX

Don't be afraid JSX! If you have an experience of JavaScript, you can learn JSX quickly.

  • Static type system and unified class syntax.
  • All variables and methods belong to class.
  • JSX includes optimizer. You don't have to write tricky unreadalbe code for speed.
  • You can use almost all JavaScript API as you know. Some functions become static class functions. See reference.

Setup

To create development environment, call following command:

$ npm install

Repository

Run Test

$ grunt test

Build

$ grunt build

Generate API reference

$ grunt doc

Author

  • shibukawa / yoshiki@shibu.jp

License

MIT

Complete license is written in LICENSE.md.

0.3.0

11 years ago

0.2.6

11 years ago

0.2.5

11 years ago

0.2.4

11 years ago

0.2.3

11 years ago

0.2.2

11 years ago

0.2.1

11 years ago

0.2.0

11 years ago

0.1.3

11 years ago

0.1.2

11 years ago

0.1.1

11 years ago

0.1.0

11 years ago