# liqe

Lightweight and performant Lucene-like parser, serializer and search engine.

- [liqe](#liqe)
  - [Motivation](#motivation)
  - [Usage](#usage)
  - [Query Syntax](#query-syntax)
    - [Liqe syntax cheat sheet](#liqe-syntax-cheat-sheet)
    - [Keyword matching](#keyword-matching)
    - [Number matching](#number-matching)
    - [Range matching](#range-matching)
    - [Wildcard matching](#wildcard-matching)
    - [Boolean operators](#boolean-operators)
  - [Serializer](#serializer)
  - [AST](#ast)
  - [Utilities](#utilities)
  - [Compatibility with Lucene](#compatibility-with-lucene)
  - [Recipes](#recipes)
    - [Handling syntax errors](#handling-syntax-errors)
    - [Highlighting matches](#highlighting-matches)
  - [Development](#development)
    - [Compiling Parser](#compiling-parser)
    - [Benchmarking Changes](#benchmarking-changes)
  - [Tutorials](#tutorials)

## Motivation

I originally built Liqe to enable [Roarr](https://github.com/gajus/roarr) log filtering via the [CLI](https://github.com/gajus/roarr-cli#filtering-logs). I have since been polishing this project as a hobby/intellectual exercise. I've seen it adopted by [various](https://github.com/gajus/liqe/network/dependents) CLI and web applications that require advanced search. To my knowledge, it is currently the most complete Lucene-like syntax parser and serializer in JavaScript, as well as a compatible in-memory search engine.

Liqe use cases include:

* parsing search queries
* serializing parsed queries
* searching JSON documents using the Liqe query language (LQL)

Note that the [Liqe AST](#ast) is treated as a public API, i.e., one could implement their own search mechanism that uses the Liqe query language (LQL).

## Usage

```ts
import {
  filter,
  highlight,
  parse,
  test,
} from 'liqe';

const persons = [
  {
    height: 180,
    name: 'John Morton',
  },
  {
    height: 175,
    name: 'David Barker',
  },
  {
    height: 170,
    name: 'Thomas Castro',
  },
];
```

Filter a collection:

```ts
filter(parse('height:>170'), persons);
// [
//   {
//     height: 180,
//     name: 'John Morton',
//   },
//   {
//     height: 175,
//     name: 'David Barker',
//   },
// ]
```

Test a single object:

```ts
test(parse('name:John'), persons[0]);
// true
test(parse('name:David'), persons[0]);
// false
```

Highlight matching fields and substrings:

```ts
highlight(parse('name:john'), persons[0]);
// [
//   {
//     path: 'name',
//     query: /(John)/,
//   }
// ]
highlight(parse('height:180'), persons[0]);
// [
//   {
//     path: 'height',
//   }
// ]
```

## Query Syntax

Liqe uses Liqe Query Language (LQL), which is heavily inspired by Lucene but extends it in various ways that allow a more powerful search experience.

### Liqe syntax cheat sheet

```rb
# search for "foo" term anywhere in the document (case insensitive)
foo

# search for "foo" term anywhere in the document (case sensitive)
'foo'
"foo"

# search for "foo" term in `name` field
name:foo

# search for "foo" term in `full name` field
'full name':foo
"full name":foo

# search for "foo" term in `first` field, member of `name`, i.e.
# matches {name: {first: 'foo'}}
name.first:foo

# search using regex
name:/foo/
name:/foo/o

# search using wildcard
name:foo*bar
name:foo?bar

# boolean search
member:true
member:false

# null search
member:null

# search for height =, >, >=, <, <=
height:=100
height:>100
height:>=100
height:<100
height:<=100

# search for height in range (inclusive, exclusive)
height:[100 TO 200]
height:{100 TO 200}

# boolean operators
name:foo AND height:=100
name:foo OR name:bar

# unary operators
NOT foo
-foo
NOT foo:bar
-foo:bar
name:foo AND NOT (bio:bar OR bio:baz)

# implicit AND boolean operator
name:foo height:=100

# grouping
name:foo AND (bio:bar OR bio:baz)
```

### Keyword matching

Search for word "foo" in any field (case insensitive).

```rb
foo
```

Search for word "foo" in the `name` field.

```rb
name:foo
```

Search for `name` field values matching `/foo/i` regex.

```rb
name:/foo/i
```

Search for `name` field values matching `f*o` wildcard pattern.

```rb
name:f*o
```

Search for `name` field values matching `f?o` wildcard pattern.

```rb
name:f?o
```

Search for phrase "foo bar" in the `name` field (case sensitive).

```rb
name:"foo bar"
```

### Number matching

Search for value equal to 100 in the `height` field.

```rb
height:=100
```

Search for value greater than 100 in the `height` field.

```rb
height:>100
```

Search for value greater than or equal to 100 in the `height` field.

```rb
height:>=100
```

### Range matching

Search for value greater than or equal to 100 and less than or equal to 200 in the `height` field.

```rb
height:[100 TO 200]
```

Search for value greater than 100 and less than 200 in the `height` field.

```rb
height:{100 TO 200}
```

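The two bracket styles differ only in whether the bounds themselves match. As a rough illustration, the semantics can be written in plain TypeScript. This is a standalone sketch, not Liqe's implementation; the `RangeBounds` type and the `inRange` helper are hypothetical names.

```ts
// Hypothetical shape describing a numeric range; not part of the Liqe API.
type RangeBounds = {
  max: number;
  maxInclusive: boolean;
  min: number;
  minInclusive: boolean;
};

// Returns true when `value` falls within the described range.
const inRange = (value: number, range: RangeBounds): boolean => {
  const aboveMin = range.minInclusive ? value >= range.min : value > range.min;
  const belowMax = range.maxInclusive ? value <= range.max : value < range.max;

  return aboveMin && belowMax;
};

// Corresponds to height:[100 TO 200]
const inclusive: RangeBounds = {max: 200, maxInclusive: true, min: 100, minInclusive: true};

// Corresponds to height:{100 TO 200}
const exclusive: RangeBounds = {max: 200, maxInclusive: false, min: 100, minInclusive: false};

inRange(100, inclusive);
// true
inRange(100, exclusive);
// false
inRange(150, exclusive);
// true
```
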
### Wildcard matching

Search for any word that starts with "foo" in the `name` field.

```rb
name:foo*
```

Search for any word that starts with "foo" and ends with "bar" in the `name` field.

```rb
name:foo*bar
```

Search for any word that starts with "foo" followed by a single arbitrary character in the `name` field.

```rb
name:foo?
```

Search for any word that starts with "foo", followed by a single arbitrary character, and immediately ends with "bar" in the `name` field.

```rb
name:foo?bar
```

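One way to reason about these patterns is to translate them into regular expressions. The sketch below is illustrative only and is not how Liqe necessarily implements wildcards: `wildcardToRegExp` is a hypothetical helper, and it assumes that `*` matches any run of characters (including none), that `?` matches exactly one character, and that the whole value is compared case-insensitively.

```ts
// Hypothetical helper; Liqe's internal wildcard handling may differ.
const wildcardToRegExp = (pattern: string): RegExp => {
  const source = pattern
    // Escape regex metacharacters, except the wildcard characters `*` and `?`.
    .replace(/[.+^${}()|[\]\\]/g, '\\$&')
    // Assumption: `*` matches any run of characters, including none.
    .replace(/\*/g, '.*')
    // `?` matches exactly one character.
    .replace(/\?/g, '.');

  // Assumption: the whole value must match, ignoring case.
  return new RegExp('^' + source + '$', 'i');
};

wildcardToRegExp('foo*bar').test('fooxyzbar');
// true
wildcardToRegExp('foo?bar').test('fooxbar');
// true
wildcardToRegExp('foo?bar').test('fooxxbar');
// false
```
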
### Boolean operators

Search for phrase "foo bar" in the `name` field AND the phrase "quick fox" in the `bio` field.

```rb
name:"foo bar" AND bio:"quick fox"
```

Search for either the phrase "foo bar" in the `name` field AND the phrase "quick fox" in the `bio` field, or the word "fox" in the `name` field.

```rb
(name:"foo bar" AND bio:"quick fox") OR name:fox
```

## Serializer

The serializer converts Liqe tokens back to the original search query.

```ts
import {
  parse,
  serialize,
} from 'liqe';

const tokens = parse('foo:bar');

// {
//   expression: {
//     location: {
//       start: 4,
//     },
//     quoted: false,
//     type: 'LiteralExpression',
//     value: 'bar',
//   },
//   field: {
//     location: {
//       start: 0,
//     },
//     name: 'foo',
//     path: ['foo'],
//     quoted: false,
//     type: 'Field',
//   },
//   location: {
//     start: 0,
//   },
//   operator: {
//     location: {
//       start: 3,
//     },
//     operator: ':',
//     type: 'ComparisonOperator',
//   },
//   type: 'Tag',
// }

serialize(tokens);
// 'foo:bar'
```

## AST

```ts
import {
  type BooleanOperatorToken,
  type ComparisonOperatorToken,
  type EmptyExpression,
  type FieldToken,
  type ImplicitBooleanOperatorToken,
  type ImplicitFieldToken,
  type LiteralExpressionToken,
  type LogicalExpressionToken,
  type RangeExpressionToken,
  type RegexExpressionToken,
  type TagToken,
  type UnaryOperatorToken,
} from 'liqe';
```

There are 11 AST tokens that describe a parsed Liqe query.

If you are building a serializer, then you must implement all of them for the complete coverage of all possible query inputs. Refer to the [built-in serializer](./src/serialize.ts) for an example.

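Because every token carries a `type` discriminator, a custom serializer or search backend can switch on it and let the compiler flag unhandled cases. The sketch below uses a deliberately simplified, hypothetical subset of token shapes, loosely modeled on the `Tag` output shown in the Serializer section. `SketchToken` and its members are not Liqe's exported types, and the real tokens carry more fields.

```ts
// Simplified, hypothetical token shapes for illustration only.
// Liqe's real tokens carry more fields (for example `location`).
type SketchLiteral = {type: 'LiteralExpression'; value: string};
type SketchTag = {
  expression: SketchLiteral;
  field: {name: string};
  type: 'Tag';
};
type SketchLogical = {
  left: SketchToken;
  operator: 'AND' | 'OR';
  right: SketchToken;
  type: 'LogicalExpression';
};
type SketchToken = SketchLiteral | SketchLogical | SketchTag;

const serializeSketch = (token: SketchToken): string => {
  switch (token.type) {
    case 'LiteralExpression':
      return token.value;
    case 'Tag':
      return token.field.name + ':' + serializeSketch(token.expression);
    case 'LogicalExpression':
      return serializeSketch(token.left) + ' ' + token.operator + ' ' + serializeSketch(token.right);
    default: {
      // If a new member is added to SketchToken and not handled above,
      // this assignment fails to compile.
      const unhandled: never = token;

      throw new Error('Unhandled token: ' + JSON.stringify(unhandled));
    }
  }
};

const query: SketchToken = {
  left: {expression: {type: 'LiteralExpression', value: 'foo'}, field: {name: 'name'}, type: 'Tag'},
  operator: 'OR',
  right: {expression: {type: 'LiteralExpression', value: 'bar'}, field: {name: 'name'}, type: 'Tag'},
  type: 'LogicalExpression',
};

serializeSketch(query);
// 'name:foo OR name:bar'
```

The `never` assignment in the `default` branch is what enforces coverage: handling only some token types compiles until the union grows, at which point the compiler points at the gap.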
## Utilities

```ts
import {
  isSafeUnquotedExpression,
} from 'liqe';

/**
 * Determines if an expression requires quotes.
 * Use this if you need to programmatically manipulate the AST
 * before using a serializer to convert the query back to text.
 */
isSafeUnquotedExpression(expression: string): boolean;
```

## Compatibility with Lucene

The following Lucene abilities are not supported:

* [Fuzzy Searches](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy%20Searches)
* [Proximity Searches](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity%20Searches)
* [Boosting a Term](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term)

## Recipes

### Handling syntax errors

In case of a syntax error, Liqe throws `SyntaxError`.

```ts
import {
  parse,
  SyntaxError,
} from 'liqe';

try {
  parse('foo bar');
} catch (error) {
  if (error instanceof SyntaxError) {
    console.error({
      // Syntax error at line 1 column 5
      message: error.message,
      // 4
      offset: error.offset,
      // 1
      line: error.line,
      // 5
      column: error.column,
    });
  } else {
    throw error;
  }
}
```

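When surfacing the error to a user, the reported position can be turned into a caret marker under the offending character. This is a minimal sketch, assuming a single-line query and the 1-based `column` value shown above; `formatSyntaxError` is a hypothetical helper, not part of Liqe.

```ts
// Hypothetical helper: renders the query with a caret under the given column.
const formatSyntaxError = (query: string, column: number): string => {
  // Columns are 1-based in the example above, so pad with (column - 1) spaces.
  let padding = '';

  for (let index = 1; index < column; index++) {
    padding += ' ';
  }

  return query + '\n' + padding + '^';
};

console.error(formatSyntaxError('foo bar', 5));
// foo bar
//     ^
```
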
### Highlighting matches

Consider using the [`highlight-words`](https://github.com/tricinel/highlight-words) package to highlight Liqe matches.

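For simple cases, the `query` regular expression of a highlight entry (such as `/(John)/` in the Usage output) can also be applied to the field value directly. This is a minimal sketch, assuming the value is a plain string and that HTML escaping is handled elsewhere; `markMatches` is a hypothetical helper, not part of Liqe or `highlight-words`.

```ts
// Hypothetical helper: wraps every match of `query` in <mark> tags.
// Note: it does not escape HTML in `text`.
const markMatches = (text: string, query: RegExp): string => {
  // Re-create the expression with the global flag so that all matches are replaced.
  // Case-insensitivity is preserved; other flags are dropped for simplicity.
  const flags = query.ignoreCase ? 'gi' : 'g';

  return text.replace(new RegExp(query.source, flags), (match) => {
    return '<mark>' + match + '</mark>';
  });
};

markMatches('John Morton', /(John)/);
// '<mark>John</mark> Morton'
```
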
## Development

### Compiling Parser

If you are going to modify the parser, use `npm run watch` to run the compiler in watch mode.

### Benchmarking Changes

Before making any changes, capture the current benchmark on your machine using `npm run benchmark`. Run the benchmark again after making your changes. Before committing, ensure that performance is not negatively impacted.

## Tutorials

* [Building advanced SQL search from a user text input](https://contra.com/p/WobOBob7-building-advanced-sql-search-from-a-user-text-input)