dockerfile/examples/omnivore/api/liqe/README.md

432 lines
8.7 KiB
Markdown
Raw Permalink Normal View History

2024-03-15 14:52:38 +08:00
# liqe
Lightweight and performant Lucene-like parser, serializer and search engine.
- [liqe](#liqe)
- [Motivation](#motivation)
- [Usage](#usage)
- [Query Syntax](#query-syntax)
- [Liqe syntax cheat sheet](#liqe-syntax-cheat-sheet)
- [Keyword matching](#keyword-matching)
- [Number matching](#number-matching)
- [Range matching](#range-matching)
- [Wildcard matching](#wildcard-matching)
- [Boolean operators](#boolean-operators)
- [Serializer](#serializer)
- [AST](#ast)
- [Utilities](#utilities)
- [Compatibility with Lucene](#compatibility-with-lucene)
- [Recipes](#recipes)
- [Handling syntax errors](#handling-syntax-errors)
- [Highlighting matches](#highlighting-matches)
- [Development](#development)
- [Compiling Parser](#compiling-parser)
- [Benchmarking Changes](#benchmarking-changes)
- [Tutorials](#tutorials)
## Motivation
Originally built Liqe to enable [Roarr](https://github.com/gajus/roarr) log filtering via [cli](https://github.com/gajus/roarr-cli#filtering-logs). I have since been polishing this project as a hobby/intellectual exercise. I've seen it being adopted by [various](https://github.com/gajus/liqe/network/dependents) CLI and web applications that require advanced search. To my knowledge, it is currently the most complete Lucene-like syntax parser and serializer in JavaScript, as well as a compatible in-memory search engine.
Liqe use cases include:
* parsing search queries
* serializing parsed queries
* searching JSON documents using the Liqe query language (LQL)
Note that the [Liqe AST](#ast) is treated as a public API, i.e., one could implement their own search mechanism that uses Liqe query language (LQL).
## Usage
```ts
import {
filter,
highlight,
parse,
test,
} from 'liqe';
const persons = [
{
height: 180,
name: 'John Morton',
},
{
height: 175,
name: 'David Barker',
},
{
height: 170,
name: 'Thomas Castro',
},
];
```
Filter a collection:
```ts
filter(parse('height:>170'), persons);
// [
// {
// height: 180,
// name: 'John Morton',
// },
// {
// height: 175,
// name: 'David Barker',
// },
// ]
```
Test a single object:
```ts
test(parse('name:John'), persons[0]);
// true
test(parse('name:David'), persons[0]);
// false
```
Highlight matching fields and substrings:
```ts
test(highlight('name:john'), persons[0]);
// [
// {
// path: 'name',
// query: /(John)/,
// }
// ]
test(highlight('height:180'), persons[0]);
// [
// {
// path: 'height',
// }
// ]
```
## Query Syntax
Liqe uses Liqe Query Language (LQL), which is heavily inspired by Lucene but extends it in various ways that allow a more powerful search experience.
### Liqe syntax cheat sheet
```rb
# search for "foo" term anywhere in the document (case insensitive)
foo
# search for "foo" term anywhere in the document (case sensitive)
'foo'
"foo"
# search for "foo" term in `name` field
name:foo
# search for "foo" term in `full name` field
'full name':foo
"full name":foo
# search for "foo" term in `first` field, member of `name`, i.e.
# matches {name: {first: 'foo'}}
name.first:foo
# search using regex
name:/foo/
name:/foo/o
# search using wildcard
name:foo*bar
name:foo?bar
# boolean search
member:true
member:false
# null search
member:null
# search for age =, >, >=, <, <=
height:=100
height:>100
height:>=100
height:<100
height:<=100
# search for height in range (inclusive, exclusive)
height:[100 TO 200]
height:{100 TO 200}
# boolean operators
name:foo AND height:=100
name:foo OR name:bar
# unary operators
NOT foo
-foo
NOT foo:bar
-foo:bar
name:foo AND NOT (bio:bar OR bio:baz)
# implicit AND boolean operator
name:foo height:=100
# grouping
name:foo AND (bio:bar OR bio:baz)
```
### Keyword matching
Search for word "foo" in any field (case insensitive).
```rb
foo
```
Search for word "foo" in the `name` field.
```rb
name:foo
```
Search for `name` field values matching `/foo/i` regex.
```rb
name:/foo/i
```
Search for `name` field values matching `f*o` wildcard pattern.
```rb
name:f*o
```
Search for `name` field values matching `f?o` wildcard pattern.
```rb
name:f?o
```
Search for phrase "foo bar" in the `name` field (case sensitive).
```rb
name:"foo bar"
```
### Number matching
Search for value equal to 100 in the `height` field.
```rb
height:=100
```
Search for value greater than 100 in the `height` field.
```rb
height:>100
```
Search for value greater than or equal to 100 in the `height` field.
```rb
height:>=100
```
### Range matching
Search for value greater or equal to 100 and lower or equal to 200 in the `height` field.
```rb
height:[100 TO 200]
```
Search for value greater than 100 and lower than 200 in the `height` field.
```rb
height:{100 TO 200}
```
### Wildcard matching
Search for any word that starts with "foo" in the `name` field.
```rb
name:foo*
```
Search for any word that starts with "foo" and ends with "bar" in the `name` field.
```rb
name:foo*bar
```
Search for any word that starts with "foo" in the `name` field, followed by a single arbitrary character.
```rb
name:foo?
```
Search for any word that starts with "foo", followed by a single arbitrary character and immediately ends with "bar" in the `name` field.
```rb
name:foo?bar
```
### Boolean operators
Search for phrase "foo bar" in the `name` field AND the phrase "quick fox" in the `bio` field.
```rb
name:"foo bar" AND bio:"quick fox"
```
Search for either the phrase "foo bar" in the `name` field AND the phrase "quick fox" in the `bio` field, or the word "fox" in the `name` field.
```rb
(name:"foo bar" AND bio:"quick fox") OR name:fox
```
## Serializer
Serializer allows to convert Liqe tokens back to the original search query.
```ts
import {
parse,
serialize,
} from 'liqe';
const tokens = parse('foo:bar');
// {
// expression: {
// location: {
// start: 4,
// },
// quoted: false,
// type: 'LiteralExpression',
// value: 'bar',
// },
// field: {
// location: {
// start: 0,
// },
// name: 'foo',
// path: ['foo'],
// quoted: false,
// type: 'Field',
// },
// location: {
// start: 0,
// },
// operator: {
// location: {
// start: 3,
// },
// operator: ':',
// type: 'ComparisonOperator',
// },
// type: 'Tag',
// }
serialize(tokens);
// 'foo:bar'
```
## AST
```ts
import {
type BooleanOperatorToken,
type ComparisonOperatorToken,
type EmptyExpression,
type FieldToken,
type ImplicitBooleanOperatorToken,
type ImplicitFieldToken,
type LiteralExpressionToken,
type LogicalExpressionToken,
type RangeExpressionToken,
type RegexExpressionToken,
type TagToken,
type UnaryOperatorToken,
} from 'liqe';
```
There are 11 AST tokens that describe a parsed Liqe query.
If you are building a serializer, then you must implement all of them for the complete coverage of all possible query inputs. Refer to the [built-in serializer](./src/serialize.ts) for an example.
## Utilities
```ts
import {
isSafeUnquotedExpression,
} from 'liqe';
/**
* Determines if an expression requires quotes.
* Use this if you need to programmatically manipulate the AST
* before using a serializer to convert the query back to text.
*/
isSafeUnquotedExpression(expression: string): boolean;
```
## Compatibility with Lucene
The following Lucene abilities are not supported:
* [Fuzzy Searches](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Fuzzy%20Searches)
* [Proximity Searches](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Proximity%20Searches)
* [Boosting a Term](https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Boosting%20a%20Term)
## Recipes
### Handling syntax errors
In case of a syntax error, Liqe throws `SyntaxError`.
```ts
import {
parse,
SyntaxError,
} from 'liqe';
try {
parse('foo bar');
} catch (error) {
if (error instanceof SyntaxError) {
console.error({
// Syntax error at line 1 column 5
message: error.message,
// 4
offset: error.offset,
// 1
offset: error.line,
// 5
offset: error.column,
});
} else {
throw error;
}
}
```
### Highlighting matches
Consider using [`highlight-words`](https://github.com/tricinel/highlight-words) package to highlight Liqe matches.
## Development
### Compiling Parser
If you are going to modify parser, then use `npm run watch` to run compiler in watch mode.
### Benchmarking Changes
Before making any changes, capture the current benchmark on your machine using `npm run benchmark`. Run benchmark again after making any changes. Before committing changes, ensure that performance is not negatively impacted.
## Tutorials
* [Building advanced SQL search from a user text input](https://contra.com/p/WobOBob7-building-advanced-sql-search-from-a-user-text-input)