Commit 33bd486

author
Theo van Kraay
committed
lucene index preview
1 parent 14207fd commit 33bd486

3 files changed

+172
-0
lines changed

articles/managed-instance-apache-cassandra/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@
1919
href: configure-hybrid-cluster.md
2020
- name: Deploy Spark Cluster with Databricks
2121
href: deploy-cluster-databricks.md
22+
- name: Search using Lucene Index
23+
href: search-lucene-index.md
2224
- name: Tutorials
2325
items:
2426
- name: Migration

articles/managed-instance-apache-cassandra/index.yml

Lines changed: 2 additions & 0 deletions
@@ -49,6 +49,8 @@ landingContent:
4949
url: create-multi-region-cluster.md
5050
- text: Deploy a Spark cluster with Azure Databricks
5151
url: deploy-cluster-databricks.md
52+
- text: Search using Lucene Index
53+
url: search-lucene-index.md
5254
- linkListType: concept
5355
links:
5456
- text: Security overview
Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
1+
---
2+
title: Quickstart - Search Azure Managed Instance for Apache Cassandra using Stratio's Cassandra Lucene Index
3+
description: This quickstart shows how to search an Azure Managed Instance for Apache Cassandra cluster using Stratio's Cassandra Lucene Index.
4+
author: TheovanKraay
5+
ms.author: thvankra
6+
ms.service: managed-instance-apache-cassandra
7+
ms.topic: quickstart
8+
ms.date: 04/17/2023
9+
---
10+
# Quickstart: Search Azure Managed Instance for Apache Cassandra using Lucene Index (Preview)
11+
12+
Cassandra Lucene Index, derived from Stratio Cassandra, is a plugin for Apache Cassandra that extends its index functionality to provide full-text search capabilities and free multivariable, geospatial, and bitemporal search. The plugin achieves this through an Apache Lucene-based implementation of Cassandra secondary indexes, in which each node of the cluster indexes its own data.
13+
14+
This quickstart demonstrates how to search Azure Managed Instance for Apache Cassandra using Lucene Index.
15+
16+
> [!IMPORTANT]
17+
> Lucene Index is in public preview.
18+
> This feature is provided without a service level agreement, and it's not recommended for production workloads.
19+
> For more information, see [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
20+
21+
## Prerequisites
22+
23+
- If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.
24+
- Deploy an Azure Managed Instance for Apache Cassandra cluster. You can do this through the [portal](create-cluster-portal.md). Lucene indexes are enabled by default.
25+
- Connect to your cluster from [CQLSH](https://learn.microsoft.com/en-us/azure/managed-instance-apache-cassandra/create-cluster-portal#connecting-from-cqlsh).
26+
27+
## Create data with Lucene Index
28+
29+
1. In your `CQLSH` command window, create a keyspace and table. If your cluster's datacenter isn't named `datacenter-1`, replace that value with your datacenter name:
30+
31+
```SQL
32+
CREATE KEYSPACE demo
33+
WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'datacenter-1': 3};
34+
USE demo;
35+
CREATE TABLE tweets (
36+
id INT PRIMARY KEY,
37+
user TEXT,
38+
body TEXT,
39+
time TIMESTAMP,
40+
latitude FLOAT,
41+
longitude FLOAT
42+
);
43+
```
44+
45+
1. Now create a custom secondary index on the table using Lucene Index:
46+
47+
```SQL
48+
CREATE CUSTOM INDEX tweets_index ON tweets ()
49+
USING 'com.stratio.cassandra.lucene.Index'
50+
WITH OPTIONS = {
51+
'refresh_seconds': '1',
52+
'schema': '{
53+
fields: {
54+
id: {type: "integer"},
55+
user: {type: "string"},
56+
body: {type: "text", analyzer: "english"},
57+
time: {type: "date", pattern: "yyyy/MM/dd"},
58+
place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}
59+
}
60+
}'
61+
};
62+
```
63+
64+
1. Insert the following sample tweets:
65+
66+
```SQL
67+
INSERT INTO tweets (id,user,body,time,latitude,longitude) VALUES (1,'theo','Make money fast, 5 easy tips', '2023-04-01T11:21:59.001+0000', 0.0, 0.0);
68+
INSERT INTO tweets (id,user,body,time,latitude,longitude) VALUES (2,'theo','Click my link, like my stuff!', '2023-04-01T11:21:59.001+0000', 0.0, 0.0);
69+
INSERT INTO tweets (id,user,body,time,latitude,longitude) VALUES (3,'quetzal','Click my link, like my stuff!', '2023-04-02T11:21:59.001+0000', 0.0, 0.0);
70+
INSERT INTO tweets (id,user,body,time,latitude,longitude) VALUES (4,'quetzal','Click my link, like my stuff!', '2023-04-01T11:21:59.001+0000', 40.3930, -3.7328);
71+
INSERT INTO tweets (id,user,body,time,latitude,longitude) VALUES (5,'quetzal','Click my link, like my stuff!', '2023-04-01T11:21:59.001+0000', 40.3930, -3.7329);
72+
```
73+
74+
1. The index you created earlier indexes all the columns in the table with the specified types and refreshes once per second. Alternatively, you can explicitly refresh all the index shards by running an empty search at consistency level `ALL`, and then set the consistency level to `QUORUM` for the queries that follow:
75+
76+
```SQL
77+
CONSISTENCY ALL
78+
SELECT * FROM tweets WHERE expr(tweets_index, '{refresh:true}');
79+
CONSISTENCY QUORUM
80+
```
81+
82+
1. Now, you can search for tweets within a certain date range:
83+
84+
```SQL
85+
SELECT * FROM tweets WHERE expr(tweets_index, '{filter: {type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"}}');
86+
```
87+
1. You can run the same search while forcing an explicit refresh of the involved index shards:
88+
89+
```SQL
90+
SELECT * FROM tweets WHERE expr(tweets_index, '{
91+
filter: {type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
92+
refresh: true
93+
}') LIMIT 100;
94+
```
95+
96+
1. To search for the 100 most relevant tweets whose `body` field contains the phrase "Click my link" within the same date range:
97+
98+
```SQL
99+
SELECT * FROM tweets WHERE expr(tweets_index, '{
100+
filter: {type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
101+
query: {type: "phrase", field: "body", value: "Click my link", slop: 1}
102+
}') LIMIT 100;
103+
```
104+
105+
1. To refine the search to get only the tweets written by users whose names start with "q":
106+
107+
```SQL
108+
SELECT * FROM tweets WHERE expr(tweets_index, '{
109+
filter: [
110+
{type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
111+
{type: "prefix", field: "user", value: "q"}
112+
],
113+
query: {type: "phrase", field: "body", value: "Click my link", slop: 1}
114+
}') LIMIT 100;
115+
```
116+
117+
1. To get the 100 most recent filtered results, use the `sort` option:
118+
119+
```SQL
120+
SELECT * FROM tweets WHERE expr(tweets_index, '{
121+
filter: [
122+
{type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
123+
{type: "prefix", field: "user", value: "q"}
124+
],
125+
query: {type: "phrase", field: "body", value: "Click my link", slop: 1},
126+
sort: {field: "time", reverse: true}
127+
}') LIMIT 100;
128+
```
129+
130+
1. The previous search can be restricted to tweets created close to a geographical position:
131+
132+
```SQL
133+
SELECT * FROM tweets WHERE expr(tweets_index, '{
134+
filter: [
135+
{type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
136+
{type: "prefix", field: "user", value: "q"},
137+
{type: "geo_distance", field: "place", latitude: 40.3930, longitude: -3.7328, max_distance: "1km"}
138+
],
139+
query: {type: "phrase", field: "body", value: "Click my link", slop: 1},
140+
sort: {field: "time", reverse: true}
141+
}') LIMIT 100;
142+
```
143+
144+
1. It is also possible to sort the results by distance to a geographical position:
145+
146+
```SQL
147+
SELECT * FROM tweets WHERE expr(tweets_index, '{
148+
filter: [
149+
{type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"},
150+
{type: "prefix", field: "user", value: "q"},
151+
{type: "geo_distance", field: "place", latitude: 40.3930, longitude: -3.7328, max_distance: "1km"}
152+
],
153+
query: {type: "phrase", field: "body", value: "Click my link", slop: 1},
154+
sort: [
155+
{field: "time", reverse: true},
156+
{field: "place", type: "geo_distance", latitude: 40.3930, longitude: -3.7328}
157+
]
158+
}') LIMIT 100;
159+
```
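
The filters in the preceding steps are all combined with an implicit AND. As a minimal sketch of how conditions might instead be combined explicitly, the following query uses the `boolean` and `match` search types described in the Stratio documentation. It assumes those search types are available in the plugin version deployed on your cluster, and it reuses the `tweets` table and `tweets_index` index created earlier. With the sample data, it should exclude the tweets written by `theo`:

```SQL
-- Assumption: the "boolean" and "match" search types are supported by the deployed plugin version.
-- must: every listed condition has to hold; not: no listed condition may hold.
SELECT * FROM tweets WHERE expr(tweets_index, '{
filter: {
type: "boolean",
must: [{type: "range", field: "time", lower: "2023/03/01", upper: "2023/05/01"}],
not: [{type: "match", field: "user", value: "theo"}]
}
}') LIMIT 100;
```

Because the conditions are placed under `filter` rather than `query`, they only restrict the result set and don't contribute to relevance scoring.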
160+
161+
For more in-depth information and samples, see [Stratio's Cassandra Lucene Index](https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.14/doc/documentation.rst).
162+
163+
## Next steps
164+
165+
In this quickstart, you learned how to search an Azure Managed Instance for Apache Cassandra cluster using Lucene Index. You can now start working with the cluster:
166+
167+
> [!div class="nextstepaction"]
168+
> [Deploy a Managed Apache Spark Cluster with Azure Databricks](deploy-cluster-databricks.md)
