Differences

This page shows the differences between two versions.

--- opensearch [2025/03/25 01:03]
jango
+++ opensearch [2025/03/27 11:49] (current)
jango
@@ Line 1: / Line 1: @@
+See also [[ElasticSearch]] (scripts, the API, etc. are largely identical). [[Wazuh]] uses [[OpenSearch]].
 =====Installation=====
@@ Line 21: / Line 23: @@
 )
 client.info()
+</code>
+Get some sample data, e.g. [[https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots|wikipedia-movie-plots]], and read it into a pandas DataFrame. The snippet drops rows with missing values and keeps a reproducible random sample of 5000 rows.
+<code python>
+import pandas as pd
+df = (
+    pd.read_csv("wiki_movie_plots_deduped.csv")
+    .dropna()
+    .sample(5000, random_state=42)
+    .reset_index(drop=True)
+)
 </code>
@@ Line 41: / Line 56: @@
 }
 response = client.indices.create("movies", body=body)
+</code>
+Index the documents one at a time, using the DataFrame row index as the document ID:
+<code python>
+for i, row in df.iterrows():
+    body = {
+        "title": row["Title"],
+        "ethnicity": row["Origin/Ethnicity"],
+        "director": row["Director"],
+        "cast": row["Cast"],
+        "genre": row["Genre"],
+        "plot": row["Plot"],
+        "year": row["Release Year"],
+        "wiki_page": row["Wiki Page"],
+    }
+    client.index(index="movies", id=i, body=body)
+</code>
+To index many documents in a single request, use the bulk helper:
+<code python>
+from opensearchpy.helpers import bulk
+bulk_data = []
+for i, row in df.iterrows():
+    bulk_data.append(
+        {
+            "_index": "movies",
+            "_id": i,
+            "_source": {
+                "title": row["Title"],
+                "ethnicity": row["Origin/Ethnicity"],
+                "director": row["Director"],
+                "cast": row["Cast"],
+                "genre": row["Genre"],
+                "plot": row["Plot"],
+                "year": row["Release Year"],
+                "wiki_page": row["Wiki Page"],
+            }
+        }
+    )
+bulk(client, bulk_data)
+</code>
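The bulk helper accepts any iterable of actions, so the list above could be replaced by a generator that yields one action at a time. The following is a minimal sketch under that assumption. `FIELD_MAP` and `iter_actions` are hypothetical names introduced here, and the sample row is made up for illustration.

```python
# Hypothetical mapping from CSV column names to document field names,
# copied from the loops above.
FIELD_MAP = {
    "Title": "title",
    "Origin/Ethnicity": "ethnicity",
    "Director": "director",
    "Cast": "cast",
    "Genre": "genre",
    "Plot": "plot",
    "Release Year": "year",
    "Wiki Page": "wiki_page",
}


def iter_actions(rows, index_name="movies"):
    """Yield one bulk action per (id, row) pair, e.g. from df.iterrows()."""
    for doc_id, row in rows:
        yield {
            "_index": index_name,
            "_id": doc_id,
            "_source": {field: row[column] for column, field in FIELD_MAP.items()},
        }


# Made-up sample row; with the DataFrame above this would be df.iterrows().
sample_rows = [
    (
        0,
        {
            "Title": "Example Film",
            "Origin/Ethnicity": "American",
            "Director": "Jane Doe",
            "Cast": "John Doe",
            "Genre": "drama",
            "Plot": "A placeholder plot.",
            "Release Year": 1999,
            "Wiki Page": "https://example.org/Example_Film",
        },
    )
]
actions = list(iter_actions(sample_rows))
```

With a real client, the call would then be `bulk(client, iter_actions(df.iterrows()))`, which avoids holding all actions in memory at once.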
+Count the indexed documents. The refresh makes recently indexed documents visible to the count:
+<code python>
+client.indices.refresh(index="movies")
+client.cat.count(index="movies", format="json")
+</code>
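The cat API reports its values as strings, so the count has to be converted before it can be compared numerically. A small sketch, assuming the JSON response is a list with one row that has a `count` field. `parse_cat_count` is a hypothetical helper, and the sample response is hand-written rather than real output.

```python
def parse_cat_count(cat_rows):
    """Return the document count from a cat.count JSON response as an int."""
    return int(cat_rows[0]["count"])


# Hand-written sample in the assumed response shape (not real output).
sample = [{"epoch": "1743072540", "timestamp": "10:49:00", "count": "5000"}]
doc_count = parse_cat_count(sample)
```

With a real client this would be `parse_cat_count(client.cat.count(index="movies", format="json"))`.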
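+Search the data. This query returns films whose cast matches the phrase "jack nicholson" and whose director does not match "tim burton":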
+<code python>
+resp = client.search(
+    index="movies",
+    body={
+        "query": {
+            "bool": {
+                "must": {
+                    "match_phrase": {
+                        "cast": "jack nicholson",
+                    }
+                },
+                "filter": {"bool": {"must_not": {"match_phrase": {"director": "tim burton"}}}},
+            },
+        },
+    }
+)
+resp
+</code>
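The response is a nested dictionary with the matching documents under `hits.hits`. The sketch below shows one way to reduce it to a short summary. `summarize_hits` is a hypothetical helper, and the sample response is hand-written in the usual response shape, not actual search output.

```python
def summarize_hits(resp):
    """Return (total, [(id, score, title), ...]) from a search response."""
    hits = resp["hits"]
    total = hits["total"]["value"]
    rows = [
        (hit["_id"], hit["_score"], hit["_source"].get("title"))
        for hit in hits["hits"]
    ]
    return total, rows


# Hand-written sample response (made-up IDs, scores, and titles).
sample_resp = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_id": "17", "_score": 9.5, "_source": {"title": "Example Film A"}},
            {"_id": "42", "_score": 8.1, "_source": {"title": "Example Film B"}},
        ],
    }
}
total, rows = summarize_hits(sample_resp)
```

With the query above, `summarize_hits(resp)` would give the total number of matches and one tuple per returned document.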
+Delete a single document by its ID:
+<code python>
+client.delete(index="movies", id="2500")
+</code>
+Delete the whole index, including all documents in it:
+<code python>
+client.indices.delete(index="movies")
 </code>
