opensearch

Differences

This page shows the differences between two versions.

opensearch [2025/03/25 01:03]
jango
opensearch [2025/03/27 11:49] (current)
jango
Line 1: Line 1:
 +See also [[ElasticSearch]] (e.g. scripts, the API, etc. are largely identical). [[Wazuh]] uses [[OpenSearch]].
 +
 =====Installation===== =====Installation=====
  
Line 21: Line 23:
 ) )
 client.info() client.info()
 +</code>
 +
 +Get some sample data, e.g. [[https://www.kaggle.com/datasets/jrobischon/wikipedia-movie-plots|wikipedia-movie-plots]]. Read the CSV into a pandas DataFrame, drop rows with missing values, and keep a reproducible random sample of 5000 rows.
 +
 +<code python>
 +import pandas as pd
 +
 +df = (
 +    pd.read_csv("wiki_movie_plots_deduped.csv")
 +    .dropna()
 +    .sample(5000, random_state=42)
 +    .reset_index(drop=True)
 +)
 </code> </code>
  
Line 41: Line 56:
 } }
 response = client.indices.create("movies", body=body) response = client.indices.create("movies", body=body)
 +</code>
 +
 +Push the data into the index, one document per request. The DataFrame row index is used as the document ID.
 +
 +<code python>
 +for i, row in df.iterrows():
 +    body = {
 +        "title": row["Title"],
 +        "ethnicity": row["Origin/Ethnicity"],
 +        "director": row["Director"],
 +        "cast": row["Cast"],
 +        "genre": row["Genre"],
 +        "plot": row["Plot"],
 +        "year": row["Release Year"],
 +        "wiki_page": row["Wiki Page"],
 +    }
 +    client.index(index="movies", id=i, body=body)
 +</code>
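
The column-to-field mapping used in the loop above can be kept in one place. The following is a minimal sketch, not part of the original page: ''FIELD_MAP'' and ''row_to_doc'' are hypothetical names, and the helper only assumes that a row supports lookup by column name (a plain dict or a pandas Series both do).

```python
# Hypothetical helper (not from the original page): maps the dataset's
# CSV column names to the field names used in the "movies" index.
FIELD_MAP = {
    "Title": "title",
    "Origin/Ethnicity": "ethnicity",
    "Director": "director",
    "Cast": "cast",
    "Genre": "genre",
    "Plot": "plot",
    "Release Year": "year",
    "Wiki Page": "wiki_page",
}


def row_to_doc(row):
    """Build an index document from a mapping-like row (dict or pandas Series)."""
    return {field: row[column] for column, field in FIELD_MAP.items()}
```

With such a helper, the loop body would reduce to ''client.index(index="movies", id=i, body=row_to_doc(row))''.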
 +
 +For larger amounts of data, send the documents with the bulk helper instead of one request per document
 +
 +<code python>
 +from opensearchpy.helpers import bulk
 +
 +bulk_data = []
 +for i, row in df.iterrows():
 +    bulk_data.append(
 +        {
 +            "_index": "movies",
 +            "_id": i,
 +            "_source": {
 +                "title": row["Title"],
 +                "ethnicity": row["Origin/Ethnicity"],
 +                "director": row["Director"],
 +                "cast": row["Cast"],
 +                "genre": row["Genre"],
 +                "plot": row["Plot"],
 +                "year": row["Release Year"],
 +                "wiki_page": row["Wiki Page"],
 +            }
 +        }
 +    )
 +bulk(client, bulk_data)
 +</code>
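
Instead of collecting every action in a list first, the actions can be produced lazily. This is a sketch under assumptions: ''iter_actions'' is a hypothetical name, and the two sample documents are made-up illustrative values, not rows from the dataset. It relies on the bulk helper accepting any iterable of actions.

```python
def iter_actions(docs, index="movies"):
    """Yield one bulk action per (doc_id, source) pair.

    Hypothetical helper: `docs` is any iterable of (id, dict) pairs.
    """
    for doc_id, source in docs:
        yield {"_index": index, "_id": doc_id, "_source": source}


# Illustrative input only; real documents would come from the DataFrame.
sample_docs = [
    (0, {"title": "Example A", "year": 1999}),
    (1, {"title": "Example B", "year": 2001}),
]
actions = list(iter_actions(sample_docs))
```

Against a live cluster, the generator could be passed directly, e.g. ''bulk(client, iter_actions(sample_docs))'', so the full action list never has to sit in memory.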
 +
 +Count the indexed documents (refresh first so that recently indexed documents are visible)
 +
 +<code python>
 +client.indices.refresh(index="movies")
 +client.cat.count(index="movies", format="json")
 +</code>
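
With ''format="json"'', the cat API returns a list of rows whose values are strings, so the count needs a conversion before it can be compared with a number. A minimal sketch follows; ''parse_cat_count'' is a hypothetical name and the sample response is an assumed shape with illustrative values, not output captured from a real cluster.

```python
def parse_cat_count(rows):
    """Return the document count from a cat.count JSON response as an int."""
    return int(rows[0]["count"])


# Assumed response shape, illustrative values only.
example_response = [{"epoch": "0", "timestamp": "00:00:00", "count": "5000"}]
total = parse_cat_count(example_response)
```

Alternatively, ''client.count(index="movies")["count"]'' should give the count directly as an integer.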
 +
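 +Search the data: find movies whose cast contains the phrase "jack nicholson", excluding those directed by "tim burton"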
 +
 +<code python>
 +resp = client.search(
 +    index="movies",
 +    body={
 +        "query": {
 +            "bool": {
 +                "must": {
 +                    "match_phrase": {
 +                        "cast": "jack nicholson",
 +                    }
 +                },
 +                "filter": {"bool": {"must_not": {"match_phrase": {"director": "tim burton"}}}},
 +            },
 +        },
 +    }
 +)
 +resp
 +</code>
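
The query body above can be parameterised, and the matching titles pulled out of the response. This is a sketch, not the page author's code: ''cast_without_director'' and ''hit_titles'' are hypothetical names. The builder places ''must_not'' directly in the outer ''bool'' clause, which should exclude the same documents as the nested ''filter'' variant, since neither contributes to scoring.

```python
def cast_without_director(cast, director):
    """Build a search body: phrase match on cast, excluding a director."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase": {"cast": cast}},
                "must_not": {"match_phrase": {"director": director}},
            }
        }
    }


def hit_titles(resp):
    """Extract the title field from every hit in a search response."""
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]


body = cast_without_director("jack nicholson", "tim burton")
```

Usage against a live cluster would look like ''hit_titles(client.search(index="movies", body=body))''.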
 +
 +Remove a single document by its ID
 +<code python>
 +client.delete(index="movies", id="2500")
 +</code>
 +
 +Delete the index
 +<code python>
 +client.indices.delete(index='movies')
 </code> </code>
opensearch.1742861012.txt.gz · Last modified: 2025/03/25 01:03 by jango