Elasticsearch query to return all records

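I have a small database in Elasticsearch and, for testing purposes, would like to pull all records back. I am attempting to use a URL of the form:

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this?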


I believe Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER is a number you believe is larger than your dataset).
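As a rough sketch, the URL above can be assembled in Python instead of typed by hand. The helper name `search_url` and its defaults are illustrative assumptions, not part of Elasticsearch.

```python
from urllib.parse import urlencode


def search_url(index, size=1000, host="http://localhost:9200"):
    """Build a URI-search URL that matches every document (Lucene *:*)."""
    params = {"pretty": "true", "q": "*:*", "size": size}
    # urlencode would percent-escape '*' and ':'; safe= keeps them readable
    return f"{host}/{index}/_search?{urlencode(params, safe='*:')}"


print(search_url("foo", size=500))
```

Pick a size you believe is larger than the dataset, as described above.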

However, the Elasticsearch documentation suggests using the scan search type for large result sets.

For example:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

Then keep requesting pages as the documentation describes.

EDIT: scan was deprecated in 2.1.0.

scan does not provide any benefit over a regular scroll request sorted by _doc. See the Elastic documentation (spotted by @christophe-roussy).
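In request-body terms, the replacement for scan is an ordinary scroll search sorted by _doc. Here is a minimal sketch of that body, built in Python; the function name `scroll_body` is hypothetical.

```python
import json


def scroll_body(size=50):
    """Request body for a scroll search that stands in for search_type=scan."""
    return {
        "size": size,
        "query": {"match_all": {}},
        # _doc is index order, the cheapest sort for walking every document
        "sort": ["_doc"],
    }


# Intended as the body of: GET /foo/_search?scroll=10m
print(json.dumps(scroll_body(), indent=2))
```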


http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1

Note the size parameter, which increases the number of hits returned from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html


Elasticsearch (ES) supports both GET and POST requests for getting data from an ES cluster index.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
    "size": [your value],        // default 10
    "from": [your start index],  // default 0
    "query": {
        "match_all": {}
    }
}

I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/
It will give you a better feel for the indices you create and let you test them.


Note: This answer relates to an older version of Elasticsearch, 0.90. Versions released since then have an updated syntax. Please refer to the other answers, which may be more accurate for the version you are using.

The query below returns the NO_OF_RESULTS you would like to be returned.

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

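Now, the problem here is that you want all the records to be returned. Naturally, before writing the query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in the index? Simply run the query below:

curl -XGET 'localhost:9200/foo/_search'

This gives you a result that looks like the one below:

{
    "hits" : {
        "total" : 2357,
        "hits" : [
            {
..................

The total in the result tells you how many records are available in the index. So that is a good way to find the value of NO_OF_RESULTS.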
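Reading the total programmatically could look like the sketch below. It assumes the response has already been parsed into a dict, and `total_hits` is an illustrative name. From Elasticsearch 7.0 on, hits.total is an object such as {"value": 2357, "relation": "eq"} rather than a bare number, so the sketch accepts both shapes.

```python
def total_hits(response):
    """Return the hit count from a parsed _search response."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x and later: {"value": N, "relation": "eq" or "gte"}
        return total["value"]
    return total  # before 7.0: a plain integer


old_style = {"hits": {"total": 2357, "hits": []}}
new_style = {"hits": {"total": {"value": 2357, "relation": "eq"}, "hits": []}}
print(total_hits(old_style), total_hits(new_style))
```

When relation is "gte", the value is only a lower bound on the real count.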

curl -XGET 'localhost:9200/_search'

Search all types in all indices.

curl -XGET 'localhost:9200/foo/_search'

Search all types in the foo index.

curl -XGET 'localhost:9200/foo1,foo2/_search'

Search all types in the foo1 and foo2 indices.

curl -XGET 'localhost:9200/f*/_search'

Search all types in any index whose name begins with f.

curl -XGET 'localhost:9200/_all/type1,type2/_search'

Search types type1 and type2 in all indices.


This is the best solution I found using the Python client:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster

# Initialize the scroll. search_type='scan' and doc_type are left out:
# scan was deprecated in 2.1.0 (sorting by _doc replaces it) and
# mapping types are gone from recent versions.
page = es.search(
    index='yourIndex',
    scroll='2m',
    size=1000,
    body={
        # Your query's body
        'query': {'match_all': {}},
        'sort': ['_doc'],
    })
sid = page['_scroll_id']
scroll_size = len(page['hits']['hits'])

# Keep scrolling until a page comes back empty
while scroll_size > 0:
    # Do something with the obtained page
    print("scroll size: " + str(scroll_size))
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); // at most 100 hits are returned for each scroll
// Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        // Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while (scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

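Use server:9200/_stats to get statistics about all your aliases as well, such as the size and number of elements per alias. That is very useful and provides helpful information.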

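If you want to pull many thousands of records, then a few people gave the right answer of using "scroll". (Note: some people also suggested using "search_type=scan". That was deprecated and then removed in v5.0. You don't need it.)

Start with a "search" query, but specify a "scroll" parameter (here I'm using a 1 minute timeout):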

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
        "match_all" : {}
    }
}
'

This includes your first batch of hits, but we are not done here. The output of the curl command above would be something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

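It is important to keep the _scroll_id handy, because next you should run the following command:

curl -XGET 'localhost:9200/_search/scroll' -d '
{
    "scroll" : "1m",
    "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}
'

However, passing the scroll_id around is not designed to be done manually. Your best bet is to write code to do it, e.g. in Java: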

private TransportClient client = null;
private Settings settings = ImmutableSettings.settingsBuilder()
        .put(CLUSTER_NAME, "cluster-test").build();
private SearchResponse scrollResp = null;

this.client = new TransportClient(settings);
this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(60000))
        .setQuery(queryBuilder)
        .setSize(100).execute().actionGet();

scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
        .setScroll(new TimeValue(timeVal))
        .execute()
        .actionGet();

Now LOOP on the last command and use the SearchResponse to extract the data.

If you just pass some big number as size, Elasticsearch will get much slower. One way to get all documents is to use scan and scroll ids.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

In Elasticsearch v7.2, you do it like this:

POST /foo/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match_all": {}
    }
}

The response contains a _scroll_id, which you then send back to get the next chunk of 100.

POST /_search/scroll
{
    "scroll" : "1m",
    "scroll_id" : "<YOUR SCROLL ID>"
}
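The two requests above can be wrapped in a loop. The following is a sketch only: `post` is a hypothetical callable taking (path, body) and returning parsed JSON, standing in for whatever HTTP client you use, and `make_fake_post` exists purely so that the example runs without a cluster.

```python
def scroll_all(post, index, size=100, keep_alive="1m"):
    """Yield every hit by following _scroll_id until a page comes back empty."""
    page = post(f"/{index}/_search?scroll={keep_alive}",
                {"size": size, "query": {"match_all": {}}})
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        # Always send the most recent scroll id; it may change between pages
        page = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": page["_scroll_id"]})


def make_fake_post(docs, size):
    """Stand-in for an HTTP client: serves docs in pages of the given size."""
    position = 0

    def post(path, body):
        nonlocal position
        if path != "/_search/scroll":
            position = 0  # an initial search restarts the cursor
        batch = docs[position:position + size]
        position += len(batch)
        return {"_scroll_id": "fake-id", "hits": {"hits": batch}}

    return post


docs = [{"_id": str(i)} for i in range(250)]
hits = list(scroll_all(make_fake_post(docs, 100), "foo", size=100))
print(len(hits))
```

Because `scroll_all` is a generator, only one page of hits is held in memory at a time.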


Simple! You can use the size and from parameters:

http://localhost:9200/[your index name]/_search?size=1000&from=0

Then increase from step by step until you have all of the data.
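Increasing from step by step amounts to a loop like the sketch below. Here `search` is a hypothetical callable taking (from_, size) and returning a list of hits, standing in for the real HTTP request. The loop stops at the default index.max_result_window of 10,000, because Elasticsearch rejects from + size beyond that window; past it, scroll is the usual alternative.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window


def paginate(search, size=1000, window=MAX_RESULT_WINDOW):
    """Yield hits page by page with from/size, staying inside the window."""
    from_ = 0
    while from_ + size <= window:
        hits = search(from_, size)
        yield from hits
        if len(hits) < size:
            return  # a short page means there is nothing left
        from_ += size


docs = list(range(2500))


def fake_search(from_, size):
    # Stand-in for GET /[index]/_search?size=...&from=...
    return docs[from_:from_ + size]


print(len(list(paginate(fake_search, size=1000))))
```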


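The best way to adjust the size is to append size=number to the URL:

curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"

Note: the maximum value that size can take is 10000. For anything above ten thousand you are expected to use the scroll function, which minimises the impact on performance.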


You can use the _count API to get the value for the size parameter:

http://localhost:9200/foo/_count?q=<your query>

This returns {count:X, ...}. Extract the value X and then run the actual query:

http://localhost:9200/foo/_search?q=<your query>&size=X
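A sketch of that two-step flow in Python follows. `get_json` is a hypothetical callable that fetches a URL and returns the parsed JSON, and `fake_get_json` only imitates the two responses so that the example runs offline.

```python
from urllib.parse import quote


def fetch_all_via_count(get_json, index, query="*:*",
                        host="http://localhost:9200"):
    """Ask _count for the number of matches, then request exactly that many."""
    q = quote(query, safe="*:")
    count = get_json(f"{host}/{index}/_count?q={q}")["count"]
    if count == 0:
        return []
    result = get_json(f"{host}/{index}/_search?q={q}&size={count}")
    return result["hits"]["hits"]


requested = []


def fake_get_json(url):
    requested.append(url)  # record the URLs so the flow can be inspected
    if "/_count" in url:
        return {"count": 3}
    return {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}, {"_id": "3"}]}}


print(len(fetch_all_via_count(fake_get_json, "foo")))
print(requested[-1])
```

If documents are indexed between the two requests, the count will be stale, so treat the result as a snapshot.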

http://localhost:9200/foo/_search/?size=1000&pretty=1

You need to specify the size query parameter, as the default is 10.


For Elasticsearch 6.x

Request: GET /foo/_search?pretty=true

Response: hits.total gives the count of the docs:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1001,
        "max_score": 1,
        "hits": [
            {

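The size parameter increases the number of hits displayed from the default (10) to 500.

http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*

Change from step by step to get all the data.

http://localhost:9200/[indexName]/_search?size=500&from=0

curl -X GET 'localhost:9200/foo/_search?q=*&pretty'

By default Elasticsearch returns 10 records, so size should be provided explicitly.

Add size to the request to get the desired number of records.

http://{host}:9200/{index_name}/_search?pretty=true&size=(number of records)

Note:
The page size cannot exceed the index.max_result_window index setting, which defaults to 10,000.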

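The official documentation provides the answer to this question! You can find it here.

{
    "query": { "match_all": {} },
    "size": 1
}

You simply replace the size (1) with the number of results you want to see!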

From Kibana DevTools:

GET my_index_name/_search
{
    "query": {
        "match_all": {}
    }
}

The maximum number of results Elasticsearch will return in one request is 10000, which you get by providing the size:

curl -XGET 'localhost:9200/index/type/_search?scroll=1m' -d '
{
    "size": 10000,
    "query" : {
        "match_all" : {}
    }
}'

After that, use the Scroll API to fetch the remaining results: take the _scroll_id value from the response and put it in scroll_id:

curl -XGET 'localhost:9200/_search/scroll' -d '
{
    "scroll" : "1m",
    "scroll_id" : ""
}'

This is the query to accomplish what you want
(I suggest using Kibana, as it helps you understand queries better):

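GET my_index_name/my_type_name/_search
{
    "query": {
        "match_all": {}
    },
    "size": 20,
    "from": 3
}

To get all records you have to use the "match_all" query.

size is the number of records you want to fetch (a kind of limit).
By default, ES returns only 10 records.

from works like skip: here it skips the first 3 records.

If you want to fetch exactly all the records, run this query once from Kibana, take the value of the "total" field from the result, and use it as "size".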

To return all records from all indices, you can do:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

"took" : 866,
"timed_out" : false,
"_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
},
"hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
        "_index" : "grafana-dash",
        "_type" : "dashboard",
        "_id" : "test",
        "_score" : 1.0,
        ...

curl -XGET '{{IP/localhost}}:9200/{{Index name}}/{{type}}/_search?scroll=10m&pretty' -d '{
    "query": {
        "filtered": {
            "query": {
                "match_all": {}
            }
        }
    }
}'

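No answer except @Akira Sendoh's has addressed how to actually get ALL docs. But even that solution crashes my ES 6.3 service without leaving any logs. The only thing that worked for me with the low-level elasticsearch-py library was the scan helper, which uses the scroll() API:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es_obj = Elasticsearch("http://localhost:9200")  # adjust to your cluster

doc_generator = scan(
    es_obj,
    query={"query": {"match_all": {}}},
    index="my-index",
)

# Iterate over the generator; don't build a list from it or you will run out of RAM
for doc in doc_generator:
    print(doc["_id"])  # use it somehow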

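However, the cleaner way nowadays seems to be the elasticsearch-dsl library, which offers more abstract, cleaner calls, e.g.: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits

If anyone else needs to retrieve all the data from Elasticsearch for some use case, as I did, here is what I did. By all the data I mean all indices and all document types. I'm using Elasticsearch 6.3.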

curl -X GET "localhost:9200/_search?pretty=true" -H 'Content-Type: application/json' -d '
{
    "query": {
        "match_all": {}
    }
}
'

Elasticsearch reference

A simple solution using the Python package elasticsearch-dsl:

from elasticsearch_dsl import Search
from elasticsearch_dsl import connections

connections.create_connection(hosts=['localhost'])

s = Search(index="foo")
response = s.scan()

count = 0
for hit in response:
    # print(hit.to_dict())  # be careful, this prints every hit in your index
    count += 1

print(count)

See also https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.scan.

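size=0 is sometimes suggested as a way to return all the documents, but in the search API it returns no hits at all, only the total (and any aggregations), so it is useful for counting rather than fetching.
Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
    "size": 0,
    "query" : {
        "match_all" : {}
    }
}'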