Problem
This morning a colleague reported that logs were no longer visible on the ELK platform.
The Logstash application log showed:
[INFO ] 2022-01-11 11:07:17.735 [[main]>worker1] elasticsearch - retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[log-center-2022.01.11][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[log-center-2022.01.11][0]] containing [11] requests]"})
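To see which indices these failures affect, the index name can be pulled out of the retry messages. The following is a minimal sketch, assuming the log lines have the same format as the one above; the function name failing_indices is hypothetical and not part of Logstash.

```shell
# Hypothetical helper: read Logstash log lines on stdin and print each
# index named in an unavailable_shards_exception, with a count per index.
failing_indices() {
  grep 'unavailable_shards_exception' \
    | sed -n 's/.*"reason"=>"\[\([^]]*\)].*/\1/p' \
    | sort | uniq -c
}
```

It could be used as `failing_indices < logstash.log`, where logstash.log is a placeholder for the actual log file.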
Solution
1. Check the cluster health status:
The health status is red.
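For a recurring check, the status and the unassigned shard count can be extracted from the _cluster/health response with a small filter. This is a rough sketch that assumes the response was requested with ?pretty (one field per line); health_summary is a hypothetical name.

```shell
# Hypothetical helper: read pretty-printed _cluster/health JSON on stdin
# and print "<status> <unassigned_shards>" on one line.
health_summary() {
  awk -F'"' '
    $2 == "status"            { status = $4 }
    $2 == "unassigned_shards" { gsub(/[^0-9]/, "", $3); unassigned = $3 }
    END { print status, unassigned }
  '
}
```

For example: `curl -s '10.7.1.8:9200/_cluster/health?pretty' | health_summary`. Matching the field name exactly means delayed_unassigned_shards is not picked up by mistake.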
[root@netmgmt-prod-elk-03 ~]# curl '10.7.1.8:9200/_cluster/health?pretty'
{
  "cluster_name" : "es-e679l179",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 6,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 5831,
  "active_shards" : 11422,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 250,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 97.8581220013708
}
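2. Find the abnormal index
One index is in the red state. It can simply be deleted, provided its data is no longer needed, because deletion is permanent.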
[root@netmgmt-prod-elk-03 ~]# curl http://'10.7.1.8:9200/_cat/indices' | grep red
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
1 144k 1 2704 0 0 907 0 0:02:43 0:00:02 0:02:41 907
red open gps_lte-mode-2019.04.03 _N2IkwVeSxiP4s1gMyFQgw 5 1
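A plain grep for "red" would also match any index whose name happens to contain that string. One possible alternative is to compare the health column exactly. The sketch below assumes the default _cat/indices column order (health, status, index, ...); red_indices is a hypothetical name.

```shell
# Hypothetical helper: read _cat/indices output on stdin and print the
# names of indices whose health column (first field) is exactly "red".
red_indices() {
  awk '$1 == "red" { print $3 }'
}
```

For example: `curl -s 'http://10.7.1.8:9200/_cat/indices' | red_indices`. The -s flag also suppresses curl's progress meter, which otherwise gets mixed into the output.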
Delete the index:
curl -XDELETE 'http://10.7.1.8:9200/gps_lte-mode-2019.04.03'
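Because the delete API also accepts wildcards and _all, it may be worth validating the index name before sending the request. The following is only an illustrative guard, not something the original fix used; delete_url is a hypothetical helper.

```shell
# Hypothetical helper: print the DELETE URL for a single index, or fail
# if the name is empty, is _all, or contains a wildcard or comma.
delete_url() {
  host=$1 index=$2
  case $index in
    ''|_all|*[*,]*) echo "refusing index name: '$index'" >&2; return 1 ;;
  esac
  printf 'http://%s/%s\n' "$host" "$index"
}
```

For example: `url=$(delete_url 10.7.1.8:9200 gps_lte-mode-2019.04.03) && curl -XDELETE "$url"`. The curl call only runs when the name passes the check.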
Reference: https://blog.csdn.net/stefan1240/article/details/88988587