Introduction
This article documents a case where Elasticsearch errors occurred due to disk pressure caused by Docker containers and images, along with the investigation and resolution methods. We hope this serves as a useful reference for those facing similar issues.
The Problem
The following error occurred in a running Elasticsearch instance:
{
"error": {
"type": "search_phase_execution_exception",
"reason": "all shards failed",
"phase": "query",
...
},
"status": 503
}
Initial investigation revealed that indices had entered a closed state, and insufficient disk space was suspected.
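A check of this kind can be sketched with the Elasticsearch cat APIs. This is a minimal sketch, not the exact commands we ran: it assumes Elasticsearch is reachable at http://localhost:9200 without authentication, and the function name es_disk_check and the CURL and ES_URL variables are hypothetical.

```shell
# HTTP client and Elasticsearch endpoint (assumed values; override as needed).
CURL="${CURL:-curl}"
ES_URL="${ES_URL:-http://localhost:9200}"

es_disk_check() {
  # Per-index status (open or close) and health.
  "$CURL" -s "$ES_URL/_cat/indices?v&h=index,status,health"
  # Disk used and available per data node, as Elasticsearch sees it.
  "$CURL" -s "$ES_URL/_cat/allocation?v"
}

# Usage: es_disk_check
```

An index listed with status close, together with little available disk in the allocation output, points toward the kind of disk pressure investigated in the next section.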
Investigating Disk Usage
Checking Root Directory Usage
First, we checked the overall disk usage of the system.
sudo du -h --max-depth=1 / | sort -hr | head -n 20
Output:
60G /
50G /var
4.7G /usr
2.1G /home
1.2G /opt
...
The /var directory was found to be abnormally large at 50 GB.
Detailed Investigation of /var Directory
sudo du -h --max-depth=1 /var | sort -hr
Output:
50G /var
49G /var/lib
342M /var/log
240M /var/cache
128M /var/spool
...
Since /var/lib accounted for nearly the entire volume, we investigated further.
sudo du -h --max-depth=1 /var/lib | sort -hr
Output:
49G /var/lib
49G /var/lib/docker
256M /var/lib/snapd
128M /var/lib/apt
...
Root cause identified: Docker data was occupying 49 GB.
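The manual drill-down above (/, then /var, then /var/lib) can be sketched as a small helper that repeatedly follows the largest subdirectory. This is an illustration rather than the procedure we actually used: the function names largest_child and drill_down are hypothetical, and GNU du is assumed (for --max-depth), as in the commands above.

```shell
# Print the largest immediate subdirectory of directory $1 (sizes compared in KiB).
# Hypothetical helper; assumes GNU du.
largest_child() {
  du -k --max-depth=1 -- "$1" 2>/dev/null \
    | sort -nr \
    | awk -F'\t' -v parent="$1" '$2 != parent { print $2; exit }'
}

# Starting at $1, follow the largest subdirectory for at most $2 levels
# (default 3), printing each directory visited.
drill_down() {
  dir=$1
  depth=${2:-3}
  while [ "$depth" -gt 0 ]; do
    next=$(largest_child "$dir")
    [ -n "$next" ] || break
    echo "$next"
    dir=$next
    depth=$((depth - 1))
  done
}

# Usage (run as root so that all directories are readable): drill_down / 3
```

The helper only automates the "pick the biggest entry and look inside" step; reading the full du output is still worthwhile when several directories are similar in size.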
Analyzing Docker Disk Usage
We checked Docker’s detailed usage.
docker system df
Output:
TYPE TOTAL ACTIVE SIZE RECLAIMABLE
Images 38 5 39.8GB 35.99GB (90%)
Containers 5 4 10.44MB 0B (0%)
Local Volumes 2 1 646MB 32.57kB (0%)
Build Cache 129 0 2.972GB 2.972GB (100%)
Analysis Results
- Images: 33 out of 38 (approximately 36 GB) were unused
- Build Cache: All 3 GB were reclaimable
- Containers: Most were active and not candidates for deletion
- Volumes: Almost nothing to reclaim (about 33 kB of 646 MB)
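The same reading of the table can be sketched as a filter that lists only the resource types with something to reclaim. This is a minimal sketch: the function name reclaimable_types is hypothetical, and it assumes the installed Docker version supports --format on docker system df with the .Type, .Size, and .Reclaimable fields.

```shell
# Print resource types that have reclaimable space.
# Input on stdin: one "TYPE|SIZE|RECLAIMABLE" line per resource type.
reclaimable_types() {
  awk -F'|' '$3 !~ /^0B/ { printf "%s: %s of %s reclaimable\n", $1, $3, $2 }'
}

# Usage (requires a running Docker daemon):
#   docker system df --format '{{.Type}}|{{.Size}}|{{.Reclaimable}}' | reclaimable_types
```

A pipe character is used as the field separator because the type names ("Local Volumes", "Build Cache") and the reclaimable column ("35.99GB (90%)") both contain spaces.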
Performing Cleanup
Bulk Cleanup Command
The following command was used to remove all unused resources at once.
docker system prune -a --volumes
This command removes:
- Stopped containers
- Unused images (with the -a option, all images not used by any container, not only dangling ones)
- Unused networks
- Unused volumes (the --volumes option)
- Build cache
Result
WARNING! This will remove:
- all stopped containers
- all networks not used by at least one container
- all images without at least one container associated to them
- all build cache
Are you sure you want to continue? [y/N] y
Total reclaimed space: 39.2GB
Approximately 39 GB of free disk space was recovered.
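Where removing everything unused at once is too aggressive, a more conservative variant is to prune only resources older than a given age and to leave volumes untouched. The following is a sketch under assumptions, not the command we ran: the function name staged_cleanup, the DOCKER variable, and the 168h default are hypothetical choices.

```shell
# Docker CLI to call; overridable (for example DOCKER=echo) to preview the commands.
DOCKER="${DOCKER:-docker}"

# Remove only unused resources older than the given age (default: 168h = 7 days).
# Volumes are deliberately left alone, since they may hold data that cannot be rebuilt.
staged_cleanup() {
  age=${1:-168h}
  "$DOCKER" container prune -f --filter "until=$age"
  "$DOCKER" image prune -a -f --filter "until=$age"
  "$DOCKER" builder prune -f --filter "until=$age"
}

# Usage: staged_cleanup 72h
```

Running it once with DOCKER=echo prints the three commands without executing them, which is a cheap way to review what would run.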
Preventing Recurrence
Configuring Docker Log Rotation
To prevent Docker container logs from accumulating indefinitely, we edited /etc/docker/daemon.json.
{
"log-driver": "json-file",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}
Configuration explanation:
- max-size: Maximum size of a single log file
- max-file: Number of log files to retain
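To see how much space container logs actually take, before or after enabling rotation, the log files of the json-file driver can be listed by size. This is a minimal sketch: the function name largest_container_logs is hypothetical, and it assumes Docker's default data directory /var/lib/docker.

```shell
# List the largest json-file container logs under directory $1
# (default: /var/lib/docker/containers), sizes in KiB, largest first.
# $2 limits the number of lines shown (default 10).
largest_container_logs() {
  dir=${1:-/var/lib/docker/containers}
  find "$dir" -type f -name '*-json.log*' -exec du -k {} + 2>/dev/null \
    | sort -nr \
    | head -n "${2:-10}"
}

# Usage (run as root, since Docker's data directory is not world-readable):
#   largest_container_logs
```

The pattern also matches rotated files such as <id>-json.log.1, so the listing reflects the total space held by each container's logs.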
Applying the Configuration
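sudo systemctl restart docker
Note that the new log settings apply only to containers created after the restart. Existing containers keep their previous logging configuration until they are recreated.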
Considering Periodic Cleanup
In production environments, automating periodic cleanup can also be considered.
# Example: remove dangling images every Sunday at 02:00 (add -a to remove all unused images)
0 2 * * 0 /usr/bin/docker image prune -f
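Instead of pruning on a fixed schedule regardless of need, the job can first check disk usage and prune only above a threshold. The following is a sketch under assumptions: the function names disk_usage_percent and prune_if_above, the DOCKER variable, and the 80 percent threshold in the usage line are all hypothetical.

```shell
# Docker CLI to call; overridable (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# Print the used-space percentage (integer, no % sign) of the filesystem holding $1.
disk_usage_percent() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Prune dangling images only when usage of $1 is at or above threshold $2 (percent).
prune_if_above() {
  usage=$(disk_usage_percent "$1")
  if [ "$usage" -ge "$2" ]; then
    echo "usage ${usage}% >= $2%: pruning"
    "$DOCKER" image prune -f
  else
    echo "usage ${usage}% < $2%: nothing to do"
  fi
}

# Usage (for example from a script called by cron):
#   prune_if_above /var/lib/docker 80
```

The echo lines double as a simple log when cron mails or redirects the output.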
Results and Lessons Learned
Resolution Results
- Elasticsearch errors were resolved
- Disk usage was reduced from 60 GB to 21 GB
- System stability improved
Lessons Learned
- Monitor disk usage regularly: here the shortage first surfaced as an Elasticsearch error rather than as a disk alert
- Manage Docker resources actively: unused images and build cache accumulate, especially in development environments
- Configure log rotation: without size limits, container logs can grow indefinitely
- Clean up preventively: periodic pruning before problems occur is more effective than recovering afterward
Summary
In environments using Docker, images, containers, and build caches tend to accumulate, making regular cleanup important. We recommend performing appropriate operational management using the investigation and resolution methods introduced in this article.
With these steps, we were able to restore stable server operation. We hope this helps those facing similar issues.
Reference Command List
# Investigate disk usage
sudo du -h --max-depth=1 /path | sort -hr
# Check Docker resources
docker system df
docker images
docker ps -a
# Cleanup
docker system prune -a --volumes # Bulk deletion
docker image prune # Dangling images only (add -a for all unused images)
docker container prune # Stopped containers only