Syncing multiple GBs from EFS to a remote S3 account
Whilst shutting down some old projects, I’m archiving all of their folders and storing them in S3 in case of an emergency.
I don’t have AWS credentials stored and set up on the EC2 instance, so I’m passing them through as variables to the script below.
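Before running anything, a quick way to confirm those keys actually work is to pass them to the AWS CLI for a single call (a minimal sketch with placeholder values; aws sts get-caller-identity simply echoes back the account the keys belong to):
AWS_ACCESS_KEY_ID="your-access-key-id" AWS_SECRET_ACCESS_KEY="your-secret-access-key" aws sts get-caller-identity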
As with the script to export tens of thousands of individual MySQL databases, you’ll want to edit the values at the top.
- EFS_MOUNT_POINT is the full path to where the EFS drive is mounted
- S3_BUCKET is your target S3 bucket
- AWS_… self explanatory!
- ARCHIVE_DIR is the full path to temporarily store the archives
- LOG_FILE is the path to the log file for this sync action
- DEPTH sets the depth before creating a new archive (see below)
The $DEPTH value lets you control when a new archive will be created. In my situation, I have a directory structure like:
- /efs/vhosts/somesite.com/
- /efs/vhosts/differentsite.com/
- /efs/vhosts/finalsite.com/
- /efs/logs/somesite.com/
- /efs/logs/differentsite.com/
- /efs/logs/finalsite.com/
- … and so on
I don’t want a single giant archive containing everything, so I’ll set my $DEPTH to 2 so that I end up with an archive for each vhost, each log directory, etc.
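If you want to preview exactly which directories a given $DEPTH will pick up, you can run the same find command the script uses against your mount point first. With the example layout above, it would return something like:
cd /efs
find . -mindepth 2 -maxdepth 2 -type d
# ./vhosts/somesite.com
# ./vhosts/differentsite.com
# ./vhosts/finalsite.com
# ./logs/somesite.com
# ...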
Let’s look at the script now:
#!/bin/bash
# Variables
EFS_MOUNT_POINT="/path/to/efs/mount"
S3_BUCKET="s3://your-target-s3-bucket"
AWS_ACCESS_KEY_ID="your-access-key-id"
AWS_SECRET_ACCESS_KEY="your-secret-access-key"
AWS_REGION="us-east-1"
ARCHIVE_DIR="/tmp/efs_archives"
LOG_FILE="/tmp/sync_log.txt"
DEPTH=2
# Ensure the archive directory exists
mkdir -p "$ARCHIVE_DIR"
# Clear the log file before we start
> "$LOG_FILE"
# Read and iterate the folders (bail out if the mount point isn't there)
cd "$EFS_MOUNT_POINT" || exit 1
total_dirs=$(find . -mindepth $DEPTH -maxdepth $DEPTH -type d | wc -l)
current_dir=0
start_time=$(date +%s)
find . -mindepth $DEPTH -maxdepth $DEPTH -type d | while read -r dir; do
((current_dir++))
# Build the archive filename from the directory path (strip the leading ./ and replace / with -)
archive_path="${ARCHIVE_DIR}/$(echo "$dir" | sed 's|^\./||; s|/|-|g').tar.gz"
archive_name=$(basename "$archive_path")
echo "[$(date +'%Y-%m-%d %H:%M:%S')] Starting sync for: $archive_name ($current_dir of $total_dirs)"
# Create the GZIP archive
tar -czf "$archive_path" "$dir"
if [ $? -ne 0 ]; then
echo "[$(date +'%Y-%m-%d %H:%M:%S')] [$current_dir/$total_dirs] Failed to create archive for $dir" >> $LOG_FILE
continue
fi
# Sync the archive to S3
AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY aws s3 cp "$archive_path" "$S3_BUCKET" --region $AWS_REGION
if [ $? -ne 0 ]; then
echo "[$(date +'%Y-%m-%d %H:%M:%S')] [$current_dir/$total_dirs] Failed to sync $archive_name to S3" >> $LOG_FILE
continue
fi
# Remove the local archive
rm "$archive_path"
if [ $? -ne 0 ]; then
echo "[$(date +'%Y-%m-%d %H:%M:%S')] [$current_dir/$total_dirs] Failed to delete local archive $archive_name" >> $LOG_FILE
continue
fi
# Estimate remaining time
current_time=$(date +%s)
elapsed_time=$((current_time - start_time))
avg_time_per_dir=$((elapsed_time / current_dir))
remaining_time=$((avg_time_per_dir * (total_dirs - current_dir)))
echo "[$(date +'%Y-%m-%d %H:%M:%S')] [$current_dir/$total_dirs] Successfully processed $dir. Estimated time remaining: $((remaining_time / 60)) minutes and $((remaining_time % 60)) seconds"
done
# End time
end_time=$(date +%s)
total_duration=$((end_time - start_time))
echo "[$(date +'%Y-%m-%d %H:%M:%S')] All backups completed in $(date -u -d @$total_duration +'%H:%M:%S')"
Once you’re all set, make the script executable (chmod +x sync.sh) and start a screen session (screen -S sync). When you’re ready, run the script we just created (./sync.sh). Once it’s started and is reporting successful progress, you can detach from the screen (Ctrl+a then d) and it will carry on running in the background.
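Put together, the launch sequence looks something like this (assuming you’ve saved the script as sync.sh in your current directory):
chmod +x sync.sh   # make the script executable
screen -S sync     # start a named screen session
./sync.sh          # kick off the sync inside the session
# once progress is scrolling, press Ctrl+a then d to detach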
As it processes through each archive file, the estimated time remaining will adjust itself - it’s calculating the average time per archive and then applying that to the number of remaining folders it needs to handle.
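For example, if the first 40 archives took 600 seconds, that’s an average of 15 seconds per archive; with 200 directories still to go, the estimate would read roughly 3000 seconds, or 50 minutes.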
Whenever you want to check in, you can tail the log file (tail -f /tmp/sync_log.txt) or rejoin the screen session (screen -r sync) to see the estimated remaining time. Don’t forget that if you rejoin the screen, detach it again with Ctrl+a then d rather than exiting, or the script will stop and you’ll have to start the sync again!
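Once everything has finished, one final sanity check (a sketch, reusing the same placeholder values as the script) is to list the bucket and compare the number of objects against the directory count the script reported:
AWS_ACCESS_KEY_ID="your-access-key-id" AWS_SECRET_ACCESS_KEY="your-secret-access-key" aws s3 ls "s3://your-target-s3-bucket/" --region us-east-1 | wc -l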