Counting the livestock [Updated]

Update 4/27/2012: See my more recent post Checking for Ticks for a more up-to-date and capable version of the script in this post.

Update 3/13/2012: Updated to make much faster, and to fix the “Files that have different sizes on disk than what’s listed in the DB” command, which was incorrect.

Recently, we moved the alf_data directory for our production system to a new set of disks. As we did so, a glitch meant that a few of the files were not correctly copied over. We got it sorted out fairly quickly, but it did serve to make me paranoid about the consistency between what we have listed in the Alfresco DB and the files we have on disk.

So, this morning, I decided to write up a little unix shell script to do that comparison. Fortunately, it’s made a lot easier by the fact that there’s a alf_content_url table in Alfresco’s database which lists both every file in alf_data/contentstore as well as the size that Alfresco thinks it is.

This script then became simply an exercise in getting a file listing and database dump from the alf_content_url table into the same format so they can be diff-ed.

Important Notes

  1. Use the script below at your own risk, especially the “Files on disk that are not in the Alfresco DB. Can be backed up, then deleted.” bit. Unless you thoroughly understand what this script is doing and are comfortable with it, be very careful here. You could badly damage your Alfresco repository.
  2. I’m running EE 3.4.7 on RHEL5 and MySQL, and that’s what this script is written for. It shouldn’t be that hard to modify it for use on other platforms or other databases.

The script

You’ll need to make the following substitutions in the script below. You should remove the square brackets when you perform the substitution.

  • [username]: The username Alfresco uses to access its database
  • [password]: The password Alfresco uses to access its database
  • [host]: The hostname of the database server
  • [port]: The port on the database server to connect to
  • [database]: The name of the database on the database server
  • [alf_data directory]: The path to the ‘alf_data’ directory, for example /opt/alfresco/alf_data.
  • [escaped alf_data directory]: The path to the ‘alf_data’ directory with all the ‘/’s replaced with ‘\/’s, for example \/opt\/alfresco\/alf_data.
#!/bin/bash

TEMP="$(dirname "$0")"

cd "$TEMP";

/usr/bin/mysql --skip-column-names -u [username] --password='[password]' \
-h [host] -P [port] -e "select content_size, content_url FROM alf_content_url " \
[database] | /bin/sed 's/store:\//\./' | /bin/sort -n --field-separator='/' -k2 -k3 -k4 \
-k5 -k6 -k7 > ./compare_db_results.txt

/usr/bin/find [alf_data directory] -type f \
| /usr/bin/xargs du -b | /bin/sed 's/[escaped alf_data directory]\/contentstore/\./' \
| /bin/sort -n --field-separator='/' -k2 -k3 -k4 -k5 -k6 -k7 > ./compare_file_results.txt

/usr/bin/diff compare_db_results.txt compare_file_results.txt \
| /bin/sort --field-separator='/' -k2 -k3 -k4 -k5 -k6 -k7 > compare_store_diff.txt

echo "WARNING: Files that have different sizes on disk than what's listed in the DB"
/usr/bin/uniq -f2 -D compare_store_diff.txt | /bin/grep "[<>]" | /bin/sed 's/^>/---\n>/'

echo
echo "WARNING: Files listed in the Alfresco DB that are not on disk"
/usr/bin/uniq -f2 -u compare_store_diff.txt | /bin/grep "<"

echo
echo "Files on disk that are not in the Alfresco DB. Can be deleted."
/usr/bin/uniq -f2 -u compare_store_diff.txt | /bin/grep ">"

Future improvements

The alf_content_url table also includes a CRC for each file. It would be nice to generate and check that CRC against the files on disk, but that might make this script unusably slow.

Advertisements
This entry was posted in Scripts. Bookmark the permalink.

4 Responses to Counting the livestock [Updated]

  1. Luke says:

    Very nice – Thanks!

    • icrew says:

      Glad you’re finding it useful, Luke! I just did a bug fix update to this post, so you might want to check back and get the more current version of the script….

  2. Mike says:

    Very awesome Ian, good title too!

  3. Pingback: Checking for Ticks | Wandering Alfresco

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s