Finding a penny

Sometimes you stumble across a really valuable tip and just have to share it. I just got one such tip from Loftux in the IRC chat this morning:

Back when we upgraded from 3.4.x to, we had a bunch of problems with getting our production database to upgrade properly. Thanks to some really great help from Alfresco’s support team (thanks Kyle!), we were able to manipulate the database so that the upgrade completed successfully. The worrisome part is that the databases for our development and QA instances had upgraded without a hitch.

Because of that, we learned to always test a major upgrade on a copy of the production database and repository. The hard part is that copying our nearly-3-TB production repository just to test an upgrade is a real pain. This is where the tip from Loftux comes in. He wrote:

Upgrade tip: Create a copy instance from backup with only the database, skip content (alf_data). Then in set system.bootstrap.config_check.strict=false. Then you can run the upgrade and test that all the patches work without having to copy all your content data. This is an important first test. Your instance will not be usable (you have no content), but saves you the trouble copying all that file content.
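For what it's worth, here's a rough sketch of how that property could be set from the shell on the copy instance. The `set_prop` helper is hypothetical (my own name, not part of Alfresco), and the properties file is passed in as an argument because the tip above doesn't name it here. It only assumes a plain `key=value` properties file.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE  (hypothetical helper, not an Alfresco tool)
# Sets KEY=VALUE in a key=value properties file, replacing any existing
# line that starts with "KEY=" and leaving all other lines untouched.
set_prop() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Keep every line that does not start with "KEY=". index() is a plain
    # string match, so the dots in the key are treated literally.
    awk -v k="$key=" 'index($0, k) != 1' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the new setting at the end of the file.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the original file's permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On the test copy you would then run something like `set_prop /path/to/your/properties/file system.bootstrap.config_check.strict false` (the path is a placeholder) before starting the upgrade.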

Thanks Loftux!


Wearing earplugs

We recently upgraded to Alfresco While we’re definitely happy to be up on the new version (lots of nice new features), one issue has been bugging me: There’s an awful lot of noise in the log file.

Back in 3.4.x, I usually monitored the log file with the command:

tail -500f /usr/share/tomcat/logs/catalina.out | grep -v "^\Wat"

That strips all of the stack trace lines out of the displayed log (all of the lines beginning with “ at”), making it much more readable. Because this only affects which lines are displayed, not what’s actually in the log file, I could drop the grep filtering whenever I wanted the complete version for troubleshooting.

However, in, there’s a lot more spurious stuff in there. In addition to the stack traces, there’s also:

1) Very frequent repetitions of the lines:

Sep 7, 2012 3:35:03 PM org.apache.tomcat.util.http.Parameters processParameters
WARNING: Parameters: Invalid chunk '' ignored.

2) Complete copies of the raw HTML of the error page served up whenever an HTTP 500 (internal server error) occurs.

3) A bunch of blank lines

So, this afternoon, after a bit of chatter about this on the Alfresco IRC channel, I decided I really needed a better version of my log file monitoring command. Here’s what I developed:

tail -7500f /usr/share/tomcat/logs/catalina.out | perl -nle 'print if (($_ !~ /^\W*[<a\.]/) && ($_ !~ /^$/) && ($_ !~ /org.apache.tomcat.util.http.Parameters processParameters/) && ($_ !~ /Invalid chunk/))'

Note that the line above scrolls sideways so it doesn’t get word-wrapped incorrectly.

To break down the individual portions of that perl “print if” command:
($_ !~ /^\W*[<a\.]/): Don’t print any lines that begin with optional non-word characters (in practice, leading whitespace), followed by either a “<” (getting rid of the error 500 HTML lines); an “a” (getting rid of the “ at” lines); or a “.” (getting rid of the “ ... 40 more” lines)

($_ !~ /^$/): Don’t print any blank lines

($_ !~ /org.apache.tomcat.util.http.Parameters processParameters/): Get rid of the lines containing “org.apache.tomcat.util.http.Parameters processParameters”

($_ !~ /Invalid chunk/): Get rid of the lines containing “Invalid chunk”
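If you'd rather not use perl, the same four filters can be sketched as a small shell function around grep. This is my own equivalent under the assumption that your grep supports `-E` with multiple `-e` patterns. The `filter_log` name is made up for illustration.

```shell
#!/bin/sh
# filter_log: read log lines on stdin and print only the "interesting" ones.
# Mirrors the four perl conditions above, one -e pattern per condition.
filter_log() {
    # 1) optional non-word characters, then "<", "a" or "."
    #    (HTML error pages, " at" lines, " ... N more" lines)
    # 2) blank lines
    # 3) the Tomcat "processParameters" header line
    # 4) the "Invalid chunk" warning line
    grep -v -E \
        -e '^[^[:alnum:]_]*[<a.]' \
        -e '^$' \
        -e 'org\.apache\.tomcat\.util\.http\.Parameters processParameters' \
        -e 'Invalid chunk'
}
```

It drops into the same pipeline as before: `tail -7500f /usr/share/tomcat/logs/catalina.out | filter_log`.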

This isn’t an earth-shattering improvement, but it does make the log file a bunch easier to read, for me at least.


Walking faster

Update 6/29/2012: Andrew Laurence did a nice write-up of the background behind these improvements for the TidBITS blog about a week ago. There are also some interesting comments there.

In the recently-released Mac OS X 10.7.4 update, Apple has implemented changes to its WebDAV client that dramatically speed up connections. If you are connecting to Alfresco by mounting it in the Finder, you’re in for a much improved experience, especially if you’re using the authentication chain to allow users to authenticate against an external authN/Z provider.

In our environment, connections are roughly 2.5 times faster. The difference is most evident when browsing between folders or working with large numbers of small files.

These changes also make the WebDAV connections more reliable and stable.

So, if you’re using Mac OS X Finder to connect to Alfresco, I’d strongly recommend updating to Mac OS X 10.7.4.

From a technical point of view, all they did was to add support for session cookies to the WebDAV client.

To connect to Alfresco from the Mac OS X Finder, here’s what you do:

  1. In the Finder, choose Connect to Server… from the Go menu
  2. Enter in the Server Address: field of the window that appears.

    A question for the audience: In our environment (Tomcat running behind Apache, where Apache is set to do a 302 redirect from http to https), entering “http://” instead of “https://” does not see the speed benefits mentioned here. I’d appreciate any thoughts about why this might be happening.

  3. Click the + button to the right of the server address field to add it to the Favorite Servers list, for easier future access.
  4. Click Connect and enter your Name and Password when prompted.
  5. You’ll see a “webdav” disk appear on your desktop. You can now use that like any other disk on your system.

Until this update was released, we had been recommending that Mac OS X users use Cyberduck to connect to WebDAV on Alfresco. Cyberduck is still somewhat faster than the Mac OS X Finder, but this update makes the Finder much more usable, and suitable for most uses.


Checking for Ticks

This is a follow-on post to my “Counting the Livestock” post from a couple of weeks ago. Reading that post first will give some background to this one.

With many thanks to Alex Strachan and Chris Turner, I’ve been able to greatly improve the script and have it do a somewhat deeper examination of how the files on disk are referenced by tables in the repository. I’ve also greatly cleaned up the script, so it should be easier to understand and modify for use in your environment. (For example, the 7 variables at the start of the script are all you need to change, rather than having to do a search-and-replace for that info throughout.)

First, some background on the structure of the Alfresco database in terms of how it refers to files:

  1. At the lowest level, the alf_content_url table directly contains the paths to the files on disk corresponding to nodes in the repository, as well as their sizes.
  2. The alf_content_data table contains things like the mime type and content encoding, and refers to the alf_content_url table.
  3. Finally, 3 tables refer to entries in the alf_content_data table: alf_node_properties, avm_nodes, and alf_audit_model.

The script below is designed to check all of that for consistency. That is, it will let you know if the 5 tables above all map correctly down to files on disk.

Important notes:

  1. Use the script below at your own risk, especially the “Files on disk that are not in the Alfresco DB. Can be backed up, then deleted.” bit. Unless you thoroughly understand what this script is doing and are comfortable with it, be very careful here. You could badly damage your Alfresco repository.
  2. This is only a reporting tool. If it reports errors in your repository, some may be harmless, some may be indicative of a real problem. This is meant as a starting point for investigations into potential problems, not as a fix for those problems.
  3. Even though the script below only reads from the database and files on disk, and doesn’t make any changes, it’s still a really good idea to have a good backup of the repository and database before running this.

Continue reading


Having a chat

Lately I’ve been hanging out a bit on the #Alfresco IRC chat. It’s quite a useful resource for quick answers from others in the community, and for moral support from others who know what you’re going through!

I personally use Adium to connect to it, and, since it’s a bit obscure, I thought I’d post some instructions on how to get Adium to connect to the #Alfresco IRC chat:

  1. Choose preferences from the Adium menu
  2. Choose IRC (Internet Relay Chat) from the ‘+’ pop-up menu in the lower left corner of the Accounts tab of the Preferences window.
  3. On Account tab in window that appears, enter
  4. Nickname should be your nickname on irc, e.g. ‘iancrew’. Leave password field blank.
  5. On the Personal tab in that window, enter your IRC nickname in the “Username (Ident)” field.
  6. Click OK.
  7. You should see (nickname) in your accounts list, and it should connect pretty quickly.
  8. Close the preferences window
  9. Choose “Join Group Chat” from the File menu
  10. Choose “ (nickname)” from the Account pop-up menu
  11. Enter “#alfresco” in the Channel field. Leave the password field blank.
  12. Click Join.
  13. Once you’re in the chat, choose “Add Group Chat Bookmark” from the Contact menu to save the #Alfresco chat in your contacts list

To have Adium automatically connect to the #Alfresco chat each time it starts up:

  1. Right-click on the #Alfresco listing in your contacts list
  2. Choose “Get Info”
  3. Check the checkbox on the “Bookmark: Automatically join on connect” line

The only thing I don’t like about Adium is that it doesn’t auto-split your posts to the chatroom at 256 characters, which means that longer posts sometimes get cut off. Annoying, but not that big a deal. (Though if anyone knows how to fix it, please let me know in the comments below!)

Hope to see you there! (Yes, “iancrew” is me….)


Improving a trail

Last June, I posted instructions here for how to implement something closely approximating site quotas in Alfresco Share. With thanks to Antonio Soler, I made some pretty major improvements to and simplifications of that script last October. I’ve finally updated that post to reflect the most current (simpler, faster, more reliable) version of that script. This post is really just to let folks know about the updated version of the script. We’ve been using it for several months now, and it works well.

Thanks Antonio!


Counting the livestock [Updated]

Update 4/27/2012: See my more recent post Checking for Ticks for a more up-to-date and capable version of the script in this post.

Update 3/13/2012: Updated to make the script much faster, and to fix the “Files that have different sizes on disk than what’s listed in the DB” command, which was incorrect.

Recently, we moved the alf_data directory for our production system to a new set of disks. As we did so, a glitch meant that a few of the files were not correctly copied over. We got it sorted out fairly quickly, but it did serve to make me paranoid about the consistency between what we have listed in the Alfresco DB and the files we have on disk.

So, this morning, I decided to write up a little unix shell script to do that comparison. Fortunately, it’s made a lot easier by the fact that there’s an alf_content_url table in Alfresco’s database which lists every file in alf_data/contentstore along with the size that Alfresco thinks it is.

This script then became simply an exercise in getting a file listing and database dump from the alf_content_url table into the same format so they can be diff-ed.
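To give a feel for that idea, here's a minimal sketch (not the actual script from this post). It assumes the database dump has already been written out as lines of `relative/path<TAB>size`, and that GNU find is available for `-printf` (true on RHEL, not on every unix). The `list_disk` and `compare_listings` names are made up for illustration.

```shell
#!/bin/sh
# list_disk ROOT: print "relative/path<TAB>size-in-bytes" for every file
# under ROOT, sorted. Relies on GNU find's -printf (%P = path relative to
# ROOT, %s = size in bytes).
list_disk() {
    find "$1" -type f -printf '%P\t%s\n' | LC_ALL=C sort
}

# compare_listings DB_LIST DISK_LIST: both files hold "path<TAB>size" lines.
# A file whose size differs shows up in both sections, once per size.
compare_listings() {
    db_sorted=$(mktemp) || return 1
    disk_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted the same way, so sort in the C locale.
    LC_ALL=C sort "$1" > "$db_sorted"
    LC_ALL=C sort "$2" > "$disk_sorted"
    echo "DB only:"
    LC_ALL=C comm -23 "$db_sorted" "$disk_sorted"
    echo "Disk only:"
    LC_ALL=C comm -13 "$db_sorted" "$disk_sorted"
    rm -f "$db_sorted" "$disk_sorted"
}
```

Anything listed under “Disk only:” still deserves the same caution as in the notes below: investigate before deleting anything.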

Important Notes

  1. Use the script below at your own risk, especially the “Files on disk that are not in the Alfresco DB. Can be backed up, then deleted.” bit. Unless you thoroughly understand what this script is doing and are comfortable with it, be very careful here. You could badly damage your Alfresco repository.
  2. I’m running EE 3.4.7 on RHEL5 and MySQL, and that’s what this script is written for. It shouldn’t be that hard to modify it for use on other platforms or other databases.

Continue reading
