Annotate Server Installation Guide - Part 2: optional modules

This chapter describes how to install the optional modules for Annotate, and should be read after completing the basic install (with PDF and HTML support). Note that installing these modules is more complex than installing the basic Annotate server. You can install all, none or just a subset of these modules, depending on your requirements.

1. Enabling the Apache user to run programs

Typically PHP scripts are run as the apache user, which is a restricted account with no home directory. On Ubuntu Linux, the default apache user is www-data. One way to find out which user apache runs as on your system is to type ps aux | grep httpd (on Ubuntu the processes are named 'apache2' rather than 'httpd') and see who owns the cluster of 'httpd' processes.
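
As a hedged alternative to reading the ps output by eye, the same check can be scripted. This is only a sketch: the function name web_server_users is made up for illustration, and it assumes the server processes are named 'httpd' or 'apache2'. The parent process is normally owned by root, so root is filtered out.

```shell
#!/bin/sh
# web_server_users: reads "USER COMMAND" pairs on stdin (one per line) and
# prints each distinct non-root user that owns an httpd or apache2 process.
# The function name is illustrative and not part of Annotate.
web_server_users() {
    awk '($2 == "httpd" || $2 == "apache2") && $1 != "root" { print $1 }' | sort -u
}
```

On a live system it could be fed from ps, for example: ps -eo user=,comm= | web_server_users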

If you want to be able to run openoffice or firefox as the apache user (to support uploading Word documents and generating thumbnails of web pages), you will need to create a home directory where these applications can store their profile settings.

# As root: first check if the apache user
# already has a home directory that it can write to:
#   N.B. if apache runs as a different user
#   e.g. 'www-data' you should replace 'apache' with 'www-data'
#   throughout this guide, e.g.
#     su www-data
  % su apache
  $ cd
  $ touch tmp.tmp

# If this works, and the apache home directory is writable by
# the apache user, you can skip the rest of this section.
# If this fails, you need to set up a home directory
# for the apache user - the example below sets it
# to '/var/www/ahome':-

  % cd /var/www
  % mkdir ahome
  % chown apache ahome
  % chgrp apache ahome

# Enable home directory for apache:
  % vi /etc/passwd

# edit the entry for the apache user to allow logins
# and set the home dir, e.g.:

# Check you can su to the apache user now:
  % su apache
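
The home directory and login shell recorded for the apache user can also be inspected without opening an editor. The helper below is a minimal sketch with an illustrative name; it reads /etc/passwd directly, so it assumes the account is a local one (not provided by LDAP or NIS).

```shell
#!/bin/sh
# passwd_home_and_shell USER [FILE]: print the home directory and login shell
# recorded for USER. Each passwd line has seven colon-separated fields:
#   name:password:UID:GID:comment:home:shell
# FILE defaults to /etc/passwd; the function name is illustrative.
passwd_home_and_shell() {
    awk -F: -v user="$1" '$1 == user { print $6, $7 }' "${2:-/etc/passwd}"
}
```

Running passwd_home_and_shell apache should then show the home directory you created (e.g. /var/www/ahome) and a real shell rather than a 'nologin' entry.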

At this point you should also change the settings in the configuration file in annotate/scripts/, which is included by the various scripts for running openoffice and firefox. A sample is provided in which you need to copy to and then edit:

  % su annotate
  $ cd /var/www/html/annotate/scripts
  $ cp
  $ vi

# as the annotate user, edit the settings in annotate/scripts/


2. File conversion. OpenOffice support

Install openoffice 3.x.x or later. It will install itself to somewhere like /opt/openoffice.org3/program.

Many Linux distributions only offer older versions of openoffice via their standard install mechanism, so it is worth downloading directly from the openoffice download page. To fetch the openoffice 3.x.x binary directly onto the server, you can follow the steps below:-

# go to: -
# right-click on the Download link for the version you want (e.g, English-US, Linux RPM)
# and 'copy link location' - it is an HTTP redirect to a download link.
# Use this as the argument to curl to fetch the soffice.tgz as root on server:-

# Login as root...
  % cd /mnt/install/downloads          # or your chosen download directory
  % curl -L "" -o soffice.tgz

# it's about 170MB, so it might take a while to download...

  % tar xvfz soffice.tgz
  % cd OOO300_m9_native_packed-1_en-US.9358/RPMS
  % rpm -Uvih *.rpm

# - you may need to install gnome-vfs2 (on Fedora: 'yum install gnome-vfs2')
# The default install location is:
#   /opt/openoffice.org3/program/soffice

2.1 Optional: installing openoffice in a non-standard location

If you want to install openoffice in another directory rather than on top of any existing installation, you can use the steps described on: [external]. You can also use this if you do not usually use the RPM package system (e.g. on debian / ubuntu).

# ==== Optional ====
# e.g. in your home directory:
  % sudo apt-get install rpm
  % mkdir oo 
  % cd oo
  % curl -L "" -o soffice.tgz
  % mkdir TEMP
  % cd TEMP 
  % tar xvfz ../soffice.tgz
  % cd OOO300_m9_native_packed-1_en-US.9358/RPMS/
  % mkdir TEMP_ROOT
  % cd TEMP_ROOT
# extract the RPMs...  will make an opt/ subdir
  % for i in ../o*.rpm; do rpm2cpio $i | cpio -id; done

  % mv opt ~    # or where you want the installed version
# run it using (e.g.) 
  % ~/opt/openoffice.org3/program/soffice &

2.2 Testing your openoffice installation:

# Check you can run openoffice as a normal user:
  % su annotate
  $ vi .bashrc
  export PATH=/opt/openoffice.org3/program:$PATH
  $ source .bashrc

# On your local machine, run 'xhost +' and check your firewall
# can accept X connections on the normal TCP port (6000)
  $ export DISPLAY={your ip}:0.0
  $ soffice &

# on Ubuntu, the executable could be called 'ooffice' not 'soffice'

2.3 Configuring Annotate to use openoffice:

There is a test document in annotate/scripts which you can try, but first you need to check the settings in annotate/scripts/ (a sample is provided in

# As the annotate user, check the paths in the config file: scripts/
  % su annotate
  $ cd /var/www/html/annotate/scripts          # ... where you installed annotate
  $ cp    # ... if not present
  $ vi

# If any of these are not correct, edit the file.
# The OOPYTHON setting points to the version of python which is
# bundled with the installation of openoffice from
# For Ubuntu, you can change this to your standard python install
#  (e.g. /usr/bin/python).  You will need the python-uno package  
#  installed for calling openoffice.

# The following command should convert 'sample.doc' to '/tmp/sample.pdf'
  $ ./ sample.doc /tmp/sample.pdf

2.4 Running openoffice in server mode

The test conversion above started openoffice, converted a document, then killed the openoffice process. This can take a few seconds for each document. You can avoid the openoffice startup time by running it in server mode, listening to a socket for incoming documents. This also has the advantage that you can run openoffice as a separate user from the Apache one (e.g. you could create a new user just to run openoffice).

# As the user you want to run openoffice as:
# (as root)
  % adduser openoffice

  % su openoffice

# (as the 'openoffice' user)
  $ cd /var/www/html/annotate/scripts
  $ ./                 # this starts up openoffice

# Check that the 'soffice.bin' process is running:
  $ ps aux | grep soffice

# Try converting a test file again a couple of times, running
# as the apache user:
# (as root)
  % su apache

  $ ./ sample.doc /tmp/test5.pdf
  $ ./ sample.doc /tmp/test6.pdf

# All being well, the second time should have been much
# faster, as you avoid the startup time of openoffice.

# You need to keep the openoffice process alive all the time
# e.g. using a cron job, as your chosen openoffice user, adding an
# entry like:
# as root...
  % su openoffice
  $ crontab -e
* * * * * bash /var/www/html/annotate/scripts/ >/dev/null 2>&1

While openoffice is running in server mode, the conversions from office formats should be much faster.
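
For readers who want to see roughly what such a keep-alive start script might contain, here is a hedged sketch. It is not the script shipped with Annotate: the port (8100), the localhost binding and the function names are assumptions, and the options use the single-dash spelling of OpenOffice 3.x. The soffice path is the default install location mentioned earlier.

```shell
#!/bin/sh
# Sketch of a keep-alive wrapper for OpenOffice in server mode.
# Assumptions (not from this guide): port 8100 and binding to 127.0.0.1.
SOFFICE=${SOFFICE:-/opt/openoffice.org3/program/soffice}
OO_HOST=${OO_HOST:-127.0.0.1}
OO_PORT=${OO_PORT:-8100}

# Build the value of the -accept option, which tells soffice which
# socket to listen on for conversion requests.
oo_accept_arg() {
    printf 'socket,host=%s,port=%s;urp;' "$1" "$2"
}

# Succeeds if a process named exactly 'soffice.bin' is running.
oo_is_running() {
    ps -eo comm= | grep -qx 'soffice.bin'
}

# Start soffice headless in the background unless it is already running.
oo_ensure_running() {
    oo_is_running && return 0
    "$SOFFICE" -headless -nofirststartwizard -norestore \
        "-accept=$(oo_accept_arg "$OO_HOST" "$OO_PORT")" >/dev/null 2>&1 &
}
```

Calling oo_ensure_running once a minute from cron is harmless when openoffice is already up, because the function returns before starting a second copy.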

2.5 Troubleshooting OpenOffice installs on Fedora, RedHat and CentOS

If you are installing on RedHat, Fedora or CentOS, check this blog post [external] for a solution to a known bug with the yum installation system for openoffice, which can break the openoffice install if automatic updates are switched on.

If the CRON job above is not starting the openoffice process properly, then check /var/log/cron for messages; if you see entries like 'Error: PAM Access Problems', then you may need to explicitly enable the cron daemon to run tasks as the openoffice user, with a line in /etc/security/access.conf.

2.6 Updating the php/ file to enable openoffice support

To enable support for the office formats when you upload a document to your annotate server, edit your php/ file as follows:

# Edit the setting in php/ to point to the script:
  % su annotate
  $ cd /var/www/html/annotate/php        # ... or your install directory
  $ vi
  $ooshcommand="/bin/bash /var/www/html/annotate/scripts/";

# Test it out by uploading a short Word / openoffice file on your
# documents.php page.

2.7 Using openoffice to convert uploaded images to PDF [new Dec 2009]

You can configure openoffice to convert uploaded image files to PDF and then use the same annotation interface as text documents (by default, image files are shown using the HTML annotation interface, in a separate frame). To set this up, add the line below to your file:

// Optional: Uncomment to convert uploaded images to pdf using OO 
  $convertUploadedImagesToPDF = 1;

2.8 Installing Windows Fonts for OpenOffice on Linux

By default, an openoffice installation on Linux will not have access to the standard Windows fonts (Arial, Verdana etc), which can cause problems with the Word to PDF conversion for documents created on a Microsoft operating system. Unlike PDF files, Word files do not include the fonts they depend on, and assume the recipient has the relevant fonts installed. However, it is possible to install the Windows standard fonts on Linux, which greatly improves the quality of PDFs generated from Word files.

# Install microsoft truetype fonts on Ubuntu / debian:
  % sudo apt-get install msttcorefonts

External blog entries have details on installing TrueType fonts on Linux, and notes on using the new MS Vista fonts on Linux. The basic steps for installing TrueType fonts and making them available to applications (including openoffice) are outlined below. On Windows systems, your fonts will be installed to a path like: C:\WINDOWS\Fonts\*.ttf. You will have to restart openoffice after installing fonts.

# Check you have the standard PostScript Type1 fonts installed:
# (e.g. on Fedora:)
  % yum install ghostscript-fonts

# Steps for installing additional TrueType fonts on Linux
  % cd /usr/share/fonts/truetype
  % mkdir myfonts
  % cd myfonts
# ... copy the *.ttf files to myfonts/
  % mkfontdir
  % fc-cache
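
After rebuilding the font cache, it can be useful to confirm that the fonts are actually visible. The following is a rough sketch under the assumption that fontconfig's fc-list is available; the function name missing_fonts is illustrative, and the match is a simple case-insensitive substring test.

```shell
#!/bin/sh
# missing_fonts FAMILY...: reads a font listing on stdin (for example the
# output of fc-list) and prints each FAMILY that does not appear in it.
# Matching is a case-insensitive substring test, so this is a rough check.
missing_fonts() {
    listing=$(cat)
    for family in "$@"; do
        printf '%s\n' "$listing" | grep -qiF -- "$family" || printf '%s\n' "$family"
    done
}
```

For example, fc-list | missing_fonts Arial Verdana prints nothing if both families are installed.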

3. File upload progress meter

To display a progress bar during upload, you need to install a Perl script into the cgi-bin/ directory, and copy the configuration settings from '' to '':

# If you haven't set up your server to run cgi-bin scripts yet:
#   As root, check the Apache configuration file
#   (e.g. in /etc/httpd/conf or /etc/apache2/apache2.conf or /mnt/install/apache/conf/httpd.conf)
#   If you haven't set up your apache for cgi-bin, check
#   that the mod_alias module is installed.
  %   vi /etc/httpd/conf/httpd.conf
#   The cgi-bin setting will be in a line like:-
  ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

# You can put the Annotate perl scripts in a subdirectory:
# as root:
  % mkdir /var/www/cgi-bin/annotate
  % chmod a+rx /var/www/cgi-bin/annotate
  % cd /var/www/cgi-bin/annotate
  % cp /var/www/html/annotate/cgi-bin/* .

# The settings are in a perl file 'cgi-bin/' - edit this
# to make sure the paths are set correctly. The important
# setting here is the temporary directory to use for uploads,
# as it must agree with the setting in php/
# You can leave it as the default (/tmp/annotate), or change
# it in both places.

  % cp
  % vi

  % chmod a+x *

# Test running the perl script from the command line.
# If you get any error messages here, the script won't
# run from the cgi-bin directory either.

  % ./
# ... this should do nothing.

# Try visiting:  http://your.server/cgi-bin/annotate/ from a browser
# to check the cgi-bin is working. It should print out a list of environment
# variables to the browser.

# Gotchas: if you get 'Internal Server Error' it could be
# caused by having DOS not Unix return characters at the end of
# the script lines.  You can fix this with the dos2unix command.
# Also worth checking the log files (somewhere like /var/log/httpd)
# Some perl / cgi-bin installations have security settings which 
# will only run cgi-bin programs if they have the same owner/group
# as the cgi-bin user, so you may need to set the owner of the cgi-bin
# perl scripts if this is the case.

# Edit the php/ file to switch the file upload progress bar on:
  % su annotate
  $ cd /var/www/html/annotate/php
  $ vi

  $uploadtmpdir = "/tmp/annotate";

  $fileuploadprogress = true;
  $fileuploadcgibin = "/cgi-bin/annotate/";

# Try uploading a pdf file by browsing to your documents.php page;
# you should now see a blue progress bar during the upload.
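
If the upload fails with 'Internal Server Error', the DOS line-ending gotcha mentioned above can be checked for before reaching for dos2unix. This is a small sketch with an illustrative function name; it simply looks for a carriage return anywhere in the file.

```shell
#!/bin/sh
# has_dos_line_endings FILE: succeeds if FILE contains a carriage return,
# which usually means it was saved with DOS (CRLF) line endings.
has_dos_line_endings() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}
```

A loop such as: for f in /var/www/cgi-bin/annotate/*; do has_dos_line_endings "$f" && echo "$f"; done lists the scripts that need converting.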

4. Generating thumbnails of snapshotted websites

To generate thumbnails of websites (displayed on your index pages next to the list of notes), you need to install a web browser (firefox) and a virtual X framebuffer (Xvfb). You will also need the netpbm tools (pnmtopng, pnmscale).

# As root:
# (on fedora core 4):-
  % yum install xorg-x11
  % yum install xorg-x11-Xvfb

# (on fedora 8):-
  % yum install xorg-x11-server-Xorg
  % yum install xorg-x11-server-Xvfb
  % yum install xorg-x11-fonts-*
  % yum install xorg-x11-apps-*
# (on ubuntu)
  % sudo apt-get install xvfb
  % sudo apt-get install firefox
# (on fedora, install firefox with:)
  % yum install firefox

On some systems the thumbnail generation will be run as the apache user. On many systems apache runs as a restricted user, so to enable the apache user to run firefox you must first create a home directory for it (see section 1 above).

# Check you can su to the apache user:
#   (on Ubuntu, this is 'www-data' not 'apache'):
#    su www-data
  % su apache

# Try running firefox as the apache user; store
# the settings in the profile 'test'
# On your local machine, run 'xhost +'
  $ export DISPLAY={your IP}:0.0
  $ firefox -CreateProfile test

# Run firefox with the display on your X window:
  $ firefox -P test

# Resize the window to be about 1000 x 1024 pixels.
# When firefox starts again, it will keep
# the size, which will be used for the screenshots
# for the thumbnails.

# At this point, you also need to switch off the
# 'Restore session' window which will appear on
# restarting firefox if it crashes for any reason.
# 1. Type 'about:config' in the location bar.
# 2. go to the 'browser.sessionstore.enabled' setting
# 3. change the setting to false (double-click on the entry)

# Quit firefox, and the settings will be saved in the profile.

To run firefox on the server, you will need to have the Xvfb frame buffer running all the time. One way to do this is to set up a cron job to check Xvfb and start it if it is not running - this will also automatically restart the X display if the process dies for any reason.

# As root...
  % su annotate
# We will run the X framebuffer as the 'annotate' user.

# Check that you can start Xvfb manually:
  $ cd /var/www/html/annotate/scripts/
  $ ./

# If this works ok, check you can run firefox using the Xvfb display:

  $ export DISPLAY=:1
  $ firefox &

# You won't see any output, as firefox is displaying to 
# the Xvfb virtual frame buffer.
# Kill the firefox process if you get no error messages.
# You can use 'jobs' to find the process number, e.g.:
  $ kill %1

# To ensure the framebuffer is running all the time, you
# can add a CRON job.

  $ export DISPLAY={your ip}:0.0

# Edit the cron list for the 'annotate' user.
  $ crontab -e

# Add a line like (with the correct path to
   * * * * * bash /var/www/html/annotate/scripts/ >/dev/null 2>&1

# You can check the Xvfb process is running after a minute using 'top'
# It will create an X server on localhost:1.0
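
The 'start it if it is not running' pattern used by the cron job can be sketched as below. This is not the script supplied with Annotate; the helper name, the pid file location and the screen geometry are assumptions for illustration. It records the process id in a file and only starts the command again when that process has gone away.

```shell
#!/bin/sh
# ensure_running PIDFILE COMMAND [ARG...]: start COMMAND in the background
# unless the process recorded in PIDFILE is still alive.
# Illustrative helper; not the script shipped with Annotate.
ensure_running() {
    pidfile=$1
    shift
    if [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null; then
        return 0                      # still running, nothing to do
    fi
    "$@" >/dev/null 2>&1 &
    echo $! > "$pidfile"
}
```

A possible call for the framebuffer would be: ensure_running /tmp/xvfb-annotate.pid Xvfb :1 -screen 0 1280x1024x24 (display :1 matches the DISPLAY setting used above). A pid file can go stale if the process id is reused by another program, so checking by process name is a reasonable alternative.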

To test it out, you can take a snapshot of a web page, and look at its index page - it should show 'Generating thumbnail...' and then a small image. The image is stored somewhere like: /var/www/html/annotate/docs/{date}/{code}/small.png

If the thumbnail is not generated correctly, you can also try running the thumbnail generator from the command line:

# as root...
  % su apache        # or your apache user, e.g. www-data on Ubuntu
  $ cd /var/www/html/annotate/scripts/
  $ ./
# ... should generate a thumbnail '/tmp/small.png'
# if it doesn't work, check the paths in the scripts/ settings

5. Enabling 'export PDF with notes'

There is Java code included to generate a PDF with the notes attached (from the Tools > Export PDF menu option). To enable this, you need to have installed Java on your server:

# as root...
# (on ubuntu)
  % sudo apt-get install sun-java6

# (other linux distributions will have different package names)

If 'java' isn't installed to the standard path, you can set the version of java to use with the $javaexe setting in php/

// e.g. ... in
  $javaexe = "/opt/jre1.6.0/bin/java";

6. Set the initial tags available to new users

Each user account maintains a list of tags which have been used by that user, and these are used to populate the tags chooser for new notes. You can initialise this list for new user accounts by editing the text file 'php/inittags.txt' - the format is plain text, one line per tag.

  cd php
  vi inittags.txt

7. Enabling email notifications

To enable email notifications on the server (so users get sent an email when someone adds a comment to a document), you need to set up a regular CRON job to check for news. There is a PHP script php/sendEmailNotifications.php in your installation which you can run by viewing it in your browser - to set up a cron job to fetch this URL every 10 minutes:

# as root...
  % su annotate
  $ crontab -e
*/10 * * * * /usr/bin/curl "" -o - >/dev/null 2>&1

Note that your users will have to choose to switch on email notifications for their account - there is a link on the home page, and the account page lets you control detailed settings (e.g. for immediate, hourly or daily updates).

8. Advanced configuration settings

A number of installation settings are present in the file which can be used to change the standard behaviour of Annotate, and use your own logo / branding / messages. The basic settings are below, see your file for details of all the options.

// Optional: Change the default note edit/delete/content settings.
// $authorOnlyDelete = 1;   // Uncomment so doc owner can't delete others' comments.
// $authorOnlyEdit = 1;     // Uncomment so doc owner can't edit others' comments.
// $fixOnReply = 1;         // Uncomment to stop notes with replies being deleted.
// $anyEdit = 1;               // [added v3.0.21] Uncomment to allow any viewer to edit others' comments.
// $allowJavascriptNotes = 1;  // [added May09] Uncomment to allow javascript: urls in notes
// $enforceLinkSharable = 1;  // [added Dec09] Require invite or linkSharable setting to access doc via link

// Optional: Customize the welcome message in the banner of home.php
// $todaysMessage = "Welcome to Annotate and hello world";

// Optional: Override the Annotate logo displayed in the 
// top left with your own logo. You can include html;
// use an absolute URL for images, e.g.:
// $customBannerLogo = "<img border='0' src='' />";

// Optional: Don't send users emails on creating accounts.
// $noNewAccountEmail = 1;

// Optional: Don't give users a welcome document.
// $noSampleDocument = 1;  

// Optional: Customize the footer used when exporting PDFs with notes.
// The default footer just has the page number of the orig document.
// For a footer like this one uncomment the settings below:
//   "Page 1. {document title} - generated by user123 - notes by [joe,jill] - visit"
// $pdffooter_title       = 1; // add document title too.
// $pdffooter_generatedby = 1; // add who it was generated by.
// $pdffooter_annotators  = 1; // add annotators too.
// $pdffooter = " - visit"; 

9. Support large document uploads

Annotate doesn't impose any file size limit for uploads itself, but there will be limits set in your "php.ini" apache/php configuration file. You can find out what they are set to on your system, and where your php.ini config file is, by pointing your browser at a 'phpinfo.php' file which contains the line:

 <?php phpinfo(); ?>

Relevant php.ini settings are: file_uploads, upload_max_filesize, max_input_time, memory_limit, max_execution_time, post_max_size. You may want to increase the default settings, e.g. to:

# Sample settings for php.ini:

You will need to restart your web server for any changes to take effect - you can view a phpinfo.php file to check their values.
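
The current values can also be read straight from the php.ini file on the command line. The helper below is a minimal sketch with an illustrative name; the location of php.ini differs between systems (the phpinfo page reports the file actually loaded), so the path is passed as an argument.

```shell
#!/bin/sh
# show_upload_limits FILE: print the upload-related settings found in a
# php.ini file, skipping lines that are commented out with ';'.
show_upload_limits() {
    grep -E '^[[:space:]]*(file_uploads|upload_max_filesize|max_input_time|memory_limit|max_execution_time|post_max_size)[[:space:]]*=' "$1"
}
```

For example: show_upload_limits /etc/php.ini (assuming that is where your php.ini lives).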

10. Backing up your documents and notes

All documents are stored in the docs/ folder; all notes are stored in the private/ folder. You should take regular backups of these folders, e.g. by running a cron job which uses the rsync tool to make an incremental remote copy on another server.
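
A backup along these lines might look like the sketch below. The function name and the RSYNC override are illustrative, and the destination is whatever remote path you choose; rsync only transfers files that have changed since the previous run, which is what makes the copy incremental.

```shell
#!/bin/sh
# backup_annotate SRC DEST: copy the docs/ and private/ folders under SRC
# into DEST with rsync, which only transfers files that have changed.
# RSYNC can be overridden (e.g. RSYNC=echo) to preview the command.
backup_annotate() {
    ${RSYNC:-rsync} -a "$1/docs" "$1/private" "$2/"
}
```

For example: backup_annotate /var/www/html/annotate backupuser@backuphost:/srv/annotate-backup (the user, host and path here are hypothetical). Saved in a script, this can be called from a nightly cron entry.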

11. Apache cache and cookie settings

Annotate has been designed to make use of client web browser caches to minimize the number of server requests for pages and notes. For Apache, a sample htaccess-cache.txt file is supplied which should be copied to annotate/.htaccess. This uses the mod_expires apache module to add an HTTP header to allow browsers to cache static content (such as page images). You need to make sure that the optional mod_expires module is enabled, so uncomment the lines below in your httpd.conf apache config file and restart the web server:

LoadModule expires_module modules/
LoadModule headers_module modules/

On Ubuntu, optional apache modules are enabled by linking from /etc/apache2/mods-enabled/ to the files in /etc/apache2/mods-available/:

$ cd /etc/apache2/mods-enabled
$ sudo ln -s ../mods-available/rewrite.load rewrite.load
$ sudo ln -s ../mods-available/headers.load headers.load
$ sudo ln -s ../mods-available/expires.load expires.load

You can force a page reload at any time from a browser using shift-reload (on Firefox) or ctrl-reload (on IE), or by clearing your browser cache and then reloading. The default .htaccess file is below:

ExpiresActive On
ExpiresDefault "access plus 1 week"

11.1 Enabling compression

Configuring your apache server to serve up compressed versions of html and javascript speeds up the annotate server significantly as transfers of notes and code to the browser will be faster. You need to enable the mod_deflate apache module and edit your .htaccess file or httpd.conf settings as below: (sample provided in htaccess-cache-gzip.txt)

ExpiresActive On
ExpiresDefault "access plus 1 week"

<Files *.js>
SetOutputFilter DEFLATE
</Files>

<Files *.css>
SetOutputFilter DEFLATE
</Files>

<Files *.html>
SetOutputFilter DEFLATE
</Files>

<Files *.txt>
SetOutputFilter DEFLATE
</Files>
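
To confirm that compression is actually being applied, the response headers can be inspected from the command line. This is a hedged sketch: the function name is illustrative, and it only looks for a Content-Encoding header naming gzip or deflate.

```shell
#!/bin/sh
# is_compressed: reads HTTP response headers on stdin and succeeds if a
# Content-Encoding header announces gzip or deflate.
is_compressed() {
    grep -qiE '^content-encoding:[[:space:]]*(gzip|deflate)'
}
```

With curl, the headers of a static file can be piped in, for example: curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' URL | is_compressed && echo compressed, where URL is the address of any .js, .css, .html or .txt file on your annotate server.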

11.2 Cookies and embedding an iframe with IE

If you are embedding an Annotate panel in another application which is hosted on a different site from your Annotate server, you may encounter login problems with Internet Explorer, which by default blocks 3rd party cookies (such as the PHP session cookie) needed for logins to Annotate to work.

Solutions to this are:


12. Creating a robots.txt file for search engines

Search engines will not index any of the documents uploaded to your annotate server unless you post a public link to the document on a website. You can also create a robots.txt file to prevent search engines indexing your content even if a link is published to the web. A sample is provided in robots-sample.txt which you can edit and copy to the root of your web directory so it can be found as - a sample is given below:

User-agent: *
Disallow: /annotate/

13. Custom storage locations for documents and notes

By default, documents are stored in the docs/ folder, and notes in the private/ folder of your annotate installation. It is possible to configure these, so documents are stored in any path on your system using the docsdir and privatedir phpconfig settings. This can be useful if you want to store on a network drive, or just separately from the rest of the annotate install. These must also be specified if running using the Quercus java servlet implementation of PHP rather than regular apache-php.

The one complexity is that if you move the docs/ folder, you also need to edit your web server configuration to ensure that static content from is served from the new folder too.

$docsdir    = "c:/test/resin-4.0.9/webapps/ROOT/annotate/docs/";
$privatedir = "c:/test/resin-4.0.9/webapps/ROOT/annotate/private/";

# ... or on linux:
# $docsdir    = "/var/disk123/docs/";
# $privatedir = "/var/disk123/private/";

# NB you also need to configure your web server to serve
# static content from
# from the docsdir.

14. HTTPS install notes

Since v3.1.15 (Oct 2010) it has been possible to install annotate on an HTTPS server. You need to configure your web server with a certificate (e.g. for testing see this external guide). After this, you just need to configure the path to include https (see below), and access the server through an https: URL. Everything ought to work as normal for PDF and Word documents. However, HTML snapshots may display browser warnings because they load http: content as well as https: content.

The HTTPS support has been tested on Linux with Apache; contact us if you run into any issues on other configurations.

# sample config for https: 

Questions / problems:

Please email any questions to support [at]