high availability

This article continues the mini-series that started with the post Introduction to Firewall Builder 4.0. It is also available as a section in the "Firewall Builder Cookbook" chapter of the Firewall Builder Users Guide 4.0.

Firewall Builder 4.0 is currently in beta testing. If you find it interesting after reading this post, please download and try it out. Source code archives, binary deb and rpm packages for popular Linux distributions, and commercially distributed Windows and Mac OS X packages are available for download here.

In this post I demonstrate how Firewall Builder can be used to generate the firewall configuration for a clustered web server with multiple virtual IP addresses. The firewall runs on each web server in the cluster. This example assumes the cluster is built with heartbeat using "old" style configuration files, but the choice of high availability software is not really essential. I start with a setup that consists of two identical servers running Linux, and at the end of the article I demonstrate how this configuration can be converted to OpenBSD with CARP.
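
For readers unfamiliar with heartbeat's "old" style (v1) configuration: cluster resources are listed in /etc/ha.d/haresources, one line per resource group, starting with the preferred node name followed by the resources that node owns. The sketch below only prints such a line for a web server with several virtual IPs. The helper name, node name, addresses, interface and the httpd resource are illustrative assumptions, not values from the article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a heartbeat v1 haresources line.
# Usage: make_haresources NODE IFACE SERVICE VIP/PREFIX [VIP/PREFIX ...]
make_haresources() {
    local node=$1 iface=$2 service=$3
    shift 3
    local line=$node vip
    for vip in "$@"; do
        # IPaddr::address/prefix/interface brings the VIP up on IFACE
        line+=" IPaddr::${vip}/${iface}"
    done
    # The service goes last so it starts after the VIPs
    # (heartbeat starts resources left to right)
    printf '%s %s\n' "$line" "$service"
}

make_haresources web1 eth0 httpd 192.0.2.10/24 192.0.2.11/24
# prints: web1 IPaddr::192.0.2.10/24/eth0 IPaddr::192.0.2.11/24/eth0 httpd
```

The same line must be identical on both cluster nodes; only the node named first is the preferred owner of the virtual IPs.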

Linux Condor security and bug fix update

Condor is a specialized workload management system for compute-intensive jobs. It provides a job queuing mechanism, scheduling policy, priority scheme, and resource monitoring and management.

A flaw was found in the way Condor interpreted wildcards in authorization lists. Certain authorization lists using wildcards in DENY rules, such as DENY_WRITE or HOSTDENY_WRITE, that conflict with the definitions in ALLOW rules could permit authenticated remote users to submit computation jobs, even when such access should have been denied. (CVE-2008-3424)
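
The intended behavior is a simple precedence rule: a host that matches any DENY entry must be refused, even if an ALLOW entry also matches it. The sketch below illustrates that rule with bash glob matching. The function name and the comma-separated list arguments are assumptions for illustration; this is not Condor's implementation or its exact configuration syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: deny-overrides-allow check for one host.
# Usage: host_authorized HOST "ALLOW_LIST" "DENY_LIST"  (lists comma-separated)
host_authorized() {
    local host=$1 pattern
    local -a allow deny
    IFS=, read -ra allow <<< "$2"
    IFS=, read -ra deny  <<< "$3"

    # Check DENY first: any match refuses access, whatever ALLOW says.
    for pattern in "${deny[@]}"; do
        pattern=${pattern// /}                  # drop stray spaces around entries
        [[ $host == $pattern ]] && return 1     # unquoted right side = glob match
    done
    for pattern in "${allow[@]}"; do
        pattern=${pattern// /}
        [[ $host == $pattern ]] && return 0
    done
    return 1   # default deny when nothing matches
}

host_authorized node7.example.com "*.example.com" "node7.example.com" || echo "denied"
```

Evaluating DENY before ALLOW means a broad wildcard in ALLOW (such as `*.example.com`) can never re-open access to a host that a DENY entry excludes.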

How do I fix this bug in Condor?

Type the following command to fix this bug:
# up2date -u
If you are using Red Hat Enterprise MRG 1, enter:
# yum update

Bugs fixed in this update

* the /etc/condor/condor_config file started with "What machine is your
central manager?". The following line was blank, instead of having the
"CONDOR_HOST" option, causing confusion. The "What machine..." text is now

* condor_config.local defined "LOCK = /tmp/[lock file]". This is no longer
explicitly defined; however, lock files may be in "/tmp/", and could be
removed by tmpwatch. A "LOCK_FILE_UPDATE_INTERVAL" option, which defaults
to eight hours, has been added. This updates the timestamps on lock files,
preventing them from being removed by tools such as tmpwatch.

* when a "SCHEDD_NAME" name in condor_config ended with an "@", the
system's hostname was appended. For example, if "SCHEDD_NAME = test@" was
configured, "condor_q -name test@" failed with a "Collector has no record
of schedd/submitter" error. Now, the hostname is not appended when a name
ends with an "@". In High Availability (HA) Schedd deployments, this allows
a name to be shared by multiple Schedds.

* when too few arguments were passed to "condor_qedit", such as
"condor_qedit -constraint TRUE", a segfault occurred. Better argument
handling has been added to resolve this.

* due to missing common_createddl.sql and pgsql_createddl.sql files,
it was not possible to use Quill. Now, these files are included in

* "condor_submit -dump ad [file-name]" caused a segfault if the [file-name]
job contained "universe = grid".

* previously, a condor user and group were created if they did not exist,
without specifying a UID or GID. Now, UID and GID 64 are used.
This change has no effect when upgrading the condor packages.
If an existing condor user and group are manually changed, problems with
file ownership will occur.
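
The LOCK_FILE_UPDATE_INTERVAL fix works because tmpwatch decides what to delete by file age: a lock file whose timestamps are refreshed regularly never looks old enough to remove. Condor does this internally, so nothing needs to be run by hand; the sketch below just shows the same idea in shell. The function name, lock directory and `*.lock` naming pattern are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of periodic lock-file refreshing.
# Update the access and modification times of every *.lock file in DIR,
# so age-based cleaners such as tmpwatch do not consider them stale.
refresh_locks() {
    local dir=$1 f
    for f in "$dir"/*.lock; do
        [[ -e $f ]] || continue   # glob matched nothing
        touch -c -- "$f"          # -c: never create files, only update times
    done
}

# Example loop: refresh every 8 hours (28800 s), the default interval above.
# while true; do refresh_locks /tmp/condor-locks; sleep 28800; done
```

Any refresh interval shorter than tmpwatch's age threshold has the same effect.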

Configuration changes (from the Condor release notes - see link below):

* a new CKPT_SERVER_CHECK_PARENT_INTERVAL variable sets the interval at
which a checkpoint server checks whether its parent is running. If the
parent server has died, the checkpoint server shuts down.

* a new CKPT_PROBE variable defines the executable for the helper process
Condor uses to obtain information about the CheckpointPlatform attribute.

* STARTER_UPLOAD_TIMEOUT now defaults to 300 seconds.

* new variables (booleans) PREEMPTION_REQUIREMENTS_STABLE and
PREEMPTION_RANK_STABLE configure whether attributes used in

default value of 5, defines the number of simultaneous WS destroy commands
that can be sent to a server for type gt4 grid universe jobs.

* now, VALID_SPOOL_FILES automatically includes the "SCHEDD.lock" lock file
for condor_schedd HA failover.

* the default value for SEC_DEFAULT_SESSION_DURATION has been changed from
8640000 seconds (100 days) to 86400 seconds (one day).
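
CKPT_SERVER_CHECK_PARENT_INTERVAL describes a common watchdog pattern: a child process periodically checks whether its parent still exists and exits if it does not. The shell sketch below shows the pattern with `kill -0`, which sends no signal and only tests whether a PID can be signalled. The function names and the shutdown action are hypothetical; this is not Condor's code.

```shell
#!/usr/bin/env bash
# Hypothetical watchdog sketch, not Condor's implementation.

# Return 0 if PID exists. kill -0 sends no signal; note that it also fails
# if the process belongs to another user (EPERM), so run as the same user.
pid_alive() {
    kill -0 "$1" 2>/dev/null
}

# Poll PID every INTERVAL seconds; return once it is gone.
wait_for_parent_exit() {
    local pid=$1 interval=$2
    while pid_alive "$pid"; do
        sleep "$interval"
    done
}

# Example: a long-running helper that shuts down when its parent dies.
# wait_for_parent_exit "$PPID" 60 && echo "parent gone, shutting down" && exit 0
```

Polling at a fixed interval is simple and portable; the trade-off is that the helper may outlive its parent by up to one interval.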

Important: these updated packages upgrade Condor to version 7.0.4. For a
full list of changes, refer to the Condor release notes:

condor users should upgrade to these updated packages, which resolve these

Xen is one of the leading virtualization platforms, and you can use it to implement HA clusters. However, there are a few issues you must be aware of when handling failures in a high-availability environment. This article explains the configuration options available with Xen:

The idea of using virtual machines to build highly available clusters is not new. Some software companies claim that virtualization is the answer to your HA problems; of course, that's not true. Yes, you can reduce downtime by migrating virtual machines to another physical machine for maintenance purposes or when you think hardware is about to fail, but if an application crashes you still need to make sure another application instance takes over the service. And by the time your hardware fails, it's usually already too late to initiate the migration.

So, for each and every application you still need to decide whether it must be constantly available, whether you can afford for it to be down for some time, or whether your users won't mind having to log in again when one server fails.

=> Using Xen for High Availability Clusters [onlamp.com]
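
Having another application instance take over is usually driven by a failure detector: the standby node watches for periodic heartbeats from the active node and claims the service when they stop for longer than a timeout. A minimal sketch of that decision in shell follows. The function name and the beat file are hypothetical; real cluster managers such as heartbeat exchange messages over the network rather than checking a file.

```shell
#!/usr/bin/env bash
# Hypothetical failure-detector sketch for an active/standby pair.
# should_take_over LAST_BEAT NOW TIMEOUT
#   LAST_BEAT, NOW: seconds since the epoch; TIMEOUT: seconds
# Returns 0 when the active node has been silent for longer than TIMEOUT.
should_take_over() {
    local last=$1 now=$2 timeout=$3
    (( now - last > timeout ))
}

# Example: the active node touches a (hypothetical) beat file every few seconds.
# beat_file=/var/run/app.beat
# if should_take_over "$(stat -c %Y "$beat_file")" "$(date +%s)" 30; then
#     echo "active node silent for more than 30 s, starting local instance"
# fi
```

The timeout should cover several missed heartbeats, so that one delayed beat does not make both nodes run the service at the same time.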

My previous article about iSCSI and NAS storage brought in a couple of questions. Here is an interesting one from my mail bag:

I have 5 Debian Linux servers with an HP SAN box. Should I boot from SAN?

No. Use centralized network storage only for shared data or high availability configurations. Technically, you can boot from SAN and configure the system that way; however, I don't recommend booting from SAN or any other central server unless you need diskless nodes:

[a] Use local storage - Always use local storage for the /boot and / (root) filesystems.

[b] Keep it simple - Booting from SAN volumes is a complicated procedure. Most operating systems are not designed for this kind of configuration, so you need to modify scripts and the boot procedure.

[c] SAN booting support - Your SAN vendor must support booting a Linux server from the SAN. You need to configure the HBA and SAN according to the vendor's specification, and you depend entirely on the SAN vendor for drivers and firmware (HBA BIOS) to get things working properly. General principle: don't put all your eggs in one basket, err, one vendor ;)

[d] Other factors - A proper Fibre Channel topology must be used. Make sure multipathing and redundant SAN links are in place, and that the boot disk LUN is dedicated to a single host, etc.

As you can see, the complications quickly add up, which is why I don't recommend booting from SAN.
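
To check whether an existing server follows advice [a], look at which block devices back / and /boot. The sketch below reads mount entries in /proc/mounts or /etc/fstab format (device first, mount point second) and flags sources that look like SAN or multipath devices. The function name and the device-name patterns are heuristic assumptions, since naming varies by distribution and multipath configuration, so treat a warning as a prompt to investigate, not a verdict.

```shell
#!/usr/bin/env bash
# Hypothetical heuristic: warn if / or /boot appears to live on SAN storage.
# Reads "device mountpoint fstype options dump pass" lines on stdin.
check_local_boot() {
    local dev mnt rest status=0
    while read -r dev mnt rest; do
        [[ $mnt == / || $mnt == /boot ]] || continue
        case $dev in
            # Assumed patterns: dm-multipath maps with user-friendly names,
            # and by-path links for Fibre Channel or iSCSI LUNs
            /dev/mapper/mpath*|/dev/disk/by-path/*-fc-*|/dev/disk/by-path/*-iscsi-*)
                echo "WARNING: $mnt is on $dev (possible SAN/multipath device)"
                status=1 ;;
        esac
    done
    return "$status"
}

# Usage on a live system:
# check_local_boot < /proc/mounts
```

Devices mounted by UUID or LABEL in /etc/fstab are not resolved by this sketch, so check /proc/mounts as well.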

Arun Singh shows how to create shared storage on SUSE Linux Enterprise Server 10 using OCFS2 (Oracle Cluster File System v2) and Xen virtualization technology. Enterprise-grade shared storage can cost a lot of money, but no expensive shared storage hardware is used here. The information provided also works with real shared storage:

This paper is meant to help you understand the steps involved in creating shared storage without using expensive shared storage hardware. Using this information you can create shared storage used by all Xen guest OSes and the host, avoiding copying files between guest OSes. I hope you will find this paper useful.

You can easily port these instructions to Red Hat or any other Linux distro. You can also use Red Hat's Global File System (GFS). We often use Fibre Channel or iSCSI devices for GFS shared storage.

Creating shared storage on SUSE Linux Enterprise Server 10 using Xen and OCFS2 [novell.com]
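
Every node that mounts an OCFS2 volume, the Xen host and its guests alike, needs the same /etc/ocfs2/cluster.conf describing the cluster members. The sketch below generates that file from a list of name:IP pairs. It follows the node/cluster stanza layout commonly shown in the OCFS2 documentation, but the helper name, cluster name, port 7777, node names and addresses are assumptions, so compare the output with the ocfs2-tools documentation for your release before using it.

```shell
#!/usr/bin/env bash
# Hypothetical generator for an OCFS2 cluster.conf.
# Usage: make_cluster_conf CLUSTER_NAME name:ip [name:ip ...] > cluster.conf
# Each node name must match that node's hostname.
make_cluster_conf() {
    local cluster=$1 entry name ip number=0
    shift
    for entry in "$@"; do
        name=${entry%%:*}        # text before the first colon
        ip=${entry#*:}           # text after the first colon
        # One "node:" stanza per member; parameters are tab-indented.
        printf 'node:\n\tip_port = 7777\n\tip_address = %s\n\tnumber = %d\n\tname = %s\n\tcluster = %s\n\n' \
            "$ip" "$number" "$name" "$cluster"
        number=$(( number + 1 ))
    done
    # Closing "cluster:" stanza; node_count must match the stanzas above.
    printf 'cluster:\n\tnode_count = %d\n\tname = %s\n' "$#" "$cluster"
}

# Example: Xen host plus one guest (names and addresses are illustrative).
make_cluster_conf ocfs2 xenhost:192.168.1.10 guest1:192.168.1.11
```

Generating the file from one list and copying it to every node avoids the node numbers or node_count drifting apart between machines.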

On a related note, there is also an article about creating a highly available VMware Server environment on a Debian Etch system.