
The Next Generation of Example42 Puppet modules

Update 20130902: Read the post Next Gen Modules: Lessons Learned for current ideas and considerations about NextGen modules

Introduction

The Example42 Puppet modules collection has tried to establish a standard, integrated, reusable and coherent way to manage Puppet modules, based on the principles presented in a pair of PuppetCamp presentations: Puppet Modules Standards and Interoperability and ReUse Your Modules!

It was initially developed and conceived for Puppet versions 0.2x, and it introduced some concepts that I've found quite useful in various situations: monitoring and firewall abstraction, an "include and play" approach that doesn't enforce custom logic, easily extendable multiple operating system support, an attempt (only partially successful) to separate "core" module elements from custom ones, and a standard structure that allows scaffolding and relatively quick creation and upgrade of modules.

Still, experience and the evolution of the Puppet language have suggested new and better ways of "doing things with Puppet" and, more importantly, most of the modules are going to have issues (as will probably a relevant part of the existing Puppet code in the world) when Puppet 2.8 is released and dynamic variable scoping is discontinued.

For this reason I've decided to rewrite the whole module set from scratch and make a new generation of Puppet modules, one that is going to be compliant only with Puppet Masters newer than version 2.6 and will benefit from all the experience (mistakes?) made with the current modules.

The effort is not as big as one may think, given the way the modules are organized (most of the module-specific changes are in params.pp, and custom module-specific defines are placed in dedicated files that can be imported) and the fact that module generation is based on the scaffolding of a "foo" template.

Still there's much work to do ...

Existing and NextGen features

You can find the next generation of Puppet modules at http://github.com/example42/puppet-modules-nextgen

Consider that this module set is currently experimental work, not ready for production nor even for testing, but it already has the base structure and can give glimpses of what it is going to become.

All the application-related modules are git submodules contained in independent git repositories (a first fix and evolution of the older one-repo module set, which will allow better integration with the Puppet Module Forge and, possibly, better cherry picking of the desired modules), but some special modules and directories are part of the main git repo.

The main features of the modules are:

- Coherent and standardized structure, logic and usage based on best practices

- Cross OS support (main targets are RedHat and Ubuntu derivatives)

- Use of parametrized classes and fully qualified variables for Puppet 2.8 compliance, with support for a mixed approach (NextGen)

- Extreme customization options without any change to the core module (NextGen)

- Optional integration with Puppi

- Optional support for monitoring and firewalling abstractions

- Decommissioning support: you can remove (almost) whatever you've added with a module, monitoring elements included

- Auditing support: you can audit the changes that the modules would make to existing files, before applying them (NextGen)

- Integrated rspec-puppet tests (NextGen)

- Embedded documentation compliant with PuppetDoc

- Module scaffolding based on different kinds of basic foo templates

- Compact code optimized for compilation and reporting times (NextGen)

Parametrized classes and variables

All the main classes are now parametrized: you can pass all the parameters they use as explicit arguments.

This allows better introspection of the parameters used by the classes, with a coherent and standard API (at least within Example42 modules) to the modules' functionality, but it introduces new challenges in the definition of a whole Puppet setup:

- You can declare the same parametrized class only once

- Besides internal defaults, you have to explicitly declare all the parameters you want to pass.

Since I wanted to provide a rich and standard set of common parameters that allows users to affect the behavior of a class without changing it, this could result in a lot of similar and redundant code to manage these arguments (such as whether to enable automatic monitoring and which tools to use).

For this reason all the next-gen classes have these features and twists:

- all the parameters' defaults are defined in the params.pp class, where cross OS variations are managed and top scope variables, if set, are used to redefine the defaults (see the sketch after this list)

- you can therefore use top scope variables (such as the ones defined in an External Node Classifier) instead of parameters

- you can mix top scope variables and class parameters (at first look this might not sound like a good idea, but keep on reading...)

- parameters, if defined, always override the module's internal defaults and top scope variables.
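
Here is a minimal sketch of this pattern, with hypothetical values (the real params.pp classes are more complete):

class openssh::params {
  # A top scope variable, if set, redefines the module default
  $template = $::openssh_template ? {
    ''      => '',                  # module default: no custom template
    default => $::openssh_template,
  }
  # Cross OS variations are managed here too
  $service = $::operatingsystem ? {
    /(?i:Debian|Ubuntu)/ => 'ssh',
    default              => 'sshd',
  }
}

class openssh (
  $template = $openssh::params::template,
  $service  = $openssh::params::service
) inherits openssh::params {
  # Explicit parameters, when passed, override both the internal
  # defaults and the top scope variables
}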

So, let's take the openssh module (which is actually the one I'm using to test the different cases).

You can use it either in the old include way:

include openssh

or as a parametrized class:

class { "openssh" : }

By default this just installs the relevant package, starts the relevant service and doesn't change any configuration file.

You can provide parameters in two ways.

1- Define top scope variables (the ones you define in an ENC or in site.pp) and include the class:

$::openssh_template = "example42/openssh/sshd.config.erb" 
include openssh

2- Pass the arguments to the parametrized class:

class { "openssh":  
 template => "example42/openssh/sshd.config.erb",
}

and you can mix top scope variables and class parameters to manage site-wide settings, reducing the verbosity of common arguments:

$::monitor = true
$::monitor_tool = [ "nagios" , "puppi" , "monit" , "munin" ]
$::puppi = true
class { "openssh":  template => "example42/openssh/sshd.config.erb", }

The above does the same as:

class { "openssh":  
  template      => "example42/openssh/sshd.config.erb",
  monitor       => true,
  monitor_tools =>  [ "nagios" , "puppi" , "monit" , "munin" ],
  puppi         => true,
}

which, if repeated for many classes, can actually become quite redundant.

Needless to say, the values of the top scope variables can be assigned via Hiera, Extlookup or other functions that set specific values according to custom logic.
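
For example, a hypothetical site.pp could feed the site-wide top scope variables from Hiera (assuming the hiera() function, which on Puppet 2.6/2.7 comes as an add-on, is available):

# site.pp (hypothetical sketch): site-wide defaults from Hiera,
# with sane fallbacks as second argument
$monitor      = hiera('monitor', true)
$monitor_tool = hiera('monitor_tool', [ 'puppi' ])
$puppi        = hiera('puppi', true)

node default {
  include openssh
}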

Customize everything, modify nothing

You have already had a glimpse of the parameters you can pass to these classes: the path of the template to use for the main configuration file, for example.

The whole discussion about externalizing Puppet data and separating it from the module logic with tools like Hiera is, IMHO, a very important but not conclusive step towards full module reusability.

A module, in my opinion, should not enforce the way it provides its configuration files (static files, possibly based on an array of sources, templates, or even concatenated files); it should allow site-specific customizations where existing resources are modified or new ones added, and it should be adaptable to different scenarios.

To our openssh class you can pass parameters like:

class { "openssh":  
  source   => [ "puppet:///modules/lab42/openssh/sshd_config-${hostname}" , 
                "puppet:///modules/lab42/openssh/sshd_config" ], 
}

So you can specify the Puppet paths of the static files to source (in this case an array of alternatives, evaluated in order).

But you can also do something like:

class { "openssh":  
  source_dir       => "puppet:///modules/lab42/openssh/" , 
  source_dir_purge => false, 
}

in order to provide the whole content of the configuration directory from the path specified in source_dir (in this case, with the source_dir_purge option disabled, without removing any existing files on the local system that are not present in the source directory on the Puppet Master).

But if you prefer to use templates, or they fit your setup better, you can use the template argument instead of the source one (note that the source and template parameters can't coexist):

class { "openssh":  
  template => "lab42/openssh/sshd_config.erg" , 
  options  => { 
    'LogLevel => 'INFO',
    'PermitRootLogin => 'yes',
    'ListenAddress' => '0.0.0.0',
  },
}

Here, besides using a custom template, you can specify, as a hash, whatever custom options you may want to use in your template. They needn't be explicitly added as parameters of the openssh class, so you can actually provide custom values for your templates without declaring them in the class you call.

Given the above example, in your own lab42/openssh/sshd_config.erb template you can use the custom values provided in the options hash in this way:

[...]
# Direct usage example
PermitRootLogin <%= options['PermitRootLogin'] %>
LogLevel <%= options['LogLevel'] %>
# Conditional usage example, with default value setting
<% if scope.lookupvar('openssh::options')['ListenAddress'] then -%>
ListenAddress <%= options['ListenAddress'] %>
<% else -%>
ListenAddress 0.0.0.0 # Default value
<% end -%>

Now, even if this approach has its evident disadvantages (it's not exactly easy to use, and you need to place some extra logic in the template to manage defaults, in order to avoid the danger of blank fields for options not explicitly passed), it provides a method to affect, in a completely independent way, the main class's behavior without touching anything in its module.

Still, this might not cover all your customization needs: you may want to add custom resources to the basic openssh class, or override existing resources for which there's not already a class parameter. Well, you can specify a custom subclass to automatically include:

class { "openssh":  
  my_class => "openssh::lab42", 
}

This makes the openssh class include openssh::lab42, a custom class that may or may not inherit the main openssh class (inheritance is needed only when you have to modify existing resources defined in openssh).

Note that in order to allow class inheritance your custom class name should be something like openssh::myname (and not myname::openssh).
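
As an illustration, a hypothetical openssh::lab42 might look like this (the resource titles here are assumptions; check the actual module for the real ones):

class openssh::lab42 inherits openssh {
  # Override an attribute of a resource defined in the parent class
  # (assuming the module defines Service['openssh'])
  Service['openssh'] {
    require +> File['ssh_banner'],
  }
  # Add a site specific resource
  file { 'ssh_banner':
    path    => '/etc/ssh/banner',
    content => "Authorized access only\n",
  }
}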

So, if you would like (as good and recommended practice) to place your custom classes in a custom module (say, named myname), you might need to explicitly import your myname module, with all your custom classes, templates and files, somewhere, since class autoloading wouldn't work in this case (it actually depends on the Puppet Master's version: on 2.6 you need to explicitly import your myname module with the custom classes, on 2.7 it seems not to be necessary).

In any case, as usual, all these are options left to the module's user: usage of custom source files or whole configuration directories, addition of custom classes and resources, and usage of custom templates with extra options are all possible without any modification of the core module. But if you simply get the module and modify it directly to fit your needs, adding resources, custom arguments or whatever, you are free to do that.

Maybe this won't be the most "reusable" way, and it will break upstream compatibility (but GitHub pull requests are always welcome ;-), but it will work for you.

And this is what matters.

"Common" options and features

Most, if not all, of the next-gen modules are going to have a set of parameters that affect their basic behavior.

They are common in the sense that you'll find them in all the Example42 modules (most of these features are actually already present in the current set), but they are actually not so common in the world of Puppet modules.

Some are related to the decommissioning of resources: as anyone who uses Puppet knows, if you want to remove something deployed via Puppet you can't simply comment out or remove the Puppet code that placed it. You need to explicitly tell Puppet to remove the resource.

This can be a big PITA in some cases or can be just a matter of typing:

class { "openssh":  
  absent => true, 
}

This not only removes the openssh package, its configuration files and its service, but also all the relevant monitoring configurations that the module provided (more on that later). You get it? Decommissioning is automatically done even on nasty beasts like Nagios configurations, Monit control files and so on.

Similarly you can specify:

class { "openssh":  
  disable => true,
}

in order to keep the openssh package but disable its service (and the relevant monitoring).

And you can also set:

class { "openssh":   
  disableboot => true, 
}

for the specific cases (typically when the service is managed by a cluster) where you don't want the service to start at boot, but you also don't want Puppet to check if it's running and possibly stop it during a run.

Nothing new up to now: these features are already present in the current modules, even if managed in a less elegant way (there you have to include subclasses like openssh::absent, and the module is generally more verbose). A new addition is an option like this:

class { "openssh":   
  audit_only => true, 
}

which does what it suggests. When you specify it, the class doesn't change any existing configuration file: it just activates Puppet's audit metaparameter and lets you see the changes the class would make to your files (a sort of noop run with central reporting of the expected changes). It should be useful to spot and prevent the disasters you're going to distribute when applying your Puppet modules to existing nodes.
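
Internally, a pattern like the following can implement the option (a simplified sketch of the idea, not necessarily the exact NextGen code):

# When audit_only is true, never replace existing content: just audit it
$manage_file_replace = $audit_only ? {
  true    => false,
  default => true,
}
$manage_audit = $audit_only ? {
  true    => 'all',
  default => undef,
}

file { 'sshd_config':
  path    => '/etc/ssh/sshd_config',
  content => template($template),
  replace => $manage_file_replace,
  audit   => $manage_audit,
}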

There are various other parameters that you can pass to the class, such as: package, service, service_status, process, process_args, config_file_mode, config_file_owner, config_file_group, config_file_init, pid_file, data_dir, log_dir, log_file, port, protocol.

For these there's no "top scope variable" equivalent since, generally, they are for the module's internal use and are not normally expected to be modified, but you can still pass them as parameters to the class.

In the openssh/manifests/params.pp file the right values for the main operating systems (or at least the main Linux distros) are already set, but there are special cases where it can be nice to have the opportunity to alter them.

For example you may need to use custom package names, built internally, which use custom service and process names; or you may need to change the owner of the configuration file(s) to allow "modifications by non-privileged users ;-)"; or make some tweaks to the resources to monitor (various of the above parameters are used just for the monitoring abstraction and the Puppi stuff).
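
For instance (package, service and process names here are purely hypothetical):

class { "openssh":
  package => "acme-openssh-server", # internally built package
  service => "acme-sshd",
  process => "acme-sshd",
}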

Just note that changing these parameters may lead to untested and unpredictable results (for example, the port argument is used for monitoring and doesn't necessarily reflect a setting in the configuration file, unless it's explicitly used in a relevant template).

Finally, talking about "common" features, the modules are supposed to have cross operating system compatibility: they should work on RedHat 5/6 and derivatives, Debian 5/6 and derivatives, and Ubuntu 10.04 and later (I'm fed up with quarreling with the Ubuntu 8.04 operatingsystem fact). Suse support will probably be introduced sooner or later, Solaris too, eventually. Generally speaking, support for alternative systems is developed upon necessity: when I have to work on these OSes, the relevant support will be introduced (call it job driven development ;-).

Monitoring and firewalling abstraction

One of the features I'm proudest of in the Example42 modules is the monitoring abstraction approach: in the modules I define what to monitor, not how.

It has proven to be extremely powerful and useful, even if I've managed to write "connectors" only for a limited number of monitoring tools: Nagios, Monit, Puppi (more on this later) and, even if they auto-configure themselves, Munin and Collectd.

Again, I didn't add new "tools" simply because I haven't had the opportunity to use them in the "real world", but I'm quite confident that the abstraction model works for most cases, possibly with some tweaks.

Basically the point is that, if you want to monitor the resources (typically the listening port and the running process) provided by a class, you can add these parameters:

class { "openssh":  
  monitor        => true,
  monitor_tool   => [ 'nagios','puppi','monit','munin' ], 
  monitor_target => $ipaddress_eth1,
}

With the above parameters you enable the monitoring of OpenSSH with the specified monitoring tools, specifying the IP address to use as the monitoring target (the default is $ipaddress and you generally don't need to define it; it's shown here just for reference).
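
To give an idea of how the abstraction can work, here is a minimal, hypothetical sketch of a generic monitor::port define dispatching to a Nagios connector (the real monitor metamodule is richer, and its define and connector names may differ):

define monitor::port (
  $port,
  $protocol = 'tcp',
  $target   = $::ipaddress,
  $tool     = []
) {
  # One branch per supported tool: the module declares *what* to check,
  # each connector decides *how*
  if 'nagios' in $tool {
    # Using Puppet's built-in Nagios types for brevity
    nagios_service { "${name}_${port}":
      host_name           => $::fqdn,
      service_description => "Port ${protocol}/${port} on ${target}",
      check_command       => "check_tcp!${port}",
      use                 => 'generic-service',
    }
  }
}

Inside the openssh class, a resource of this define would then be declared only when monitoring is enabled.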

Note that, as with most of the other arguments and as shown before, you can use top scope variables (for example $::openssh_monitor_tool) to set these configurations but, more importantly, you can set a site-wide general behavior with the variables $::monitor, $::monitor_tool and $::monitor_target, and have per-module overrides with the arguments shown in the example above or the analogous variables: $::openssh_monitor, $::openssh_monitor_tool and $::openssh_monitor_target.

Actually, the setting of general top scope variables that can be overridden, case by case, by specific parametrized class arguments is the real reason that justifies the usage of the mixed approach.

A similar approach is used for automatic firewalling of the ports provided by the module.

You can set these parameters:

class { "openssh":  
  firewall      => true,
  firewall_tool => [ 'iptables' ], 
  firewall_src  => "0.0.0.0/0",  # This is the default value
  firewall_dst  => "$ipaddress", # This is the default value
  port          => "22",         # This is the default value
  protocol      => "tcp",        # This is the default value
}

As you can see, most of them have sensible defaults and are reported here just for reference. The only firewall tool currently supported is Example42's iptables module, but it's possible, as with the monitor metamodule, to create connectors for other modules and eventually even for a central network firewall.
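
Inside a module, the pattern can be as simple as declaring a generic resource of the firewall metamodule when the option is enabled (again a hypothetical sketch; the actual define and its parameters may differ):

# Open the service port only when the user asks for it
if $firewall == true {
  firewall { "openssh_${protocol}_${port}":
    source      => $firewall_src,
    destination => $firewall_dst,
    protocol    => $protocol,
    port        => $port,
    tool        => $firewall_tool,
  }
}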

One of the features of the existing module set that is probably not going to be implemented in the NextGen one is the backup abstraction. It's based on a similar logic (a module defines what to back up, and a backup metamodule defines the connectors for different backup tools) but, honestly, it has not really been used up to now.

The underlying idea behind all these abstractions is that it's somehow reductive to use a Puppet module JUST to configure an application: a module inherently has a lot of information about it: its components (package and service names, configuration files but also, with limited extra effort, process names, listening ports, log and data paths...), the services it provides, and their dependencies.

All this information can and should be used for many infrastructure activities: monitoring, firewalling, backup, maybe storage management, possibly network configuration, dependency provisioning and, why not, local access to relevant information...

Oh My Puppi!

Sometimes I feel like I've never really managed to express properly what Puppi is and what it can do.
The fact that even in the work environment where I introduced it (where we deploy dozens of different applications each day with it, either via a single command line, a cronjob, an MCollective command or an automated task) people confuse Puppet and Puppi (yes, the name doesn't help) should make me ponder this.

Let's try to summarize in a few words what Puppi is.

Puppi has basically two different and independent functions:
- Deploy applications
- Get information about the system

More concretely, Puppi is a Puppet module that installs on a system a bash command, plus all the scripts and files it needs to perform its subcommands.

The Puppet module can be imported independently (yes, you can use it without the whole Example42 bunch) and just included in your nodes.

Once you have it, you have at your disposal defines, in the Puppet language, that allow you to perform the above functions.

For managing application deployments, it doesn't take much effort to achieve appreciable results. For example, this define:

puppi::project::war { "myapp":
    source      => "http://repo.example42.com/deploy/prod/myapp.war",
    deploy_root => "/usr/share/tomcat/webapps",
}

makes it possible to issue the command "puppi deploy myapp" on the node where you placed it.

What that command does, what are the different things you can deploy and the options you have to customize the procedure is better described elsewhere in this site (hint: look at the top menu).

For the second function, "getting information about the system", you have to use other defines for specific Puppi commands. Out of the box Puppi already provides some output when you type puppi check, puppi info or puppi log (some system-wide information is shown in these cases), but it becomes more interesting and powerful when application-specific information is made available (for example where an application's logs and configuration files are, what its status is and whether it's working properly).

The Example42 modules (many of the existing and all the NextGen ones) have integrated optional Puppi support.

That is, you can use them without Puppi, just as you can use Puppi without Example42 modules; but if you use Example42 modules and activate Puppi, some magic happens. Activate Puppi with:

class { "openssh":  
  puppi        => true,
  puppi_helper => "myhelper", # Default is "standard"
}

or, as you can guess by now, with the top scope variable $::puppi set to true. 

Now the bad news: the implementation of Puppi for the NextGen modules doesn't exist yet (!?).

I've totally changed the way the modules provide data to Puppi.
Up to now, modules created various files that were (are) used by the various Puppi subcommands.
The new approach is lighter and moves much of the work to the local system: a module just creates a single YAML file containing all the variables and parameters it uses (yes, all the knowledge that Puppet already has about the system: names, paths and so on). The puppi_helper is just one more variable that will allow personalizing the scripts that use those variables, in order to produce custom outputs.
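
Such a file might look something like this (a purely hypothetical sketch of the idea, not the actual format):

# /etc/puppi/data/openssh.yaml (hypothetical path and layout)
openssh:
  package: openssh-server
  service: sshd
  process: sshd
  port: 22
  protocol: tcp
  config_file: /etc/ssh/sshd_config
  log_file: /var/log/secure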

Still, there's not yet a Puppi component on the local system able to do sane things with that YAML.

I don't expect it will be difficult to do, but I still have to figure out a sane way to keep backward compatibility, and whether the new Puppi will be a Puppet Face (this would make it unfit for Puppet 2.6, and that bothers me).

So, not much more to say about Puppi here; let's talk about...

The painful art of testing

I must confess it. Testing and QA on the existing modules have been crap.

Fixes have been pushed upstream from different working environments on specific distros and only quickly tested on the other ones. Some modules were obsoleted and not updated and, even worse, the sample "Puppet infrastructure" shown on this site was not always aligned with the modules' changes; this led to brutal errors that, even if quick to fix in some cases, could be disturbing and misleading for new users.

I would like to be more rigorous about this: I would like to provide modules that can be updated safely, that behave consistently on different distros and that "always" work out of the box.
I also want to finally introduce an automated testing environment for different operating systems, where not only is the catalog tested for the expected resources (as cucumber-puppet and rspec-puppet do), but the deployed application is verified to actually work as wanted.

One of the common mistakes that is easy to make with cucumber-puppet or rspec-puppet is to just check whether Puppet does what it is expected to do: I define a resource and check if that resource is present in the catalog.
This is something that Nikolay Sturm himself, the author of cucumber-puppet, has stressed on different occasions: test the logic of your modules, not whether Puppet works (that's Puppet Labs' job).

In the NextGen modules I've inserted some rspec-puppet tests that try to verify some internal logic but, to be honest, I'm not totally won over by testing approaches based on catalog checks.

You can have a wonderful and totally sane catalog that, when applied to a node, breaks a service because there's a syntax error in the service's configuration file. This is something you will never discover if you look just at the catalog. Now, I don't dare enter here into a discussion about best practices for managing Puppet changes in production environments; for the sake of this post, let me just say that I'll probably automate a procedure that builds different operating systems, runs Puppet, triggers Puppi checks and notifies the result, possibly at each post-commit, with softer and quicker rspec tests at the pre-commit stage.

I've called a similar approach "Test Driven Puppet Infrastructure Development", but people around are doing similar things with different names, approaches and tools. The added value I see is that when you write tests for your modules you are actually writing checks that you can reuse in your monitoring tools.

Let's finally face another characteristic of these modules: scaffolding.

Is module template scaffolding a good idea?

Looking at some answers on the feedback page of this site, various people don't consider it so.

Maybe I haven't expressed well the advantages I see in "module cloning", or maybe it really is a bad idea.

The point is that to make new modules I always start from an existing "foo" template: the skeleton of a full-featured module, which is cloned, renamed (by the script 00_example42_scripts/module_clone.sh) and then customized according to the application's specifics.

This doesn't mean that all the modules are the same thing with a massive renaming: changes are added in terms of specific resources or defines according to the single application.

The added value I see in this approach is that I can very quickly create a basic module with all the features we have seen so far.

Another plus is that the modules keep a standard structure, coherent naming and parameters, and relatively easy upgradability.

Still, not all applications need a similar module design, so, in the NextGen set, I decided to create different foo templates for different types of modules: from the typical package/service/configuration file layout to layouts dedicated to Java or PHP web applications, where you can even decide whether to install them from packages or sources (or via Puppi).

I'm testing these different layouts with the openssh module (the one we've been looking at), plus the wordpress and solr ones.

They are actually supposed to be the test beds from which to define good starting foo templates, from which to regenerate all the new modules.

This is also a reason I'm writing this post: to gather ideas, suggestions and comments about the "NextGen" openssh module before converting it to foo and making it the master of the clones.

So, if you've read up to this point, I suppose you are interested in the topic, and therefore I would love to hear your opinion about what has been described here: any suggestion will be considered and pondered. If you think there are design issues, bad choices or wrong approaches, please let me know and argue your case, by commenting on this post or directly on GitHub.

Thanks for your attention and patience.