An Ansible pattern for lists of files

Problem: In this scenario, we have a potentially long list of usernames (accounts) and a directory containing SSH public keys, one per file with each file named after the user that owns that key. We want to deploy all the keys for the users in our list.

The sets of users and keys are not identical: some users may have no key (a common situation, alas), and we may hold many other keys belonging to users who are not in this list.

(You might instead store your users’ keys directly in a list or dictionary, which would obviate the need for the code below, but you’d forever be copy-and-pasting keys into a YAML data structure unless you have some automated way to keep it up to date. Come to that, I hear those crazy kids even store public keys in LDAP directories these days.)

First issue: a file lookup throws an error if the file doesn’t exist. We could simply iterate over the list of users and set ignore_errors so that we pass on the ones that don’t have keys, then supply a null default value instead:

- name: add SSH keys
  authorized_key:
    user: "{{ item }}"
    key: "{{ lookup('file', 'path/to/keys/' + item) | default('# NONE')}}"
  with_items: "{{ userlist }}"
  ignore_errors: yes

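But this is messy and long-winded, as we incur a file lookup, even if it fails, and a remote call for every user. (Does default even work with lookup? As far as I can tell it does not: default only substitutes for undefined values, whereas a failed file lookup raises an error before the filter is ever applied.)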

A naïve first pass at solving this might be to go through the list of users, call the stat module to see if a local key file exists for each one and save the results in a list, and then use a conditional test before trying the lookup for the key:

- name: check for a key file for each user
  local_action:
    module: stat
    path: "path/to/keys/{{ item }}"
    get_checksum: false
    get_attributes: false
    get_mime: false
  with_items: "{{ userlist }}"
  register: user_keys

- name: deploy user keys
  authorized_key:
    user: "{{ item.item }}"
    key: "{{ lookup('file', 'path/to/keys/' + item.item) }}"
  with_items: "{{ user_keys.results }}"
  # only deploy keys that exist:
  when: item.stat.exists

(Remember that the registered variable for a looped task contains a results list of hashes, one per item in the loop. Each hash holds the original loop object under the key item alongside the task's return values, in this case the stat hash returned by the stat module.)
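
As a rough illustration, the registered user_keys variable from the stat task has approximately this shape. The usernames alice and bob are made up, and the real results carry many more keys (changed, failed, invocation and so on) that are omitted here:

```yaml
# Abbreviated sketch of the registered variable; alice and bob are hypothetical users
user_keys:
  results:
    - item: alice        # the original loop item
      stat:
        exists: true     # a key file was found for alice
        # ... other stat fields omitted
    - item: bob
      stat:
        exists: false    # no key file was found for bob
```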

We use local_action because we’re looking for the key files on the Ansible controller node rather than the client. This at least saves us some remote calls, as we only deploy actual keys that we find, rather than trying to do so for all the users. Here, we reduce the overhead of the stat module slightly by disabling the retrieval of file attributes that we don’t need, such as checksums. But we’re effectively iterating over the entire list of users twice, once for the users and again for their potential keys, many of which may not exist, which is slow and mostly wasted effort.
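
If the existence check is all you need, one possible shortcut is to drop the stat task and use Ansible's file test (available under that name since 2.5) in the when clause. This is a sketch only, not tested against the setup described here. Tests run on the controller, and when is evaluated for each item before the task's arguments are templated, so the lookup should only run for users whose key file exists. One assumption to watch: the test checks the path as given, relative to the current working directory, whereas the file lookup also searches the play's or role's files directories, so in practice both may need the same absolute path.

```yaml
- name: deploy user keys where a key file exists
  authorized_key:
    user: "{{ item }}"
    key: "{{ lookup('file', 'path/to/keys/' + item) }}"
  with_items: "{{ userlist }}"
  # the 'file' test is evaluated locally on the controller
  when: ('path/to/keys/' + item) is file
```

This still loops over every user, so it only removes the separate stat pass.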

A better approach would be to iterate over only a list of keys that we know exist:

- name: deploy user keys
  authorized_key:
    user: "{{ item.item }}"
    key: "{{ lookup('file', 'path/to/keys/' + item.item) }}"
  with_items: "{{ user_keys.results | selectattr('stat.exists') | list }}"

Here we filter the results list with selectattr, keeping only the entries whose stat.exists attribute is true. Each remaining entry still carries its item key, so item.item is the username, and the when test is no longer needed. This may reduce the number of iterations considerably but it is still a loop stepping through one item at a time, and we haven’t avoided doing all that file I/O for the interminable stat lookups.

Ideally, we’d instead generate a listing of all the key files in a single pass (like an ‘ls’ of the directory), then take the intersection of that list with our list of users - i.e. to obtain the list of users for whom we have keys.

My first thought was to use the fileglob lookup to create a list of all the key files. However, fileglob returns the full paths for all the objects it finds. You might think it would be possible to use the Jinja2 map function to apply the basename filter to every element of the list, thus stripping the paths and leaving only the filenames:

key_names: "{{ lookup('fileglob', 'path/to/keys/*') | map('basename') | list }}"

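But this doesn’t work, and the reason isn’t obvious at first: lookup returns its results as a single comma-separated string rather than a list, so map iterates over the individual characters of that string and produces a list of single-character strings like this: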

["u", "s", "e", "r", "1", "u", "s", "e", "r", "2", ...]
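
A sketch of one way to keep the fileglob approach, assuming Ansible 2.5 or later: the query function always returns a real list, so map can be applied to its result directly. Bear in mind that fileglob resolves relative paths against the play's or role's files directory, so the path here may need adjusting.

```yaml
- name: derive key names from a fileglob
  set_fact:
    # query() always returns a list, unlike lookup()
    all_keys: "{{ query('fileglob', 'path/to/keys/*') | map('basename') | list }}"
```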

However, the find module does give us a list we can filter in that way:

- name: get list of all user SSH key files
  local_action:
    module: find
    paths: "path/to/keys"
    excludes: '*~'
  register: find_keys

- name: derive key names
  set_fact:
    all_keys: "{{ find_keys.files | map(attribute='path') | map('basename') | list }}"

(Note that the find excludes parameter, used here to remove editor backup files from the results, is only available from Ansible 2.5. It isn’t strictly necessary, as only the files that match actual usernames will be used anyway. Alternatively, you could ensure that all your key files are named username.pubkey instead, which is perhaps a bit more intuitive, and then use patterns: '*.pubkey' with find. But you’d have to strip the extensions as well in the next step.)
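
For the username.pubkey naming scheme mentioned above, a minimal sketch might look like this. It assumes the splitext filter shipped with Ansible, and uses first to keep only the stem of each split name:

```yaml
- name: get list of all user SSH key files
  local_action:
    module: find
    paths: "path/to/keys"
    patterns: '*.pubkey'
  register: find_keys

- name: derive key names without the extension
  set_fact:
    # basename drops the directory, splitext splits off '.pubkey', first keeps the stem
    all_keys: "{{ find_keys.files | map(attribute='path') | map('basename') | map('splitext') | map('first') | list }}"
```

Any later file lookup would then need the extension added back, for example 'path/to/keys/' + item + '.pubkey'.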

(Instead of find, we could just run an ls command and process the standard output with split to make a list, or even shell out and call echo dir/*. But that’s spawning another process, which is cheating.)

Again, the find module is run locally as the key files are stored on our Ansible controller node. We extract the path attributes from the files key in the return value of the find module, as those contain the path for each file found, and then run basename over them to strip the directory path.

Now we can obtain the intersection of our sets of users and key files with one simple Jinja2 filter, and deploy precisely the set of keys that we need:

- name: derive list of user keys
  set_fact:
    # this could be appended to the previous step instead
    user_keys: "{{ all_keys | intersect(userlist) }}"

- name: install users' keys
  authorized_key:
    user: "{{ item }}"
    key: "{{ lookup('file', 'path/to/keys/' + item) }}"
    state: present
  with_items: "{{ user_keys }}"
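
To see what the intersect filter does in isolation, here is a small stand-alone playbook with made-up data. All the names in userlist and all_keys are hypothetical:

```yaml
- hosts: localhost
  gather_facts: false
  vars:
    userlist: [alice, bob, carol]       # hypothetical users
    all_keys: [bob, carol, dave, erin]  # hypothetical key file names
  tasks:
    - name: show which users have keys
      debug:
        msg: "{{ all_keys | intersect(userlist) }}"
```

The message should contain only bob and carol: alice has no key file, and dave and erin are not in the user list.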

This seems to me quite a neat pattern if you ever need to process a set of files according to some selective criteria or by cross-referencing against a second list. And it’s a lot quicker than watching a list of values scroll steadily up the screen.
