Ansible is an open-source software automating configuration management and software deployment. Ansible is used in Quarkslab to manage our infrastructure and in our product Irma. In order to have an idea of the security of Ansible, we conducted a security assessment. This blogpost presents our findings.
Introduction
Ansible [1] is an open-source software that automates configuration management and software deployment. If you're not familiar with Ansible, we encourage you to read this introduction first [15].
Ansible works by connecting to the infrastructure machines (which can be a Windows or Linux OS) and uploading small pieces of code to them. These programs are called "Ansible Modules" and are executed by Ansible to modify the state of the system.
Each of these modules represents a task to be executed on the node, and a collection of tasks is a playbook.
To manage the remote machines (also called nodes), Ansible uses SSH (or Windows Remote Management WinRM) and only needs Python (or Powershell). Ansible is used in Quarkslab to manage our infrastructure [13] and in our product Irma [14].
The lack of an agent on the managed machines can be seen as an advantage of Ansible over similar solutions because it's really easy to deploy and to configure. But as we'll see in this blogpost, it can lead to some security issues.
To get an idea of the security of Ansible, we conducted an internal security assessment. We wanted to know what the risks are of executing a playbook on a compromised infrastructure.
We conducted this research in November 2019 on Ansible 2.8.
The most common modules have been reviewed. We found several vulnerabilities and 10 CVEs were assigned after this research:
- [CVE-2020-1733]: Privilege escalation when using the become mechanism to switch to an unprivileged user - Moderate
- [CVE-2020-1734]: Command injection in pipe lookup plugin - Moderate
- [CVE-2020-1735]: Path traversal in the fetch module - Moderate
- [CVE-2020-1736]: Incorrect permissions when creating a file on the node - Low
- [CVE-2020-1737]: Path Traversal in win_unzip module - Moderate
- [CVE-2020-1738]: Arbitrary module execution with the module package or service - Low
- [CVE-2020-1739]: Exposed password argument in the svn module - Low
- [CVE-2020-1740]: Race condition in ansible-vault edit command - Low
- [CVE-2020-10684]: Injection of arbitrary Ansible facts that leads to arbitrary command execution - Important
- [CVE-2020-10685]: Decrypted vault files not removed after the execution of some modules - Moderate
Findings
Ansible can manage multiple nodes in a single run and several users per node (by using become [16] plugins). For us, the main threat is a privilege escalation from a malicious user to another user on the same node, to another node, or to the controller.
We set up a test infrastructure containing both Linux and Windows nodes and focused our research on common modules.
Each task defined in a playbook is converted to a module and then sent to the node. This is needed because Ansible doesn't have an agent on each node. An attacker shouldn't be able to modify the module and its parameters before or during the execution.
Module execution on a Linux node (CVE-2020-1733)
When Ansible needs to execute a module on a Linux node, the module is packaged in a single Python file with all its dependencies and parameters. The package is then transferred and executed on the node. The verbosity parameter (-vvv) of Ansible allows us to follow the execution process with SSH.
For example, we execute this single task:
remote_user: user
tasks:
  - name: Execute file module
    file:
      path: /tmp/missing_file
      state: absent
We obtain the following output from SSH:
...
<10.10.0.11> SSH: EXEC sh -C [skip -o ...] 10.10.0.11 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
...
<10.10.0.11> SSH: EXEC sh -C [skip -o ...] 10.10.0.11 '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847 `" && echo ansible-tmp-1584362480.6592143-258274020557847="` echo /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847 `" ) && sleep 0'"'"''
...
<10.10.0.11> PUT /home/user/.ansible/tmp/ansible-local-61394ba1tivkm/tmphcenrzoc TO /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py
<10.10.0.11> SSH: EXEC sftp -b - -C [skip -o ...] '[10.10.0.11]'
...
<10.10.0.11> SSH: EXEC sh -C [skip -o ...] 10.10.0.11 '/bin/sh -c '"'"'chmod u+x /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/ /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py && sleep 0'"'"''
...
<10.10.0.11> SSH: EXEC sh -C [skip -o ...] -tt 10.10.0.11 '/bin/sh -c '"'"'/usr/bin/python /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py && sleep 0'"'"''
...
<10.10.0.11> SSH: EXEC sh -C [skip -o ...] 10.10.0.11 '/bin/sh -c '"'"'rm -f -r /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/ > /dev/null 2>&1 && sleep 0'"'"''
...
The executed commands are as follows:
# resolve home directory
/bin/sh -c 'echo ~user && sleep 0'
# create temporary directory
/bin/sh -c '( umask 77 && mkdir -p "` echo /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847 `" && echo ansible-tmp-1584362480.6592143-258274020557847="` echo /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847 `" ) && sleep 0'
# push file with sftp to /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py
# set execution bit
/bin/sh -c 'chmod u+x /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/ /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py && sleep 0'
# execute the module
/bin/sh -c '/usr/bin/python /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/AnsiballZ_file.py && sleep 0'
# remove the module and the temporary directory
/bin/sh -c 'rm -f -r /home/user/.ansible/tmp/ansible-tmp-1584362480.6592143-258274020557847/ > /dev/null 2>&1 && sleep 0'
A temporary directory is created in the home directory of the user, with the current timestamp (with microsecond precision) and a random value of 48 bits [2].
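The naming scheme can be sketched in a few lines of Python. This is a minimal illustration that assumes the name is built from the current time and a random integer of at most 48 bits, as described above; the helper name make_tmp_name is hypothetical and the snippet is not taken from Ansible.

```python
import random
import time


def make_tmp_name(now=None, rand=None):
    """Build a name shaped like ansible-tmp-<timestamp>-<random>.

    Hypothetical helper: it only mirrors the format visible in the logs.
    """
    now = time.time() if now is None else now
    rand = random.randint(0, 2 ** 48) if rand is None else rand
    return "ansible-tmp-%s-%s" % (now, rand)


name = make_tmp_name()
print(name)
```

Such a name is hard to guess in advance, but it only protects the directory as long as the name itself stays secret.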
We execute the same playbook but this time by becoming root:
# resolve home directory
/bin/sh -c 'echo ~user && sleep 0'
# create temporary directory
/bin/sh -c '( umask 77 && mkdir -p "` echo /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672 `" && echo ansible-tmp-1584364725.538944-238100626223672="` echo /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672 `" ) && sleep 0'
# push file with sftp to /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672/AnsiballZ_file.py
# set execution bit
/bin/sh -c 'chmod u+x /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672/ /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672/AnsiballZ_file.py && sleep 0'
# execute the module with root user
/bin/sh -c 'sudo -H -S -p "[sudo via ansible, key=ehpdhkiogpdqgpeehnzeyvouybvuuzdp] password:" -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-ehpdhkiogpdqgpeehnzeyvouybvuuzdp ; /usr/bin/python /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672/AnsiballZ_file.py'"'"' && sleep 0'
# remove the module and the temporary directory
/bin/sh -c 'rm -f -r /home/user/.ansible/tmp/ansible-tmp-1584364725.538944-238100626223672/ > /dev/null 2>&1 && sleep 0'
When become is used with root, the only change is the use of sudo (or another become method). The temporary directory isn't owned by root but by the trusted user who performs the connection.
We execute the same playbook but this time by becoming another user:
# resolve home directory
/bin/sh -c 'echo ~user && sleep 0'
# create temporary directory
/bin/sh -c '( umask 77 && mkdir -p "` echo /var/tmp/ansible-tmp-1584365644.8392012-230589678596332 `" && echo ansible-tmp-1584365644.8392012-230589678596332="` echo /var/tmp/ansible-tmp-1584365644.8392012-230589678596332 `" ) && sleep 0'
# push file with sftp to /var/tmp/ansible-tmp-1584365644.8392012-230589678596332/AnsiballZ_file.py
# grant become_user read and execute access with an ACL
/bin/sh -c 'setfacl -m u:become_user:r-x /var/tmp/ansible-tmp-1584365644.8392012-230589678596332/ /var/tmp/ansible-tmp-1584365644.8392012-230589678596332/AnsiballZ_file.py && sleep 0'
# execute the module as become_user
/bin/sh -c 'sudo -H -S -p "[sudo via ansible, key=whacochmknumfjfmariwhcttwlztmihm] password:" -u become_user /bin/sh -c '"'"'echo BECOME-SUCCESS-whacochmknumfjfmariwhcttwlztmihm ; /usr/bin/python /var/tmp/ansible-tmp-1584365644.8392012-230589678596332/AnsiballZ_file.py'"'"' && sleep 0'
# remove the module and the temporary directory
/bin/sh -c 'rm -f -r /var/tmp/ansible-tmp-1584365644.8392012-230589678596332/ > /dev/null 2>&1 && sleep 0'
When the become user and the connection user are both unprivileged (i.e., not root), the temporary directory is created in /var/tmp/. The directory is created by mkdir with the argument -p, so the command does not fail if the directory already exists. The other commands (setfacl and rm -f) are just as tolerant and don't fail if the state is already the expected one.
A malicious user on the managed node who can predict the directory name can compromise the execution.
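The difference between a tolerant and a strict directory creation can be illustrated in Python. This is a sketch and not Ansible's code: os.makedirs with exist_ok=True stands in for mkdir -p, os.mkdir stands in for a creation that refuses a pre-existing directory, and the directory name is hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()                            # stand-in for /var/tmp
target = os.path.join(base, "ansible-tmp-example")   # hypothetical name

# Another user creates the directory first.
os.mkdir(target, 0o777)

# Tolerant creation, like `mkdir -p`: succeeds silently and reuses the
# existing directory, whoever created it.
try:
    os.makedirs(target, mode=0o700, exist_ok=True)
    tolerant_ok = True
except FileExistsError:
    tolerant_ok = False

# Strict creation: fails because the directory is already there.
try:
    os.mkdir(target, 0o700)
    strict_ok = True
except FileExistsError:
    strict_ok = False

print(tolerant_ok, strict_ok)

# Remove the directories created for the demonstration.
os.rmdir(target)
os.rmdir(base)
```

With the tolerant variant the caller has no signal that the directory was prepared by someone else, which is what makes a predictable or observable name dangerous.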
As mentioned before, the temporary directory name contains 48 random bits and a microsecond timestamp. However, the directory name is an argument of the /bin/sh and mkdir processes, which creates a race condition that allows an attacker to learn the name. A malicious user who iterates over /proc/<pid>/cmdline can find the directory name and create the directory before the become user does. Once the directory is created, the malicious user can change the content of the uploaded module and elevate their privileges [CVE-2020-1733].
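To show why the name is not a secret, the sketch below extracts it from the raw content of a cmdline file, where arguments are separated by NUL bytes. The regular expression and the helper find_tmp_names are assumptions made for this illustration; the sample buffer reuses the command shown earlier.

```python
import re

# Pattern for the temporary directory names shown in the logs (assumption).
TMP_RE = re.compile(rb"ansible-tmp-[0-9.]+-[0-9]+")


def find_tmp_names(cmdline):
    """Return the temporary directory names found in a raw cmdline buffer."""
    names = set()
    for arg in cmdline.split(b"\0"):
        names.update(match.decode() for match in TMP_RE.findall(arg))
    return sorted(names)


# Sample buffer shaped like /proc/<pid>/cmdline for the mkdir step.
sample = (
    b"/bin/sh\0-c\0( umask 77 && mkdir -p "
    b"\"` echo /var/tmp/ansible-tmp-1584365644.8392012-230589678596332 `\" ) "
    b"&& sleep 0\0"
)
print(find_tmp_names(sample))
```

Any local user who can read process arguments sees the same information, so the random part of the name does not help once the command is running.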
ansible_facts and become
To use Ansible, the user needs to define the list of targeted nodes (the infrastructure) and the tasks to perform on it (the playbook). Variables can be defined in various files [3] to share parameters between hosts and tasks. All variables are stored in the VariableManager. In addition to the variables read from configuration files, the setup module retrieves the facts [4] on each node before the execution of the first task. The VariableManager computes the available variables for each task by applying the precedence rules [5].
The precedence of the facts is in the middle of the precedence rules [5], which allows a malicious node to overwrite some variables. To prevent facts from overriding sensitive variables, a blacklist removes any variable matching Ansible internal variables [6] [7].
Only variables defined in group_vars, host_vars, inventory and roles default files can be overwritten by the facts.
In addition to the facts returned by the setup module, each module can add facts through the ansible_facts key of the returned JSON. These facts are merged with the existing facts in the VariableManager. As a result, a compromised module allows a malicious user to tamper with the variables used by the following tasks on the same node.
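The effect of this merge can be pictured with a toy model. Plain dictionaries stand in for the VariableManager, the variable backup_dir is hypothetical, and the only precedence rule modelled is that facts outrank inventory values; none of this is Ansible's actual implementation.

```python
# Toy model only: plain dictionaries stand in for the VariableManager.
inventory_vars = {"node1": {"backup_dir": "/srv/backup"}}   # e.g. host_vars
fact_store = {}                                              # facts per host


def record_result(host, result):
    """Merge the ansible_facts returned by a module into the host's facts."""
    fact_store.setdefault(host, {}).update(result.get("ansible_facts", {}))


def task_vars(host):
    """Compute task variables; facts take precedence over inventory values."""
    merged = dict(inventory_vars.get(host, {}))
    merged.update(fact_store.get(host, {}))
    return merged


before = task_vars("node1")["backup_dir"]
record_result("node1", {"changed": False,
                        "ansible_facts": {"backup_dir": "/tmp/attacker"}})
after = task_vars("node1")["backup_dir"]
print(before, "->", after)
```

In this model, one module result is enough to change what every later task on node1 sees for that variable.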
Ansible provides a become mechanism that allows changing the user before executing the module on a node. This mechanism doesn't have an impact on the VariableManager and the merge of the returned facts. If a become user is malicious, they can inject facts and corrupt future tasks of the node, including those that don't use the same user.
At first glance, this issue doesn't seem to have an impact, as only the node that sets malicious facts will have its variables overwritten for its future tasks. However, in two cases, this behavior can have a security impact:
- when a task needs to be executed on the controller node;
- when the malicious fact is returned by a become user.
ansible_facts blacklist (CVE-2020-10684)
To prevent facts from changing critical variables, a blacklist removes any returned fact with a sensitive name. The blacklist works in three steps:
- when the result of a module is received, the method remove_internal_keys [6] removes the facts that begin with discovered_interpreter_ or ansible_discovered_interpreter_ before the variable is merged in the VariableManager.
- when the precedence rules are applied, a copy of the stored facts is cleaned by a call to the clean_facts [7] function, which removes all variables that match Ansible variables.
- a copy of the stored facts is added in the variable ansible_facts. The method namespace_facts [8] is used to remove the ansible_ prefix.
We found that the management of the discovered_interpreter_ variables (such as discovered_interpreter_python) doesn't follow the same pattern. When the variable is used, its value is taken from ansible_facts.discovered_interpreter_. The blacklist can be bypassed with either of the following methods:
- Return an ansible_facts key nested in the ansible_facts of the module result. This key won't match when remove_internal_keys is called and will replace the ansible_facts variable.
{
  "changed": false,
  "ansible_facts": {
    "ansible_facts": {
      "discovered_interpreter_python": "<reverse_shell>"
    }
  }
}
- The method namespace_facts [8] doesn't strip only a leading prefix: it removes the first ansible_ found anywhere in the variable name. We can inject a variable discovered_ansible_interpreter_python that will bypass the method remove_internal_keys [6] and will become discovered_interpreter_python after namespace_facts [8].
{
  "changed": false,
  "ansible_facts": {
    "discovered_ansible_interpreter_python": "<reverse_shell>"
  }
}
This issue was first acknowledged as a bug by Red Hat and then became a CVE [CVE-2020-10684].
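The second bypass can be reproduced with a simplified model of the two functions. The code below is a sketch written for this explanation, not the upstream source: it keeps only the prefix check and the substring removal described above, and namespace_facts_strict is a hypothetical variant that strips ansible_ only when it is a real prefix.

```python
from copy import deepcopy

BLOCKED_PREFIXES = ("discovered_interpreter_", "ansible_discovered_interpreter_")


def remove_internal_keys(result):
    """Simplified model: drop top-level facts that start with a blocked prefix."""
    facts = result.get("ansible_facts", {})
    for key in list(facts):
        if key.startswith(BLOCKED_PREFIXES):
            del facts[key]
    return result


def namespace_facts(facts):
    """Simplified model: remove the first 'ansible_' found anywhere in the key."""
    return {"ansible_facts": {key.replace("ansible_", "", 1): deepcopy(value)
                              for key, value in facts.items()}}


def namespace_facts_strict(facts):
    """Hypothetical variant: remove 'ansible_' only when it is a prefix."""
    cleaned = {}
    for key, value in facts.items():
        name = key[len("ansible_"):] if key.startswith("ansible_") else key
        cleaned[name] = deepcopy(value)
    return {"ansible_facts": cleaned}


result = {
    "changed": False,
    "ansible_facts": {
        "discovered_ansible_interpreter_python": "<reverse_shell>",
        "discovered_interpreter_python": "<blocked>",
    },
}
cleaned = remove_internal_keys(result)
loose = namespace_facts(cleaned["ansible_facts"])
strict = namespace_facts_strict(cleaned["ansible_facts"])
print(loose)
print(strict)
```

In the loose variant the injected key comes out as discovered_interpreter_python, while the strict variant leaves it under its original, harmless name.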
Executing a task on another node
In some cases, a task in the playbook needs to be executed on a node other than the managed one (for example, to sign a certificate). To do this, one can use the delegate_to keyword. The generated variables for this task use the facts of the delegated node.
However, we found some playbooks where the task sets connection: local to change the connection plugin to local and executes the module locally. When this variant is used, the VariableManager uses the facts of the managed node.
In addition to using an interpreter path of the managed node, some variables can be overwritten and may lead to malicious execution, depending on the playbook. We strongly discourage using connection: local to delegate a task to the controller node.
Vault (CVE-2020-1740)
Ansible includes a secure storage mechanism called the Vault. As the configuration may be versioned, this mechanism allows some secrets to be encrypted and only decrypted when the playbook is running. The value is sent in clear text to the managed node. A malicious user may be in a position to recover this value.
The Vault mechanism allows the administrator to encrypt and authenticate files or variables in the configuration so that only encrypted values are committed. The key must be kept safe and is used by Ansible to decrypt the values when needed.
When Ansible needs to decrypt a file encrypted with Vault, the class DataLoader creates a temporary decrypted file with tempfile.mkstemp. The decrypted file is created safely in /tmp. The DataLoader instance keeps track of every temporary file and deletes each of them when Ansible terminates. However, when the decrypted file is needed by a module (like assemble or copy), the temporary file is created in another process (not the main one), while the tracking of the temporary files is only performed in the main process. Some action plugins implement the cleanup manually, but not all of them, so some secret files may stay decrypted after the execution. This issue was assigned [CVE-2020-10685].
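The tracking pattern described above can be sketched as follows. The class TempFileTracker is hypothetical and much simpler than DataLoader; it only illustrates that the cleanup has to run in the process that created the file, because a tracker living in another process never learns about it.

```python
import os
import tempfile


class TempFileTracker:
    """Hypothetical tracker: remembers temporary files and removes them later."""

    def __init__(self):
        self._paths = set()

    def write_temp(self, content):
        # mkstemp creates the file readable and writable by its owner only.
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as handle:
            handle.write(content)
        self._paths.add(path)
        return path

    def cleanup(self):
        for path in list(self._paths):
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
            self._paths.discard(path)


tracker = TempFileTracker()
try:
    path = tracker.write_temp(b"decrypted secret")
    existed = os.path.exists(path)   # the file is available while in use
finally:
    tracker.cleanup()                # runs in the process that created it
print(existed, os.path.exists(path))
```

Wrapping the use of the file in try/finally is one way to make sure the decrypted copy does not outlive the code that needed it.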
Additionally, we found a race condition in ansible-vault edit that allows a local user on the controller to steal the edited secret, but we haven't been able to exploit it without issuing an error [CVE-2020-1740].
Other issues
The synchronize module doesn't verify the SSH host key by default. We encourage enabling this verification if the host key is already known by the controller. A future Ansible release should provide more options to secure the connection by default.
The shell built-in was enabled by default in lookup pipe [CVE-2020-1734]. We recommend disabling it by default and adding an option to enable it only when necessary.
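The difference a shell makes can be shown without running anything. The sketch below, which is not the lookup plugin's code, splits a command string into an argument list with shlex.split; the helper build_argv is hypothetical.

```python
import shlex


def build_argv(command):
    """Split a command string into an argument list without involving a shell."""
    return shlex.split(command)


print(build_argv("echo hello; id"))
```

Passing such a list to subprocess.run with shell=False hands the semicolon to echo as ordinary text, whereas a shell would treat it as a separator and run id as a second command.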
The destination file of the fetch module can be changed by a user on the node [CVE-2020-1735]. This user can inject a path with ../ and change the content of the file to write an arbitrary file on the controller.
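A common defence against this kind of traversal is to resolve the final path and check that it is still inside the intended directory. The following is a minimal sketch under that assumption; safe_join and the directory name fetched are hypothetical and unrelated to the actual fix.

```python
import os


def safe_join(base_dir, untrusted):
    """Join untrusted onto base_dir, refusing results that leave base_dir."""
    base = os.path.realpath(base_dir)
    candidate = os.path.realpath(os.path.join(base, untrusted))
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("path escapes %s: %r" % (base_dir, untrusted))
    return candidate


base = "fetched"   # hypothetical destination directory on the controller
print(safe_join(base, "node1/hosts"))

try:
    safe_join(base, "../../etc/cron.d/job")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The check is done on the resolved path rather than on the raw string, so variants of ../ and absolute paths are handled the same way.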
The atomic_move method used by modules allows, in some cases, a malicious user to read the content of the file during the copy [CVE-2020-1736]. This method is used in some modules to move files on the target node.
The win_unzip module doesn't sanitize the path found in the source archive [CVE-2020-1737]. The extracted file can be out of the destination directory. A similar bug was found in the unarchive module with no security impact.
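The missing check can be illustrated on a ZIP archive built in memory. win_unzip itself is written in PowerShell, so this Python sketch only shows the idea: list the members whose resolved destination falls outside the extraction directory. The helper unsafe_members and the directory name extract_here are hypothetical.

```python
import io
import os
import zipfile


def unsafe_members(archive, dest_dir):
    """List member names that would be extracted outside dest_dir."""
    dest = os.path.realpath(dest_dir)
    bad = []
    for name in archive.namelist():
        target = os.path.realpath(os.path.join(dest, name))
        if os.path.commonpath([dest, target]) != dest:
            bad.append(name)
    return bad


# Build a small archive in memory with one well-behaved and one hostile entry.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docs/readme.txt", "ok")
    archive.writestr("../outside.txt", "escapes the destination")

with zipfile.ZipFile(buffer) as archive:
    flagged = unsafe_members(archive, "extract_here")
print(flagged)
```

An extractor that refuses to continue when this list is not empty keeps every written file under the destination directory.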
The package and service modules are vulnerable to the injection of a fact that can trigger the execution of an arbitrary module or, under some conditions, an arbitrary binary [CVE-2020-1738]. The execution of an arbitrary module should fail if the module doesn't have the expected option. An arbitrary binary may be injected with Ansible before 2.9 if a collection is installed.
The svn module provides the password in the command line of svn allowing any user on the node to retrieve it [CVE-2020-1739].
Some issues with PowerShell on Windows nodes are still being processed by Red Hat.
Conclusion
We've seen that in a worst-case scenario it may be possible to fully compromise an infrastructure managed by Ansible: first by gaining access to a node (using [CVE-2020-1733]), then by jumping from this node to the controller (using [CVE-2020-10684]).
Ansible by design can't protect itself from malicious nodes. Running Ansible on untrusted networks should be done with special care, and all security best practices should be applied.
But regardless of these issues, the ease of use is really great and more than compensates for the risk. At Quarkslab, we love Ansible and we'll keep using it because it's a great piece of software!
Timeline
- 2020/01/23: Quarkslab sends 16 issues to Red Hat and indicates that disclosure is scheduled for March 17th.
- 2020/01/27: Red Hat acknowledges the report.
- 2020/02/04: Quarkslab asks for a status update.
- 2020/02/10: Red Hat informs us that 10 out of the 16 bugs will be considered worth a CVE (at least 4 moderate, none critical), 4 bugs are not worthy of a CVE, 1 is a design issue and 1 is still under study.
- 2020/02/11: Red Hat informs us that they plan to open the issues with low and moderate impact and asks Quarkslab for consent.
- 2020/02/15: Quarkslab agrees to Red Hat's plan to disclose moderate and low impact bugs on Feb. 17th, asks for a status update and the date of fixes, and informs Red Hat of its plan to talk about the bugs on March 17th.
- 2020/02/17: Red Hat says that the date for the fixes is not yet known, and that since the bugs were discovered by Quarkslab, it's ok to discuss any technical details publicly.
- 2020/02/17: 8 CVEs were assigned by Red Hat (from CVE-2020-1733 to CVE-2020-1740).
- 2020/02/21: Red Hat asks for Quarkslab's address to send us a gift :).
- 2020/02/26: Issues for the published CVEs were opened on GitHub.
- 2020/03/03: Quarkslab asks for status update and details about the remaining bugs.
- 2020/03/11: Prior request of status update resent.
- 2020/03/12: Red Hat acknowledges the remaining issues as bugs, design issues or notabug.
- 2020/03/13: Quarkslab sends more details about some bugs and explains why they should be considered as vulnerabilities.
- 2020/03/18: Red Hat publishes the CVE-2020-10685.
- 2020/03/23: Red Hat publishes the CVE-2020-10684.
- 2020/03/24: Merge requests for the last CVEs are created on GitHub.
- 2020/03/27: Red Hat informs Quarkslab that they changed the classification of two bugs to vulnerabilities. They request an embargo on the CVE-2020-10684.
- 2020/03/30: Quarkslab replies that CVE-2020-10684 is already a public issue on Ansible's public repo and includes a PoC for testing, so an embargo would serve no purpose. Quarkslab reminds Red Hat that the original report indicated Quarkslab would disclose the vulnerabilities on March 17th at the Quarks-in-the-Shell conference, and asks for the estimated date for fixes to be released.
- 2020/04/01: Red Hat replies that they cannot make any promises but the current estimated date for release of fixes is within the next Ansible release.
- 2020/04/17: Ansible v2.9.7, v2.8.11 and v2.7.17 are released.
- 2020/07/03: Quarkslab receives two mugs and goodies from Red Hat. Thank you :)
References
[1] https://github.com/ansible/ansible/
[2] https://github.com/ansible/ansible/blob/v2.8.10/lib/ansible/plugins/action/__init__.py#L333
[3] https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html#content-organization
[4] https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#variables-discovered-from-systems-facts
[5] https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#ansible-variable-precedence
[6] https://github.com/ansible/ansible/blob/v2.8.10/lib/ansible/vars/clean.py#L99
[7] https://github.com/ansible/ansible/blob/v2.8.10/lib/ansible/vars/clean.py#L119
[8] https://github.com/ansible/ansible/blob/v2.8.10/lib/ansible/vars/clean.py#L171
[9] https://docs.ansible.com/ansible/latest/user_guide/vault.html#vault-payload-format-1-1-1-2
[10] https://github.com/ansible/ansible/blob/v2.8.10/lib/ansible/module_utils/urls.py#L264
[11] https://www.mercurial-scm.org/repo/hg/rev/d7f7f1860f00
[12] https://github.com/ansible/ansible/issues/24152
[13] https://github.com/quarkslab/qb.backup
[14] https://quarkslab.com/irma/
[15] https://docs.ansible.com/ansible/latest/user_guide/playbooks_intro.html
[16] https://docs.ansible.com/ansible/latest/plugins/become.html
[CVE-2020-1733] https://access.redhat.com/security/cve/cve-2020-1733
[CVE-2020-1734] https://access.redhat.com/security/cve/cve-2020-1734
[CVE-2020-1735] https://access.redhat.com/security/cve/cve-2020-1735
[CVE-2020-1736] https://access.redhat.com/security/cve/cve-2020-1736
[CVE-2020-1737] https://access.redhat.com/security/cve/cve-2020-1737
[CVE-2020-1738] https://access.redhat.com/security/cve/cve-2020-1738
[CVE-2020-1739] https://access.redhat.com/security/cve/cve-2020-1739
[CVE-2020-1740] https://access.redhat.com/security/cve/cve-2020-1740
[CVE-2020-10684] https://access.redhat.com/security/cve/cve-2020-10684
[CVE-2020-10685] https://access.redhat.com/security/cve/cve-2020-10685