Overview
Once you get Docassemble up and running, securing it doesn’t have to be hard. It’s a matter of:
- Understanding what kind of risks you want to protect against
- Having a backup and continuity plan
- Protecting user data with encryption and strong passwords
- Keeping the underlying server operating system up to date
- Keeping the web application up to date
- Turning off services you do not need
- Using the principle of least privilege to prevent misuse of the web application, object storage (such as S3), AWS, GitHub, etc.
The rest of this guide discusses an approach to addressing each of these topics in order.
View the accompanying checklist for securing your Docassemble installation here.
Docassemble is open source. Is it safe to run in commercial law firms?
Yes, Docassemble is safe to run in a commercial law firm, so long as you keep it up to date and use the best practices outlined in this article. Some of the world’s most popular products, including the Android operating system, WordPress, the core code in macOS, and the Linux operating system, are all open source. Some features of open source software are unique, such as the fact that the source code is publicly visible. Many in the open source community argue that this visibility makes the software more secure, not less, because bugs can be identified by customers rather than remaining hidden. While this argument may not convince everyone, in my opinion, analyzing an open source product’s security should be approached the same way that we analyze any software vendor’s solution, by asking:
- How frequently updated is the software, and how widely is it in use?
- Is the software well supported?
- Is the software developer transparent about security problems when they are discovered?
- Does the architecture of the product have any inherent risks, and if so, are they acceptable and mitigated in an appropriate way?
The Linux Foundation has also released its own scoring system for open source applications, named the OSS Security Scorecard. It’s a high-level score that cannot evaluate code, but Docassemble’s score of 7.4/10 is a good indicator that Docassemble’s code follows commonly recommended best practices.
Docassemble’s changelog reveals that its primary author, Jonathan Pyle, updates it frequently, on about a weekly basis. Improvements also come from outside contributors. Security-related improvements are disclosed in the changelog, and updating is a simple one-click process in most instances.
Docassemble’s official support channel (#questions) is an active Slack community with more than 1,700 members from around the world.
The key factor that differentiates Docassemble from most commercial form platforms is that it incorporates a web server that you run and maintain yourself, unless you use a hosting service like that offered by Lemma Legal. But the technologies that underpin Docassemble, Python and the Flask framework, are widely deployed and tested by both a broad open source community and commercial entities.
When you host Docassemble yourself, you do not need to worry about the information getting used or misused for data mining or ad tracking purposes. The list of data processors and controllers is totally up to you. This isn’t something you can control when you use a commercial product.
If you’d like to read some additional articles discussing the safety of open source software, check out these links:
- The Digital Economy Runs on Open Source. Here’s How to Protect It. (hbr.org)
- Just How Secure Is Open-Source Software? (makeuseof.com)
- Open Source Software Security: Is Open Source Software Safe? (percona.com)
Before you get started
This guide assumes that you deploy a standard Docassemble image with Docker. I won’t discuss securing Docassemble without Docker or the use of custom builds of Docassemble. You can follow the guide that I wrote about installing Docassemble to use the best practices laid out below.
Use at least 2 servers
Most deployments should involve a development and a production environment. This will allow you to validate that your interviews work on the latest version of Docassemble as well as to allow you to better secure your production environment.
Object storage or DABackup volume
I recommend that most deployments separate out at least object storage, with AWS’s S3 or Microsoft Azure’s object storage solutions, or a compatible system. But in certain high security deployments on premises, for example in a government deployment with a private data center, object storage may be unnecessary. If you do not use object storage, you should use a separate backup volume. In the Docassemble instructions, this volume is typically named dabackup.
I’ll talk about the risks associated both with and without the use of S3 or an equivalent object storage.
The layers of a Docassemble installation
I’ll talk about three layers of your Docassemble environment:
- The host environment, which is typically a virtual Linux server on AWS, Azure, DigitalOcean or perhaps in your local VMware or Hyper-V environment
- The Docker container
- The Docassemble web application
What kind of negative outcomes are you concerned with?
Although I use the word “security” as a catch-all, it’s a broad umbrella. In reality, different actors can cause very different kinds of problems. And just as we don’t use Fort Knox-level security on our home’s front door, some problems are less relevant or less important to secure against for some deployments. This analogy isn’t perfect: while you can’t try to break into 1000 houses at the same time, it’s not hard to automate breaking into 1000 computers at a time. Still, some kinds of attacks are only worth trying on certain kinds of computers, which means that you can reserve the very strongest security measures for certain kinds of targets. In particular, how long you are willing to have the server offline has a significant impact on the cost of hosting and on the safety measures you take.
Think about whether your deployment has these kinds of concerns:
- Data leakage and compliance concerns, which are typically relevant if your deployment assembles documents that contain:
  - Private health care information
  - Financial account information
  - Social security numbers or other ID numbers tied to a name
  - Confidential company information or sensitive personally identifiable information
- System resource misuse
  - Capture for code execution, including Bitcoin mining
  - Capture for ransom
  - Unauthorized use of storage or bandwidth
  - Unauthorized use of credentials stored on the system, including API access to financial systems like Stripe
- System availability risks
  - Denial of service
  - Loss of access to critical real-time records, or delay in access to those records
- Reputational risks
- Data integrity concerns
Before you approach the rest of this document, you should understand which of these risks are important for you. I’ll discuss the relevant risks when discussing each way to protect your deployment.
Backup and continuity planning
Backups are point-in-time recovery options for your system. An appropriate backup strategy requires deciding how much data you are comfortable losing. Separate but conceptually paired with your backup strategy is a strategy for getting your server back online, or a continuity plan.
Docassemble makes some choices for you: by default, backups are made once a day at 6 AM (depending on your server’s time zone) and old backups are kept for up to 14 days. That means if you rely solely on Docassemble’s backup files, you risk losing up to 1 day of data. If this isn’t acceptable, you will need to consider a high availability deployment, especially one that uses a separate database server. High availability deployments are more expensive: you need at least 3 servers.
Even with a high availability deployment, backups are necessary. The usual rule is to have multiple forms of backup. Point in time backups are important to protect against the risk of data becoming corrupted. You may need the ability to go “back in time” a few days to restore a functioning system.
Consider how to back up and how to re-create:
- The command and other information you use to start Docassemble (typically an env.list file, and possibly a docker compose file)
- The packages and applications you run on your Docassemble server
- Your Docassemble configuration (config.yml)
- The files your users create
- The session data your users create
Back up the env.list manually
Your env.list file and possibly a docker compose file (not part of most setup guides online) live on the host server. They are not available from within the Docassemble web application. Your env.list file contains information you will be able to recreate relatively easily, but not without time. You should have a strategy of keeping a copy of this file in a safe place. Luckily, it does not change often.
With minor exceptions, count on Docassemble to place backups in your data storage location and your data storage location to be available
Everything else (the Docassemble configuration, files, and session data) will be stored in a restorable form in either the backup volume or the S3 or equivalent object storage location. You should have a backup strategy for this location, or understand the risks of going without one. A list of installed packages will also be part of the backup, but if you have packages that are in a private GitHub repository or that were installed from a .ZIP file, you should make sure that you have the original files and a working personal access token to install from GitHub.
Consider whether you need AWS region diversity or another cloud provider to increase reliability of your backups
If you have a local deployment of Docassemble (as opposed to a cloud deployment on AWS or Azure), and you used the dabackup volume parameter in your startup command, you can use your normal Linux virtual machine backup strategy to protect Docassemble. The full virtual machine will be backed up, and it will contain point-in-time snapshots that are intended to be recoverable at 6 AM each day.
You can follow your usual data protection strategies to decide if you should have a second, offline and a third off-site copy of your Docassemble environment.
If you use a cloud service to host Docassemble, you may be confident in the cloud service’s uptime and data protection guarantees. They are sufficient to protect most deployments, and likely far better than you can achieve by hosting on premises. Chances are your server will be offline for at most a few hours, and chances are that you will not lose any data due to a mistake by the cloud provider. The larger risk is of abuse of your credentials or improperly configured or failed backups.
However, consider if AWS’s 99.5% Service Level Agreement (for single instance deployments) is enough for you. You can increase the safety by mirroring your S3 environment to a second region, or to another cloud service. Only you know if this extra safety has value for your deployment. It depends on the negative consequences of losing the historical data.
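To put the 99.5% figure in concrete terms, the short calculation below converts an uptime percentage into the downtime it still permits. It is only an illustration: the function name and the 30-day month are my own assumptions, and the real measurement period and exclusions are defined by AWS’s SLA, not by this sketch.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float) -> float:
    """Hours of downtime that still satisfy an uptime percentage over a period."""
    return period_hours * (100.0 - uptime_percent) / 100.0


HOURS_PER_MONTH = 30 * 24  # a 30-day month, chosen for illustration
HOURS_PER_YEAR = 365 * 24

monthly = allowed_downtime_hours(99.5, HOURS_PER_MONTH)
yearly = allowed_downtime_hours(99.5, HOURS_PER_YEAR)

print(f"99.5% uptime allows about {monthly:.1f} hours of downtime per month")
print(f"99.5% uptime allows about {yearly / 24:.1f} days of downtime per year")
```

Over a full year, 99.5% uptime still leaves room for roughly 1.8 days of downtime.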
Because your object storage is always online and because it uses the same credentials as your live Docassemble deployment, relying solely on AWS or Azure to protect your object storage has other risks:
- It will not protect against a malicious actor who has access to your AWS credentials and can delete or encrypt all of your data
- It will not protect against an outage of AWS or Azure
For most deployments, a tiny risk of an outage of a day or two (equivalent to the 99.5% SLA) is fine as long as the data is ultimately recoverable. Make sure you understand the tradeoffs and are prepared for the necessary time to re-build the system in a worst case scenario. You can further mitigate the risks of relying on a single instance, single availability zone deployment by following these steps:
- Turn on S3 versioning, which keeps the old versions of files from S3 indefinitely
- Turn on multi-factor delete, which requires you to use a second authentication method to permanently delete files or turn off bucket versioning
- Implement auditing and alerts
S3 versioning can also be an effective protection against the risk of ransomware.
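If you prefer to script the versioning setting rather than click through the S3 console, a minimal sketch using the boto3 library’s put_bucket_versioning call might look like the following. The bucket name is a placeholder, the helper function names are my own, and the sketch assumes boto3 is installed and that AWS credentials with permission to change the bucket are already configured. AWS places extra restrictions on who can enable MFA delete, so check its documentation before relying on that part.

```python
from typing import Optional


def versioning_request(bucket: str, mfa: Optional[str] = None) -> dict:
    """Build the arguments for S3's PutBucketVersioning call.

    `mfa` is the MFA device serial number and the current code separated by
    a space; when it is given, MFA delete is requested as well.
    """
    config = {"Status": "Enabled"}
    kwargs = {"Bucket": bucket, "VersioningConfiguration": config}
    if mfa is not None:
        config["MFADelete"] = "Enabled"
        kwargs["MFA"] = mfa
    return kwargs


def enable_versioning(bucket: str, mfa: Optional[str] = None) -> None:
    """Send the request with boto3 (imported lazily; it must be installed)."""
    import boto3  # third-party AWS SDK, assumed installed and configured

    boto3.client("s3").put_bucket_versioning(**versioning_request(bucket, mfa))


# Hypothetical bucket name; printing the request makes no call to AWS.
print(versioning_request("example-docassemble-bucket"))
```

Keeping the request builder separate from the function that contacts AWS lets you review exactly what will be sent before you run it against a real bucket.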
Keep in mind that if AWS fails to meet their service level agreement, their “penalty” is reimbursing you for the extra hours of downtime, at most a few dollars. It does not insure against your lost business.
Getting back online in the event of a failure
If your AWS EC2 or Lightsail instance fails but your backups ran as expected on S3, you should be able to get back online within 1-2 hours by creating a new instance, updating it and installing Docker, copying over your separately backed-up env.list file, and running the docker run command.
Allocate some time for troubleshooting, as this process is not guaranteed to run smoothly. For example, you may need to move the files from a previous backup into position so Docassemble restores from an earlier point in time. Familiarize yourself with the S3 console in AWS so you understand how to download and upload files and how Docassemble lays out its files on S3.
Verify backups periodically, and test if appropriate
Docassemble makes a snapshot of the database once a day. But Docassemble allocates a limited amount of time for the database export. Therefore, it’s a good idea to check the backup location to confirm the backup size, especially before stopping the container. If the database backup is zero bytes or is much smaller than any previous backup, you should assume that it did not properly run.
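The size check described above is easy to automate once you have the backup sizes in hand. The sketch below shows only the comparison logic. The function name, the 50% threshold, and the sample sizes are illustrative assumptions, and how you collect the sizes (from the S3 console, a listing command, or the backup volume) is left out.

```python
from typing import Sequence


def backup_looks_suspicious(
    latest_bytes: int,
    previous_bytes: Sequence[int],
    min_ratio: float = 0.5,
) -> bool:
    """Return True when the latest dump is empty or much smaller than before.

    `min_ratio` is an arbitrary illustrative threshold: a dump smaller than
    half of the smallest earlier dump is treated as a likely failure.
    """
    if latest_bytes <= 0:
        return True
    if not previous_bytes:
        return False  # nothing to compare against yet
    return latest_bytes < min_ratio * min(previous_bytes)


# Made-up sizes in bytes for three earlier nightly dumps.
history = [48_200_000, 48_900_000, 49_500_000]

print(backup_looks_suspicious(0, history))
print(backup_looks_suspicious(1_200_000, history))
print(backup_looks_suspicious(50_100_000, history))
```

A check like this only tells you that a dump looks wrong. It does not prove that a normal-sized dump can actually be restored.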
You can view the contents of a database dump (the file named “docassemble” in the “postgres” directory on S3) in one of two ways:
- Use pg_restore -l (see Dump and restore – Azure Database for PostgreSQL – Single Server | Microsoft Learn)
- Use a graphical tool, like DBeaver, to open and view the contents on your local desktop.
Protecting user data and your API keys
Docassemble’s defaults work well to keep user data safe. But anyone who can install a package or use the playground on your server has access to all unencrypted data. And all variables in use by your interview are visible by default to anyone with access to the session. Poor interview design can expose API keys or other data you intend to be private. Consider:
- Configure Let’s Encrypt for encryption of user data in transit with https/ssl
- Turn on encryption at rest for interviews that store confidential, medical, or financial records
  - Encryption at rest will complicate or disable sharing of interviews as well as remote signing features, and make it impossible to recover user data
- Use separate development and production environments.
  - You should also use separate API keys in each environment.
- Limit administrator and developer access to your production environment
- Use complex passwords for administrator accounts
- Turn off the Docassemble playground on production environments
- Turn off debug mode
- Disable the visibility of user variables via JavaScript unless required by your interview (on a per interview basis).
- Advanced authors who build features that load API keys should also design their interviews to keep API keys and sensitive data out of the user dictionary. You can do this by storing API keys in the global configuration, by deleting them after use, or by making them global variables in a Python module rather than attributes on DAObjects.
Use automated nightly updates and secure the host operating system
In general, it’s a good idea to host the Docassemble Docker container on a Linux server. There are many good guides on securing Linux servers. This guide summarizes the most important considerations for deploying Docassemble with Docker.
I recommend using the latest available long term support (LTS) release of Ubuntu. These LTS versions are released every 2 years and supported for 5, and most cloud hosting environments will offer them as a supported install option. In AWS Lightsail, the Ubuntu Server image has good defaults in place. Securing the Docassemble host is no different from securing any standard Linux server.
- Use the firewall to limit access to ports 22 (SSH), 80 (HTTP), and 443 (HTTPS). In Lightsail, you will need to add a rule to allow access to port 443. Lightsail’s firewall can be configured entirely from the Lightsail console. In some cloud deployments, you’ll need to configure the ufw firewall from the command line instead.
- Make sure that password-based login is disabled, and that you can only log in with an SSH key. This eliminates many brute force login attempts, and is the default in AWS Lightsail.
- Log in as a regular user (not the root user), and use sudo to run commands with administrator privilege. This is the default in Lightsail as well.
- Use automatic updates. On AWS Lightsail, the default configuration of this server is to install security updates automatically each night. Most security updates can be installed without restarting the server. (How to set up automated updates on Ubuntu)
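One way to confirm that the firewall rule in the list above is doing its job is to probe the server from a different machine and flag any port that answers but is not on the allow list. The sketch below uses only Python’s standard library. The function names are my own, and the extra port numbers in the sample call are made-up scan results, not a statement about what your server exposes.

```python
import socket
from typing import Iterable, List, Set

ALLOWED_PORTS: Set[int] = {22, 80, 443}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unexpected_ports(open_ports: Iterable[int]) -> List[int]:
    """Ports that accept connections but are not on the allow list."""
    return sorted(set(open_ports) - ALLOWED_PORTS)


def audit(host: str, candidates: Iterable[int]) -> List[int]:
    """Probe candidate ports on host and report the unexpected open ones."""
    return unexpected_ports(p for p in candidates if is_port_open(host, p))


# No network traffic here: classify a made-up scan result.
print(unexpected_ports([22, 443, 5432, 6379]))
```

To check a real server, you would call audit with your server’s address and the ports you want to test, from a machine outside the firewall.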
Occasionally, a kernel update for Linux requires a restart. It’s a good idea to log in to the server periodically and to restart it if needed. The server’s login message will alert you if a restart is required. Typing sudo reboot will restart it.
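If you would rather check for a pending restart from a script than read the login message, Ubuntu signals it with a marker file. The sketch below only reports whether that file exists and does not restart anything. The function name is my own.

```python
from pathlib import Path

# Ubuntu's update tooling creates this marker file when a restart is pending.
REBOOT_MARKER = Path("/var/run/reboot-required")


def reboot_required(marker: Path = REBOOT_MARKER) -> bool:
    """Return True if the marker file exists."""
    return marker.exists()


if reboot_required():
    print("Restart pending: run 'sudo reboot' at a convenient time.")
else:
    print("No restart pending.")
```

Leaving the restart itself as a manual step lets you pick a time when no one is in the middle of an interview.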
Some more guides that may be useful here:
- Linux Server Security – Best Practices for 2021
- How to secure your Linux web server (freecodecamp.org)
- Unix & Linux Server Security: 10 Best Practices | BeyondTrust
Update the Docassemble web application at least monthly
Docassemble’s author regularly updates both the application and its core dependencies. While using the latest version has some risks, including the risk that an update breaks a feature that used to work, you should regularly install the available updates. Using the latest version ensures that you get security updates from both Docassemble and the various components that it is built on.
Monitor the change log and the Slack channel. Unless the update is an urgent security fix, delaying a few days or a week is fine, but get in the habit of updating at least once a month.
To reduce the risk of an update breaking your interviews, you can monitor the Slack channel for a day or two after the update is announced to see if others are experiencing problems with it. You should install updates on your development or test environment first (Suffolk’s Hall Monitor script might help). You can use automated tests as a first line of defense against changes breaking your interviews.
Minimize privileged accounts on your production server and limit access for each account to what is needed
- Use a separate account for each person who has access so access can be properly audited. Don’t use shared accounts.
- Only give administrator or developer access to the production server to a small number of staff who need the permission to install packages. Those staff will also be able to view any unencrypted sessions on the server.
Developers and admins have the same privilege level in Docassemble: the only difference is that some menu items are hidden. A developer has the ability to run any code, fully privileged, which can be used to modify the server’s configuration and the contents of the database. Therefore it is important to limit the number of developer accounts on your production server.
Limit access to API keys and use quotas and alerts on those keys
Docassemble can make use of several API keys that may ultimately lead to bills for you or lead to reputational harm or bans if they are misused. If you integrate with services like GitHub, Google Maps, Twilio, or Sendgrid, make sure that you use the available controls on these third-party services to limit API access from the Docassemble server.
Only enable access for each API key to the specific services that are needed.
You may also want to use multiple API keys to fully control access.
- Limit access to the public IP of your server to control access for any services that are accessed in the background or via code, like Google Geocoding, Twilio, and Sendgrid.
- Use the web address of your server to limit access for any services that are accessed only in the frontend, like Google Maps Autocomplete.
Finally, understand your threat model and the risks you are comfortable with if you get unexpectedly high usage. If the service you use has an alert system, always set a threshold to get alerts. Remember that high usage could come in the middle of the night, especially if you are the target of an intentional denial of service attack. If you would rather risk downtime than an unexpectedly high bill, it may be best to enable hard quotas on usage as well. Typically you can set budgets or API query limits on a per-hour, daily, or monthly basis to control usage.
Don’t panic
Securing a web server that runs Linux and Docker is not a trivial task, but it can be done with some planning and attention to detail. By following the steps outlined in this guide, you can protect your Docassemble installation from common threats and ensure its reliability and availability. Remember to always back up your data, encrypt your connections, update your software, and limit your exposure. Docassemble is a powerful and flexible tool for creating interactive legal applications, and with proper security measures, you can use it with confidence and peace of mind.
Our team at Lemma Legal is ready to help you with either fully managed hosting or training and support. Set up a free 15 minute consult to learn more.