Testing Software
Featured articles
-
End-to-end-ish tests using fake HTTP in Flutter
We write tests in order to prove our features work as intended and we run those tests ...
End-to-end-ish tests using fake HTTP in Flutter

We write tests in order to prove our features work as intended and we run those tests consistently to prove that our features don't stop working as intended.

Writing end-to-end tests is pretty expensive. Typically, they use real devices or sometimes a simulator/emulator and real backend services. That usually means that they end up being pretty slow and they tend to be somewhat flaky. That isn't to say that they're not worth it for some teams or for a subset of the features in your app. However, I'm here to tell you (or maybe just remind you) that tests and test coverage aren't the goal in and of themselves. We write tests in order to prove our features work as intended and we run those tests consistently to prove that our features don't stop working as intended. That means that our goal when writing tests should be to figure out how to achieve our target level of confidence that our features work as intended as affordably as possible.

It's a spectrum. On the one end is 100% test coverage using all the different kinds of tests: solitary unit tests, sociable more-integrated tests, and end-to-end tests; all features, fully covered, no exceptions. On the other end of the spectrum there are no tests at all; YOLO, just ship-it. At Betterment, we definitely prefer to be closer to the 100% coverage end of the spectrum, but we know that in practice that's not really a feasible end state if we want to ship changes quickly and deliver rapid feedback to our engineers about their proposed changes. So what do we do? Well, we aim to find an affordable, maintainable spot on that testing spectrum a la Justin Searls' advice. We focus on writing expressive, fast, and reliable solitary unit tests, some sociable integrated tests of related units, and some "end-to-end-ish" tests. It's that last bucket of tests that's the most interesting and it's what the rest of this post will focus on.

What are "end-to-end-ish" tests?

They're an answer to the question "how can we approximate end-to-end tests for a fraction of the cost?" In Flutter, the way to write end-to-end tests is with flutter_driver and the integration_test package. These tests use the same widgetTester API that regular widget tests use but they are designed to run on a simulator, emulator, or preferably a real device. These tests are pretty easy to write (just as easy as regular widget tests) but hard-ish to debug and very slow to run. Where a widget test will run in a fraction of a second to a second, one of these integration tests will take many seconds.

We love the idea of these tests, the level of confidence they'd give us that our app works as intended, and how they'd eliminate manual QA testing, but we loathe the cost of running them, both in terms of time and actual $$$ of CI execution. So, we decided that we really only want to write these flutter_driver end-to-end tests for a tiny subset of our features, almost like a "smoke testing" suite that would signal us if something was seriously wrong with our app. That might include a single happy-path test apiece for features like log-in and sign-up. But that leaves us with a pretty large gap where it's way too easy for us to accidentally create a feature that depends on some Provider that's not provided and our app blows up at runtime in a user's hands. Yuck!

Enter, end-to-end-ish tests (patent pending 😉). These tests are as close to end-to-end tests as we can get without actually running on a real device using flutter_driver.
They look just like widget tests (because they are just widget tests) but they boot up our whole app, run all the real initialization code, and rely on all our real injected dependencies with a few key exceptions (more on that next). This gives us the confidence that all our code is configured properly, all our dependencies are provided, our navigation works, and the user can tap on whatever and see what they'd expect to see. You can read more about this approach here.

"With a few key exceptions"

If the first important distinction of end-to-end-ish tests is that they don't run on a real device with flutter_driver, the second important distinction is that they don't rely on a real backend API. That is, most apps rely on one (or sometimes a few) backend APIs, typically powered by HTTP. Our app is one of those apps. So, the second major difference is that we inject a fake HTTP configuration into our network stack so that we can run nearly all of our code for real but cut out the other unreliable and costly dependency.

The last important hurdle is native plugins. Because widget tests aren't typically run on a real device or a simulator/emulator, they run in a context in which we should assume the underlying platform doesn't support using real plugins. This means that we have to also inject fake implementations of any plugins we use. What I mean by fake plugins is really simple: when we set up a new plugin, we wrap it in a class that we inject into our app. Making a fake implementation of that plugin is typically as easy as making another class, prefixing its name with Fake, and having it implement the public contract of the regular plugin class with suitably real but not quite real behavior. It's a standard test double, and it does the trick. It's definitely a bummer that we can't exercise that real plugin code, but when you think about it, that plugin code is tested in the plugin's own test suite. A lot of the time, the plugin code is integration tested as well because the benefits outweigh the costs for many plugins, e.g. the shared preferences plugin can use a single integration test to provide certainty that it works as intended. Ultimately, using fake plugins works well and makes this a satisfyingly functional testing solution.

About that fake HTTP thing

One of the most interesting bits of this solution is the way we inject a fake HTTP configuration into our network stack. Before building anything ourselves, we did some research to figure out what the community had already done. Unfortunately, our google-fu was bad and we didn't find anything until after we went and implemented something ourselves. Points for trying though, right?

Eventually, we found nock. It's similar to libraries for other platforms that allow you to define fake responses for HTTP requests using a nice API and then inject those fake responses into your HTTP client. It relies on the dart:io HttpOverrides feature. It actually configures the current Zone's HTTP client builder to return its special client so that any code in your project that finds its way to using the dart:io HTTP client to make a request will end up routed right into the fake responses. It's clever and great. I highly recommend using it. We, however, are not using it.

How we wrote our own fake HTTP Client Adapter

As I said, we didn't find nock until after we wrote our own solution. Fortunately, it was a fun experience and it really took very little time!
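As a rough illustration of where this ended up, here is a minimal sketch of configuring fake HTTP responses with the resulting package (introduced properly just below). The shape follows this post's description (whenGet, converting the instance into an adapter for dio), but treat the exact method signatures as illustrative rather than authoritative.

```dart
import 'package:charlatan/charlatan.dart';
import 'package:dio/dio.dart';
import 'package:flutter_test/flutter_test.dart';

void main() {
  test('GET /users/{id} returns the configured fake response', () async {
    // Configure a fake response for a templated URL; the body (and a
    // default 200 status code) is what the adapter will return.
    final charlatan = Charlatan()
      ..whenGet('/users/{id}', (request) => {'id': 1, 'name': 'Jane'});

    // Hand the fake adapter to a real dio client. In our end-to-end-ish
    // widget tests this is the client injected into the whole app; here
    // the sketch exercises it directly for brevity.
    final dio = Dio(BaseOptions(baseUrl: 'https://example.test'))
      ..httpClientAdapter = charlatan.toFakeHttpClientAdapter();

    final response = await dio.get('/users/1');
    expect(response.data, {'id': 1, 'name': 'Jane'});
  });
}
```

An app-specific extension (like the withDefaults method mentioned below) can then layer common default responses on top of this, with individual tests overriding only the responses they care about.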
This also meant that we ended up with an API that fit our exact needs rather than having to reframe our approach to fit what nock was able to offer us. The solution we came up with is called charlatan, and it's open-source and available on pub.dev. Both libraries are great and each is designed for a specific challenge; check both of them out and decide which one works for your needs.

The sketch above shows roughly how we use it to set up a fake HTTP client for our tests: we construct an instance of the Charlatan class and then use its methods, like whenGet, to configure it with the fake responses that we want to see when we make requests to the configured URLs. We've also created an extension method, withDefaults, that allows us to configure a bunch of common, default responses so that we don't have to specify those in each and every test case. This is useful for API calls that always behave the same way, like POSTs that return no body, and to provide a working foundation of responses. When a test case cares about the specifics of a response, it can override that default. The last important step is to make sure to convert the Charlatan instance into an adapter and pass that into our HTTP client so that the client will use it to fulfill requests.

Inside, the Charlatan API is just collecting fake responses and organizing them so that they're easy to access later. The internals are pretty tiny. We provide a class that exposes the developer-friendly configuration API for fake responses, and we implement the HttpClientAdapter interface provided by dio. (In our app we use dio and not dart:io's built-in HTTP client, mostly due to preference and slight feature-set differences.) For the most part, the code collects fake responses and then smartly spits them back out when requested. The key functionality (Ahem! Magic ✨) is only a few lines of code. We use the uri package to support matching templated URLs rather than requiring developers to pass in exactly matching strings for the requests their tests will make. We store fake responses with a URI template, a status code, and a body. If we find a match, we return it; if we don't, we throw a helpful exception to guide the developer on how to fix the issue.

Takeaways

Testing software is important, but it's not trivial to write a balanced test suite for your app's needs. Sometimes, it's a good idea to think outside the box in order to strike the right balance of test coverage, confidence, and maintainability. That's what we do here at Betterment, come join us!
-
Stability through Randomness
We only recently enabled test randomization and as a result found that some of our tests were ...
Stability through Randomness

We only recently enabled test randomization and as a result found that some of our tests were failing. Through fixing the tests, we learned lessons that could help others have a less painful migration themselves.

If you're writing tests for your Flutter application, it's safe to assume that your goal is to build a robust, reliable piece of software that you can be confident in. However, if your tests aren't run in random order, you may have a false sense of confidence that the assertions you're making in them are actually accurate. By default, running flutter test will run your tests in the order they're written within your test file. That means a test file in which the first test mutates a shared value (say, an int) and a later test asserts against that value will always exit successfully, despite the obvious issue with how it's set up. In this case, the second test is relying on the side effects of the first test. Since the first test will always run before the second test, we're not privy to this dependency. However, in more complex testing scenarios, this dependency won't be as obvious.

In order to avoid test inter-dependency issues, we can instead run our tests in a random order (per file) by passing the --test-randomize-ordering-seed flag to flutter test. The flag takes a seed that can be one of two things: either a 32-bit unsigned integer or the word "random". To ensure true randomness, always pass "random" as the seed. The benefit of having the option to pass an integer as a seed becomes apparent once you come across a test that fails when run in an order other than that in which it was defined. The test runner will print the seed it chose at the beginning of test execution, and you can reliably use that seed to reproduce the failure and be confident in your fix once the test begins passing.

If you have been using the randomization flag since the inception of your codebase, you're in a fantastic position and can be confident in your tests! If you haven't, there's no better time to start than now. Of course, introducing the flag may cause some tests to begin failing. Whether you choose to skip those tests while you work on fixing them so the rest of your team can keep chugging away, or address the issues immediately, the following tips should help you quickly identify where the issues are coming from and how to resolve them.

Tip 1: Assume every test within a test file will run first

The scenario above highlights the anti-pattern of assuming a consistent test execution order. We can rewrite that test file so that each test would pass if it were run first. Hopefully it's easy to look past the trivial nature of using an int and imagine how this might apply to a more complex test case.

Tip 2: Keep all initialization & configuration code inside of setUp() methods

While it may be tempting to set up certain test objects directly in your main function, this can cause sneaky issues to crop up, especially when mocking or using mutable objects. Consider the "don't" and "do" sketches below. Did you know that even when run sequentially, the "don't" version will print A, B, D, C, E? This is because code in the body of the main function and the bodies of groups only runs once, and it does so immediately. This can introduce sneaky testing bugs that may not surface until the tests themselves run in random order.
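The original snippets aren't reproduced here, so the following is a minimal sketch of the shape they describe; the letters match the output described in the text, while the test names are illustrative.

```dart
// Don't: code in the body of main and in the bodies of groups runs once,
// immediately, when the file is loaded; it is not tied to any test.
// Run sequentially, this prints A, B, D, C, E.
// (Run with: flutter test --test-randomize-ordering-seed=random)
import 'package:flutter_test/flutter_test.dart';

void main() {
  print('A');
  group('first group', () {
    print('B');
    test('runs its body when the test executes', () {
      print('C');
    });
  });
  print('D');
  test('also runs its body when the test executes', () {
    print('E');
  });
}
```

Moving that code into setUp callbacks ties it to each test instead. One possible shape that produces the ordering described below:

```dart
// Do: initialization lives in setUp, so it runs before every test.
import 'package:flutter_test/flutter_test.dart';

void main() {
  setUp(() => print('A'));
  group('first group', () {
    setUp(() => print('B'));
    test('sees the outer and group setUp', () {
      print('C');
    });
  });
  group('second group', () {
    setUp(() => print('D'));
    test('sees the outer and its own group setUp', () {
      print('E');
    });
  });
}
```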
Do: the second sketch above correctly prints A, B, C, A, D, E (A prints twice because setUp is run before each test).

Tip 3: Scope test objects as closely as possible to the tests that need them

In the same way that we prefer to keep shared state as low in the Widget tree as possible, keep your test objects close to the tests that utilize them. Not only does this increase test readability (each setUp method will set up only the dependencies needed for the tests below it and within the same scope in the testing tree), but it also reduces the scope for potential problems. By keeping test dependencies tightly scoped to where they're used, we avoid the possibility that a test will be added or changed in such a way that it impacts the tests previously consuming the dependency. Instead, when a new test is introduced that requires that dependency, the decision can be made to either share it in such a way that its state gets reset prior to each test, or to not share it at all and have each test create and set up the dependency itself.

Keep in mind, descriptive group names go a long way in adding clarity to what dependencies that bucket relies upon. For example, a group named "when a user is logged in" tells me that the group of tests relies upon a user in the authenticated state. If I add another group named "when a user is logged out", I would expect both groups to have setUp() methods that correctly create or set up the user model to have the correct authentication state.

Following the above tips should put you well on your way to fixing existing problems in your test suite or otherwise preventing them altogether! So if you haven't already, make sure to enable test randomization in your Flutter codebase today!
-
Finding a Middle Ground Between Screen and UI Testing in Flutter
We outline the struggles we had testing our flutter app, our approaches to those challenges, ...
Finding a Middle Ground Between Screen and UI Testing in Flutter

We outline the struggles we had testing our Flutter app, our approaches to those challenges, and the solutions we arrived at to solve those problems.

Flutter provides good solutions for both screen testing and UI testing, but what about the middle ground? With integration testing being a key level of the testing pyramid, we needed to find a way to test how features in our app interacted without the overhead involved with setting up UI tests. I'm going to take you through our testing journey: from a limited native automated testing suite and heavy dependence on manual testing, to trying Flutter's integration testing solutions, to ultimately deciding to build out our own framework to increase confidence in the integration of our components.

The beginning of our Flutter testing journey

Up until early 2020, our mobile app was entirely native, with separate Android and iOS codebases. At the onset of our migration to Flutter, the major testing pain point was that a large amount of manual regression testing was required in order to approve each release. This manual testing was tedious and time consuming for engineers, whose time is expensive. Alongside this manual testing pain, the automated testing in the existing iOS and Android codebases was inconsistent. iOS had a larger unit testing suite than Android did, but neither had integration tests. iOS also had some tests that were flaky, causing CI builds to fail unexpectedly.

As we transitioned to Flutter, we made unit/screen testing and code testability a high priority, pushing for thorough coverage. That said, we still relied heavily on the manual testing checklist to ensure the user experience was as expected. This led us to pursue an integration testing solution for Flutter. In planning out integration testing, we had a few key requirements for our integration testing suite:
- Easily runnable in CI upon each commit
- An API that would be familiar to developers who are used to writing Flutter screen tests
- The ability to test the integration between features within the system without needing to set up the entire app

The Flutter integration testing landscape

At the very beginning of our transition to Flutter, we started trying to write integration tests for our features using Flutter's solution at the time: flutter_driver. The benefit we found in flutter_driver was that we could run it in our production-like environment against preset test users. This meant there was minimal test environment setup. We ran into quite a few issues with flutter_driver, though. Firstly, there wasn't a true entry point we could launch the app into, because our app is add-to-app, meaning that the Flutter code is embedded into our iOS and Android native applications rather than being a pure Flutter app runnable from a main.dart entry point. Second, flutter_driver is geared toward UI/E2E testing rather than integration testing, meaning we'd need to run an instance of the app on a device, navigate to a flow we wanted to test, and then test the flow. Also, the flutter_driver API worked differently than the screen testing API and was generally more difficult to use. Finally, flutter_driver is not built to run a suite of tests or to run easily in CI. While possible to run in CI, it would be incredibly costly to run on each commit since the tests need to run on actual devices. These barriers led us to not pursue flutter_driver tests as our solution.
We then pivoted to investigating Flutter's newer replacement for flutter_driver: integration_test. Unfortunately, integration_test was very similar to flutter_driver in that it took the same UI/E2E approach, which meant that it had the same benefits and drawbacks that flutter_driver had. The one additional advantage of integration_test is that it uses the same API as screen tests do, so writing tests with it feels more familiar for developers experienced with writing screen tests. Regardless, given that it has the same problems that flutter_driver does, we decided not to pursue integration_test as our framework.

Our custom solution to integration testing

After trying Flutter's solutions fruitlessly, we decided to build out a solution of our own. Before we dive into how we built it, let's revisit our requirements from above:
- Easily runnable in CI upon each commit
- An API that would be familiar to developers who are used to writing Flutter screen tests
- The ability to test the integration between features within the system without needing to set up the entire app

Given those requirements, we took a step back to make a few overarching design decisions.

First, we needed to decide what pieces of code we were interested in testing and which parts we were fine with stubbing. Because we didn't want to run the whole app with these tests, in order to keep them lightweight enough to run on each commit, we decided to stub out a few problem areas. The first was our Flutter/native boundary. With our app being add-to-app and utilizing plugins, we didn't want to have to run anything native in our testing. We stubbed out the plugins by writing lightweight wrappers around them and then providing them to the app at a high level that we could easily override with fakes for the purpose of integration testing. The add-to-app boundary was similar. The second area we wanted to stub out was the network. In order to do this, we built out a fake HTTP client that allows us to configure network responses for given requests. We chose to fake the HTTP client since it is the very edge of our network layer; faking it left as much of our code as possible under test.

The next thing we needed to decide was which user experiences we actually wanted to test with our integration tests. Because integration tests are more expensive to write and maintain than screen tests, we wanted to make sure the flows we were testing were the most impactful. Knowing this, we decided to focus on the "happy paths" of flows. Happy paths are non-exceptional flows (flows not based on bad user state or input). Sad paths, on top of being less impactful, usually give feedback on the same screen as the input, meaning those sad path cases are usually better tested at the screen test level anyway.

From here, we set out to break down the responsibilities of the components of our integration tests. We wanted to have a test harness that we could use to set up the app under test and the world that the app would run in; however, we knew this configuration code would be mildly complicated and something that would be in flux. We also wanted a consistent framework by which we could write these tests. In order to ensure changes to our test harness didn't have far-reaching effects on the underlying framework, we decided to split the testing framework out into an independent package that is completely agnostic to how our app operates. This keeps the tests feeling familiar to normal screen tests, since the exposed interface is very similar to how widget tests are written.
The remaining test harness code was put in our normal codebase, where it can be iterated on freely. The other separation we wanted to make was between the screen interactions and the tests themselves. For this we used a modified version of Very Good Ventures' robot testing pattern, which allows us to reuse screen interactions across multiple tests while also making our tests very readable, even from a non-engineering perspective.

In order to fulfill two of our main requirements (being able to run as part of our normal test suite in CI and having a familiar API), we knew we'd need to build our framework on top of Flutter's existing screen test framework. Being able to integrate (ba dum tss) these new tests into our existing test suite is excellent because it means that we get quick feedback when code breaks while developing. The last of our requirements was to be able to launch into a specific feature rather than having to navigate through the whole app. We were able to do this by having the app widget that handles dependency setup take a child, then pumping that app widget wrapped around whatever feature widget we wanted to test. With all these decisions made, we arrived at a well-defined integration testing framework that isolated our concerns and fulfilled our testing requirements.

The Nitty Gritty Details

In order to describe how our integration tests work, let's start by describing an example app that we may want to test. Let's imagine a simple social network app, igrastam, that has an activity feed screen, a profile screen, a flow for updating your profile information, and a flow for posting images. For this example, we'll say we're most interested in testing the profile information edit flows to start.

First, how would we want to make a test harness for this app? We know it has some sort of network interactions for fetching profile info and posts, as well as for posting images and editing a profile. For that, our app has a thin wrapper around the http package called HttpClient. We may also have some interactions with native code through a plugin such as image_cropper. In order to have control over that plugin, this app has also made a thin wrapper service for it. Given that this is approximately what the app looks like, the test harness needs to grant control of the HttpClient and the ImageCropperService. We can do that by simply passing our own fake versions into the app.

Awesome, now that we have an app and a harness we can use to test it, how are the tests actually written? Let's start by exploring that robot testing technique I mentioned earlier. Say that we want to start by testing the profile edit flow. One path through this flow contains a screen for changing your name and byline; then it bounces out to picking and cropping a profile image, then allows you to choose a preset border to put on your profile picture. For the screen for changing your name and byline, we can build a robot to interact with the screen (a sketch of one, along with a test that uses it, follows below). By using this pattern, we are able to reuse test code pertaining to this screen across many tests. It also keeps the test file clean of WidgetTester interaction, making the tests read more like a series of human actions rather than a series of code instructions.

Okay, we've got an app, a test harness, and robots to interact with the screens. Let's put it all together now into an actual test.
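The post's original robot and test code isn't reproduced here; the following is a minimal sketch of the shape it describes. The EditProfileInfoRobot, TestHarness, widget keys, and EditProfileFlow names are all hypothetical stand-ins, not the real codebase's names.

```dart
import 'package:flutter/material.dart';
import 'package:flutter_test/flutter_test.dart';

/// Robot for the name-and-byline screen: every finder and WidgetTester
/// interaction for that screen lives here, so tests read as user actions.
class EditProfileInfoRobot {
  const EditProfileInfoRobot(this.tester);

  final WidgetTester tester;

  Future<void> enterName(String name) =>
      tester.enterText(find.byKey(const Key('profile-name-field')), name);

  Future<void> enterByline(String byline) =>
      tester.enterText(find.byKey(const Key('profile-byline-field')), byline);

  Future<void> tapNext() async {
    await tester.tap(find.text('Next'));
    await tester.pumpAndSettle();
  }
}

void main() {
  testWidgets('user can update their name and byline', (tester) async {
    // The hypothetical harness injects the fake HttpClient and
    // ImageCropperService wrappers and pumps the app widget wrapped
    // around just the feature under test.
    final harness = TestHarness(tester)..stubProfileResponses();
    await harness.pumpFeature(const EditProfileFlow());

    final editProfileInfo = EditProfileInfoRobot(tester);
    await editProfileInfo.enterName('Jane Doe');
    await editProfileInfo.enterByline('Mediocre chef, excellent dog walker');
    await editProfileInfo.tapNext();

    // Subsequent steps would drive the image picker, cropper, and border
    // selection screens through their own robots.
  });
}
```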
The tests end up looking incredibly simple once all of these things are in place (which was the goal!). This test would go on to have a few more steps detailing the interactions on the subsequent screens. With that, we've been able to test the integration of all the components for a given flow, all written in a widget-test-like style, without needing to build out the entire app. This test could be added into our suite of other tests and run with each commit.

Back to the bigger picture

Integration testing in Flutter can be daunting due to how heavy the flutter_driver/integration_test solutions are with their UI testing strategies. We were able to overcome this and begin filling out the middle level of our testing pyramid by adding structure on top of the widget testing API that allows us to test full flows from start to finish. When pursuing this ourselves, we found it valuable to evaluate our testing strategy deficits, identify clear-cut boundaries around what code we wanted to test, and establish standards around which flows through the app should be tested. By going down the path of integration testing, we've been able to increase confidence in everyday changes as well as map out a plan for eliminating our manual test cases.
All Testing Software articles
-
Finding and Preventing Rails Authorization Bugs
Finding and Preventing Rails Authorization Bugs

This article walks through finding and fixing common Rails authorization bugs.

At Betterment, we build public-facing applications without an authorization framework by following three principles, discussed in another blog post. Those three principles are:
- Authorization through Impossibility
- Authorization through Navigability
- Authorization through Application Boundaries

This post will explore the first two principles and provide examples of common patterns that can lead to vulnerabilities, as well as guidance for how to fix them. We will also cover the custom tools we've built to help avoid these patterns before they can lead to vulnerabilities. If you'd like, you can skip ahead to the tools before continuing on to the rest of this post.

Authorization through Impossibility

This principle might feel intuitive, but it's worth reiterating that at Betterment we never build endpoints that allow users to access another user's data. There is no /api/social_security_numbers endpoint, because it is a prime target for third-party abuse and developer error. Similarly, even our authorized endpoints never allow one user to peer into another user's object graph. This principle keeps us from ever having the opportunity to make some of the mistakes addressed in our next section.

We acknowledge that many applications out there can't make the same design decisions about users' data, but as a general principle we recommend reducing the ways in which that data can be accessed. If an application absolutely needs to be able to show certain data, consider structuring the endpoint in a way such that a client can't even attempt to request another user's data.

Authorization through Navigability

Rule #1: Authorization should happen in the controller and should emerge naturally from table relationships originating from the authenticated user, i.e. the "trust root chain".

This rule is applicable for all controller actions and is a critical component of our security story. If you remember nothing else, remember this. What is a "trust root chain"? It's a term we've co-opted from SSL certificate lingo, and it's meant to imply a chain of ownership from the authenticated user to a target resource. We can enforce access rules by using the affordances of our relational data without the need for any additional "permission" framework. Note that association does not imply authorization, and the onus is on the developer to ensure that associations are used properly.

Consider a documents controller whose show action fetches a record directly from user input, as in the "before" half of the sketch below. So long as a user is authenticated, they can perform the show action on any document (including documents belonging to others!) provided they know or can guess its ID - not great! This becomes even more dangerous if the Documents table uses sequential ids, as that would make it easy for an attacker to start combing through the entire table. This is why Betterment has a rule requiring UUIDs for all new tables. This type of bug is typically referred to as an Insecure Direct Object Reference vulnerability. In short, these bugs allow attackers to access data directly using its unique identifiers – even if that data belongs to someone else – because the application fails to take authorization into account. We can use our database relationships to ensure that users can only see their own documents.
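The post's original controller code isn't reproduced here; the following is a hedged reconstruction based on the Document.find call quoted in the cop output later in this post and on the trust-root-chaining guidance.

```ruby
# Before: an Insecure Direct Object Reference. Any authenticated user can
# load any document simply by supplying (or guessing) its ID.
class DocumentsController < ApplicationController
  def show
    @document = Document.find(params[:document_id])
  end
end
```

Scoping the lookup through the authenticated user's associations fixes it:

```ruby
# After: the query starts from the trust root chain, so a document outside
# the user's object graph raises ActiveRecord::RecordNotFound (a 404).
class DocumentsController < ApplicationController
  def show
    @document = current_user.documents.find(params[:document_id])
  end
end
```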
Assuming a User has many Documents, we would change our controller to scope the lookup through the user, as in the "after" version above. Now any document_id that doesn't exist in the user's object graph will raise a 404, and we've provided authorization for this endpoint without a framework - easy peezy.

Rule #2: Controllers should pass ActiveRecord models, rather than ids, into the model layer.

As a corollary to Rule #1, we should ensure that all authorization happens in the controller by disallowing model initialization with *_id attributes. This rule speaks to the broader goal of authorization being obvious in our code. We want to minimize the hops and jumps required to figure out what we're granting access to, so we make sure that it all happens in the controller.

Consider a controller that links attachments to a given document. Let's assume that a User has many Attachments that can be attached to a Document they own. In its create action, the controller looks up the document through the current user, permits an attachment_id via strong params, and then builds an AttachmentLink from those params. Take a minute and review that - what jumps out at you? At first glance, it looks like the developer has taken the right steps to adhere to Rule #1 via the document method, and we're using strong params - is that enough? Unfortunately, it's not. There's actually a critical security bug here that allows the client to specify any attachment_id, even if they don't own that attachment - eek!

The simple way to resolve our bug is to look the attachment up through the user's associations before creating the AttachmentLink: we verify that the attachment_id specified actually belongs to the user, and our code will raise a 404 otherwise - perfect! By keeping the authorization up front in the controller and out of the model, we've made it easier to reason about. If we buried the authorization within the model, it would be difficult to ensure that the trust-root chain is being enforced – especially if the model is used by multiple controllers that handle authorization inconsistently. Reading the AttachmentLink model code, it would be clear that it takes an attachment_id, but whether authorization has been handled or not would remain a bit of a mystery.

Automatically Detecting Vulnerabilities

At Betterment, we strive to make it easy for engineers to do the right thing, especially when it comes to security practices. Given the formulaic patterns of these bugs, we decided static analysis would be a worthwhile endeavor. Static analysis can help not only with finding existing instances of these vulnerabilities, but also prevent new ones from being introduced. By automating detection of these "low hanging fruit" vulnerabilities, we can free up engineering effort during security reviews and focus on more interesting and complex issues.

We decided to lean on RuboCop for this work. As a Rails shop, we already make heavy use of RuboCop. We like it because it's easy to introduce to a codebase, violations break builds in clear and actionable ways, and disabling specific checks requires engineers to comment their code in a way that makes it easy to surface during code review.

Keeping rules #1 and #2 in mind, we've created two cops: Betterment/UnscopedFind and Betterment/AuthorizationInController. These will flag any models being retrieved and created in potentially unsafe ways, respectively. At a high level, these cops track user input (via params.permit et al.) and raise offenses if any of these values get passed into methods that could lead to a vulnerability (e.g. model initialization, find calls, etc.). You can find these cops here. We've been using these cops for over a year now and have had a lot of success with them.
In addition to these two, the Betterlint repository contains other custom cops we've written to enforce certain patterns -- both security-related as well as more general ones. We use these cops in conjunction with the default RuboCop configurations for all of our Ruby projects.

Let's run the first cop, Betterment/UnscopedFind, against DocumentsController from above:

$ rubocop app/controllers/documents_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents_controller.rb:3:17: C: Betterment/UnscopedFind: Records are being retrieved directly using user input. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining").

INSTEAD OF THIS:
Post.find(params[:post_id])

DO THIS:
current_user.posts.find(params[:post_id])

See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentunscopedfind

    @document = Document.find(params[:document_id])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The cop successfully located the vulnerability. If we attempted to deploy this code, RuboCop would fail the build, preventing the code from going out while letting reviewers know exactly why.

Now let's try running Betterment/AuthorizationInController on the AttachmentLink example from earlier:

$ rubocop app/controllers/documents/attachments_controller.rb
Inspecting 1 file
C

Offenses:

app/controllers/documents/attachments_controller.rb:3:24: C: Betterment/AuthorizationInController: Model created/updated using unsafe parameters. Please query for the associated record in a way that enforces authorization (e.g. "trust-root chaining"), and then pass the resulting object into your model instead of the unsafe parameter.

INSTEAD OF THIS:
post_parameters = params.permit(:album_id, :caption)
Post.new(post_parameters)

DO THIS:
album = current_user.albums.find(params[:album_id])
post_parameters = params.permit(:caption).merge(album: album)
Post.new(post_parameters)

See here for more information on this error: https://github.com/Betterment/betterlint/blob/main/README.md#bettermentauthorizationincontroller

    AttachmentLink.new(create_params.merge(document: document)).save!
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

1 file inspected, 1 offense detected

The model initialization was flagged because it was seen using create_params, which contains user input. Like with the other cop, this would fail the build and prevent the code from making it to production. You may have noticed that unlike the previous example, the vulnerable code doesn't directly reference a params.permit call or any of the parameter names, but the code was still flagged. This is because both of the cops keep a little bit of state to ensure they have the appropriate context necessary when analyzing potentially unsafe function calls. We also made sure that when developing these cops we tested them with real code samples and not just contrived scenarios that no developer would actually ever attempt.

False Positives

With any type of static analysis, there's bound to be false positives. When working on these cops, we narrowed down false positives to two scenarios:
- The flagged code could be considered insecure only in other contexts: e.g. the application or models in question don't have a concept of "private" data.
- The flagged code isn't actually insecure: e.g. the initialization happens to take a parameter whose name ends in _id but it doesn't refer to a unique identifier for any objects.

In both of these cases, the developer should feel empowered to either rewrite the line in question or locally disable the cop, both of which will prevent the code from being flagged. Normally we'd consider opting out of security analysis to be an unsafe thing to do, but we actually like the way RuboCop handles this, because it can help reduce some code review effort; the first solution eliminates the vulnerable-looking pattern (even if it wasn't a vulnerability to begin with), while the second one signals to reviewers that they should confirm this code is actually safe (making it easy to pinpoint areas of focus).

Testing & Code Review Strategies

RuboCop and Rails tooling can only get us so far in mitigating authorization bugs. The remainder falls on the shoulders of the developer and their peers to be cognizant of the choices they are making when shipping new application controllers. In light of that, we'll cover some helpful strategies for keeping authorization front of mind.

Testing

When writing request specs for a controller action, write a negative test case to prove that attempts to circumvent your authorization measures return a 404. For example, a request spec for our Documents::AttachmentsController can assert that referencing an attachment the user doesn't own returns a 404. These test cases are an inexpensive way to prove to yourself and your reviewers that you've considered the authorization context of your controller action and accounted for it properly. Like all of our tests, this functions both as regression prevention and as documentation of your intent.

Code Review

Our last line of defense is code review. Security is the responsibility of every engineer, and it's critical that our reviewers keep authorization and security in mind when reviewing code. A few simple questions can facilitate effective security review of a PR that touches a controller action:
- Who is the authenticated user?
- What resource is the authenticated user operating on?
- Is the authenticated user authorized to operate on the resource in accordance with Rule #1?
- What parameters is the authenticated user submitting?
- Where are we authorizing the user's access to those parameters?
- Do all associations navigated in the controller properly signify authorization?

Getting in the habit of asking these questions during code review should lead to more frequent conversations about security and data access. Our hope is that linking out to this post and its associated Rules will reinforce a strong security posture in our application development.

In Summary

Unlike authentication, authorization is context-specific and difficult to "abstract away" from the leaf nodes of application code. This means that application developers need to consider authorization with every controller we write or change. We've explored two new rules to encourage best practices when it comes to authorization in our application controllers:
- Authorization should happen in the controller and should emerge naturally from table relationships originating from the authenticated user, i.e. the "trust root chain".
- Controllers should pass ActiveRecord models, rather than ids, into the model layer.

We've also covered how our custom cops can help developers avoid antipatterns, resulting in safer and easier-to-read code. Keep these in mind when writing or reviewing application code that an authenticated user will utilize, and remember that authorization should be clear and obvious.
-
Guidelines for Testing Rails Applications
Guidelines for Testing Rails Applications

Discusses the different responsibilities of model, request, and system specs, and other high-level guidelines for writing specs using RSpec & Capybara.

Testing our Rails applications allows us to build features more quickly and confidently by proving that code does what we think it should, catching regression bugs, and serving as documentation for our code. We write our tests, called "specs" (short for specification), with RSpec and Capybara. Though there are many types of specs, in our workflow we focus on only three: model specs, request specs, and system specs. This blog post discusses the different responsibilities of these types of specs, and other related high-level guidelines for specs.

Model Specs

Model specs test business logic. This includes validations, instance and class method inputs and outputs, Active Record callbacks, and other model behaviors. They are very specific, testing a small portion of the system (the model under test), and cover a wide range of corner cases in that area. They should generally give you confidence that a particular model will do exactly what you intended it to do across a range of possible circumstances.

Make sure that the bulk of the logic you're testing in a model spec is in the method you're exercising (unless the underlying methods are private). This leads to less test setup and fewer tests per model to establish confidence that the code is behaving as expected.

Model specs have a live database connection, but we like to think of our model specs as unit tests. We lean towards testing with a bit of mocking and minimal touches to the database. We need to be economical about what we insert into the database (and how often) to avoid slowing down the test suite too much over time. Don't persist a model unless you have to. For a basic example, you generally won't need to save a record to the database to test a validation. Also, model factories shouldn't by default save associated models that aren't required for that model's persistence. At the same time, requiring a lot of mocks is generally a sign that the method under test either is doing too many different things, or the model is too highly coupled to other models in the codebase. Heavy mocking can make tests harder to read, harder to maintain, and provide less assurance that code is working as expected. We try to avoid testing declarations directly in model specs - we'll talk more about that in a future blog post on testing model behavior, not testing declarations. A model spec skeleton usually includes common test cases for validations and for each public class and instance method.

System Specs

System specs are like integration tests. They test the beginning-to-end workflow of a particular feature, verifying that the different components of an application interact with each other as intended. There is no need to test corner cases or very specific business logic in system specs (those assertions belong in model specs). We find that there is a lot of value in structuring a system spec as an intuitively sensible user story - with realistic user motivations and behavior, sometimes including the user making mistakes, correcting them, and ultimately being successful. There is a focus on asserting that the end user sees what we expect them to see.

System specs are more performance-intensive than the other spec types, so in most cases we lean towards fewer system specs that do more things, going against the convention that tests should be very granular with one assertion per test.
One system spec that asserts the happy path will be sufficient for most features. Besides the performance benefits, reading a single system spec from beginning to end ends up being good high-level documentation of how the software is used. In the end, we want to verify the plumbing of user input and business logic output through as few large specs per feature as we can get away with. If there is significant conditional behavior in the view layer and you are looking to make your system spec leaner, you may want to extract that conditional behavior to a presenter resource model and test that separately in a model spec, so that you don't need to worry about testing it in a system spec.

We use SitePrism to abstract away bespoke page interactions and CSS selectors. It helps to make specs more readable and easier to fix if they break because of a UI or CSS change. We'll dive more into system spec best practices in a future blog post. In a typical example system spec, the error path and two common success paths are exercised in the same spec.

Request Specs

Request specs test the traditional responsibilities of the controller. These include authentication, view rendering, selecting an HTTP response code, redirecting, and setting cookies. It's also ok to assert that the database was changed in some way in a request spec, but like system specs, there is no need for detailed assertions around object state or business logic. When controllers are thin and models are tested heavily, there should be no need to duplicate business logic test cases from a model spec in a request spec.

Request specs are not mandatory if the controller code paths are exercised in a system spec and they are not doing something different from the average controller in your app. For example, a controller that has different authorization restrictions because the actions it is performing are more dangerous might require additional testing. The main exception to these guidelines is when your controller is an API controller serving data to another app. In that case, your request spec becomes like your system spec, and you should assert that the response body is correct for important use cases. API boundary tests are even allowed to be duplicative with underlying model specs if the behavior is explicitly important and apparent to the consuming application. Request specs for APIs are owned by the consuming app's team to ensure that the invariants that they expect to hold are not broken.

In our request specs, we like to extract standard assertions, such as ones relating to authentication, into shared examples. More on shared examples in the section below.

Why don't we use Controller Specs?

Controller specs are notably absent from our guide. We used to use controller specs instead of request specs. This was mainly because they were faster to run than request specs. However, in modern versions of Rails, that has changed. Under the covers, request specs are just a thin wrapper around Rails integration tests. In Rails 5+, integration tests have been made to run very fast. Rails is so confident in the improvements made to integration tests that controller tests have been deprecated in favor of them, with their bespoke helpers extracted from Rails core. Additionally, request specs are much more realistic than controller specs since they actually exercise the full request / response lifecycle (routing, middleware, etc.), whereas controller specs circumvent much of that process.
Given the changes in Rails and the limitations of controller specs, we've changed our stance. We no longer write controller specs. All of the things that we were testing in controller specs can instead be tested by some combination of system specs, model specs, and request specs.

Why don't we use Feature Specs?

Feature specs are also absent from our guide. System specs were added to Rails 5.1 core and are the core team's preferred way to test client-side interactions. In addition, the RSpec team recommends using system specs instead of feature specs. In system specs, each test is wrapped in a database transaction because it's run within a Rails process, which means we don't need to use the DatabaseCleaner gem anymore. This makes the tests run faster and removes the need for having any special tables that don't get cleaned out.

Optimal Testing

Because we use these three different categories of specs, it's important to keep in mind what each type of spec is for, to avoid over-testing. Don't write the same test three times - for example, it is unnecessary to have a model spec, request spec, and a system spec that are all running assertions on the business logic responsibilities of the model. Over-testing takes more development time, can add additional work when refactoring or adding new features, slows down the overall test suite, and sets the wrong example for others when referencing existing tests. Think critically about what each type of spec is intended to be doing while writing specs. If you're significantly exercising behavior not in the layer you're writing a test for, you might be putting the test in the wrong place.

Testing requires striking a fine balance - we don't want to under-test either. Too little testing doesn't give any confidence in system behavior and does not protect against regressions. Every situation is different, and if you are unsure what the appropriate test coverage is for a particular feature, start a discussion with your team!

Other Testing Recommendations

Consider shared examples for last-mile regression coverage and repeated patterns. Examples include request authorization and common validation/error handling; a sketch of one follows at the end of this post. Each spec's description begins with an action verb, not a helping verb like "should," "will," or something similar.
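As a hedged illustration of the shared-example idea (the route, path helper, and spec names here are hypothetical, not taken from our codebase):

```ruby
# spec/support/shared_examples/authenticated_endpoint.rb
RSpec.shared_examples 'an authenticated endpoint' do |http_method, path|
  it 'redirects to the sign-in page when logged out' do
    public_send(http_method, path)

    expect(response).to redirect_to(sign_in_path) # path helper is hypothetical
  end
end

# spec/requests/documents_spec.rb
RSpec.describe 'Documents', type: :request do
  it_behaves_like 'an authenticated endpoint', :get, '/documents'
end
```

Note that the description ("redirects to the sign-in page when logged out") leads with an action verb rather than "should".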