.Net, Azure and occasionally gamedev
While I haven't spent a lot of time working on the new HomeApp infrastructure, a significant share of the time I did spend went into making sure automated testing would be possible.
Since I haven't really mentioned it so far, I decided to make it the main point of this post.
As with most of my hobby projects, thorough testing has always been an afterthought.
Usually I write plenty of unit tests that cover the basics, but when it comes to integration or system tests I steer clear and do manual testing instead. So far that has always been "good enough".
However, with its multi-component system I knew from the start that thorough testing would be a necessity for this new homeapp architecture unless I wanted to drive myself crazy.
In fact the previous homeapp infrastructure, which is still running just fine (with only the app + one web app + one program on the raspberry), could already have benefited a lot from integration tests.
Back then I decided I didn't need integration tests and could just debug issues manually, and I'm already paying for that decision: for plenty of the issues that arose I had to debug the web app and the raspberry process in tandem, which isn't a nice experience and involved a lot of trial and error.
So for the new homeapp infrastructure I wanted to do testing right from the start.
I have already talked about automation in previous posts, and I want to stress again how important it is to have the infrastructure automated so that testing can benefit from it.
Along with my usual builds in VSTS I have unit and integration tests running as part of the build. If any test fails, the build is not released.
After a build has finished successfully it is automatically deployed to the internal testing environment.
After said deployment, further validation tests run automatically against the test environment. If anything fails here, I get instant feedback and the code is not moved to the production environments.
This setup allows me to release continuous updates with a relatively high degree of confidence. If I were to work on this project full time, my multiple git pushes a day would result in multiple production releases a day.
Should anything go bad after a release (e.g. a case not covered by tests), I have even more safeguards in place: first of all, nothing is released straight to the "production" environment. I have placed another environment in front of it: the "preview" environment.
I haven't decided yet whether users will get access to said preview environment (they would benefit from earlier/faster feature updates at the cost of there being more issues), so for now only I have access to it and use it as a pre-release testing environment.
For now I have to manually approve the move from preview to production, but I fully intend to make it automatic as well because I have yet another safeguard in place:
Should a mistake make it all the way to production, I can roll production back to the previous state (a topic probably worth its own post later).
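Back then this was all clicked together in the VSTS web UI, but expressed as a purely illustrative pipeline sketch, the flow described above looks roughly like this (none of these stage or job names are my real definitions):

```yaml
# Sketch only - the stage/job names are made up for illustration.
stages:
- stage: Build
  jobs:
  - job: BuildAndTest        # compile + run unit and integration tests
- stage: TestEnvironment
  dependsOn: Build           # only runs when every test in Build passed
  jobs:
  - job: DeployToTest        # automatic deployment to the test environment
  - job: ValidationTests     # tests against the live test environment
    dependsOn: DeployToTest
- stage: Preview
  dependsOn: TestEnvironment
  jobs:
  - job: DeployToPreview
- stage: Production          # today: gated behind my manual approval
  dependsOn: Preview
  jobs:
  - job: DeployToProduction
```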
What I want to point out with all this is that good automation gives you higher confidence in releases and allows you to move faster.
Instead of a "big bang" release where app + hub + web must all be updated at once (and god help you rolling back when one of them causes issues), I am able to do hundreds of tiny releases. When an issue does arise, the smaller release size makes it much easier to pinpoint the root cause.
With all that as a premise, I now want to get to the main point.
With all the automation in place I have a pretty solid starting point for testing.
The most obvious layer is unit tests, and I don't think I need to cover them much.
For all my components I have tests that mock dependencies and test the individual units. So far I have "only" a few hundred of them and they run very fast (less than 5s for all of them).
Since a big part of my code (called "HomeApp.Hub") runs on a raspberry pi, acting as a central hub for IoT devices, I had to find a way of mocking all the IoT communication.
Luckily there are only two ways how my code on the raspberry is communicating with said IoT devices: http and gpio.
Usually with integration tests you would pick two components and mock away all their dependencies to then test them in combination with each other.
The problem with my hub is that almost all components need to communicate with the web (command receiver needs to pull commands via SignalR from the cloud, image capture manager needs to fetch images from cameras inside the local network, image uploader needs to upload images to the web, ...).
So for pretty much every test I had to write http interceptors to verify that certain conditions are met (first a POST here, then a GET followed by another GET to that endpoint, ...).
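The idea can be sketched like this (in Python for brevity; the real hub and its tests are .NET): an in-process HTTP "interceptor" records every request so the test can assert the expected sequence afterwards. All endpoint names here are made up.

```python
# Minimal sketch of an http interceptor: a throwaway web server records
# (method, path) for every request so a test can assert the call sequence.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

recorded = []  # (method, path) tuples, in arrival order

class RecordingHandler(BaseHTTPRequestHandler):
    def _record(self):
        recorded.append((self.command, self.path))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    do_GET = do_POST = _record

    def log_message(self, *args):  # keep the test output clean
        pass

server = HTTPServer(("127.0.0.1", 0), RecordingHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# Simulate what the hub would do: one upload, then two fetches.
urllib.request.urlopen(urllib.request.Request(f"{base}/commands", data=b"{}", method="POST"))
urllib.request.urlopen(f"{base}/images/1")
urllib.request.urlopen(f"{base}/images/2")
server.shutdown()
```

The test then simply checks `recorded` against the expected sequence (first a POST, then two GETs).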
In writing all these http interceptors I came up with the idea of simulating the devices.
Instead of writing the http mocking code in my tests I "simply" went ahead and built my own software IoT devices.
The simulation suite has "only" three layers of logic stacked on top of each other.
The top layer is a simple WPF application that is not needed for automated testing. I use it for manual testing and to click together a device suite, which I can save to a json configuration file.
That configuration file then serves as the input for the next simulator layer: a .Net Core based simulator, controlled either by the WPF application or via the cli (for automated testing).
For my tests, I just start the core simulator with the config file and any further commands can be relayed via the console input of the process. Since the config file already tells the simulator what to do, the only command I usually have to send is "exit" at the end of my test.
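To give an idea of what such a device suite looks like, here is an illustrative configuration file for two simulated cameras (the json schema shown is made up for this post, not my actual format):

```json
{
  "devices": [
    { "type": "camera", "port": 8080, "pathRegex": "/snap\\.jpg\\?param=.*", "file": "test.jpg" },
    { "type": "camera", "port": 8081, "pathRegex": "/snap\\.jpg\\?param=.*", "file": "test2.jpg" }
  ]
}
```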
The real finesse is now at the final layer: The actual IoT devices.
Each IoT device is also spawned as a separate .Net Core process and is controlled by the simulator via IPC.
If I want to simulate a camera, I just feed the config to the simulator or send it a command via standard input: "camera 8080 "/snap\.jpg\?param=.*" test.jpg".
The simulator will then spawn the new process. In the case of a camera, the new process will open a webserver at the specific port (8080) and will listen for requests.
The regex will then either match any incoming requests (and serve the file from disk) or not match (and return 404).
Additionally, the core simulator and all simulated IoT devices are connected via IPC. If the core simulator shuts down, all IoT devices shut down as well (they exchange continuous keep-alive pings: if the ping-pong stops, the IoT device just kills itself).
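The keep-alive mechanism boils down to this (sketched in Python; the real devices are separate .Net Core processes talking over IPC): each device remembers when it last heard a ping and shuts itself down once the pings stop arriving.

```python
# Keep-alive watchdog sketch: a device dies when pings stop.
import threading
import time

class SimulatedDevice:
    def __init__(self, timeout=0.5):
        self.timeout = timeout
        self.last_ping = time.monotonic()
        self.alive = True
        threading.Thread(target=self._watchdog, daemon=True).start()

    def ping(self):  # called whenever the core simulator's ping arrives
        self.last_ping = time.monotonic()

    def _watchdog(self):
        while self.alive:
            if time.monotonic() - self.last_ping > self.timeout:
                self.alive = False  # the real device terminates its process here
            time.sleep(0.05)

device = SimulatedDevice()
for _ in range(4):          # while the core simulator pings, the device stays up
    time.sleep(0.1)
    device.ping()
was_alive = device.alive
time.sleep(1.0)             # the core simulator "died": pings stop
```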
This architecture allows me to spawn a whole array of devices fully automatically, and my hub software can't tell the difference between a real device and the simulation (well, except for localhost:8080 vs. an actual camera ip, but it doesn't treat them any differently internally).
For GPIO I haven't finished the simulator yet, but it will be an IPC based plugin that is loaded into the hub only for tests (since GPIOs are available on linux only via the "/sys/class/gpio" files, the plugin will replace the GpioManager with an IPC based counterpart).
That way I am again able to control the software devices via my simulators.
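Since the plugin isn't written yet, here is only a sketch of the intended pattern (in Python, with made-up names apart from GpioManager): the hub talks to a small gpio abstraction, and tests load a simulator-backed implementation instead of the sysfs-backed one.

```python
# Sketch of the planned plugin pattern: swap the gpio implementation in tests.
class SysfsGpioManager:
    """Production side, backed by the linux sysfs gpio files."""
    def write(self, pin, value):
        with open(f"/sys/class/gpio/gpio{pin}/value", "w") as f:
            f.write("1" if value else "0")

class SimulatedGpioManager:
    """Test side; the real plugin would forward these calls over IPC."""
    def __init__(self):
        self.pins = {}
    def write(self, pin, value):
        self.pins[pin] = value  # recorded so a test can assert on it

gpio = SimulatedGpioManager()   # injected into the hub for tests
gpio.write(17, True)            # the hub "switches on" pin 17
```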
With all the setup done for my integration tests I can now also write really short system tests:
Verify that the hub makes a http request to camera 1 and camera 2 (because group 1 contains cameras 1 and 2).
Verify that the hub makes a http request to "the cloud" (which I intercept with my mock cloud endpoint).
Here I can easily verify that the specific image upload endpoints were hit with the expected parameters (images retrieved from the simulators + the correct camera ids).
Or if I want to test the system end to end, I could spin up a new instance of the web api, have the hub upload images to that instance and verify that the images are saved to blob storage in full + thumbnail resolution.
The final part of my testing suite I have already mentioned: tests that run after deployments.
So far I have only set up a few of those, but they are easily extended.
I am using azure functions, triggered as post deployment gates, that run automated tests against the freshly deployed environment.
If any of those tests fail I know that the latest deployment was faulty and I can quickly pinpoint the root cause.
With all these test layers in place I am much more confident adding new features, as I can be sure they don't break existing functionality without me having to manually retest everything all the time.