Recently I started GrappleGrid, Jiu Jitsu tournament software aimed at making it dead-simple for gym owners to run their own in-house tournaments. Initially, I focused a lot on the technical aspects of building a great experience (and in my not-too-humble opinion, it is a great experience), but I missed a few very critical components that ended up teaching me some valuable lessons about building a software business.
What is Event-Based SaaS?
If you’ve never heard the term “Event-Based SaaS,” don’t fret: I just coined it. I’m defining “Event-Based SaaS” as software that lives or dies by its usage within a very limited window of time, as opposed to software that is used sporadically over time, such as email, Twitter, or YouTube. We expect, and even tolerate, those tools failing from time to time, but you can’t exactly “check back later” if your tournament software goes down. The show absolutely must go on. Streaming software, tournament software, and real-time voting software are all examples of this “Event-Based SaaS” idea.
Software within the Event-Based SaaS model only gets a tiny window to shine, and if it fails, it fails completely and risks never being used again. If you’re paying for something ONCE, you might tolerate a failure and try again. But SaaS reaches into your pocket over and over, and folks are much less willing to tolerate failure when the ONE time they need your software to work, it fails.
With this idea in mind, let’s explore the failures of running my very first tournament.
Failure 1 - Unknown unknowns can sink you
Before the tournament began, I spent hours running through theoretical situations and verifying all the functionality worked as expected. I was confident that all the core logic was rock-solid.
Little did I know that it is usually the things you don’t even know to check for that grind things to a halt.
Like any good tournament software, mine allows you to display a scoreboard that updates in real time. Tournament software without the ability to display the score is like a plane without gas: it’s theoretically all there, but it doesn’t actually do the thing you paid for. For this tournament, we used a series of laptops connected to TVs in various ways, primarily via HDMI cables and Apple AirPlay.
Five minutes before the tournament is set to begin, we have ~200 people waiting to watch their kids compete. One of the table workers looks at me and says, “Hey, the score and time aren’t updating.” A pit forms in my stomach. “What do you mean?” I reply. He points at the TV: no matter what he clicks, the TV doesn’t update.
It’s hard to overstate what a high-stress situation this was. As the tournament is about to begin, we find out that one of the laptops just “can’t” update the score. I’m scrambling:
- Is the server down? Overloaded? Out of memory?
- Is the DB responding? Connections saturated?
- Is the network down?
Nope, nope, and nope. Everything is fine except for this one laptop. Oh and by the way, we’re starting the tournament in 5 minutes.
As I eventually figured out, the laptop had a VPN that blocked WebSocket connections, so the scoreboard wasn’t able to update! How do you plan for bugs like this?
Needless to say, my failure here was in not anticipating and planning a contingency for the displays linked to the TVs. In the future, I will be experimenting with Bluetooth/Wi-Fi-enabled Raspberry Pis that just render a single web page (the /television link) and listen to the server for updates.
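Another mitigation I’ve been considering is making the scoreboard page itself degrade gracefully. Here’s a minimal sketch of the idea, assuming a hypothetical wss://example.com/scoreboard socket and /api/scoreboard polling endpoint (neither is GrappleGrid’s real API): if the WebSocket never opens, fall back to plain HTTP polling rather than freezing.

```typescript
// Sketch: subscribe to live score updates over a WebSocket, but fall
// back to HTTP polling if the socket never opens (e.g. because a VPN
// or proxy blocks WebSocket traffic). Endpoint names are hypothetical.

type Scoreboard = { matchId: string; score: string; clock: string };

function startScoreboard(render: (s: Scoreboard) => void): void {
  const ws = new WebSocket("wss://example.com/scoreboard");
  let usingFallback = false;

  // If the socket hasn't opened within a few seconds, assume it's
  // blocked and switch to polling instead of showing a frozen board.
  const fallbackTimer = setTimeout(startPolling, 5000);

  ws.onopen = () => clearTimeout(fallbackTimer);
  ws.onmessage = (event) => render(JSON.parse(event.data));
  ws.onerror = () => startPolling();

  function startPolling(): void {
    if (usingFallback) return; // don't start twice
    usingFallback = true;
    ws.close();
    setInterval(async () => {
      const res = await fetch("https://example.com/api/scoreboard");
      if (res.ok) render(await res.json());
    }, 2000); // polling every 2s is plenty for a human-readable scoreboard
  }
}
```

Polling is wasteful compared to a socket, but for a scoreboard that humans read, it’s more than fast enough, and it survives hostile networks.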
Failure 2 - Overly-tight Constraints
A habit many engineers have (myself included) is over-constraining their software: adding restrictions that don’t have a compelling logical or business reason. This can backfire, because emergent user behavior can tell you far more about what to build than your own presuppositions.
An example from the tournament: initially, it seemed like a good idea to remove the “Scoreboard” button once a match was completed. This would prevent table workers from accidentally starting a match that had already ended, or showing the scoreboard on the TV after it was completed. It seemed logical in theory.
In practice, other UI shortcomings led to the UI getting into a “stuck” state where you couldn’t play the next match, because the previous match was “completed” but the results hadn’t been “published.” This is a great example of an arbitrary constraint leading to real bugs. User behavior is extremely unpredictable, and it can give you a lot of valuable insight into how your product is used. But it can’t do that if the product is over-constrained in a way that creates frustration and friction.
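To make that concrete, here’s a hypothetical sketch (not GrappleGrid’s actual code) of the looser rule I’d use now: keep actions visible even after a match completes, and guard the genuinely risky one with a confirmation instead of removing the button.

```typescript
// Hypothetical match states; the real model may differ.
type MatchState = "pending" | "in_progress" | "completed" | "published";

// Displaying a score is always safe, so never hide that button.
function canShowScoreboard(_state: MatchState): boolean {
  return true;
}

// Restarting a finished match is the risky action: confirm it rather
// than forbidding it, so a "stuck" match can still be restarted at
// the table without developer intervention.
function canStartMatch(state: MatchState): boolean {
  if (state === "completed" || state === "published") {
    return window.confirm("This match already ended. Restart it?");
  }
  return true;
}
```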
Failure 3 - Not enough real-time debugging tools
As mentioned previously, you only get one shot to prove your software if you’re in the realm of “Event-Based SaaS”.
I grossly underestimated the number of times I would need to troubleshoot something “live” while the tournament was running.
Through sheer luck, I had added a ?debug=true flag the night before to let me see match IDs in such a way that I could “un-stick” the matches. However, I really could’ve used MUCH more debugging capability.
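For what it’s worth, the flag itself is trivial; the leverage comes from surfacing internal IDs only when it’s set. A rough sketch of the pattern (the data shape here is made up for illustration):

```typescript
// Show internal identifiers in the UI only when ?debug=true is in the
// URL, so an admin can reference exact records while fixing things live.
const debugEnabled =
  new URLSearchParams(window.location.search).get("debug") === "true";

function renderMatchLabel(match: { id: string; name: string }): string {
  // Normal users see "Finals - Kids 60lb"; with ?debug=true an admin
  // also sees the match ID needed to un-stick it server-side.
  return debugEnabled ? `${match.name} [#${match.id}]` : match.name;
}
```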
Ideally, I would’ve built a few more “quick fix” tools into an admin panel to make changes on the fly when necessary and keep the tournament flowing.
This issue was exacerbated by the overly-constrained UI and models that I described above.
Failure 4 - No backup plan
Like any software developer having their code properly stress-tested for the first time, I expected bugs to arise during the tournament. What I didn’t expect was needing to substitute my laptop as a display driver due to operational failures. That left me without access to my Rails console to fix issues as they popped up, something I had planned on doing. The end result was a lot of scrambling and somehow barely managing to fix or work around most of the issues that came up during the tournament.
Takeaways
In the end, the tournament was a success. The kids had fun. Scores were tracked.
But I couldn’t help feeling that I had failed. In trying to replace our last tournament software, I felt like I had created a much worse experience. This isn’t the part where I cop out and tell you that I’m now generating $45k MRR on my tournament product. I haven’t run another tournament on my software yet. I hope to run another in the future, but there are a lot of holes to patch and new functionality to build.
Most success stories I’ve read include a few failures, some seemingly permanent. The key is how people use those setbacks to become more robust and more resilient. We have to learn from our mistakes for them to be of any value.
With that in mind, here are my key takeaways for anyone wanting to build an Event-Based SaaS product:
- Perform a few low-stakes events first. The more dry runs you can do before “showtime,” the better prepared you’ll be. I only did one tiny live bracket before this tournament, and I regret not doing more.
- Allow emergent behavior in your software as much as you safely can. Don’t restrict users from clicking on things just because of aesthetics.
- Build in real-time debugging tooling so that you can manually adapt during your first proper stress test.
- Identify and mitigate (to the best of your ability) the external “hard” dependencies (e.g., televisions) of your software.
While this tournament was easily the most stressful day of my entire software career, it also taught me a few very valuable lessons about software design. Not all SaaS is built the same, and what can be ignored in some businesses is mission-critical in others (looking at you, real-time WebSocket updates!). I’ve been building software for over 8 years, but I’ve never spent time writing or talking about it. This year I aim to change that as I write, record, and tweet about the software I’m building.
Thanks for reading! If you enjoyed this, you may also enjoy following me on Twitter!