The structure of an immersive theatre app
You Me Bum Bum Train is an immersive theatre experience that I first became involved with in January 2016, during their previous show run. Details about the show are kept deliberately secret (an NDA covers most of it), but the general premise is that you as a patron (known in the show as a “passenger”) travel through a series of different scenarios back to back, each evoking a different experience or feeling. The show attracts an army of volunteers in the thousands and is a huge undertaking from both a personnel and, as I found out, a technical standpoint.
When I first got involved, I was mostly working backstage during the show and helping with the “get out” (the tearing down of the sets at the end of the show). During this time, I learned about the unique app that helps keep the show running smoothly. Unfortunately, that app was built on a cloud service that got shut down shortly after the last run ended… so I volunteered to help build a new version for the next show. This iteration ended up being used in the most recent 2024/2025 run, some 8 years later!
This post outlines some of the requirements of the software, the challenges those requirements brought, and what I ended up building to solve them. If you’d prefer to just read the key learnings, you can skip to Summary / Learnings.
If you’re more curious about what the show is, or how to get involved yourself, there are a few links at the bottom of this post.
Requirements
As mentioned, a lot of what goes on as part of the show is secret, but the general requirements I was working to looked something like this:
- Must be able to keep track of people moving between scenarios
- Must precisely manage people’s time in a scenario to keep the experience running smoothly
- Must allow people working in each scenario to keep track of who is coming/going and when
- Must have an overview for monitoring and managing the flow of people between scenarios
- Must work in real-time with instant updates as people move
- Must be hosted in the building
- Must continue to provide as much functionality as possible if network connection drops
The last couple of requirements were added in response to the surprise loss of support for the previous version of the software when its cloud service shut down, and to issues caused by it being hosted on the internet and therefore being prone to breaks in service. Everything else was an existing requirement.
Development Process
For much of the project I was the sole developer of the software, although I had a few other people supporting me by answering my many questions on how things should work. This meant a lot of the software stack and approaches taken fell into one of two categories:
- Things I’ve worked with a lot and can quickly build something with
- Things I’ve wanted to try on projects and not yet been able to
As a result I fell into a development process and stack similar to the work I do at Global, although because the app was developed entirely in my free time the timescale was very much stretched, with work generally happening in sessions of an hour or so a week, plus the occasional intense coding session when I had some time off and no plans.
The app also evolved quite a lot over the 8 years, going through fairly major changes as patterns were tried and failed, or new libraries and frameworks came about.
Version 1
The first version of the software looked something like this by the end:
- Django backend, talking to a MySQL database, using Django Rest Framework to expose APIs and Django Channels to expose a WebSocket server
- A single React/Redux app with multiple screens for different roles
- nginx in front of everything
The system ended up using an event model to track every event going on within the show, and a state machine, of sorts, to handle the transition between states based on the events. All these events were sent from the React app to the backend via the WebSocket following a user action. The backend then recorded the event in the database, before relaying it out to other clients.
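To make that more concrete, here is a minimal sketch of the idea in TypeScript. The event and state shapes are invented for illustration (the real ones are covered by the NDA), but the pattern is the same: a pure reducer advances the state one event at a time, and replaying the stored events in order rebuilds the state from scratch.

```typescript
// Illustrative event and state shapes -- not the real ones from the show.
type ShowEvent =
  | { type: "PASSENGER_ENTERED"; passengerId: string; scenarioId: string; at: string }
  | { type: "PASSENGER_LEFT"; passengerId: string; scenarioId: string; at: string };

interface ShowState {
  // scenarioId -> passengers currently in that scenario
  occupancy: Record<string, string[]>;
}

// The "state machine, of sorts": a pure reducer that moves the state forward
// one event at a time.
function applyEvent(state: ShowState, event: ShowEvent): ShowState {
  const current = state.occupancy[event.scenarioId] ?? [];
  switch (event.type) {
    case "PASSENGER_ENTERED":
      return {
        occupancy: { ...state.occupancy, [event.scenarioId]: [...current, event.passengerId] },
      };
    case "PASSENGER_LEFT":
      return {
        occupancy: {
          ...state.occupancy,
          [event.scenarioId]: current.filter((id) => id !== event.passengerId),
        },
      };
  }
}

// Rebuilding state after a refresh or restart is just replaying the stored
// events in order.
function replay(events: ShowEvent[]): ShowState {
  return events.reduce(applyEvent, { occupancy: {} });
}
```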
Using this approach meant that everything reacted pretty much instantly to any event and was fairly resilient. If an app ever needed to be refreshed or restarted, the APIs provided the means to rebuild the state by replaying the events.
Offline capability was provided by each app having its own version of the state machine and the ability to update it locally from its own events, so it would behave as though everything was changing, just without persisting to the backend. On reconnecting, the buffered events would be forwarded in bulk to sync up.
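The offline handling looked roughly like the sketch below, reusing the applyEvent reducer and types from the previous snippet (again, the names are invented): events are always applied to the local state machine so the UI keeps reacting, and are buffered whenever the socket is down, then flushed in bulk on reconnect.

```typescript
// A rough sketch of the offline behaviour, building on the earlier sketch.
class OfflineEventQueue {
  private pending: ShowEvent[] = [];

  constructor(
    private socket: { connected: boolean; send: (data: string) => void },
    private updateLocalState: (update: (state: ShowState) => ShowState) => void,
  ) {}

  dispatch(event: ShowEvent) {
    // Always advance the local state machine so the app behaves as normal.
    this.updateLocalState((state) => applyEvent(state, event));

    if (this.socket.connected) {
      this.socket.send(JSON.stringify(event));
    } else {
      // No connection: hold on to the event instead of persisting it.
      this.pending.push(event);
    }
  }

  // Called once the WebSocket reconnects, to sync the backend back up.
  flush() {
    if (this.pending.length === 0) return;
    this.socket.send(JSON.stringify({ type: "BULK_SYNC", events: this.pending }));
    this.pending = [];
  }
}
```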
Problems
While the setup above satisfied all the requirements, a few issues started to show up when it was put to the test:
State Differences
Because the app and the backend both had a state machine, but written in different languages (Python vs. JavaScript), subtle differences in the agreed state would sometimes sneak in due to things like slightly different timestamp handling, or just differing language behaviour.
This was not a deal breaker, but it was a big enough issue to show up on many occasions.
WebSocket Reliability
Django Channels was fairly new to me, being one of the pieces of tech I’d not used at Global and had wanted to try. It turned out that this inexperience made for a less reliable setup than ones I’d previously built with NodeJS sockets. However, I’m sure Django Channels would have been up to the task if I’d spent more time understanding how to use it properly.
App Separation Issues
With several screens for different user roles sharing one app, there were some separation issues, both with state for the different screens living in one central Redux setup, and from a plain “who should be able to see what” perspective.
Next Steps
These 3 main issues led me to build a second major version, mostly from scratch, with these new requirements added:
- Separate apps for each key role in the show
- Single implementation of the state machine
- JavaScript based WebSockets
I also had a few other upgrades in mind:
- Switch to using Postgres rather than MySQL
- Have everything in TypeScript rather than JavaScript for the apps
- Newer packages for pretty much everything
Version 2
The second and (currently) final version of the software became this:
- Django backend, talking to a Postgres database, using Django Rest Framework to expose APIs
- A BFF (Backend for Frontend) NodeJS server that acted as the interface between APIs and React apps
- 5 distinct React apps, one for each of the 5 key roles involved in running the show
- nginx in front of everything
The event model was solid, so that was kept, along with a single implementation of the state machine, now written in TypeScript, which reduced the Django/Python side to a simple API wrapper around the database.
The new BFF middleman acted as the state machine of truth and also kept a snapshot of the current state for each of the different app roles, so they could instantly restore from the one source rather than replaying events through their own state machines.
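A simplified sketch of that arrangement is below, reusing the reducer and types from the earlier snippets. The role names and projections are made up (the real roles are part of the show’s secrecy); the shape is the point: the BFF holds the canonical state, advances it with the reducer, and derives a per-role snapshot that a newly connected or refreshed client can be handed immediately.

```typescript
// Hypothetical roles and projections -- the real ones are different.
type Role = "front-of-house" | "scenario" | "control";

interface BffState {
  full: ShowState;                  // canonical state, advanced by applyEvent
  snapshots: Record<Role, unknown>; // per-role views derived from the full state
}

const projections: Record<Role, (state: ShowState) => unknown> = {
  "front-of-house": (s) => ({ waiting: s.occupancy["lobby"] ?? [] }),
  scenario: (s) => s.occupancy,
  control: (s) => s,
};

function handleIncomingEvent(state: BffState, event: ShowEvent): BffState {
  const full = applyEvent(state.full, event);
  // Recompute each role's snapshot so new or refreshed clients can restore
  // instantly from the BFF instead of replaying the whole event history.
  const snapshots: Record<Role, unknown> = {
    "front-of-house": projections["front-of-house"](full),
    scenario: projections.scenario(full),
    control: projections.control(full),
  };
  return { full, snapshots };
}
```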
The React apps still communicated via WebSockets, but now with a NodeJS server in the mix it could be more familiar JavaScript WebSockets on both ends.
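On the server side, the relay itself can be as simple as the sketch below using the ws package; the real BFF does considerably more (per-role channels, snapshot delivery, recording events via the Django APIs), so treat this purely as an illustration of the broadcast behaviour.

```typescript
import { WebSocketServer, WebSocket } from "ws";

// Minimal relay: every event received from one client is broadcast to the
// rest, after (in the real system) being recorded and applied to the state.
// The port here is arbitrary, just for the example.
const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket) => {
  socket.on("message", (data) => {
    const message = data.toString();
    for (const client of wss.clients) {
      if (client !== socket && client.readyState === WebSocket.OPEN) {
        client.send(message);
      }
    }
  });
});
```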
To maintain the offline capability, each app still kept its own state machine, but because the code came from a shared library it was identical on both ends, so the desync issues from Version 1 didn’t manifest.
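In practice that sharing looks something like the sketch below, with a hypothetical package name and layout; the value is that the BFF and every React app import the identical reducer, so there is one implementation to test and no cross-language drift to chase.

```typescript
// packages/state-machine/index.ts -- hypothetical shared package layout.
// Both the BFF and the React apps depend on this one module.
export interface ShowState {
  occupancy: Record<string, string[]>;
}

export type ShowEvent = {
  type: "PASSENGER_ENTERED" | "PASSENGER_LEFT";
  passengerId: string;
  scenarioId: string;
  at: string;
};

// The same pure reducer sketched earlier lives here, exported once.
export function applyEvent(state: ShowState, event: ShowEvent): ShowState {
  const current = state.occupancy[event.scenarioId] ?? [];
  const next =
    event.type === "PASSENGER_ENTERED"
      ? [...current, event.passengerId]
      : current.filter((id) => id !== event.passengerId);
  return { occupancy: { ...state.occupancy, [event.scenarioId]: next } };
}

// BFF (NodeJS) and each React app then import the identical implementation:
//   import { applyEvent } from "@show/state-machine";
```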
Summary / Learnings
Having gone through 2 major versions of the software and now having had it operating successfully through an entire run of the show for approximately 6 months, these have been my major learnings:
Don’t Duplicate Important Logic
Having the state machine in two different places was a major pain, and I’m not sure what I was thinking trying to make it work seamlessly. If you need to split logic across different places, try and at least have it in the same language so potential differences like comparisons and timestamp parsing don’t trip you up.
Keeping Things In Sync Is Really Hard
More than just the state machine language split issues, keeping everything in sync was a real challenge, and there are still some smaller bugs that occur now and again due to the precise ordering and arrival of events under weird conditions. That being said, I’m not sure what I could have done better here without over-complicating things.
Type Safety Is Really Useful
Having TypeScript rather than plain JavaScript in the second version highlighted all kinds of issues that could have become bugs in the code. In the BFF especially, which needed to be really stable, avoiding issues due to missing or null data probably saved countless hours.
Know What You’re Getting Yourself Into
This was no small project, and what I thought would be a fun little thing to work on turned into a years-long project that required me to push some other projects aside. I’d still do it again, as I learnt a lot and tackled all sorts of interesting challenges along the way. In hindsight, I should definitely have thought a bit more about what the scope of the project might be before committing.
Links
The website of the You Me Bum Bum Train show itself