Written by Duncan on May 20, 2025
Following a Show HN tumbleweed moment, and with the onset of the Christmas holidays, my wandering (read: procrastinating1) mind was drawn to a possible use case for the WalledGarden.ai Data SIM Card: a global roaming2 AI device for the elderly and/or visually impaired. Specifically, to help Mabel, the widow of my dear departed friend Frank3, remain in communication4 with friends and family, both at home and abroad2:
This first (and last?, Ed.) blog entry delves5 into the travails associated with bending the will of a 7-year-old KaiOS phone to dance to my, octogenarian-assisting, tune. Said tech “adventure” being made possible by a Heath Robinson-esque collection of (uncooperative) tools and services.
It is not intended as a definitive How-To guide but to detail some of the pitfalls faced which I hadn’t found documented elsewhere. Who knows, maybe it will help someone else (with equally foolhardy ideas) in future…
Note: the blog is a work in progress. I’m conscious it may end up far too long and should be split into multiple posts. If you start to lose the will to continue to the end of this post - you’re not alone; it’s the same feeling I had when midway through creating Wabbit S2 (let alone writing the blog ¯\_(ツ)_/¯ ). Here, just for you - a TL;DR:
Nokia 8110 4G + Momoho BTS0011 + WalledGarden.ai SIM Card + JsSIP + Asterisk (FreePBX) + Jitsi Meet + Browserless.io + Slixmpp + Jami + Deepgram + OpenAI Realtime API + Sesame AI + what3words + RovisTech 3.4mm AirTag + Plusnet Hub One + Cisco SPA232D
= Wabbit S2
- Possible Issues
- Requirements
- VoIP (SIP/RTP)
- KaiOS (Wabbit S2 Client)
- Linux Audio (Wabbit S2 “Agents”)
- General Approach
- The Building Blocks
- Nokia 8110 4G & JsSIP
- Nokia 8110 4G & JsSIP Technical Tidbits
- TCP listener - Netcat
- TCP listener - JavaScript
- Hosts File
- Slider Position Detection
- Battery Life
- JsSIP VoIP calling & RTP codecs
- Manage SSL Certificates
- ADB, WebIDE, Marionette (AKA How to remotely start a KaiOS App)
- SSH daemon & armv7l-eabihf Binaries
- GPS
- Wifi SSIDs
- Momoho BTS0011 Bluetooth (“Hands Free”) Speaker
- BTS0011 Technical Tidbits
- Microchip RN52SRC Technical Tidbit
- Asterisk (FreePBX)
- Asterisk Technical Tidbits
- Call Files
- ExtenSpy ChanSpy
- Play Audio into Channel
- Jitsi Meet, Browserless.io & Slixmpp
- Integration Testing
- PJSUA Twinkle Jami
- Deepgram
- ALSA? PulseAudio? JACK? PipeWire? WirePlumber?!?!
- Linux Audio Loopback Technical Tidbits
- Loopback
- pactl
- aj-snapshot & pw-jack
- Initial End-to-End Testing
- OpenAI Realtime API
- Sesame AI
- what3words
- RovisTech 3.4mm AirTag
- iPhoneSE (2016) Technical Tidbit
- AirTag Technical Tidbit
- Find My / AirTag, iPhone Battery Usage Mystery
- Plusnet Hub One & Cisco SPA232D
- SPA232D Technical Tidbits
- Plusnet Hub One, Frank
- Voice Command List (Call Morpheus)
- Jitsi Chat Command List
- Mabel, 2025
- Wabbit S2, The Future?
I believe it is better to have something that you may need in an emergency and not need it - than to need that thing in an emergency and not have it. A problem arises when you haven’t developed a habit of carrying said thing (e.g. a phone); it may not be available when you need it the most…
Alas, between starting the process of creating Wabbit S2 and writing this blog, such a situation has already occurred: Mabel’s mobile phone (a Doro, left at home) was absent when she found herself in “a bit of a pickle”6. Will the Wabbit S2 suffer a similar left-at-home fate?
Even if you do have a phone at hand - you should also be willing to use it. Unfortunately there may be a reluctance to ask for help, even in situations involving acute pain outside Kwik Save (“I didn’t want to mither”). It is also possible, although I hope not, that Mabel would be “reluctant to talk to ~~Siri~~ Wabbit S2 because it feels to her like there is an actual person there and she doesn’t want to bother them”.
Only time will tell if these potential problems come to pass; I’m hoping that the benefits of using Wabbit S2 will encourage Mabel to carry it on a regular basis. We shall see…
In no particular order:
- Good battery life - this is essential; ensuring a device is charged regularly can be challenging. I’m hoping that standby times of a week or more can be achieved.
- Ability to monitor battery life remotely. Irrespective of how good (or, not) battery life is - remote monitoring enables me to intervene (i.e. charge the battery) if required. I don’t want Mabel to end up in a situation where the battery has run out on the Wabbit S2 (as periodically occurs with her Doro phone).
- A physical mechanism (e.g. flip phone), other than the pressing of a keypad button(s), to initiate and hang up calls. Open = you’re talking to Wabbit S2, Closed = end connection to Wabbit S2.
- Once connected all interactions should be via voice.
- Ideally the device would be controlled remotely (a “cloud phone”). The need for on-device physical interaction should be minimal.
- Easy to add and/or swap out Voice/AI functionality without fundamentally changing the infrastructure or overall design.
- To be able to interact with “ChatGPT” via voice; providing an experience similar to dialing 1-800-CHATGPT in the US - but for a UK-based user without access to US toll-free numbers, and without the 15-minutes-per-month time limit.
- I like Doro’s implementation of their emergency/alert-button feature (sending an SMS and dialing a list of contacts), although I would prefer more flexibility in the configuration of said functionality when implemented in Wabbit S2.
- Location-awareness. Either in-built functionality, or at least the ability to report on observed Wifi SSIDs.
- Nice to have: location tracking in the event of battery depletion and/or out of cellular range (e.g. in a hospital7).
- Nice to have: a second SIM slot, for backup connectivity.
- When presented with a person at the door, silent and holding a portfolio of paintings in one arm, it would be useful to know what it says on the card he is holding up8.
- Nice to have: spoken translations of postcards written in a foreign language (Hungarian).
- As the current topic du jour - <not yet>maybe find an excuse to shoehorn the Model Context Protocol (MCP) in there somewhere</not yet>.
- The Wabbit S2 should be built upon reasonably low-cost hardware, with a plentiful supply of them on the likes of eBay.
I imagined the broad brushstrokes of my eventual Landscape with Wabbit S2 Rabbits “masterpiece”9 to be as follows:
# VoIP (SIP/RTP)
From Wikipedia:
“Voice over Internet Protocol (VoIP), also known as IP telephony, is a set of technologies used primarily for voice communication sessions over Internet Protocol (IP) networks, such as the Internet. VoIP enables voice calls to be transmitted as data packets, facilitating various methods of voice communication”
For a service that is primarily voice-controlled, and utilises a data-only WalledGarden.ai SIM Card, what better way to transport this voice data than by using Session Initiation Protocol (SIP) and Real-time Transport Protocol (RTP)? These protocols are long established and well understood, and a wide variety of both software and hardware solutions (clients and servers) have implemented them to support VoIP connectivity. That last point is especially important for extending Wabbit S2 to other devices in future.
Each key component of the Wabbit S2 infrastructure will be a VoIP client, with a VoIP server in the core directing voice traffic (i.e. calls) as necessary. Excluding Mabel’s Wabbit S2 client(s) (that initiates calls), all other VoIP clients - the Wabbit S2 “Agents” - will only receive calls, and they will be configured to auto-answer. It is the answering of the VoIP call that will start the agent’s underlying functionality (e.g. stream audio to AI). When the call ends so does the audio processing on the agent.
# KaiOS (Wabbit S2 Client)
The superb KaiOS.dev site provides a brief history of KaiOS, with some additional background in GSMArena’s Flashback: Firefox OS burns down, KaiOS rises from the ashes.
The use of HTML5/CSS/JavaScript as the primary method for developing KaiOS apps made it quick to get started with creating the Wabbit S2 client. A wide variety of feature phones run KaiOS, a number of which I thought could be likely candidates for Wabbit S2 (mainly the flip phones):
- Banana Hackers: List of KaiOS Devices & their Debug status
- KaiOS.dev: The Complete List of KaiOS Phones (2024)
- KaiOS Tech: Explore KaiOS-enabled phones
Providing some comfort as to the possibility of success with Wabbit S2 on KaiOS - there was at least one other KaiOS app, WhatsApp, that had implemented voice calling over IP* - so I knew it was possible (in theory). Hope springs eternal…
See this Reddit Post for background on How does KaiOS (and Firefox OS in general) work? (Gaia > XULRunner > Gecko > Gonk). For the sake of simplicity I’m going to stick to only two concepts for the rest of the blog - Linux10land (e.g. stuff available via ADB/shell commands) and FirefoxOSville (e.g. KaiOS-specific APIs available only via JavaScript).
* albeit, I assumed, not VoIP using SIP/RTP. Meta has also removed WhatsApp support for KaiOS devices and it no longer works.
# Linux Audio (Wabbit S2 "Agents")
I naively thought the easiest, implementation agnostic*, way to bridge the agents’ VoIP client’s audio to their respective functionality (e.g. Speech to Text) would be via audio loopbacks (AKA virtual audio cables). Linux audio had surely moved along (read: now it will “just work”) since my painful ALSA days in the early 2000s (The Year(s) of the Linux Desktop). It’ll be a cinch!
* regardless of the agent’s function (e.g. OpenAI Realtime API) - as long as the application can send and receive audio to my virtual sound device(s) it can interoperate with the Wabbit S2 VoIP infrastructure. It also has the benefit of being able to use the audio loopbacks to record audio on the agents, helping debug audio issues (e.g. is it a problem with the VoIP connection to/in the core?).
# General Approach
To quickly (ha!, good luck with that!, Ed.) establish audio connectivity between the chosen handset and AI(s), using the proposed building blocks below. Each element will be tested in isolation, to confirm basic functionality and suitability for the task, before eventually linking them all together and testing as a whole (i.e. Wabbit S2).
Should fundamental problems arise at the early stages of testing then things can be changed or the entire endeavour reassessed.
A recurring theme throughout the process of developing Wabbit S2, of which I had to remind myself on numerous occasions after losing many hours (read: days…) down technical rabbit holes:
Perfect is the enemy of good
If something doesn’t work out and workarounds exist, even if they are inelegant or involve imperfect trade-offs, then work around the problem(s) rather than delaying end-user testing by trying to fix it. The sooner the Wabbit S2 was in Mabel’s hands, the sooner I could iron out real problems - functionality or implementation issues that I hadn’t foreseen and/or that result from the environment it is being used in (e.g. battery life/signal strength).
# Nokia 8110 4G & JsSIP
The excellent Banana Hackers Wiki and very detailed Notebookcheck.com review should provide you with all you ever wanted to know about the Nokia 8110 4G. It has a hardware slider and seemingly good battery life, there is a dual SIM variant, it runs ~~Firefox OS~~ KaiOS, and you can get root access - what’s not to like?!?!? Well… an excerpt from the Banana Hackers Wiki entry for the phone:
“This device is not a good option as a daily driver. Its keypad gets unusable after 2-3 months of normal usage.”
Thankfully using the keypad isn’t, outside of my initial testing, one of the requirements. Having had a few of these from eBay I can certainly see the point they’re making; I’ve had the full range of experiences from “like new” to the keypad being very frustrating to use. On a positive note - even when the handset appears well-used (read: knackered) the slider functions perfectly well, albeit with a little more play in it.
One thing I’m not particularly enamoured with is the Nokia 8110 4G’s Micro USB charging port - I suspect it is lacking in physical support within the chassis of the phone. Unfortunately, over time, this can result in some wobble in the connector and the possibility of intermittent charging when being connected and disconnected by (septua|octo|nona)genarian hands (i.e. my initial testers & end-users).
My first port of call for attempting VoIP on KaiOS, before expending effort on understanding exactly how WhatsApp may have achieved it, would be the JavaScript VoIP library JsSIP.
The Nokia 8110 4G - a KaiOS 2.5 device - is loosely based on Firefox 48 (released in 2016). On the possibly flawed assumption that I may have more success with KaiOS and a version of JsSIP from around a similar timeframe - I chose to use JsSIP 2.0.6 (also released in 2016). The phone itself is running the TA-1059-v12.00.17.06 KaiOS firmware from the Banana Hackers wiki.
To-Do: test with later revisions of KaiOS - I’ve no doubt that some issues (e.g. debugger-socket failure after first use) are resolved in later versions.
The Nokia 8110 4G’s network connectivity will be severely restricted, with access being provided by the WalledGarden.ai Cybersecurity SIM Card, and limited to only those connections required for Wabbit S2 to function.
## Nokia 8110 4G & JsSIP Technical Tidbits
### Battery Life
Fairly early on in developing Wabbit S2 I realised that standby battery life on the Nokia 8110 4G could be quite good, often more than 10 days - maybe even more than two weeks. Attempting to limit standby battery consumption was however severely hampered when running a(ny) KaiOS app permanently in the background (even in-built apps such as Messaging). It was clear what had to be done: anything required to be running or monitoring 24/7 (e.g. slider status) would need to run in Linuxland, with Wabbit S2’s FirefoxOSville apps (access to KaiOS APIs/etc) being run only when needed.
Having established my Wabbit S2 background-services baseline (e.g. slider position detection) I started to take more of an interest in monitoring battery usage programmatically, and set up remote polling of the 8110 4G’s battery percentage every 6 hours. I’d monitor for at least a week without any physical interaction with the phone, to see how things looked in a best-case scenario.
Even with my Wabbit-S2-battery-consuming overheads - the 8110 4G, on the 3 (UK) network with 3-4 bars signal (4G), continued to operate on standby for more than 7 days. (Un)fortunately I’d also noticed a rather large increase in battery consumption if the phone was moved merely a meter from its original test position, which I assumed was directly related to a drop-off in signal to 2-3 bars (sometimes only 1 bar).
Long story short - this also occurred when being used in Mabel’s home :(. The signal-related battery depletion problem was severe; when in a handbag hanging on the back of a door - it caused the battery to drop to zero in only a few days (even on standby/not doing anything). Thankfully the WalledGarden.ai SIM is able to roam on every UK network. Switching to Vodafone (4G) resolved this issue with a much better signal, even in the middle of Mabel’s home. Standby times in excess of 10 days could now be achieved.
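To put rough numbers on “a few days” versus “in excess of 10 days”, the 6-hourly battery readings can be turned into a drain rate and a projected standby time. A minimal sketch, assuming each reading is stored as a timestamp plus percentage; the function names and the sample readings are made up for illustration:

```javascript
// Estimate standby drain from periodic battery polls.
// Each sample: { t: epoch milliseconds, pct: battery percentage 0-100 }.
function drainPerDay(samples) {
  if (samples.length < 2) return null;
  var first = samples[0];
  var last = samples[samples.length - 1];
  var days = (last.t - first.t) / 86400000; // milliseconds in a day
  if (days <= 0) return null;
  return (first.pct - last.pct) / days; // percentage points lost per day
}

// Project how many days the current charge would last at that rate.
function daysRemaining(currentPct, ratePerDay) {
  if (!ratePerDay || ratePerDay <= 0) return Infinity;
  return currentPct / ratePerDay;
}

// Hypothetical readings taken every 6 hours over one day.
var SIX_HOURS = 6 * 60 * 60 * 1000;
var samples = [100, 98, 96, 94, 92].map(function (pct, i) {
  return { t: i * SIX_HOURS, pct: pct };
});
var rate = drainPerDay(samples);
console.log(rate, daysRemaining(92, rate)); // 8 11.5
```

Comparing the rate before and after moving the phone (or switching network) gives a quick, if crude, way to spot signal-related drain without waiting for the battery to run flat.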
### TCP listener - Netcat
Run a Netcat “HTTP Web Server” TCP listener in Linuxland (auto-starts on boot, note: runs as root - boo!) that responds with the battery percentage in JSON.
You’ll need to be root to make these edits:
mount -o remount,rw /system
busybox vi /system/etc/init.qcom.post_boot.sh
Append the following line (don’t forget the ampersand (&) at the end!):
mount -o remount,ro /system
Note: I suspect the cellular modem’s data connectivity (or KaiOS itself) may “auto-sleep” if no traffic has been sent/received for a while; I then don’t see a response from the phone’s TCP listeners (e.g. ADB). Once I wake the phone I see an NTP request (as also occurs on boot) and then everything starts working again. I poll battery life every 6 hours - this appears to be enough to prevent the “auto-sleep”/data not working.
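On the polling side, whatever fetches the reading has to cope with a very bare-bones reply. A small sketch of pulling the percentage out of the raw response text, assuming the JSON body carries the value in a field called `battery` (a hypothetical name; use whatever the listener actually emits):

```javascript
// Extract a battery percentage from the listener's raw reply.
// Accepts either a full HTTP response (headers, blank line, body)
// or just the JSON body on its own.
function parseBatteryReply(raw, field) {
  var split = raw.search(/\r?\n\r?\n/); // end of headers, if any
  var body = split === -1 ? raw : raw.slice(split).trim();
  var value = Number(JSON.parse(body)[field || 'battery']);
  if (!isFinite(value) || value < 0 || value > 100) {
    throw new Error('unexpected battery value: ' + body);
  }
  return value;
}

// Hypothetical reply from the phone.
var reply = 'HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n\r\n{"battery": 87}\n';
console.log(parseBatteryReply(reply)); // 87
```

Throwing on anything outside 0-100 means a truncated or garbled reply (e.g. the phone was mid “auto-sleep”) shows up as a failed poll rather than a bogus reading.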
### TCP listener - JavaScript
How to run an “HTTP Web Server” TCP listener in FirefoxOSville (in a KaiOS app) using navigator.mozTCPSocket:
Embedding an HTTP Web Server in Firefox OS
Note:
Should be changed to (addition of “.socket”):
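For what it’s worth, a minimal sketch of such a listener. It assumes a privileged app with the `tcp-socket` permission, and that the connection handler may be handed either the socket itself or an event carrying it as `.socket`. The port (8080), the helper names and the JSON body are illustrative only:

```javascript
// Build a minimal HTTP/1.1 response with a correct byte length.
function httpResponse(body) {
  var length = new TextEncoder().encode(body).length;
  return 'HTTP/1.1 200 OK\r\n' +
    'Content-Type: application/json\r\n' +
    'Content-Length: ' + length + '\r\n' +
    'Connection: close\r\n\r\n' + body;
}

// Accept either callback shape: the socket itself, or an event
// object that carries the socket as `.socket`.
function socketFrom(eventOrSocket) {
  return eventOrSocket.socket || eventOrSocket;
}

// Listen on a port and answer every request with makeBody()'s output.
function startListener(port, makeBody) {
  var server = navigator.mozTCPSocket.listen(port);
  server.onconnect = function (event) {
    var socket = socketFrom(event);
    socket.ondata = function () {
      socket.send(httpResponse(makeBody()));
      socket.close();
    };
  };
  return server;
}

// Only start when the KaiOS-specific API is actually present.
if (typeof navigator !== 'undefined' && navigator.mozTCPSocket) {
  startListener(8080, function () { return JSON.stringify({ ok: true }); });
}
```

The request itself is never parsed here; any incoming data triggers the same reply, which is about as much “web server” as a status endpoint needs.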
### Hosts File
Entries added to /etc/hosts to limit how network-chatty (=consumes battery) the 8110 4G is:
Whatever processes call the above hosts, their failure to connect didn’t seem to cause an appreciable increase in battery usage. In normal use, with the /etc/hosts additions above, the only non-Wabbit S2 traffic I could see* was NTP (e.g. on boot) for time sync.
* when using WalledGarden.ai’s Packet Capture functionality to review TCP/IP traffic emanating from the phone’s cellular data connection.
### Slider Position Detection
In Linuxland - put this in a loop and process the output: getevent -q -t -c 2 -n /dev/input/event1.
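“Process the output” boils down to pulling the type, code and value out of each event. A sketch of a parser for getevent’s timestamped numeric format follows; because `-n` suppresses newlines, it allows for the two events arriving run together. The sample string is hypothetical, and which type/code/value combination means “open” or “closed” on the 8110 4G is not assumed here; that needs checking on the device:

```javascript
// Parse getevent -t output: "[  seconds.micros] type code value" (hex fields).
// Returns every event found in the string, in order.
function parseGetevent(output) {
  var pattern = /\[\s*(\d+\.\d+)\]\s*([0-9a-f]{4})\s+([0-9a-f]{4})\s+([0-9a-f]{8})/gi;
  var events = [];
  var match;
  while ((match = pattern.exec(output)) !== null) {
    events.push({
      time: parseFloat(match[1]),
      type: parseInt(match[2], 16),
      code: parseInt(match[3], 16),
      value: parseInt(match[4], 16)
    });
  }
  return events;
}

// Drop the synchronisation markers (event type 0) that follow each report.
function withoutSync(events) {
  return events.filter(function (e) { return e.type !== 0; });
}

// Hypothetical capture of two events printed without a newline between them.
var sample = '[    1234.567890] 0005 0000 00000001[    1234.567901] 0000 0000 00000000';
console.log(withoutSync(parseGetevent(sample)));
```

With `-c 2` each invocation typically yields one real event plus its sync marker, so after filtering there should be a single state change to act on per loop iteration.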
### Manage SSL Certificates
To add SSL Certificates to KaiOS (e.g. signing cert for Asterisk’s WSS/WebSocket over TLS SIP transport, used by JsSIP) you can use b2g-certificates - A shell script to add root certificates to Firefox OS.
### JsSIP VoIP calling & RTP codecs
From the outset I suspected that getting JsSIP and the Nokia 8110 4G to play nicely would be challenging… Unfortunately it was also on the critical path to successfully using the phone as a Wabbit S2 client :(.
A simple KaiOS VoIP app (index.html & index.js) using JsSIP that works on the 8110 4G:
Much to my amazement - success! My rudimentary* Nokia 8110 4G/JsSIP call testing to another SIP client (via Asterisk) worked well. Not only that, it - gasp! - had been easy! (hubris?, Ed.). At this point I was feeling rather pleased with myself; the first anticipated problem had been hurdled with nary a stumble - it surely augured well for other technical challenges with Wabbit S2… Yes - this early VoIP success was indeed an omen, Damien*.
JsSIP debug logs show the following codecs as supported/proposed:
- ILBC = ~26Kbps (≤22MB an hour), audio is “ok”/acceptable.
- PCMU (AKA uLaw/u-Law/Ulaw) = ~86Kbps (≤80MB an hour), audio is quite good.
Given I only needed audio options for “good” and “good enough” - I haven’t yet tested Opus, G722 or PCMA.
Within FreePBX, for the Nokia 8110 4G’s extension, I Disallowed Codecs: ilbc&alaw&g726&g722&gsm&opus and Allowed Codecs: ulaw. The SIP transport on the extension is set to wss (WSS = WebSocket over TLS).
The WalledGarden.ai SIM provides at least 1GB (per month) of data transfer11 - talk times of ~11 hours PCMU and ~45 hours ILBC can be achieved within that 1GB.
* Here Be Dragons!; read on for Jitsi Meet, Browserless.io & Slixmpp integration testing.
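The talk-time figures follow from dividing the monthly allowance by the per-hour data usage. A quick sketch of that arithmetic, treating 1GB as 1000MB and using the per-hour ceilings quoted for each codec (80MB and 22MB); the function name is made up:

```javascript
// Hours of call time that fit within a data allowance.
function talkHours(budgetMB, mbPerHour) {
  return budgetMB / mbPerHour;
}

var BUDGET_MB = 1000; // 1GB monthly allowance, taken as 1000MB

console.log('PCMU:', talkHours(BUDGET_MB, 80).toFixed(1), 'hours'); // PCMU: 12.5 hours
console.log('ILBC:', talkHours(BUDGET_MB, 22).toFixed(1), 'hours'); // ILBC: 45.5 hours
```

Straight division gives a little more for PCMU than the ~11 hours quoted; one plausible reading (an assumption, not something measured here) is that the lower figure leaves headroom for SIP signalling, polling and other non-call traffic.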
### ADB, WebIDE, Marionette (AKA How to remotely start a KaiOS App)
Using ADB and Firefox’s WebIDE connecting to the debugger-socket (adb forward tcp:6000 localfilesystem:/data/local/debugger-socket) is a widely used method for managing and running apps on KaiOS. Similar functionality is also available via the same debugger-socket using the BananaHackers gdeploy tool - “an open and simple CLI application manager for KaiOS”.
Unfortunately, at least in the version of KaiOS I was using at the time, the debugger-socket would fail after it had been connected to only once (i.e. WebIDE would fail after subsequent retries). Running stop b2g and start b2g as root would resolve the problem, but it wasn’t an ideal situation to be in.
I wanted a simple way to both remotely start KaiOS apps and send a message(s) to those apps from Linuxland; Gecko’s Marionette helped me achieve this aim. Marionette is “an automation driver for Mozilla’s Gecko engine. It can remotely control either the UI or the internal JavaScript of a Gecko platform, such as Firefox”.
Appending the below lines to (example profile path used) /data/b2g/mozilla/tr2ho9sw.default/prefs.js enables KaiOS’s Marionette server:
With Marionette running I’m able to run JavaScript in the context of the KaiOS Gecko engine - a useful side effect of this is the ability to send Notifications and activity Messages. Messages can be delivered to a KaiOS app (which is started if not already running) but crucially these only work if the phone is awake; you cannot use Messages alone to start an app if the device is asleep. If you first send a Notification to the phone this wakes the device and ensures that a subsequent Message is delivered to the app (auto-started if required).
An older version of a golang Marionette client is available on GitHub.
Example JavaScript (can be run via Marionette):
var newNotification = new Notification('', {}); newNotification.close();
and then, assuming your KaiOS app is registered to receive “latest-message-js” activity messages:
var newMessage = new MozActivity({ name: "latest-message-js", data: {}});
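Marionette clients talk to the server over TCP using length-prefixed JSON packets (byte length, a colon, then the JSON). A rough sketch of framing the two scripts above in wake-then-message order follows. It runs on the controlling machine (Node.js), not the phone. The `executeScript` command name and the `[type, id, name, params]` packet layout are assumptions about the protocol level spoken by a Gecko 48-era server and should be checked against whichever client you use; a session also has to be established before these are sent (not shown):

```javascript
// Frame a message the way Marionette expects: "<byte length>:<json>".
function frame(message) {
  var json = JSON.stringify(message);
  return Buffer.byteLength(json, 'utf8') + ':' + json;
}

// Assumed command packet layout: [0 = command, message id, name, params].
function command(id, name, params) {
  return [0, id, name, params || {}];
}

// The two scripts from the post: wake the phone first, then message the app.
var WAKE = "var n = new Notification('', {}); n.close();";
var MESSAGE = 'new MozActivity({ name: "latest-message-js", data: {} });';

// Returns the framed packets in the order they should be written to the socket.
function wakeThenMessage(firstId) {
  return [WAKE, MESSAGE].map(function (script, i) {
    return frame(command(firstId + i, 'executeScript', { script: script, args: [] }));
  });
}

wakeThenMessage(1).forEach(function (packet) { console.log(packet); });
```

Keeping the ordering in one helper makes it harder to forget the Notification step, which is the part that silently breaks delivery when the phone is asleep.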
### SSH daemon & armv7l-eabihf Binaries
It’s possible to run an SSH daemon on the 8110 4G (yay!), as described in this post on the Banana Hackers (comp.mobile.nokia.8110) group. Unfortunately permanently running the SSH daemon appears to deplete the battery by an extra ~10% per day (boo!).
There are also lots of compatible (armv7l-eabihf) tools/statically-linked binaries (e.g. stunnel) that work with the 8110 4G on GitHub (yay!).
That said, don’t use binaries downloaded from unknown sources m’kay; use at your own risk/etc (boo!).
### GPS
GPS on the Nokia 8110 4G, and similar variants of that HMD hardware - e.g. Nokia 6300 4G, DOES NOT WORK! when connected via LTE. I eventually discovered, although annoyingly can no longer find, a comment by a cheesed off KaiOS developer (Fabrice?) raising the point (maybe on Reddit?) that it is a hardware issue with these phones and wasn’t something that could be resolved by them in the OS.
The workaround, if it can be called that, is to switch to 2G. Though this may not be a problem in the UK, with the WalledGarden.ai SIM and its ability to roam on every UK network - including O2 (which has 2G), this may cause problems in other countries that no longer have any remaining 2G networks.
I spent far too long trying to figure out what the hell was going on with GPS before realising it was a fool’s errand (at least with the 8110 4G) and I should have given up sooner. With potential increased battery usage, slow time to first fix, and other GPS-related limitations - I had never intended to use it much but it was a nice-to-have (e.g. in the absence of Wifi SSIDs).
To-Do: create some sort of no-wifi-ssids-get-gps-fix functionality - enabling airplane mode, getting the GPS fix, then disabling airplane mode and reconnecting.
### Wifi SSIDs
Normally Wifi on the 8110 4G (Wabbit S2) is disabled to limit battery usage. It is only enabled when GeoLocation is required: the SSIDs it can see, along with their signal strengths, are reported and Wifi is then disabled again.
Though reliance on Wifi SSIDs for precise(ish) location may limit Wabbit S2’s GeoLocation efficacy in non-urban areas - perfect is the enemy of good. 99% of the time Wabbit S2 will be used in urban areas6 - Wifi & Cell Tower GeoLocation is good enough.
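The reporting step can be as simple as boiling a scan result down to the strongest few networks before sending it off. A sketch under stated assumptions: the scan yields objects with `ssid`, `bssid` and `signalStrength` (dBm) fields, as the Firefox OS Wifi manager’s network objects did; the helper name, the limit and the sample data are made up:

```javascript
// Reduce a Wifi scan to the strongest named networks.
// Signal strength is assumed to be in dBm, so values nearer zero are stronger.
function summariseNetworks(networks, limit) {
  return networks
    .filter(function (n) { return n.ssid; }) // skip hidden/unnamed networks
    .map(function (n) {
      return { ssid: n.ssid, bssid: n.bssid, signal: n.signalStrength };
    })
    .sort(function (a, b) { return b.signal - a.signal; })
    .slice(0, limit || 10);
}

// Hypothetical scan result.
var scan = [
  { ssid: 'CoffeeShop', bssid: '00:11:22:33:44:55', signalStrength: -78 },
  { ssid: '', bssid: '66:77:88:99:aa:bb', signalStrength: -40 },
  { ssid: 'HomeHub', bssid: 'cc:dd:ee:ff:00:11', signalStrength: -52 }
];
console.log(JSON.stringify(summariseNetworks(scan, 2)));
```

Capping the list keeps the report small on a metered data SIM, and the strongest networks are usually the most useful ones for a location lookup anyway.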
# Momoho BTS0011 Bluetooth ("Hands Free") Speaker
TL;DR: No dice with the BTS0011, you can skip this section as it mainly exists for those looking for BTS0011 or RN52SRC information in future. Note: I haven’t (yet) completely given up on the Bluetooth sidecar idea…
Having decided upon the 8110 4G I thought it would be a useful addition to have a button (a BTS0011) that Mabel could humanely pin (geddit?12) to her cardigan to communicate with Wabbit S2, the phone remaining in her handbag unless required for camera duties. Despite access to the manual, and as is so often the case with technology, the devil is in the detail; there was no way to know how well it would actually work (or, not) with a Nokia 8110 4G. I duly ordered a BTS0011.
Spoiler alert: It didn’t take very long to be disabused of my plans for the BTS0011…
Positives:
- It’s small, has a single button, and I think would be quite good as a device to interact with AI via voice and/or button-presses.
- Though not immediately obvious from reviews, where the primary use case is as a portable speaker, it can be successfully used as a “hands free” speaker for calling. It works reasonably well for issuing voice commands to Wabbit S2 when the BTS0011 is placed near to your mouth (e.g. clipped to the top of a cardigan).
- Easy to turn on and off (long press of the button), with audible bing’n’bongs to indicate state (powered on, powered off, connected, etc).
- To minimise battery usage of the BTS0011 it would only need turning on when communication was required with Wabbit S2, once done it could then be powered off with a long press of the button.
Negatives:
- The microphone doesn’t pick up sound very well from a distance. It certainly wouldn’t be suitable as a “normal” hands free device (e.g. on a desk in front of the person).
- When it does pick up your voice the sound can be a bit robotic and wouldn’t be pleasant for other people on the call. Deepgram could however understand voice commands issued by the BTS0011-originated caller without issue.
- It would not work for the Wabbit S2 use case: even after initial pairing the Nokia 8110 4G’s Bluetooth had to be in discoverable mode for the BTS0011 to be able to reconnect to it (e.g. when powering off & on the BTS0011). Bluetooth discoverability eats into the 8110 4G’s remaining battery life at quite a clip, defeating a key objective of Wabbit S2 - good battery life.
## BTS0011 Technical Tidbits
Details of the BTS0011 (when paired to a Mac and clicking the Apple icon > About This Mac > System Report > Bluetooth):
getevent info when the BTS0011 is connected to the 8110 4G:
getevent log when the BTS0011 button was pressed and released (Note: 0x73 = 115 = F4):
logcat when the BTS0011 button was pressed and released:
Though pressing and releasing the button could be seen in logcat and getevent it did not appear to be visible to JavaScript’s keydown, keyup or keypress events (using document.addEventListener).
## BTS0011>8110 4G Bluetooth ‘sidecar’
Given the 8110 4G/Bluetooth battery issues I’d discounted my initial idea regarding direct integration between the BTS0011 and 8110 4G. I hadn’t however completely given up on a dedicated Wabbit S2 button… Maybe I could create a dedicated Bluetooth ‘sidecar’, with better battery life & improved Bluetooth/battery management, that could bridge between the BTS0011 (wirelessly via Bluetooth) and the 8110 4G (TRRS 3.5mm jack, incl. headset-button signalling). I envisaged the ‘sidecar’ as one of those tube-like USB powerbank things with a small module (ESP32+RN52SRC+’other bits’) neatly packaged and plugged into the USB A output for power:
Unfortunately there appeared to be a dearth of options when it came to Bluetooth Audio Modules providing a) Master functionality and b) also supporting the BTS0011’s profiles. I assume the limited range of, currently in production, modules supporting this functionality is due to it being a rather niche requirement (gah!, the story of my life). I scotched any thoughts of proceeding with the sidecar idea. This was probably for the best as my mind was now seriously off-piste from the original plan (concentrate on the 8110 4G!). Bluetooth sidecar:
Fin
## Microchip RN52SRC Technical Tidbit
The standard RN52 Bluetooth Audio Module and its variants support A2DP, AVRCP, HFP, SPP and iAP in the capacity of Bluetooth Slave role. The RN52SRC supports A2DP, AVRCP, HFP in the capacity of Bluetooth Master role (yay!).
It appears that the RN52 and RN52SRC are the same hardware - just with different firmware loaded. Alas the RN52SRC, required for my use-case as a Bluetooth Master to the BTS0011 Slave, may have unresolved problems and Microchip seems to have “fixed” this by removing the RN52SRC firmware link from their website (and possibly halting RN52SRC development after v1.00?).
Though the RN52SRC firmware link has been removed from Microchip’s website it is, as of writing, still hosted by them: RN52SRC v1.00 DFU Image (Firmware).
Note: the RN52/RN52SRC is discontinued (boo!).
# Asterisk (FreePBX)
Asterisk is a VoIP and telephony server, with FreePBX available as a web-based GUI to manage it. Asterisk contains all of the features I thought would be required for Wabbit S2 and using FreePBX allowed me to quickly get going with the core VoIP infrastructure:
- Extensions and Conferences.
- SIP over WebSocket support (for JsSIP).
- “Simple” to integrate with other SIP providers and VoIP Trunks for dial-in/dial-out.
- A REST API.
- Ability to play audio into channels.
- Audio channels can be duplicated and sent elsewhere (e.g. ExtenSpy, ChanSpy, audiofork).
Using the functionality above, I imagined it looking something like:
The Wabbit S2 client dials into an Asterisk conference number (1000): all audio communication with friends and family is directly via this conference number (dial-in, dial-out, or via Jitsi Meet’s Jigasi).
Where audio is required to be processed by machines this is piped from the Wabbit S2’s extension’s audio (i.e. Microphone’s output to Asterisk) into another, dedicated-per-function (e.g. transcription, Orchestrator=2000), audio conference. When the function (i.e. agent) is required to be activated its extension is dialed-in to its respective conference number.
Testing multiple realtime transcription services in parallel is as easy as dialing in additional agents to the Orchestrator’s conference number (2000). It is also possible to enable, within Asterisk, conference 2000 to always be recorded - these recordings are then available for non-realtime transcription testing and comparison.
Should two-way audio be required the response audio (e.g. OpenAI Realtime API responses) is piped from the agent’s extension’s audio (i.e. the output audio from AI) into the original conference - 1000.
In this configuration it is only the Wabbit S2 operator that issues voice commands or talks to AI, although all callers within conference 1000 are able to hear the responses. Commands can still be issued to the Orchestrator agent, from Jitsi Meet attendees, by way of Jitsi Meet chat messages. Realtime transcriptions from Deepgram are also copied into the Jitsi Meet chat.
## Asterisk Technical Tidbits
### Call Files
Asterisk Call Files provide “the ability to initiate a call from outside of the normal methods such as the dialplan, manager interface, or spooling interface”.
What’s that you’re wondering: how do I pull Jigasi into an Asterisk (FreePBX) conference using a Call File? (they don’t yet know what Jigasi is - they haven’t read that far, Ed.)
You can ignore the last line (SIPHEADERS) if you’re just dialing a normal extension (e.g. Wabbit S2 Agent) into a conference.
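For the simpler case of dialing a normal extension into a conference, a call file can be generated along these lines. `Channel`, `Context`, `Extension`, `Priority`, `MaxRetries`, `WaitTime` and `Setvar` are standard call file keys and `from-internal` is FreePBX’s usual internal context; the agent extension (2001) and the helper name are made up for illustration, and no Jigasi-specific headers are attempted:

```javascript
// Build the text of an Asterisk call file that rings `channel` and,
// once answered, sends it to `extension` in `context`.
function callFile(opts) {
  var lines = [
    'Channel: ' + opts.channel,
    'Context: ' + opts.context,
    'Extension: ' + opts.extension,
    'Priority: 1',
    'MaxRetries: ' + (opts.maxRetries || 0),
    'WaitTime: ' + (opts.waitTime || 30)
  ];
  // Optional channel variables, one Setvar line each.
  Object.keys(opts.vars || {}).forEach(function (key) {
    lines.push('Setvar: ' + key + '=' + opts.vars[key]);
  });
  return lines.join('\n') + '\n';
}

// Hypothetical: dial agent extension 2001 into the Orchestrator conference (2000).
console.log(callFile({
  channel: 'Local/2001@from-internal',
  context: 'from-internal',
  extension: '2000'
}));
```

Asterisk picks up files as soon as they appear in its outgoing spool directory (typically /var/spool/asterisk/outgoing), so it is generally safer to write the file elsewhere and then move it into place.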
### ~~ExtenSpy~~ ChanSpy
The somewhat novel approach to audio management, splitting audio streams between conferences, is achieved using Asterisk’s ChanSpy functionality. The Initial End-to-End Testing section details why I didn’t use ExtenSpy.
### Play Audio into Channel
Playing MP3 audio in Asterisk resulted in more problems than I expected, given the maturity of both Asterisk and the MP3 format (“it seems it is using a build from 1997?!”).
Thankfully OpenAI supports outputting Opus audio files for its Text-to-Speech responses. Asterisk doesn’t have the same problems with Opus as I experienced with MP3* (low volume level/decoding errors/etc).
* yet again(!) I spent far too long trying to resolve the MP3 issues, rather than just working around them.
# Jitsi Meet, Browserless.io & Slixmpp
Jitsi Meet is described as a “fully encrypted, 100% open source video conferencing solution that you can use all day, every day, for free — with no account needed”. Ignoring the video part of this - it also provides audio conferencing and text-based chat, accessible both via a web browser and the Jitsi Meet mobile app (iOS and Android). Think of it like Microsoft Teams but not a bag of sh*t.
Jiti Meet’s most relevant feature for integrating with Wabbit S2 (Asterisk VoIP) is Jigasi: “Jigasi will register on your SIP server with some identity and it will accept calls. When Jigasi is called, it expects to find a ‘Jitsi-Conference-Room’ header in the invite with the name of the Jitsi Meet conference the call is to be patched through to”.
Combining Asterisk and Jitsi Meet’s Jigasi gives me multiple routes for inbound connectivity: telephony, browser, app - and also provides an easily accessible platform for command and control (Jitsi Meet chat via browser or app). Anyone with the Mabel-specific Jitsi Meet URL can join the meeting while the Wabbit S2 is connected, useful when needing handover between callers during an emergency. Asterisk is also connected to a VoIP provider, allowing direct dial-in to the conference.
Unfortunately I believe Jigasi is only intended to join an existing Jitsi meeting, not to start a new one. It wasn’t immediately obvious to me how I could easily start a Jitsi meeting programmatically without resorting to scripting it via browser automation.
My Heath Robinson solution to this problem was to use Bloodiko’s Jitsi Bot, which allows auto-joining - and hence starting - Jitsi meetings, and its jitsi.html?room=jitsiroomname URL. Using Browserless.io to call the Jitsi Bot URL, when the Nokia 8110 4G joins the Asterisk conference, allows the Jitsi meeting to be started by the browser-based Jitsi Bot. Jigasi is then able to join the meeting and bridge audio between Jitsi and Asterisk. Browserless.io eventually times out the session and leaves the meeting, with the meeting continuing until such time as the Wabbit S2 hangs up and all associated SIP calls (e.g. Jigasi) & conference channels are automatically destroyed (by a custom script). The, now empty, Jitsi meeting then ends.
Slixmpp is used to connect the Orchestrator (transcription/Deepgram) agent to the meeting’s Jitsi Meet chat; sending realtime transcriptions to the meeting, and receiving commands from meeting attendees.
One thing I haven’t yet got to the bottom of is how to increase the length of chat history provided to new joiners in a Jitsi meeting. I envisaged the chat history as a way for new joiners to get up to speed with whatever problem had arisen (i.e. why they had been sent an SMS asking them to join the meeting). Unfortunately this isn’t practical as new-joiner chat history is limited (I assume a default), and I’m not (yet) sure how to increase it.
To-Do: find alternative solution for new-joiner chat history visibility. An XMPP reverse proxy would also likely be able to rewrite (or insert if undefined) whatever value Jitsi is sending when requesting chat history.
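A rough sketch of what that rewrite step might look like, in Python. It assumes the history request is the standard XEP-0045 `<history/>` element inside the MUC join presence, which I haven't confirmed is what Jitsi actually sends. The function name and the example room JID are made up, and the server may well cap history regardless of what is requested.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"


def set_history_limit(presence_xml, max_stanzas):
    """Insert or overwrite <history maxstanzas=.../> in a MUC join presence."""
    root = ET.fromstring(presence_xml)
    x = root.find(f"{{{MUC_NS}}}x")
    if x is None:
        # Not a MUC join request: pass it through untouched.
        return presence_xml
    history = x.find(f"{{{MUC_NS}}}history")
    if history is None:
        history = ET.SubElement(x, f"{{{MUC_NS}}}history")
    history.set("maxstanzas", str(max_stanzas))
    return ET.tostring(root, encoding="unicode")
```

For example, `set_history_limit('<presence to="room@conference.example.org/nick"><x xmlns="http://jabber.org/protocol/muc"/></presence>', 200)` returns the same presence with a `history` child asking for up to 200 stanzas.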
Integration Testing
At this point I was ready for some basic integration testing. The Nokia 8110 4G & JsSIP was working with Asterisk, Asterisk and Jitsi Meet (Jigasi) were working - all should be good in the world of Wabbit S2. I roped someone in as the fellow Jitsi Meet user and then placed an 8110 JsSIP call to the Asterisk conference… and we started to converse…
After a very brief period of audio an unmistakable high-volume SCREECHING started! (MY EARS!). That hellish feedback noise brought about by the endless looping echo of audio (e.g. pointing a microphone at the speaker it is outputting to) :(
FOR F*CKS SAKE!
This was BAD!. It was clear - the problem was both a) an echo-induced problem, and b) a deal-breaker for using the 8110 4G as a Wabbit S2 client :(.
Before seeing sense my mind started conjuring up various “solutions” - would something like PBXMate help with echo cancellation? will they even sell me a single-extension license for their inline/SIP product?!?! could the echo problem be fixed natively in Asterisk?
STOP!
I was over-thinking things: surely this is an issue with the speaker, microphone and/or echo-cancellation of the 8110 4G?
I placed another call… this time when the screeching started I didn’t hang up - I looked at the phone while moving it about in front of me. The problem then became obvious…
WHY DIDN’T I REALISE THIS BEFORE!?!?
FFS! I could now tell that the audio from the call was coming out of the rear speaker(!?!?), and not from the handset’s ear speaker on the front of the phone. The microphone was picking up the audio from the external speaker and this was the root cause of the echo-loop.
In my defense I hadn’t noticed the rear-speaker-audio-output problem as the rear speaker isn’t that far from the earpiece - the rudimentary testing audio had seemed fine when the phone was at my ear. Nevertheless, excuses or not, the problem had to be resolved. As a fully paid-up member of the sunk-cost-fallacy-club I certainly wasn’t going to let my time invested in the Nokia 8110 4G go to waste - it required nothing less than investing(?) even more time in the Nokia 8110 4G.
It wasn’t immediately obvious to me how I could use the earpiece speaker - it didn’t seem to be exposed to Gecko/JavaScript. This also appeared to be the case with the microphones (I got the distinct impression that audio was being picked up by both microphones).
Note: based on my testing, I believe the both-microphones behaviour also occurs with normal voice calls via cellular (i.e. not using JsSIP).
dumpsys | grep "type:" | cut -d ":" -f 2 | sort | uniq
I knew the devices existed - but not how to access them. Thankfully I eventually came up with a solution (per JS and HTML in JsSIP VoIP calling & RTP codecs) that outputs through the earpiece speaker on the 8110 4G.
Note: I’ve subsequently had to set the 8110 4G’s Media Volume to 15 (maximum), to help hearing-impaired users - this being especially apparent when using the Wabbit S2 in a noisy environment. Unfortunately in a quiet environment the microphone does sometimes pick up a little of the audio from the, now maximum volume, earpiece speaker. Thankfully this doesn’t cause the awful echo-screeching, and in practice this is less of a problem as Mabel’s hair seems to provide a modicum of sound insulation - lessening the likelihood of the call’s audio being picked up by the mic(s).
# PJSUA Twinkle Jami
I spent a not inconsiderable amount of time trying to get to the bottom of reliability issues with both PJSUA and Twinkle SIP calls. PJSUA-based VoIP clients failed on a fairly regular basis and Twinkle’s CLI variant was even noted in the release notes as potentially being unstable, though initially Twinkle (standard/non-CLI) appeared to be working quite well. It wasn’t to be, obvs - read on for Initial End-to-End Testing.
Having given up on PJSUA and Twinkle as reliable (for my use case) VoIP clients I eventually arrived at Jami. A perfectly capable VoIP client which hides its SIP-client capabilities under a bushel, I assume preferring to lead with its rich suite of (non-SIP) communication functionality.
Not much else to add - Jami has been great, and the Jami Daemon can run in the background independent of the Jami UI. My experience of Jami as a VoIP client has only been positive.
# Deepgram
I chose Deepgram’s Speech to Text streaming transcription API as I noticed they have a phonecall variant of their nova-2 model that is, per Deepgram’s documentation, “optimized for low-bandwidth audio phone calls”; this matches my use case and works well with Wabbit S2’s VoIP audio. Deepgram’s free tier helpfully includes $200 of credit, and they also have a Python SDK with useful examples for streaming transcription.
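For illustration, a small Python sketch of how the streaming endpoint could be pointed at the phonecall model without the SDK. The encoding and sample rate values are assumptions about the audio coming off the loopback, not necessarily what Wabbit S2 uses, and the helper names are invented for this example.

```python
from urllib.parse import urlencode

DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_url(model="nova-2-phonecall", encoding="linear16",
                        sample_rate=8000, channels=1, interim_results=True):
    """Build the WebSocket URL for a streaming transcription session."""
    params = {
        "model": model,
        "encoding": encoding,        # assumed: raw 16-bit PCM
        "sample_rate": sample_rate,  # assumed: narrowband telephony audio
        "channels": channels,
        "interim_results": str(interim_results).lower(),
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params)}"


def deepgram_auth_header(api_key):
    """Return the Authorization header Deepgram expects for API keys."""
    return {"Authorization": f"Token {api_key}"}
```

Swapping models for a side-by-side benchmark then only means changing the `model` argument.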
The Wabbit S2 infrastructure design makes it easy to replace the Speech to Text audio processor, or even benchmark multiple transcription models and/or providers in parallel. Reviewing other streaming transcription providers/models (e.g. OpenAI) is an expected task for future me.
One missing (AFAIK) feature from the audio AIs (Deepgram streaming transcription, OpenAI Realtime API) is the ability to download the audio as sent or received by the AI from the AI’s perspective. For example: knowing what audio was received by Deepgram would help diagnose if it misheard words - or if the audio itself had issues due to underlying transport (i.e. internet/network/routing) problems.
Note: All Deepgram transcription responses are sent into the Jitsi Meet chat; useful to provide late meeting joiners with some context as to the extent of the (“Help me*”) emergency.
* see Voice Command List section.
# ALSA? PulseAudio? JACK? PipeWire? WirePlumber?!?!
Needless to say, linux audio/audio loopbacks was a massive pain in the arse. I’m going to save you the pain of me waxing (ranting) lyrical about the fun’n’games I had with linux audio and just stick to what eventually worked for me.
Note: I accept that many of the initial linux audio problems may have been due to my, rather niche, requirements in trying to get specific combinations of clients and applications to all play nicely with audio, and the manner in which I was intending to achieve this. In real-world use sending voice data via VoIP-Asterisk-VoIP-Agent-AudioLoopback-VoIP-AI works quite well*, even if the convoluted nature of the arrangement would suggest otherwise.
* possibly due to the AI’s ability to deal with shortcomings of their input audio. Albeit the audio appears ok to me, at least at the last hop I can record it - the audio loopbacks. See Deepgram note above re: audio downloading nice-to-have.
Linux Audio Loopback Technical Tidbits
- habets.se (2021): Virtual audio cables
- sorrel.sh (2024): PipeWire how-to: uncomplicated loopbacks
- bennett.dev (2024): Automatically Link Pipewire Nodes with Wireplumber
In an effort to debug reliability issues of various items (VoIP clients/etc) I’d been testing between Debian Bookworm (12) and Trixie (13). Of course(!) the b*stard config file format for WirePlumber >0.5 “changed dramatically”, and of f*cking course the package versions in the Bookworm and Trixie repositories straddle either side of this change (you’re ranting, stop it, Ed.). I eventually settled on Debian Trixie.
Loopback
cat > ~/.config/pipewire/pipewire.conf.d/10-loopback.conf
pactl
aj-snapshot & pw-jack
/usr/bin/pw-jack /usr/bin/aj-snapshot -rjqx ~/aj.snapshot.file
# Initial End-to-End Testing
I’d finally got to the point where I had some confidence in each individual building block and now it was time to bring them all together. I was damn well going to break the back of this project, and I’d do it before daylight came around again.
It was coming up to 3:30AM… Success! Woop Woop! I’d managed to tie everything together; building a bridge from the Nokia 8110 4G all the way, over the technical-hell-gap/integrating the constituent parts of Wabbit S2, to AI(s).
Granted this metaphorical bridge, in its initial form, was held together with duc(k|t) tape and baling wire - but it was my bridge and it served its purpose well (or so I thought…). Stomping up’n’down on the bridge would be tomorrow’s task - identifying/breaking possible weak spots, tweaking and hopefully ensuring reliable operation. I slept well that night, content with the day’s efforts.
Upon waking I couldn’t help myself… I’d try another few calls - why not? After all it had worked perfectly on numerous occasions - in fact only a few hours before. I placed the call… everything appeared connected but, what - WHAT?!?! WTF!, transcription had stopped working?!?!?!
FOR F*CKING F*CKS SAKE!
WHY WOULD THE TECHNOLOGY GODS SMITE THEE!
Long story short (a recurring theme) - after much investigation I realised I had two (FFS!) problems, both intermittent (FFS!):
- Reliability issues with the Twinkle VoIP client (as was used on the agents at the time).
- Asterisk’s ExtenSpy wouldn’t always function as expected. I suspect due to some sort of race condition(s) brought about by my making it do things it wasn’t really intended for.
The solutions:
- Replaced Twinkle with Jami.
- Switched to ChanSpy, which doesn’t appear to have the same issues. Albeit I’m making it do things it wasn’t really intended for.
Since making these changes everything has worked reliably.
Confession: I had an inkling something wasn’t quite right with both Twinkle and ExtenSpy as once or twice I’d had issues with them that I couldn’t explain. My profound “solution”: put it to the back of my mind - future me would resolve the problem(s).
# OpenAI Realtime API
An experimental OpenAI Realtime API client for Python. Appears to work ok, although the audio is sometimes a little jittery/clipped.
Note: Per Deepgram note - it would be useful to be able to download a copy of the audio stream as sent/received by the OpenAI Realtime API.
# Sesame AI
Placeholder. Work In Progress: allow Mabel to talk to Sesame AI.
# what3words
Placeholder. To-Do.
# RovisTech 3.4mm AirTag
The RovisTech 3.4mm AirTag affords me a slim form factor AirTag that I can attach to the 8110 4G to meet my GeoLocation-with-no-signal/battery-depleted use case7. I’m currently experimenting with the best place to stick it to the 8110 4G - maybe on the slider…
Yes, Apple’s Find My does have its limitations but perfect is the enemy of good - the AirTag is the (GeoLocation) belt to complement the Wabbit S2 (GeoLocation) braces; two (one point five?) is one and one is none.
To hopefully limit “AirTag is following you” type alerts to other Apple users, when the AirTag has been out of range of its owner’s iDevice(s) for some time (8-24 hours?), I have an unused-but-iCloud-connected 2016 iPhoneSE with Bluetooth enabled that lives at Mabel’s house.
iPhoneSE (2016) Technical Tidbit
As part of my prevent-AirTag-alerts testing, and also to see how long an, ostensibly 100% battery-health, 100%-charged, iPhoneSE (2016) would last in the absence of power - I plugged it into a fully charged Voltaic V50 and left it. The V50 is a 13,400mAh battery with Always On Mode. The iPhoneSE contains a 1,624 mAh battery.
With the iPhoneSE (iOS 15.8.3) set to:
- Optimised Battery Charging
- Low Power Mode force-enabled by a Shortcuts automation (i.e. IF Low Power Mode disabled THEN enable Low Power Mode)
- Connected only to cellular data (Wifi Disabled). With all apps, except Settings, disabled for Mobile Data access.
- Location disabled.
- Bluetooth enabled (“Hello AirTags! Don’t worry! You’ve not ventured far from your owner”).
- Send an iMessage (via Shortcuts automations) when connected & disconnected from power, and also at 10% intervals of battery depletion (90%,80%,etc).
It took 13 days before the battery on the V50 was depleted by charging the iPhoneSE. After a further 4 days without external power from the V50 the iPhoneSE’s battery eventually dropped to zero. Between ~80% and ~10% the battery depleted at a fairly consistent 10% every 7-8 hours. I assume between 10% and 0% it entered some sort of super-battery-saver mode (albeit still cellular connected*), as this took a further 20(!?!) hours before the phone eventually dropped offline. It had one last brief communication with setup.icloud.com before it finally croaked.
Over the duration of the testing the iPhoneSE, doing “nothing”* other than AirTag-whispering, transferred ~60MB of data using the cellular connection.
When only running on the internal battery the iPhoneSE appeared to consume power at approximately half the rate it did when connected to an external power supply. Maybe it would be possible to increase total runtime by only delivering power to the iPhone when needed (95%>15%, charge back to 95%, etc). If this worked and the iPhoneSE retained a similar power consumption profile then ~30 days runtime (incl. V50) may have been possible, as opposed to the ~16 days (to ~10%) observed above.
* +iUploading of data to Tim’s mothership.
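The back-of-an-envelope arithmetic behind that ~30 days figure, as a Python sketch. It assumes the V50 delivers the same total energy either way and ignores charging losses and battery wear, so treat the result as an optimistic upper bound rather than a prediction. The function and variable names are just for this example.

```python
def projected_runtime_days(days_on_external, days_on_internal,
                           powered_to_battery_ratio):
    """Estimate runtime if the phone always drew power at its on-battery rate.

    powered_to_battery_ratio is how much faster energy was consumed while
    plugged in, compared with running on the internal battery alone.
    """
    return days_on_external * powered_to_battery_ratio + days_on_internal


# Figures observed in the test above.
days_on_v50 = 13       # V50 depleted after 13 days
days_on_internal = 4   # then 4 more days on the iPhoneSE's own battery
observed_total = days_on_v50 + days_on_internal

# "Approximately half the rate" on battery means a ratio of about 2.
projected_total = projected_runtime_days(days_on_v50, days_on_internal, 2.0)

# 80% down to 10% is seven 10% steps at 7-8 hours each.
steps = (80 - 10) // 10
mid_range_hours = (steps * 7, steps * 8)
```

That gives 17 days observed to zero (roughly 16 to the ~10% mark, once the final 20 hours are taken off), a projected 30 days, and 49-56 hours for the steady 80%-to-10% stretch.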
AirTag Technical Tidbit
The following day, after the eventual power down of the iPhoneSE, the AirTag started to jingle when moved (my tester had been carrying it on their person religiously). Interestingly they haven’t been warned about an unknown AirTag following them - I assume this is due to Location Services generally not being enabled on their phone.
The tester, not associated with my iCloud account, was carrying both AirTags (Apple & RovisTech) so I could see if they had the same Find My performance characteristics. I couldn’t discern any notable differences between the two when comparing location updates in Find My.
Find My / AirTag, iPhone Battery Usage Mystery
It is a widely held belief in the household of my AirTag-tester, formed during the ownership of numerous iPhones and iOS versions, that their iPhone will start to eat battery when an iOS upgrade is due to arrive. Invariably when battery life starts to drop precipitously a message about upgrading iOS is only around the corner, and if this upgrade is ignored then battery life continues to be impacted; “you will upgrade!, lest the battery-beatings continue!”.
During the weeks of the above AirTag testing I’d heard numerous complaints about the (recently) dying battery on their iPhone - having to charge twice as often as normal. It got to the point where they started to consider replacing their battery; I counseled against it - leave it a few weeks I said, let’s see how things go - maybe you’ll see an iOS upgrade notification… Though I couldn’t be sure, I did wonder - was this increased battery usage related to my AirTag testing?.
The morning that the iPhoneSE finally powered down* was the day that my test user’s iPhone (iOS 18.4.1) no longer needed to be charged twice a day. Though the AirTags remain in the household and being carried as before, the increased battery usage of the householder’s iPhone was gone (and no pending iOS upgrade for them either).
To-Do: re-run the testing to see if the iPhone battery mystery repeats itself.
* the tester wasn’t aware of this fact.
# Plusnet Hub One & Cisco SPA232D
Mabel is a regular user of her landline phone and, despite shortcomings of the current (old/damaged) handset, is resolute in not wishing to change it - I suspect due to long-term familiarity with the existing button layout. Connecting a Cisco SPA232D to Mabel’s landline allows calls to be initiated by the Wabbit S2 that could, if Mabel wishes, be continued by picking up the existing landline handset. As time passes an increasing number of people are getting rid of their landlines and this march of progress reduces the ease with which Mabel is able to contact them; having previously committed to memory, and/or handset-button muscle memory, their landline numbers. Controlling the dialing process via voice (with their contact name) should help with this problem. Enter stage left - Plusnet Hub One.
As with the Nokia 8110 4G, the Cisco SPA232D is long past its End of Support - thankfully they both exist in isolated networks with restricted and well defined connectivity to/from the device(s). The SPA232D’s network connectivity is via the Plusnet Hub One, loaded with OpenWRT, that provides a firewall and WireGuard VPN connectivity to FreePBX/Asterisk (routing outbound calls via the SPA232D).
SPA232D Technical Tidbits
The Richards Place blog has an excellent post on the Cisco SPA232D and FreePBX (2021) settings. Unfortunately, and as I assumed (i.e. that it would need a firmware upgrade), the blog states to “make sure you are running 1.4.1” as “if you don’t you may have issues with getting a 202 error making outgoing calls”. Brilliant… a(nother) problem squarely in the ballpark of my use case.
Gah! it “looks like cisco has taken the download down from their site” - with Cisco no longer providing a firmware download where could one find it?…
Cisco SPA232D Firmware 1.4.1, on GitHub (yay!).
Plusnet Hub One, Frank
For many years the Plusnet Hub One & BT Business Hub 5 have been my go-to routers for low-end OpenWRT installations; they’re cheap (<£10 on eBay), plentiful, and are of a reasonable specification. Some years ago I built - with the assistance of Frank - a self-contained, Raspberry Pi Zero + Ethernet, OpenWRT-flasher.
The device consisted of two parts, first - a wooden base (under the router in this photo) upon which spring-loaded clips had been attached that securely hold the router down. The top two plastic extrusions (there are four of these posts in total, seen poking through the PCB in the first picture) are used as anchor points for the flasher:
Once the router was secured to the base - the flasher itself, shown below, is lowered onto the two extruded posts and firmly held in place by clamps:
With extensive experience in the intricacies of clock making* Frank had no problems ensuring that the holes drilled, for the three pogo pins, perfectly aligned with their intended touch points on the router’s PCB. Had I attempted it myself I would have almost certainly ballsed it up. What better way to spend an afternoon than watching a clock being made by Frank, followed by creating an OpenWRT flasher. Decades ago Frank could never understand why I spent money on modems, or time spent playing with telephone systems; the modems may have gone - but the phones (Wabbit S2) have not :)…
* including, but not limited to, being on the team that produced the British Horological Institute’s:
- Queen’s Millennium Clock (Facebook post from the Sandringham Estate).
- Project 150 Clock (BBC News. BBC News Image - it’s the complicated looking thing on the left).
I miss Frank.
When Mabel slides open the Nokia 8110 4G it automatically connects to Wabbit S2 and she is greeted with “Hello Mabel, how may I help you?”. A list of the voice commands available is shown below; Wabbit S2 sends beeps to confirm receipt of a command. Once Mabel has finished talking to Wabbit S2 she closes the slider and the call is ended.
At the moment Wabbit S2 uses basic string matching (boo!) of the Deepgram speech-to-text output to activate the relevant function. I’m ~~considering using~~13 going to use AI to derive intent from the spoken words, as opposed to being tied to a restricted list of commands. This would prevent having to add endless variants of the same command(s) - a real-world example being Mabel’s inclination to use “please put me through to”, as if talking to a telephone operator, as opposed to using the “call” command.
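To show why plain string matching gets tedious, here is a simplified Python sketch of the approach. It is not the actual Wabbit S2 code: the phrase tables and function names are illustrative and only cover a handful of the commands.

```python
import re

# Phrases that must match the whole utterance.
EXACT = {
    "where am i": "location",
    "help me": "emergency",
    "i have fallen": "emergency",
    "i have an emergency": "emergency",
    "stop talking": "stop_talking",
    "speak photo": "photo",
    "call morpheus": "call_morpheus",
}

# Phrases followed by free text (a contact name or a question).
PREFIXES = {
    "put me through to": "call",
    "speak to": "call",
    "ring": "call",
    "contact": "call",
    "call": "call",
    "ask morpheus": "ask_morpheus",
    "speak photo question": "photo_question",
}


def normalise(transcript):
    """Lower-case the transcript and strip punctuation and extra spaces."""
    text = re.sub(r"[^a-z0-9' ]+", " ", transcript.lower())
    return " ".join(text.split())


def match_command(transcript):
    """Return (command, argument), or (None, text) if nothing matched."""
    text = normalise(transcript)
    if text in EXACT:
        return EXACT[text], ""
    # Try the longest prefixes first so "speak photo question" wins
    # over any shorter phrase it happens to start with.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if text.startswith(prefix + " "):
            return PREFIXES[prefix], text[len(prefix) + 1:]
    return None, text
```

With these tables “Put me through to Duncan.” matches the call command, but “Please put me through to Duncan” does not - every polite variant would need its own entry, which is exactly the itch that intent detection is meant to scratch.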
I’m under no illusions that Mabel will be able to remember all of these commands - the reality is, at best, only a few will be committed to memory and used on a regular basis. However, they are available for use - even if Mabel calls me first so I can let her know what the relevant command is.
Note: items with strikethrough (like this) are currently a work in progress. All other commands function as documented.
“Where am I?”
Uses reported Cell Tower, Wifi SSIDs and signal strengths to GeoLocate the Wabbit S2. This is then converted to a street address using TomTom’s Reverse Geocode API, and also a what3words geocode reference. The what3words, street address and estimated accuracy (in meters) are spoken to the caller.
A link to a Google Maps Street View, for an approximation of the location the person finds themselves in, and also an OpenStreetMap link, are added into the Jitsi Meet chat.
Note: the Wabbit S2’s location is automatically determined at the start of every call, and the associated map links added to the Jitsi Meet chat. This process repeats every 10 minutes for the duration of the call. A similar process (start & 10 mins) exists for reporting on the Wabbit S2’s battery percentage and estimated depletion-to-zero time.
Google Maps Street View & OpenStreetMap Technical Tidbits
Google Maps Street View URL format:
OpenStreetMap URL format:
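As a stand-in example, the Python sketch below builds links using the publicly documented Google Maps URLs scheme (`map_action=pano`) and the OpenStreetMap marker format. These may differ from the exact formats Wabbit S2 uses; the function names and the default zoom level are assumptions.

```python
from urllib.parse import urlencode


def street_view_url(lat, lon):
    """Google Maps URLs link that opens Street View near a coordinate."""
    query = urlencode(
        {"api": 1, "map_action": "pano", "viewpoint": f"{lat:.6f},{lon:.6f}"},
        safe=",",  # keep the comma between latitude and longitude readable
    )
    return f"https://www.google.com/maps/@?{query}"


def openstreetmap_url(lat, lon, zoom=17):
    """OpenStreetMap link with a marker, centred on the same coordinate."""
    query = urlencode({"mlat": f"{lat:.6f}", "mlon": f"{lon:.6f}"})
    return (
        f"https://www.openstreetmap.org/?{query}"
        f"#map={zoom}/{lat:.6f}/{lon:.6f}"
    )
```

Both helpers take plain decimal degrees, so whatever the geolocation step returns can be passed straight in.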
“Help me” | “I have fallen” | “I have an emergency”
Sends an SMS to a pre-defined list of numbers and also dials these numbers in a round-robin fashion to bring them into the call. The SMS contains a brief explanation, dial-in number, and links to the Jitsi meeting and usage instructions. To-do: add Doro-like “press zero to acknowledge and join conference” functionality, so as to be able to catch calls going through to voicemails/etc.
“What time is it?” | “What date is it?” | “What day is it?”
Speaks the time and date in the local timezone (the location of the Wabbit S2 is automatically determined upon joining the call).
“Speak photo” | “Take photo”
The phone vibrates three times, each one second apart, before taking a photo and describing what can be seen in the image (the AI prompt being “describe what is in this photo?”). The response from AI is then converted to speech using OpenAI’s Text to Speech API and the resultant (Opus) audio played into the call.
The text description of, along with a time-limited link to, the photo are also added to the Jitsi Meet chat.
Example: “Speak photo”
The spoken response14:
‘The photo shows a view of a harbor or port area. In the foreground, there are large rocks or boulders lining the water’s edge, along with a paved road. A few boats are visible on the water, including a larger ferry with the name “JADROLINIJA” on its side. In the background, there are buildings and possibly a cityscape, with mountains rising in the distance under a cloudy sky. The overall atmosphere appears to be calm and maritime.’
The spoken response14:
‘The photo depicts a rustic interior with a variety of decorative and functional items.
- Wall Decor: The wall features a large portrait of a man, surrounded by various plates that are likely decorative, showcasing colorful designs.
- Hanging Items: Above the plates, there are several hooks with small containers or utensils hanging from them.
- Tools and Utensils: Various tools and kitchen utensils are mounted on the wall, including rolling pins and other traditional items.
- Table Setup: In the foreground, there is a wooden table covered with a cloth. On the table, there are bowls, a jug, and some vegetables or fruits, suggesting a homely, traditional setting.
- Overall Atmosphere: The overall ambiance is warm and nostalgic, reflecting a cultural or historical theme.
The arrangement of items suggests a blend of functionality and decoration, typical of a traditional kitchen or dining area.’
“Speak photo question < question >” | “Take photo question < question >”
As per “speak photo” although the < question > is sent as the prompt to OpenAI.
Example: “Speak photo question How much is Sauvignon Blanc?”
The spoken response14:
’ The price for Spier Sauvignon Blanc is as follows:
- Can (250ml): £6.75
Let me know if you need more information!’
Example: “Speak photo question Translate this message into English”
The spoken response14, which came as a pleasant surprise (it was the first time translation had been attempted with the Wabbit S2):
’ Here’s the translation of the message into English:
“Dear Mabel!
We are sending our greetings from the spa: we think of you a lot!
Take care of yourself!
Kisses,
Huncsa
Thank you for the gift you sent!” ‘
“Last photo question < question >” | “Speak last photo question < question >”
Send the previously taken photo with a custom prompt. Can be used to obtain additional detail from the photo and/or refine the query.
“Call Morpheus”
Provides a 1-800-CHATGPT(ish) experience using the OpenAI Realtime API.
“Disconnect Morpheus”
Disconnect the connection to OpenAI’s Realtime API.
“Ask morpheus < question >”
Send a question to OpenAI and hear the answer.
“Stop talking”
Prematurely end the output of any currently playing audio, such as the audio answer response to “ask morpheus < question >”.
“Open sesame” | “Call sesame”
Placeholder. To-Do: connect Mabel to Sesame AI.
“Disconnect sesame”
Placeholder. To-Do: disconnect from Sesame AI.
Also available:
- “Put me through to < contact name >”
- “Speak to < contact name >”
- “Ring < contact name >”
- “Contact < contact name >”
Dials via Asterisk, outbound routing via the Cisco SPA232D.
Note: once the number is dialed Deepgram is disconnected;15 voice commands are no longer possible for the remaining duration of the call. This may be changed in future with a watch-word to re-establish a connection to Deepgram.
“Dial < contact name >”
Placeholder. To-Do: dial via SIM2.
Most of the commands produce the same results as the voice commands above. Only those that differ are commented below.
mabel location
Force a location update, with the results appearing in the Jitsi Meet chat.
mabel speak <words>
Speak <words> into the call.
mabel call morpheus
mabel disconnect morpheus
mabel take photo [flash|no flash] | mabel speak photo [flash|no flash]
Per voice command but with the additional option to control the flash when taking a photo. The default flash setting is “auto”.
mabel last photo <question> | mabel speak last photo <question>
mabel ask morpheus <question>
mabel stop talking
It does not stop mabel talking, see equivalent Voice Command above.
‘The photo14 shows an elderly woman sitting at a table in a restaurant. She has curly gray hair and is holding a glass of what appears to be a light-colored beverage. The background features a cozy, decorated interior with hanging lights, curtains, and various items on the walls, including plates and possibly other decorative elements. There are also other people in the background, contributing to the lively atmosphere of the restaurant.’
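For the chat commands listed above, parsing might look something like the following Python sketch. This is a guess at the shape of it rather than the real Orchestrator code; the returned dictionary layout and the function name are invented for this example.

```python
def parse_chat_command(message):
    """Parse a 'mabel ...' chat message into a command dict, or None."""
    words = message.strip().split()
    if len(words) < 2 or words[0].lower() != "mabel":
        return None  # ordinary chat, not addressed to Wabbit S2
    rest = " ".join(words[1:])
    lowered = rest.lower()

    # Check the longer photo phrases before the generic "speak <words>".
    for prefix in ("speak last photo", "last photo"):
        if lowered.startswith(prefix + " "):
            return {"command": "last_photo",
                    "question": rest[len(prefix) + 1:]}
    for prefix in ("take photo", "speak photo"):
        if lowered == prefix or lowered.startswith(prefix + " "):
            option = lowered[len(prefix):].strip()
            # No option means the default flash setting of "auto".
            flash = {"": "auto", "flash": "on", "no flash": "off"}.get(option)
            if flash is None:
                return None
            return {"command": "photo", "flash": flash}
    if lowered.startswith("ask morpheus "):
        return {"command": "ask_morpheus",
                "question": rest[len("ask morpheus "):]}
    if lowered.startswith("speak "):
        return {"command": "speak", "words": rest[len("speak "):]}
    if lowered in ("location", "call morpheus",
                   "disconnect morpheus", "stop talking"):
        return {"command": lowered.replace(" ", "_")}
    return None
```

Anything that doesn't start with the `mabel` prefix is treated as ordinary meeting chat and ignored.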
FOUR MORE YEARS! blog posts?
- Bending the will of more devices to integrate with Wabbit S2?
- Utilise the unused space under the curved back of the 8110 4G? A custom (flat) rear cover with integrated AirTag + Bluetooth Sidecar (to TRRS) + Wireless Charging16 (to Micro USB) + larger battery?
(cont. p.94)