Hi, first, very happy with the product, it’s working out great so far! Today, I was mapping a whole bunch of dead ends and doing a few other map edits. I noticed a couple of anomalies. First, I did have a few occurrences of no GPS. When I looked, the rover had like 28 satellites. After a couple of minutes it came back, no big deal for me, not a complaint, just an observation. Second, on like 3 or 4 occasions, the rover simply stopped responding. No movement, no error, GPS signal was fine. After a short while, maybe a minute or so, it just started responding again. So again, not a complaint really, just wondering what might make it pause like that. Woulda been funny if Cash just said "gimme a minute, I’m tired" cuz that’s what it seemed like.
This is most likely your phone switching from Bluetooth to WiFi, or your WiFi dropping out and switching to cellular, or a combination of all of the above. The app will misrepresent this as no GPS. Try turning off Bluetooth on your phone; that usually helps. If you’re walking around outside you can also disable WiFi and just use cellular for a more stable connection.
For me, the brief stalling was caused by having wifi enabled on the core itself.
The stall happens because they’re using TCP for things, and likely wrote the handler with blocking calls instead of an async handler. When a packet is lost over wifi, the entire process stalls waiting for the network timeout.
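To illustrate the blocking vs. async point, here is a minimal Python sketch. It is not the product's code: `flaky_read`, `network_task` and `control_loop` are made-up names, and the lost packet is simulated with a sleep. The idea is just that when the network read is awaited with a short timeout, the rest of the process keeps ticking instead of freezing until the read gives up.

```python
import asyncio


async def flaky_read(delay_s: float) -> bytes:
    """Hypothetical stand-in for a socket read that hangs after a lost packet."""
    await asyncio.sleep(delay_s)
    return b"status"


async def network_task(delay_s: float, timeout_s: float, log: list) -> None:
    """Await the read, but give up after timeout_s instead of waiting it out."""
    try:
        data = await asyncio.wait_for(flaky_read(delay_s), timeout=timeout_s)
        log.append(("net", data))
    except asyncio.TimeoutError:
        log.append(("net", "timeout"))


async def control_loop(ticks: int, period_s: float, log: list) -> None:
    """Stand-in for the work that must not stall (driving, task execution)."""
    for i in range(ticks):
        log.append(("tick", i))
        await asyncio.sleep(period_s)


async def run_demo() -> list:
    log: list = []
    # Both tasks share one event loop; the hung read does not block the ticks.
    await asyncio.gather(
        network_task(delay_s=1.0, timeout_s=0.2, log=log),
        control_loop(ticks=5, period_s=0.02, log=log),
    )
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_demo()):
        print(entry)
```

With a blocking read in the same thread as the control loop, none of the ticks would run until the read returned or timed out, which matches the "rover just sits there for a while" symptom.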
This behavior, at least, isn’t related to the phone app, as it happens during tasks when the app is not involved. At some point, the core and DS need their timeouts reduced when using the wifi path.
The app acts like they use a polling loop to send/process data between the core and the app. With enough packet loss, the core and app can literally get 10 or 20 seconds “behind” each other; you issue a sequence of commands (to move forward and pivot left, then turn on the lights, for example) and the core executes them in sequence some time later. The core’s status or position on the map may likewise be some period behind. The behavior effectively acts like a buffering issue on the network, except that none of us are running buffers of that size on our routers lol. Behavior will persist until you completely exit the app and then re-run it (which makes a new TCP connection to the core).
The fix to the app’s lag-behind would be to not use any polling loops and make the network handler async / interrupt driven. Or if they’re stuck with a polling loop, be certain to completely flush the transmit queue before returning, and completely process the receive buffer. As it is, it (mostly) behaves as if they are sending / processing a fixed amount of data per tick of the clock, regardless of how much is in the queue. This is probably a byproduct of how the remote-control driving thing on the phone app works. They should probably consider decoupling the send rate of the xmit queue from the remote-control tick, so that a packet loss timeout won’t produce a time shift.
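A toy simulation of that lag-behind, under assumptions of my own: one command is queued per tick, the sender either handles exactly one item per tick or drains the whole queue, and a packet-loss timeout is modeled as a few ticks where nothing can be sent. `simulate` and its parameters are hypothetical, not anything from the app.

```python
from collections import deque


def simulate(ticks: int, stall_ticks: set, drain_all: bool) -> list:
    """Return the number of commands still queued at the end of each tick."""
    queue: deque = deque()
    backlog = []
    for t in range(ticks):
        queue.append(t)  # the app produces one command this tick
        if t not in stall_ticks:  # during a stall nothing gets sent
            if drain_all:
                while queue:  # flush everything that is pending
                    queue.popleft()
            else:
                queue.popleft()  # fixed budget: one command per tick
        backlog.append(len(queue))
    return backlog


if __name__ == "__main__":
    stall = {2, 3, 4}
    print("fixed budget:", simulate(10, stall, drain_all=False))
    print("drain all:   ", simulate(10, stall, drain_all=True))
```

With the fixed budget, the three commands that piled up during the stall never get cleared, so everything after that point runs three ticks late for as long as the connection stays up. Draining the queue on the first good tick brings the backlog back to zero.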
From DS to core, if they’re using a single TCP stream, they might consider splitting it into two. One for data that must be reliable (settings, polygons, commands, firmware, etc.) and one for things that do not matter if a packet is lost (RTK, video, status, etc.). Configure the reliable TCP connection with a short (1000 ms) timeout and one or two retries. Configure that second connection with the shortest timeout available and no retries.
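Roughly what that split could look like as a policy table, sketched in Python. Everything here is an assumption for illustration: the names `ChannelConfig`, `channel_for` and `send_with_policy` are invented, the 0.05 s value is only a placeholder for "shortest timeout available", and the `send` callable is a hypothetical transport hook that raises `TimeoutError` when it gives up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelConfig:
    name: str
    timeout_s: float
    max_retries: int


# Reliable channel: short timeout, a couple of retries (values from the post).
RELIABLE = ChannelConfig("reliable", timeout_s=1.0, max_retries=2)
# Best-effort channel: placeholder timeout (assumption), no retries.
BEST_EFFORT = ChannelConfig("best_effort", timeout_s=0.05, max_retries=0)

BEST_EFFORT_KINDS = {"rtk", "video", "status"}


def channel_for(kind: str) -> ChannelConfig:
    """Pick a channel by message kind; anything unlisted goes to the reliable one."""
    return BEST_EFFORT if kind in BEST_EFFORT_KINDS else RELIABLE


def send_with_policy(send, payload: bytes, cfg: ChannelConfig):
    """Try send(payload, timeout_s) up to 1 + max_retries times.

    Returns (delivered, attempts). `send` is a hypothetical callable that
    raises TimeoutError when the timeout expires.
    """
    attempts = 0
    for _ in range(1 + cfg.max_retries):
        attempts += 1
        try:
            send(payload, cfg.timeout_s)
            return True, attempts
        except TimeoutError:
            continue  # retry if the policy allows, otherwise fall through
    return False, attempts
```

The point of the second channel is that a dropped RTK or status packet is simply superseded by the next one, so giving up after a single attempt costs nothing and keeps stale data from holding up the commands that matter.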
Again, wifi only. I’ve not noticed this behavior via 4G or HaLow.
Fun stuff!
@steve Thank you for your very informative post!