Here4 loses many satellites and positional accuracy ~8:20min after being switched on

Hello,

Another update since yesterday. This morning I did two more tests on the ground for > 10 minutes and did not have any transient satellite drops. I then did a ~36 minute flight, and also left the drone on for about 25 minutes or so, total time 61 minutes logging, with no issues seen.

This is pretty compelling so far, at least with regards to getting rid of the original problem. Will update if anything else transpires, but for now, I will probably return to flights.

Still eager to hear back from anyone else, and especially CubePilot’s thoughts on the evidence so far.

Hello,

I figured that even though I haven’t heard back, I’ll go ahead and post some more information that I collected and learned today.

I ended up poking around the CubePilot GNSSPeriph-release GitHub repository and noticed there’s a gps_debug GUI in there whose glitch-detection rules are an exact match for the symptoms people are reporting (sat-count cliffs, fix downgrades, position jumps), and it includes raw UBX capture via a tunnel to the u-blox. It looks built for diagnosing this specific bug, so it seems clear CubePilot is aware and working on it. I built the GUI and used it successfully today to capture some logs from a new Here4 unit that was exhibiting the symptom.
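For anyone who wants to scan their own logs for the same three symptoms, here is a minimal sketch of that kind of rule set. It is not the gps_debug implementation: the `Fix` record, the `find_glitches` name, and both thresholds are assumptions made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    t: float        # seconds since power-on
    num_sats: int   # satellites used in the solution
    fix_type: int   # higher is better (e.g. 3 = 3D fix)
    north_m: float  # local position in metres
    east_m: float

def find_glitches(samples, sat_drop=8, jump_m=5.0):
    """Compare consecutive samples and return (time, reason) pairs.

    sat_drop and jump_m are illustrative thresholds, not values
    taken from gps_debug.
    """
    events = []
    for prev, cur in zip(samples, samples[1:]):
        if prev.num_sats - cur.num_sats >= sat_drop:
            events.append((cur.t, "sat-count cliff"))
        if cur.fix_type < prev.fix_type:
            events.append((cur.t, "fix downgrade"))
        dist = ((cur.north_m - prev.north_m) ** 2
                + (cur.east_m - prev.east_m) ** 2) ** 0.5
        if dist >= jump_m:
            events.append((cur.t, "position jump"))
    return events
```

The thresholds would need tuning against real captures before the output means much.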

The most useful capture was an unmolested one, where gps_debug was just recording in the background and the bug fired naturally about 8 minutes in. From the data, it looks like an internal u-blox firmware event rather than external RF. The RF spectrum is unchanged through the event, the receiver’s clock is undisturbed, and MON-HW shows a clean deterministic step in the L1-band AGC value. The disruption appears to specifically affect L1-band tracking loops (GPS L1 and Galileo E1) for about a second before recovering.

I also tried using u-center via the tunnel to enable some additional diagnostic NAV messages so I could see more receiver state. Interestingly, in a few of those sessions the bug seemed to fire within seconds of me enabling messages, much sooner than the usual 4-8 minute mark.

One loose theory is that the trigger relates to host CFG-MSG traffic volume rather than a fixed receiver-uptime timer, but I haven’t proven that.

On the unit I applied the “workaround” process to, as mentioned in earlier posts, it’s still been clean for over 10 test sessions, but I haven’t logged enough hours to feel that it’s 100% confirmed. Still under observation.

Hi Jesus,

Thank you for sharing this level of detail. This is very useful.

Your observation that the RF spectrum remains unchanged while the event appears with a deterministic L1-band AGC step is important, because it points more toward an internal receiver / firmware / message-handling behavior rather than a simple external RF interference case.

The fact that gps_debug appears to include detection logic for sat-count cliffs, fix downgrades, position jumps, and raw UBX capture also suggests that this is the right diagnostic path for affected units.

At this stage, I think the most useful next step would be for CubePilot / maintainers to confirm the preferred logging package for affected Here4 units, for example:

  • gps_debug raw UBX capture
  • GNSSPeriph firmware version
  • Here4 hardware/version details
  • ArduPilot log around the event
  • exact timing from power-on to fault
  • whether additional CFG-MSG traffic changes the trigger timing

This would help users collect consistent evidence instead of everyone testing different things independently.

I would also be careful not to treat the workaround as fully confirmed yet until more units and longer test hours are reported, but your results are definitely a strong lead.

Dr. Fares Al Dhaheri

Al-Etihad Industrials, UAE

Hi, we are working with u-blox to investigate the issue. We are quite close to an answer on this, and are currently looking for more logs with NAV-STATUS turned on. The tool you found was written to make it easier to collect logs and test the issue. If you have u-blox logs that show the issue, we would appreciate it if you could share them with us. Feel free to DM me with the logs.

There is no official recommendation from our side yet. We are working closely with u-blox to find a solution.

Hi @sidbh, thanks for the update. I have done several tests enabling different messages so they would be captured alongside the bug, but doesn’t the Here4 firmware actively turn off the NAV-STATUS messages?

From my tests, and my understanding, I have not been able to keep it on long enough to capture the bug, even through u-center.

I looked into the GNSSPeriph codebase and it seems that every time a CFG response comes back, the driver calls _verify_rate(). If the rate it sees doesn’t match the rate it wants, it immediately fires another CFG-MSG to set the right rate.

Even though I tried turning NAV-STATUS on in u-center, the captures had no NAV-STATUS because GNSSPeriph actively turns it off on every poll cycle.
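To make that loop concrete, here is a toy model of the behavior as I understand it. It is not the GNSSPeriph code: `ToyReceiver`, `ToyDriver`, and `on_cfg_response` are hypothetical names, and only the "compare the reported rate with the wanted rate, then re-send the wanted rate" logic comes from what I described above.

```python
class ToyReceiver:
    """Stands in for the u-blox: just remembers a rate per message."""
    def __init__(self):
        self.rates = {"NAV-STATUS": 0}

    def set_rate(self, msg, rate):
        self.rates[msg] = rate


class ToyDriver:
    """Hypothetical stand-in for the driver-side rate check."""
    def __init__(self, receiver, wanted):
        self.receiver = receiver
        self.wanted = wanted      # rates the driver insists on
        self.corrections = 0      # how many times it overrode the receiver

    def on_cfg_response(self, msg):
        # Runs whenever a CFG response for `msg` comes back.
        if self.receiver.rates[msg] != self.wanted[msg]:
            self.receiver.set_rate(msg, self.wanted[msg])
            self.corrections += 1


rx = ToyReceiver()
drv = ToyDriver(rx, wanted={"NAV-STATUS": 0})
rx.set_rate("NAV-STATUS", 1)       # user enables it through u-center
drv.on_cfg_response("NAV-STATUS")  # next poll cycle puts it back to 0
```

In this model, anything enabled from u-center only survives until the next CFG response, which would match the empty captures.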

Hello,

I tested the workaround described above by @Jesus_Cervantes.

Since I already had two modules (1x Here4 Blue, 1x Here4 Black) with the bug, I didn’t have to do much.

1. Checked the AP_Periph firmware version. Both modules are on 1.15.C71AF5, etc.

2. Tested both modules for the bug. The bug occurred on both modules after about 8 minutes and 15 seconds.

3. Enabled SLCan passthrough

4. Performed U-Center View → Configuration View → CFG → select “Revert to default configuration”

5. Saved the configuration to the module

6. Performed a coldstart in U-Center. I didn’t discharge the internal battery/capacitor.

7. Connected the module to an FC running PX4 1.15.4. Enabled UAVCAN Autoconfig for sensors and performed the dry test (without flying).

It rained all day. But even during the dry test without flying, the dropout could previously be observed. Now, after powering up, the module ran for ~12 minutes each time without a dropout. I then performed the dry test with both modules 3 times per module, without any further dropouts.

Well, so far the reset workaround seems to be working for us. If the weather is better tomorrow, I’ll conduct flight tests. The open question is what was different in the configuration before, and why the Here4 modules were shipped with a configuration that differs from the U-Blox defaults.

@Jesus_Cervantes you will need to disable GPS_AUTO_CONFIG by setting it to 0 in the Here4 params. Once it is set to 0, the firmware will not attempt to configure the receiver.

Hey,

As I haven’t been able to fly yet – the site is occupied today – I’ve done a U-Blox module parameter comparison between a Here4 Black module fresh from the warehouse and the Here4 Black module on which I applied the workaround yesterday.

Whilst applying the workaround yesterday, I kept wondering whether there were any differences, and whether the Here4 manufacturer, drawing on many years of experience in U-Blox module configuration, had adjusted any parameters – presumably for good reason – so what would I lose if the module were reset to the U-Blox default parameters?

Comparison:

Shipped Here4 Black vs. Workaround Here4 Black

Helpers:

  • cubeorange
  • Mission Planner
  • U-Center
  • diff
  • kdiff3

Procedure:

  1. SLCan Pass-through → U-Center via network socket
  2. In U-Center, open the Receiver Configuration via the menu bar: U-Center → Tools → Receiver Configuration and save the receiver configuration to file. Do this for both modules used for comparison.

Comparison results: Shipped vs Workaround:

In line 16, one block is different. Specifically, Shipped ‘20 01 8D’ versus Workaround ‘20 00 8D’

I’ve attached the files. I don’t work with U-Center often enough to know whether U-Center can display the difference in a way that’s easy for humans to understand. That’s why I’m asking here if anyone knows what this means :) Or how U-Center can display the difference so that any changes between the two configurations are listed in a way that’s easy for humans to understand?

Files.zip (247.6 KB)
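In case it helps anyone repeat the comparison without kdiff3, here is a rough sketch of a byte-level diff. It assumes both saved configuration files have the same number of lines and that each line is a run of whitespace-separated tokens (mostly hex bytes), so that only values differ. The `diff_hex_lines` name is made up, and the sample strings below are invented apart from the three bytes quoted above.

```python
def diff_hex_lines(text_a, text_b):
    """Return (line_no, token_index, a_token, b_token) for each mismatch.

    Assumes both texts have matching line and token counts; extra
    lines or tokens on either side are silently ignored.
    """
    diffs = []
    pairs = zip(text_a.splitlines(), text_b.splitlines())
    for line_no, (line_a, line_b) in enumerate(pairs, start=1):
        for i, (x, y) in enumerate(zip(line_a.split(), line_b.split())):
            if x != y:
                diffs.append((line_no, i, x, y))
    return diffs


# Invented sample lines; only "20 01 8D" / "20 00 8D" come from the post.
shipped = "AA 20 01 8D\nBB CC"
workaround = "AA 20 00 8D\nBB CC"
result = diff_hex_lines(shipped, workaround)
```

For real files, read each one with `pathlib.Path(...).read_text()` and pass the two strings in. The token index then shows which byte of the line to look up in the interface description.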

Hello,
Here again are the promised flight logs for the PX4 1.15.4 copter.

After resetting the modules to their firmware defaults in U-Center—as described above—and performing a cold start on each via U-Center, I mounted the Here4 Blue onto a PX4 test copter. In doing so, I made a point of ensuring that my procedure did not differ in any way from our production process, where these modules are installed and commissioned. This means that for the Here4, the UAVCAN_ENABLE parameter is simply set to 2 (“Sensors Automatic Config”). Subsequently, I briefly recalibrated the module’s compass—and that was it!

Since then, I have not observed any drop in the number of satellites, nor the accompanying EKF2 position drift.
Subjectively speaking, the flights performed no differently than they did before the workaround—aside, of course, from the absence of dropouts.
Here are the links to the logs uploaded to Flight Review.
A notable feature of these log files is that they record continuously from power-up until the first disarm following the completion of the initial flight.

Here4 Blue Workaround Flight 1

Here4 Blue Workaround Flight 2

Here4 Blue Workaround Flight 3

@sidbh I tried another test yesterday on a Here4 unit that had been unused since late 2024 I think. The first test was the control with nothing changed, and the bug fired at ~8:20 like always.

After that, I set GPS_AUTO_CONFIG to 0 on that Here4, and then powered off and on again, then started logging with the python tool. After that, I went into u-center and connected via the tunnel, and turned on the NAV-STATUS message through the messages view. Almost immediately afterwards, the bug fired.

I tried one more time, but the result was the same.

I let it run for up to 10 minutes afterwards, but once the bug has fired, it never seems to fire a second time in the same session.

So in this case, it’s as if the enabling of NAV-STATUS directly triggers the bug.

Is this helpful information, or is this something you’re already aware of? It seems that we can’t capture the “natural” progression of the bug (i.e. firing around the 8:20 mark) with NAV-STATUS enabled.

@wolke I looked at your logs and used the u-blox-F9 interface description document to help decode this. It looks like this single configuration difference is “CFG-MSGOUT-UBX_MON_SPAN_UART1”. In other words, the reset turns off streaming of MON-SPAN, the RF spectrum snapshot message, on UART1.

One thing I might try today if I have the time is taking a GPS unit that still has the bug, and explicitly turning off this single message via u-center. If the bug goes away, we probably have confirmed that this message is somehow related or triggering the issue.

That is the entire frequency scan of L1 and L5, so it is quite a heavy message. Could it be messing with the serial port buffers in the F9P CPU?

Quick update from more testing yesterday. I ran a controlled A-B-A sequence on a fresh Here4. First, a sanity-check capture with the unit in shipped state (MON-SPAN UART1 set to 1 in Flash) fired the bug at the ~4 minute mark. Then three captures with MON-SPAN explicitly set to 0 (via u-center configuration) in all three layers (RAM, BBR, and Flash) all ran clean for >10 minutes each with no bug. Finally, one more capture with MON-SPAN re-enabled (Flash + RAM) fired the bug again shortly after enabling it.

Here’s a screenshot showing what was set to 0 explicitly:

The MON-SPAN-disabled captures did still show MON-HW agcCnt transitioning between 3510 and 3861 a couple of times in each session, but those transitions did not escalate into a visible glitch (no sat cliff, no phantom velocity, no hAcc spike). It looks like the agcCnt bistability is normal receiver behavior on its own, and the bug only manifests as a visible event when MON-SPAN is actively being generated. Disabling CFG-MSGOUT-UBX_MON_SPAN_UART1 could be the mechanism by which the resetting of default configuration was preventing the bug from happening.
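For anyone who would rather script the change than click through u-center, here is a hedged sketch of building a UBX-CFG-VALSET frame that writes a one-byte value to chosen layers. The framing, checksum, and layer bits follow the UBX protocol as I understand it. The function names are mine, the key ID is deliberately left as a parameter (look it up in the interface description), and I have not sent this frame to a Here4.

```python
import struct

# Layer bits for the CFG-VALSET "layers" field.
LAYER_RAM, LAYER_BBR, LAYER_FLASH = 0x01, 0x02, 0x04


def ubx_checksum(body: bytes) -> bytes:
    """8-bit Fletcher checksum over class, id, length, and payload."""
    ck_a = ck_b = 0
    for b in body:
        ck_a = (ck_a + b) & 0xFF
        ck_b = (ck_b + ck_a) & 0xFF
    return bytes([ck_a, ck_b])


def ubx_frame(msg_class: int, msg_id: int, payload: bytes = b"") -> bytes:
    """Wrap a payload in sync chars, header, and checksum."""
    body = struct.pack("<BBH", msg_class, msg_id, len(payload)) + payload
    return b"\xb5\x62" + body + ubx_checksum(body)


def valset_u1(key_id: int, value: int, layers: int) -> bytes:
    """CFG-VALSET (class 0x06, id 0x8A) for a single one-byte key.

    key_id is supplied by the caller; this sketch does not know or
    verify the key ID of any particular configuration item.
    """
    # version=0, layers bitmask, 2 reserved bytes, 32-bit key, 1-byte value
    payload = struct.pack("<BBHIB", 0, layers, 0, key_id, value)
    return ubx_frame(0x06, 0x8A, payload)
```

Calling `valset_u1(key_id, 0, LAYER_RAM | LAYER_BBR | LAYER_FLASH)` with the key ID for CFG-MSGOUT-UBX_MON_SPAN_UART1 would correspond to the "all three layers" case above. With GPS_AUTO_CONFIG left on, the firmware may still rewrite message rates afterwards.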

Hopefully this is helpful or at least reinforces what CubePilot has already been seeing.

@sidbh Any news that you can share, or any comments about the testing mentioned in the last post?

@wolke Any further issues? I have not seen anything else happen on mine in quite a few flights since doing this stuff I mentioned.

Hello,
I’m back to testing after being out sick for two weeks. We can confirm that all GNSS systems using the fix (factory reset) mentioned earlier in the thread no longer exhibit the bug during operation. Applying the factory reset is faster than manually changing the specific parameter that gets altered during the reset. As a company that would like to stick with the Here4 GNSS for various reasons, it would be helpful if CubePilot could comment on this—specifically, whether there was a particular reason for setting the parameter that way, or, looking at it another way, whether there are any known side effects to reverting to the default u-blox firmware parameters. We could then finally get back in the air!

/g

wolke

We are also facing this issue, so it would be great if CubePilot could chime in with a possible fix. It’s great to see someone figure out what the underlying issue is.