Here4 GPS Loss But Mag Healthy

Hi guys,

I have been flying a custom drone with 2 Here4 units on it for quite a few months now. Until recently, I have not had any issues with it. Today, I had a scary experience when I was flying and there was a sudden GPS loss (seemingly on both GPS units at the same time).

I am using ArduCopter firmware (4.4.4).

What happened was that the EKF failsafe triggered and the drone went into land mode.

I am looking at the logs and see that at the time of the failsafe, and right before it, there is a sudden absence of GPS data in the plots. For example, when looking at the satellite count, you can see there is a gap in the data (nothing logged):

Then, in the messages tab, I see all these weird things:

During this GPS loss, I still have MAG data showing up, which tells me that the Here4 itself is not shutting down.

Power quality also looks good to me:

Any idea what could cause this loss of GPS? Please help me get to the bottom of this if you can.

@sidbh I know that you have answered some technical questions about the Here GPS before.

Would you be able to help me understand what conditions could cause the messages that were received after this issue happened? Does it seem like the GPS was reset or something mid flight?

Thanks

@Jesus_Cervantes can you share the log? Also what firmware version are you running on Here4?

Hi @sidbh yes, please see here.

I will have to check tomorrow about the firmware version on the Here4 but I think it’s most up to date.

Please let me know if you have any ideas about what these errors could be caused by.

Thanks.

Hi @sidbh

Here is the firmware version info from the Here4:

It is 1.13.BF5E50D8

I have done a very detailed inspection of all the mechanical parts and wiring, but found no clues about the cause. Do you know what those error codes coming from the GPS mean? Is there any guide to interpreting them?

@Jesus_Cervantes I had a look at the log, and it definitely tells quite a wild story. This is what I got from the log:

  • Both Ublox modules went down for ~10s at the same time, and both came back at almost the same time as well. The bitmask you see there indicates that it’s reconfiguring the Ublox module to send the correct messages, and it pretty much starts from scratch. For your reference: ardupilot/libraries/AP_GPS/AP_GPS_UBLOX.h at master · ArduPilot/ardupilot · GitHub

  • Voltage levels on Vcc are healthy. Can you confirm that the Here4s are powered through the same 5V rail as the Cube?

  • Magnetometer data was being sent properly, meaning the MCU and RM3100 power supplies, which power the Ublox as well, were most likely OK.

The likelihood of this being caused by a power issue is very, very low. So I have the following theories as to the cause:

  • If you are sending GNSS RTK corrections, maybe some spurious packet made it through and commanded both units’ Ublox modules to hang up.
  • There was some external factor that made the units go bad. I might be going out on a limb here, but maybe an anti-drone device.

Both units doing almost exactly the same thing at the same time is the unusual bit. Is this happening frequently, or has it happened only once? If it were hardware related or even power related, I would have expected some level of staggered response. But their response seems to be almost in lockstep. That tells me the likely cause is a common external factor like the ones above.

@sidbh I appreciate you taking a look at the log and analyzing it.

Regarding your question: yes, the Here4s are powered through the same 5V as the Cube. The drone, I believe, uses the PSM from CubePilot, and the GPS modules are powered from one of the protected outputs of the PSM.

I don’t think that there was anything happening with RTK corrections at the time. It’s true that I have used RTK before, but I was not using it on that day. However, I’m not sure whether Mission Planner would send anything like that by mistake if RTK NTRIP had been previously configured (although not connected).

This event happened only once, but I have not flown again since, because I wanted to investigate everything before trying to fly again.

Since then I have checked everything I possibly can to look for mechanical or hardware issues, with no luck.

Then I started to do experiments and actually I was able to artificially reproduce the same sort of problem. First of all, I added a connection from the 5V GPS power to the ADC port of the Cube and now I’m logging the ADC value.

I intentionally did a very brief short circuit between 5V and GND on the GPS line. This was literally just touching as quickly as possible. To my surprise, the log showed that the GPS itself reset and showed the same symptoms in the log. The mag had a little change but still kept running. The POWR flag to show overcurrent was sometimes activated during my tests, but there was a time when I did the short circuit fast enough that it didn’t trigger (see below):

Since then I decided to power each GPS separately (second GPS using its own UBEC and not coming through PSM), and nervously I will go out and do more test flights.

I wish I could prove it was due to the reasons you listed as possibilities, because then I would feel better about the hardware.

Let me know if you have any other thoughts or if you think you have seen anything similar happen before.

Thank you.

@Jesus_Cervantes Even the Cube has support for a backup power supply. I do recommend that you use two isolated power supplies to power the Cube as well. And yes, it’s definitely a good idea to isolate the power supply going to the second Here4. That should make for a more robust setup.

Hi @sidbh I guess I explained poorly, but I do have a backup power supply going into the Cube carrier board, and both power supplies go through the PSM on the cube carrier board.

I flew another test for an hour today and could not reproduce anything.

Mostly I am very curious to know if this was an isolated random error with the Here4 or if it’s still likely there was a setup problem.

I guess the only answer might be more hours of flight testing to see if it ever comes back.

OK, that’s good. Then the changes you made should be fine; giving a separate power supply to the backup GPS unit is definitely a good idea.

Thanks, I wish I had more clarity about the error and what caused it, but I guess all I can do is more flight testing now if there’s nothing obvious. I’ll post again if I reproduce it.

@sidbh One more thing - do you have any idea what the power or current usage of the Here4 is while in operation? I can’t seem to find any info about this.

In another user’s log that has this same issue, we noticed the issue happened just at the time that a 32-bit microsecond value would wrap. It isn’t quite at that time for this log, but perhaps you did a pre-flight reboot, which would have reset the clock on the flight controller but not on the Here4.
Do you have the tlog for this flight so we can look at the possibility of a pre-flight reboot?

@Jesus_Cervantes The issue had nothing to do with hardware; it was a software bug related to the wrap of a microsecond time value stored in a 32-bit-wide variable. This has now been fixed, and a new release will be made.
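
For readers unfamiliar with this class of bug, here is a minimal Python sketch of how a 32-bit microsecond counter wraps and why a naive timestamp comparison can misbehave around the wrap. This is an illustration only, not the Here4 firmware code or its actual fix (neither is shown in this thread); the function names and the 1-second timeout are hypothetical.

```python
U32_MASK = 0xFFFFFFFF  # a 32-bit unsigned counter holds 0 .. 2**32 - 1


def naive_timed_out(now_us: int, last_us: int, timeout_us: int) -> bool:
    """Buggy pattern (hypothetical): assumes the counter only ever increases."""
    return now_us > last_us + timeout_us


def wrap_safe_elapsed_us(now_us: int, last_us: int) -> int:
    """Elapsed microseconds via modulo-2**32 subtraction.

    Correct as long as the true gap is shorter than 2**32 microseconds.
    """
    return (now_us - last_us) & U32_MASK


def wrap_safe_timed_out(now_us: int, last_us: int, timeout_us: int) -> bool:
    return wrap_safe_elapsed_us(now_us, last_us) > timeout_us


# Last message arrived 0.5 s before the counter wrapped; "now" is 2 s later,
# so the 32-bit counter has wrapped and reads a small value again.
last_us = (2**32 - 500_000) & U32_MASK
now_us = (last_us + 2_000_000) & U32_MASK

print(naive_timed_out(now_us, last_us, 1_000_000))      # False: the 2 s gap is missed
print(wrap_safe_timed_out(now_us, last_us, 1_000_000))  # True: the 2 s gap is detected
```

The masked subtraction mimics unsigned 32-bit arithmetic in C, where `now - last` remains correct across a single wrap.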

I’m not sure if I have those telemetry logs but I will check to see. It’s quite possible I did a pre-flight reboot because I do this often for one reason or another, but I just can’t remember 100% in this case. So if I wanted to recreate this without updating firmware, how long would I need to be running before I would expect it?

Do you think it’s worth me testing again without changing firmware to prove the exact same thing happens? Or should I just update the firmware and hope it never happens again? When do you think it would be released?

Hey Andrew,

I did find a tlog in my Mission Planner folders. I was using Mission Planner at the time to look at telem data.

Unfortunately the tlog had data from multiple different flights, but I saw that the very last flight in this log is the one with the issue. Specifically, the issue happens around the 96.16% time mark.

Any chance you can take a look and see if your theory could be correct? I’m not much of an expert on tlog analysis.

The tlog only covers 15 minutes, so we can’t use it to confirm, but @sidbh has got a reliable reproduction now and a fix, so we’re pretty confident of the issue.
I think you can expect a new firmware release for the Here4 with a fix soon.

Hi Andrew,

Thanks for the reply. I actually was able to reproduce this here on the table today. I set it to log while disarmed and let it run for about 71.6 minutes, and the same thing happened. Interestingly (of course I have no idea what’s actually going on in the firmware), it did not have another rollover after an additional 71.6 minutes. I would have thought it would roll over again. I left it connected to power and logging for about 2.5 to 3 hours.

But yes, I assume the firmware release will solve this. It has given me some head scratching looking for other nonexistent power issues.
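
The 71.6-minute figure matches the arithmetic for a 32-bit counter incremented once per microsecond. A short sketch of that calculation follows; the function name and its parameters are only for illustration.

```python
def wrap_period_minutes(bits: int = 32, ticks_per_second: int = 1_000_000) -> float:
    """Minutes until an unsigned counter of `bits` width wraps back to zero."""
    seconds = 2**bits / ticks_per_second
    return seconds / 60


print(f"{wrap_period_minutes():.2f} min")  # 71.58 min
```

The counter itself wraps every 2^32 µs (about 4295 s). Whether the fault shows up on every wrap depends on firmware details that this thread does not establish.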
