Communication issue

mroch
Posts: 17
Joined: Wed Sep 02, 2015 11:26 pm

Post by mroch » Tue Nov 10, 2015 1:53 pm

I've been having problems keeping my GEM "online". As far as I can tell, it stays connected to the network, but has a hiccup while sending packets when in real-time mode. I tried using SEG, but since I couldn't see what was happening on their end, I tried switching to btmon. Now I have what I think is the same problem.

So, btmon is listening on port 2000. The GEM is in client mode (configured through its application settings page on port 80) and set to send real-time packets every 10 seconds.

I captured a pcap, and can see conversations looking like:

GEM: SYN
btmon: SYN, ACK
GEM: ACK
GEM: PSH, ACK
btmon: ACK
btmon: FIN, ACK
GEM: ACK

every 10 seconds for exactly an hour.

On the 601st connection, only the first 3 (SYN/SYN-ACK/ACK) take place, and it seems that the GEM never sends its packet after establishing the connection. btmon blocks indefinitely on recv().

To fix the blocking, I added a timeout to btmon. After the timeout, the GEM makes a new connection and starts sending data again, so the GEM's obviously not dead. But it does seem to hiccup reliably.
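Here is a minimal Python sketch of the kind of receive timeout described above. It assumes a plain TCP listener on port 2000 that reads one chunk per connection. It is not btmon's actual code, and the names `read_one_packet` and `serve` and the 30-second timeout are hypothetical. The point it illustrates is that `socket.settimeout()` turns a silent connection into a `socket.timeout` exception instead of an indefinite block, so the listener can drop that connection and return to `accept()`.

```python
import socket
from typing import Callable, Optional

# Hypothetical value: comfortably longer than the GEM's 10-second send interval.
RECV_TIMEOUT_S = 30.0


def read_one_packet(conn: socket.socket, bufsize: int = 4096,
                    timeout: float = RECV_TIMEOUT_S) -> Optional[bytes]:
    """Return the next chunk from conn, or None if nothing arrives within timeout.

    An empty bytes object means the peer closed the connection.
    """
    conn.settimeout(timeout)
    try:
        return conn.recv(bufsize)
    except socket.timeout:
        return None


def serve(port: int = 2000, handle: Callable[[bytes], None] = print) -> None:
    """Accept one connection at a time and read a single chunk from each."""
    with socket.create_server(("", port)) as srv:
        while True:
            conn, _addr = srv.accept()
            with conn:  # leaving this block closes conn, which sends our FIN
                data = read_one_packet(conn)
                if data is None:
                    # The peer completed the handshake but never sent anything:
                    # drop this connection instead of blocking forever.
                    continue
                if data:
                    handle(data)
```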

Any idea what could be going on here?

Re: Communication issue

Post by mroch » Tue Nov 10, 2015 1:57 pm

Attaching the pcap...

Note that because btmon isn't designed to recover from a timeout, it doesn't close the next connection and keeps reading data for exactly 5 minutes before the GEM closes the connection itself. Obviously this is better than reconnecting every time, so I'll fix btmon to keep the connection open once I figure out what's up here.
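Keeping the connection open, as planned above, means reading in a loop until the GEM closes its end. The sketch below works under the same assumptions (hypothetical names, not btmon's code). `recv()` returning `b""` signals an orderly close, `ConnectionResetError` covers an abrupt reset, and a timeout ends a connection that has gone silent. TCP is a byte stream, so each yielded chunk is not guaranteed to be exactly one GEM packet.

```python
import socket
from typing import Iterator


def packets(conn: socket.socket, idle_timeout: float = 30.0,
            bufsize: int = 4096) -> Iterator[bytes]:
    """Yield raw chunks from one long-lived connection.

    Stops when the peer closes the connection, resets it, or stays silent
    for longer than idle_timeout seconds. Chunks are not packet boundaries:
    TCP may split or merge what the sender wrote.
    """
    conn.settimeout(idle_timeout)
    while True:
        try:
            chunk = conn.recv(bufsize)
        except socket.timeout:
            return  # peer went silent: give up on this connection
        except ConnectionResetError:
            # Abrupt reset; on Windows this is "[WinError 10054] An existing
            # connection was forcibly closed by the remote host".
            return
        if not chunk:
            return  # orderly close (FIN) from the peer
        yield chunk
```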

After a couple of those 5 min long conversations, btmon starts sending spurious retransmissions and everything goes to hell. I'm going to ignore that for now, though it's suspicious.
Attachment: btmon-bad.pcap.zip (10.92 KiB)
ben
Site Admin
Posts: 4266
Joined: Fri Jun 04, 2010 9:39 am

Re: Communication issue

Post by ben » Tue Nov 10, 2015 1:58 pm

You can verify the GEM is still sending packets while this issue persists by looking at the Sys LED on the GEM. If it flashes red periodically, it's at least trying to send the packet out.

Have you tried COM1 Auto Reset on the GEM? That'll reset the GEM's WiFi/Ethernet module at a regular interval to make sure there aren't any issues.

I'm not too sure what exactly causes issues like this; it could be some sort of buffer issue with the WiFi/Ethernet module.
Ben
Brultech Research Inc.
E: ben(at)brultech.com

Re: Communication issue

Post by mroch » Sat Nov 14, 2015 8:33 pm

Interesting! I had COM1 Auto Reset set to 250. Then I set it to 200, because that's less than an hour and I was seeing issues after about 60 minutes. 200 * 16 sec = 53 minutes, and I started seeing issues after 51 minutes... so I changed Auto Reset to 20 (about 5 minutes), and it repro'd!
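The reset-period arithmetic above can be spelled out as follows. The 16-seconds-per-unit figure is taken from the post itself and has not been checked against GEM documentation; the same formula gives about 67 minutes for the original setting of 250.

```python
# Assumption taken from the post above (not from GEM documentation):
# each unit of the Auto Reset setting is 16 seconds.
SECONDS_PER_UNIT = 16


def reset_period_minutes(setting: int) -> float:
    """Approximate time between COM1 Auto Resets for a given setting."""
    return setting * SECONDS_PER_UNIT / 60


for setting in (250, 200, 20):
    print(f"{setting}: {reset_period_minutes(setting):.1f} min")
# 250: 66.7 min
# 200: 53.3 min
# 20: 5.3 min
```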

So, COM1 Auto Reset must be resetting the module in the middle of a transmission. btmon is definitely not configured to recover from that.

Current plan is to disable Auto Reset and see if it stays stable. If I do need to turn Auto Reset back on, I'll post a patch to make btmon use a timeout and then properly recover from it. Probably a good idea to fix btmon anyway... maybe I'll do that.

Re: Communication issue

Post by ben » Mon Nov 16, 2015 10:42 am

mroch wrote: Current plan is to disable Auto Reset and see if it stays stable. If I do need to turn Auto Reset back on, I'll post a patch to make btmon use a timeout and then properly recover from it. Probably a good idea to fix btmon anyway... maybe I'll do that.
Yeah, fixing it might be the better idea as there may be other cases in which this would happen (losing network connection mid-packet comes to mind).
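At the socket level, losing the network mid-packet shows up as the peer closing or resetting after only part of a packet has arrived. If the packet length is known in advance, a helper like the hypothetical `recv_exactly` below can distinguish a clean close between packets from a truncated one. The fixed packet length is an assumption for illustration. Check the GEM packet format documentation for the actual size of the format in use.

```python
import socket


class PartialPacketError(ConnectionError):
    """The connection ended after part, but not all, of a packet arrived."""


def recv_exactly(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from conn.

    Raises EOFError if the peer closes before sending anything, and
    PartialPacketError if it closes mid-packet. Combine with
    conn.settimeout() so a stalled sender raises socket.timeout instead.
    """
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            if buf:
                raise PartialPacketError(
                    f"received {len(buf)} of {n} bytes before the peer closed")
            raise EOFError("peer closed before sending a packet")
        buf.extend(chunk)
    return bytes(buf)
```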
Grimshad
Posts: 4
Joined: Tue May 19, 2015 12:10 pm

Re: Communication issue

Post by Grimshad » Mon Dec 07, 2015 1:38 pm

mroch,

Was this issue ever resolved? I'm running my own script to handle this and send data to emoncms, and I seem to be having a very similar issue. I set up btmon and see the same issue there. Any chance I can see how you fixed it? I'm going to be sending data to multiple locations now, so I'm switching to btmon instead of redoing it all myself. I also need to set up btmon to send an e-mail when it stops receiving packets and again when it starts receiving them.
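One way to get the stop/resume e-mails is to record when the last packet arrived and send a notification only when the state changes. The sketch below uses hypothetical names (`PacketWatchdog`, `notify`) and is not part of btmon. In practice `notify` could wrap something like `smtplib.SMTP(...).send_message(...)`. Here it is just a callback, and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class PacketWatchdog:
    """Notify once when packets stop arriving and once when they resume."""

    def __init__(self, stale_after: float, notify: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stale_after = stale_after  # seconds of silence before alerting
        self.notify = notify            # e.g. a function that sends an e-mail
        self.clock = clock
        self.last_packet = clock()
        self.stale = False

    def packet_received(self) -> None:
        """Call whenever a packet arrives."""
        self.last_packet = self.clock()
        if self.stale:
            self.stale = False
            self.notify("packets are arriving again")

    def check(self) -> None:
        """Call periodically, e.g. after each accept() or recv() timeout."""
        silent_for = self.clock() - self.last_packet
        if not self.stale and silent_for > self.stale_after:
            self.stale = True
            self.notify(f"no packets for {silent_for:.0f} s")
```

A timeout on `accept()` or `recv()` is a natural place to call `check()`, so the alert still fires when no connection is open at all.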

Are you using signals to accomplish this? I ask because I'm running on a Windows box. It would be great if you could paste your recv() code.
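On the signals question: Python's `signal.alarm()` is only available on Unix, so an alarm-based recv() timeout won't work on Windows. Socket-level timeouts do work there, either via `socket.settimeout()` or via `select.select()`, which supports sockets on both Windows and Unix. The sketch below uses `select()`. The function name and its return convention are hypothetical, not btmon's.

```python
import select
import socket
from typing import Optional


def recv_with_select(conn: socket.socket, timeout: float,
                     bufsize: int = 4096) -> Optional[bytes]:
    """Wait up to timeout seconds for data, without using signals.

    Returns None on timeout, b"" if the peer closed the connection,
    otherwise the bytes received.
    """
    readable, _, _ = select.select([conn], [], [], timeout)
    if not readable:
        return None
    return conn.recv(bufsize)
```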

Thanks.

Re: Communication issue

Post by mroch » Tue Dec 08, 2015 4:02 am

I haven't fixed btmon yet, but turning off the GEM's Auto Reset worked around it, and has been stable for weeks now. When I fix it, I'll definitely post the patch.
sub3marathonman
Posts: 95
Joined: Fri Feb 11, 2011 9:32 am

Re: Communication issue

Post by sub3marathonman » Wed Dec 09, 2015 1:06 pm

I have had the GEM (WiFi only) working with btmon and emoncms (on my home computer, not posting to their website) for many months now. It's not a finished setup, but it's enough to get some basic information, and I have never had a problem with the GEM going offline. On occasion it has reported that the connection was forcibly closed by the remote host (I think when I'm entering the GEM setup mode), but it has always gone back to normal.

My GEM is set to Auto Reset Timer Disabled for both COM1 and COM2.

Could somebody explain to me (I have limited to nonexistent knowledge in this area) why this feature should ever be turned on? Is it something you'd need if you're at the edge of the GEM's broadcast range?

Re: Communication issue

Post by Grimshad » Thu Dec 10, 2015 3:33 pm

sub3marathonman wrote: Could somebody explain to me (I have limited to nonexistent knowledge in this area) why this feature should ever be turned on? Is it something you'd need if you're at the edge of the GEM's broadcast range?
Well, I'm not sure about my level of experience. However, in previous versions of the GEM firmware, the WiFi would somehow break. To fix that automatically, I set it to restart every x minutes so that I lost as little data as possible. Other than that, I think it's just there because it can be, and it's not really that important, hence it being listed under Advanced. Just my guess, though.
mroch wrote: I haven't fixed btmon yet, but turning off the GEM's Auto Reset worked around it, and has been stable for weeks now. When I fix it, I'll definitely post the patch.
I disabled Auto Reset by setting it to 0 for all the COM ports, but the same thing is still happening. The GEM's light blinks red at each interval, and I can see the packets being sent, but btmon still fails.

Re: Communication issue

Post by ben » Thu Dec 10, 2015 4:26 pm

Grimshad wrote: Well, I'm not sure about my level of experience. However, in previous versions of the GEM firmware, the WiFi would somehow break. To fix that automatically, I set it to restart every x minutes so that I lost as little data as possible. Other than that, I think it's just there because it can be, and it's not really that important, hence it being listed under Advanced. Just my guess, though.
Pretty much this; it's there in case any issues arise with the communication modules that can't be fixed via GEM firmware.