Hi fellow Insteon hackers! Apologies for writing a novella here, but I am hoping to get some input on what I believe is very odd behavior.
Quick intro: I am a ~15 year Insteon user and hacker at my own home. I've been developing a PHP-based controller application called Footprint which I hope to make public at some point in the future. My installation at my rural home consists of ~100 active Insteon devices and ~40 Insteon groups/scenes. My application uses a 2413S PLM to interface with the devices. Overall, I'd describe my setup as mature and working very well. I consider myself to have an advanced understanding of the PLM and the Insteon hardware, including ALDB commands and concepts. My application manages all links and scenes programmatically. And almost all of my devices today are I2CS but my application can programmatically manage I1 and I2 style link databases accurately and efficiently. With all the device management and scene links, my PLM currently hosts ~500 total links for the installation.
Last year, we finished a remodel where we added a new master bedroom to the house, which involved installing a half dozen new 2477D and 2334-222 devices along with some new scenes (e.g. SwitchLinc Dimmers linked to Keypad buttons). Since adding these latest devices and scenes, I have been chasing a very weird problem where the "Master Bed Closet" scene (group 19) intermittently responds to a group 22 ON command sent by the PLM. It only happens about once every other day, but unfortunately when it does, it's early in the morning and right next to my wife's bedside -- not good in the WAF department! The issue tends to show up in the morning because the "Ceiling Fans" scene (group 22) is triggered by our thermostat calling for heat, which usually happens in the morning hours. I should add that group 22 cycles on at least a dozen times a day for the same reason.
Here are the 2 most recent examples of the devices belonging to Insteon group 19 (0x13) responding to an All-Link ON command sent to Insteon group 22:
Dec 26 05:19:00 <user.info> lion php: [footprint] transportInsteon: Tx ALL-Link command 0X11|0X00 to group 22 (0261161100) (ACK, 125 ms)
Dec 26 05:19:02 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Entry (ACK, 1/4 hops, 1941 ms)
Dec 26 05:19:03 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Closet (ACK, 1/4 hops, 1949 ms)
Dec 28 04:27:00 <user.info> lion php: [footprint] transportInsteon: Tx ALL-Link command 0X11|0X00 to group 22 (0261161100) (ACK, 125 ms)
Dec 28 04:27:01 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Master Bedroom - Ceiling Fan (ACK, 1/4 hops, 1271 ms)
Dec 28 04:27:02 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Bedroom - Ceiling Fan (ACK, 1/3 hops, 884 ms)
Dec 28 04:27:03 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Office - Ceiling Fan (ACK, 1/3 hops, 2004 ms)
Dec 28 04:27:04 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Living - Ceiling Fan (ACK, 1/3 hops, 2009 ms)
Dec 28 04:27:05 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Entry (ACK, 1/4 hops, 3100 ms)
Dec 28 04:27:07 <user.info> lion php: [footprint] transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Closet (ACK, 1/4 hops, 3110 ms)
In the first example, none of the actual members of group 22 sent a cleanup response to the command; only the two group 19 devices did. In the second example, a mixture of the intended and unintended recipients responded. I'm not sure whether this means anything.
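For anyone who wants to reproduce that breakdown from the raw log, here is a minimal sketch in Python (not part of Footprint, which is PHP). It assumes the log format shown above and, as the excerpts suggest, that the second byte of each cleanup message (e.g. 0X13) carries the group number the device believes it is answering for. The names `tally_cleanups` and `SAMPLE` are made up for this illustration.

```python
import re
from collections import Counter

# Patterns matched against the log format shown in the excerpts above.
TX = re.compile(r"Tx ALL-Link command 0X[0-9A-F]{2}\|0X[0-9A-F]{2} to group (\d+)")
RX = re.compile(r"Rx Standard Group Cleanup Direct Message 0X[0-9A-F]{2}\|0X([0-9A-F]{2}) from")

def tally_cleanups(lines):
    """For each Tx line, count the cleanup responses that follow it, keyed by
    the group number reported in the response (assumed to be the cmd2 byte)."""
    results = []
    for line in lines:
        tx = TX.search(line)
        if tx:
            results.append((int(tx.group(1)), Counter()))
            continue
        rx = RX.search(line)
        if rx and results:
            results[-1][1][int(rx.group(1), 16)] += 1
    return results

# Shortened copies of the log lines above (syslog prefix trimmed).
SAMPLE = [
    "transportInsteon: Tx ALL-Link command 0X11|0X00 to group 22 (0261161100) (ACK, 125 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Entry (ACK, 1/4 hops, 1941 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Closet (ACK, 1/4 hops, 1949 ms)",
    "transportInsteon: Tx ALL-Link command 0X11|0X00 to group 22 (0261161100) (ACK, 125 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Master Bedroom - Ceiling Fan (ACK, 1/4 hops, 1271 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Bedroom - Ceiling Fan (ACK, 1/3 hops, 884 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Office - Ceiling Fan (ACK, 1/3 hops, 2004 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X16 from Living - Ceiling Fan (ACK, 1/3 hops, 2009 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Entry (ACK, 1/4 hops, 3100 ms)",
    "transportInsteon: Rx Standard Group Cleanup Direct Message 0X11|0X13 from Master Bedroom - Closet (ACK, 1/4 hops, 3110 ms)",
]

for sent_group, responses in tally_cleanups(SAMPLE):
    stray = {g: n for g, n in responses.items() if g != sent_group}
    print(f"sent to group {sent_group}: responses {dict(responses)}, stray {stray}")
```

Run over a longer log, the same tally would show how often group 19 answers a group 22 command and whether the stray responses ever come from any other group.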
As you fellow hackers might understand, when faced with this challenge I threw all assumptions of stability to the wind and embraced it as an opportunity to find a bug in my own code. I've completed an exhaustive audit of all levels and layers in my application. I looked for erroneous ALDB records, missing bin2hex/hex2bin translations, PLM communication timing issues, etc. and have come up absolutely empty-handed. I've used HouseLinc2 to independently check the links of both the PLM and the devices, thinking something was missing. I've done physical factory resets on the devices and reprogrammed them. While I have found and fixed a handful of interesting bugs in my code, the situation remains unchanged and I have not been able to find anything responsible for this behavior. What I have absolutely confirmed at this point is that my PLM is sending one and only one 0261161100 command at the time these group 19 members choose to intermittently respond. So I am reaching out to the community here in hopes that someone can either (a) share similar experiences that turned out to be hardware related or (b) help me turn over a stone I have missed.
To recap, here are the variables as I see them:
1) The PLM and device ALDBs maintain 2 independent groups (22, 19) and the correct controller and responder links are in the right places.
2) The PLM receives a 0261161100 from the controller application.
3) Group 22 devices reliably respond to the command (e.g. >99% of the time).
4) Group 19 devices intermittently respond to the same command (e.g. <10% of the time).
5) This "interplay" does not occur anywhere else on the property between any other devices or groups.
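To make item 2 concrete, here is a small Python sketch that decodes the 0261161100 string. It assumes the PLM's Send ALL-Link Command framing (0x02 start byte, 0x61 command, then group, cmd1, cmd2), and `decode_all_link_command` is a name made up for this post, not something from Footprint.

```python
def decode_all_link_command(hex_frame):
    """Split a PLM 'Send ALL-Link Command' frame (02 61 <group> <cmd1> <cmd2>)
    into its fields. Raises ValueError for anything that is not such a frame."""
    raw = bytes.fromhex(hex_frame)
    if len(raw) != 5 or raw[0] != 0x02 or raw[1] != 0x61:
        raise ValueError(f"not a Send ALL-Link Command frame: {hex_frame}")
    return {"group": raw[2], "cmd1": raw[3], "cmd2": raw[4]}

frame = decode_all_link_command("0261161100")
print(frame)  # group byte 0x16 is decimal 22; cmd1 0x11 is ON
```

The group byte in the frame is 0x16 (22), not 0x13 (19), so whatever makes the group 19 devices react is not the group number the application hands to the PLM.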
Thank you in advance for your time and assistance here...
Best,
Daniel