
Some bugs ..



For Devs,

Found some more bugs in the XR500. I will update this thread with more detailed descriptions in time.

The bugs have to do with sticky offline/online clients in Device Manager, port numbers, extenders in AP mode with ethernet backhaul or Extender mode with wifi backhaul causing havoc on qos/dpi while roaming.

Here is a quick one, but I need someone else to check whether it's occurring only on my router:

PORT NUMBERS

Look at Settings, Monitoring, Statistics. Note that the Netgear settings list connection status per port number (LAN 1, 2, 3, 4). To see the discrepancy you need fewer than 4 LAN ports connected. For example, say it reports LAN 1 and LAN 2 as connected, at whatever speed and for however long (I have two switches connected to the router, one on LAN 1 and one on LAN 2).

Now look at Table View in Device Manager to compare:  the port #s are reverse ordered - so if LAN 3 and LAN 4 are connected in Netgear settings,  they are shown as Port 1 and 2 in table view.
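
If the two views really are mirror images of each other (that is only my assumption from what I see on my own unit, nothing Netgear or Netduma has confirmed), the relationship would look roughly like this. `duma_port_for` is a made-up helper name, purely for illustration:

```python
def duma_port_for(netgear_lan: int, total_ports: int = 4) -> int:
    """Map a Netgear 'LAN n' number to the port number Device Manager would
    show, ASSUMING the two numberings are simply reversed."""
    if not 1 <= netgear_lan <= total_ports:
        raise ValueError(f"LAN port must be between 1 and {total_ports}")
    return total_ports + 1 - netgear_lan


# Print the assumed mapping for a 4-port router.
for lan in range(1, 5):
    print(f"Netgear LAN {lan} -> Device Manager Port {duma_port_for(lan)}")
```

If your router shows a mapping that does not fit this pattern, that would be useful to know too.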

Anyone else see this?

Will update later with details on the other bugs and what I've noticed with merging virtual MACs. duma is not clearing entries properly: even though the arp table and swconfig switch0 on the router show the IP in use, duma shows the device as offline, yet still reports an IP attached to the device when you click on it. The side effect with qos/dpi is that active packet/port connections get killed if this table is hosed. Sometimes I've noticed devices with an all-zero MAC address (00:00:00:00:00:00) on the br0 interface. And for sure, if a device is being reported offline in Device Manager but still has an IP assigned and is live, weird sh*t happens when trying to use tcp/udp ported traffic. Def a bug in Device Manager still (the port order may or may not have anything to do with it, but something is losing track of devices in the latest firmware). A 100% workaround is to disable QOS so traffic can flow without being killed, but this is not the solution. It only kills unknown traffic (if the database is hosed).

more later

 


  • Administrators

This is probably related to a visual port discrepancy on the router; some had the ports displayed backwards. So likely Netgear's settings page is still operating on that assumption while we are working on the correct understanding. I'll pass your topic on to the dev team.


Hi Fraser,

As far as the port #s, yes, I know about that issue from way back and I'm pretty sure my router has the misprint (it doesn't matter to me),
but this is on the software side, not the outside plastic shell.
I thought that issue was only that the LEDs didn't match the port numbering on the back?
I agree you have to get the numbering straight with Netgear as to which is truly which. As far as I can tell on mine (misprinted), when I plug into port 1 on LAN,
the port eth4 LED lights up on the router front, but it's shown as port 1 in Netgear Settings, Monitoring, Statistics (software side matches the actual port # plugged into, LAN 1).

In Device Manager the port # shows as port 4 (table view), same as the router LED. Just a cosmetic misprint on the back of the router (yes), but worth a look in case it affects
the internal functioning of Device Manager. Not sure why the Netgear software side thinks it's port 1 if it's supposed to be port 4, unless this is deeper than just a printing issue.
I wonder if Netgear pushed a fix to renumber ports on the software side in Statistics based on router serial number, but duma isn't aware of it and is using the correct numbering
while Netgear Statistics is using a renumbered one, which may or may not be correct now. I'm wondering :)
I know you can request a replacement, but this one has been doing too well to take a chance on a refurb. Unless they can guarantee me a sealed unused box,
I'm not doing it. And if I do get one that matches, I'm not sure that would fix the issue on the software side, as everyone who has a correctly numbered router
will also have the opposite situation in Statistics from what I am describing, no matter what is printed on the router, no?

ok onto APs:

I wrote this real quick, may have some mistakes, but you will get the general idea of where to start looking and thinking about any solutions if needed..
apologize for being so long and all over the place, but I really can't explain this unless you know how it works , hopefully the devs understand what I'm trying to get across


APs AND EXTENDERs

I have 3 extenders on this network 2 EX8000, 1 EX2700 (old, explained below)

Tested with 2 EX8000s
First test - One in AP mode, one in Extender mode.
The EX8000s are each connected to their own switch, which in turn is connected to an XR500 LAN port (also tested direct connections, both in AP mode, no difference)
so AP->switch-> LAN 1, Extender->switch-> LAN 2 (which happens to be 3 & 4 duma OS side, 1 & 2 netgear software side, nothing to do with printing on the actual router)
only reason why I have them on their own switch is they are managed switches so I can set priority over the other devices
..logically you'd think direct run to XR would be better (in some scenarios it is, 1 less hop) but since you can't control flow other devices can jump ahead of important packets
I want to route APs/extenders first and I trust the switches more than the built in lan switch0 or priority qos on the XR500 to do the right thing as it seems buggy

All Netgear extenders appear to default to 192.168.1.250 ip on boot -they DHCP request/change their ip if they detect another extender using the ip.
If the extender is in AP mode, its network name always initially says hello my name is "-R"
No matter if you set a static IP or not, it will start there with .250 on a reboot
Duma records the IP/MAC address/merged association in its Device Manager database when they are first seen.
It will record the .250 ip to each extender while both are online in addition to their DHCP network IP.
Note this information is one of the key requirements for other duma apps to function properly (geo, prioritization, vpn, qos, etc)
Now if you have a 2nd or 3rd AP/extender that joins the network - they also start at .250
Duma records their mac and merges it with the .250 IP -- then they switch to a unique DHCP ip/static IP you set, but duma holds this .250 association.
It will record the .250 ip on each extender while they are connected/seen.
I also have a simple EX2700 extender running one device that's an old iot 802.11b/g network device that has trouble seeing all the wifi channels,
 it can connect to the EX2700 (older), but not the EX8000s higher channels.
Unfortunately netgear didn't update the EX2700 firmware to be aware of other extenders, as it also requests a .250 ip, but does not appear on the list of multiple extenders
when trying to manage them via http - using the default mywifiext hijack. even if you try to go to one of the AP/extender IPs its hijacked to ask which extender you want
to connect to...but the EX2700 is never seen
additionally, "Connected devices" on the EX8000 has a lot of trouble showing connected devices: multiple refreshes are required, and the information is mangled when it does
appear (<unknown>, wifi devices on its own DHCP ip, etc). It doesn't affect functionality though. I think these extenders still have bugs that Netgear needs to address, maybe you can let them know lol. The list is just a little crazy, but the underlying extender router is working as it should; I watched the packets.

So given this scenario: if one extender is not online (powered off, or rebooting), it's still shown as online, since the other AP/extender also has the same .250 ip association.
duma shows it as online even when it's not.
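
To make the suspected mechanism concrete, here is a tiny sketch. It is only my guess at the logic, not duma's actual code, and every name and address in it is invented for the example. If "online" is decided by whether any IP ever associated with a device belongs to something that is currently live, the shared .250 address makes a powered-off extender look online. Keying the check on the MAC does not have that problem:

```python
# Hypothetical tracker records: both extenders were first seen on .250,
# then moved to their own DHCP addresses. All values are made up.
recorded = {
    "ext-A": {"mac": "aa:aa:aa:aa:aa:01", "ips": {"192.168.1.250", "192.168.1.21"}},
    "ext-B": {"mac": "aa:aa:aa:aa:aa:02", "ips": {"192.168.1.250", "192.168.1.22"}},
}

# Only ext-B is actually powered on right now.
live_macs = {"aa:aa:aa:aa:aa:02"}


def ips_considered_online(records, live):
    """Every IP ever associated with a device whose MAC is currently live."""
    ips = set()
    for dev in records.values():
        if dev["mac"] in live:
            ips |= dev["ips"]
    return ips


def online_by_shared_ip(name):
    """Guessed buggy check: online if ANY remembered IP counts as online."""
    return bool(recorded[name]["ips"] & ips_considered_online(recorded, live_macs))


def online_by_mac(name):
    """Stricter check: online only if this device's own MAC is live."""
    return recorded[name]["mac"] in live_macs


for name in recorded:
    print(name, "by shared IP:", online_by_shared_ip(name), "| by MAC:", online_by_mac(name))
```

With this data ext-A comes out as online under the shared-IP check and offline under the MAC check, which matches what I see in the tree.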

If one is in AP mode (with ethernet backhaul to XR500) it will be shown as Online on the LAN side of the tree. Which is correct.
If it disconnects it will remain as Online even though it definitely is not, because the .250 address that was associated to the other extender is still online.
Its clients appear to pop online and offline normally at first on the LAN side. An extender in AP mode shows its wifi clients on the LAN side; in Extender mode, on the wifi side.
As time goes on.. and devices pick up a different virtual mac over time from a 2nd or 3rd extender, it starts to get hairy.
No idea how duma clears out or re-verifies the associations of last seen IP/MACs merged with named devices, but I don't believe it does, and this information is never removed until the device is deleted ...  
I believe the extenders don't store virtual macs they used after a reboot sometimes (could be a bug in EX8000), but duma does. They definitely won't if they are factory reset.
even after a router reboot XR seems to hold prior IP/mac addr associations, until the device is actually deleted from the tree.
The base network layer on the router does indeed know what IPs are online and what MAC they are using, but dumas tracking of it still has some issues..

Another scenario -- if you reboot the router, an extender will try to connect via wifi backhaul to the other AP extender wifi thats still online,
and now have a double mac recorded for its child wifi clients when router comes back  (its own true mac, virtual mac from the AP, and the virtual mac from the extender if it managed to join a different extender during the router reboot)
duma will then merge that new virtual mac with a named device from the first extender -- before you know it you have multiple devices with multiple macs
listed as wired (via AP) and wireless all rolled into one device name. And it holds this information forever it seems.

You get a visual representation where clients show as online on the wired LAN side even though they may be connected to a wifi extender, or you may have situations where clients
are shown as offline while actually online, because some other device with multiple virtual MAC IDs associated with it holds the same virtual mac.
If you try to delete the offline device, you get an error message that "Cannot delete device as it is online." as duma still thinks that MAC/IP is connected
(but its been given to a different device, or still being reported as active by the extender)

As time passes the list gets more and more distorted as devices stay stuck offline even though they are online, or online with the wrong device name.
This happens only when using it as an Extender vs an AP.  I know you mentioned you tested this with an EX7900, but you need to test it with multiple extenders
and then you will see the mess that happens, get netgear to give you two EX8000's :)
Try one in AP , one in Extender, both in Extender, or both in AP to fully test this and see this occur.

I noticed arpwatch runs every 12 hours, and other checks every hour, or every 3 seconds but it is not comparing all the information gathered efficiently enough ..
(swconfig, arp, wlan statinfo, wlanconfig) -- note the actual router has it right though in all of this, it has to do with the duma apps just losing sync as time passes and the network changes shape.

Next test..

Ok, keep in mind I have not factory reset as I try to fix all the issues manually.
Set both EX8000 to AP mode with ethernet backhauls to XR via managed switches.  
Need to DELETE every device that has multiple virtual MAC addresses or IPs associated with it from prior test in device manager list.
(not an easy task, since if they pop online during the process they can't be deleted because duma says they are online, lol)

Finally once everything is cleaned in Device Manager so each device has only 1 true mac address, and their true DHCP ip.. testing begins with both EX8000 in AP mode..

This seems to be the most stable configuration as almost all devices are online on the LAN side where they should be,
but over time, as devices such as phones and laptops roam onto a different AP's wifi, duma loses track and the device gets stuck in an offline state. Or, if a device shifted to another extender and then was abruptly turned off, br0 holds the ip active but sets the hw addr to all 00s. This seems to clear itself up eventually, but the device will never shift to online again, even when it is online, and even though clicking on it in Device Manager still shows an active dhcp IP address. Device Manager needs to look at the router arp table to see the device is still online, as the router has the current situation correctly handled.

The problem comes now with QOS/DPI.  If the device is listed inaccurately in Device Manager (offline, but is actually online)
you can be sure you are going to have some connectivity issues .. most basic Internet and stuff still works and you may not notice any issue at first.

If a roaming device happens to be misrepresented in this list it starts to affect port based applications on it. I am not 100% sure it's because of this or some
other QOS/DPI bug. It may be a bug in the qos/dpi.

For example, I have some Arlo cameras. They can be accessed from the Internet, but from the internal network they may appear as offline when accessed
from a device that is marked as offline in Device Manager but is actually online.
The packets are either being misrouted from wifi0 to eth0, -> br0.. or the dpi engine is mangling the return path in the headers.
Especially if the device got stuck in an offline state and is actually online. Once QOS is disabled, everything works without issue as the router has it together.
Deep down router core knows whats going on, and its doing the right thing and keeping track of what is online, offline, and what mac is in use.
But duma database has gotten confused, so now qos/dpi does not know how to handle clients that are offline, but actually online (misreported in device manager)

Another example, I have some laptops that connect to SonicWall Netextender VPN. Someone else posted once they were having trouble with their Cisco VPN.
If the device manager goes into this chaotic state these take forever to create the VPN connection,
then once an RDP session is started within seconds the rdp session is closed and the netextender vpn closed as there was a break in the network traffic.  
QOS/DPI has either dumped the packets, misrouted, or mangled them since it isn't aware if the device is online or offline, or which IP is being used and where the packet
is supposed to be returned to.  When it applies qos to lan or wan it applies with bad info.
My guess is it detects unknown or unexpected traffic and just closes the port that the traffic is going out on.
You can test this by connecting to a VPN like express/nord and then try a netextender or other type of vpn connection (or app that isnt working) -- on a device that is giving issues.
it works without an issue because qos/dpi can't inspect the express vpn/nord vpn tunnel. everything works over the vpn tunnel like it should. just like as if disabling qos on the router..

workaround -- If a device is in this misrepresented state, disable QOS and everything works perfectly. As soon as you toggle it back on, same behavior. As time passes (maybe 12 hours, arpwatch?) you can start qos again, and the device will function normally until the chaotic state of the duma database occurs again.
So the router itself knows what is and isn't connected and who they are, but duma qos/dpi (with corrupted information) just decides it doesn't like a packet and drops it
causing vpn apps or other apps that rely on high port #s to maintain connection to be unstable and drop, or just do weird things

Since duma doesn't clear out these associations efficiently QOS maps all these devices based on this bad information, so it can act unexpectedly at times.
Disabling QOS throws a TON of (NOP) to clear out the subclasses it created - I have a feeling it has something to do with the bad merging of the IPs,
or just plain old losing track of true online/offline status of a device.

Device manager needs to be rechecked and made more robust especially if other services are relying on this information to run right ..
it needs to be tested with more than one extender in AP mode. If you can get it to work with 2 flawlessly, hopefully it works with 200 :)

Keep in mind the router itself is doing the right thing, I checked and it is aware of whats going on, and everytime duma shows this behavior I check the routers tables
and they are correct, yet duma still shows the device offline with the correct DHCP ip address assigned.  dhcp.leases is correct, arp is correct, swconfig dev switch0 show is correct.
just duma loses sync between device manager and qos/dpi internal databases.

TLDR;
device manager bugs
qos/dpi bugs
test with 2 or more EX8000/7900 in AP mode and Extender mode and alternate
Need some way to clear or flush this if you get a bad or sticky device - or a different method to clear a stuck device besides rebooting like a FORCE DELETE even if online.
check port #s software side netgear statistics n duma device manager table view

 


yes, lol. sorry!

something is definitely going on since this firmware --  i see more and more posts of people with sticky devices that are shown as offline, but still have an ip.

i have one right now on the list that appeared (a smart tv) its wifi.. its for sure on and working, but on the offline list in device manager. clicking on it shows its current mac & ip.

it could be the tv initially joined the router's native wifi0,1, but then found a better signal from an AP and jumped ship, so the duma app marked it as offline even though it's still online. it never updated its db to bring it back online in the tree on the LAN side instead (via the AP with ethernet backhaul)

checked at router level and it shows it has a legit leased dhcp ip, active, and is listed in the arp table,  something isn't refreshing or doing a double check in the device manager code

(router has it right in its tables)

and if qos decides to use this internal database with the offline device (bad info) and apply qos to lan/wan, it's going to treat that device as offline, or not have an active qos/dpi tag for it

so if it tries to do some more advanced networking connection (besides youtube or simple streaming) the packets will most likely be dropped or mangled. dpi goes: 'wtf is this, i see data from a device that's not in my list of active online devices, I don't know what to do with these packets... dump>>null, splat'

if anyone sees a device like that, and its giving you some problems connecting to advanced services, try to disable qos and see if clears it up (its under qos, antibufferbloat settings).. it may still be on the offline list, but will work 100% i bet, since router has its shit together. if you wait a while then u can toggle qos back on. it will refresh its databases with current and maybe now correct info, but the device may never appear online again until a full reboot, or if you delete it.

if you disable qos you will lose the detailed breakdown of internet traffic in network monitor, and obviously buffer-bloat and priority and all that jazz, its not a solution, but a good test

 

 

 

 


I have the same issue with the ports not reading the correct device. In the network map table it says I'm using port 4. In the system monitoring statistics it says port 1. This is far more than a visual issue. It impacts the performance of this router greatly!! Another issue I have is that I can't delete a device in Device Manager even though it's offline. When I go to delete it I get a message saying it can't be deleted because the device is online, when it is not. Again, this is not only a visual issue. I can't stress enough how major a performance impact it has on this router. All QoS becomes useless at this point. All key Netduma features are impacted in some way. Another strange issue is that I can't reboot the router through the GUI. Once I do, it never reconnects. And I don't need any special credentials from my ISP to enter manually. It should be plug and play but it isn't. Hard resetting it has the same issue. It's more than a pain in the ass to get this router to finally connect. The only way I can get it to reconnect is to power cycle it numerous times and hope and pray. This can take hours, not minutes. Once I do finally get it to reconnect, if I quickly jump in and play an online game it performs well. But then something takes place where all of a sudden online gaming becomes unplayable!! And it's coming from the router. I have no idea if this is a Netgear issue or a Netduma issue. One thing I do know is it's a real pain that this router technically has two firmwares on it, one Netduma and one Netgear. Double the chance for issues to pop up! Ask me how my experience has been with this router thus far?? Any takers?? Ask me how much time I've screwed around with this?? Ask me if my patience is getting thin?? And let's not forget about the price tag!!

Zippy.


i think they will look at this more closely, it should work in theory.

i found a quicker way to test, instead of struggling to delete devices that have been screwed, i found the clear-devicemanager script, lol. just backup the database 1st. i do not want to retype 40+ device names to start with an empty database, should add an option to backup this db in duma interface, along with a force delete maybe in an advanced section somewhere down the road

i'm almost positive the router is handling everything correctly as far as routing is concerned. the lua apps are not, maybe 97% working . but the 3% tends to creep up and get you somewhere after a day or two uptime

i think they will eventually get it with testing and some refining.

i wonder guys, this xbox instant on bug, is it possible at all that it is not a microsoft bug?

since the xbox goes into low power mode, switching from 1000mbps to 100mbps and waking up randomly is kind of like plugging in and disconnecting a network cable

if the device manager database gets hosed during this process (over a day or so), and it thinks the xbox is offline when it is actually active (but not marked active in the qos.sh status table), it can also cause the packet dropping i'm talking about above.

if xbox wakes up and tries to login to xbox live using high port #s, qos/dpi is the culprit dumping the unknown packet traffic, since it sees data from a mac/device that isn't marked as active/online in its database, which is relying on device manager data, which in turn, is wrong.. catch 22  -- if the xbox can't reach xbox live or its tunnel is closed eventually it gives up and takes a shit and just says no internet connection

^it may not be, but i've seen it  dump vpn 4433 traffic , or 443 traffic when the device is reported as offline and qos is enabled, why not xbox live traffic too.  

it works behind a switch because the switch handles it differently, its not reporting to switch0 that the client periodically turns on, or when it is in a low power state when on a switch.. but when it truly powers up, then the switch relays a connection info and duma identifies properly

does this bug occur when qos is disabled and xbox is in instant on mode and plugged directly into XR lan port? someone who still has this issue should just disable QoS and see if xbox stays online for a couple days, just as a test...

^^just thinking/typing out loud, it could be something else, but who knows..

to zippy,

if you click on the device does it report a mac address and ip address still when its offline? and you are 100% sure the device is off? I've seen this, try to disable qos and wait a while, try a quick on/off, then off wait an hour, or wait 12 hours :)

the br0 interface most likely is holding onto this ip with a mac addr of 00:00:00:00:00:00, once you close qos and reopen qos it may refresh this and clear out the empty hwaddr then u can delete the device, but it will just happen again at some point until they track this down, disabling qos kills the br0 interface (its a bridge) and recreates it when you turn back on

a reboot will def empty it out, and you can then delete it for sure, but you shouldn't have to do this, linux wasn't made to be rebooted over and over and over, technically you can fix this without rebooting if you kill/refresh certain processes

this is not recommended tho, if you f something up you will be reflashing, lol just wait for them to fix it software wise. hopefully they find something not being checked in the device manager code, and hopefully they can recreate these scenarios to be able to fix them

 

 


small update: i just saw another device that was listed as offline and had an ip address still attached when clicked on in device manager. it was another wifi smart tv that was turned off much earlier

this time it did not have all 00's as hwaddr in arp table as it was truly offline. if its in this state and online it will have 00's

just for a test, i tried pinging its ip from router root this time, and as soon as I pinged it and got no response the ip disappeared from Device Manager's listing pretty much instantly .

maybe what needs to be done is add a function that runs every 5 seconds to throw a ping at all offline devices that have an ip attached in the database

if it doesn't reply, it will just clear itself right then and there if its sticky and truly offline

if it does reply, rerun whatever needs to be done to figure out if its now on wifi or now on lan side  (via AP ethernet backhaul) and then update the database

if it doesn't reply and the ip doesn't disappear, the device is online but not responding to ping requests, so maybe a better way to detect actual connectivity is needed in this case

this works to clear an offline device, but it's probably not the best solution if a device doesn't respond to icmp -- it will clear a device that is truly offline, but not one that is listed offline while still connected

maybe quicker to send a couple arp packets, nmap, or tcpconnect if it doesn't want to respond to pings. need a surefire way to make sure the ip is no longer active, and need to find the most efficient solution that generates the least amount of load and background chatter

still doesn't address resetting of qos info
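
Here is a rough sketch of the sweep idea above, just to show the shape of it. This is not how Device Manager works internally (I don't know that), and the record layout and the names `sweep_offline` and `ping_probe` are invented for the example. The probe is passed in as a function so it can be swapped for an arp or tcp check later:

```python
import subprocess
from typing import Callable, Dict


def ping_probe(ip: str) -> bool:
    """One ICMP echo with a 1 second timeout (Linux-style ping options)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep_offline(devices: Dict[str, dict], probe: Callable[[str], bool]) -> Dict[str, str]:
    """Probe every device listed offline that still has an IP attached.

    Returns the suggested action per device name:
      'clear_ip' -> no reply, drop the stale IP
      'recheck'  -> it replied, so work out wifi vs LAN side and update
    """
    actions = {}
    for name, dev in devices.items():
        if dev["status"] != "offline" or not dev.get("ip"):
            continue  # only sticky offline entries are of interest
        actions[name] = "recheck" if probe(dev["ip"]) else "clear_ip"
    return actions


# Demo with a fake probe so nothing is sent on the network.
devices = {
    "smart-tv": {"status": "offline", "ip": "192.168.1.40"},
    "laptop": {"status": "offline", "ip": "192.168.1.41"},
    "phone": {"status": "online", "ip": "192.168.1.42"},
    "old-cam": {"status": "offline", "ip": None},
}
fake_probe = lambda ip: ip == "192.168.1.41"  # pretend only the laptop answers
print(sweep_offline(devices, fake_probe))
```

Calling `sweep_offline(devices, ping_probe)` would do it with real pings. As noted above, a device that ignores icmp would be wrongly cleared by a ping-only probe, which is why the probe is pluggable.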


Guest Killhippie
4 hours ago, xr500user said:

small update: i just saw another device that was listed as offline and had an ip address still attached when clicked on in device manager. [...]

Interesting you mention that, I had to re add my smart TV's Wi-Fi password as the router was saying my Android TV was attacking the router with [DoS Attack: ARP Attack] it wasn't but it was blocking the TV from self updating apps and renewing its lease.

Also I think I found the cause of my PS4 issues with high priority traffic not working. My switch has died (S8000) and right now I'm using Wi-Fi (a sin, I know) for my PS4 Pro over 802.11ac, as it's too far away to run an Ethernet cable and I'm in a wheelchair, so I have to wait for help to run new switches. The Ookla speed test app for iOS 12 has had an update to show single and multi threaded tests now, but it confuses the DPI engine if you do a speed test while QoS and Antibufferbloat are in use while gaming. The iPad was stuck with what the router thought was constant VPN traffic. Since removing the app and speed testing via web based applications 'after' gaming is over, so far this bug has not returned (fingers crossed).

I have a later revision of the router without the misprint, so that port issue should not apply. TBH I don't think it should be an issue; it's only hard coded LEDs on the PCB. The switch knows which port is port #1 no matter what's printed on the back. That's why it could not be fixed in firmware: you can't alter the port numbering, only things like the flash rate etc for traffic throughput. I asked Voxel way back if you can change port numbering in firmware and he said you can't, as it's hard coded. He confirmed you can only alter LED state and that's it.


yeah, a bad switch can send out some weird traffic, i have some sx10 which r holding up pretty well so far, but time will tell , i mean i still have business switches that are 15 years old and still working -- no switch should die that soon .. unless power surge or somethin' - does the s8000 have a reset pinhole, i believe the sx10 does .. worth a shot

as far as the port numbers yeah, the lights are toggled on by the OS, i saw led files in there .. that's not a big deal to me , can probably fix my light numbering myself, but will be lost with an update .. but what i'm sayin is its in the firmware -- as the port marked on the back (1) is the one shown as connected to LAN 1 on the netgear side in admin panel (check settings, monitoring, statistics)

but in the duma side its shown as port 4 (check device manager, Table view)

I'm curious: since yours has the right markings it should still be affected, but in reverse. it can only be noticed if you are not using all 4 lan ports on the router and can see the numbering discrepancy

they changed something in the qos/dpi since .32 and its making some mistakes or not updating its table correctly

i dont think the numbering would matter since all 4 ports should be just considered one switch (switch0) but if they are specifically addressing individual ports 0x1, 0x2,etc then yes if netgear side thinks the numbering is reversed of course a qos rule will get applied to the wrong lan port# and shit will stir after a while

 

 


More investigating ..

sticky devices

scenario 1:

if device joins native router wifi radio wifi0,1 (ath0,1) then it is listed correctly in 'wlan stainfo'
--> pops device online and shows as online in device manager

if said device jumps to an ethernet backhauled AP (jumps to APs wireless radio and AP is backhaul wired to switch->router LAN2)
--> duma pushes device to offline tree and shows as offline - stays stuck in offline tree...

in actuality, device now active on LAN instead (via AP backhaul)

checking:

cat /proc/net/arp

Device listed in arp table is now with 0x2 Flag -- so now on lan, and active (via AP wired backhaul)

When click on device in Device Manager it is reporting correct mac, correct ip, but 'Connection Type' is Wireless and in "Offline" side of tree.
Should be moved to LAN/Online in this case

arp table flags:

Flags 0x2 = online
Flags 0x0 = incomplete
Not listed = not on LAN (switch0)

0x0-flagged devices disappear from arp a few minutes after the device goes offline.
A device with the 0x0 flag keeps its MAC address for a little while; devices that are stuck in the Online or Offline tree sometimes turn into hwaddr 00:00:00:00:00:00
(now offline, but duma still thinks the device has an active IP and lists it as online)
^ eventually removed from the arp table
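The flag meanings above can be checked mechanically. This is a rough Python sketch that parses text in the /proc/net/arp layout and classifies an IP the same way; the sample addresses are invented, and the helper names are mine, not anything from DumaOS.

```python
# Invented sample in the /proc/net/arp layout (header row plus two entries).
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.10     0x1         0x2         aa:bb:cc:dd:ee:01     *        br0
192.168.1.23     0x1         0x0         00:00:00:00:00:00     *        br0
"""


def parse_arp(text):
    """Return {ip: (flags, mac)} from /proc/net/arp style text."""
    entries = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # ignore blank or truncated lines
        ip, _hw_type, flags, mac = fields[:4]
        entries[ip] = (int(flags, 16), mac)
    return entries


def arp_state(entries, ip):
    """Classify an IP: 0x2 set = online, entry without 0x2 = incomplete."""
    if ip not in entries:
        return "not listed"
    flags, _mac = entries[ip]
    return "online" if flags & 0x2 else "incomplete"


table = parse_arp(SAMPLE_ARP)
for ip in ("192.168.1.10", "192.168.1.23", "192.168.1.99"):
    print(ip, arp_state(table, ip))
```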

Any network activity sent from the router's root shell (a ping, or a telnet that returns "No route to host", etc.) towards the IP of a device that is stuck online or offline but is truly offline will remove the IP from arp (if present), and the device drops from Online to Offline in the Device Manager tree instantly.

scenario 2:

Device joins AP w/ethernet backhaul wifi first
--> duma pops device Online on LAN side of tree
Device then shut off, or wifi closed
--> duma keeps device listed online, with ip still listed.

arp flags are quickly set to 0x0, and eventually removed from arp table (happens rather quickly , seconds)

swconfig dev switch0 show
--> still holds onto the MAC and the obsolete PORTMAP info for a while longer than arp, but it is eventually removed after a few minutes
--> duma keeps the device listed as online (stuck) from now on, with the IP still listed as active and connection type 'wired'

As soon as any network activity is sent from the router's root shell towards the IP that is stuck online (a ping, or a telnet that returns "No route to host", etc.), the device drops from the Online to the Offline side of the tree instantly.

I have not yet found a way to move an offline device that is actually online to the correct tree, or to make duma refresh that device's status, without turning off its wifi and deleting it.

I hope you guys can re-create this.  It's the same if it's directly wired, taking the switch out of the picture.
Something is not being updated when devices move around. All online and offline devices need a periodic double check; you can't just assume that online devices are still online and offline devices are still offline.
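The kind of periodic double check I mean could look roughly like the sketch below. It is only an illustration in Python: the two dictionaries are hypothetical stand-ins for what Device Manager shows and what the arp table says, and I have no idea how duma stores this internally.

```python
def find_stuck_devices(dm_status, arp_flags):
    """Compare what the UI believes with what the ARP table says.

    dm_status: {ip: 'online' | 'offline'} as shown in Device Manager (hypothetical).
    arp_flags: {ip: int} ARP flags per IP; a missing IP means no ARP entry.
    Returns {ip: corrected_status} for every device whose shown status is wrong.
    """
    corrections = {}
    for ip, shown in dm_status.items():
        actual = "online" if arp_flags.get(ip, 0) & 0x2 else "offline"
        if actual != shown:
            corrections[ip] = actual
    return corrections


# Invented example: one device stuck offline, one stuck online, one correct.
shown = {"192.168.1.10": "offline", "192.168.1.23": "online", "192.168.1.30": "online"}
arp = {"192.168.1.10": 0x2, "192.168.1.23": 0x0, "192.168.1.30": 0x2}
print(find_stuck_devices(shown, arp))
```

Running something like this on a timer, rather than only on join/leave events, would catch devices that moved between the router's radios and a wired-backhaul AP.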

additional:

So I have 2 switches (1 AP on switch 1/router lan1, 1 AP on switch2/router lan2) that are plugged into ports labeled 1 and 2 on back of router (this is correct i think) but 3 & 4 LEDs light up on router.

Core OS thinks ports listed on back of router are the correct port#s and are active, not the LEDs.

but in admin page Settings, Monitoring, Statistics (netgear software side) shows they are listed as LAN3 and LAN4 (same as router LEDs)

3 & 4 LEDs are the right most LEDs on front of my router.

double checking...

swconfig dev switch0 port 1 show
swconfig dev switch0 port 2 show   
both -->link:up - so back of router numbering and actual port is correct
--> 3,4 = link:down (LEDs)

duma Device Manager table view shows port 1,2 are connected - also correct

I'm not sure, but I think all devices will show some discrepancy between Netgear's Settings, Monitoring, Statistics page and the duma table port view, whether the rear labels are "misprinted" or not?

Why does the Netgear (software side) Settings, Monitoring, Statistics page show LAN3 & LAN4 as active?
Is it only listing link up/down activity based on LED status and not the actual swconfig port status??? <duh>
Is the switch board installed upside down internally? Can I open the router and flip/reverse a connector?
Or am I missing something? All pictures of the XR500 show the LEDs on the front as 1,2,3,4 respectively and 1,2,3,4, WAN on the back.
As long as the information from Statistics means nothing to duma then it doesn't really matter, but if I can fix it I will.
Where's that screwdriver .. I'm going in .. lol


Guest Killhippie
22 hours ago, xr500user said:

yeah, a bad switch can send out some weird traffic, i have some sx10 which r holding up pretty well so far, but time will tell , i mean i still have business switches that are 15 years old and still working -- no switch should die that soon .. unless power surge or somethin' - does the s8000 have a reset pinhole, i believe the sx10 does .. worth a shot

I'll have a look; I'm not using all my ports, so I should see what you mean. Most odd... Yes, you are right: my iMac is shown as plugged into LAN port 4 in table view but as plugged into LAN port 1 in Netgear's statistics, and I have the correct port numbering on the back. That's bizarre. The S8000 was plugged into a sine wave UPS; alas, the pinhole did not save the day.


  • Netduma Staff

I've passed this to our developers, so when they get a spare moment they may be able to take a look and see what's going on in this case. I shouldn't think it's anything other than aesthetic; if it were affecting other systems we'd know by now. Thanks for investigating guys!


not sure if the "correct" print may actually be the misprint now, lol. 

most of the images I see online of the router the rear port numbers go left to right  [1,2,3,4    WAN]  -- but you have  [4,3,2,1   WAN] on the back I assume?

The front led's are always left to right 1,2,3,4   (some just have only numbers above the leds 1 2 3 4, mine has eth1, eth2, eth3, eth4 printed above the LEDs -left to right-)

In your case, the Netgear side thinks the port closest to the WAN(Internet) port is port 1 (this is the error), as it is port 4 according to the router kernel.

I think there was some confusion when the issue first appeared that it was only a manufacturing printing defect, but it's actually software and printing: software + reversed rear panel numbering + Netgear lighting the wrong LED at bootup.

4 3 2 1 WAN will light the correct corresponding LED on the front, but it's not the correct port according to the router kernel.

The lights on the front light up during bootup and they are software lit. The fix probably should have been to isolate the serial numbers of the routers that had the wrong back port printing [4,3,2,1 WAN] and, if a unit is in that range, just reverse the front LEDs that are lit at boot. If they can't do that, then an option in the admin panel to reverse them with a checkbox. Then they would just have to send you a sticker to put above the rear ports to renumber them [1,2,3,4 WAN].

I'm almost positive the Duma side Device Manager (table view) is showing the correct kernel port (in your case port 4), and it matches what the swconfig switch port IDs actually are to the kernel.

I think [1,2,3,4 WAN] should be the correct rear numbering... The lights won't match until the Netgear software is fixed to light the right LED. That's why I had a feeling the Netgear side is displaying statistics ports based on LED activation and not the true swconfig port configuration. That has to be fixed. Or it could be that some cable inside the router was reversed (flipped) during assembly, causing all of this. Either way, we need to know which one is correct and whether we can fix it manually if that's the cause.

It really shouldn't make a difference network-wise if all 4 ports are treated as just 1 switch (switch0) in all transactions. But if by chance the Netgear or duma side applies any firewall, QoS or DPI rules to a specific LAN port, or the Netgear side keeps statistics on the wrong LAN port, it may be an issue if they are applied to the wrong one. That is probably more likely on the Netgear side, since the OS reports the same active port as duma.

On another note, it may be causing some issues with Device Manager in regards to sticky devices (a device that shows an IP still attached to it in Tree view but sits on the offline side of the tree) that you are not allowed to delete (it claims the device is online). I noticed the same device has no IP attached to it in Table view (correct), but it still cannot be deleted. So it may have something to do with this: some of the duma apps that try to figure out online/offline/active status may be requesting activity (arp, ip neigh show, arpwatching, etc.) from sources that aren't being updated in the best manner if the Netgear side thinks there is no connection on that LAN port. Tree view is unaware the IP has vanished, but Table view does know it's no longer in use/offline and shows N/A. I know it's getting more and more confusing now... It may not be the cause, but it just adds to the mess and could possibly f things up in the future.

I think it really needs to be looked at, and a fix figured out for everyone, with both printings / LED situations.

It should show and always use the true kernel port no matter what the printing or LED on the router says. I couldn't care less if it's reverse printed or the wrong LED on the case lights up, just that everything is correct on the OS side.

 

 


hey fraser, do you ever sleep? I'm starting to think you're a bot :) j/k

more ...

scenario 3:

Duma device manager has yet to see this device (brand new)

Device connects to local router wifi0,1 and is seen.
--> Duma shows client connected on wifi tree
Device leaves wifi network (a phone for example - left for work)
--> Device drops offline.

Device returns later but this time connects to AP first that's ethernet backhauled to LAN
--> Device sticks offline, but now has an active IP listed when clicked on. device manager and qos db never updated.
Device stuck to offline tree.

It's as if, when the last or current 'Connection Type' was 'Wireless', duma assumes the device is always Wireless or will return to Wireless, and if it rejoins Wired (or the reverse: joins wired and moves to wireless) there's a problem.

scenario 4:

The reverse may also be true, causing a client to stick to the Online tree (I have not tested this scenario yet, but I have a feeling it's valid).
Duma Device Manager has yet to see this device (brand new).
You'd have to connect to the AP's wifi first and be shown as online in the LAN tree, then roam/jump to the local router wifi0,1 over time.
The device will remain stuck on the LAN online side, but be listed as connected to local wifi in 'wlan stainfo'. Then at some point the device abruptly leaves the network.
The device then never moves from the Online LAN side to the Offline side; it remains stuck online with 'Wired' connection type.
The IP it claims to be using when clicked on in Device Manager is not listed in the arp table on the router, or may be listed with the 0x0 flag and the correct MAC address.
I believe table view may have the IP listed as "N/A" in this case. Table view seems to be doing something a little different than tree view.

It appears Device Manager initially assumes Wireless when it shouldn't.
Maybe it shouldn't rely on Connection Type as a trigger, as it can vary. The last seen connection type is not reliable, especially when using multiple APs that are set up with ethernet backhauls, or mixed ethernet backhauls & extenders in extender mode.
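One way to avoid trusting the last seen connection type is to derive both status and type from what is true right now. A minimal Python sketch of that idea follows; the two input sets are assumptions (MACs currently associated with the router's own radios, and MACs whose arp entry has the 0x2 flag), and the function name is invented.

```python
def current_connection(mac, wifi_stations, arp_complete):
    """Derive (status, connection_type) from present evidence only.

    wifi_stations: set of MACs associated with the router's own radios
                   (e.g. gathered from the station list; format not assumed here).
    arp_complete:  set of MACs whose ARP entry carries the 0x2 flag.
    """
    if mac in wifi_stations:
        return ("online", "wireless")
    if mac in arp_complete:
        # Reachable but not on the router's radios: wired, which includes
        # clients sitting behind an ethernet-backhauled AP.
        return ("online", "wired")
    return ("offline", None)


stations = {"aa:bb:cc:dd:ee:01"}
complete = {"aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"}
for mac in ("aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02", "aa:bb:cc:dd:ee:03"):
    print(mac, current_connection(mac, stations, complete))
```

With this approach a phone that roams from the router's wifi to a backhauled AP would flip from wireless to wired instead of falling into the offline tree.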

As said previously, when forcing off a stuck, 'truly' offline device that's listed as online on the LAN tree by pinging it or sending it a few arp packets, it drops to the offline side instantly.
The router lists the IP of the device at that point in the neigh table as 'FAILED' because there is no route to host; it's not online.
The arp table may or may not have it listed with 0x0 flags, or the device may not be listed at all, but it will appear for a short time with a 00:00 hwaddr and 0x0 flag if the IP isn't in use.
It depends on which table is looked at first before attempting to remove the device by sending it some packets.
Whatever offline IP you try to send packets to (just make any up, 192.168.1.xxx) gets a temporary arp entry (0x0, 00:00:00) until it's removed automatically shortly after, and ip neigh show marks it as FAILED if it's not online. The neigh table clears shortly after as well; arp seems to clear quicker than neigh.
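For anyone who wants to check the neigh table the same way, here is a small Python sketch that reads `ip neigh show` style lines and picks out the entries that look dead. The sample lines are invented, and treating FAILED and INCOMPLETE as "not online" is my own assumption for the example.

```python
# Invented sample in the usual `ip neigh show` layout.
SAMPLE_NEIGH = """\
192.168.1.10 dev br0 lladdr aa:bb:cc:dd:ee:01 REACHABLE
192.168.1.23 dev br0 lladdr aa:bb:cc:dd:ee:02 STALE
192.168.1.77 dev br0  FAILED
"""


def parse_neigh(text):
    """Return {ip: {'mac': str or None, 'state': str}} from `ip neigh show` text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        mac = fields[fields.index("lladdr") + 1] if "lladdr" in fields else None
        table[fields[0]] = {"mac": mac, "state": fields[-1]}  # state is the last token
    return table


def dead_ips(table):
    """IPs whose neighbour state suggests nothing is answering."""
    return sorted(ip for ip, e in table.items() if e["state"] in ("FAILED", "INCOMPLETE"))


print(dead_ips(parse_neigh(SAMPLE_NEIGH)))
```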

..right after device drops to offline tree the logs show device manager db updated and also qos db updated.

So it's important that this detection and the db are correct, because over time qos gets hosed for online devices that are thought to be offline.
qos has rules set to DROP everything first (assumed ddos protection?), so some packets will drop for a device that is stuck in the offline tree even though it may have basic internet connectivity.
The only solution for this is to disable qos; then the device has full internet access while in a 'thought to be offline' state (a qos marked device) and the tree status is still wrong.

It's super slick what you're doing, but it depends on that device database being totally accurate at all times and in all conditions to work flawlessly.

I'll send you some more coffee, I'm sure you'll figure it all out.. fraser get some sleep lol


Guest Killhippie

My router shows 4-3-2-1 WAN, as most Netgear routers do. In Netgear statistics it's connected to port 1, which is controlled by the base firmware that runs the switch etc. with the code from Qualcomm and Netgear's GPL. It's only shown as connected to port 4 in DumaOS, in the new Table view. When I plug the iMac into port 1 on my router, port 1 lights up on the front, same as the R7800 and all Netgear routers I have ever had. The XR500 is the same PCB as the R7800, just with a bit more flash NAND, so it is hard wired the same, with ports and LEDs in the same order. So my LEDs light up as they are numbered on the back, in the same order as the XR700, which is 6-5-4-3-2-1 WAN (even though it has different port configurations). The XR700 is an R9000 router in a new shell, and it has the exact same PCB as the R9000. So port numbering on that router is correct from Netgear's side too.

Port to LED is hard wired on the PCB in this case. Netgear in the US, who looked at the XR500, said it was a printing error and could not be fixed by firmware. A third party developer, Voxel, looked at the R7800 (same PCB) and found that Netgear firmware does not have the ability to alter port assignment; it would appear you can only alter on and off states, blink rates etc., but that's it. So the switches on the XR450, XR500 and XR700 are in the correct sequence, which is port 1 nearest the WAN port, and my LEDs on the front light up the correct way. My iMac has always been LAN 1 in statistics, even on the old version that went 1-2-3-4 WAN. On that version, if I plugged into port 4, Netgear's firmware showed LAN 1 in use and LED 1 lit up. It's Netgear's GPL code and Qualcomm's code and drivers, as I mentioned, that control this, not DumaOS. I think it is just a GUI error, maybe brought over from the R1, which uses port 1 for WAN and then goes 2-3-4-5. So the WAN port is furthest away from port 5, which on a Netgear router would be port 1; Netgear routers also have a separate WAN port anyway. I think the odd view in the XR500's GUI comes with milestone 1.3 and the table view, which was maybe taken from the R1's GUI and ported to Netgear's routers, where port 1 is always next to the WAN port, the opposite way round from the R1.

On the R1, port 1 (the WAN port) is furthest away from 5 (in reality port 4, I'm guessing), which is always port 1 on the Netgear routers used for DumaOS, as the XR450 is also 4-3-2-1 WAN (phew). I think the LEDs are a total red herring. The XR450 and XR500 use the same switch, a Qualcomm Atheros QCA8337; 7x Ports, 5x PHY, MII/GMII/RMII, RGMII/SGMII, 1K MAC, 4K-tag VLAN, QFN 148-Pin QCA8337-AL3C (IPQ8064). The XR700 uses a Qualcomm Atheros QCA8337N; 7x Ports, 5x PHY, MII/GMII/RMII, RGMII/SGMII, 1K MAC, 4K-tag VLAN, QFN 148-Pin QCA8337N-AL3C (QCA9563). All are controlled by Netgear/Qualcomm's firmware, and DumaOS picks up on that code, hence a GUI bug from the R1 possibly causing this port 4 numbering in table view. All three routers can't be numbered wrongly. Well, that's my theory so far. Interesting stuff :)


definitely interesting!...

I am by no means any expert on this area here, just an observer
Maybe it's a duma porting of milestone 1.3 oversight <-

This is what I see when poking around - my comments = //

So FYI: I have 1-2-3-4 WAN on the back of mine --- //netgear misprint

I have 2 switches wired to XR500, one plugged into LAN 1, one into LAN2  (LAN1 being leftmost, furthest from the WAN as per my rear numbering)

this is from the OS side XR500:

#detcable show //show me what cables are plugged in
LAN3 : Plug off
LAN2 : Plug off
LAN1 : Plug in, 1000M, Full duplex
LAN0 : Plug in, 1000M, Full duplex
WAN  : Plug in, 1000M, Full duplex

//matches my misprinted rear numbering - and same as R1?
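For comparing this against other views, the detcable output above is easy to turn into a lookup. A small Python sketch, using exactly the lines pasted above as sample input (the function name is mine; note that detcable numbers the LAN ports from 0):

```python
SAMPLE_DETCABLE = """\
LAN3 : Plug off
LAN2 : Plug off
LAN1 : Plug in, 1000M, Full duplex
LAN0 : Plug in, 1000M, Full duplex
WAN  : Plug in, 1000M, Full duplex
"""


def parse_detcable(text):
    """Return {port_name: True if a cable is plugged in} from detcable-style text."""
    ports = {}
    for line in text.splitlines():
        name, sep, status = line.partition(":")
        if not sep:
            continue  # not a 'NAME : status' line
        ports[name.strip()] = status.strip().startswith("Plug in")
    return ports


plugged = sorted(p for p, up in parse_detcable(SAMPLE_DETCABLE).items()
                 if up and p.startswith("LAN"))
print(plugged)
```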

#swconfig dev switch0 show //i removed all the tx rx info
Global attributes:
        enable_vlan: 1  //1=true using 2 vlans, local network is vlan 1,  wan is vlan 2
        max_frame_size: 1518
        dump_arl: MAC: PORTMAP: VID: 0x2 STATUS: 0x0

Port 0:
        pvid: 2  //wan? on vlan 2
        link: port:0 link:up speed:1000baseT full-duplex txflow rxflow
Port 1:
        pvid: 1 //LAN1 port, on vlan 1
        link: port:1 link:up speed:1000baseT full-duplex
Port 2:
        pvid: 1 //LAN2 port, on vlan 1
        link: port:2 link:up speed:1000baseT full-duplex
Port 3:
        pvid: 1 //LAN3, on vlan 1
        link: port:3 link:down
Port 4:
        pvid: 1 //LAN4, on vlan 1
        link: port:4 link:down

//matches my misprinted rear -- maybe the swconfig needs to be updated to match Netgears portnumbering (oversight?)
//XR700 will use next 2 more ports for vlan 1 I assume
//and for LEDs -- can they be mapped?
//I found this on OpenWrt
//swconfig dev xxxxx port x set led x   //dev switch0?
//swconfig dev xxxxx set apply
//i think they can be reversed, but I do not want to try and brick the router
//led controlled by CPU, via gpio<-
//not sure if it will stick on this switch though if setting the led

Port 5:
        pvid: 2 //ethwan on vlan 2 to be able to talk to wan from eth switch
        link: port:5 link:up speed:1000baseT full-duplex txflow rxflow
Port 6:
        pvid: 1 //on vlan 1, not sure what this is, guess brwan->br0? the bridge interfaces?
        link: port:6 link:up speed:1000baseT full-duplex txflow rxflow
VLAN 1:
        vid: 1
        ports: 1 2 3 4 6
VLAN 2:
        vid: 2
        ports: 0 //wan 5 //ethwan
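To pull the link state per port out of a dump like the one above, something along these lines works. It is a Python sketch over the link lines copied from my output; treating ports 1-4 as the LAN jacks is my assumption based on the pvid comments, and the helper name is invented.

```python
import re

# Link lines copied from the swconfig dump above (ports 0-4 only).
SAMPLE_SWCONFIG = """\
        link: port:0 link:up speed:1000baseT full-duplex txflow rxflow
        link: port:1 link:up speed:1000baseT full-duplex
        link: port:2 link:up speed:1000baseT full-duplex
        link: port:3 link:down
        link: port:4 link:down
"""

LINK_RE = re.compile(r"link:\s*port:(\d+)\s+link:(up|down)")


def parse_links(text):
    """Return {port_number: True if link is up} from swconfig 'link:' lines."""
    return {int(port): state == "up" for port, state in LINK_RE.findall(text)}


links = parse_links(SAMPLE_SWCONFIG)
# ASSUMPTION: switch ports 1-4 are the four LAN jacks.
lan_up = [p for p in (1, 2, 3, 4) if links.get(p)]
print("LAN ports with link up:", lan_up)
```

Comparing this list with what the Netgear Statistics page and the duma table view each claim is the quickest way to see which of the two follows the kernel's numbering.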

So either it's a mistake from the very beginning by Netgear in handling port numbering, going back to their first Nighthawks, and they just always printed the reverse on the back of all their routers (4-3-2-1-WAN) to work around it rather than really fix it...

or a duma oversight: swconfig wasn't updated after porting milestone 1.3 to the XR500.

A software correction has to be applied in so many places; if you forget to fix it in one place it will pop up again. Best to have it match the hardware, otherwise it's something you need to keep in mind all the time.

In your case that's why duma is showing port 4 in table view: it's showing what the OS reports as port 4, regardless of what's printed on the back of the case. If you were to do these checks on your router, you would see ports 3 and 4 are the ones that are link:up.

not sure if netgear will let duma modify swconfig settings

anyone who knows more about switch configurations feel free to jump in


Guest Killhippie

I don't know myself, but with swconfig settings I imagine Netgear won't allow third party firmware to alter its base firmware, and I'm guessing DumaOS rides on the back of Netgear's changes without being able to modify them. That's only a guess though. Even before the Nighthawk routers, most have followed the 4-3-2-1 config, like the R6300v2, which is from June 2013. Even the DG834v1 (I had one) is 4-3-2-1, and that's way back from November 2003. There were a few that deviated, like the DGND3700v2, but Netgear predominantly use the same config for their Ethernet ports since their very first home ADSL router. Make of that what you will. I guess the switches are preprogrammed the way they want, as they can't be altered after manufacture, but I really am in uncharted territory here. :)
