

DumaOS Insiders


Everything posted by xr500user

  1. i also have auto update off and this issue hit me as well, just noticed that error message on the admin interface today, lol. i think another option is necessary to block ALL cloud updates if you don't want them - it is a security risk if this cloud server gets compromised. imagine what may be injected into all xr routers if an overflow is found?? dangerous and very concerning. anyway, it took me a few minutes to figure out how to fix it without a reboot. you know me, JUST SAY NO TO FACTORY RESET AND REBOOTS! luckily, it still worked. was able to enable telnet...

     [email protected]:/# /rom/etc/init.d/./dumaos stop
     [email protected]:/# /rom/etc/init.d/./dumaos start

     then go back and disable telnet. all back up and running ... no reboot, no factory reset. i hope this next update fixes all the problems w/qos and device manager
  2. the /tmp dir gets overwritten/recreated on restart, so any changes in there you have to make again after a reboot. not sure if /etc will do the same ... i'm using the router as DNS w/google dns servers and this is what they contain:

     /etc/dnsmasq.conf:
     # filter what we send upstream
     domain-needed
     bogus-priv
     localise-queries
     no-negcache
     cache-size=0
     no-hosts
     try-all-ns

     /tmp/resolv.conf:
     nameserver
     nameserver

     ---> you could try to recreate those 2 files and use your local nameservers in resolv.conf. if it doesn't work, just reboot and reset them to what they were before the changes (if they don't automatically do that). worst case, factory reset to restore everything
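A minimal sketch of a safety net for the experiment in the post above: keep a copy of each file before editing it, so it can be put back without a reboot. The function names `backup_once` and `restore_orig` and the `.orig` suffix are made up for illustration and are not part of the router firmware. Note that a copy stored under /tmp will not survive a restart, since /tmp is recreated.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DumaOS or the Netgear firmware.

# Copy a file to <file>.orig, but only if no backup exists yet,
# so a second call never overwrites the pristine copy.
backup_once() {
    [ -f "$1" ] || return 1
    [ -f "$1.orig" ] || cp "$1" "$1.orig"
}

# Put the saved copy back in place of the edited file.
restore_orig() {
    [ -f "$1.orig" ] || return 1
    cp "$1.orig" "$1"
}
```

Usage would be along the lines of `backup_once /etc/dnsmasq.conf` before editing and `restore_orig /etc/dnsmasq.conf` if the change does not work out.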
  3. Out of curiosity, try disabling QoS and see if the ssh sessions hold then. It may be connected to the firewall/qos dropping return-pipe connections (net2loc)? If it's going to drop, it's usually within 20-30 seconds - weird issue with marked pipes, and since ssh is encrypted (can't read the header) that could be adding to it ... returning connection from the internet is unknown/dropped.
  4. DEVICE MANAGER DEVICES THAT ARE IN THE OFFLINE BRANCH, BUT ARE ONLINE AND HAVE AN ACTIVE IP (when clicked on) AND SHOULD BE ON THE LAN BRANCH:

     Another warning .. mess it up, a factory reset is in your future. Device Manager uses an SQLITE database to store the device names, type, etc. First, identify the device in the device manager tree that is stuck on the offline branch (but actually online) and then find it in the device manager database:

     [email protected]:/# cd /dumaos/apps/system/com.netdumasoftware.devicemanager/data
     [email protected]:/# sqlite3
     sqlite> .open database.db
     sqlite> .header on
     sqlite> .tables
     device interface ndtech_config
     sqlite> select * from interface; /*list all your devices in interface table, device table seems to have user custom names of devices*/
     mac|devid|dhost|gtype|ghost|wifi|pinned|atype
     ...device
     ...device
     ...device
     XX:XX:XX:XX:XX:XX|209|LGWEBOSTV|TV|TV|1|0|
     XX:XX:XX:XX:XX:XX|210|GALAXY-S10|Computer|unnamed device|0|0|Phone
     XX:XX:XX:XX:XX:XX|211|GALAXY-S10|Computer|unnamed device|1|0|Phone
     ...device
     ...device

     --COMMENT: find the offending device. the device (w/devid |211|) is the device that is currently stuck offline in Device Manager Tree (but it's actually online via hardwired AP and has an active IP). wifi is set to '1' from when the device was on the XR500's wifi, but it moved to the AP wifi (AP is on LAN), so change it to '0'

     sqlite> UPDATE interface SET wifi = '0' where devid like '211';
     sqlite> select * from interface where devid like '211'; /*just checking update was success*/
     mac|devid|dhost|gtype|ghost|wifi|pinned|atype
     XX:XX:XX:XX:XX:XX|211|GALAXY-S10|Computer|unnamed device|0|0|Phone

     --COMMENT: So device was manually moved to LAN in device manager database, devicemanager.database update was processed and shown in log, device jumps immediately in Device Manager Tree to LAN/Online branch.

     sqlite> .exit
     [email protected]:/#

     --COMMENT: offending device is now fixed until next error in duma script/netgear occurs (ex. device moves to different AP, or XR500 wifi0,1 and then back to AP and device drops to offline, but wifi field is not updated correctly in database.db). finding that will be fun.

     I use my extenders in AP mode, so usually this happens to devices that joined the XR500's wifi and then at some point shifted to the AP wifi. One of the duma scripts is not updating the device manager database in this scenario, or tracking of it gets lost. I'm not sure if it would help with extenders in actual extender mode - it's worth a shot. I thought about fixing the scripts but it would mean de-compiling the duma lua scripts, finding the statement not updating the statuses, recompile, more testing, etc. but the de-compiled script may not be complete, so not worth it. I've spent too much time tinkering today, maybe some other day, unless you guys fix it and spare my Device Manager OCD. Now for QOS .... ahh QOS, my friend, you need some firewall work.. maybe another day ...
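The manual sqlite session in the post above could be wrapped in a small script so the statement is not retyped each time. This is only a sketch: the function names `build_wifi_fix_sql` and `apply_wifi_fix` are hypothetical, and it assumes the `sqlite3` binary on the router accepts a database path and a statement as arguments, the way the standard command-line shell does. It keeps the `like` comparison from the post and refuses any devid that is not purely numeric.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the UPDATE from the post; not part of DumaOS.

# Build the UPDATE statement for one device id.
# Anything empty or non-numeric is rejected so nothing odd ends up in the SQL.
build_wifi_fix_sql() {
    case "$1" in
        ''|*[!0-9]*) echo "devid must be a number" >&2; return 1 ;;
    esac
    printf "UPDATE interface SET wifi = '0' WHERE devid like '%s';\n" "$1"
}

# Run the statement against a database file.
# $1 = path to database.db, $2 = devid of the stuck device.
apply_wifi_fix() {
    sql=$(build_wifi_fix_sql "$2") || return 1
    sqlite3 "$1" "$sql"
}
```

For the device in the example it would be called as `apply_wifi_fix database.db 211` from the devicemanager data directory. The same warning applies as for the manual method: a wrong update can mean a factory reset.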
  5. This is for those of you who like to tinker with the router and the duma devs if they are looking for the fix (or maybe they know already?) and a warning if you don't know what you're doing, a factory reset will be in your future. So I got to messing with this again today (bored) - router still up 15+ days. no reboot, no factory reset QOS still disabled. That's for another rainy day ... In a previous post I talked about throwing arping probes at each device on the router network to try and jar them free in device manager. Arping is no good, tested. Whatever you use, it needs to actually open a pipe/socket to the device in the kernel and get a temporary entry in the arp table. An arping probe doesn't seem to do it. I tested with telnet (works, but telnet could actually get a response from a host and open a socket, and if the ip is not in use it just takes too long). Would have liked to use nc (netcat) but unfortunately nc can also get a response from a host and take too long on some types of hosts. The version of nc on the router is super limited also, missing a lot of arguments, no -w timeout option which makes it difficult. So I settled on good old ping, even though ping on the router is super limited too and is missing a lot of useful argument options (like wait time). Had to devise a tiny script that would ping all possible hosts on the LAN subnet and do it quickly with what is available by default. In order to do this, you need telnet access to the router - need to understand how to create a script / chmod +x it, use vi, etc. basic linux stuff. if you find any improvement or better idea, let me know. so far, it works for me. 
FOR DEVICE MANAGER DEVICES THAT ARE IN THE ONLINE BRANCH, BUT ARE ACTUALLY OFFLINE -OR- DEVICES THAT ARE IN THE OFFLINE BRANCH, ARE ACTUALLY OFFLINE, BUT HAVE AN ACTIVE IP SHOWING (You are Unable to Delete Device/Device has Sticky IP):

Create the script in the /tmp directory - chmod +x it and execute it:

     [email protected]:/# ./<name you called it>
     ex ./fixip

This is the script:

     #!/bin/sh
     ### Get current LAN subnet - only first 3 octets
     chkip=$(config get lan_ipaddr | sed 's/\.[0-9]*$//')
     ### Cycle LAN subnet - un-comment echo to see if any error
     for i in $(seq 2 254)
     do
     #echo Are you really there $chkip.$i?
     ### Open ICMP socket to each IP on subnet & kill ping process in 1 microsecond
     ping -c1 -s1 -q $chkip.$i >/dev/null & usleep 1 && kill %1 2>/dev/null
     done

Script is getting around the limited ping version on the router by sending a quick ping to each ip .2-.254 and then turning around and immediately killing the ping process in 1 microsecond -- just enough time to open the socket and get an arp entry. With the sneaky loop the script takes only 1-2 seconds to complete as opposed to 37-45 minutes if you don't kill the ping process each time - learned that lesson, lol -- if you wanted to you could put it on a cron job every 6-10 minutes or so. I did not actually time how long it takes before the arp table resets back to only actual online devices and clears all the dummy 0x0 entries. It was about 5 minutes or so after execution of the script.

if you un-commented the echo line to make sure it's got the right subnet:

     [email protected]:/# ./fixip
     Are you really there
     Are you really there
     ...
     ...

After the script runs (1-2 seconds) devices that were stuck online (but actually offline) will drop to the offline branch. Offline devices (really offline) in the offline branch that had IP addresses still stuck to them will clear out the ip, so you can leave them or delete them now without the "Error: Cannot delete, device is online" message.
Now we still have one scenario that this does NOT fix. Devices that are online, have an actual IP attached to them, but STUCK in the Offline branch of device manager. Sneaky fix/cheat to that in next post ..
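A variant of the fixip idea from the post above, sketched with two changes that are my own assumptions rather than anything tested on the XR500: the ping is killed by its PID (`$!`) instead of the job spec `%1`, and the address list is generated separately so it can be checked without sending any traffic. The function names are hypothetical; `config get lan_ipaddr` and `usleep` are taken from the original script.

```shell
#!/bin/sh
# Hypothetical restructuring of the fixip script; behaviour on the router is untested.

# Strip the last octet from a dotted IPv4 address (same sed as the original).
subnet_prefix() {
    printf '%s\n' "$1" | sed 's/\.[0-9]*$//'
}

# Print every host address .2-.254 for a given 3-octet prefix, one per line.
sweep_targets() {
    i=2
    while [ "$i" -le 254 ]; do
        printf '%s.%s\n' "$1" "$i"
        i=$((i + 1))
    done
}

# Read addresses from stdin, start one ping per address in the background,
# then kill that ping by PID right away, as the original does with %1.
sweep() {
    while read -r ip; do
        ping -c1 -s1 -q "$ip" >/dev/null 2>&1 &
        pid=$!
        usleep 1 2>/dev/null
        kill "$pid" 2>/dev/null
    done
}
```

On the router the pieces would be chained as `sweep_targets "$(subnet_prefix "$(config get lan_ipaddr)")" | sweep`. Running `sweep_targets` on its own prints the list of addresses, which replaces the commented-out echo line as a way to check the subnet.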
  6. i have a similar setup as well, but my 2 ex8000s are hardwired and in AP mode. my setup is like this from the XR500: xr port 1 --> switch1 --> ex8000(1) and xr port 2-->switch2-->ex8000(2) ---- one/same wifi name throughout (2.4 and 5) --- the extenders won't connect to each other this way, but you can end up with some sticky clients; for example you come home and your phone jumps on the XR500 wifi and then a little while later you are in the bedroom and it doesn't jump quickly and still hanging on to the xr wifi - eventually in time it will jump to the stronger ssid signal by itself (which drives device manager crazy) but it still works. you could just toggle wifi off/on on the wireless client (phone, laptop) and it will connect to the stronger AP. the reason why i have it like this is the ex8000s are tri-band but when not in AP mode one 5ghz band is reserved for the backhaul to the XR/router (THE BETTER ONE). You lose that 5ghz band/bandwidth. in AP wired mode the ex8000s 2.4, and BOTH 5Ghz bands are available to wireless clients as the ethernet cable is the backhaul. both switches used are managed and i only prioritize the uplink port to the xr on both switches at this time. it's not good practice to prioritize more than 1 or 2 switch ports as it starts to work against you. at a maximum maybe the uplink and a console connected to the same switch is more than enough. smart connect is enabled on the xr500 and both ex8000s (same wifi name) -- i tried this many different ways, with SC on, with it off, separate band names, etc. but i decided to leave it on and with same ssid -- because it does work most of the time, and it's easier then having 10 different wifis all over the place. 
the only negative from smart connect is that if you connect to the 5ghz band with the highest throughput sometimes after a while (could be a day, could be 2 days, sometimes never) smart connect will see you aren't using all that bandwidth and shift you over to the 2.4ghz band to save bandwidth for other clients. if this happens you just toggle wifi off/on and you connect back to the higher speed band. i noticed this once and a while as my laptop speed test would only be 65-95Mbps , but then toggling wifi and reconnecting-its back up to 550/600+ Mbps over the ex8000 5ghz wireless connection. if you can't run wires to the extenders, then make sure the xr500 is in the middle between both extenders (like a triangle) and you can do what Newfie suggested -- create a 5ghz guest network on the xr500 and name it 'backhaul' so your wireless clients won't know the password or connect to that ssid, but the ex8000s will. be sure to allow this guest network access to your local network (tick the box) and re-setup the ex8000s to connect to 'backhaul' ssid for internet. i have never tried this setup and my guess is it may open a whole new can of worms as now not only do you have to deal with the qos routing issues and other problems of the xr, but now throw a guest network -> local network into the mix which may make it even worse. plus i'm not 100% sure if the guest network steals bandwidth from the main xr500 5ghz radio or if its a slower speed radio then the main 5ghz (have to check that). you could also make sure you follow a proper reboot cycle, make sure the xr boots first with both extenders powered down, and then after xr is fully booted and up - then turn on the extenders one at a time (let the first one fully boot and come up before turning on the 2nd). verify in each extenders admin "connected devices" that they aren't connected to each other. 
you can also lock down the backhaul wifi channel to the xr so its the same as the xr radio so it doesn't look to the other extender transmit channel if you want to take advantage of the one wifi name. in theory they should NOT ever connect to each other as they identify themselves with a "-R" device name - so one extender should not accept a backhaul connection from another one with a -R as the device name. they should use ip to hijack/be aware of each other but again, the firmwares have to be aware and actually work -- ex8000 has bugs too, xr500 has bugs, lol. this might cause a problem if guest network is used as i think it may use a different ip range then the xr too, might not be able to pull up the extender admin page which lists multiple connected extenders when using guest - never tested it, so have no idea. that's why i just settled on using them as wired APs , seems to be the least problems. netgear really needs to gear more towards ap/extender communication in their respective firmwares so they can automatically inform clients to move if needed, but i doubt we will see this soon (but one can hope) as many standards need to be followed and enforced (not only on the router and the extenders, but also the clients) and there's no settings in the router that i see to limit rssi connections to a certain level or lowest connect speed (Mbps) allowed on one band before kicking the client in the admin setups of the xr or the extenders -- it's just not that advanced - although they are both very well capable of doing this - if they spent the time...
  7. yeah, it's a good router with qos off for me - it's doing what a router does and it stays up - and routes, so I really don't think it's the kernel, but something is not working as well as it should when qos is enabled. it could very well be effecting other modules because i don't use geofilter or any of that for quite a while. i spent a lot of time trying to track it down in .40, watching conntrack and opening connections to see what its doing - certain apps/services that open connections that get closed with qos on work just fine with the qos off. It still tries to use the marks its made but it (NOP) them so no operation. They are ignored. I really like the analytic of qos with the breakdowns of what the traffic is - gaming, web, social media, etc. but I can't use it :( I don't really need qos or the bandwidth shaping because my connection is fast but without it I lose that analytic functionality. as for apps or services that are effected by this.. it works like this.. application goes out to an internet server somewhere, so a connection is marked and tracked in the kernel, source destination, etc..(nat /netfilter) so when traffic comes back into local network its marked as allowed (firewall) and knows the proper destination (internal ip). each connection gets its own entry (there are hundreds being created all the time and closed as they are no longer needed more so the larger your network is) and each connection has a countdown timer of life 5, 15 minutes, 30 minutes, even less.. whatever. but when qos is on some information gets lost, or incorrectly added/marked so the kernel is thinking "what is this ACK from loc2net or net2loc" and it may close it in a cleanup/housekeeping or try to close it because it thinks its a security issue like an unauthorized incoming connection from the outside world to the internal network. some devices like business vpn cause this, maybe some games,don't know .. 
some of those iot devices that need to reach out to aws and have a connection come back from aws, etc. i see the connections opened, they appear marked correctly, but then its like one direction either net2loc to loc2net (loc=local network, net=internet) it does not identify one as an allowable connection and it just chops it when qos is on... causing things to not maintain an active connection (one sided, internet server no longer hears an ACK from the internal device). hard to explain .. i am no conntrack guru but i know what its doing and i can see which connection should be allowed, but no idea why it is deemed done/finished (to be closed), or security risk (chopped).. but whatever qos is doing or some other module is making the kernel just think its not a kosher connection, then click. vpn will disconnect, remote desktop over vpn will disconnect, loss of connection to outgoing server, etc. (security camera may not work, or claim its not reachable from the internet, etc) it's a certain type of back and forth _SYN,ACK that picks up some misleading data via qos. i get the feeling that some netgear tracking, or netgear legacy kernal code may still be running somewhere and when qos is activated it just doesn't gel well. i'm not so sure they can replicate it, so it may never be squashed? as for the device manager thing, not really sure if that's a netgear issue or duma issue, or maybe when they combine it comes out. I do know then if you try to telnet to an ip that is listed on device manager that is shown as online (and it is offline) you will get a No route to host error (from the router shell), and then it will drop off the online list (ip will appear in arp table as 0x0). some devices that are offline (if you click them) will show an ip address attached to them.. 
and they may be actually online (but sometimes they could be offline, but the ip is sticking in device manager) a telnet will clear the ip after a few seconds then you can delete the device (like in a situation where it says 'error: cannot delete device, because it is online' -- but the device is NOT online) right after you get the No route to host, its clear and can be deleted from device manager. i know a reboot fixes these things, but in time they return -- besides you know i am a no reboot guy.. i've had routers up for years in some cases, and still performing like champs maybe have some special script running on the router that sends ARPING (use --settings to limit to 1 second wait time, exit after 1 reply) to each ip .2-.254 so in 254 seconds or so device manager will clear itself (at least more then it does right now, but probably not 100%). only suggesting arping because its quick to come back if there is no response.. telnet sometimes takes 3-4 seconds before it reports No route to host. maybe there's something better to use? some custom script or small app needs to be written if bug can't be found - but its a really crappy workaround, need to figure out what is causing it. duma has many scripts running to watch devices with many different methods that run every X seconds (arpwatch, ipneigh, wlan stainfo whatever) but for some reason one of the tables isn't being updated properly - when the kernel itself actually tries to reach out to that ip then it knows No route to host. . when devices shift from the internal router wifi (ath0,1) to an AP wifi (wired AP) (with the same name) it will move from the 2.4ghz or 5ghz tree into the offline section (but its still online) and lan/wan qos update has been applied - sometimes it moves to the LAN/online (like it should) but then sometimes it decides to go to the offline branch (even tho its online) no rhyme or reason. device manager also doesn't like SSID's with a '\' in them. 
give that a try it willl turn it into \\\ in device manager, but netgear setup does not. that probably has to do with their whole config get and config set database implementation, i went through all that and saw that its limited in storage length for fields, so they had to spread out a ton of device settings over many entries, ugg. i know if a dev reads this they will know what i mean. ps. netgear stores the SSID with a \ in one config setting and with \\\ in another -- its a easy fix but low priority I guess, just change which setting field its pulling SSID name to the correct one in device manager script. probably some weird fix they did to make it show right, or a patch in the www html unfortunately a lot of the netgear base code original creators are no longer available to comment and the firmware has been shipped off to be updated in taiwan as i understand it -- they are good but i don't think they are making any major fixes or any kind of deep down work like this -- just bandaid and move on. it's the same base code on all of the routers it seems, theres orbi stuff on the xr, etc. i wonder if the new AX are also using it? probably. it's hard to maintain so many routers i know but they are all using some version of this base hodgepodge and tweaking it for each new hardware. i thought all the time they were taking (8) months they were really re-doing it all from the bottom up, but it wasn't the case - its late and im rambling, hope you guys figure it out. or a workaround ..and after you do, let me know as i'm interested in what it turned out to be...
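The "0x0" arp entries mentioned in the post above can be listed from a script, which may help when comparing the kernel's view with what Device Manager shows. This is a sketch that assumes the router exposes the usual Linux `/proc/net/arp` table, where the third column is the Flags value and 0x0 marks an incomplete entry. The function name `incomplete_arp_ips` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper; assumes the standard /proc/net/arp column layout:
# IP address, HW type, Flags, HW address, Mask, Device.

# Print the IP of every entry whose Flags column is 0x0.
# $1 = table to read (defaults to /proc/net/arp); the header line is skipped.
incomplete_arp_ips() {
    awk 'NR > 1 && $3 == "0x0" { print $1 }' "${1:-/proc/net/arp}"
}
```

Any address it prints that Device Manager still lists as online would be a candidate for the stuck-device problem described in the post.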
  8. just a little update - had some time to go back and mess with it. little over 5 days uptime .. qos must be off for me as it affects certain net functionality in a negative way. this is unfortunate because the only thing I like about the qos is the analytics of traffic. it just does not mark certain conntracked connections properly and the os closes them prematurely. they conflict with the firewall net2loc loc2net.. usually as soon as it sees traffic or within 30 seconds. when qos is not running this does not occur. i'm not running all games but i'm sure certain game connections may also get marked wrong and then get the chop from the firewall.

     got upnp working.. just had to hit Apply on the empty UPNP screen and it resets the listing file (if frozen in time). just so you can see, this was the file before the apply:

     -rw-r----- 1 root root 0 Dec 31 1969 upnp_pmlist

     and after:

     -rw-r----- 1 root root 0 Aug 21 19:35 upnp_pmlist

     and to prove it's working now:

     log-message:454457:miniupnpd[4288]: received signal 15, good-bye //miniupnp close
     log-message:454460:miniupnpd[17447]: listening on xxx.xxx.xxx.xxx:5555 //miniupnp back up

     and it is working (forced a upnp request):

     log-message:454776:miniupnpd[17447]: [UPnP set event: add_nat_rule] from source xxx.xxx.xxx.xxx,
     log-message:454776:miniupnpd[17447]: [UPnP set event: add_nat_rule] from source xxx.xxx.xxx.xxx,

     and rule was created to ip addr as ACCEPT in iptables.

     Just a note to people using upnp - ports are opened on device/app request, and will remain open until the requestor asks to close. Many requestors never ask to close their ports, so just turn off upnp, apply, turn on upnp and it will remove any open (stale) ports left by poorly coded requestors. I tested this and it works. right after miniupnp restarted, the ports were no longer in the iptables list (closed) - no reboots are required

     --> this file contains so much more information that can be displayed on the web interface -- name of device, service name, etc.
but they choose just not to show it. same for the wifi stats. they have connection speeds of each device, idle times, etc. i don't see much difference from .40 in this update, dnsmasq changes, a couple of zombie processes from bootup still stick (detcable, check_status.sh (streamboost checker). net-scan (The attached devices demo is Running...) they are early processes during the bootup. device manager still gets confused and sends devices offline when they still have an active ip. this does effect things as other modules rely on this information to be accurate. i also saw a post about (STALE) fe80: failure to parse.. I also saw this error once, and I do not have ipv6 enabled. it was just due to bad timing of disabling qos before one of the databases updated. reactivating qos, allowing everything to settle, and clearing stuck device, then disabling qos cleared it. if a device is shifted offline and is actually still online, an update to wan/lan qos is issued - over time hell breaks loose so due to qos bad marking and some other flaws, devices can be brought out of qos managements awareness and override bandwidth restrictions., etc. i keep it disabled until figure out a better way to monitor devices .. maybe in the next update so far stable but without all features
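For anyone who wants to look at the connection marks discussed in the post above, a quick tally by mark value gives an overview before digging into single connections. This is a sketch under assumptions: it expects a saved copy of the kernel's connection tracking table in the usual text format, where each line carries a `mark=` field. Whether the XR500 exposes that table as `/proc/net/nf_conntrack` or under another path is not confirmed here. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; assumes each conntrack line contains a "mark=<value>" field.

# Count entries per mark value in a conntrack dump.
# $1 = file holding the dump. Output: "<mark> <count>" per line, sorted.
count_conntrack_marks() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^mark=/) { sub(/^mark=/, "", $i); n[$i]++ }
    } END { for (m in n) print m, n[m] }' "$1" | sort
}
```

Comparing the tally with qos on and with qos off is one way to see whether the marks themselves change. It does not explain why a marked connection gets closed.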
  9. btw, i traced the dhcp requestor .. it was a galaxy s10+ phone (authorized), i don't think it caused the wlan stainfo lockup, i think it happened prior to the dhcp request.. i've never ever seen the interface drag on the xr500 like that, but i remember reading reports of it on the xr700 -- i bet it's connected.
  10. LOL. new firmware always has me curious to see what was done .. but getting tired of messing with it TBH - wish it would just work right. But Nope. No factory reset - I'd rather figure out what's wrong and fix it - it's never been reset - I had .40 up and running for 220-250+ days (forgot the #) but I made some changes to it. Haven't really looked if anything else was updated, but the kernel was recompiled on Jul 16:

      Linux XR500 3.4.103 #1 SMP Tue Jul 16 08:25:16 EDT 2019 armv7l unknown
  11. Well, I fixed it. I refuse to reboot the router. The wlan stainfo process was the culprit -- it was jumping between 46-50+% of cpu and the cpu meters were spiked at 100%, causing the admin interface to lock up on QoS, Device Manager, and Network Monitor.

      794 root R 268 313 46.8 0.0 wlan

      There were two wlan stainfo processes running. I killed the offending 794 process and all went back to normal. Firmware is now showing .56 - admin interface quick again. It was a young process (low #), so it started during initial boot 24 hours ago.. It's possible the admin interface slowdown started after a faulty dhcp request:

      /bin/sh -c /dumaos/apps/system/com.netdumasoftware.devicemanager/dhcp-event.lua DHCP-REQUEST 'XX:XX:XX:XX:XX:XX' 'android-dhcp-9'

      that hung for quite a while. Unless it started right before the request - not sure. I caught the mac address and IP of the requestor, but it disappeared from the network. Will start running traces to figure out who/what it was. Not 100% sure it wasn't an intruder at this point. Very strange: firmware version was reporting .40 and then right after my post it's now .56 - weird coincidence? Also note the aws script running to check in to aws servers, but I thought I disabled all iot - still nowhere to enable privacy or NO reporting back to netgear in the interface for regular users. Can we have an option to disable all ET phone homes?
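Spotting the duplicate process described in the post above can be scripted. The sketch below assumes busybox-style `ps` output with the PID in the first column and a header line on top. The function name `pids_matching` is made up, and the exact `ps` column layout on the XR500 is not confirmed here.

```shell
#!/bin/sh
# Hypothetical helper; assumes "ps" prints a header line and the PID in column 1.

# Read ps output on stdin and print the PID of every line containing the given
# text. Lines mentioning awk are skipped so the filter does not list itself.
pids_matching() {
    awk -v pat="$1" 'NR > 1 && index($0, pat) && !index($0, "awk") { print $1 }'
}
```

Something like `ps | pids_matching "wlan stainfo"` printing more than one PID would match the situation in the post. Which of them to kill is still a judgment call; the post went by cpu usage.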
  12. Something new: Admin interface is now refusing to load properly, with a "took too much time" message. Some Dashboard windows spin and spin, and the Device Manager sub-section will not load. Spins and spins: "This operation took longer than expected. Please briefly wait before using this R-app." Never saw this in .40 and it was working fine for at least 22 hours. admin interface seems to have slowed to a crawl. I guess it's been tested well.
  13. Almost a year and hardly anything has been fixed. That's a shame. Tested this "new" firmware for a day... so far: Device Manager is still broken. After some time devices still shift from Online to Offline (but remain online) or vice versa. Happens to devices connected through a wireless AP (EX8000). If device connects directly to the XR500 wireless and then shifts at some point to an AP wireless this occurs. Also happens to devices connected through a switch but takes longer to start happening. QOS is still broken. Turned it back to off state -- after a while it still starts closing active connections it does not Mark properly. Especially for VPN (like Dell SonicWALL or Cisco AnyConnect) and probably many more iot devices, etc. and the XBOX UPNP list now empty. At least it had some information in prior version. System Information window still shows V2.3.2.40 as the current firmware version. No update available when checking in Settings, so it's on .56 - such a quick job..all this time and that isn't even updated? Installed R-Apps window still lists Hybrid VPN R-app size as 0.0 undefinedB dhcp.leases still showed some 1970 expiring leases, but after re-request of IP they repaired themselves (I believe this may be the only thing that was fixed that I reported) IMO don't have high hopes - nothing new on main NetDuma modules? It may be an under the hood security update, or just a bluff to calm the angry masses -- or possibly just the existing hotfix firmware re-branded since the firmware version didn't even update - but all the problems everyone is/was having will still occur. That's for sure. I only read the first few pages of posts and bugs people are complaining about I reported right after .40 dropped ages ago. Still being addressed as 'first time seen' bugs. Losing hope guys. I still want to believe but having trouble, please let us know when all bugs will be addressed and fixed?
  14. hey! just wanted to check in. I'm still here and I do pop in from time to time, but the time between times has gone long in the tooth ..everything said was spot on. Still have all the issues I discussed in the past, and I even sent them a huge list of things I noticed to look at -- but instead of working on it they moved on to the next milestone rather then fixing 1.3 issues. I have a feeling the new milestone will come out introducing a ton more bugs and we will always be in a bugged state. NetDuma has a small team and only a handful of that team who have the skill to develop and Netgear keeps pushing new models to flavorize with DumaOS -- you can bet on it AX is taking all the time right now and everything else is just back burning... plus the fact that Netgears Taiwan developers really can't work with NetDuma OS and introduce fixes (xr700) that create more problems (they even said they didn't know netgear released an update that hosed new things in netduma for xr700 in a prior post). So Netduma has moved on to new features, Netgear is fixing things they don't know how to and at the same time overloading Netduma team to flavorize their existing hardware routers and re-brand them instead of introducing something new. So now we have an R7000 duma os?? uhh, who's even buying that. Yes, devices still hang in online and offline state in Device Manager, port numbering debacle, lack of updated modules and security fixes, buggy QoS, DNS sync problems, I can go on and on.. I know how to fix these without rebooting or flashing firmware (I haven't re-flashed it once since I owned it) but you have to correct these errors in terminal mode. I chuckle every time I see a new post from someone asking about devices or not being able to connect or internet dropping, or why doesn't my XXX device connect, and response posts are always what firmware are you on, reboot, reflash, do this do that, when it ain't gonna do a darn thing, because it needs to be fixed OS side. 
I do have a 100+ day uptime right now, but QoS is disabled - so I'm not really getting any benefit from this OS. And like others, soon going to be moving on to other solutions since I can't take any real advantage of what this offers in this bugged state, and not going backwards to XR700 or AXwhatever it is.. because Netgears new AX is coming out soon (the 12 streams). And I'm starting to evaluate competitor routers. Actually, the 12 stream has been ready even before the AX6 was released. It's going to Orbi also. So anyone who has that AX6 is already outdated, oh well. They have the hardware ready and do targeted releases to bring in the most money (yeah any company would do this) but instead of releasing the best and making it the one and only..... every flavor needs to be milked first. well it's been long enough now (over 4+ months) so I guess I can release one more issue now.. The Data Analytics process has been enabled by default since day 1 sending data back to Netgear. Check your web console Settings, Administration, Firmware update -- You see the option to not automatically update firmware (a plus) -- but where's the data analytics option? oh. hmm. ok. I disabled the process manually. I did not find a way to disable it from the original firmware release until I looked again today, but the latest firmware may have been updated to do this on install, I did not check. So I can't be sure. But I found this in there, and you can check if yours is on.. Go to to check -- or at least make sure it isn't on by forcing it off. Just to let you know I was right that Netgear has one base core they use for all their routers and tweak for each, here's an example TOS htttp:// (but it's for orbi, lol) There is tons of things still in there from other routers (streamboost things, etc that are conflicting with Duma) cron jobs that are running, etc to update streamboost when the router doesn't use streamboost. Probably much more that I didn't see with the old modules, etc. 
These things are contributing to the problems people are having, and Netgear is focusing on the next router rather than fixing things in the firewall that are blocking people's connections via QoS. The firewall is really complex, which is why it prevents most attacks even with the older modules, but it's got its quirks that piss off Device Manager and block some local network devices from making certain types of connections, which only causes customer complaints. Maybe it's taking so long this time because they are actually going to fix everything, but new milestone issues will probably just give us all a do-over. If that happens, sorry to say, I'm out.
  15. lol, reported this since day 1 of .40 firmware, also sent all my findings to NetDumaAdmin. Hopefully not falling on deaf ears? bug in duma os. network monitor not affected
  16. seen this from time to time from google dns also, mostly happens on the router's first boot. misidentified DoS attack
  17. i would shoot for having it on if everything is stable, as everything works better when its on (unless its not working and causing you issues of course) if you turn it off then you lose the ability to see what the breakdown of the traffic is in network monitor area (which only seems to show the top 5 consumers at any one time, if a device is not there and you have more then 5 devices on your network - this doesn't mean its not working, and doesn't see the device, it just isn't consuming enough to be in the top 5) when qos is on you can click the upload or download bars on the device in network monitor and it will open a breakdown to the right of it of what it thinks the traffic is (games, media, web, etc) and if you go one step further and click on the color on the circle it will show additional breakdowns. if you click the upload and download bar on total usage line (first on the list) -overall usage- you can see breakdowns like what apps overall are consuming when you click the color on the circle that you want to see (twitch, bittorrent, SSL traffic, apple, microsoft, ssl, media streaming, youtube, netflix, etc) when its off, this info will just be unknown traffic. qos tries to prioritize whats more important and should go before someone else , in a perfect world it works. it classifys and then marks certain connections with priorities and if the connection for example say is thought to be offline in another module, for whatever reason (a monkey) you know we have an issue here , sometimes these classifications stick, even through a reboot and some chaos starts, combined with the dhcp thing, etc it could be a mess quick.. 
you wanna know why nobody else is doing what they are doing - this because its a real pain in the ass and pretty difficult to get it to work for everyone and every network scenario, but credit goes where credit is due they are the only one this close to perfecting it upnp is another animal, netgear always had issues with it in one form or another (reporting wise), upnp is supposed to open ports for you automatically so you don't have to do the port forwarding. but the requesting application is thought to be trusted if on the local LAN so it goes ahead and opens the port in the iptables firewall on the router.. and after a certain time its supposed to remove it, just because it shows there its most likely to be closed if after the limit..as other things on the kernel take care open connections that should not be lingering. its the upnp daemon netgear chooses just not updating its upnp_pmlist file (whats reflected on the admin interface), and sometimes it doesn't update it on creation either. there's a lot more information inside this file that they could show you like what app requested the upnp but they just don't show it (like Toredo, Skype, identd, etc) its in there, they are just not showing it on the interface (keeping it simple for the user) another issue is the version of upnp they are using to do all this is possibly custom modified over the years (at least we hope!) and may not have all the recent changes/fixes - future versions fixed issues of removing items from this list (just look at the changelog for miniupnpd) they did have a fix for closing the ports on the RFC on the list, and in addition to that... 
2018/05/02: option to store remaining time in leasefile which could be useful so you can restart miniunpd or reboot and not lose active upnp ports if they should still be open and devices were never switched off during the reboot, also could help in tracking these and fixing the list with a watcher if a monkey does jump in but this requires netgear to update its daemons and modules, something they don't seem to want to do and nobody ever has given answer to that question as to why?... but rest assured just because its not reporting correctly it is opening/closing the ports as it should because things wouldn't work at all if it wasn't. the kernel has many NAT helpers built into it that do a lot of the work of upnp and bypassing upnp it seems, because I see them being managed through a different way.. certain apps like Skype will show because its old school requesting..or does not have a helper. some apps are only going to request a port once and not again until at least the time limit expires, but if you close an app and reopen it will request again and show on the list (or reboot the xbox so it requests them again) if anything ever jumps out at me i will report of course, and i am no expert in this, just dabbling and like to know how these things are working (and when they don't)
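For anyone who wants to look at what a upnp lease file holds, here is a minimal sketch of a reader. It assumes the colon-separated layout that stock miniupnpd uses for its lease file (protocol, external port, internal address, internal port, timestamp, description) and IPv4 addresses only. Netgear's upnp_pmlist file may well be laid out differently, so treat the field order, the `PortMapping` name and both helper functions as hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PortMapping:
    proto: str
    ext_port: int
    int_addr: str
    int_port: int
    timestamp: int
    description: str


def parse_leasefile(text):
    """Parse lines shaped like PROTO:eport:iaddr:iport:timestamp:description.

    The layout is an assumption; lines that do not fit it are skipped.
    """
    mappings = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # maxsplit=5 keeps any colons inside the description intact
        parts = line.split(":", 5)
        if len(parts) != 6:
            continue
        proto, eport, iaddr, iport, stamp, desc = parts
        try:
            mappings.append(
                PortMapping(proto.upper(), int(eport), iaddr, int(iport), int(stamp), desc)
            )
        except ValueError:
            continue  # non-numeric port or timestamp
    return mappings


def expired(mappings, now):
    """Mappings whose timestamp has passed; 0 is treated here as 'no expiry'."""
    return [m for m in mappings if m.timestamp != 0 and m.timestamp <= now]
```

Running `expired(parse_leasefile(text), now)` against a copy of the file would list entries that should already have been removed from the displayed list, which is the kind of stale entry described above.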
  18. what you are doing with that procedure is doing by hand what the kernel needs to be doing during startup (it is not a fix). there are some timing issues during boot introducing some monkeys, but even if you get a stable boot these issues can recur if the right conditions are met (miscommunication between netgear's kernel and duma). another guy who throws a wrench in everything is the udhcpc daemon, who is in charge of getting the WAN IP dhcp lease (public IP lease) from your ISP. each time the lease expires he goes through a whole sequence of events all over again, whether or not the IP has been changed by your ISP, to ensure that QoS gets reapplied to the correct public IP (same public ip? different public ip?). the scripts he's calling can do some things behind duma's back and may introduce a monkey there too 🙈 since duma needs the data it's working with to be accurate for everything to come together. some people may be doing just fine for a day or two or three (so how long is your public IP lease? -> Settings, Monitoring, Connection Status -> Lease expires time?) do problems coincide with that timing? things to watch ..
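One rough way to answer the "do problems coincide with that timing?" question is to note when dropouts happen and compare them with the lease period. The sketch below is only an illustration: `near_lease_events`, its parameters and the 5 minute window are made up for this example, and all times are plain epoch seconds you would write down yourself.

```python
def near_lease_events(problem_times, lease_start, lease_seconds, window_seconds=300):
    """Return the problem timestamps that fall close to a lease boundary.

    A boundary is lease_start plus any whole multiple of lease_seconds.
    """
    if lease_seconds <= 0:
        raise ValueError("lease_seconds must be positive")
    hits = []
    for t in problem_times:
        offset = (t - lease_start) % lease_seconds
        # distance to the nearest boundary, before or after t
        distance = min(offset, lease_seconds - offset)
        if distance <= window_seconds:
            hits.append(t)
    return hits
```

DHCP clients normally try to renew around half the lease time, so it may be worth running it a second time with `lease_seconds` halved. If most of your dropouts show up in the result, the WAN lease handling is a good suspect.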
  19. have you tried just disabling QoS and just leave it off prior to/and after the fresh reboot? (just disable QoS, reboot, leave it off) just to see if it works better? turning it on and off when its been up for a while creates some corruption if something is not right gets in there. i know this because of changing some things behind duma os back and forcing/sending dummy DHCP events to device manager while trying to figure out exactly what its doing (which script triggers, etc) after an event to pinpoint the failure.. at one point I was unable to turn off QoS fully since I had removed rules that it was looking for to delete when being turned off, lol. I have yet to factory reset once since getting this router though.. I always have been able to restore it to a very stable working state manually. rebooting the way i said is probably the cleanest reboot you are going to get, but yes whatever is triggering it may pop up again and throw everything out of wack (monkey wrench in the machine) the problem is the constant rebooting everyone is doing without that method creates more problems on each reboot - you never know what you're going to get, especially if some IPs get locked up on a reboot (ones that you've assigned statically) so if the static device requests an IP it can't get its permanently assigned one back so it may get a different one or just decide to up and quit and settle with a 169 non usable ip .. and then as time goes on it just gets more and more out of sync with other settings (qos, port forwards you may have set, etc) i don't think that it is unfixable, it just needs to be fixed. its the illusion that the lan port isn't working, but it is, just the device is not i have faith they will do it. they got some smart guys there, and what they are doing is not an easy task.. i give mad props to the cross.. 
it's advanced traffic shaping for dummies on the interface, but deep down so much stuff is going on behind the scenes, and things need to be in order for it all to come together. if one link in the chain fails, it's a cascade of small failures. just a nasty bug that i'm sure they will deal with. something has to give with netgear
  20. it is not a setting you can change with the admin interface. as most know the router is running on linux, so you would have to have at least a pretty good understanding of linux and make changes to certain startup scripts to change the cpu setting on the current firmware during boot. i would not recommend doing this unless you really know what you are doing and ok with the possibility of messing something up badly so i am not going to give the instructions but only these hints that is can be done and you will have to do the research and learning part. that is not the only modification that can be done to boost performance - from reading all the voxel posts available i learned a lot, and after that study in addition you could change some compiler settings when building the base router kernel from the ones netgear uses and it will speed it up even more.. but keep in mind this would have to be done by netgear, and netgear plays it safe and conservative with their kernel it appears. setting the performance mode raises the cpu temp by about 7 degrees -- so it is not a hardware limitation..its a risk they don't want to take - although people are using these settings without any issue (as 3-4c more isn't that much, and its not in the realm of overclocking yet as its forcing the cpu's to use its full spec - all the time, if you start pushing it to 2.0ghz+ now running into danger zone) but as they say .. no risk ... no glory (some are getting 80/85+mbps/sec over the router based VPN on the R7800)
  21. for peace of mind (even though you may not be getting DoS attacked, it's probably something else) you may need to request a new public IP from your isp. The front line will not have a clue, so you will have to escalate it to level 2 or 3. it really depends on whether they lock your public IP to your cable modem's or ONT's mac addr. if they do, they are the only ones who can release it, or you can just return the cable modem and get a new one that has a different mac addr: tell them it's broken and you want a new modem. if you are sure they don't lock this, check the public WAN ip lease time on Duma -- in Settings, Monitoring, Connection Status -- and take note of the Lease Expires time. when it gets close (within minutes) turn off your XR and turn off your cable modem.. now wait.. how long you wait really depends on whether some new customer joins the network and gets your old public IP. even waiting a few minutes may not be enough: the modem may re-request the IP it was using and, if it's still available, get it back. some recommend turning both off for a whole day. try to make sure your off time is during business hours, when new customers are being added to the network or new modems are being activated. You may or may not luck out, it's a gamble .. technically nobody could be added that day and you just get your old public IP back, lol.
  22. yeah, i believe they are aware - but i think not only are there some quirks in device manager (both views) and QoS match tracking which need to be addressed, its netgears base firmware causing some issues to boot the issue is very apparent when using wifi extenders. it seems to work for the most part correctly when the wifi extender is in wifi 'extender' mode. client connects to wifi extender and gets a virtual mac addr, then gets allowed onto the local XR wifi. device manager is good at picking up this virutal mac and merging it with the ip of the actual device (so it knows say iphone X is ab:cc:dd:xx:xx:xx and also cc:xx:dd:xx:xx:xx with IP 192.168.1.x) -- but when extender is in AP mode there are no virtual macs assigned when connected to its wifi, and backhaul is ethernet so its just a pass thru the local LAN (mac addr is true) depending on first connection time and state, this is the issue. if device first connects to an AP, its considered on Local LAN (since pass thru to ethernet) if it jumps off AP wifi and goes to true XR wifi then there's another problem since device manager holds last location (LAN or Wifi) and it will just stay on LAN as online and won't move to wifi side, sometimes marking gets stalled since nothing is triggered until a dhcp event and there are no dhcp events if the client is still in lease time, it won't ask for a new ip. this is one of many scenarios that can start a sh*t storm since device manager can't assume device once on wifi, always on wifi with the same mac addr -- issue that's effecting here is also effecting qos tracking module, if it can't know the device is online and active how can it mark it? it doesn't, so old info may still be in there with prior marks and wrong ip (if it switched) connections get dropped (connection tracking) and it lingers with a reboot and only gets worse after each reboot as the dhcp.leases files gets more and more out of sync along with dumas tracking of them so to say. 
sometimes if the client leaves the network from wifi, but was thought to be on LAN the ip isn't being cleared by device manager so theres your "Error cannot delete device because its online" - its odd though that table view knows the ip isn't there sometimes (show's N/A) but Tree will still show the IP as online even when offline. now xbox sleeping and waking over and over and over would work in a perfect world but if anything gets some bad info it causes the lost connectivity issue. i'm pretty sure xbox sleep mode (instant on) has no issues on R7800, there is still some legacy things interfering with duma os they have not closed it all down yet. device manager does trigger some events on dhcp renew and other modules also rely on the same information gathering method, so if its wrong, you know problems happen, some may not materialize to state where a user can notice and bitch about, but clearly you see, some are the startup sequence (bootup) causes issues in larger networks where multiple devices have different lease expirations.. and if the lease time is hosed in the XR (for whatever startup reason) it can cause problems because once clients notice a connection is available it will request a renew, or just start thinking hey im ok - im in lease time, and start working as normal. 
XR kernel has not yet determined its mac addr (and may not know the true ntp time yet so ntp should be set immediately after WAN is up) so it will assign all 00's (incomplete) to that IP it sees data coming from or requested dhcp from, and when the router actually is up and running the device may hit a renew period or request again and oh no that IP is in use, give me a new one, so the router will assign a new IP address to the client, but qos markings are for the old IP - another sh*t storm, and if the IP has a weird lease time of like 1969 or 1970 forget it, its ip will never ever release for use some way the router needs to fully boot and and acquire WAN ip and internet access, duma os needs to fully start (with empty databases and tracking) and then LAN needs to start and then Wifi radio on. at different points of the startup process these things are seen for a split second as active ( and devices think everything is ok and either request dhcp or mantain what they had before the reboot then interface may get moved to br0 interface but the damage has been done to the files and databases (bad data in there) it may only effect certain devices with just bad luck timing so that's why the more devices the more chance it has to occur.. 
and if a device actually tries to renew for some it may not get an ip and just give up and set itself to 169.xx.xx.xx - sometimes it works after a reboot so people think its fixed, but its not just had a lucky start up and could occur again yeah it will result is longer boot up times before everything is started, but so be it, just let people know it may be 2,3+ minutes before internet is available on reboot, if you imagine a boot up with multiple APs (which may have 10 clients behind them) all assaulting the dhcp server as soon as it sees some activity from the gateway and certain things are not yet known to the kernel or duma theres other things too that are going on causing issue, its probably best if duma just takes control of the kernel if they just can't get it to work, too many chefs in the kitchen xr700 is just one more complication since the 10G port and the ability to aggregate or use 10G port as uplink or WAN - duma bugs need to addressed, and netgear startup needs to be optimized for all scenarios .. its hard to fix bugs until netgear startup is working optimally, lol. catch .22 .. especially if work arounds were applied for prior bugs that were corrected but now after being corrected cause new bugs, a real hair puller
  23. It's a firmware bug. Clients are jumping onto the router before the router's DHCP server for the local LAN is up and running (this includes wifi clients). I believe it's the startup sequence, with different people experiencing different things depending on ISP, LAN setup, etc.
If you have the router up and running with the latest firmware, you can attempt a nice reboot that prevents all clients from jumping on. It has to be done from a LAN client:
Turn off the router's wifi (to prevent wifi clients from jumping on).
Issue a router reboot.
As soon as you issue the reboot, wait a few seconds and disconnect all cables from behind the router (WAN and LAN ethernet). Wifi should already be off.
Router reboots .. wait until it's fully booted.
Plug in the WAN cable and wait until it's got Internet.
Wait a tiny bit after it has Internet, then plug in your local network ethernet cables, then turn wifi back on.
This is just one issue.. it can cause weird things like bad dhcpd epoch times (since dhcpd is running before the actual time is known), so IPs get locked to certain macs and never ever expire. When a device's IP then changes, the device may be associated with multiple IPs in the duma databases, and if another device happens to reconnect with its old IP without renewing its lease, dhcpd may give that IP out to someone else who requests one -- so now device manager has two different IPs, but thinks one should be for a different mac addr. For others, WAN dhcp (udhcpc) is up but local dhcp (dhcpd) is not (yet), so the client requests dhcp and gets an error (since already assigned), defaults to 169.xx and is locked there, no ip.
What needs to be done is to run through all the startup scripts on the router and make sure local LAN and wifi are held back until WAN is fully up and ntp time is set, then start dhcpd, sleep, then start lan, then start the wifi radio, and make sure to clear any saved information in the device manager databases prior to reboot. That still doesn't fix bugs like sticky IPs based on last connection type in device manager (this is when the last connection type was wifi for mac xx:xx:xx:xx:xx:xx and then becomes LAN for the same mac; the device may stick on either side, or be online or offline), but it may make your network stable enough to use for the time being. when qos devices get marked and are not thought to be online, it's a problem.
You can also turn QoS completely off in settings next to Anti-Bufferbloat (disable QoS), do the reboot sequence above and leave it like that. if all is good for a day or two (so any old info expires), then turn QoS back on and see if everything behaves. there is no way to set the dhcp lease time in XR500, so it's set to 1 day by default, although there is no reason why this can't be changed manually. the way to really know everything is working nicely is to take a look at the dhcp.leases file and make sure it looks good, with no strange epoch times, but most won't do this unfortunately. until it's fixed it will continue to some degree over time on those with affected configurations, and i believe it is part of the reason why xbox is having trouble, since it sleeps and wakes and sleeps and wakes. if at any point something goes wrong in this process and it gets marked for QoS when it's online but thought offline, it can't maintain a connection to its authentication servers, because QoS KILLS flows that it doesn't know about in TIME_OUT state and blocks ACK (there's more to it than that, but just an idea). it's watching conntrack and there's a bug there too -- I believe they will fix it, but when -?- don't know -- maybe soon!
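To make the "look at the dhcp.leases file" check less of a chore, here is a small sketch that flags odd expiry times. It assumes a dnsmasq-style line layout (expiry epoch, mac, ip, hostname, client id, separated by whitespace); the file on the XR500 may differ, and `suspect_leases`, `SANE_AFTER` and the 2015 cutoff are just names and values picked for this example.

```python
from datetime import datetime, timezone

# Anything expiring before this date was almost certainly written before the
# router knew the real time (the 1969/1970 style entries). Cutoff is arbitrary.
SANE_AFTER = datetime(2015, 1, 1, tzinfo=timezone.utc).timestamp()


def suspect_leases(text, now, max_lease_seconds=86400):
    """Return (mac, ip, expiry, reason) for leases with implausible expiry.

    Assumes each line starts with: <expiry epoch> <mac> <ip> ...
    max_lease_seconds defaults to the 1 day lease mentioned above.
    """
    suspects = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 3:
            continue
        try:
            expiry = int(fields[0])
        except ValueError:
            continue  # not a lease line in the assumed layout
        mac, ip = fields[1], fields[2]
        if expiry < SANE_AFTER:
            reason = "expiry predates clock sync"
        elif expiry > now + max_lease_seconds:
            reason = "expiry is more than one lease away"
        else:
            continue
        suspects.append((mac, ip, expiry, reason))
    return suspects
```

Paste the file contents in as `text` and pass the current epoch time as `now`; an empty result means nothing looked out of place under these assumptions.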
  24. already notified about the DNS and RTC leak. TCP isn't going to help. It's problematic for all VPN providers. It needs a modification to tunnel VPN DNS into the tun0 device. ExpressVPN does push a preferred 10.x.x.x DNS server upon connection, but Duma ignores it and uses the preferred or automatic WAN DNS for resolution. speed can be increased by setting the processor to performance mode, which netgear doesn't do (it uses ondemand) -- you can do it yourself. ExpressVPN doesn't give tcp configs for download apparently, but just for the curious, mods to config in BOLD (make sure to comment out with a # where needed):
proto tcp-client
dev tun
#fast-io
persist-key
persist-tun
nobind
remote (vpn server you want to use.com) 443
remote-random
pull
comp-lzo no
tls-client
verify-x509-name Server name-prefix
ns-cert-type server
key-direction 1
route-method exe
route-delay 2
tun-mtu 1500
#fragment 1300
mssfix 1450
verb 3
cipher AES-256-CBC
keysize 256
auth SHA512
sndbuf 524288
rcvbuf 524288
auth-user-pass
<<add your cert, etc >>
if you get an error on any line in logs, comment it out. if it connects, tcp connection is successful
  25. yes i guess you could say that, but not in routers there is a lot going on and i am by no means a master but getting a better understanding of whats being done with the advanced traffic shaping each time i follow another path putting the puzzle pieces together. there are def. some issue with the local lan <> internet <> lan <> iot when qos is on, its not all devices, i guess it depends on the connection, usually connections that go out to a server and the server comes back in on a different port or to a diff local lan device - device manager issues are only adding to it, that's a different story. something to do with when the connection is marked, and then OUTPUT and INPUT on the firewall side is dropping, maybe misidentify as DoS, dunno, its not blocking everything but blocking one important reply (ack? fin?) from coming back in from the internet to keep the connection alive , could be happening during prerouting - so the connection is just counting down 2 minutes and timing out/closing im surprised your not getting a lot of calls from people with arlo and xr500, because its effected the way arlo works is arlo base -> <local lan> -> router -> <internet> -> aws <ESTABLISHED> and the !reverse ex. say you open arlo app on phone that's on the SAME local lan ... local lan -> router -> <internet> -> aws -> <internet> ->router ->>local lan-> base station and then its screwed on the !reverse all going both directions thru wan - (u have an arlo there? i think you should be able to reproduce it and maybe track down what is being dropped by qos) sometimes connection marked before firewall sees it come back into the base station, it can't connmark packets after the 1st one (when its NEW) ,so arlo may or may not connect or after a lucky connection -- it loses the mark -- the 2 minute countdown starts and instead of renewing you will see "your arlo appears offline." when it is online. All works well when QOS is disabled. 
so qos is influencing something, no doubt. either the input or the output side is not letting an ack back through and the connection times out. it could be affecting authentication servers on iphone, i know it affected port 4433 sonicwall vpns. i sent a PM to Fraser the other day with some more things, pls read my other thread where i gave some thoughts to look at (more device manager than qos). i will let you know if i find out what it is exactly. can't spend all my time looking for it when it can be a simple ! instead of an AND, or an OR .. it can drive anyone nuts. I'll forward it to you..