Troubleshooting a router’s Monday morning sickness

Q: I have a 40-user network that has Monday-morning sickness. Every couple of weeks, most often on a Monday morning, all the computers will lose connectivity to the eight-port Linksys router and the three servers. When we try to refresh the IP addresses on the workstations we are not able to get a DHCP address.

The only workaround is to power cycle EVERY network device: cable modem, 8-port Linksys router, two 24-port ASUS 10/100 switches, three 4-port switches (used to expand the network without running additional cable drops), three servers and three network printers. The only DHCP server on the network that we are aware of is the Linksys router. Re-booting all these machines is very time consuming — 30 to 40 employees at about $20 per hour each. A half-hour to reboot all the machines on Monday morning is $300-$400 in lost productivity. Is Monday morning network sickness a common problem? What plan should I follow to find the source of the problem?

Also, we want wireless connectivity on one side of the building so I am planning to add another router for about 12 users. My thinking is that this might segment the network and may narrow down the source of the problem. Is this a good start? - GLF

A: Having problems on a Monday morning with the regularity you describe is definitely not normal. The good news is that this degree of regularity helps in diagnosing the problem.

Based on what you have described, I would initially focus on the Linksys router. For a network of your size, I would definitely recommend having one of the servers, not the router, provide the DHCP service. The router is fine for providing DHCP for a small office or home network, but it is limited in its ability to offer some of the advanced DHCP functions you’d want as your network grows.

The default DHCP address pool on the Linksys routers I have seen holds somewhere between 40 and 50 IP addresses. With 40 users plus servers and printers, a pool of that size could leave little or no headroom. Depending on the firmware revision in your router, you might be able to see how many devices currently have an address assigned by the router’s DHCP server. Compare the size of the DHCP address pool against the list of assigned IP addresses. This should tell you quickly if the pool is being exhausted.
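As a rough illustration of that comparison, here is a minimal Python sketch. The function name, the starting address, the pool size of 50 and the list of leases are all hypothetical; substitute whatever your router's DHCP settings page and client table actually show.

```python
import ipaddress


def dhcp_pool_report(start_ip, pool_size, leased_ips):
    """Compare active leases against a pool given as a start address and a size."""
    start = ipaddress.IPv4Address(start_ip)
    end = start + (pool_size - 1)
    # Count each leased address once, and only if it falls inside the pool.
    in_pool = {
        ip for ip in map(ipaddress.IPv4Address, leased_ips) if start <= ip <= end
    }
    free = pool_size - len(in_pool)
    return {
        "first": str(start),
        "last": str(end),
        "used": len(in_pool),
        "free": free,
        "exhausted": free <= 0,
    }


# Hypothetical example: a 50-address pool with every address handed out.
leases = ["192.168.1.%d" % n for n in range(100, 150)]
report = dhcp_pool_report("192.168.1.100", 50, leases)
print(report)
```

If the report shows no free addresses on a morning when the problem appears, pool exhaustion becomes the leading suspect.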

If you have experience in using a packet sniffer such as Wireshark or one of the commercially available packages, use it when turning on a workstation when things are running normally and then try it again when the connectivity problem shows up. Look at the two trace files to compare the two states. That should also help you concentrate your efforts on where the problem is.
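One way to compare the two traces is to tally the DHCP message types seen in each. The sketch below assumes you have already pulled the message-type values (DHCP option 53) out of each capture, for example by filtering on DHCP traffic in the sniffer and noting them down. The function names and the sample data are hypothetical, and the notes it produces are hints, not a firm diagnosis.

```python
from collections import Counter

# DHCP message types carried in option 53 (RFC 2132).
DHCP_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}


def summarize(type_codes):
    """Count how often each DHCP message type appears in one trace."""
    return Counter(DHCP_TYPES.get(code, "UNKNOWN") for code in type_codes)


def compare_traces(good_codes, bad_codes):
    """Return both tallies plus hints about how the failing trace differs."""
    good, bad = summarize(good_codes), summarize(bad_codes)
    notes = []
    if bad["DISCOVER"] and not bad["OFFER"]:
        notes.append("DISCOVERs go unanswered: the DHCP server may be silent, "
                     "unreachable, or out of addresses.")
    if bad["NAK"] > good["NAK"]:
        notes.append("More NAKs than in the healthy trace: the server is "
                     "refusing requested addresses.")
    if bad["OFFER"] and not bad["ACK"]:
        notes.append("OFFERs arrive but no ACK follows: the exchange stalls "
                     "after the offer.")
    return good, bad, notes


# Hypothetical data: a normal exchange versus repeated unanswered DISCOVERs.
good, bad, notes = compare_traces([1, 2, 3, 5], [1, 1, 1, 1])
for note in notes:
    print(note)
```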

Since all the computers seem to be losing connectivity around the same time, it could be related to the Linksys router getting swamped and unable to keep up with so many simultaneous DHCP requests. Based on what you find, the short-term solution could be to increase the size of the IP address pool. When you restart the router, it clears its list of assigned addresses, so every device again stands a chance of getting an IP address.

Network World Canada

Since you are running in a switched environment, to capture traffic with a packet sniffer you will need to use either a hub (not a switch) or an Ethernet tap such as the Barracuda Tap or Netoptics 10/100 Teeny Tap (both of which are reasonably priced and readily available). Put it between the network and a computer so that you can see the traffic you are looking for. When the problem shows up again, try manually assigning an IP address, subnet mask and default gateway to a computer that shows the connectivity problem. Use an address that is outside the range being assigned by the Linksys router. For the DNS server, use the IP address of the Linksys router.
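To pick a manual test address, it helps to check that the candidate sits inside your subnet but outside the router's DHCP range. The Python sketch below shows one way to do that check. The subnet, pool start, pool size and reserved addresses are hypothetical placeholders, and the function cannot tell whether some other device is already using the address statically.

```python
import ipaddress


def is_safe_static(candidate, network, pool_start, pool_size, reserved=()):
    """True if candidate is a usable host address outside the DHCP pool."""
    net = ipaddress.IPv4Network(network)
    ip = ipaddress.IPv4Address(candidate)
    start = ipaddress.IPv4Address(pool_start)
    end = start + (pool_size - 1)
    # Must be a host address on the local subnet.
    if ip not in net or ip in (net.network_address, net.broadcast_address):
        return False
    # Must not collide with addresses the router hands out.
    if start <= ip <= end:
        return False
    # Must not collide with known fixed devices (router, servers, printers).
    return ip not in {ipaddress.IPv4Address(r) for r in reserved}


# Hypothetical layout: router at .1, DHCP pool of 50 starting at .100.
ok = is_safe_static("192.168.1.200", "192.168.1.0/24",
                    "192.168.1.100", 50, reserved=["192.168.1.1"])
print(ok)
```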

If you can access the Internet and other services on your network, this would seem to confirm that the Linksys router is where you need to look. The next time you have this problem, try restarting just the Linksys router and see if you can then acquire an IP address. If either restarting the Linksys router or manually assigning an IP address works, try increasing the size of the IP address pool assigned by the Linksys router.

While you are going through the troubleshooting process, make sure that your Linksys router is using the latest firmware. Although I don’t believe old firmware is causing the problem, updating can’t hurt. Also, make sure that the Linksys router and the network switches are plugged into some type of surge protector, because there’s a chance an electrical power surge might also be a contributing factor.

Also, the next time the problem occurs, see if you can spend a bit of extra time for step-by-step troubleshooting, such as resetting just the Linksys router and then seeing if the problem seems to go away. Resetting everything at once may help bring the network back up more quickly, but it can hide the real source of the problem. Another thing to try: Get on a working computer and ping the IP addresses of various devices on the network that are connected to different switches. If it’s not the Linksys router at fault, this could help identify the problem device.
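The ping test can be scripted so that results are grouped by the switch each device hangs off. This is a minimal sketch that calls the operating system's own ping command. The switch names and addresses are hypothetical, and the demonstration at the bottom uses a stand-in ping function so it runs without touching a network.

```python
import platform
import subprocess


def ping(host, timeout_s=2):
    """Send one echo request with the system ping command; True on success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Exit-code semantics vary between platforms; treat 0 as a hint only.
    return result.returncode == 0


def sweep(devices_by_switch, ping_fn=ping):
    """Ping every device and keep the results grouped by switch."""
    return {
        switch: {host: ping_fn(host) for host in hosts}
        for switch, hosts in devices_by_switch.items()
    }


def suspect_switches(results):
    """Switches where no listed device answered at all."""
    return [s for s, hosts in results.items() if hosts and not any(hosts.values())]


# Hypothetical inventory; the fake ping stands in for a real network.
devices = {
    "switch-A": ["192.168.1.10", "192.168.1.11"],
    "switch-B": ["192.168.1.20", "192.168.1.21"],
}
down = {"192.168.1.20", "192.168.1.21"}
results = sweep(devices, ping_fn=lambda host: host not in down)
print(suspect_switches(results))
```

On the real network, call `sweep(devices)` without the `ping_fn` argument. A switch whose devices all fail to answer is a good place to look next.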

As to the wireless part of your question, you don’t need to use a router; you can use a switch. You might want to think about moving to a more sophisticated Ethernet switch that supports VLANs, where you can specify which subnet a particular port is assigned to. That will help keep broadcast traffic away from switch ports that don’t need it.
