The Hitchhiker’s Guide to the “Work from Home” Monitoring Galaxy

In these times of remote teamwork, the pressure on IT teams is at its peak. So how can you ensure that teams function well when working remotely? How do you ensure that the IT Ops teams can support the business as usual? VPN, office suite, critical applications, videoconferencing, etc.: the list of priorities changes and new business apps need to be added, all while your kids and their endless energy become your face-to-face office colleagues. 🙂

According to Atlas VPN user data, VPN usage increased in almost every single country in March (+112% in Italy and +53% in the United States, with an increase of over 150% estimated by the end of April). This has a direct impact, as many enterprises have to support multiple network and security technologies, which puts stress on VPN concentrators, DHCP servers, the number of SSL sockets, etc.

As the need for collaborative tools also explodes, more and more companies are changing their security setup to meet VPN demand, for example by using split tunneling.

The objective of this blog is not to go into very technical details but rather to help (at my humble level, but with the help of some colleagues) our customers by pointing to certain tools and practices to cope with an increase in remote work needs, not only to absorb internal demand but also to allow IT operations teams to work more easily remotely (someone said “distributed NOC”?).

Here are the main questions we will be addressing:

  • How do I collect the relevant data to monitor all systems’ smooth operation for remote workers?
  • Where in my environment is the next bottleneck coming up?
  • How can I share the big picture within my (remote) IT Operations team?
  • How can I take action when I’m not at my wall of screens in the NOC?

Get Data In to avoid blindness

Naturally, you are already monitoring your network, your VPN, endpoints, etc., but not that long ago it wasn’t strictly necessary to supervise in-depth details such as access to certain applications in the cloud. At the end of this blog, you’ll find a (long) list of applications and other sources of information (from our Splunkbase, or Splunk Answers, even a fresh new add-on created by my fellow colleague Matthias Maier…) that should set you up to onboard data more quickly and easily as well as monitor usage and issues.

You don’t have time to look at such a long list? Don’t despair, Splunk created a dedicated webpage listing Splunkbase Solutions for Remote Work. Our CTO, Tim Tully, and his team created Remote Work Insights (RWI), a solution composed of technical add-ons, dashboards, and connectors delivering real-time visibility across multiple disparate systems (VPN, Okta, Zoom…). RWI is available to any organization and includes free Splunk resources to understand your distributed workforce (and we made sure the dashboards in RWI rendered well in the Splunk mobile app as well).

More pressure on remote access = more risks

To save VPN resources or control costs (especially for bandwidth-hungry applications like videoconferencing), or just to deal with the lack of transport services in specific areas of the country, more and more companies are changing their remote access approach by adopting split tunneling. Microsoft has posted an interesting blog on “How to quickly optimize Office 365 traffic for remote staff & reduce the load on your infrastructure” where they recommend the use of split tunneling. This becomes a troubleshooting challenge and might impact the way you monitor your WFH (work from home) infrastructure, as your organization can no longer easily monitor web traffic on the remote device through the VPN connection.

Splitting the tunnel on the remote endpoint gives you two (or more) data paths. So, to my previous point, you might want to gather data from both paths: onboard data from your endpoint agents and, at the same time, monitor activity in depth across your online services (G Suite, Office365, Salesforce…) to ensure you can support your business even if part of the traffic is not routed via your VPN.
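To make the "two data paths" idea concrete, here is a minimal Python sketch of how a split-tunnel rule decides which path a destination takes. The network ranges and the `data_path` function are hypothetical, made up for illustration; a real VPN client applies its own routing policy, so treat this only as a way to reason about which traffic your VPN logs will (and won't) see.

```python
import ipaddress

# Hypothetical corporate ranges that stay inside the VPN tunnel.
# Everything else is assumed to leave directly via the home connection.
TUNNELED_NETWORKS = [
    ipaddress.ip_network(cidr) for cidr in ("10.0.0.0/8", "192.168.50.0/24")
]


def data_path(destination: str) -> str:
    """Return "vpn" if the destination IP falls in a tunneled range, else "direct"."""
    address = ipaddress.ip_address(destination)
    if any(address in network for network in TUNNELED_NETWORKS):
        return "vpn"
    return "direct"
```

Anything that comes back as "direct" is traffic you would need endpoint or cloud-service data to observe, since it never crosses the VPN concentrator.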

There are several options for monitoring your endpoints such as UberAgent (paid service – refer to the dedicated link section), or Nexthink (paid), but there is another option to explore: install a Splunk Heavy Forwarder (HF) or Universal Forwarder (UF) on your endpoints.

To do this, you’ll need to do the following:

  • Identify users’ critical apps and services.
  • Define the right data points to monitor.
  • Create an inputs.conf for an HF/UF and use an add-on data input, a command input, or a batch/Python script that writes the timestamp and the metric to stdout (more details on the scripting in the links section).
  • Investigate apps like the Splunk Add-on for Unix and Linux (to collect some network statistics, network interface information…) or the Curl command app (to poll data from a REST API, etc.).
  • Send it via outputs.conf to the Splunk server and build your dashboard.
  • Or simply use the new free Add-on created by my fellow colleague Matthias called “WebPingi” available on GitHub that will allow you to monitor web services from the perspective of your endpoints.
  • Connect everything to IT Service Intelligence (if you use it) to see the big picture.
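As an illustration of the scripting step above, here is a minimal Python sketch of a script that writes a timestamp and a metric to stdout, measuring the response time from the endpoint to a web service. It is not the WebPingi add-on and not an official Splunk script; the function names, the key=value field names, and the idea of passing target URLs as command-line arguments are all assumptions made for this example.

```python
import sys
import time
import urllib.error
import urllib.request


def format_event(timestamp: float, url: str, status: int, elapsed_ms: float) -> str:
    """Render one measurement as a UTC timestamp followed by key=value pairs."""
    stamp = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(timestamp))
    return f'{stamp} url="{url}" status={status} response_time_ms={elapsed_ms:.1f}'


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch the URL once and return an event line with status and elapsed time."""
    wall_clock = time.time()      # timestamp written into the event
    started = time.monotonic()    # monotonic clock for the duration
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code       # server answered with a 4xx/5xx code
    except OSError:
        status = 0                # assumption: 0 marks "no answer" (DNS, timeout, refused)
    elapsed_ms = (time.monotonic() - started) * 1000.0
    return format_event(wall_clock, url, status, elapsed_ms)


if __name__ == "__main__":
    # Hypothetical usage: target URLs are given as command-line arguments.
    for target in sys.argv[1:]:
        print(probe(target), flush=True)
```

A forwarder could run a script like this on a schedule through a scripted input stanza in inputs.conf (with an `interval` setting); how the target URLs are handed to it, whether as arguments or through a wrapper script, is left open here.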

Example of a dashboard using the WebPingi add-on to measure performance from a remote worker’s system to cloud applications

I thought about what to monitor and ingested the relevant data, now what? – The single pane of glass

Yes, the IT practitioner’s role is to look after critical applications, systems, networks, etc., but they also need to look after themselves. We still spend countless hours looking at too many tools and screens, switching from one to another. It is made much worse when your NOC/service desk “wall of screens” is now… your laptop (and the kids are still running around). Splunk IT Service Intelligence might help you see the big picture, save time and identify issues faster. Here is a mockup of a glass table to monitor what is going on in a complex WFH situation.
