Charlie Schluting, Author at Enterprise Networking Planet
https://www.enterprisenetworkingplanet.com/author/charlie-schluting/

Sorting Out the Debate Over Cloud Computing
https://www.enterprisenetworkingplanet.com/management/sorting-out-the-debate-over-cloud-computing/ Thu, 29 Apr 2010

There are two extreme camps on either side of the cloud computing argument. One says everything is moving to the cloud, IT departments will be slashed, and businesses will save millions. The other says nothing will, due to various concerns such as security, performance and customization ability. Most businesses already use some cloud-based services, depending on how loosely “cloud” is defined.

First, we must identify what exactly is meant by “the cloud.” There are two major aspects: hosted applications and infrastructure. Both are a form of outsourcing, but the implications are vastly different. We’ll spend a bit of time explaining the differences before exploring the two sides of the cloud debate.

Applications hosted and run by another organization are said to be cloud-based services. Your bank probably provides a portal for managing various accounts. Your payroll company or check printing house provides a mechanism by which you can access their application, via a Web site. Even the most anti-outsourcing businesses use at least a few applications, via the Web, to communicate with partners and service providers.

Taking that one step further, today’s Software as a Service (SaaS) models represent the true spirit of cloud-based applications. Instead of buying a CRM product, for example, and installing it on your servers, you can now simply use the vendor-provided Web access and get all the functionality without the hassle. Moving to hosted applications saves IT staff time, capital budgets, and hassle. It makes a great deal of sense to let the people who wrote the software host it for you, since they are the experts.

Many cloud-only service providers have cropped up lately to fulfill a market demand for hosted applications. Salesforce is the premier example, and they have certainly proven that there is space in the market for SaaS companies to re-implement traditional software as hosted solutions.

Infrastructure as a Service is computing outsourcing. Amazon EC2 provides the infrastructure where you can run operating system instances just as you would do on physical servers in a data center. Except you don’t buy servers, you rent time on the cloud. The major selling point is that you can scale up computing resources very quickly with a cloud infrastructure, as opposed to purchasing and installing new servers yourself, but that point is also heavily debated.
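
As a rough illustration of what “renting time” looks like in practice, here is a minimal sketch using Amazon’s classic EC2 command-line API tools; the AMI ID, key pair name, and instance ID are placeholders, and the exact commands depend on the toolkit version you have installed.

  # Launch one small instance from a placeholder image, then watch and clean up.
  ec2-run-instances ami-12345678 -t m1.small -k my-keypair -n 1
  ec2-describe-instances                 # poll until the new instance reports "running"
  ec2-terminate-instances i-87654321     # stop paying for it once the work is done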

The infrastructure component is what most IT people are referring to when they mention clouds. They are struggling with the options: use EC2, implement a compatible private cloud themselves, or skip it altogether.

The cloud infrastructure is essentially a better way to manage virtualization. It is also, some would say, “the right way to run servers.”

Everything Is Moving

The promise of mass deployability of new operating system instances has thankfully forced the IT world as a whole to really embrace automation and configuration management. Long-time sysadmins often scoff at the marketing buzz around the cloud, because they accomplished automated deployment and ongoing configuration management of running systems long ago. The cloud concept is simply another smart layer, one that balances resources and possibly auto-migrates virtual machines when resource usage changes.

This is a good thing! It means that everyone agrees we’re headed in the right direction. The right way to manage systems is with automated virtual machines, and the cloud concept is brilliant.

The “everything is moving to a public cloud” camp believes it makes no sense to run your own servers. Data centers, cooling, power — it all is a waste of time and money, unless of course you’re in the hosting business. Large-scale providers can do it all much more efficiently.

It’ll Never Work

Stability is a problem. If Amazon has taught us anything, it’s that they aren’t very good at this cloud thing, or redundancy. Their storage infrastructure, S3, has had many outages and consistently underperforms. Why would you move your important business computing needs to a single provider? Nobody is immune to outages, regardless of how redundant they are.

Security is another major concern. If a critical piece of S3 infrastructure becomes compromised, all businesses using S3 will have a major data breach. The servers running EC2 images, too, are a target for criminal hackers. It makes more sense to hide critical servers behind a corporate firewall that can be audited, rather than rely on the security of each individual EC2 instance (and Amazon’s infrastructure).

Final Thoughts

At least, that’s the argument. Personally, I don’t buy the security argument, but stability, reliability and performance are highly suspect.

Another major point of contention is public versus private clouds. If you choose to run your own cloud infrastructure, you need to purchase more computing capacity than you currently need. To scale on demand, capacity needs to be available. I think you will find that most businesses are currently in a half-virtualized state. They aren’t running cloud-like infrastructure, but a large portion of their servers run as virtual machines. The utilization of those servers is higher (in a good sense) than it has ever been, but there are still a lot of applications running on bare metal.

As has already been happening, businesses will continue to move away from service- or server-based mentalities, toward application-specific uses. Applications can easily be segregated onto individual virtual machine instances, which provides security and manageability benefits (assuming proper automation exists). Those virtual machines can be run the “traditional way,” or they can be part of a cloud infrastructure that eases management even further.

For many larger businesses, regardless of their position in the cloud debate, it makes the most sense to implement a private cloud infrastructure that is compatible with EC2. If some instances need to be migrated to a public cloud in the future, and the security department OKs it, you’re ready to do so.

Business Has Killed IT With Overspecialization
https://www.enterprisenetworkingplanet.com/management/business-has-killed-it-with-overspecialization/ Thu, 08 Apr 2010

What happened to the old “sysadmin” of just a few years ago? We’ve split what used to be the sysadmin into application teams, server teams, storage teams, and network teams. There were often at least a few people, the holders of knowledge, who knew how everything worked, and I mean everything. Every application, every piece of network gear, and how every server was configured — these people could save a business in times of disaster.

Now look at what we’ve done. Knowledge is so decentralized we must invent new roles to act as liaisons between all the IT groups. Architects now hold much of the high-level “how it works” knowledge, but without knowing how any one piece actually does work. In organizations with more than a few hundred IT staff and developers, it becomes nearly impossible for one person to do and know everything. This movement toward specializing in individual areas seems almost natural. That, however, does not provide a free ticket for people to turn a blind eye.

Specialization

You know the story: Company installs new application, nobody understands it yet, so an expert is hired. Often, the person with a certification in using the new application only really knows how to run that application. Perhaps they aren’t interested in learning anything else, because their skill is in high demand right now. And besides, everything else in the infrastructure is run by people who specialize in those elements. Everything is taken care of.

Except, how do these teams communicate when changes need to take place? Are the storage administrators teaching the Windows administrators about storage multi-pathing, or, worse, logging in and setting it up because it’s faster for the storage gurus to do it themselves? A fundamental level of knowledge is often lacking, which makes it very difficult for teams to brainstorm about new ways to evolve IT services. The business environment has made it OK for IT staffers to specialize and only learn one thing.

If you hire someone certified in the application, operating system, or network vendor you use, that is precisely what you get. Certifications may be a nice filter to quickly identify who has direct knowledge in the area you’re hiring for, but often they indicate specialization or compensation for lack of experience.

Resource Competition

Does your IT department function as a unit? Even 20-person IT shops have turf wars, so the answer is very likely, “no.” As teams are split into more and more distinct operating units, grouping occurs. One IT budget gets split between all these groups. Often each group will have a manager who pitches his needs to upper management in hopes they will realize how important the team is.

The “us vs. them” mentality manifests itself at all levels, and it’s reinforced by management having to define each team’s worth in the form of a budget. One strategy is to illustrate a doomsday scenario. If you paint a bleak enough picture, you may get more funding, but only if you are careful to illustrate that the failings are due to a lack of capital resources, not management or people. A manager of another group may explain that they are not receiving the correct level of service, so they need to duplicate the efforts of another group and just implement something themselves. On and on, the arguments continue.

Most often, I’ve seen competition between server groups result in horribly inefficient uses of hardware. For example, what happens in your organization when one team needs more server hardware? Assume that another team has five unused servers sitting in a blade chassis. Does the answer change? No, it does not. Even in test environments, sharing doesn’t often happen between IT groups.

With virtualization, some aspects of resource competition get better and some remain the same. When first implemented, most groups will be running their own type of virtualization for their platform. The next step, I’ve most often seen, is for test servers to get virtualized. If a new group is formed to manage the virtualization infrastructure, virtual machines can be allocated to various application and server teams from a central pool and everyone is now sharing. Or, they begin sharing and then demand their own physical hardware to be isolated from others’ resource hungry utilization. This is nonetheless a step in the right direction. Auto migration and guaranteed resource policies can go a long way toward making shared infrastructure, even between competing groups, a viable option.

Blamestorming

The most damaging side effect of splitting into too many distinct IT groups is the reinforcement of an “us versus them” mentality. Aside from the notion that specialization creates a lack of knowledge, blamestorming is what this article is really about. When a project is delayed, it is all too easy to blame another group. The SAN people didn’t allocate storage on time, so another team was delayed. That task is on the project timeline, so all work halts until the hiccup is resolved. Having someone else to blame when things get delayed makes it all too easy to simply stop working for a while.

More related to the initial points at the beginning of this article, perhaps, is the blamestorm that happens after a system outage.

Say an ERP system becomes unresponsive a few times throughout the day. The application team says it’s just slowing down, and they don’t know why. The network team says everything is fine. The server team says the application is “blocking on IO,” which means it’s a SAN issue. The SAN team says there is nothing wrong, and other applications on the same devices are fine. You’ve run through nearly every team and still don’t have an answer. The SAN people don’t have access to the application servers to help diagnose the problem. The server team doesn’t even know how the application runs.

See the problem? Specialized teams are distinct and by nature adversarial. Specialized staffers often relegate themselves into a niche knowing that as long as they continue working at large enough companies, “someone else” will take care of all the other pieces.

I unfortunately don’t have an answer to this problem. Maybe rotating employees between departments will help. They gain knowledge and also get to know other people, which should lessen the propensity to view them as outsiders.

RiverMuse Brings Your Legacy NMS Into the Present
https://www.enterprisenetworkingplanet.com/management/rivermuse-brings-your-legacy-nms-into-the-present/ Mon, 22 Mar 2010

RiverMuse has taken on the task of making legacy enterprise monitoring systems useful in today’s IT environment. No, it’s not rewriting an HP OpenView or IBM Tivoli; instead, RiverMuse has found a nice place to sit and provide even more value: as a Manager of Managers. It’s so much more than just another layer on top of existing monitoring systems, however.

RiverMuse started in 2008 as a combination of some of the remaining innovators from Riversoft and Micromuse. Micromuse acquired the struggling Riversoft in 2002, and IBM acquired Micromuse for considerably more money in 2005. Before that, Riversoft had essentially become HP OpenView advanced edition, and with the IBM acquisition Micromuse became IBM Tivoli Netcool. Needless to say, this company has pedigree. So, what does this software do? It’s complex; that is to say, really difficult to explain: open source manager of managers, uber event correlation, fault message aggregator, message filter and enhancer. All of those descriptions are true. None of them really explains it, either. Instead, let’s start by defining a few of the problems.

Current Problems in Enterprise Systems Monitoring

Continuous change is prevalent in today’s IT world more than it ever has been. Administrators deploy new hardware seamlessly, but that’s not even half of the story. In virtualized environments, virtual machines can spring up as the result of just a few simple commands. They can also physically move, which wreaks havoc on many monitoring and reporting systems.

Agility, then, is a weak point. What happens to your monitoring system when a node physically moves? Is all the historic data associated with the old MAC or IP address re-associated properly? Or does it linger behind and get associated with the new node to occupy that space? Does your monitoring system even know when virtual machines migrate to different physical hardware?

The lack of agility and ability to handle continuous change means that the two popular enterprise monitoring systems have blind spots. Silent failures can happen if the monitoring system fails to properly recognize how the environment has changed.

Finally, there is the cost. Large IT departments are still paying in excess of six figures for maintenance contracts on this aging monitoring software. Don’t forget the specialized staff required to run these systems.

That is not to say RiverMuse can replace expensive and antiquated network management systems. Instead, it sits and gathers information from your NMS as well as other sources, and from this vantage point can make intelligent decisions about event and fault management. Let’s take a look at what it actually does.

RiverMuse’s Place in the Infrastructure

To make sure we aren’t wrongly portraying RiverMuse as an add-on to the big two NMS providers, let’s talk about the open source solutions briefly.

What is the number one complaint with the open source monitoring system Zenoss? It’s the noise. You cannot even define parent-child relationships to stop Zenoss from alerting on 100 hosts when a switch has failed. With the other big option, Nagios, the main complaint is configuration. Alerting, and even adding new nodes, is a manual, time-consuming process. What if you could just send all alerts and data from these systems to a smarter system? With RiverMuse being a manager of managers, this is possible.

RiverMuse correlates events from monitoring systems, applications, and infrastructure components. This means that you aren’t constrained by information from only a few of the critical sources any longer. SNMP traps can be sent directly to RiverMuse from various infrastructure and application sources. Your current NMS can likely do that, but RiverMuse can also digest syslog and Windows events. This means event correlation goes much deeper and that actually identifying the root cause of outages is possible from within the alerting system. RiverMuse, then, is almost like a Splunk but with more enterprisey features and robust alerting capabilities.
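
As a rough illustration of what feeding events to such a collector can look like (this is generic plumbing with standard tools, not RiverMuse’s documented configuration, and the collector hostname is made up):

  # Send a test SNMP trap (linkDown) to a hypothetical collector using the net-snmp tools:
  snmptrap -v 2c -c public rivermuse.example.com '' 1.3.6.1.6.3.1.1.5.3
  # Forward syslog as well by adding a line like this to /etc/syslog.conf:
  #   *.info   @rivermuse.example.com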

RiverMuse will then enhance the information it gathers by taking data from configuration management databases. For example, getting an alert from your NMS that says “node web2347 is down!” is not very useful. Sysadmins even struggle with remembering whether or not they should be in panic mode. Business users and NOC employees less familiar with hostnames in an infrastructure certainly aren’t going to know if that alert means anything.

Finally, once events have been given additional business information, they get filtered. This is where RiverMuse augments many half-baked NMS solutions. It de-duplicates, filters, and summarizes only events that matter before sending them on to operations staff for immediate attention. Some NMSes do this fairly well, and some completely ignore this. RiverMuse, however, owns this functionality as its core function, and it uses more information than is available within a traditional NMS to make its decisions.

Configuring these types of systems is going to be complex. The data and rules RiverMuse is dealing with can only be automated so much before it starts to make guesses about individual business requirements. It does, based on the demos we’ve seen, make this process very easy. We all know that a dedicated FTE is required to manage the big two NMSes. This time is often spent updating the NMS with new information about network changes and fighting with alerting rules. With RiverMuse’s vision, it seems entirely possible that after deployment this staff time will be freed up to focus more on the business logic of how the infrastructure interacts.

RiverMuse is partially open source. RiverMuse core includes all the basic components, but seems to be fairly bare-bones compared to the pro offering. But hey, RiverMuse says it has “gifted” this to the open source community. Misunderstandings of “open source” aside, this is a fascinating system built by a fascinating company made up of veterans in the NMS space. It’s worth a look.

Move Your E-Mail Hosting to the Cloud
https://www.enterprisenetworkingplanet.com/management/move-your-e-mail-hosting-to-the-cloud/ Mon, 01 Mar 2010

The thought of switching a large (or even small) business’s e-mail to hosted providers like Google or Microsoft often meets with a lot of resistance. You might think security would rank as the number one concern, but in practice the big show stopper usually ends up being implementation. Moving millions of old e-mail messages and thousands of accounts is a daunting task.

Migrating all of your existing e-mail doesn’t have to be difficult. The big three providers, Google, Microsoft, and Yahoo, all provide many options to help alleviate the pain associated with moving an entire company’s e-mail infrastructure.

Why?

The decision to outsource e-mail, even to a free provider, is a big one. Once all the naysayers have been satisfied with regard to performance, security, and maintenance, the next big step is to devise a migration mechanism that won’t break and won’t give anyone reason to question the decision.

To create a seamless migration from your local mail storage to the cloud, user accounts must be moved intact. It’s more than just the account (user and password), however. Every single mail folder must be migrated, and message timestamps need to be preserved. In short, users should not be able to detect that anything has changed.

The savings of not running 50 or more servers won’t be realized if account migration and routine maintenance tasks aren’t carefully considered. The big providers have APIs for creating, deleting, and generally adjusting user accounts. Most organizations have identity management systems that create accounts on various systems, including e-mail servers. Before making the big switch, make sure to extend these automation tools to begin managing e-mail users via the new provider’s API.
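
As a purely hypothetical illustration of what that hook might look like (the endpoint, credentials, and fields below are invented; substitute the provisioning calls your provider actually documents), the account-creation step often reduces to a single authenticated HTTP request:

  # Hypothetical provisioning call -- replace with your provider's real API.
  curl -u admin:secret -X POST https://provider.example.com/api/v1/users \
       -d 'username=jdoe' -d 'quota=25GB'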

How?

There are four basic steps, once all the planning and other tasks mentioned above are complete. They are:

  1. Calculate how much time the actual transfer of mail boxes will take.
  2. Create the user accounts in bulk via your new provider’s API or bulk upload function.
  3. Change MX records in DNS to point at your new provider.
  4. Transfer the mail.

Calculating how much time it will take to transfer e-mail is critical, and difficult. If users are actively using e-mail, some “sent mail” may be left behind if the timing is off, so it is important to plan when to point users at the new service. Say you have 4,000 e-mail accounts, averaging 1GB each, and your provider will allow 100 transfers per batch. That works out to 40 batches; if each batch takes roughly an hour, the migration will take about 40 hours. Ideally, don’t let users access e-mail during this time, at all.

Next, you will want to create the user accounts at your provider. It is recommended that you initially set the password to something you know, to make sure you have as many options as possible for the transfer mechanism. We’ll cover that in a moment.

You will want to update MX records so that new mail ends up at the provider, but only after the user accounts exist, and preferably during a large window of time when your users are dormant. Make sure to lower the TTL on those records well ahead of time so the changeover is quick.

Finally, the transfer itself. With most providers, there are three options for transferring all e-mail (and folders):

  1. Have users do it themselves
  2. IMAP automation, using provider tools or your own scripts
  3. API insertion of mail

Having your users do it themselves really only works for small offices full of fairly technical people. Nevertheless, this is how: simply add the new IMAP account from your provider to a mail client, and drag & drop folders between accounts until everything has been copied over.

IMAP automation is the preferred method. There are two options here: do it yourself with imapsync, or use the provider’s tools. Google, for example, provides IMAP mail migration for Education and Premier Google Apps accounts. You must provide Google with remote access to your IMAP server, including a list of usernames and passwords for each user to migrate. Once it has that, Google’s sync tool will automatically transfer all IMAP folders over.
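
For the do-it-yourself route, a single-mailbox copy with imapsync looks roughly like the following; the hostnames, accounts, and passwords are placeholders, and you should verify the options against your imapsync version before scripting a whole batch:

  # Copy one mailbox from the old server to the hosted provider; wrap this in a loop
  # over your user list to process a whole batch.
  imapsync --host1 imap.oldserver.example.com --user1 jdoe --password1 'oldpass' \
           --host2 imap.gmail.com --ssl2 --user2 jdoe@example.com --password2 'temppass'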

Getting your users’ passwords can be tricky. The best method is to get them to “register” for their new mail account, at which time you can gather the plain text version of their current password. If you’re authenticating users via LDAP, however, there is a cool way to work around this. Simply write a script that records a user’s password hash and then sets their password to something you know. Provide that known password to Google, and then restore their previous hash once the migration is complete.
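
A rough sketch of that hash-swap trick with the OpenLDAP client tools follows; the DNs, the userPassword attribute, and the admin bind are assumptions about your directory layout:

  # 1. Save the user's current password hash (requires an ACL that lets the admin read it):
  ldapsearch -x -D "cn=admin,dc=example,dc=com" -W -b "ou=people,dc=example,dc=com" \
             "(uid=jdoe)" userPassword > /secure/jdoe-pw.ldif
  # 2. Set a temporary password you can hand to the migration tool:
  ldappasswd -x -D "cn=admin,dc=example,dc=com" -W -s 'TempMigrate1' \
             "uid=jdoe,ou=people,dc=example,dc=com"
  # 3. After the mailbox copy finishes, restore the saved hash with ldapmodify,
  #    replacing userPassword with the value captured in step 1.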

Finally, most providers have an API to facilitate creating accounts and other administrative tasks. They might also allow for the transfer of e-mail via API. If this is the case with your new provider, API transfer of e-mail folders is a good idea. You’re going to need to know the provider’s API to create or delete accounts anyway, so this is a good time to practice.

What Won’t Work

Mail filters are the big issue. Unix-based procmail or sieve filters will be lost. If you’ve historically provided a Web interface to allow users to create filters, this one will probably bite you. Users that run Outlook or other mail clients, however, probably have defined filters from the client side. If filters were created server-side, you’re probably going to lose them, since Google doesn’t provide a migration path for filters.

Zap Provides Open Source Wireless Testing
https://www.enterprisenetworkingplanet.com/os/zap-provides-open-source-wireless-testing/ Wed, 27 Jan 2010

“Zap” is a wireless performance tool, previously used for internal development and testing by Ruckus Wireless. Ruckus has released the Zap source code under a modified BSD license to provide the tool to the world, and hopefully spur development of this and other related analysis tools. Zap measures performance, statistically, to provide insight into the true nature of how a network can perform.

Before Zap, a wireless network administrator might have done some manual calculation to determine a best-guess baseline of how a network would perform. Check to make sure there aren’t too many overlapping channels that may cause interference, carefully plan the layout, and hope. When testing time comes, we might run a large file copy with rsync or even use TTCP, which will tell us, on average, how fast the data transferred. This is fine for bulk transfers, and it may highlight an obvious problem in the network, but network-intensive applications today require more certainty.

How It Works

Testing wireless is difficult because the tester has no ability to control the environmental factors. We might run a crude test, like a file copy, multiple times and get multiple answers. Those answers can vary by megabits per second, which isn’t exactly precise.

Zap lets network designers determine the sustained worst-case performance a network can deliver 99.5 percent of the time. Zap does this by sending controlled bursts of data, measuring loss, latency, and other factors that can cause fluctuations in performance. Learning the worst-case sustained rate provides network engineers with the tools to evaluate whether streaming video or other demanding applications will run effectively on the existing network.

Getting Zap

At first, it seemed Ruckus was hopping on the marketing bandwagon that is open source. Its choice of license, however, indicates otherwise. Ruckus released Zap under a great license, the BSD, but simplified a bit. Basically, you’re free to use or even sell it, but keep the copyright notice intact and don’t imply Ruckus is endorsing your product or service.

To get Zap, visit its Google Code page for subversion checkout information:

http://code.google.com/p/zapwireless/source/checkout

Part of the reason for that awkward first impression is that Zap lacks documentation. There is no README file or any other trace of documentation in the checked-out code. There are good comments in the source code, but you shouldn’t have to look there. On the Google Code page there is a Word document in the Downloads section, but it is mainly instructions for running Zap on Windows. Windows users, however, don’t get easy access to the binaries.

Anyway, Linux users. We get the source and run ‘make’ to compile Zap. There are a few warnings, but it seems to compile and run fine. Being the Curious George most of us are, we also find that turning on all warnings in gcc (-Wall) yields 78 compile warnings. Regardless, it runs and does its job, and none of the warnings seem to indicate that a grave mathematical error has been made.

In addition to the missing README file, ‘make install’ does not install any man pages. In fact, there are no man pages to speak of. This clearly isn’t Linux software, but it’s useful nonetheless, so let’s move on.

Output

The Windows documentation indicates that Zap has two components: zap and zapd. Running Zap on one node as the server requires simply invoking ‘zapd’ to start the service. Zap, then, requires you to specify the source and destination IP addresses. Running ‘zap -h’ or any other option it doesn’t understand will conveniently spit out syntax help information.

We first tried to run zapd on one machine, and then zap on the same. This produced a segmentation fault, so maybe testing the loopback interface performance isn’t implemented.

Running zap on two machines led us to discover it isn’t a server/client type of scenario. In reality, you need to run zapd on both machines, and then run zap to instruct them to communicate. This is weird, at best, but keep in mind that this software is generally running on their embedded devices, so it makes some sense.

With zapd running on both nodes, we ran: 

‘zap -s10.1.0.69 -d10.1.0.70 -X30’

The -X30 option instructs Zap to terminate after thirty seconds.

Every burst outputs a line, but the very last line of output is what you’re after:

  274: 10.1.0.69->10.1.0.70 1071333=rx   0=dr   0=oo   0=rp  3880=rx in    50.1ms   912.8mbps   916.8 | 944.3 926.3 883.6 871.4 863.1 856.7

Left to right, here is what this all means:

  274: this was the 274th burst test
  10.1.0.69: the source (sending) station
  10.1.0.70: the destination (receiving) station
  1071333=rx: number of received packets
  0=dr: number of dropped packets
  0=oo: number of out-of-order packets
  0=rp: number of retried packets
  3880=rx in: packets received in this batch/sample
  50.1ms: batch sample time
  912.8mbps: throughput for this burst
  916.8: cumulative average throughput
  944.3: peak throughput observed
  926.3: median throughput (50th percentile)
  883.6: 90th percentile throughput, i.e., it was better than this 90% of the time
  871.4: 95th percentile
  863.1: 99th percentile
  856.7: 99.5th percentile

As you can see, Zap is fun to use on wired networks, too.

The 99.5 percent number basically tells you that you can depend on the network to deliver that speed almost always. This is especially useful for IPTV traffic, where it is important to be able to sustain a minimum rate to keep the video flowing. 

In summary, Zap provides insights into the true nature of how a wireless network will perform. The unique method of sending bursts really puts wireless (and wired) networks through their paces and demonstrates what real-world throughput performance you can expect. It’s great to have this tool; we just wish it were a little cleaner so that it might one day end up in various Linux distributions.

WAN Optimization the Open Source Way
https://www.enterprisenetworkingplanet.com/management/wan-optimization-the-open-source-way/ Fri, 22 Jan 2010

WAN optimization is a complex and expensive, yet sometimes required investment. Even if you aren’t running a branch office in Africa over ISDN, the need for WAN optimization and acceleration exists within nearly every business. The problem is that these products are extremely expensive. Wouldn’t it be great if the same functionality could be accomplished with commodity PC hardware and free open source tools?

Mostly, it can be done.

What Can’t Be Done

Riverbed and other vendors implement Wide Area File Services (WAFS), which is a fancy way to say it caches CIFS and NFS data. If multiple people are working on the same file, or if the same file gets opened and closed more than once, that data does not really need to be shipped to the remote file server. It’s even fancier than that; long-term caching of files also makes opening Word documents, which require a lot of bi-directional communication just to open, much faster. WAFS implements a (generally) safe mechanism for caching data when sending it over the WAN would be redundant. General caching proxies are not optimized for file sharing data, and will often have to send the whole thing, whereas the WAFS-style devices can be much more clever about it.

There are, unfortunately, no open source tools to create a system like this, but WAFS is only one (albeit powerful) method for optimizing the WAN. Most businesses can realize substantial performance and usability improvements by leveraging QoS, caching proxies, and compression available in various open source tools. Jumping straight into a commercial WAFS solution is not recommended; the pricing is staggering, you may not even need WAFS-like features, and the architectural limitations of WAFS are sure to put a damper on your plans.

QoS and Queuing With FreeBSD or Linux

A big part of making a congested WAN link usable is prioritization. Especially if VoIP traffic traverses the WAN link! The good news is that Linux and the BSD family of operating systems can employ effective QoS and traffic shaping / queuing (shaping is accomplished by queuing packets). The prioritization aspect comes in when the kernel is deciding which traffic to allow through when some is queued.

Things can get very complex, very quickly, when configuring QoS. There are different methods for classifying and queuing traffic, as well as for determining how to stall (or kill) non-priority traffic. Ultimately, it certainly is possible to deploy a firewall with traffic shaping such that VoIP always works, Web browsing to internal applications gets priority over Web browsing to the Internet, and whatever other critical traffic you may have is given the proper consideration. Understanding how to classify packets is required to configure even a SOHO-class router with these features, so in the end it’s worth doing it in Linux or FreeBSD to get the full feature set.
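
As a small taste of what this looks like on Linux, here is a hedged sketch using tc and HTB that reserves bandwidth for VoIP; the interface name, the rates, and the assumption that VoIP is marked with DSCP EF are placeholders for your own environment, and a production setup will need more classes and filters.

  # Root HTB qdisc with a 2mbit ceiling; unclassified traffic falls into class 1:20.
  tc qdisc add dev eth0 root handle 1: htb default 20
  tc class add dev eth0 parent 1:  classid 1:1  htb rate 2mbit ceil 2mbit
  tc class add dev eth0 parent 1:1 classid 1:10 htb rate 512kbit ceil 2mbit prio 0   # VoIP
  tc class add dev eth0 parent 1:1 classid 1:20 htb rate 1mbit   ceil 2mbit prio 1   # everything else
  # Match DSCP EF (0xb8 in the ToS byte, ECN bits masked) and steer it into the VoIP class:
  tc filter add dev eth0 parent 1: protocol ip u32 match ip tos 0xb8 0xfc flowid 1:10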

Caching Proxies

Web traffic, even to internal servers, can be drastically reduced by using a caching HTTP proxy. Images and other large items can be cached locally using the standard Squid or Varnish proxy servers. Instead of using the WAN link to fetch images on remote Web and application servers every time a user clicks, the content is served from the local cache. All HTTP traffic can be transparently redirected through a proxy without any client-side configuration. HTTPS traffic can be proxied as well, but the Web browsers will need to be configured to use the proxy.
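
Here is a minimal sketch of the transparent approach on a Linux gateway running Squid; the LAN interface name is an assumption, and the exact directive differs by Squid version (“transparent” in 2.x, “intercept” in newer releases).

  # squid.conf (on the branch gateway):
  #   http_port 3128 transparent
  #   cache_dir ufs /var/spool/squid 2048 16 256
  # Redirect outbound web traffic from the LAN interface into the local proxy:
  iptables -t nat -A PREROUTING -i eth1 -p tcp --dport 80 -j REDIRECT --to-port 3128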

Squid can also proxy FTP traffic, or any other protocol it is configured to work with. This is beneficial for sites that may have a custom application and communication protocol that doesn’t use HTTP or FTP. The vast majority of use cases, though, simply require HTTP caching to realize a huge decrease in the amount of WAN traffic.

Before we talk about the two solutions that implement caching, compression, and other tricks, take note that another partial solution exists. In addition to providing security, OpenVPN can also employ compression. It runs in user-space, so latency will increase a tad, but when you need to physically ship less data (or risk saturating the link), OpenVPN is a good option.
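
Enabling that compression is a one-line change on each end of an existing tunnel; the configuration file paths below are placeholders, and the comp-lzo option must match on both sides.

  echo "comp-lzo" >> /etc/openvpn/branch.conf       # branch-office side
  echo "comp-lzo" >> /etc/openvpn/headoffice.conf   # head-office side
  /etc/init.d/openvpn restart                       # restart both daemons afterward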

WANProxy

WANProxy is a generic (as in flexible) TCP proxy. It can be deployed transparently to compress all TCP traffic between two endpoints. WANProxy can be used to filter data through a Squid proxy instead of the standard method of redirecting traffic using iptables rules in Linux. The benefit of filtering traffic through WANProxy as well is that you also get compression. And finally, WANProxy caches some data in RAM, so that duplicate data doesn’t have to be re-sent over the WAN.

With the combination of WANProxy and Squid, a random remote file, say a Word document, will be transferred over the WAN (compressed) in full the first time someone in the office opens it. The second time, however, it gets served from local cache using no WAN bandwidth and providing immediate response to the end user.

Traffic Squeezer

Traffic Squeezer does all that and more. In addition to compression and QoS, it also provides traffic coalescing and protocol-specific acceleration. Caching can be had by combining Traffic Squeezer with Squid, or you can even throw both WANProxy and Squid into the mix.

Traffic coalescing refers to the sending of multiple packets as one, to reduce the impact of protocol overhead and small packets. Traffic Squeezer coalesces every packet, and the benefits add up dramatically. See their documentation on coalescing for some great illustrations of how this works.

TCP and protocol-specific optimization (such as HTTP) also go a long way toward making high latency and low bandwidth WAN links more usable. Other features that may be implemented include post-optimized encryption and VoIP optimization. Traffic Squeezer is an extremely interesting project, and we will be watching to see if the scheduled features get implemented.

Legos, Anyone?

As you have no doubt surmised, piecing together a WAN-optimizing cache and firewall is extremely complex. This is always the case in the open source world. The freedom to construct whatever system works best for you means it cannot be cookie-cutter simple; the simple-to-use devices are likely based on Linux, and they are simple because someone chose how to construct the system for you.

With the right systems/network administrator at the helm, you can squeeze an amazing amount of performance out of 128Kb/s WAN links using open source software. Ongoing maintenance is not usually required more than once every few months, assuming the system was configured well. The choice to build or buy (or have a consultant build) is ultimately governed by your budget and time constraints. The “just make it work” solutions (and they will work fairly well) definitely strain the budget. Customized solutions can perform as well, or even better, since they can be tailored to your specific data needs.

Making the Case for Centralized WLAN Management
https://www.enterprisenetworkingplanet.com/management/making-the-case-for-centralized-wlan-management/ Wed, 13 Jan 2010

Standalone wireless access points are a burden to manage. Cisco has a wide suite of centralized wireless controllers available to centralize and manage your thousands of access points as one. In this article, we examine how WLAN management looks in the world of centralized control with Cisco controllers.

In the absence of WLAN controllers, configuring a new access point requires a few steps. First, you remove it from the box and connect it to your laptop’s serial port. Next, you log in, paste in your standard configuration, and set the IP, after allocating an IP address and creating the DNS and DHCP entries. Next, you decide where to plug it in and configure a switch port to connect the access point to. Finally, you can deploy the access point.

With centralized management, the only required step is the physical deployment. The access point will need to connect to a trunk port on your switch if you run more than one wireless network and wish for it to land on a specific virtual LAN (VLAN) once it hits the wire. Even without a special VLAN configuration, you will need to place the access point in the right VLAN to get DHCP and the next-server address of the wireless controller, where it fetches its configuration from. The port configurations can be done ahead of time, and pushed out to many switches at once; or they can even be automated. Suffice it to say, deploying new access points in this manner is most enjoyable.

It may seem a marginal benefit to save 20 minutes of configuration time per access point, especially to smaller businesses. Larger infrastructures may have 100, 1,000, or even 10,000 access points to deploy and manage. With thousands of access points, it simply isn’t possible to manually configure, let alone manage, such a large wireless network.

Not only do centralized wireless controllers ease the burden of deploying new access points, but they also greatly simplify the day to day management of them.

Features

Aside from automated deployment, which, truth be told, is available to some extent with third party management tools, centralized wireless controllers are also able to implement some neat tricks.

Management, as was briefly touched on, is done from a single point. Not only is logging in to the thin-client access points impossible, it is unnecessary. Controllers allow the administrator to create groups for many purposes: geographical, security, and features. To deploy a change to the wireless configuration of an entire building, for example adding an SSID, simply apply that change to the group.

Wireless controllers are able to implement tricks unavailable in a standard decentralized wireless network. RF management, for example, allows the controller to detect radio interference and work around it by automatically boosting the power of nearby access points. Voice over Wi-Fi with proper QoS and location services allows for reliable and robust deployment of VoIP services.

Location tracking is useful for more than emergency services. User mobility when roaming between access points on potentially different networks, along with the ability to track and manage security policy updates, is possible with these controllers. The controllers also implement IPS and IDS features, and can use the location services to pinpoint the exact location of an evildoer. Defining security groups and configuring authentication protocols without having to manually configure an access point is another time saver.

Ultimately, a whole slew of features is available when a centralized controller is calling the shots. Perhaps the most beneficial feature (because, let’s face it, most people could at least partially automate the centralized configuration duties) is RF control. The capability to knock rogue access points off the air and work around obstructions and interference is nothing short of amazing. There is no longer any need to physically move access points around due to spotty coverage, nor to run about frantically trying to locate an unauthorized device somewhere within a quarter mile.

Implementation

To implement a central wireless controller, you first need to find one. There are two Cisco options: standalone and modules. Both types have the same software, and therefore the same feature set, so the decision comes down to which you’d prefer. You will likely want two, since the drawback to centralized control is that all wireless traffic will flow through the central point, making the controller a single point of failure. The controllers have high availability features, which will allow one unit to take over if the first becomes unavailable.

The 2100, 4400, and 5500 series devices are standalone controller units. Integrated controllers and controller modules are available for many Cisco routers. Modules in the 6500 series are called the WiSM (Wireless Services Module), and the integrated controller stacks with other 3750 devices. Finally, the integrated service module is available for a number of Cisco routers (but not the 2600 series). The service modules often come with a limited feature set, as many of the devices they are available for are targeted at SMBs. Just pay attention to how many simultaneous clients and access points each will allow.

To get started implementing a controller-backed wireless network, you must first consider what to do with existing access points. You can re-flash your existing Cisco Aironet 1200 series access points to be thin clients, but since the thin access points are less expensive, this is often difficult to justify. Deploy a test network, sell them on eBay, or re-flash them; in the end, you need thin access points running the firmware the controllers expect to take advantage of all the features. Afterward, you simply need to configure the controllers and VLANs, and start deploying hundreds or thousands of access points.

Managing a large wireless network with central controllers, provided by a single vendor, is absolutely the easiest way to deploy a scalable WLAN infrastructure. The cost of developing your own scripts or buying third party management software far outweighs the cost of the access point controllers. Even ignoring the added functionality you get with WLAN controllers, it makes sense to stick with one vendor: one vendor to call, one bill to pay, and a higher probability of 100 percent reliability.

Vendor Neutral WLAN Management: Do Your Research (Updated)
https://www.enterprisenetworkingplanet.com/management/vendor-neutral-wlan-management-do-your-research-updated/ Mon, 04 Jan 2010

Editor’s Note: This article has been corrected since first running.

Last week, we explained your options for managing wireless networks. One category of wireless management products we identified was “vendor neutral,” meaning products that claim to work across a variety of devices. This week, we’ll cover two popular and feature-rich products: AirWave from Aruba Networks and WiFi Manager from ManageEngine.

To be clear, these are applications which are able to manage access points from various vendors; we are not talking about access point controllers. An access point controller, at least as it’s commonly understood, refers to a “master” device that controls all other access points on the network. With controllers, you often don’t run a full access point firmware on each device; instead, each has enough smarts to boot and get its configuration from the master controller. Next week we will cover a few Cisco products that operate this way.

Features

Vendor neutral management implies that you must be careful to ensure that your access points are supported. For the two products we’re talking about today, compatibility information is available from each company (publicly in one case, on request in the other, as noted below). With the exception of AirWave, which can manage controllers too, this type of software is often managing full access points, be they Cisco, Proxim, Avaya, or other devices. Each runs its own firmware and traditionally requires an administrator to log in to each unit for configuration. Unified management software essentially logs in automatically (or uses SNMP) to configure the devices.

Centralizing configurations for wireless access points allows administrators to be certain each access point is configured correctly. It is configuration management, for those familiar with the Unix/Linux server world. Wireless management goes much further, though, allowing the automation of firmware updates, security monitoring, and even threat elimination.

For any of this to work, the management software must know how each access point works, and it must be programmed specifically for each model of access point. The good news is that both products we’re talking about today support a wide array of devices.

AirWave

In March 2008, Aruba Networks acquired Airwave Wireless to take ownership of the AirWave Wireless Management Suite. The suite includes four main components:

  • AirWave Management Platform
  • VisualRF Location and Mapping Module
  • RAPIDS Rogue Detection Module
  • AirWave Master Console & Failover Servers

Together, the components of AirWave compose a full feature set for managing your wireless infrastructure. The Management Platform provides provisioning and configuration automation; the VisualRF module provides monitoring, reporting, and visualization of the WLAN; RAPIDS enables device discovery and policy enforcement; and finally the Master Console provides an interface to it all.

Much of WLAN management centers around security. The potential for abuse is great, and administrators spend most of their time dealing with security problems. Take, for example, a user found to be stealing MAC addresses to hop on the network and execute man-in-the-middle attacks. While AirWave provides many tools for dealing with rogue (unauthorized) access points, it doesn’t actively monitor for man-in-the-middle attacks. If however, you discovered this happening via other mechanisms, the VisualRF module could easily locate the physical location of the offending user. Map overlays using Google maps, combined with device triangulation between all access points that can see the user, mean that a user’s physical location can be pinpointed with surprising accuracy.

While AirWave does not publish a list of compatible devices, the company does provide a compatibility matrix to prospective customers on request. We would urge anyone looking at Aruba AirWave to take a careful look, as AirWave Suite’s support for certain features will vary from device to device.

Our next vendor makes it a little easier to do your research by publicly publishing a compatibility matrix that illustrates which features work with which devices.

WiFi Manager

WiFi Manager from ManageEngine takes things one step further in the security department. It can detect rogue access points, and also knock them offline automatically, either by DDoS or by disabling the switch port they’re connected to. WiFi Manager discovers more than just access points; it needs to be given access to all switches and routers as well, so it can trace the physical port location of every MAC address on the network.

Cool features aside, WiFi Manager also does the basics you would expect. It centralizes configurations, allows configuration changes to be pushed out to every device, and can centrally dispatch firmware updates, assuming your access points are fully supported.

The WiFi Manager Web site is up front and honest about which features work with which device. The list is not overly large, but WiFi Manager certainly covers the popular access points used in enterprise deployments.

There are a few features lacking in this product, however, including user location, mapping integration, and NMS integration. Compliance auditing and similar functions are not advertised, but often companies that advertise these types of features are just providing a simple dump of random data anyway. WiFi Manager seems to focus more on security, and provides all of the expected functions to make centralized access point management a reality.

Caution

If our skepticism was not readily apparent, let’s spell it out:

These types of solutions rarely work flawlessly. Anyone who has run an NMS with multiple vendors’ gear already knows this, and in larger organizations that is likely that person’s only job function. It’s that bad. If you’re yearning for 100 percent compatibility and no surprises, the easy option is to go with central wireless controllers and access points from a single vendor. If you’re already looking to replace your entire network, it makes sense to standardize. In this case, vendor lock-in can be a good thing. There is no need to purchase a separate management suite on top of your hardware investment, even if it is the exception (like AirWave) and will likely support all your devices. If you’re in a more common budget situation and must support a variety of existing access points, be sure to get the real scoop (and even a trial) from your wireless management vendor.

Next week, we explain how to navigate the huge maze of Cisco access point and wireless controller options.

Corrections

Since first running, the following corrections have been made:

  • Unlike a number of other products in the field, Aruba provides the ability to manage wireless controllers.
  • Contrary to our initial report, Aruba AirWave does provide a compatibility matrix to prospective customers on request.


Understanding NIC Bonding with Linux https://www.enterprisenetworkingplanet.com/news/understanding-nic-bonding-with-linux/ Wed, 02 Dec 2009 21:12:00 +0000 https://www.enterprisenetworkingplanet.com/uncategorized/understanding-nic-bonding-with-linux/ Network card bonding is an effective way to increase the available bandwidth, if it is done carefully. Without a switch that supports 802.3ad, you must have the right hardware to make it work. In this article we will explain how bonding works so you can deploy the right mode for your situation. Most administrators assume […]

The post Understanding NIC Bonding with Linux appeared first on Enterprise Networking Planet.

]]>
Network card bonding is an effective way to increase the available bandwidth, if it is done carefully. Without a switch that supports 802.3ad, you must have the right hardware to make it work. In this article we will explain how bonding works so you can deploy the right mode for your situation.

Most administrators assume that bonding multiple network cards together instantly results in double the bandwidth and high-availability in case a link goes down. Unfortunately, this is not true. Let’s start with the most common example, where you have a server with high network load, and wish to allow more than 1Gb/s.

Bonding With 802.3ad

You connect two interfaces to your switch, enable bonding, and discover half your packets are getting lost. If Linux is configured for 802.3ad link aggregation, the switch must also be told about this. In the Cisco world, this is called an EtherChannel. Once the switch knows those two ports are actually supposed to use 802.3ad, it will load balance the traffic destined for your attached server.
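
For illustration only, here is roughly what the Linux side looks like when brought up by hand. The interface names and IP address are placeholders, and the bonding module options shown are common choices, not the only ones:

# Load the bonding driver in 802.3ad (LACP) mode; miimon polls link state every 100 ms,
# and xmit_hash_policy controls how outgoing traffic is spread across the slaves.
modprobe bonding mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4

# Address the bond and enslave both physical NICs (names are examples).
ifconfig bond0 192.0.2.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1

# Confirm that LACP actually negotiated with the switch.
cat /proc/net/bonding/bond0

The matching switch configuration has to exist before traffic flows cleanly; on Cisco gear that typically means placing both ports into the same port-channel with “channel-group 1 mode active.”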

This arrangement works well when a large number of connections arrive from a diverse set of clients. If, however, the majority of the throughput comes from a single server, you won’t get better than the 1Gb/s port speed. Switches load balance on the source MAC address by default, so traffic from a single host is always sent down the same link. Many switches let you change the load-balancing algorithm, so if you fall into the single server-to-server category, make sure your switch offers a balancing method (per-packet round robin, or a hash that includes IP addresses and ports) that actually spreads those frames across the links.

Alternatively, you don’t need to burn the expensive switch ports at all. Both servers can be connected together via crossover cables to the bonded interfaces. In this configuration, you want to use balance-rr mode on both sides, which we will explain momentarily.
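
A minimal back-to-back sketch, with placeholder interface names and a placeholder point-to-point subnet, looks identical on both servers except for the IP address:

# Round-robin bonding across two crossover links; no switch is involved.
modprobe bonding mode=balance-rr miimon=100
ifconfig bond0 10.0.0.1 netmask 255.255.255.252 up    # use 10.0.0.2 on the peer
ifenslave bond0 eth2 eth3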

Generic Bonding

There are multiple bonding modes you can set in Linux, and the most common “generic” one is balance-alb. This mode works effectively in most situations without any switch configuration. It does, however, require that your network interfaces support changing their MAC addresses on the fly. The mode works well “generically” because it constantly swaps MAC addresses to trick the other end (be it a switch or a directly connected host) into sending traffic across both links. That trick can wreak havoc on a Cisco network with port security enabled, but in general it’s a quick and dirty way to get bonding working.
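
If you want to try it, the setup mirrors the 802.3ad sketch above with only the mode changed; the interface names and address below are again placeholders:

# Adaptive load balancing: no switch configuration required, but the NIC drivers
# must allow their MAC addresses to be rewritten while the interfaces are up.
modprobe bonding mode=balance-alb miimon=100
ifconfig bond0 192.0.2.20 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1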

Channel Bonding Modes

Channel Bonding modes can be broken into three categories: generic, those that require switch support, and failover-only.

The failover-only mode is active-backup: One port is active until the link fails, then the other takes over the MAC and becomes active.

Modes that require switch support are:

  • balance-rr: Frames are transmitted in a round-robin fashion without hashing, to truly load balance.
  • 802.3ad: This mode is the official standard for link aggregation, and includes many configurable options for how to balance the traffic.
  • balance-xor: Traffic is hashed (by default on source and destination MAC addresses), so each peer is consistently mapped to one link. The same hashing behavior is also available as part of 802.3ad.

Note that the modes requiring switch support can also be run back-to-back, with crossover cables between two servers. This is especially useful, for example, when using DRBD to replicate two partitions.

Generic modes include:

  • broadcast: This mode is not really link aggregation – it simply broadcasts all traffic out both interfaces, which can be useful when sending data to partitioned broadcast domains for high availability (see below). If using broadcast mode on a single network, switch support is recommended.
  • balance-tlb: Outgoing traffic is load balanced, but incoming only uses a single interface. The driver will change the MAC address on the NIC when sending, but incoming always remains the same.
  • balance-alb: Both sending and receiving frames are load balanced using the change MAC address trick.

High Availability

How often have you seen a network die so catastrophically that the link itself went down? Chances are: never. More often you will see packet loss and very strange behavior. The failover aspect of NIC bonding is attractive to many administrators, but it rarely helps in practice. And when the switch that both ports are connected to reboots for a firmware upgrade, you are down.

The easy fix is to connect each port to a different switch, right? If you are using a bonding mode that doesn’t require switch support, this works fine. If, however, you are using a mode that requires switch support, it is not possible on most devices. Switches that support stacking and are managed from a single point often support EtherChannel across multiple switch members. Ideally, you would connect one port to each and never reboot the whole stack simultaneously.
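
If plain failover across two independent switches is all you need, active-backup combined with ARP monitoring is one way to catch the “link stays up but the path is dead” failures described above. This is a sketch, not a recommendation for every network, and the gateway address used as the ARP target is a placeholder:

# One NIC cabled to each switch; instead of trusting link state alone, the bond
# ARPs the gateway every second and fails over if the replies stop.
modprobe bonding mode=active-backup arp_interval=1000 arp_ip_target=192.0.2.1
ifconfig bond0 192.0.2.30 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1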

Decisions

Bonding is simple once you understand the limitations of each mode. If you’re working in an environment where the switches support 802.3ad and you have no special needs, use that mode. Conversely, if you have no switch support and just want to increase throughput and enable failover, use balance-alb. Finally, if you just need a data replication link between two servers over crossover cables, balance-rr is the way to go.

The post Understanding NIC Bonding with Linux appeared first on Enterprise Networking Planet.

]]>
Use Samba With Windows 7 Clients https://www.enterprisenetworkingplanet.com/news/use-samba-with-windows-7-clients/ Wed, 18 Nov 2009 22:11:00 +0000 https://www.enterprisenetworkingplanet.com/uncategorized/use-samba-with-windows-7-clients/ Windows 7 is out, and everyone says they are going to upgrade, finally. What does this mean for your Samba servers? In this article we will talk about our experience using Windows 7 with Samba, both as a domain controller and as a basic file server. Samba is not important just for those rogue sysadmins […]

The post Use Samba With Windows 7 Clients appeared first on Enterprise Networking Planet.

]]>
Windows 7 is out, and everyone says they are going to upgrade, finally. What does this mean for your Samba servers? In this article we will talk about our experience using Windows 7 with Samba, both as a domain controller and as a basic file server.

Samba is not important only to those rogue sysadmins who try to avoid buying Windows Server products. Samba is used by storage appliance manufacturers and within a wide variety of other embedded devices, so Samba interoperability matters both to IT shops that run Linux servers and to businesses that sell Linux-based devices. Contrary to popular belief, Microsoft may not intentionally break Samba, but updates to the protocol and to client default settings (driven by security complaints in the Windows world) often leave Samba unable to operate. Which brings us to some good news:

This time, with Windows 7, only half of Samba stops working.

Accessing Samba Shares

Accessing Samba shares from Windows 7 “just works,” assuming you’re running a relatively recent version of Samba. Samba 3.3.2, which ships with Ubuntu Jaunty, works perfectly with Windows Vista and therefore with Windows 7 (they are essentially the same). In testing, we had no problem connecting to various Samba servers and Windows XP-based shares.

If you are stuck with an older version of Samba and cannot upgrade, workarounds do exist. Many NAS devices still run Samba 2.x, and do not have an upgrade mechanism. Before modifying all your Windows 7 machines’ registries, it is worth checking with the manufacturer of your storage device to inquire about an upgrade. Failing that, you must “degrade” Windows 7.

Go to: Control Panel -> Administrative Tools -> Local Security Policy
Select: Local Policies -> Security Options

As shown in Figure 1, there are two settings to change.
“Network security: LAN Manager authentication level” -> Send LM & NTLM responses
“Network security: Minimum session security for NTLM SSP based (including secure RPC) clients” -> uncheck “Require 128-bit encryption”

Figure 1. Setting Windows 7 security options.

After these two settings have been changed, you will be able to connect to older Samba-based file shares.

If problems still exist, one final thing to try is removing the stored credentials for the Samba share. During testing, it’s possible that something strange got “stuck” in there. In the Control Panel -> Credential Manager, find and remove the stored credentials for the Samba server.

The “just works” comment applies to people with an already-working Samba setup who simply need to allow access from new Windows 7 clients. If you are setting up Samba for the first time, we have left out a lot of details; start with the Samba project’s own documentation.
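
That said, a bare-bones file server configuration is small enough to show here. The smb.conf sketch below is illustrative only; the workgroup, share name, path, and user are placeholders:

[global]
   workgroup = EXAMPLE
   security = user

[shared]
   path = /srv/samba/shared
   valid users = alice
   read only = no

Add the user with ‘smbpasswd -a alice’, reload Samba, and a Windows 7 client should be able to open \\server\shared with those credentials.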

Joining to a Samba Domain Controller

To join a Windows 7 workstation to your Samba domain controller, you must be running Samba 3.3.4 or higher. It also requires registry changes on the Windows 7 machine because of security upgrades from Microsoft. Microsoft is not intentionally breaking Samba support; it is simply forcing the Windows Server world to upgrade to and deploy more secure mechanisms. Samba often gets caught in the crossfire of this forced security hardening, which is to be expected given that Microsoft does not work with, or inform, the Samba team about upcoming changes.

Failing to join a Samba domain is confusing. The error, shown in Figure 2, states, “The specified domain either does not exist or could not be contacted.” If the domain controller really were inaccessible, you would have gotten a different error, before Windows even asked for credentials to join the machine to the domain, explaining that a domain controller could not be found. The error shown here, despite its wording, has nothing to do with connectivity.

Figure 2. Some Windows errors are needlessly confusing.

To get Windows 7 clients to connect to the domain running Samba 3.3.4 or higher, four registry keys need to be changed. For the ones that don’t exist, create them.

Two DWORD values within HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters:

"DomainCompatibilityMode" = 1
"DNSNameResolutionRequired" = 0

And two within HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Netlogon\Parameters:

"RequireSignOnSeal" = 0
"RequireStrongKey" = 0

After setting these, you should be able to join the machine to the existing Samba-run domain. Again, this is assuming you’re working in an already-working environment. Configuring Samba to act as a domain controller is covered in the article, Build a Primary Domain Controller With Samba.

If you are adding a new Windows 7 machine to the domain, don’t forget to create the machine trust account in Samba once the corresponding Unix account exists; in Samba that is ‘smbpasswd -a -m HOSTNAME’. And finally, remember that when joining the Windows 7 machine to the domain, you must use an account that has the right to add machines.

Windows 7 is largely the same as Vista, so figuring out other problems that crop up doesn’t take long, since people have been using and testing the operating system for a few years now. If you are planning to run a Samba domain controller for Windows 7 workstations, we recommend automating those registry setting changes within your installation environment.
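
One way to automate it, offered as a sketch rather than a canonical method, is a .reg file distributed by your imaging or login scripts; “regedit /s samba-domain.reg” imports it silently. The file name is a placeholder, and the values simply restate the four keys listed above:

Windows Registry Editor Version 5.00

; Registry changes needed to join a Samba 3.3.4+ domain from Windows 7
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\LanmanWorkstation\Parameters]
"DomainCompatibilityMode"=dword:00000001
"DNSNameResolutionRequired"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Netlogon\Parameters]
"RequireSignOrSeal"=dword:00000000
"RequireStrongKey"=dword:00000000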

Overall, Windows Vista/7 didn’t present many surprises. The most common use case for Samba, as a basic file server, works flawlessly as long as you have a fairly recent version. Most IT environments running a few Samba shares mixed into a Windows network should have no problem supporting Windows 7 clients.