Zones and zone data collectors - Citrix Presentation Server 4.5

Even though this section is called “server farm zones,” we need to dig a little deeper into how Presentation Server actually works before we can talk about the actual zones.

Remember from the last section that Citrix Presentation Server stores all of its configuration in a database called the IMA Data Store. It’s crucially important that you know this database is only used for static configuration information. It's not used to store any information about the current state of the environment. In other words, the data store only contains information that needs to be “remembered” when the server is turned off. This is basically all of the settings that you configure via the management consoles.

But as you can imagine, a Presentation Server environment requires that the servers keep track of a lot of dynamic information too. Things like which users are running sessions on which servers, which servers are online and offline, how much load is on each server, etc. This dynamic information is needed in several cases, such as:

  • The management console needs to show the admin which users are on which servers.
  • The system needs to know which servers are online so it can route incoming requests to servers that are running.
  • The system needs to know the server load in order to load-balance an incoming connection request.
  • Etc.

In order to make these things work in Presentation Server, Citrix could have done one of two things:

  • Design CPS so that all servers shared this information with all other servers, thus ensuring that any server receiving an information request would know what to do with it.
  • Design CPS so that one server was the “king of gossip,” tasked with knowing everything about the state of every server. Any incoming requests could be forwarded to the server that knows what’s going on.

Those of you who’ve been working with Citrix for awhile know that WinFrame and MetaFrame 1.x made use of the first option via something known as the “ICA Browser Service.” (Whoo.. thinking back to that gives me the chills!)

The idea back then was that every Citrix server would open a connection with every other Citrix server in the farm. Then whenever something interesting happened (such as a user logon), the server that the user logged onto would contact every other server and give them the news. This meant that if a new request was being made, any server in the farm could service that incoming connection request since every server knew everything about every other server.

This was a cool idea, except for one little problem: It wasn’t scalable. Two servers would have a single connection open between them for communicating such changes. Three servers would have three connections (since each server would share a connection with every other server). Four servers would require six connections. Five servers would require ten connections. If you play this out, you’ll see that 100 servers would require 100 * 99 / 2 = 4,950 open connections on the network! Clearly Citrix had to find another way.
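
If you want to check that arithmetic yourself, the full-mesh formula is simply n * (n - 1) / 2 connections for n servers. Here's a quick throwaway calculation (plain Python, just to reproduce the numbers above):

    # Number of connections in a full mesh of n servers: every pair needs its own link.
    def mesh_connections(n):
        return n * (n - 1) // 2

    for n in (2, 3, 4, 5, 100):
        print(n, "servers ->", mesh_connections(n), "connections")
    # 2 -> 1, 3 -> 3, 4 -> 6, 5 -> 10, 100 -> 4950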

Figure xx
[show two servers connected, three, four, five]

Starting with MetaFrame XP in 2001 (and continuing through to CPS 4.5 today), Citrix introduced the concept of a “data collector.” The idea is that a data collector is just a regular Citrix Presentation Server whose IMA service takes on the additional role of tracking all of the dynamic information of other Presentation Servers. This information is stored in memory and called the “dynamic store.” (Note that “dynamic store” and “data store” both have the initials “DS.” But of course they are very different! The data store is a database on disk. The dynamic store is information stored in memory.)

In today’s version of Presentation Server, each individual server monitors itself for changes to its dynamic information. When it sees a change, the IMA service of the Presentation Server contacts the IMA service of the data collector (via the IMA protocol on port 2512) and sends the updated information. The data collector then updates its dynamic store in memory.
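
If it helps to picture it, here's a minimal conceptual sketch of the dynamic store as an in-memory table that member servers update. To be clear, this is an illustration only: the class, field names, and example values are invented, and the real dynamic store is an internal structure of the IMA service, not something you can script against.

    # Conceptual sketch only -- not Citrix code. The real dynamic store lives
    # in the memory of the IMA service on the data collector.
    class DynamicStore:
        def __init__(self):
            self.servers = {}   # server name -> {"online": bool, "load": int, ...}
            self.sessions = {}  # session id  -> {"user": str, "server": str, "state": str}

        def update_server(self, name, **info):
            self.servers.setdefault(name, {}).update(info)

        def update_session(self, session_id, **info):
            self.sessions.setdefault(session_id, {}).update(info)

    # A member server that notices a change would send it to the data collector
    # over the IMA protocol (port 2512); here that's just a method call.
    zdc_store = DynamicStore()
    zdc_store.update_server("CPS-01", online=True, load=1200)
    zdc_store.update_session("42", user="jsmith", server="CPS-01", state="active")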

Figure xx
[show several servers, each with the LHC, an IMA service, and a connection to the data store. One is acting as the ZDC with the thought-bubble style dynamic store]

When dynamic information is needed about the environment (perhaps for an incoming user connection request or when an admin fires up a management console), the request can go to any Presentation Server. If it happens to hit a Presentation Server that is not acting as the data collector, that Presentation Server can contact the data collector (again via the IMA protocol on port 2512) to find out what it needs.

Communication between CPS servers and the data collector

What information is sent back-and-forth between the data collector and the other Presentation Servers? Really anything that changes often, such as:

  • There is a user session event, like a logon, logoff, disconnect, or reconnect.
  • The server or application load changes.
  • A CPS server comes online or goes offline.
  • Any published application’s settings change.
  • A CPS server has an IP or MAC address change.

If you want to peek into the contents of the in-memory dynamic store on your data collector, you can do so with the “queryds” command. (The “DS” in QueryDS stands for “dynamic store,” not “data store.” QueryDS can be found in the "support\debug" folder of your Presentation Server installation source files.)

Data Collector Elections

Previously in this section I referred to the data collector as a regular Presentation Server whose IMA service has also taken on the role of acting as a data collector. How does a regular Presentation Server come to be a data collector? It’s elected!

A data collector must exist in order for your Presentation Server environment to function. Without it, you couldn’t use any management consoles and no users would be able to connect into the environment. For that reason, Citrix doesn’t let you simply “pick” a server to act as the data collector. Instead, they make sure that a data collector is always present. Here’s how this works:

When a Presentation Server is started up, (or, more accurately, when the IMA service starts), the IMA service starts trying to contact other servers (all via the IMA protocol on port 2512) until it finds one that’s online. When it finds one, it queries it to find out which server is acting as the data collector. It then contacts the data collector and challenges it to a “run off” election. Basically this means the new server and the existing data collector will size each other up and determine which one should “win” the election. The winner gets to be the data collector.

How is the winner of this election determined? Simple. Whoever is running the newest version of the IMA service. This means that if you have a CPS 4.0 server and a CPS 4.5 server, the 4.5 server will become the data collector. If you have a regular CPS 4.5 server and a CPS 4.5 server with Feature Pack 1, the server with FP1 will become the data collector. If you have a server with no hotfixes and a server with the latest hotfixes, chances are good that the hotfixed server will become the data collector. (I say “chances are good” in this case because not every hotfix will affect the outcome of a data collector election since not every hotfix touches the IMA service.) The bottom line is that you can control which server will act as your data collector by keeping that server the most up-to-date!

At this point, many of you are probably wondering, “What about the data collector election priority settings in the management console?” (Presentation Server Java Management Console | Right-click on farm name | Properties | Zones | highlight server | “Set Election Preference” button) You can manually configure four levels of election preference, including most preferred, preferred, default preference, and not preferred.
Here’s the deal with those settings: they are “tiebreaker” settings that are only used in the event of a tie. (And remember, a “tie” would happen if multiple Presentation Servers had the exact same versions and hotfixes installed.) Of course in production environments, your servers should all be the same, so the net effect is that yes, you can totally control which server is your data collector by manually setting the preferences in the Java console. It’s just important to remember that these preferences will be ignored if a newer server is up for election. (Remember this when you want to “test” a new version of Citrix in your production farm!)

Data Collection Election Priority

  1. Whichever server has the most recent version of the IMA Service running. (This may include hotfixes.)
  2. Whichever server has the highest preference set in the data store.
  3. A bunch of other stuff that doesn't matter, because if you design your environment right then your elections will never get to this step.
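
Put together, the election boils down to a two-level comparison: IMA version (including hotfix level) first, admin-configured preference second. Here's a minimal sketch of that ordering; the server names, version tuples, and preference labels are placeholders for illustration, not values the product actually exchanges on the wire:

    # Sketch of the election ordering described above -- illustration only.
    # Higher tuple wins: IMA version/hotfix level first, admin preference second.
    PREFERENCE_RANK = {
        "most preferred": 3,
        "preferred": 2,
        "default preference": 1,
        "not preferred": 0,
    }

    def election_key(server):
        # "ima_version" is assumed to be a comparable tuple such as (4, 5, 1)
        return (server["ima_version"], PREFERENCE_RANK[server["preference"]])

    candidates = [
        {"name": "CPS-A", "ima_version": (4, 5, 0), "preference": "most preferred"},
        {"name": "CPS-B", "ima_version": (4, 5, 1), "preference": "not preferred"},
    ]
    winner = max(candidates, key=election_key)
    print(winner["name"])  # CPS-B: the newer IMA service wins despite its low preference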

When does an election take place?

As I mentioned earlier, a data collector election takes place whenever the IMA service is started on any server. But that’s not the only time an election takes place. If you consider that (1) a data collector must always exist, and (2) the server with the most up-to-date software must always be the data collector, you can logically work out when an election needs to take place:

  • If a Presentation Server tries to contact the data collector and fails, that server will call for an election.
  • When the existing data collector goes offline. (If it’s shut down gracefully, it will send out an election request before it shuts down its local IMA service. If it shuts down unexpectedly, this is covered by the first item above.)
  • When a new server is brought online. (As mentioned previously.)
  • When the configuration of a zone changes (when a Presentation Server is added or removed, or a new zone is created). The server that the management console is connected to sends out an election request.
  • If one data collector unexpectedly finds another data collector. (There can only be one, so a new election is held.)

If you’re interested in the exact network flow that takes place during a data collector election, check out CTX108316.

Once an election is over, if a new server has been elected then the other Presentation Servers send their current session data to the new server so it can populate its dynamic store. If the Presentation Server acting as the data collector hasn’t changed from before the election, the other Presentation Servers do not resend their information since the data collector has their up-to-date information from before.

Data collector interaction with the data store

Fundamentally, your data collectors and your data store are not really related. Your data store holds permanent farm configuration information in a database, and your data collector tracks dynamic session information in its RAM.

But the data collector and the data store are actually interrelated in a few ways. In addition to their primary role of providing dynamic farm information for admin consoles and incoming connection requests, data collectors also take part in distributing configuration changes to the Presentation Servers in the farm. It works like this:

When you make a change via one of the management consoles, that change is written to the data store. (Well, technically, it's written to the local host cache of whichever server you're connected to, and then immediately replicated to the data store.) If this configuration change applies to servers other than the specific one that your console happens to be connected to, the other servers need to get that change into their own local host caches too. Recall from earlier in this chapter that each individual Presentation Server only looks for changes in the central data store every 30 minutes. Chances are that if you changed something, you don't want to wait up to 30 minutes for all your servers to check in and find that change. You want that change to be applied instantly.

Citrix understands this. So whenever a change is made to the data store, that change is also sent to the data collector for the zone of the server that your management console is attached to. That data collector then distributes that change (via IMA port 2512) to all of the servers in its zone, allowing each server to update its own local host cache accordingly. Furthermore, if you have more than one zone, the initial data collector contacts the data collectors in the other zones. It sends its change to them, and in turn those data collectors forward the change to all of the servers in their zones. (More on this in the next section.)

I should mention that a really big change won't get pushed out this way. If the change is larger than 64k, the data collectors don't send the actual change out. Instead they send out a notification which causes the servers in the zone to perform an "on demand" sync with the central data store. However in the real world, it's rare for a single change to be more than 64k in size.
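
Pulling the last few paragraphs together, the logic on the zone data collector looks roughly like the following sketch. (The 64k threshold and the port number come from the description above; the function names and example values are invented for illustration.)

    # Illustration only -- not Citrix code. Shows the decision described above.
    CHANGE_SIZE_LIMIT = 64 * 1024   # bytes; larger changes aren't pushed directly

    def distribute_change(payload, zone_servers, other_zone_collectors):
        """Run on the data collector of the zone where the change was made."""
        if len(payload) <= CHANGE_SIZE_LIMIT:
            for server in zone_servers:
                print(f"push change to {server}'s local host cache")   # via IMA, port 2512
        else:
            for server in zone_servers:
                print(f"tell {server} to sync on demand with the data store")
        for collector in other_zone_collectors:
            print(f"forward change to {collector}, which repeats this for its own zone")

    distribute_change(b"x" * 2048, ["CPS-01", "CPS-02"], ["ZDC-ZoneB"])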

Figure xx
[diagram of change being pushed out through a few zones]

Splitting your Farm into Zones

Even though this section is supposed to be about zones, we’ve only discussed data collectors so far. The data collector architecture works well, but as you can imagine, there are scenarios where it could become a bottleneck. Consider the following situation:

Figure xx
[two locations, CPS servers on both sides, data collector on the opposite side of a wan]

In the diagram there are two office locations separated by a WAN. Each location has users and Presentation Servers. Imagine that a user in Office Location A wants to access an application from a Presentation Server located in Office Location A. What if the Presentation Server acting as the data collector was located in Office B? In this case the user's connection request would be routed across the WAN to the data collector in Office B (since all incoming requests need to contact the data collector to find out which server the user should connect to). Once the data collector checks its dynamic store to figure out where the user should connect, the Presentation Server name is sent back down across the WAN and the user can make a connection (in this case with a Presentation Server running in Office Location A).

From a technical standpoint this works fine. But from a practical standpoint it's not ideal since the user's connection request had to travel across the WAN twice--once to contact the data collector to find out what server the session should connect to, and once when the data collector was issuing its response. The actual amount of data that traverses the WAN is very very small--only a few kilobytes. The problem is not that you’ll overload your WAN with too much data. The potential problem is that your user will have to wait for a round-trip WAN transaction before his session can start. In your environment this time may be trivial and might not matter at all. It just depends on your situation. If your users are doctors and they’re connecting and disconnecting hundreds of times per day, a few seconds each time could really add up.

What can you do about this? The easiest thing would be to move your data collector to the other side of the WAN. (Remember how to move a data collector? First make sure that the server you want to make the data collector is the most up-to-date patch-wise, and then change the election preference in the Presentation Server Java Console.) Of course moving the data collector role to the other WAN location is going to have the reverse negative effect on the users at Location B. The ideal solution would be to have two data collectors—one on each side of the WAN. This is exactly where the concept of “zones” comes in.

When Citrix talks about zones, they talk about zones as groups of Presentation Servers that share the same data collector. While that definition is technically accurate, a more practical way to think of zones is as a method to create multiple data collectors. (And multiple data collectors means more copies of this "dynamic store" spread through your environment, which leads to faster connection times.)

From a technical standpoint, yes, a zone is just a collection of Presentation Servers that all share the same data collector. Thinking back to the previous example, you could divide that farm into two zones—one for each WAN location—which would mean that each location would have its own data collector.

Configuring zones is very simple (and very arbitrary). You can pick which servers (by name) you want to participate in which zone. From a pure configuration standpoint, any server can be in any zone. So you can go through a list of your servers and say you want Servers 1, 2, 5, 6, and 9 in “Zone 1,” you want Servers 3 and 4 in “Zone 2,” and you want Servers 7 and 8 in zone “Potatohead.” (The zone names are just as arbitrary as which servers are in which zones.)

When you split your farm into multiple zones, all of the data collector elections and the data collector role and everything happens exactly the same as explained previously except that the entire process happens within each zone. In other words, if you have ten servers divided into two zones of five servers each, a new server coming online will only cause a data collector election to take place in its own zone. The other zone is not affected.

To configure additional zones, just go into the Citrix Presentation Server Java Console, click on Zones, create a second zone (give it whatever name you want) and add a server or servers to it.

As you're creating additional zones, keep in mind that when a data collector election takes places, the most up-to-date server will always win the election regardless of its election preference setting in the Presentation Server Java Console. This means that if you introduce a “test” CPS 4.5 server into your production CPS 4.0 farm, that 4.5 server will be your data collector whether you like it or not. An easy way to get around this is to edit the zone configuration of your farm via the Java console, and configure your test 4.5 server so that it’s in its own zone. This means the other production 4.0 servers will have their own election, and since (one hopes) all your servers are at the same patch and hotfix level, whichever 4.0 server you’ve configured to be “most preferred” in the Java console will still be the data collector in your production zone.

Data collector to data collector communication

Whenever you split your farm into more than one zone, each zone has its own data collector. In order for your farm to work properly, every data collector must know the status of all other servers and all user sessions in the farm. In an environment with more than one zone, the data collectors of each zone keep each other up-to-date about what's happening within their own zone.

To do this, whenever a session event occurs (remember from before that a "session event" is a user logon, logoff, reconnection, or disconnect), the server on which that event happened updates its own data collector as discussed previously. However, that data collector then updates the data collectors in every other zone in the farm. This behavior was introduced in MetaFrame XP and still exists in Presentation Server 4.5.

At this point many of you might be wondering about the little check box in the Presentation Server Console called "Share load information across zones." (Presentation Server Console | Right-click on the farm | Properties | Zones | checkbox "Share load information across zones") By default, this box is not checked.

A lot of people misunderstand this check box. Having it unchecked does not mean the data collectors won't talk to each other. Having it not checked just means that no load information is shared--the zone data collector still notifies all other data collectors whenever a session event occurs. (This session event is a bit smaller in this case since the servers only need to exchange session and user info and not load info.) Also, when this box is unchecked, data collectors in one zone do not send the server or application loads every 30 seconds to the data collectors in the other zones.

In other words, when the "Share load information across zones" option is unchecked, each zone's data collector still maintains a farm-wide list of all sessions, servers, and other farm information. It's just that the unchecked option means that each data collector only maintains load information about the servers within its own local zone.
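
If you want to see that distinction spelled out, here's a tiny sketch with the checkbox modeled as a boolean. (Again, this is conceptual only; the function names and example values are made up.)

    # Conceptual sketch of zone-to-zone replication -- not actual Citrix code.
    SHARE_LOAD_ACROSS_ZONES = False   # the console checkbox, unchecked by default

    def replicate_session_event(event, other_zone_collectors):
        # Session events (logon, logoff, reconnect, disconnect) are always
        # forwarded to the other zones' data collectors, regardless of the checkbox.
        for zdc in other_zone_collectors:
            print(f"send session event '{event}' to {zdc}")

    def replicate_load_update(server, load, other_zone_collectors):
        # Server/application load is only forwarded (roughly every 30 seconds)
        # when "Share load information across zones" is checked.
        if SHARE_LOAD_ACROSS_ZONES:
            for zdc in other_zone_collectors:
                print(f"send load {load} for {server} to {zdc}")

    replicate_session_event("logon: jsmith on CPS-01", ["ZDC-ZoneB", "ZDC-ZoneC"])
    replicate_load_update("CPS-01", 1200, ["ZDC-ZoneB", "ZDC-ZoneC"])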

How big can a zone get?

There's no real technical limit on the number of Presentation Servers that can be in one zone. In fact the product will support up to 512 servers in a single zone right out of the box, and a quick registry key change can let you go higher than that. In the real world, you'll probably have other reasons (multiple sites, etc.) to split such a large group of servers into multiple zones long before that.

Then again, if you have 1000 or 1500 servers in the same data center, there's no real reason you can't have them all in one or two zones. It's just a matter of looking at the traffic patterns. For instance, do you want one single data collector updating 1000 servers whenever you make a change to the environment (one zone), or do you want two data collectors to each update only 500 servers (two zones)?

Should you build a dedicated Zone Data Collector?

If you have 500 servers like the example mentioned above, you'll absolutely want to have a Presentation Server that's dedicated to acting as the data collector and not serving any applications. But what about smaller environments? At what point do you need to build a “dedicated” CPS server that acts only as a data collector without hosting any user sessions?

There are no hard numbers to dictate the point at which you should build a dedicated data collector.

If you have a regular Presentation Server acting as your data collector and you're considering building a dedicated data collector, it's fairly simple to check its load to ensure that it can handle the situation. You can even use Task Manager. (Just look for the "ImaSrv.exe" process.) Look at the resources consumed by the IMA service on the server acting as the data collector, and compare that to the IMA service on another Presentation Server in the same zone. (Run "query farm /zone" from the command line to determine which server is acting as the data collector in the zone.)

Note that there is no "official" way to just install the data collector role onto a Presentation Server. If you want to make a "dedicated" data collector, you actually just install Windows, enable the Terminal Services application server, install Presentation Server, configure that server to be "most preferred" in the data collector election preferences box, and then don't install or publish any applications on it. If you do this, remember to install any Citrix hotfixes or service packs to this server first. Otherwise you run the risk that your dedicated data collector could lose an election to a more up-to-date server somewhere else.

The reality is that in today's world of multicore servers and cheap memory, you can probably grow a zone very large before you need to move to a dedicated data collector. It's hard to pin down an exact number though, since how busy a data collector is really depends on the characteristics of your environment. If you publish individual applications versus an entire desktop, your data collector will be busier. If your users connect and disconnect continuously throughout the day, your data collector will be busier than if users maintain a single session throughout the day.

It turns out that once farms grow to perhaps twenty servers or so, people generally build a dedicated server to handle all the miscellaneous "overhead" roles, like license servers and admin console publishing. In these cases, the overhead server is usually used for the data collector role as well. This is a fine design. But it's a product of wanting to separate out the random roles to another piece of hardware, not because the scalability of the IMA service required a standalone data collector.

In fact, once most farms grow beyond a few servers, administrators have the desire to keep all production application servers 100% identical to each other. (Or, more specifically, all application servers within the same silo. More on that later in the book.) If one of the "member" Presentation Servers is acting as a data collector, that means not all servers are identical, and this makes people uncomfortable.

So whatever the reason--the fact that you have an extra server you can use or the fact that you want to keep all your servers identical--people end up building a dedicated data collector long before scalability reasons suggest that they should, and that's perfectly fine.

Zone Strategy

Now that we've discussed what zones are, how they work, and the mechanics of what happens when you create multiple zones, let's talk about strategy. You'll have to decide:

  • How and where you break up your farm into zones
  • Whether you create dedicated data collectors

How many zones?

The main decision you'll need to make is how many zones you'll have. (Or, put another way, where your zone boundaries are going to be.) Like everything else in your environment, there are several things to take into consideration when making this decision:

  • Where your users are
  • How your users connect
  • How your farm database is set up

Remember, the primary purpose of creating a new zone is to create an additional data collector. Additional data collectors mean that you can put a data collector closer to your users, essentially "pre-caching" near those users the session information they need to start their sessions.

That said, keep in mind that data collectors also help to distribute configuration changes throughout your farm, as each data collector sends changes to all of the servers in its zone. Imagine you have a single farm across two sites. Each site has about 50 servers.

Figure xx [one farm, two sites, 50 servers on each site]

If you create a single zone, whenever a configuration change is made to your farm, the zone data collector will push that change (via the IMA protocol on port 2512) to all of the servers in the farm, meaning that change will go across the WAN 50 times (once for each server on the far side of the WAN).

Figure xx [show the same environment as the previous figure, with ZDC on one side pushing the change out to all the servers individually]

On the other hand, if you split this environment into two zones, any configuration change would only traverse the WAN once, since the data collector in the zone that made the change would only have to push it out to the data collector in the other zone. Then it's up to that data collector to push the change out to the servers in its zone.

Figure xx [ show this ]

In the previous example, it's pretty easy to see that you'd want to split this farm into two zones. In fact, even if you had only five servers at each site, you'd still probably want to divide this farm into two zones. (Although with only five servers in each site, you probably wouldn't have dedicated data collectors.)
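
The WAN math behind this is easy to sanity-check with a quick throwaway calculation (assuming the single-zone data collector sits at one of the two sites, as in the figures above):

    # WAN crossings for a single configuration change, per the scenario above.
    def wan_crossings(remote_servers, zone_per_site):
        if zone_per_site:
            return 1                # one copy to the remote zone's data collector
        return remote_servers       # one copy per server on the far side of the WAN

    for n in (50, 5):
        print(f"{n} remote servers: one zone = {wan_crossings(n, False)} crossings, "
              f"zone per site = {wan_crossings(n, True)} crossing")
    # 50 remote servers: one zone = 50 crossings, zone per site = 1
    #  5 remote servers: one zone = 5 crossings,  zone per site = 1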

Advantages of each site being its own zone

  • User connections could be faster
  • Updates only go across the WAN once (to the data collector), instead of many times (once to each server).

With several advantages to splitting your farm into multiple zones, why not just split everything? Unfortunately, there's a point at which this stops being feasible. Remember how this "zones" section began, with a quick historical look at the ICA Browser Service from the WinFrame and MetaFrame 1.x days? That wasn't scalable because every server updated every other server--an architecture that quickly led to too many connections between servers. The same thing can happen today between zones. Every session event must be transmitted by one data collector to the data collectors in every other zone. So if you have a Presentation Server farm that spans 40 physical sites, you can't possibly make 40 separate zones, because every time a user logged on, logged off, connected, or disconnected, the data collector from their zone would have to update all 39 data collectors in the other zones!
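
A quick back-of-the-envelope calculation shows how this fan-out grows with the zone count (the session-event volume here is an arbitrary illustrative number):

    def cross_zone_updates(zones, session_events_per_day):
        # Each event is forwarded from its local data collector to every other
        # zone's data collector: (zones - 1) updates per event.
        return (zones - 1) * session_events_per_day

    for zones in (2, 4, 10, 40):
        print(zones, "zones ->", cross_zone_updates(zones, 10_000),
              "inter-zone updates for 10,000 session events per day")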

Disadvantages of making a lot of zones

  • Data collectors need to send all session events to all other data collectors. More zones means more data collectors. More data collectors means more traffic for every session event.

From a strategic standpoint, you have to balance the number of zones that you create. Citrix has never gone on the record with a hard recommendation of how many zones you can have. In all honesty it depends on your environment. (It "depends" based on the same reasons discussed previously when discussing whether you should build a dedicated data collector--number of user connections, number of applications, etc.)

That said, it's probably safe to say that you can build three or four zones no problem. And it's probably also safe to say that once you get above ten zones, you need to really do some testing to make sure your environment can handle it. Anything in-between is probably doable, but again you'd want to test to make sure.

Remember, one of the main testing points needs to be whether your servers and the architecture of the Citrix products can handle what you want to do. None of this stuff puts huge amounts of bits on the WAN. It's more about whether you can build an environment that's responsive and scalable enough for your needs.
