
Connect data with PowerShell's Join-Object module

In PowerShell, connect different data sources and sets with the community module Join-Object. Examine different examples and use cases to relate your data.

Data science is in demand at businesses across the spectrum, but before organizations can benefit from AI and perform advanced data analytics, they need programming languages to work with the data itself. Several languages are popular for this, such as Python and R, but for many Windows-based users, PowerShell has become the more familiar language, especially within the last 10 years.

Use Join-Object in PowerShell

While PowerShell does not really fit into a machine learning stack, it does manipulate data well, as it can connect to many data sources, such as MySQL or Open Database Connectivity (ODBC) connections. One of the most common tasks is to join two data sets based on a common property. PowerShell has the community module Join-Object for this task. With Join-Object, PowerShell users get the SQL-like experience of taking two separate objects and joining them on a shared property.

Install Join-Object

Before we can use the Join-Object cmdlet, we need to install the module from the PowerShell Gallery with the Install-Module cmdlet:

PS C:\Users\dan> Install-Module -Name Join-Object

The following code output shows the installation succeeded:

PS C:\Users\dan> Get-Module -ListAvailable Join-Object

    Directory: C:\Users\dan\Documents\PowerShell\Modules

ModuleType Version    PreRelease Name           PSEdition ExportedCommands
---------- -------    ---------- ----           --------- ----------------
Script     2.0.2                 Join-Object    Desk      Join-Object

Joining data

Two of the most common PowerShell cmdlets to run are Get-Process and Get-Service. Get-Process retrieves the processes currently running on a system, while Get-Service retrieves the installed services. While not always the case, sometimes the name of a service is also the name of a process running that service. How can we verify this?

One option is to loop through each process to see if it matches any service names and, if so, return that to the pipeline.

First, let's set our variables:

$processes = Get-Process

$services = Get-Service

If we use the Where-Object cmdlet, we can check whether each process name appears among the service names in $services by using the -contains operator:

PS C:\Users\dan> $processes | Where-Object {$services.Name -contains $_.Name}

Table of processes that share the same name as a service.
Figure 1

Figure 1 shows that we have some matches.
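The Where-Object filter is shorthand for the explicit loop described earlier. As a minimal sketch, assuming made-up sample objects in place of live Get-Process and Get-Service output (the names and IDs below are hypothetical), the same check can be written out like this:

```powershell
# Hypothetical sample data standing in for Get-Process and Get-Service output
$sampleProcesses = @(
    [pscustomobject]@{ Name = 'Spooler'; Id = 101 }
    [pscustomobject]@{ Name = 'notepad'; Id = 202 }
    [pscustomobject]@{ Name = 'WinRM';   Id = 303 }
)
$sampleServices = @(
    [pscustomobject]@{ Name = 'Spooler'; Status = 'Running' }
    [pscustomobject]@{ Name = 'WinRM';   Status = 'Stopped' }
    [pscustomobject]@{ Name = 'BITS';    Status = 'Running' }
)

# Explicit loop: emit each process whose name matches a service name
$loopMatches = foreach ($process in $sampleProcesses) {
    foreach ($service in $sampleServices) {
        if ($process.Name -eq $service.Name) {
            $process   # send the matching process to the output
            break      # stop scanning services once a match is found
        }
    }
}

$loopMatches | Format-Table Name, Id
```

With this sample data, the Spooler and WinRM processes are returned, while notepad is dropped because no service shares its name.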

What if we also want to bring in other properties of the matching service for each process? It is doable, but not very simple. This is where Join-Object shines.
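To see why the manual route is more work, here is a rough sketch of joining by hand with built-in cmdlets only. It assumes hypothetical sample objects, and the service_ property names are chosen here purely for illustration:

```powershell
# Hypothetical sample data; real input would come from Get-Process and Get-Service
$sampleProcesses = @(
    [pscustomobject]@{ Name = 'Spooler'; Id = 101 }
    [pscustomobject]@{ Name = 'notepad'; Id = 202 }
    [pscustomobject]@{ Name = 'WinRM';   Id = 303 }
)
$sampleServices = @(
    [pscustomobject]@{ Name = 'Spooler'; Status = 'Running'; DisplayName = 'Print Spooler' }
    [pscustomobject]@{ Name = 'WinRM';   Status = 'Stopped'; DisplayName = 'Windows Remote Management' }
)

# Index the services by name so each lookup is a single hashtable access
$serviceByName = @{}
foreach ($service in $sampleServices) {
    $serviceByName[$service.Name] = $service
}

# Keep only processes with a matching service, then attach the service
# properties through calculated properties
$manualJoin = $sampleProcesses |
    Where-Object { $serviceByName.ContainsKey($_.Name) } |
    Select-Object Name, Id,
        @{ Name = 'service_Status';      Expression = { $serviceByName[$_.Name].Status } },
        @{ Name = 'service_DisplayName'; Expression = { $serviceByName[$_.Name].DisplayName } }

$manualJoin | Format-Table
```

Every extra service property needs its own calculated property, which is the bookkeeping a join cmdlet takes over.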

With one line of code, we can do that quickly.

PS C:\Users\dan> $JoinedObject = Join-Object -Left $processes -Right $services -LeftJoinProperty name -RightJoinProperty name -KeepRightJoinProperty -Type OnlyIfInBoth -Prefix service_

Let's dig into this command to understand what these parameters mean in Join-Object.

  1. We have -Left and -Right. These are simply the objects we want to join, which are $processes and $services as defined above.
  2. We use the -LeftJoinProperty and -RightJoinProperty parameters. These name the property on each object to join on. In this case, both objects use the same property, "name."
  3. The next parameter, -KeepRightJoinProperty, includes the right join property in the output of the join operation. I will demonstrate this further along in this tutorial.
  4. The -Type parameter is important, because this is where you specify what type of join you would like to perform. I used OnlyIfInBoth to see any processes and services with the same name.

The available options, as described in the PowerShell help, are as follows:

  • AllInLeft. This is the default. Every Left element appears in the output at least once, and it can appear more than once if it matches several elements in Right.
  • AllInRight. This is similar to AllInLeft, but it keeps every element in Right.
  • OnlyIfInBoth. Places elements from Left in the output only if there is at least one match in Right.
  • AllInBoth. Includes all entries in both Right and Left in the output.
Types of join functions are AllInLeft, AllInRight, OnlyIfInBoth and AllInBoth.
Figure 2.
  5. The last parameter is -Prefix. This is used to prefix each Right property in the join with whatever string you decide. In this case, I used service_.
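The join types in the list above correspond to the familiar inner, left outer, right outer and full outer joins. The sketch below does not call Join-Object. It emulates the types with built-in operators on hypothetical sample rows, and it tracks only which join keys survive each type, not the merged properties:

```powershell
# Hypothetical sample rows; the names and values are illustrative only
$left = @(
    [pscustomobject]@{ Name = 'alpha'; Id = 1 }
    [pscustomobject]@{ Name = 'beta';  Id = 2 }
)
$right = @(
    [pscustomobject]@{ Name = 'beta';  Status = 'Running' }
    [pscustomobject]@{ Name = 'gamma'; Status = 'Stopped' }
)

# OnlyIfInBoth: keys present on both sides (inner join)
$onlyIfInBoth = @($left.Name | Where-Object { $right.Name -contains $_ })

# AllInLeft: every Left key, matched or not (left outer join)
$allInLeft = @($left.Name)

# AllInRight: every Right key, matched or not (right outer join)
$allInRight = @($right.Name)

# AllInBoth: every key from either side (full outer join)
$allInBoth = @(($left.Name + $right.Name) | Sort-Object -Unique)

"OnlyIfInBoth: $($onlyIfInBoth -join ', ')"   # beta
"AllInLeft:    $($allInLeft -join ', ')"      # alpha, beta
"AllInRight:   $($allInRight -join ', ')"     # beta, gamma
"AllInBoth:    $($allInBoth -join ', ')"      # alpha, beta, gamma
```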

To show the results, use the Select-Object command.

Output generated by Select-Object command.
Figure 3

As you can see in Figure 3, a few processes and services joined on the name.
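For readers who want to try the whole flow without relying on live system data, the following is a minimal sketch that assumes the Join-Object module is already installed. The sample objects are hypothetical, and the service_Name and service_Status column names are an assumption that follows from the prefix being applied to every Right property, as described above:

```powershell
# Assumes the Join-Object module from this article is already installed
Import-Module Join-Object

# Hypothetical sample data standing in for Get-Process and Get-Service output
$sampleProcesses = @(
    [pscustomobject]@{ Name = 'Spooler'; Id = 101 }
    [pscustomobject]@{ Name = 'notepad'; Id = 202 }
    [pscustomobject]@{ Name = 'WinRM';   Id = 303 }
)
$sampleServices = @(
    [pscustomobject]@{ Name = 'Spooler'; Status = 'Running' }
    [pscustomobject]@{ Name = 'WinRM';   Status = 'Stopped' }
    [pscustomobject]@{ Name = 'BITS';    Status = 'Running' }
)

# Same parameters as the one-line command above, splatted for readability
$joinParams = @{
    Left                  = $sampleProcesses
    Right                 = $sampleServices
    LeftJoinProperty      = 'Name'
    RightJoinProperty     = 'Name'
    KeepRightJoinProperty = $true
    Type                  = 'OnlyIfInBoth'
    Prefix                = 'service_'
}
$joined = Join-Object @joinParams

# Pick the columns to display; the service_* names are assumed from the prefix
$joined | Select-Object Name, Id, service_Name, service_Status
```

Splatting passes the same parameters as the one-line command but keeps each one on its own line, which makes it easier to change the join type or prefix when experimenting.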

Data scientists don't often consider PowerShell as a potential tool, but it can simplify connecting different data sources and joining data sets. Even outside of data science, PowerShell users will come across situations where they must join data for automation processes, and Join-Object is a valuable tool for this.
