Integration Service tools overview
The extract, transform, load (ETL) component of SQL Server is redesigned from the ground up in SQL Server 2005. The new ETL component, Integration Services, replaces the Transformation Services (DTS) included in SQL Server 2000.
Integration Services introduces a rich set of tools to support the development, deployment, and administration of ETL solutions. The tools support the simplest solutions, in which you just want to perform tasks such as copying data from one location to another, to enterprise-level solutions, in which you develop a large number of complex packages in a team environment. This section describes the Integration Services tools and service in the context of the life cycle of the ETL solution: development and testing, deployment to the test or production environment, and finally, administration in the production environment.
This chapter discusses the Integration Services tools for developing and configuring packages, the tools that are available in Business Intelligence Development Studio; as well as the Integration Services management tools that are available in Server Management Studio to import or export packages, assign roles that have read and write permissions on packages, and monitor running packages. The discussion also includes information about the Integration Services command prompt utilities that you use to run or manage packages outside the Studio environments.
Business Intelligence Development Studio
Business Intelligence Development Studio is the SQL Server 2005 studio for developing business intelligence solutions, including Integration Services packages, data sources and data source views.
In Business Intelligence Development Studio you perform the following tasks:
- Design and create new packages
- Design and create the data source objects that packages use
- Design and create the data source views that packages use
- Modify existing packages
- Debug package functionality
- Create the deployment bundle that you use to deploy packages
In Business Intelligence Developments Studio, you develop your business solutions in the context of a solution and a project. A solution is a container that manages multiple projects as one unit. Typically, the projects in a solution are related, and together they support a business solution. A solution can include different types of projects such as Integration Services, Analysis Services, or Reporting Services projects. A project contains the items of a specific project type and provides the templates to build those items. The items are saved to disk, locally or remotely, as XML files.
Business Intelligence Developments Studio provides the following project types:
- Analysis Services Project
- Integration Services Project
- Report Server Project
- Reporting Model Project
In addition to starting a new project from scratch and manually constructing project items or adding existing items, you can launch the following tools with Business Intelligence Development Studio:
- Import Analysis Services 9.0 Database to create a new Analysis Services project by importing an existing SQL Server 2005 Analysis Services database
- Report Server Wizard to create a project and launch the Report Server Wizard automatically
The options for project types and tools are presented to you in the New Project window, as shown in Figure 16-1.
Figure 16-1 The New Project dialog box, in which you specify the project type, name the project, and optionally add the project to the source control.
You use the Integration Services project type to create packages and the data sources and data source views that packages use. If you choose the Integration Services project type, Business Intelligence Development Studio creates a project with a Data Sources, Data Source Views, SSIS Packages, and Miscellaneous folder (see Figure 16-2). An empty package is also provided.
Figure 16-2 The Integration Services project in Solution Explorer.
Many of the windows that you use when building packages are part of Business Intelligence Development studio: the Toolbox that provides the items for building control flow and data flow in packages, the Properties window that lists the properties of a package or package object, and the Solution Explorer that manages projects and project items, including the packages, data sources, and data source views in an Integration Services project.
Figure 16-3 shows the default layout of Business Intelligence Development Studio windows. The behavior and placement of windows are configurable. If you have used Microsoft Visual Studio, this environment is familiar to you and you probably already know how to customize the development environment. If you are new to Studios, see Chapter 13, "Inside Analysis Services OLAP Tools," for more information about the features of Business Intelligence Development Studio.
Figure 16-3The windows that Business Intelligence Development Studio provides in their default locations, with SSIS Designer in the center window.
SQL Server Import and Export Wizard
The SQL Server Import and Export Wizard is the simplest way to create an Integration Services package. The packages that you create with this wizard can extract data from a variety of data sources such as Excel spreadsheets, flat files, and relational databases, and load the data into a similar variety of data stores. For example, the package can select data from an Excel spreadsheet with a query and write the data into a SQL Server table.
You can launch the SQL Server Import and Export Wizard from SQL Server Management Studio or an Integration Services project in Business Intelligence Development Studio. In SQL Server Management Studio, the primary use of the wizard is to create and run packages as is. Administrators typically use these packages to perform ad hoc imports and exports of data, or they save the packages to rerun as part of routine data maintenance. This chapter focuses on using the wizard in Business Intelligence Development Studio.
Figure 16-4 The control flow and data flow generated by the SQL Server Import and Export Wizard for a package that copies data from one database to another and creates the table in the destination database. Depending on the options selected in the wizard, the control flow may vary.
The packages that you create with the SQL Server Import and Export Wizard can perform only very limited data transformation, such as changing column metadata. However, these packages provide a great way to get a jump start on creating more complex packages. If you run the wizard from Business Intelligence Development Studio, you cannot run the package as a step in completing the wizard. Instead, the wizard creates a package and adds it to the Integration Services project from which you launched the wizard. This package includes a basic workflow to extract and load data (see Figure 16-4). Also, depending on the options that you selected on wizard pages, the package may include tasks that prepare destination data stores, such as dropping and re-creating tables or truncating table data.
Once you have been through the wizard and the package is added to the Integration Services project, you can work with the package in SSIS Designer and enhance the package by adding other tasks, implementing advanced features such as logging and configurations and inserting transformations between the source and destination.
SSIS Designer is the graphical tool for developing packages. When you first open the designer, it consists of the four tabs: Control Flow, Data Flow, Event Handlers and Package Explorer (see Figure 16-5). When you run the package a fifth tab, named Progress, is added to the designer. After you stop the package, the Progress tab is renamed to Execution Results.
Figure 16-5SSIS Designer shows the tabs for each designer and the tab for the package content.
When you open an Integration Services project in Business Intelligence Development Studio, the SSIS menu is added to the menu bar. At this time, the menu has only one option: Work Offline (see Figure 16-6). This option applies to an entire project. When you select the Work Offline option, you are working in an offline mode. This means that Integration Services skips the aspects of package validation that make a connection to data sources and other external components.
Figure 16-6The SSIS menu option (there are multiple options) when you first open an Integration Services project.
When you open the first package in SSIS Designer, additional options become available in the SSIS menu (see Figure 16-7).
Figure 16-7The SSIS menu options after you have opened at least one package in SSIS Designer.
From the options on the SSIS menu, you can access the tools for implementing more advanced features in your packages, specify whether to work in offline mode, or switch to a different tab within SSIS Designer. The Work Offline option applies to the current Integration Services project. This option can also be set before you open SSIS Designer. The following list describes the SSIS menu options:
- Logging opens the Configure SSIS Logs dialog box, in which you add new logs and select the events and information to log.
- Package Configurations opens the Package Configuration Organizer dialog box, from which you launch the Package Configuration Wizard to create configurations.
- Digital Signing opens the Digital Signing dialog box, from which you can select the certificate to use.
- Variables opens the Variables window, in which you add, change, and delete user-defined variables and view system variables.
- Log Events opens the Log Event window, which lists the log entries that the package generates in real time.
- New Connection opens the Add SSIS Connection Manager dialog box, in which you select the type of connection manager to create.
- View provides access to the Control Flow, Data Flow, and the Event Handlers design surfaces and Package Explorer.
The Format Menu becomes available when you open a package in SSIS Designer. This menu includes many options for sizing the control flow and data flow items that a package contains and refining the layout of the control and data flows (see Figure 16-8). By applying these options to packages, you can make packages more legible and the control and data flows easier to understand.
Figure 16-8The Format Menu lists the options to modify package layout.
Depending on the layout of the package and the items selected, different options are available. For all options, except for Auto Layout, you must select at least two items before the sub-options become available.
Control Flow Designer
The Control Flow tab provides the control flow designer, in which you construct the package control flow. The control flow consists of autonomous tasks and repeating sub control flows that are linked into an ordered workflow by precedence constraints. When the Control Flow tab is active, the Toolbox lists the tasks and containers that you can use to construct control flows. Figure 16-9 shows the control flow designer and the Toolbox when the Control Flow tab is active. The Toolbox window is in the default location. The "Common Environment Configuration Scenarios" section, later in this chapter, provides information about customizing the Toolbox and the behavior of control flow items.
Figure 16-9The Control Flow tab shows a control flow that consists of a Sequence container with an Execute SQL task, a Data Flow task, and a Script task
Data Flow Design Surface
The Data Flow tab (see Figure 16-10) provides the data flow designer, in which you construct the data flows in the package. A package can include no, one, or multiple data flows. A data flow consists of one or more sources that extract data, transformations that modify the data, and one or more destinations that write data. The Toolbox lists the sources, transformations, and data control flow designer, and the default Toolbox when the Data Flow tab is active. The "Common Environment Configuration Scenarios" section, later in this chapter, provides information about customizing the Toolbox and the behavior of data flow items.
Figure 16-10The Data Flow tab shows a data flow that consists of an OLE DB source and a Row Count transformation.
Event Handlers Designer
The Event Handlers tab provides the Event Handler Designer, in which you construct an event handler for an Integration Services event. An event handler is a workflow that runs in response to an event that the runtime raises. The event handler also consists of a control flow of autonomous tasks and repeating sub control flows that are linked into an ordered workflow. If the event handler includes a data flow, then you use the data flow designer to construct the data flow.
The event handler designer is similar to the control flow designer. When the Event Handlers tab is active, the Toolbox lists the tasks and containers that you can use to construct control flows in event handlers. Figure 16-11 shows the control flow designer and the default Toolbox when the Event Handlers tab is active. The Common Environment Configuration Scenarios section, later in this chapter, provides information about customizing the Toolbox and the behavior of control flow items.
Figure 16-11The Event Handlers tab shows a control flow that consists of an Execute SQL task and a Send Mail task. The event handler is on the OnPreExecute event.
The Package Explorer tab provides an Explorer (why the cap?)-type view of package content. The view is built as you construct the package and provides a great way to understand the structure of the package. Figure 16-12 shows the expanded view of a fairly basic package; it has only one executable (Run SQL Statement) one connection manager (LocalHost.DatabaseName), and no user-defined variables. You can imagine how important this view is to understanding and communicating to others the structure of complex packages!
Figure 16-12 The Package Explorer view of a simple package.
The Explorer on the Progress tab records the progress of package execution and provides a view of package execution while the package is running. The view is built as the package makes progress in the execution of the control flow (see Figure 16-13) in the package and event handlers and in the data flow. The explorer records the beginning and completion of validation, progress percentages, and the start and end times of each executable, tasks, containers, and event handlers in the package, as well as the package itself. Depending on the tasks that the package contains, the Progress tab shows different types of information. For example, the Data Flow task might report the number of rows inserted into the destination data store. If errors or warnings occur, they are also listed in the Progress window.
In addition, the explorer on the Progress tab provides useful information about ways that you can improve the package. For example, if a data flow extracts columns from a data source and makes no subsequent use of the columns, a warning entry that identifies the unused column is written in Explorer window on the Progress tab.
Figure 16-13A package with a single Execute SQL task runs and generates the progress information shown in Figure 16-14. The green color on your screen indicates that the task executed successfully.
Figure 16-14The Progress window shows the start and finish of the validation phases and the start and finish time of the package and the Execute SQL task. Progress is reported on a query that the Execute SQL task runs. In this case, the query is a simple SELECT statement.
After you stop running the package, the name of the Progress tab changes to Execution Results. The results from the previous execution of a package remain available on the tab explorer until you rerun the package, run a different package, or exit SSIS Designer.
Connection Managers Area
The Connection Manager Area (see Figure 16-15) contains the connection managers that a package uses. Connection managers connect to data stores. They are used by sources and destinations to extract and load data, as well as many tasks, containers, and transformations that require access to a data store to do their work. You can add and configure connection managers as a separate step in the construction of a package, or you can add and configure the connection managers as you construct the control and data flows or implement logging in the package. If you choose to add and configure the connection managers as you go, Integration Services automatically makes available only the connection manager types that a particular control flow item, data flow item, or log provider can use.
The Connection Managers Area lists three connection managers.
Integration Services includes a wide variety of connection managers and provides a user interface to configure each type. You configure a connection manager as a step in adding the connection manager to the package. Later, you can modify the configuration by double-clicking the connection manager in the Connection Managers area. Figure 16-16 shows the right-click menu where you select the connection manager type and open the dialog box to configure that type.
Figure 16-16The Add SSIS Connection Manager dialog box lists the built-in connection types that Integration Services support.
Variable Management Tools
Variables are used in a million different ways in Integration Services packages. Integration Services supports system and user-defined variables. System variables are the read-only variables that Integration Services provides. User-defined variables are the variables that you define to support package functionality. You will soon find that you need to add variables to packages to support package functionality.
The following are a few of the ways that packages can use variables:
- Provide values to input parameters in SQL statements and capture values from output parameters
- Serve the expressions that variables, precedence constraints, property expressions, and data flow components use.
- Provide values to use in scripts
- Capture the row count from the Row Count transformation
- Provide the SQL and XML (code) that tasks and data flow components use
The Variables window for working with variables is not part of SSIS Designer, but variables exist within the context of a package and you must open the package in SSIS Designer before you can add, delete, and configure variables.
To open the Variables window, click Variables on the SSIS menu. By default, the window is docked in the upper-left corner of the Business Intelligence Development Studio. Like other Business Intelligence Development Studio windows, you can move this window and configure it to be a dockable or floating window or a tabbed document and use auto-hide.
The Variables window can add, delete, and list variables. By default the window contains columns for the name, scope, data type, and value of variables. Figure 16-17 shows the default Variables window. You can set additional variable properties in the Choose Variable Columns dialog box.
Figure 16-17 The Variables window when you click the Choose Variable Columns icon.
In the Choose Variable Columns dialog box, you can add the less frequently configured variable properties, the namespace, and whether an event is raised when the variable changes value to the Variables window.
Logging Configuration Tools
Integration Services include a variety of log provider types that you can use to implement logging in your packages. The log provider types include types to log to text and XML files, SQL Server Profiler, SQL Server, and Windows Events Log. You use the Configure SSIS Logs dialog box to configure logging. In this dialog box, you can specify type of log provider to implement, the logs to use, and the log entries to write to the log. The Configure SSIS Logs dialog box (see Figure 16-18) is not part of SSIS Designer, but log providers exist within the context of a package and you must open the package in SSIS Designer before you can configure logging. To open the Configure SSIS Logs dialog box, click Logging on the SSIS menu.
Figure 16-18The Configure SSIS Logs dialog box shows the log provider types.
The logs are defined at the package level (see Figure 16-19). After you have defined the logs, the tasks and containers in the package can use them. A package is a hierarchical collection of objects with the package object at the top of the hierarchy. In this hierarchy, every executable (task or container), except the package itself, has a parent. If you do not want to configure logging for each executable, you can specify that an executable uses the logging specifications of its parent container.
Figure 16-19The Configure SSIS Logs dialog box shows the events and information that you can select to log for a package object.
Properties Management Tools
Integration Services provides a variety of tools for setting the properties of packages and the objects that packages contain. The tools include custom tools for configuring the properties of tasks, containers, sources, transformations, and destinations, as well as the generic Advanced Editor dialog box that you can use to configure most data flow components.
The Properties window (see Figure 16-20), built into Business Intelligence Development Studio, provides an alternative way to configure package items. For packages and the Sequence container, the Properties window is the only tool available to set properties. In addition, the Properties window lists properties that are not available to configure in the custom tools such as properties that are read-only or properties for which the default values are often used. To view properties of an item in the Properties window, click the item in the package and then click Properties Window on the View menu. To show the properties of a package in the Properties window, click the background of the control flow designer.
You can update the value of properties with the evaluation results of expressions by implementing property expressions on properties. A property expression is an expression that you write using the Integration Services expression language and assign to a property. You can access the tools for building property expressions from the Properties window (see Figure 16-21).
Figure 16-20The Properties window lists the properties of the Sequence container.
Figure 16-21The Expressions node, from which you open the tools you use to write property expressions.
To learn more about using property expressions, see the "Common Package Development Scenarios" section, later in this chapter.
Package Configuration Tools
Package configurations update the values of packages and package objects at runtime. Each configuration is a name/value pair in which the name specifies the path of the property to update and the value specifies the property value. By implementing configurations in a package, you can tailor each deployment of the package to a specific environment. For example, you can update the connection string of a connection manager to point to a different server.
Integration Services supports a variety of configuration types. You can store configurations in XML files, environment and parent package variables, Registry entries, or SQL Server tables. The XML file and SQL Server table can store multiple configurations; the other types only a single configuration. If you choose to use a SQL Server table, you can store the configurations for multiple packages in the table and specify a filter to identify configurations for different packages. Integration Services provides two tools for package configurations: the Package Configuration Organizer dialog box and the Package Configuration Wizard.
Package Configuration Organizer
In the Package Configurations Organizer dialog box (see Figure 16-22), you enable the package to use configurations and specify the order in which the configurations are loaded at runtime. The configurations are loaded in top-to-bottom order. If multiple configurations update the same property, the configuration that is loaded last wins. To launch the Package Configuration Wizard click Add.
Depending on whether the configuration type supports multiple configurations, you can select one or multiple package and package object properties and then complete the wizard. The configuration is added to the bottom of the list of package configurations in the Package Configuration Organizer dialog box. You can use the up and down arrows to position the new configuration in its correct loading position. If you want to edit the package configuration, click Edit and then rerun the Package Configuration Wizard.
Figure 16-22The Package Configurations dialog box, after you have enabled configurations on the package.
Package Configuration Wizard
The Package Configuration Wizard guides you though the steps to create configurations. On the Select Configuration Type page, you select the type in the Configuration type list (see Figure 16-23). You can specify the configuration location directly or choose an existing environment variable to specify the location. Table 16-1 lists and describes the various Integration Services configuration types.
Figure 16-23The Package Configuration Wizard dialog box shows the configuration types.
Table 16-1Integration Services Configuration Types
|XML configuration file||Select an existing file or provide the name of a file to create. The file is created when you complete the wizard|
|Environment variable||Select an existing environment variable from the list|
|Registry entry||Type the name of an existing Registry key. The key must exist in HKEY_CURRENT_USER, and the key must include a value named Value. The value can be a string or a DWORD.|
|Parent package variable||Type the name of a user-defined variable with package-level scope. Variable names are case sensitive and the name you provide must be a case-sensitive match of an existing variable.|
|SQL Server||Select an existing OLE DB connection manager to use or create a new connection manager. The connection manager, in turn, specifies the SQL Server database that contains the table to store the configurations. You can select an existing table or create a new table. After you specify the table, you can select the filter to use for the configuration or type the name of a new filter. If a configuration already exists for a property that uses the specified filter, the configuration is overwritten.|
Business Intelligence Development Studio provides the same debug windows as Visual Studio. If you have debugged applications in Visual Studio, you already know how to set breakpoints and how to use the windows. If you are new to the debug environment, the Microsoft Visual Studio documentation provides information about how you access and use the debug windows.
The Integration Services breakpoints are similar to the breakpoints you may have used when writing code in Visual Studio. As in code, Integration Services breakpoints suspend execution to enable you to examine the values of variables, the call stack, and so forth to help you identify and correct errors. You can set breakpoints on packages, tasks, and other container types. To set a breakpoint you enable a break condition on the container.
In addition to enabling a break condition, you further identify break conditions by specifying how many times the break condition occurs before execution is suspended.
You use the Set Breakpoints
Figure 16-24 The Set Breakpoints dialog box shows that breakpoints are enabled on the OnPreExecute and OnPostExecute events of the Execute SQL task.
If a container in a package has a breakpoint, the breakpoint icon (a red dot) appears on the container shape in the control flow designer. The Control Flow tab represents the package, and if you enable breakpoints on the package, the breakpoint icon appears on the label of the Control Flow tab.
Package Deployment Tools
After package development is completed, you use the Build feature for Integration Services that Business Intelligence Development Studio provides to create a deployment bundle. The deployment bundle is the set of files you will copy to the target computer and then use to install the packages and their dependencies. The Build process creates a deployment manifest and includes it, the packages in the Integration Services project, package dependencies, and any files that you added to the Miscellaneous folder in the deployment bundle.
After you have copied the deployment bundle to the computer on which you want to install the packages, you run the Package Installation Wizard on the target computer. The wizard guides you through the steps to install packages. On the wizard pages, you must make the following decisions:
- Install packages to the file system or an instance of the SQL Server Database Engine.
- Choose whether or not the packages are validated after installation.
- Specify the folder in which packages (if deploying to the file system) and package dependencies are installed.
- If packages use configurations, whether to update the value of properties in the configurations or not.
SQL Server Management Studio
SQL Server Management Studio is the SQL Server 2005 studio for managing Integration Services packages. In SQL Server Management Studio you perform the following tasks:
- Organize packages in folders
- Import and export packages
- Assign read and write permissions to packages
- Run packages
- Monitor running packages
- View summaries of packages properties
After you connect to Integration Service, the Object Explorer in SQL Server Management Studio provides access to the folders for storing and running packages (see Figure 16-25). The Stored Packages folder and its subfolders list the packages saved to the package store; the packages can be saved to the sysdtspackages90 table in the msdb database or to the file system folders that the Integration Services service monitors. The package store is a logical store that can consist of msdb and specified folders in the file system. To learn more about which folders are by default part of the package store and how to add other folders to the package store, read the Common Package Management Scenarios section later in this chapter.
Figure 16-25Integration Services in the Object Explorer window of SQL Server Management Studio.
From the right-click menus of folders you access the tools to perform various management tasks (see Figure 16-26). For example, expand the Stored Packages folder and its subfolders, right-click a package, and then click the menu option to import or export the package, run the package, assign roles to the packager, or delete the package.
Figure 16-26The right-click menu on a package in an Integration Services folder listed in the Object Explorer window of SQL Server Management Studio.
Command Prompt Utilities
Integration Services provides two command prompt utilities for running and managing packages. You use the dtexec utility to run packages, and the dtutil utility to manage packages. In addition, Integration Services provides the Execute Package Utility, a graphical interface on dtexec. The SQL Server 2005 documentation provides detailed information about the options and option arguments for both utilities. This chapter does not include this information; instead, it includes samples of command lines that you might find useful to help you write command lines that fit your business needs. For usage scenarios, see the "Common Package Management Scenarios" section later in this chapter.
Use the following table of contents to navigate to chapter excerpts, or click here to view Inside Integration Service Tools in its entirety.
Inside SQL Server Integration Services Tools
Part 1: Integration Services overview
Part 2: Integration Services tools overview
Part 3: Using Integration Services tools in business scenarios
Part 4: Common package deployment scenarios
Part 5: Common package management scenarios
|About the book|
|Microsoft SQL Server 2005's high-powered management tools can dramatically improve DBA productivity and effectiveness. Now there's a comprehensive guide to SQL Server 2005's toolset, straight from the Microsoft team that created it. This book covers the entire toolset in unprecedented depth, guides database professionals in choosing the right tools, and shows them how to use various tools collectively to solve real-world problems. Purchase Inside SQL Server 2005 Tools from Addison-Wesley.|
|About the author|
|Lead author Michael Raheem is a senior product manager in the SQL Server Marketing team at Microsoft. Michael currently leads SQL Server enterprise marketing efforts, including high availability, scalability, performance, and SQL Server Always On Technologies. Prior to joining the marketing team, he led the design and implementation of several SQL Server 2005 tools such as Management Studio, Upgrade Advisor, Database Mail and Surface Area Configuration. Michael has spoken at several conferences, including TechEd, TechReady, PASS and SQL Connections. Additionally, he has contributed to the Answers from Microsoft column in SQL Server Magazine and has over 13 years of experience in designing and developing solutions with Microsoft SQL Server.|