Data integration phases

InfoSphere® Information Server supports all phases of an effective data integration project. These phases constantly evolve as the lifecycle of the project grows and changes. By providing key data integration capabilities, InfoSphere Information Server addresses each phase to ensure that your project is successful.

The following diagram shows how the suite components work together to create a unified data integration solution. A common metadata foundation enables different types of users to create and manage metadata by using tools that are optimized for their roles. This focus on individualized tooling makes it easier to collaborate across roles.

The graphic shows the InfoSphere Information Server architecture, highlighting each of the main components in the suite. Each component prepares data for other components in the suite. The underlying framework of InfoSphere Information Server enables the components in the suite to integrate and share metadata that is stored in the shared metadata repository.
  • Analysts, data scientists, and line-of-business users can use IBM® InfoSphere Data Click to retrieve data and populate new systems on demand. For example, an analyst can optimize a business intelligence environment by integrating warehouse data, or line-of-business users can deliver their data for analysis.

  • Business analysts, data analysts, data integration specialists, and other users can use InfoSphere Information Governance Catalog to explore and manage the assets that are produced and used by InfoSphere Information Server. InfoSphere Information Governance Catalog provides end-to-end data flow reporting and impact analysis of your organization's data assets.

    You can use InfoSphere Information Governance Catalog to understand and manage the flow of data through your enterprise, and discover and analyze relationships between information assets in the InfoSphere Information Server metadata repository. You can use InfoSphere Metadata Asset Manager to import technical information, such as BI reports, logical models, physical schemas, and InfoSphere DataStage® and QualityStage® jobs, into the InfoSphere Information Server metadata repository.

  • Data quality specialists can use InfoSphere Information Analyzer to design, develop, and manage data quality rules for your organization's data. As that data evolves, these rules can be modified in real time so that trusted information is delivered to InfoSphere Information Governance Catalog, InfoSphere FastTrack, InfoSphere DataStage, InfoSphere QualityStage, and other InfoSphere Information Server components.

  • Data analysts can use InfoSphere Information Analyzer to discover where sensitive data is stored and to identify table relationships.

  • Data analysts can use InfoSphere FastTrack to create mapping specifications that translate business requirements into business applications. Data integration specialists use these specifications to generate jobs that become the starting point for complex data transformation in InfoSphere DataStage and InfoSphere QualityStage.

  • Data integration specialists can use InfoSphere DataStage and QualityStage Designer to develop jobs that extract, transform, load, and check the quality of data.

  • SOA architects can use InfoSphere Information Services Director to deploy integration tasks from the suite components as consistent, reusable information services.

  • Data stewards and data source owners can use IBM Stewardship Center to manage and collaborate on data quality issues. You can use the Data Quality Exception Console to monitor and manage data quality challenges discovered in the information landscape of your enterprise.

InfoSphere Information Server supports these phases in a data integration project:

Discover and analyze

InfoSphere Information Server can help you automatically discover the structure of your data, and then analyze the meaning, relationships, and lineage of that information. By using a unified, common metadata repository that is shared across the entire suite, InfoSphere Information Server provides insight into the source, usage, and evolution of each piece of data.

You can access the metadata repository by using the tools that are optimized for your role. For example, data analysts can use analysis and reporting functions to generate integration specifications and business rules that they can monitor over time. Subject matter experts can use web-based tools to define, annotate, and report on fields of business data.

By automating data profiling and data-quality auditing within systems, your organization can achieve the following goals:

  • Understand data sources and relationships
  • Eliminate the risk of using or proliferating bad data
  • Improve productivity through automation
  • Use existing information assets throughout your project
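
As a rough illustration of what automated data profiling involves, the following Python sketch summarizes each column of a small sample: row count, empty values, distinct values, and a guessed data type. It is a generic example and does not use any InfoSphere Information Server API. The function name `profile_column`, the type labels, and the sample rows are all assumptions made for this example.

```python
from collections import Counter


def infer_type(value):
    """Guess a coarse type label for one string value (labels are illustrative)."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize one column: rows, empty values, distinct values, dominant type."""
    non_null = [v for v in values if v not in (None, "")]
    type_counts = Counter(infer_type(v) for v in non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "dominant_type": type_counts.most_common(1)[0][0] if type_counts else None,
    }


# Hypothetical sample rows; a real profile would read from a data source.
rows = [
    {"customer_id": "101", "postal_code": "10001"},
    {"customer_id": "102", "postal_code": ""},
    {"customer_id": "102", "postal_code": "K1A 0B1"},
]

# Profile every column that appears in the first row.
profile = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
```

In this sample, the profile would flag the repeated `customer_id` value and the empty `postal_code`, which is the kind of finding that helps you understand a source before you use it in a project.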

Design

InfoSphere Information Server can help you design and create data models based on the specific requirements of your information project. Carefully designing your physical data models, logical data models, and databases helps your architecture handle changes as they occur, so that you do not have to react to them after the fact.

New data continuously enters your applications, data warehouses, and business analytic systems. By using InfoSphere Information Server, you can design sophisticated data quality rules that you can modify in real time as your data evolves. In addition, you can scan samples of your data to determine their quality and structure so that you can correct problems before they affect your project. This approach ensures reliability and integrity of your data by consistently monitoring changes and making modifications.
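
The idea of a data quality rule that is checked against a sample of data can be sketched in a few lines of Python. This is a minimal, generic example and not the rule syntax or API of InfoSphere Information Analyzer. The rule names, the `evaluate_rules` function, and the sample records are hypothetical.

```python
import re

# Hypothetical rules: each rule name maps to a predicate over one record.
# Because the rules are plain data, they can be changed without touching
# the evaluation logic below.
RULES = {
    "customer_id_present": lambda r: bool(r.get("customer_id")),
    "postal_code_is_5_digits": lambda r: re.fullmatch(
        r"\d{5}", r.get("postal_code") or ""
    ) is not None,
}


def evaluate_rules(records, rules):
    """Return, for each rule, its pass rate and the records that failed it."""
    report = {}
    for name, predicate in rules.items():
        failures = [r for r in records if not predicate(r)]
        passed = len(records) - len(failures)
        report[name] = {
            "pass_rate": passed / len(records) if records else 1.0,
            "failures": failures,
        }
    return report


# Hypothetical sample drawn from a larger data set.
sample = [
    {"customer_id": "101", "postal_code": "10001"},
    {"customer_id": "", "postal_code": "9021"},
    {"customer_id": "103", "postal_code": "94105"},
    {"customer_id": "104", "postal_code": None},
]

report = evaluate_rules(sample, RULES)
```

Tracking the pass rate of each rule over successive samples is one simple way to notice that data quality is drifting before it affects downstream jobs.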

You can also design your architecture to move large quantities of data in real time from your source applications to your data warehouse or analytics dashboard. Poor design requires constant changes to adapt your environment as data volumes fluctuate. InfoSphere Information Server helps you to design your architecture to handle these demands from the outset so that the information that you need in your warehouses and analytic systems is delivered quickly and reliably.

Develop

InfoSphere Information Server supports information quality and consistency by standardizing, validating, matching, and merging data. By using the suite of components, you can enrich common data elements, use trusted data, such as postal records for name and address information, and match records across or within data sources.

InfoSphere Information Server enables a single record to survive from the best information across sources for each unique entity, helping you to create a single, comprehensive, and accurate view of information across source systems.
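
The matching and survivorship idea can be illustrated with a simplified Python sketch. It assumes an exact match on a normalized name plus postal code, and a "newest non-empty value wins" survivorship policy. Both are assumptions chosen for brevity and are not a description of how InfoSphere QualityStage matches or merges records. All function and field names are hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Simplified match key: case- and whitespace-normalized name plus postal code."""
    name = " ".join(record["name"].lower().split())
    return (name, record["postal_code"])


def survive(group):
    """Build one record, keeping the newest non-empty value for each field."""
    survivor = {}
    # Oldest first, so values from newer records overwrite older ones.
    for record in sorted(group, key=lambda r: r["updated"]):
        for field, value in record.items():
            if value not in (None, ""):
                survivor[field] = value
    return survivor


def consolidate(records):
    """Group records that share a match key and return one survivor per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [survive(group) for group in groups.values()]


# Hypothetical records from two source systems; dates are ISO 8601 strings.
records = [
    {"name": "Ann  Lee", "postal_code": "10001", "phone": "555-0100",
     "email": "", "updated": "2023-01-05"},
    {"name": "ann lee", "postal_code": "10001", "phone": "",
     "email": "ann@example.com", "updated": "2024-03-10"},
    {"name": "Bo Chan", "postal_code": "94105", "phone": "555-0199",
     "email": "", "updated": "2022-07-01"},
]

golden = consolidate(records)
```

Here the two "Ann Lee" records collapse into one survivor that keeps the phone number from the older record and the email address from the newer one, which is the "best information across sources" idea in miniature.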

In addition, InfoSphere Information Server transforms and enriches information to ensure that it is in the required context for new uses. Hundreds of prebuilt transformation functions combine, restructure, and aggregate information. Transformation functions are broad and flexible to meet the requirements of varied integration scenarios.

For example, InfoSphere Information Server provides inline validation and transformation of complex data types, such as those defined by the US Health Insurance Portability and Accountability Act (HIPAA), and high-speed joins and sorts of heterogeneous data. InfoSphere Information Server also provides high-volume, complex data transformation and movement functions that can be used for stand-alone extract, transform, and load (ETL) scenarios, or as a real-time engine for processing applications or processes.

Deploy

InfoSphere Information Server is built on a framework that enables information to be moved throughout the stages of your data integration project. The tools in this framework provide the capabilities necessary to integrate with your source code control system, move information assets throughout the enterprise, monitor the operational environment, and administer changes.

After you design and develop extract, transform, and load (ETL) jobs and other transformation jobs, you can monitor the activity, system resources, and workload management queues. A console aggregates this information so that you can troubleshoot problems with jobs, improve the performance of job runs, and actively monitor the status of your environment from a single location.

You can also import, export, and manage common metadata assets that are used by various components in the InfoSphere Information Server suite. After you share imports to the metadata repository, the imported assets are available to users of other suite tools. Other users can analyze these assets, use them in jobs, assign them to terms, or designate stewards for the assets. Deploying a common metadata repository ensures that your metadata is consistently applied and available to all users in your enterprise.

Core administration tasks, such as security, licensing, logging, and scheduling, are centralized in a single console. Administrators can stop and restart services, manage user accounts, and back up and restore data across the enterprise. Changes are effective throughout the entire suite, simplifying administration and accelerating deployment to other InfoSphere components.

Deliver

InfoSphere Information Server includes the capabilities to virtualize, synchronize, and move information to the people, processes, and applications that need it. Information can be delivered by using federation-based, time-based, or event-based processing, moved in large bulk volumes from location to location, or accessed in place when it cannot be consolidated.

InfoSphere Information Server provides direct, local access to various information sources. It provides access to databases, files, services, and packaged applications, and to content repositories and collaboration systems. Companion products allow high-speed replication, synchronization, and distribution across databases, change data capture, and event-based publishing of information.