Wednesday, January 6, 2010

Data Architect versus Database Architect

Introduction

It seems that many people don't understand the difference between a Data Architect and a Database Architect and the two titles are used interchangeably.  I, however, view them as separate skill sets and positions.
These web sites are examples of how these terms are used.
http://www.databaseguides.com/database-architecture
This definition of database architecture seems to start out by describing the physical infrastructure of a database system, but then switches to the architecture of the physical data model or data schema (the structure of how the data will be stored).
http://www.wisegeek.com/what-is-a-data-architect.htm
This person's definition of a Data Architect is consistent with mine.

Confusion

Perhaps the confusion begins with the term database.  Is a database the structure (schema) of how the data will be stored (Hierarchical, Object, Relational, Dimensional, CODASYL, etc.) or what I call the "engine"? 

The database engine is the software application, or the Database Management System (DBMS), that allows us to store the data.  The most common database engines today are used for Relational Database Management Systems (RDBMS).  These include Teradata, Oracle, and MS SQL Server.  Within each DBMS, there may be multiple schemata, or collections of data. (Each individual collection of data within a DBMS is called a schemata; the plural is schema, but the plural form is often used in the singular.)

The term database seems to be freely used for both the schema and the DBMS (engine) and the distinction only becomes apparent through the context the word is used in.  Therefore, I feel there should be a clear delineation between what a Data Architect does and what a Database Architect does.

Data Architect

A Data Architect (DA) is someone who focuses first on the data itself and how it will be used and maybe later on how it will be stored (schema) on the chosen DBMS.  It is the DA who strives to ensure that the "Garbage" in "Garbage In, Garbage Out" (GIGO) won't exist.  The DA should start working at the very beginning of a project in the Plan and Analyze phases.

At this point, the DA will be working with the Business Analyst (BA) and the client stakeholders to get the meta data for the data items (entities and attributes) that will be needed for the solution.  It is very important to understand early in the project what the business means by Customer, Order Date, Sale Date, etc.  The DA should be considering the big picture, i.e. the enterprise (global) use of the data, so the same data item won't have different definitions (meta data) and data types used in other data stores.

Typical deliverables in the plan and analyze phases are the Conceptual and Logical Data Models.  These documents are not concerned with what database engine (Teradata, Oracle, MS SQL Server, etc.) the data will be stored on.  Once there is consensus on what data items are needed and how they are related, then the DA may pass the Logical Data Model off to a Data Modeler or Data Analyst to create the Physical Data Model or schema; the DA may do this model, too.

This is generally done in the Design phase of the project even though it is really in the DA's Build and Deploy phases.  This is because there are others who can't do their work until the data store's structure has been completed and published.  At this point, the DA enters a supporting role to those who will be using the data.

Once the Logical data model is completed and agreed to with the business stakeholders, the database engine (probably a RDBMS) should be chosen.  Based on data volume metrics for initial and future storage capacity needed, maximum expected number of concurrent users, batch processing needed, and other usage factors, a decision should be made as to which database engine is the best.  Quite often, however, this decision has already been made because the company has invested money, time, and staff to one database engine already and does not want to invest more into a second database engine even if the existing database engine will have scalability issues.

Database Architect

When the data-centric project gets to the point of choosing what DBMS will be used to store the data, this is the person who guides the selection of a DBMS and architects the infrastructure for it.  Things like how many servers will be needed, how many tiers there will be, how much disk space is needed initially and three years later, how many concurrent users must be supported, and so forth are what the Database Architect is concerned with and will be creating a solution for.

The Database Architect is generally someone who is expert in the chosen DBMS.  At first, there may be several of these people, each representing a competing DBMS provider.  Vendors usually provide these people to design, estimate, and (if chosen) be the company's DBMS solution provider.  Once the DBMS is chosen by the program / project managers and team members and the CIO/CFO approve the selection, then a Database Architect who specializes in that DBMS is needed.

The Database Architect could be the same person who will be the Database Adminstrator (DBA); it would be better, in my opinion, to have a consultant act as the Database Architect because once the database system (DBMS and hardware) is designed and installed, there won't be much need for a Database Architect until the system needs to be upgraded, modified, or expanded.  Generally, the Database Architect should be provided by the vendor of the DBMS since they will have the greatest and most current knowledge of their DBMS.  A DBA should work along side of the Database Architect so there can be knowledge transfer of the infrastructure that the DBMS will reside on and the DBA will be monitoring.

Database Administrator

I included this third role in this dissertation even though it was not my original intent.  It seems that most people understand the operational role of the Database Administrator (DBA), but may not understand the DBA's role in modeling the data store.

The Data Architect (or Modeler or Analyst) will work with the DBA when the Logical Data Model is transformed into the Physical Data Model (schema).  It is very rare that a Utopian Logical Data Model is transformed directly to a Physical Data Model.  What is great in theory (or from a logical business perspective) is not always best in reality.  The DBA should be the one guiding the person doing the Physical Data Model as to what will work best with the chosen DBMS.  The DBA needs to understand the relationships between the entities, constraints and data validation needed, and what attributes will need indexing for SQL performance.  Once the Data Architect has gone on to other projects, the DBA will generally be the first contact for developers and users who need to understand how to get data out of the schema contained within the DBMS.  The DBA may need to consult with the Data Architect, but the users will generally contact the DBA first.

Sometimes, the DBA is also the Physical Data Modeler, but I don't think this is a good approach.  The DBA focuses on the performance and ease of maintenance of the DBMS; the Data Architect / Modeler / Analyst focuses on the schema, data quality, data structure, data relationships, and data use (queries).  These are definitely interrelated aspects, but it is hard for one person to advocate them equally.  What the modeler sees as a best solution may not be what the DBA sees as providing the best performance and least disk space consumption.  It is best when two or more people can provide input for both sides of the issue and come up with a solution that is agreeable to both.

1 comment:

  1. great i was confused with these roles but now i cleared

    Thanks...

    ReplyDelete