Computer Engineering Tips - Computer Engineering news and articles - Data Warehousing in Bioinformatics
Data Warehousing in Bioinformatics

Bioinformatics is a field in which data grows at an exponential rate while knowledge grows only at a linear rate. The ultimate challenge for the biological database community is to help close this gap between the growth of data and the growth of knowledge. Bioinformatics data consists of different views of biological information, and bioinformatics databases are both diverse in their data formats and highly redundant. The data views include biological sequences (such as DNA, RNA, and proteins), gene or protein expression, functional properties, molecular interactions, clinical data, system descriptions, and related publications. The data appear as sequences, sequence annotations, structural models, physical maps, clinical records, interaction pathways, gene and protein expression data, protein-protein interactions, and other forms, held in sources such as databases, private data collections, and publications. Data warehousing has emerged in bioinformatics to support biological knowledge discovery. A data warehouse is a structured repository of a large volume of data, integrated from different operational sources to support analytical processing.

Bioinformatics data is substantially diverse and variable, even among databases that hold the same view (the same type of data). Each database has its own infrastructure and proprietary data format; that is, common data standards and data exchange formats are not established in this field. For example, sequence entries are described in different formats in GenBank, Swiss-Prot, and EMBL. GenBank data is represented in ASN.1 (Abstract Syntax Notation One), a general-purpose standard notation adopted by NCBI, while Swiss-Prot designed its own format, which differs slightly from the format of the EMBL database. Further, the introduction of XML (Extensible Markup Language) as a generic data exchange format has itself given rise to several variant XML representations of bioinformatics data.
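The format differences described above can be illustrated with a minimal sketch that maps two differently laid-out flat-file entries onto one common record. The entries are toy data, the layouts are heavily simplified versions of the EMBL-style and GenBank-style flat files, and `SequenceRecord`, `parse_embl_like`, and `parse_genbank_like` are hypothetical names chosen for this illustration, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical common form shared by all sources."""
    accession: str
    description: str
    sequence: str


def _letters(lines):
    # Keep letters only: drops the spaces and position counters used for layout.
    return "".join(ch for line in lines for ch in line if ch.isalpha()).upper()


def _parse(text, accession_key, description_key, sequence_key):
    """Shared line scanner; only the line keywords differ per format (a simplification)."""
    accession, description, seq_lines, in_seq = "", [], [], False
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator in both layouts
            break
        if in_seq:
            seq_lines.append(line)
        elif line.startswith(accession_key):
            fields = line[len(accession_key):].replace(";", " ").split()
            accession = fields[0] if fields else ""
        elif line.startswith(description_key):
            description.append(line[len(description_key):].strip())
        elif line.startswith(sequence_key):
            in_seq = True                  # sequence lines follow this header
    return SequenceRecord(accession, " ".join(description), _letters(seq_lines))


def parse_embl_like(text):
    return _parse(text, "AC", "DE", "SQ")


def parse_genbank_like(text):
    return _parse(text, "ACCESSION", "DEFINITION", "ORIGIN")


# Toy entries (invented data, not real database records).
EMBL_LIKE = """\
ID   TOY0001; SV 1; linear; DNA; 20 BP.
AC   TOY0001;
DE   Toy example sequence
SQ   Sequence 20 BP;
     acgtacgtac gtacgtacgt        20
//
"""

GENBANK_LIKE = """\
LOCUS       TOY0001      20 bp    DNA
DEFINITION  Toy example sequence
ACCESSION   TOY0001
ORIGIN
        1 acgtacgtac gtacgtacgt
//
"""

embl_record = parse_embl_like(EMBL_LIKE)
genbank_record = parse_genbank_like(GENBANK_LIKE)
```

Once both entries are in the common form they compare equal, which is what allows later cleaning and analysis steps to ignore where a record came from. Real entries carry many more line types than this sketch handles.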

Bioinformatics data is characterized by enormous diversity matched by high redundancy, both within individual databases and across multiple databases. Making data from different sources interoperable requires resolving data disparities and transforming the data into a common form (data integration), and removing redundant data, errors, and discrepancies (data cleaning). Frequently encountered data redundancy issues are:

  • fragments and partial entries of the same item (for example, a sequence) may be stored in several source records
  • databases update and cross-reference one another, with the negative side effect of occasionally creating duplicates and redundant entries and of propagating errors
  • the same sequence may be submitted to more than one database without cross-referencing those records
  • the owners of a sequence record may submit the sequence more than once to the same database

To enable the extraction of knowledge in a data warehousing environment, these issues are rectified by the data integration and data cleaning components of the warehouse.
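As a minimal sketch of how a cleaning component might handle the redundancy issues listed above, the following drops exact duplicates and fragments that are fully contained in a longer sequence. It assumes records are already reduced to `(record_id, sequence)` pairs; `remove_redundant` and the sample identifiers are hypothetical, and the pairwise comparison is quadratic, so it illustrates the idea and is not a method for large datasets.

```python
def remove_redundant(records):
    """Return (kept, dropped) for a list of (record_id, sequence) pairs.

    dropped maps each removed record_id to the id of the kept record
    whose sequence is identical to it or contains it.
    """
    # Longest first, so a fragment is always checked against longer kept sequences.
    ordered = sorted(records, key=lambda rec: (-len(rec[1]), rec[0]))
    kept, dropped = [], {}
    for record_id, sequence in ordered:
        sequence = sequence.upper()  # treat case differences as the same sequence
        cover = next((kid for kid, kseq in kept if sequence in kseq), None)
        if cover is None:
            kept.append((record_id, sequence))
        else:
            dropped[record_id] = cover  # remember which record covers it
    return kept, dropped


# Toy input: one exact duplicate, one fragment, one unrelated sequence.
sample = [
    ("db1:A", "ACGTACGTAA"),
    ("db2:B", "acgtacgtaa"),
    ("db1:C", "GTACG"),
    ("db3:D", "TTTTGGGG"),
]
kept, dropped = remove_redundant(sample)
```

Keeping the `dropped` mapping, instead of silently discarding records, preserves the link between a removed entry and the record that covers it, which is the cross-reference the source databases often lack.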

A bioinformatics data warehouse requires several components for operation: 1) retrieval of data from databases, 2) a mechanism for cleaning data, 3) flexibility in manipulating the datasets, and 4) purpose-built analysis tools, integrated so that they can be used jointly or independently.

Currently, not all data warehousing concepts have been applied to bioinformatics. For example, the dimensional data model, which is based on relational tables, is not widely seen in bioinformatics because of the complexity of the real data. Historically, data warehousing has been developed mainly on relational database systems, which are not as broadly used in bioinformatics. The major components of a data warehouse provide for initial and incremental data integration, data annotation, and data mining. Data integration in turn includes subcomponents for data retrieval, data cleaning, and data transformation. These components enable the compilation of raw data from various databases. Data cleaning support tools are used to filter out irrelevant and redundant records. To let tools interoperate on the heterogeneous datasets, and to ease data management, the data is transformed into a common data format or into interoperable formats. Because the information in the source databases is updated regularly, the steps of retrieving, cleaning, and transforming must be repeated as an incremental data integration process.
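The repeated retrieve, clean, and transform cycle can be sketched as a single integration pass that only reprocesses records whose content has changed since the previous pass. This is a minimal illustration under stated assumptions: the warehouse is a plain dictionary, change detection uses a content hash, and `integrate`, `clean`, and `transform` are hypothetical names standing in for the real components.

```python
import hashlib


def fingerprint(raw):
    # Content hash used to detect whether a source record has changed.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def integrate(warehouse, seen, source_records, clean, transform):
    """Run one integration pass and return the number of records written.

    warehouse: dict of record_id -> transformed record (updated in place)
    seen:      dict of record_id -> fingerprint from earlier passes
    """
    written = 0
    for record_id, raw in source_records:          # retrieval result
        digest = fingerprint(raw)
        if seen.get(record_id) == digest:
            continue                               # unchanged since last pass
        seen[record_id] = digest
        cleaned = clean(raw)                       # cleaning step
        if cleaned is None:
            warehouse.pop(record_id, None)         # filtered out as irrelevant
            continue
        warehouse[record_id] = transform(cleaned)  # common-format step
        written += 1
    return written


def clean(raw):
    # Toy cleaning rule: discard empty records.
    text = raw.strip()
    return text or None


def transform(text):
    # Toy common format: upper-case sequence in a dictionary.
    return {"sequence": text.upper()}


warehouse, seen = {}, {}
first = integrate(warehouse, seen, [("r1", "acgt"), ("r2", "   ")], clean, transform)
second = integrate(warehouse, seen, [("r1", "acgt"), ("r2", "   ")], clean, transform)
third = integrate(warehouse, seen, [("r1", "acgtt")], clean, transform)
```

In this sketch the initial load and every later update use the same function; the `seen` fingerprints are what turn a full rebuild into an incremental pass.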

Once the initial dataset is created, the annotation component enables value-added information from experimental results or related publications to be added to the dataset. Data cleaning is also supported at the annotation stage, which facilitates the removal of erroneous data propagated from the data sources. Analysis of the data is enabled by incorporating general or specific data mining tools. The new knowledge extracted from these analyses is then presented as a detailed report.
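One way to picture the annotation stage is a record that carries its added notes together with their provenance, plus a flag that keeps erroneous entries out of later analysis. The sketch below is an assumption about how such a component could look; `Entry`, `annotate`, `flag_erroneous`, and `analysable` are hypothetical names, and the note and source strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    sequence: str
    annotations: list = field(default_factory=list)
    erroneous: bool = False


def annotate(entry, note, source):
    # Store the note with its provenance so it can be traced back later.
    entry.annotations.append({"note": note, "source": source})


def flag_erroneous(entry, reason, source):
    # Cleaning at the annotation stage: mark the entry, keep the reason.
    entry.erroneous = True
    annotate(entry, "flagged: " + reason, source)


def analysable(warehouse):
    # Only entries not flagged as erroneous are passed on to data mining.
    return {key: e for key, e in warehouse.items() if not e.erroneous}


entries = {"r1": Entry("ACGT"), "r2": Entry("ACGN")}
annotate(entries["r1"], "placeholder experimental note", "placeholder publication")
flag_erroneous(entries["r2"], "ambiguous base", "curator")
usable = analysable(entries)
```

Flagging instead of deleting keeps the erroneous record and the reason for its exclusion available for later review.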

 
Copyright 2005 - 2006 Science Tips Team. All rights reserved.
