OSCHINA-MIRROR/DengMingChen-datahub

data-source-onboarding.md 1.5 KB

# How to onboard a new data source?

The [metadata-ingestion](https://github.com/linkedin/datahub/tree/master/metadata-ingestion) module of DataHub provides ETL scripts for onboarding several kinds of metadata sources, including [Hive](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/hive-etl), [Kafka](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/kafka-etl), [LDAP](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/ldap-etl), [MySQL](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/mysql-etl), and generic [RDBMS](https://github.com/linkedin/datahub/tree/master/metadata-ingestion/rdbms-etl). These scripts feed the metadata to the [GMS](../what/gms.md).

## 1. Extract
The extract step is tightly coupled to the data source, so the [data accessor](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/ldap-etl/ldap_etl.py#L103) must faithfully reflect the metadata held in the underlying data platform.
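
As a rough illustration of a source-specific accessor, the sketch below reads table and column metadata directly from a database catalog. It is not taken from the DataHub scripts: an in-memory SQLite database stands in for the real data platform, and `extract_table_metadata` and the shape of the returned records are hypothetical names chosen for this example.

```python
import sqlite3
from typing import Dict, List


def extract_table_metadata(conn: sqlite3.Connection) -> List[Dict]:
    """Read table and column metadata straight from the database catalog."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    records = []
    for (table_name,) in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        records.append(
            {
                "table": table_name,
                "columns": [
                    {"name": col[1], "type": col[2], "nullable": not col[3]}
                    for col in columns
                ],
            }
        )
    return records


# Hypothetical source: one in-memory table standing in for a real platform.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER NOT NULL, email TEXT)")
extracted = extract_table_metadata(conn)
```

Reading from the catalog rather than from a hand-maintained list is what keeps the extracted metadata in step with the source; each platform would need its own equivalent of the catalog query.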

## 2. Transform
In the transform stage, the extracted metadata is [encapsulated in a valid MetadataChangeEvent](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/ldap-etl/ldap_etl.py#L56) that conforms to the defined aspects and snapshots.
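
A minimal sketch of this wrapping step is shown below. The envelope field names (`auditHeader`, `proposedSnapshot`, `proposedDelta`), the snapshot class name, the URN layout, and the simplified `fields` aspect are all assumptions made for illustration; the authoritative definitions are the MetadataChangeEvent schema and the linked script in the repository. `build_dataset_mce` is a hypothetical helper.

```python
from typing import Dict, List

# Assumed snapshot type name; verify it against the MCE schema in the repository.
DATASET_SNAPSHOT = "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot"


def build_dataset_mce(platform: str, table: str, columns: List[Dict]) -> Dict:
    """Wrap extracted column metadata in an MCE-shaped dictionary."""
    # Assumed URN layout for a dataset on a given platform.
    urn = f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},PROD)"

    # Simplified, hypothetical aspect: the real schema aspect has more fields.
    schema_aspect = {
        "fields": [
            {"fieldPath": col["name"], "nativeDataType": col["type"]}
            for col in columns
        ]
    }

    return {
        "auditHeader": None,
        # (type name, payload) pair selecting one branch of the snapshot union.
        "proposedSnapshot": (
            DATASET_SNAPSHOT,
            {"urn": urn, "aspects": [schema_aspect]},
        ),
        "proposedDelta": None,
    }


example_mce = build_dataset_mce(
    "mysql", "db.users", [{"name": "id", "type": "INTEGER"}]
)
```

Keeping the transform a pure function of the extracted records makes it easy to test without a running Kafka cluster or GMS.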

## 3. Load
The load step uses a [Kafka producer](https://github.com/linkedin/datahub/blob/master/metadata-ingestion/ldap-etl/ldap_etl.py#L80) to publish the events, enabling pub-sub, event-based ingestion. Schema validation is applied at this stage to check metadata quality.
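
The validate-then-publish flow can be sketched as follows. To keep the example runnable without a broker, `RecordingProducer` is an in-memory stand-in that only mimics a `produce`/`flush` interface; in a real pipeline a Kafka client would take its place, and schema validation would typically happen when the event is serialized against its schema. The topic name, the required-field check, and all helper names here are assumptions for illustration.

```python
from typing import Dict, Iterable, List

# Assumed envelope fields; a real pipeline validates against the event schema.
REQUIRED_FIELDS = ("auditHeader", "proposedSnapshot", "proposedDelta")


class RecordingProducer:
    """In-memory stand-in for a Kafka producer (hypothetical, for the sketch)."""

    def __init__(self) -> None:
        self.sent: List[Dict] = []

    def produce(self, topic: str, key: str, value: Dict) -> None:
        self.sent.append({"topic": topic, "key": key, "value": value})

    def flush(self) -> None:
        # A real producer would block here until queued messages are delivered.
        pass


def validate_mce(mce: Dict) -> None:
    """Reject events that are missing any of the assumed envelope fields."""
    missing = [field for field in REQUIRED_FIELDS if field not in mce]
    if missing:
        raise ValueError(f"MCE is missing fields: {missing}")


def load(mces: Iterable[Dict], producer, topic: str = "MetadataChangeEvent") -> int:
    """Validate each event, publish it keyed by URN, and return the count sent."""
    count = 0
    for mce in mces:
        validate_mce(mce)
        _, snapshot = mce["proposedSnapshot"]
        producer.produce(topic=topic, key=snapshot["urn"], value=mce)
        count += 1
    producer.flush()
    return count
```

Validating before publishing means a malformed event fails in the ETL script, close to the source, rather than in a downstream consumer.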

Committed 26.07.2020 20:41 adf975e
