Advances in Big Data Technology: Key Big Data Technologies (1)

Key Technologies of Big Data


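The key technologies of big data fall into four areas: data collection, data storage and management, data processing and analysis, and data privacy and security.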

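Data collection

ETL tools extract data from distributed, heterogeneous sources (relational data, flat data files, and so on) into a temporary staging layer, where it is cleaned, transformed, and integrated. The result is then loaded into a data warehouse or data mart, where it becomes the basis for online analytical processing (OLAP) and data mining. Alternatively, data collected in real time can be fed into a stream computing system for real-time processing and analysis.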
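The extract, clean and transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not a production ETL tool: the sample records, the field names (`id`, `name`, `amount`), and the in-memory SQLite table `sales` standing in for a data warehouse are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sources: a "relational" source (list of row dicts) and a flat file (CSV text).
RELATIONAL_ROWS = [
    {"id": 1, "name": " Alice ", "amount": "10.5"},
    {"id": 2, "name": "Bob", "amount": "n/a"},  # bad value, dropped during cleaning
]
FLAT_FILE = "id,name,amount\n3,carol,7\n1,Alice,10.5\n"  # id 1 duplicates the first source


def extract():
    """Pull raw records from both sources into a temporary staging list."""
    staged = list(RELATIONAL_ROWS)
    staged.extend(csv.DictReader(io.StringIO(FLAT_FILE)))
    return staged


def transform(staged):
    """Clean, convert, and integrate: fix types, normalize names, drop bad rows and duplicates."""
    cleaned = {}
    for row in staged:
        try:
            record = (int(row["id"]), row["name"].strip().title(), float(row["amount"]))
        except ValueError:
            continue  # skip rows whose fields cannot be converted
        cleaned.setdefault(record[0], record)  # keep the first record seen for each id
    return sorted(cleaned.values())


def load(records, conn):
    """Load the integrated records into the (stand-in) warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(transform(extract()), warehouse)
    print(warehouse.execute("SELECT * FROM sales ORDER BY id").fetchall())
```

Keeping the three stages as separate functions mirrors the staging-layer idea: raw data is collected first, and only cleaned, integrated records reach the warehouse table.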

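Data storage and management

Distributed file systems, data warehouses, relational databases, NoSQL databases, cloud databases, and similar systems store and manage massive volumes of structured, semi-structured, and unstructured data.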
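One idea behind distributed file systems is to split a large file into fixed-size blocks and keep several copies of each block on different machines. The sketch below illustrates only that idea. The tiny block size, the replication factor of 3, the node names, and the round-robin placement rule are assumptions chosen for brevity; real systems use much larger blocks and more elaborate placement policies.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into consecutive fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, rotating through the node list."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 10, block_size=4)  # block sizes: 4, 4, 2
    layout = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])
    for block_id, holders in layout.items():
        print(block_id, len(blocks[block_id]), holders)
```

Because every block lives on several nodes, losing one node still leaves copies of each of its blocks elsewhere.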

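Data processing and analysis

Distributed parallel programming models and computing frameworks, combined with machine learning and data mining algorithms, process and analyze massive data sets. The analysis results are then presented visually to help people better understand and analyze the data.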
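The distributed parallel programming model mentioned above can be illustrated with the map, shuffle, and reduce phases of a MapReduce-style word count. The version below runs in a single process purely to show the data flow. The function names and sample documents are illustrative, and a real framework would run the map and reduce tasks on many machines.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the emitted pairs."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "big compute", "data data"]))
```

Each call to `map_phase` depends only on its own document, and each key is reduced independently, which is what lets a framework spread both phases across machines.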

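Data privacy and security

While mining the potentially enormous commercial and academic value of big data, build privacy protection and data security systems that effectively protect personal privacy and data security.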
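One common building block of a privacy protection scheme is pseudonymization: replacing direct identifiers with keyed hashes before the data is analyzed. The text above does not name a specific technique, so this is only one possible illustration. The field names, the sample record, and the hard-coded key are assumptions; in practice the key would be generated and stored securely.

```python
import hashlib
import hmac

SECRET_KEY = b"example-key-do-not-use"  # assumption: placeholder key for the demo only


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 digest (HMAC), stable for the same key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def protect_record(record: dict, sensitive_fields=("name", "phone")) -> dict:
    """Return a copy of the record with the sensitive fields pseudonymized."""
    return {
        field: pseudonymize(str(value)) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    print(protect_record({"name": "Alice", "phone": "555-0100", "purchase": 42.0}))
```

The same input always maps to the same pseudonym under one key, so records can still be joined and counted for analysis without exposing the original identifier.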



Two core technologies of big data:

Distributed storage

GFS\HDFS

BigTable\HBase

NoSQL (key-value, column-family, graph, and document databases)

NewSQL (e.g., SQL Azure)
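The four NoSQL families listed above differ mainly in how they shape a record. The sketch below models the same hypothetical user in each style using plain Python structures. It illustrates the data models only and is not the API of any particular database; all keys, field names, and values are made up.

```python
# Key-value: an opaque value looked up by a single key.
key_value_store = {"user:1": '{"name": "Alice", "city": "Beijing"}'}

# Column-family: row key -> column family -> column -> value.
column_family_store = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Beijing"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Document: a self-describing, possibly nested document per record.
document_store = [
    {"_id": 1, "name": "Alice", "address": {"city": "Beijing"}, "tags": ["vip"]},
]

# Graph: nodes plus labeled edges between them.
graph_nodes = {1: {"name": "Alice"}, 2: {"name": "Bob"}}
graph_edges = [(1, "FOLLOWS", 2)]


def followed_by(user_id: int) -> list[str]:
    """Names of the users that `user_id` follows, found by walking the edge list."""
    return [graph_nodes[dst]["name"] for src, label, dst in graph_edges
            if src == user_id and label == "FOLLOWS"]


if __name__ == "__main__":
    print(column_family_store["user:1"]["profile"]["city"])
    print(document_store[0]["address"]["city"])
    print(followed_by(1))
```

The key-value store can only fetch the whole value by key, while the other three models expose progressively more structure (columns, nested fields, relationships) to queries.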

Distributed processing

MapReduce

Spark

Flink
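Unlike batch jobs, stream processors such as Flink handle records continuously as they arrive, often by grouping them into time windows. The sketch below counts events per fixed-size (tumbling) window in plain Python. It is not Flink's API: the event format (timestamp in seconds, key) and the window size are assumptions for the example, and for simplicity it returns all windows at the end rather than emitting each window as it closes.

```python
from collections import Counter, defaultdict


def tumbling_window_counts(events, window_seconds: int):
    """Count events per key within consecutive, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, key) pairs.
    Returns {window_start: Counter({key: count})}.
    """
    windows = defaultdict(Counter)
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window, identified by its start time.
        window_start = (timestamp // window_seconds) * window_seconds
        windows[window_start][key] += 1
    return dict(windows)


if __name__ == "__main__":
    stream = [(0, "click"), (3, "view"), (9, "click"), (10, "click"), (14, "view")]
    for start, counts in sorted(tumbling_window_counts(stream, 10).items()):
        print(start, dict(counts))
```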


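Sources: some of the material in this article comes from Baidu. This article is an original work by LearningYard新学苑; if it infringes any rights, please contact us.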
