10/31/2023

Filemaker server ubuntu

Data is everywhere, and we already know that. But do we know how big data systems handle such huge volumes of data, and how data engineers make this data usable for downstream data analysts and data scientists?

There are two types of systems: one is transactional (RDBMS such as MySQL or Oracle) and the other is analytical (a data warehouse).

Transactional systems are also called OLTP systems. They store day-to-day data, and this data is structured.

Analytical systems are also called OLAP systems. They store historical data and are used for analysis.

As data grows, a transactional system becomes inefficient at handling historical data, so we store that data in a data warehouse to keep the transactional system from being overburdened.

If we need to store all types of data (structured, semi-structured, and unstructured), a data warehouse is not enough. For this we need a centralized repository, which is called a data lake.

Data lakes are available from various providers:

>On Premise - HDFS (Hadoop Distributed File System)

Next, how do we ingest data from the various sources into the data lake? For this, we can use an ingestion framework.

The process used to get data from the various sources into the data lake is ELT (Extract, Load, Transform).

Now we have the data, but it still needs to be processed so that downstream users can run analysis on it.

For processing or computing, we can go with any of the frameworks below:

>MapReduce (distributed computing in Hadoop)
>Apache Spark - the plain open-source version of Spark
>Databricks - a tuned version of Spark (commonly used by companies)
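
To make the ELT idea mentioned above a little more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the two sources, the field names, and all function names are hypothetical, and a temporary local directory stands in for a real data lake such as HDFS. The point it shows is the ordering: raw data is loaded into the lake untouched, and the transformation happens afterwards by reading it back.

```python
import json
import tempfile
from pathlib import Path


def extract():
    """Extract: pretend to pull rows from two hypothetical sources."""
    orders = [
        {"order_id": 1, "amount": "120.50"},
        {"order_id": 2, "amount": "80.00"},
    ]
    # Semi-structured data: the rows do not all share the same fields.
    clicks = [{"page": "/home", "user": "a"}, {"page": "/cart"}]
    return {"orders": orders, "clicks": clicks}


def load(lake_dir, batches):
    """Load: write every batch to the lake as-is, one JSON object per line."""
    for name, rows in batches.items():
        path = lake_dir / f"{name}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")


def transform_total_order_amount(lake_dir):
    """Transform: runs after loading, reading the raw orders back from the lake."""
    total = 0.0
    with (lake_dir / "orders.jsonl").open(encoding="utf-8") as f:
        for line in f:
            # Amounts were stored raw as strings; they are cleaned only here.
            total += float(json.loads(line)["amount"])
    return total


def run_elt():
    # A temporary directory stands in for the data lake (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        load(lake, extract())
        return transform_total_order_amount(lake)


if __name__ == "__main__":
    print(run_elt())
```

In a real pipeline the transform step would typically be handed to one of the processing frameworks listed above rather than written as a plain loop.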