Most Commonly Used Terminology in Big Data Engineering


I am a tech enthusiast with 13+ years of experience in IT as a Consultant, Corporate Trainer, and Mentor, with 12+ years in training and mentoring in Software Engineering, Data Engineering, Test Automation, and Data Science. I have trained more than 10,000 IT professionals and conducted more than 500 training sessions in the areas of Software Development, Data Engineering, Cloud, Data Analysis, Data Visualization, Artificial Intelligence, and Machine Learning. I am interested in writing blogs, sharing technical knowledge, solving technical issues, and reading and learning new subjects.

  1. Data Source: The origin of the data, which can be databases, files, APIs, or streaming platforms.

  2. Extraction: The process of retrieving data from one or more sources and reading it into a form that downstream steps can process. Reshaping or cleaning the data belongs to the Transformation step.

  3. Transformation: Manipulating and converting data into a desired format or structure, including cleaning, filtering, aggregating, and joining operations.

  4. Load: The process of storing transformed data into a destination system, such as databases or data warehouses.

  5. ETL: Stands for Extract, Transform, Load. It refers to the overall process of extracting data from various sources, transforming it, and loading it into a target system.

  6. Batch Processing: Processing data in large, bounded groups (batches), typically at scheduled intervals rather than as each record arrives.

  7. Real-time Processing: Processing and analyzing data as it arrives, providing immediate insights and actions.

  8. Streaming: Handling and processing continuous data streams in real-time.

  9. Data Pipeline: A series of interconnected steps that enable the movement and processing of data from source to destination.

  10. Data Warehouse: A central repository for storing structured and organized data, optimized for querying and analysis.

  11. Data Lake: A storage repository that stores vast amounts of raw or unprocessed data in its native format.

  12. Data Governance: A set of policies and practices to ensure data quality, integrity, security, and compliance throughout the data pipeline.

  13. Data Quality: The measure of data's accuracy, completeness, consistency, reliability, and relevance.

  14. Metadata: Information about the data, such as its source, structure, format, and meaning.

  15. Workflow Orchestration: Coordinating and managing the execution of different tasks and dependencies in a data pipeline.

  16. Data Partitioning: Splitting and organizing data into smaller, manageable subsets based on specific criteria (e.g., time, location, or category).

  17. Data Replication: Copying and synchronizing data across different systems or locations for redundancy, scalability, or fault tolerance.

  18. Data Integration: Combining data from multiple sources or systems into a unified view.

  19. Data Modeling: Designing and structuring data to represent real-world entities, relationships, and business logic.

  20. Data Pipeline Monitoring: Monitoring the health, performance, and data flow within a pipeline, often with the help of metrics, alerts, and logging.
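
To make a few of these terms concrete, here is a minimal sketch of a batch ETL pipeline using only the Python standard library. Everything in it is an assumption made for illustration: the in-memory CSV stands in for a data source, the `daily_sales` table stands in for a warehouse table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical. Real pipelines typically use dedicated tools, so treat this as a sketch of the idea rather than a reference implementation.

```python
import csv
import io
import sqlite3
from collections import defaultdict

# Hypothetical in-memory CSV standing in for a data source (term 1).
RAW_CSV = """order_id,order_date,country,amount
1,2024-01-05,IN,120.50
2,2024-01-05,US,80.00
3,2024-01-06,IN,not_a_number
4,2024-01-06,IN,40.25
"""


def extract(source_text):
    """Extraction: read raw rows from the source without changing them."""
    return list(csv.DictReader(io.StringIO(source_text)))


def transform(rows):
    """Transformation: clean, filter, and aggregate the raw rows."""
    totals = defaultdict(float)
    rejected = 0
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # Simple data-quality check: skip rows whose amount cannot be parsed.
            rejected += 1
            continue
        # Aggregate per (date, country).
        totals[(row["order_date"], row["country"])] += amount
    records = [(day, country, round(total, 2))
               for (day, country), total in sorted(totals.items())]
    return records, rejected


def load(records, conn):
    """Load: write the transformed records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales "
        "(order_date TEXT, country TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(source_text, conn):
    """Data pipeline: run the three steps in order and return basic metrics."""
    rows = extract(source_text)
    records, rejected = transform(rows)
    load(records, conn)
    # These counts are the kind of metric pipeline monitoring would track.
    return {
        "rows_read": len(rows),
        "rows_loaded": len(records),
        "rows_rejected": rejected,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database as the destination
    print(run_pipeline(RAW_CSV, conn))
    print(conn.execute(
        "SELECT order_date, country, total FROM daily_sales "
        "ORDER BY order_date, country"
    ).fetchall())
```

In this sketch one of the four input rows is rejected because its amount is not numeric, and the remaining three are aggregated and loaded. Keeping `extract`, `transform`, and `load` as separate functions mirrors the ETL definition above and makes each step easy to test or swap out on its own.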

Do you want to connect with me? I have started mentoring for careers and interviews at topmate.io/naveenpn
