Data Ingestion: Tool Selection Strategy

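"Analyzing and Comparing Lakehouse Storage Systems" examines the hard parts of lakehouse system design, chiefly adding transactional guarantees on top of immutable, high-latency object storage. All three systems (Delta Lake, Hudi, and Iceberg) use optimistic concurrency control (OCC) for isolation and implement transactions with multi-version concurrency control (MVCC). For metadata management, Delta Lake and Hudi use a tabular format, while Iceberg uses a hierarchical format (processed on a single node). For data updates, all three support copy-on-write (CoW, suited to read-heavy workloads), and Hudi and Iceberg also support merge-on-read (MoR, suited to write-heavy workloads).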

System Tuning

January 7, 2024

| Systems

A roundup of troubleshooting tools for system tuning.

Doris Advanced

January 5, 2024

| Big Data

Pipeline Execution Engine, Nereids (the brand-new planner), High-Concurrency Point Query, Materialized View, Statistics, Join Optimization, Multi-Catalog, Spark Doris Connector, other connectors, Plugin Development Manual, CloudCanal Data Import, DBT Doris Adapter, UDF, Cluster Management, Data Admin, Other Manager, Maintenance and Monitor, Metadata Operations and Maintenance.

Doris Basic

January 5, 2024

| Big Data

An introduction to Doris, covering: data models (Aggregate, Unique, Duplicate), data partitioning (Rollup), indexes (Inverted Index, BloomFilter Index, NGram BloomFilter Index, Bitmap Index), import scenarios, import methods (Broker Load, Routine Load, Spark Load, Stream Load, MySQL Load, S3 Load, Insert Into, importing data in JSON format, Min Load Replica Num), export, and update and delete.

Memorable Moments (2023)

December 31, 2023

| Literature and Art

On the last day of 2023, a record of the year's memorable moments.
