A Brief Summary of Iceberg

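Branches and tags; schema, partition, and sort order evolution; snapshot maintenance (compaction, deleting orphan files, etc.); the two-level layout of the manifest list and manifest files. Problems with Hive and how Iceberg improves on them: O(1) listing, fine-grained partitions, OCC, concurrent conflicts, single-node planning. Hidden partitioning, time travel, version rollback. Supported by: Spark, Flink, Hive, Trino, ClickHouse, Presto, Dremio, StarRocks, Athena, EMR, Impala, Dori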


Analyzing and Comparing Lakehouse Storage Systems

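Discusses the challenges of designing lakehouse systems: adding transactional guarantees on top of immutable, high-latency object storage. All three systems (Delta Lake, Hudi, and Iceberg) use OCC for isolation and implement transactions with MVCC. For metadata management, Delta Lake and Hudi use a tabular format, while Iceberg uses a hierarchical format (processed on a single node). For data updates, all three support CoW (suited to read-heavy, write-light workloads), and Hudi and Iceberg also support MoR (suited to write-heavy workloads).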
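
To make the OCC plus MVCC combination concrete, here is a minimal single-process sketch. It is not how Delta Lake, Hudi, or Iceberg actually implement commits; the `Table` class, `commit`, and `append_with_retry` are hypothetical names invented for illustration, and a real system would rely on an atomic operation in the object store or catalog rather than an in-memory list.

```python
class Table:
    """Toy table: an append-only list of immutable snapshots (MVCC)."""

    def __init__(self):
        # Version 0 is the empty table; each commit appends a new snapshot.
        self.snapshots = [frozenset()]

    @property
    def version(self):
        return len(self.snapshots) - 1

    def read(self, version=None):
        """Return the file set at a version (latest by default).

        Old snapshots are never modified, so readers are not blocked by writers.
        """
        return self.snapshots[self.version if version is None else version]

    def commit(self, base_version, added_files):
        """Optimistic commit: succeed only if no one committed since base_version."""
        if base_version != self.version:
            return False  # conflict: another writer got there first
        self.snapshots.append(self.snapshots[base_version] | frozenset(added_files))
        return True


def append_with_retry(table, files, max_retries=3):
    """Hypothetical writer loop: re-read the latest version and retry on conflict."""
    for _ in range(max_retries):
        base = table.version
        if table.commit(base, files):
            return table.version
    raise RuntimeError("too many conflicting commits")
```

A writer that prepared its change against a stale version is rejected and must retry against the latest snapshot, while earlier snapshots stay readable for time travel.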


Doris Advanced

Pipeline Execution Engine, Nereids (the Brand New Planner), High-Concurrency Point Query, Materialized View, Statistics, Join Optimization, Multi-catalog, Spark Doris Connector, Other Connector, Plugin Development Manual, CloudCanal Data Import, DBT Doris Adapter, UDF, Cluster Management, Data Admin, Other Manager, Maintenance and Monitor, Metadata Operations and Maintenance


Doris Basic

An introduction to Doris, covering: Data Model (Aggregate Model, Unique Model, Duplicate Model), Data Partition (Rollup), Index (Inverted Index, BloomFilter Index, NGram BloomFilter Index, Bitmap Index), Import Scenes, Import Way (Broker Load, Routine Load, Spark Load, Stream Load, MySQL Load, S3 Load, Insert Into, Importing Data in JSON Format, Min Load Replica Num), Export, Update and Delete
