Articles in the Big Data category

Compaction in Apache Iceberg

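Compaction merges many small files into larger ones to improve read performance. Several strategies are available: binpack (simple merging), sort, and z-order (suited to queries that filter on multiple columns). Expire Snapshots deletes data files that are referenced only by expired snapshots. There are also table properties that automatically delete old metadata files and limit how many of them are kept, as well as a way to clean up orphan files.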
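
To make the binpack and z-order ideas concrete, here is a minimal Python sketch. It is not Iceberg's implementation: Iceberg's rewrite action plans its own file groups, and its z-order interleaves byte representations of column values. `binpack` and `z_value` are hypothetical helpers. The sketch assumes file sizes are plain integers and z-order inputs are small non-negative integers.

```python
from typing import List


def binpack(file_sizes: List[int], target_size: int) -> List[List[int]]:
    """Greedy first-fit-decreasing: group files so that each group's total
    stays at or under target_size. An oversized file ends up alone in its group."""
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in sorted(file_sizes, reverse=True):
        for i, total in enumerate(totals):
            if total + size <= target_size:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing group has room: start a new one.
            bins.append([size])
            totals.append(size)
    return bins


def z_value(x: int, y: int, bits: int = 16) -> int:
    """Morton code: interleave the bits of x (even positions) and y (odd positions)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z
```

For example, `binpack([10, 40, 30, 60, 20], 100)` groups the files as `[[60, 40], [30, 20, 10]]`, and each group would be rewritten as one larger file. Sorting rows by `z_value(x, y)` keeps rows that are close in both columns close together in the file. Per-file min/max statistics then stay tight for filters on either column. A plain sort on `x` gives that benefit to `x` only.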

Reading, Writing, and Query Management for Apache Iceberg Tables

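Introduces the storage layout. The metadata layer consists of metadata files, manifest lists, and manifest files, and the catalog points to the latest metadata file. Pruning can happen at every layer, down to the data layer. Reads, including time travel, proceed top-down, pruning at each level. Writes (insert, delete, merge) proceed bottom-up: new data and metadata files are written first, and the commit then atomically switches the catalog pointer. Optimistic concurrency control (OCC) handles concurrent writers, and together these provide ACID transactions.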
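
The bottom-up write and optimistic commit can be sketched as follows. The sketch assumes a hypothetical in-memory catalog whose only atomic operation is a compare-and-swap on the table's current metadata location. Real catalogs (Hive Metastore, JDBC, REST, etc.) provide that atomicity in their own ways. Iceberg also re-validates conflicting changes before retrying, a step this append-only sketch skips. `InMemoryCatalog`, `Table` and `commit_append` are illustrative names, not Iceberg API.

```python
import threading
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


class CommitFailedError(Exception):
    """Raised when the table's metadata pointer changed since the writer read it."""


@dataclass
class InMemoryCatalog:
    """Hypothetical catalog: table name -> location of the current metadata file."""
    pointers: Dict[str, str]
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def current(self, table: str) -> str:
        with self._lock:
            return self.pointers[table]

    def swap(self, table: str, expected: str, new: str) -> None:
        # Atomic compare-and-swap: succeed only if no one committed in between.
        with self._lock:
            found = self.pointers[table]
            if found != expected:
                raise CommitFailedError(f"{table}: expected {expected}, found {found}")
            self.pointers[table] = new


@dataclass
class Table:
    catalog: InMemoryCatalog
    name: str
    # Simulated object store: metadata location -> data files visible in that version.
    metadata_store: Dict[str, List[str]]

    def commit_append(self, new_files: List[str], max_retries: int = 3) -> str:
        for _ in range(max_retries + 1):
            base = self.catalog.current(self.name)         # read the current pointer
            files = self.metadata_store[base] + new_files  # build the new version on that base
            new_loc = f"{uuid.uuid4().hex}.metadata.json"
            self.metadata_store[new_loc] = files           # write new metadata first (bottom-up)
            try:
                self.catalog.swap(self.name, base, new_loc)  # then switch the pointer atomically
                return new_loc
            except CommitFailedError:
                continue  # another writer won; rebuild on the new base and retry
        raise CommitFailedError(f"{self.name}: gave up after {max_retries} retries")
```

Readers only ever follow the catalog pointer, so they see either the old version or the new one, never a half-written state. In this sketch a losing attempt leaves an unreferenced metadata entry behind in the store.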

A Brief Summary of Iceberg

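Branches and tags; schema, partition, and sort order evolution; snapshot maintenance (compaction, orphan file removal, etc.); the two-level layout of manifest list and manifest files.

Problems with Hive and how Iceberg addresses them:
- File listing is O(1), because files are tracked in metadata rather than discovered by listing directories.
- Fine-grained partitions.
- OCC with handling of concurrent conflicts.
- Single-node planning.
- Hidden partitioning, time travel, and version rollback.

Supported engines: Spark, Flink, Hive, Trino, ClickHouse, Presto, Dremio, StarRocks, Athena, EMR, Impala, Doris.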
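
Hidden partitioning can be illustrated with a day-style transform. The partition value is derived from the timestamp column, and a filter on the timestamp is turned into a filter on partition values. This is a minimal sketch that assumes timezone-aware datetimes and an in-memory dict of partitions. `day_transform`, `write` and `prune` are hypothetical names, not Iceberg API.

```python
from datetime import date, datetime, timezone
from typing import Dict, List

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 in UTC; ts must be timezone-aware."""
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def write(partitions: Dict[int, List[datetime]], ts: datetime) -> None:
    # The partition value is derived from the column itself;
    # users never maintain a separate date column.
    partitions.setdefault(day_transform(ts), []).append(ts)


def prune(partitions: Dict[int, List[datetime]], start: datetime) -> List[int]:
    # A filter `ts >= start` becomes `day >= day_transform(start)`.
    # Earlier partitions are skipped entirely. The boundary partition is kept,
    # and the original filter is still applied to its rows.
    lo = day_transform(start)
    return sorted(day for day in partitions if day >= lo)
```

Because the query only mentions `ts`, the partition scheme can change (for example, from days to hours) without rewriting queries.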


Analyzing and Comparing Several Lakehouse Storage Systems

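"Analyzing and Comparing Lakehouse Storage Systems" discusses what makes lakehouse systems hard to design: adding transactional guarantees on top of immutable, high-latency object storage. All three systems (Delta Lake, Hudi, and Iceberg) use OCC for isolation and MVCC to implement transactions. For metadata management, Delta and Hudi store metadata in a table format, while Iceberg uses a hierarchical layout that is processed on a single node. For data updates, all three support copy-on-write (CoW, suited to read-heavy, write-light workloads). Hudi and Iceberg also support merge-on-read (MoR, suited to write-heavy workloads).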
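
The CoW/MoR trade-off can be shown with a toy delete. This is a simplified sketch, not any system's actual file format. Data files are modeled as Python lists of `(id, value)` rows, and the MoR "delete file" is a set of row positions, similar in spirit to Iceberg's position delete files. All function names are hypothetical.

```python
from typing import List, Set, Tuple

Row = Tuple[int, str]  # (id, value)


def cow_delete(data_file: List[Row], ids: Set[int]) -> List[Row]:
    """Copy-on-write: rewrite the whole data file without the deleted rows.
    The write cost grows with file size; reads remain a plain scan."""
    return [row for row in data_file if row[0] not in ids]


def mor_delete(data_file: List[Row], ids: Set[int]) -> Set[int]:
    """Merge-on-read: leave the data file untouched and record the
    positions of deleted rows in a small separate delete file."""
    return {pos for pos, row in enumerate(data_file) if row[0] in ids}


def mor_read(data_file: List[Row], deleted_positions: Set[int]) -> List[Row]:
    """Readers pay the merge cost: skip positions listed in the delete file."""
    return [row for pos, row in enumerate(data_file) if pos not in deleted_positions]
```

Both paths return the same rows to a reader. CoW moves the work to the writer (a full file rewrite per change), while MoR writes only a small delta and moves the work to every subsequent read until compaction merges the deletes back in. That is why CoW favors read-heavy workloads and MoR favors write-heavy ones.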

Doris Advanced

Pipeline Execution Engine, Nereids (the brand-new planner), High-Concurrency Point Query, Materialized View, Statistics, Join Optimization, Multi-Catalog, Spark Doris Connector, Other Connectors, Plugin Development Manual, CloudCanal Data Import, DBT Doris Adapter, UDF, Cluster Management, Data Admin, Other Manager, Maintenance and Monitor, Metadata Operations and Maintenance