Delta Lake CDF & CDC
Directory layout
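As a rough illustration of the layout, the sketch below assumes the usual Delta Lake convention: commit files live under `_delta_log/` and are named by a 20-digit zero-padded version, change data files are written under `_change_data/` when the change data feed is enabled, and everything else is an ordinary data file. The helper names `commit_file_name` and `classify`, and the sample paths in the usage note, are hypothetical and chosen only for this example.

```python
from pathlib import PurePosixPath

LOG_DIR = "_delta_log"    # transaction log: one JSON file per commit
CDC_DIR = "_change_data"  # change data files, written when CDF is enabled


def commit_file_name(version: int) -> str:
    """Return the log file name for a table version (20 digits, zero-padded)."""
    return f"{version:020d}.json"


def classify(relative_path: str) -> str:
    """Classify a path relative to the table root as 'log', 'cdc' or 'data'."""
    top = PurePosixPath(relative_path).parts[0]
    if top == LOG_DIR:
        return "log"
    if top == CDC_DIR:
        return "cdc"
    return "data"
```

For instance, `classify("_change_data/cdc-00000-example.snappy.parquet")` falls into the `cdc` bucket, while a file under a partition directory such as `part=0/` is treated as plain data.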

CDC commit information

Note: the metadata entries below do not all come from the same Delta table; they are shown here only to illustrate what the metadata contains.
Create
{"commitInfo":{"timestamp":1666008673760,"operation":"WRITE","operationParameters":{"mode":"ErrorIfExists","partitionBy":"[\"part\"]"},"isolationLevel":"Serializable","isBlindAppend":true,"operationMetrics":{"numFiles":"4","numOutputRows":"6","numOutputBytes":"2906"},"engineInfo":"Apache-Spark/3.3.0 Delta-Lake/2.2.0-SNAPSHOT","txnId":"c7bb9dc7-a4fd-4cab-a133-7a2981efc52e"}}
Insert
{"add":{"path":"part-00000-488f0dec-025d-4f93-8ecd-b476c1fc491d-c000.snappy.parquet","partitionValues":{},"size":500,"modificationTime":1665587891400,"dataChange":true,"stats":"{\"numRecords\":5,\"minValues\":{\"id\":0},\"maxValues\":{\"id\":4},\"nullCount\":{\"id\":0}}"}}
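Note that the `stats` field of the `add` action is itself a JSON document stored as a string, so it has to be decoded a second time. A minimal sketch using only the Python standard library, with the line above copied into a raw string (the variable names are illustrative):

```python
import json

# The "add" action shown above, kept in a raw string so the \" escapes survive.
line = r'''{"add":{"path":"part-00000-488f0dec-025d-4f93-8ecd-b476c1fc491d-c000.snappy.parquet","partitionValues":{},"size":500,"modificationTime":1665587891400,"dataChange":true,"stats":"{\"numRecords\":5,\"minValues\":{\"id\":0},\"maxValues\":{\"id\":4},\"nullCount\":{\"id\":0}}"}}'''

add = json.loads(line)["add"]     # first decode: the log line itself
stats = json.loads(add["stats"])  # second decode: the embedded stats string

num_records = stats["numRecords"]
id_range = (stats["minValues"]["id"], stats["maxValues"]["id"])
```

Here `num_records` is the row count of the Parquet file and `id_range` is the min/max of the `id` column recorded for it.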
Delete
{"commitInfo":{"timestamp":1666008690517,"operation":"DELETE","operationParameters":{"predicate":"[\"(spark_catalog.delta.`C:\\\\Users\\\\Asura7969\\\\AppData\\\\Local\\\\Temp\\\\spark-5b6ca97d-30c8-4fa2-93da-2ff8e0872514`.part = 0L)\"]"},"readVersion":1,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numRemovedFiles":"2","numAddedChangeFiles":"0","executionTimeMs":"592","scanTimeMs":"591","rewriteTimeMs":"0"},"engineInfo":"Apache-Spark/3.3.0 Delta-Lake/2.2.0-SNAPSHOT","txnId":"4784af1f-6d48-4fd4-8375-744414ddc76e"}}
Merge
{"commitInfo":{"timestamp":1666008686732,"operation":"MERGE","operationParameters":{"predicate":"(s.id = t.id)","matchedPredicates":"[{\"predicate\":\"(s.id = 1L)\",\"actionType\":\"update\"},{\"predicate\":\"(s.id = 3L)\",\"actionType\":\"delete\"}]","notMatchedPredicates":"[]"},"readVersion":0,"isolationLevel":"Serializable","isBlindAppend":false,"operationMetrics":{"numTargetRowsCopied":"3","numTargetRowsDeleted":"1","numTargetFilesAdded":"2","executionTimeMs":"3631","numTargetRowsInserted":"0","scanTimeMs":"2294","numTargetRowsUpdated":"1","numOutputRows":"4","numTargetChangeFilesAdded":"1","numSourceRows":"4","numTargetFilesRemoved":"3","rewriteTimeMs":"1329"},"engineInfo":"Apache-Spark/3.3.0 Delta-Lake/2.2.0-SNAPSHOT","txnId":"cca2b832-199b-45bb-9606-9b83ab986c64"}}
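The `commitInfo` entries store every operation metric as a string, and `matchedPredicates` is again JSON inside a string. The sketch below shows one way to turn the MERGE entry into typed values. The input is a trimmed copy of the entry above (only some fields are kept), and the `summarize` helper is a hypothetical name for this example, not part of Delta Lake.

```python
import json

# Trimmed copy of the MERGE commitInfo shown above (a subset of its fields).
merge_line = r'''{"commitInfo":{"operation":"MERGE","operationParameters":{"predicate":"(s.id = t.id)","matchedPredicates":"[{\"predicate\":\"(s.id = 1L)\",\"actionType\":\"update\"},{\"predicate\":\"(s.id = 3L)\",\"actionType\":\"delete\"}]"},"readVersion":0,"operationMetrics":{"numTargetRowsCopied":"3","numTargetRowsDeleted":"1","numTargetRowsUpdated":"1","numTargetRowsInserted":"0","numOutputRows":"4","numTargetChangeFilesAdded":"1"}}}'''


def summarize(commit_line: str) -> dict:
    """Extract the operation, integer metrics and matched-clause actions."""
    info = json.loads(commit_line)["commitInfo"]
    # Metrics are serialized as strings in the log; convert them to int.
    metrics = {k: int(v) for k, v in info.get("operationMetrics", {}).items()}
    params = info.get("operationParameters", {})
    # matchedPredicates is a JSON array encoded as a string; absent for non-MERGE.
    clauses = json.loads(params.get("matchedPredicates", "[]"))
    return {
        "operation": info["operation"],
        "metrics": metrics,
        "matched_actions": [c["actionType"] for c in clauses],
    }


summary = summarize(merge_line)
```

In this entry `numTargetChangeFilesAdded` is 1, which indicates that the MERGE wrote one change data file alongside the rewritten data files.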
CDC
{"commitInfo":{"timestamp":1665587909035,"operation":"Manual Update","operationParameters":{},"readVersion":1,"isolationLevel":"SnapshotIsolation","isBlindAppend":false,"operationMetrics":{},"engineInfo":"Apache-Spark/3.3.0 Delta-Lake/2.2.0-SNAPSHOT","txnId":"894b2f5f-b8bf-4b0b-89d6-84b9d9401409"}}
CDC change types (values of `_change_type`):
- insert
- update_preimage
- update_postimage
- delete
| id | _change_type |
|---|---|
| 20 | insert |
| 21 | insert |
| 22 | insert |
| 23 | insert |
| 24 | insert |
| 26 | update_preimage |
| 27 | update_postimage |
This table has a single column, `id`; `_change_type` is a built-in column added by Delta.
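To make the table concrete, here is a small sketch that groups the rows by `_change_type`. In Spark such rows would typically be obtained by reading the table with the `readChangeFeed` option; here they are hard-coded from the table above so the example runs with the standard library alone. The variable names are illustrative.

```python
from collections import Counter

# (id, _change_type) pairs copied from the table above.
rows = [
    (20, "insert"),
    (21, "insert"),
    (22, "insert"),
    (23, "insert"),
    (24, "insert"),
    (26, "update_preimage"),
    (27, "update_postimage"),
]

# Number of change rows per change type.
counts = Counter(change_type for _, change_type in rows)

# ids that were newly inserted, as opposed to the before/after images of an update.
inserted_ids = [row_id for row_id, change_type in rows if change_type == "insert"]
```

An update always contributes two rows, one `update_preimage` (the row before the change) and one `update_postimage` (the row after it), which is why the two counts match.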
Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Asura7969 Blog when reposting.




