Improve Parquet documentation for AvroCompat, @doc annotation#885

Merged
clairemcginty merged 4 commits into main from avro-compat-doc
Jan 12, 2024
Conversation

@clairemcginty
Contributor

No description provided.


However, the parquet-avro API encodes array types differently: as a nested array inside a required group.

(comment on the docs' `` ```scala mdoc `` fence)
Contributor Author

After running `sbt site/mdoc`, this evaluates to:

```scala
import org.apache.avro.Schema
val avroSchema = new Schema.Parser().parse("{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[{\"name\": \"listField\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}")
// avroSchema: Schema = {"type":"record","name":"MyRecord","fields":[{"name":"listField","type":{"type":"array","items":"string"}}]}

import org.apache.parquet.avro.AvroSchemaConverter
new AvroSchemaConverter().convert(avroSchema)
// res4: org.apache.parquet.schema.MessageType = message MyRecord {
//   required group listField (LIST) {
//     repeated binary array (STRING);
//   }
// }
```
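Not part of the PR itself, but a quick dependency-free sanity check of the schema above: the Avro schema JSON declares `listField` as an array of strings, which is exactly what parquet-avro converts to the two-level `required group listField (LIST) { repeated binary array (STRING); }` shape shown in the output. The `avroSchemaJson` value below is copied verbatim from the parsed string; only the assertions are new.

```scala
// The Avro schema JSON string parsed in the mdoc example above, verbatim.
val avroSchemaJson =
  "{\"type\":\"record\",\"name\":\"MyRecord\",\"fields\":[{\"name\": \"listField\", \"type\": {\"type\": \"array\", \"items\": \"string\"}}]}"

// The array-of-strings declaration is what drives parquet-avro's
// nested-group LIST encoding in the converted MessageType.
assert(avroSchemaJson.contains("\"type\": \"array\""))
assert(avroSchemaJson.contains("\"items\": \"string\""))
```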

```scala
writer.write(MyRecord(List(1,2,3)))
writer.close()

ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
```
Contributor Author

After running `sbt site/mdoc`, this block evaluates to:

```scala
import magnolify.parquet._
import magnolify.parquet.ParquetArray.AvroCompat._
import magnolify.shared._

@doc("Top level annotation")
case class MyRecord(@doc("field annotation") listField: List[Int])

val writer = ParquetType[MyRecord]
  .writeBuilder(HadoopOutputFile.fromPath(path, new Configuration()))
  .build()
// writer: org.apache.parquet.hadoop.ParquetWriter[MyRecord] = org.apache.parquet.hadoop.ParquetWriter@432302e5
writer.write(MyRecord(List(1,2,3)))
writer.close()

ParquetFileReader.open(HadoopInputFile.fromPath(path, new Configuration())).getFileMetaData
// res12: org.apache.parquet.hadoop.metadata.FileMetaData = FileMetaData{schema: message repl.MdocSession.MdocApp9.MyRecord {
//   required group listField (LIST) {
//     repeated int32 array (INTEGER(32,true));
//   }
// }
// , metadata: {writer.model.name=magnolify, parquet.avro.schema={"type":"record","name":"MyRecord","namespace":"repl.MdocSession.MdocApp9","doc":"Top level annotation","fields":[{"name":"listField","type":{"type":"array","items":"int"},"doc":"field annotation"}]}}}
```
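A dependency-free sanity check of the point this output demonstrates: the `@doc` annotations surface as Avro `"doc"` attributes in the `parquet.avro.schema` metadata entry. The `embeddedAvroSchema` string below is copied verbatim from the metadata shown above; only the assertions are new.

```scala
// The parquet.avro.schema metadata value from the FileMetaData output above.
val embeddedAvroSchema =
  "{\"type\":\"record\",\"name\":\"MyRecord\",\"namespace\":\"repl.MdocSession.MdocApp9\",\"doc\":\"Top level annotation\",\"fields\":[{\"name\":\"listField\",\"type\":{\"type\":\"array\",\"items\":\"int\"},\"doc\":\"field annotation\"}]}"

// @doc on the case class becomes the record-level "doc" attribute;
// @doc on the field becomes the field-level "doc" attribute.
assert(embeddedAvroSchema.contains("\"doc\":\"Top level annotation\""))
assert(embeddedAvroSchema.contains("\"doc\":\"field annotation\""))
```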

@codecov
codecov bot commented Jan 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e57d06d) 95.17% compared to head (ac2f81c) 95.17%.

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   95.17%   95.17%
=======================================
  Files          51       51
  Lines        1825     1825
  Branches      157      157
=======================================
  Hits         1737     1737
  Misses         88       88
```

☔ View full report in Codecov by Sentry.

@clairemcginty clairemcginty merged commit 8153389 into main Jan 12, 2024
@clairemcginty clairemcginty deleted the avro-compat-doc branch January 12, 2024 14:11