In my previous blog about splitting big files with Apache Camel, I said we were working on another solution, which is a new camel-stax component. The work is now complete, and the component will be part of the next Apache Camel 2.9.0 release. Thanks to Romain for his contribution.
The stax component
The stax component allows you to split big XML files as well, but it requires using JAXB and StAX. This means you need to define a POJO class(es) with JAXB annotations, to bind to the XML schema.
However the benefit is that you then work with the POJO classes in Camel.
For example the records example from the previous blog could be written with camel-stax as follows in the Java DSL
Where Record is the POJO class which has the JAXB annotations. And stax is a static import from the class org.apache.camel.component.stax.StAXBuilder.
If you are using XML DSL, then consult the camel-stax documentation which has such an example.
A little test
So I run the equivalent test from the previous blog as well with 40.000 elements, and the memory usage delta was about 8mb.
The test logs the time: Processed file with 40000 elements in: 55.962 seconds.
Running the test with 200.000 elements results in 9mb memory usage delta, and test time of: 250 seconds.
The unit test is in camel-stax component as the org.apache.camel.component.stax.StAXXPathSplitChoicePerformanceTest class.