Thursday, April 24, 2008

ISO: large XML collections

It is interesting that while oodles of XML get processed every second around the world, there are relatively few XML collections that can be downloaded and used for software testing. The standard collection I used for many years, DBLP, is now on the order of hundreds of megabytes in a single file, which is good, but smaller "large" files are also desirable.

Wikimedia has posted its content in XML format.

The search is on for more readily available data...

