May 26, 2012

Data mining, text mining, topic modeling

From today's NY Times - http://opinionator.blogs.nytimes.com/2011/05/29/of-monsters-men-and-topic-modeling/

"Topic modeling is a probabilistic, statistical method that can uncover themes and categories in amounts of text so large that they cannot be read by any individual human being." A study undertaken by the Digital Scholarship Lab at the University of Richmond applied topic modeling techniques to newspaper articles published from 1861 through 1864 in the Richmond Daily Dispatch. It shows a close association between anti-Northern diatribes and articles featuring patriotism and poetry.

"We see the steady rise of patriotic poetry and vitriolic attacks during the secession crisis and the early months of the war. Later we see the sharp jump following the implementation of the draft in April 1862. And we see the last gasp of Confederate patriotism and nationalism at the very end of the war, as Southerners made one final attempt to rally the troops and salvage their cause."
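
To make the idea of an "association" between two themes concrete, here is a minimal Python sketch. It is not the lab's actual method, and the numbers are invented toy values, not results from the study. It assumes a topic model has already assigned each article a share for each topic, then averages those shares per period and measures how closely the two topics rise and fall together. The topic names, period labels, and the helper functions monthly_mean_share and pearson are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Toy, made-up input: one (period, {topic: share}) pair per article.
# In practice the shares would come from a fitted topic model (not shown).
articles = [
    ("period_1", {"anti_northern": 0.30, "patriotic_poetry": 0.20}),
    ("period_1", {"anti_northern": 0.10, "patriotic_poetry": 0.10}),
    ("period_2", {"anti_northern": 0.50, "patriotic_poetry": 0.40}),
    ("period_2", {"anti_northern": 0.30, "patriotic_poetry": 0.20}),
    ("period_3", {"anti_northern": 0.10, "patriotic_poetry": 0.05}),
    ("period_3", {"anti_northern": 0.10, "patriotic_poetry": 0.15}),
]

def monthly_mean_share(articles, topic):
    """Average share of `topic` across the articles in each period."""
    by_period = defaultdict(list)
    for period, shares in articles:
        by_period[period].append(shares.get(topic, 0.0))
    return {p: sum(v) / len(v) for p, v in sorted(by_period.items())}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

anti = monthly_mean_share(articles, "anti_northern")
poetry = monthly_mean_share(articles, "patriotic_poetry")
periods = sorted(anti)
r = pearson([anti[p] for p in periods], [poetry[p] for p in periods])

for p in periods:
    print(p, round(anti[p], 3), round(poetry[p], 3))
print("correlation:", round(r, 3))
```

A correlation near 1 would mean the two topics tend to peak in the same periods, which is the kind of pattern the quoted passage describes. A correlation computed this way says nothing about why the topics move together.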