Text Parser and Word Frequency using R
I spent a little time looking for a simple parser for text files. This approach works and is straightforward:
Example Code
# Simple word frequency
#
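# Read the text file, one line per element (file.choose() opens a file picker on all platforms; choose.files() is Windows-only)
wordCount.text <- scan(file = file.choose(), what = character(), sep = '\n')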
# Collapse all lines into a single string, separated by spaces
wordCount.text <- paste(wordCount.text, collapse = " ")
# Convert to lower case so "The" and "the" are counted together
wordCount.text <- tolower(wordCount.text)
# Split on runs of non-word characters (whitespace and punctuation)
wordCount.words.list <- strsplit(wordCount.text, '\\W+', perl = TRUE)
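# Flatten the list to a character vector and drop empty strings
# (strsplit returns "" first when the text starts with punctuation)
wordCount.words.vector <- unlist(wordCount.words.list)
wordCount.words.vector <- wordCount.words.vector[wordCount.words.vector != ""]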
# table() counts the occurrences of each word
wordCount.freq.list <- table(wordCount.words.vector)
# Sort by count, most frequent first
wordCount.sorted.freq.list <- sort(wordCount.freq.list, decreasing = TRUE)
# Concatenate each word and its count, tab-separated, before export
wordCount.sorted.table <- paste(names(wordCount.sorted.freq.list), wordCount.sorted.freq.list, sep = '\t')
# Write a header row, then one word/count pair per line
# (new = TRUE: the output file may not exist yet)
cat('Word\tFREQ', wordCount.sorted.table, file = file.choose(new = TRUE), sep = '\n')
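Because the script prompts for files, it is awkward to test or reuse. Here is a minimal sketch of the same steps wrapped in a function. The name count_words is hypothetical; it takes lines already in memory instead of calling a file picker, and it drops the empty string that strsplit() produces when the text begins with punctuation:

```r
# Hypothetical helper: same pipeline as the script, but no file dialogs
count_words <- function(lines) {
  # Collapse all lines into one lower-case string
  text <- tolower(paste(lines, collapse = " "))
  # Split on runs of non-word characters (whitespace and punctuation)
  words <- unlist(strsplit(text, "\\W+", perl = TRUE))
  # Drop empty strings left by leading punctuation
  words <- words[words != ""]
  # Count each word and sort, most frequent first
  sort(table(words), decreasing = TRUE)
}

# Usage on a small in-memory sample
lines <- c("The team met the client.", "The client thanked the team!")
freq <- count_words(lines)
head(freq, 3)
```

Note that `\W` treats the apostrophe as a separator, so a word like "don't" is counted as "don" and "t".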
Example Results
Word FREQ
and 53
to 41
the 33
in 22
business 18
james 16
of 16
as 15
for 14
with 14
on 13
team 11
employee 10
(... truncated for brevity)
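An alternative to pasting strings together and calling cat() is to put the counts in a data frame and let write.table() produce the tab-separated layout shown above. A minimal sketch, assuming toy data in place of the script's sorted table and a temporary file in place of the file picker:

```r
# Stand-in for the sorted table produced by the script (toy data)
wordCount.sorted.freq.list <- sort(table(c("and", "to", "and", "the", "and", "to")),
                                   decreasing = TRUE)

# Convert the named counts to a two-column data frame
result <- data.frame(Word = names(wordCount.sorted.freq.list),
                     FREQ = as.integer(wordCount.sorted.freq.list),
                     stringsAsFactors = FALSE)

# Tab-separated output with a header row, no quotes and no row names
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
write.table(result, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

readLines(out_file)
```

A data frame also makes it easy to filter or join the counts before writing them out.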