process a collection of email messages and create an R data frame of “derived” variables that
give various measures of the email messages, e.g. the number of recipients to whom the mail was sent, the
percentage of words in the body of the text that are written in capitals, and whether the message is a reply to another message. See below
for a list of all the variables, and also consider other variables you think might help classify a message
as SPAM versus HAM. The messages are in five different directories (folders). The name of each directory indicates whether the messages
it contains are HAM or SPAM. There are 6,541 messages in total, which is a large amount of data.
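
As a rough illustration of what computing such derived variables might look like, here is a minimal sketch in R. It is not the assignment's reference solution. The function names `deriveVars` and `processDirs`, the column names, and the sample message are all hypothetical. The sketch also makes simplifying assumptions: the header ends at the first blank line, recipients are the comma-separated entries on the `To:` and `Cc:` lines (folded header lines and commas inside quoted names are ignored), a reply is any message with an `In-Reply-To:` header or a subject starting with `Re:`, and a directory holds SPAM if its name contains "spam".

```r
# Compute a few derived variables for one message, given as a character
# vector of lines. Assumes the header ends at the first blank line.
deriveVars <- function(lines) {
  blank <- which(lines == "")[1]
  if (is.na(blank)) blank <- length(lines) + 1
  header <- lines[seq_len(blank - 1)]
  body <- if (blank < length(lines)) lines[(blank + 1):length(lines)] else character(0)

  # Number of recipients: comma-separated entries on the To: and Cc: lines.
  rcpt <- grep("^(To|Cc):", header, ignore.case = TRUE, value = TRUE)
  addrs <- trimws(as.character(unlist(strsplit(sub("^[^:]*:", "", rcpt), ","))))
  numRecipients <- sum(nzchar(addrs))

  # Reply: an In-Reply-To: header, or a subject that starts with "Re:".
  isReply <- any(grepl("^In-Reply-To:", header, ignore.case = TRUE)) ||
    any(grepl("^Subject:[[:space:]]*Re:", header, ignore.case = TRUE))

  # Percentage of body words written entirely in capitals
  # (single-letter words such as "I" count as capitals here).
  words <- as.character(unlist(strsplit(body, "[^[:alpha:]]+")))
  words <- words[nzchar(words)]
  percCaps <- if (length(words) > 0) 100 * mean(words == toupper(words)) else NA_real_

  data.frame(numRecipients = numRecipients, percCaps = percCaps, isReply = isReply)
}

# Hypothetical driver: one row per file found in the given directories.
# Assumes a directory holds SPAM if its name contains "spam".
processDirs <- function(dirs) {
  files <- list.files(dirs, full.names = TRUE)
  rows <- lapply(files, function(f) {
    vars <- deriveVars(readLines(f, warn = FALSE))
    vars$isSpam <- grepl("spam", basename(dirname(f)), ignore.case = TRUE)
    vars
  })
  do.call(rbind, rows)
}

# Made-up sample message, for illustration only.
msg <- c("From: a@example.com",
         "To: b@example.com, c@example.com",
         "Subject: Re: FREE offer",
         "",
         "BUY now and SAVE money")
res <- deriveVars(msg)
print(res)
```

Returning a one-row data frame per message keeps the per-message logic separate from the file handling, so each derived variable can be checked on a small hand-written message before running over all the directories.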