A simple example of using the tm package
The following makes use of the tm package. If you want to run this on
your own computer, you may need to install it from CRAN first:
install.packages("tm")
(The package is already installed in the notebook container, so this
step is not needed there.)
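If the same script should run both in the container and on a local machine, the installation can be made conditional. This is only a sketch: the `repos` URL is an assumption (any CRAN mirror would do), and nothing here is specific to this notebook.

```r
# Install 'tm' only when it is not already available.
# The mirror URL is an assumption; substitute your preferred CRAN mirror.
if (!requireNamespace("tm", quietly = TRUE)) {
  install.packages("tm", repos = "https://cloud.r-project.org")
}

# TRUE once the package can be loaded
tm_available <- requireNamespace("tm", quietly = TRUE)
print(tm_available)
```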
# Activating the 'tm' package
library(tm)
Loading required package: NLP
# We load the 'acq' data set, a corpus of 50 example news articles
data(acq)
acq
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 0
Content: documents: 50
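Since a corpus of this kind behaves much like a list of documents, the usual length and subsetting operations can be used to explore it. The following is a minimal sketch that assumes tm is installed; the variable names `n_docs` and `first_three` are illustrative only.

```r
library(tm)
data(acq)

# Number of documents in the corpus
n_docs <- length(acq)
print(n_docs)

# Single brackets keep the corpus structure and return a smaller corpus
first_three <- acq[1:3]
print(class(first_three))
print(length(first_three))

# Double brackets extract one document from the corpus
print(class(acq[[1]]))
```

Single brackets are the better choice when the result should still be passed to functions that expect a corpus, while double brackets are needed to work with an individual document.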
# We take a look at the first element in the corpus, a text document:
class(acq[[1]])
[1] "PlainTextDocument" "TextDocument"
acq[[1]]
<<PlainTextDocument>>
Metadata: 15
Content: chars: 1287
inspect(acq[[1]])
<<PlainTextDocument>>
Metadata: 15
Content: chars: 1287
Computer Terminal Systems Inc said
it has completed the sale of 200,000 shares of its common
stock, and warrants to acquire an additional one mln shares, to
<Sedio N.V.> of Lugano, Switzerland for 50,000 dlrs.
The company said the warrants are exercisable for five
years at a purchase price of .125 dlrs per share.
Computer Terminal said Sedio also has the right to buy
additional shares and increase its total holdings up to 40 pct
of the Computer Terminal's outstanding common stock under
certain circumstances involving change of control at the
company.
The company said if the conditions occur the warrants would
be exercisable at a price equal to 75 pct of its common stock's
market price at the time, not to exceed 1.50 dlrs per share.
Computer Terminal also said it sold the technolgy rights to
its Dot Matrix impact technology, including any future
improvements, to <Woodco Inc> of Houston, Tex. for 200,000
dlrs. But, it said it would continue to be the exclusive
worldwide licensee of the technology for Woodco.
The company said the moves were part of its reorganization
plan and would help pay current operation costs and ensure
product delivery.
Computer Terminal makes computer generated labels, forms,
tags and ticket printers and terminals.
Reuter
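While inspect() prints the document, the text itself can also be pulled out as an ordinary character vector for further processing. A minimal sketch, assuming tm is installed; `doc`, `txt` and `n_chars` are illustrative names. The character total computed here would be expected to correspond to the "chars" figure in the printed summary, though that is not checked.

```r
library(tm)
data(acq)

doc <- acq[[1]]

# content() returns the text of the document as a character vector
txt <- content(doc)
print(is.character(txt))

# Total number of characters across all elements of the vector
n_chars <- sum(nchar(txt))
print(n_chars)

# Show the beginning of the text
print(substr(paste(txt, collapse = "\n"), 1, 60))
```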
# We take a look at the document metadata
meta(acq[[1]])
author : character(0)
datetimestamp: 1987-02-26 15:18:06
description :
heading : COMPUTER TERMINAL SYSTEMS <CPML> COMPLETES SALE
id : 10
language : en
origin : Reuters-21578 XML
topics : YES
lewissplit : TRAIN
cgisplit : TRAINING-SET
oldid : 5553
places : usa
people : character(0)
orgs : character(0)
exchanges : character(0)
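Individual metadata fields can be requested by name instead of printing the whole set. The sketch below assumes tm is installed and uses only the tags shown in the output above; `doc` and `headings` are illustrative names.

```r
library(tm)
data(acq)

doc <- acq[[1]]

# Request a single metadata field by its tag
print(meta(doc, "heading"))
print(meta(doc, "id"))

# Collect the heading of every document in the corpus
headings <- lapply(acq, function(d) meta(d, "heading"))
print(length(headings))
print(head(unlist(headings), 3))
```

Using lapply() rather than a stricter function such as vapply() avoids errors if a document happens to lack a heading.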
DublinCore(acq[[1]])
contributor: character(0)
coverage : character(0)
creator : character(0)
date : 1987-02-26 15:18:06
description:
format : character(0)
identifier : 10
language : en
publisher : character(0)
relation : character(0)
rights : character(0)
source : character(0)
subject : character(0)
title : COMPUTER TERMINAL SYSTEMS <CPML> COMPLETES SALE
type : character(0)
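Comparing the two listings suggests that the Dublin Core view is a renaming of the same document metadata: `title` corresponds to `heading`, `identifier` to `id`, and `date` to `datetimestamp`. A small sketch to check this for the first document, assuming tm is installed and that DublinCore() accepts an element name as its second argument, as meta() does:

```r
library(tm)
data(acq)

doc <- acq[[1]]

# Request single Dublin Core elements by name
dc_title <- DublinCore(doc, "title")
dc_identifier <- DublinCore(doc, "identifier")
print(dc_title)
print(dc_identifier)

# Compare with the corresponding tm metadata fields
print(identical(dc_title, meta(doc, "heading")))
print(identical(dc_identifier, meta(doc, "id")))
```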
- R file: tm-simple.R
- Rmarkdown file: tm-simple.Rmd
- Jupyter notebook file: tm-simple.ipynb