A simple example for the usage of the tm package
The following example uses the tm package. To run it on your own computer, you may first need to install the package from CRAN with install.packages("tm"). (The package is already installed in the notebook container.)
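If you are not sure whether tm is already present on your machine, a minimal sketch like the following installs it only when it is missing. The CRAN mirror URL is an assumption and not part of the original notebook; any mirror should work.

```r
# Install 'tm' only if it is not already available, then attach it.
if (!requireNamespace("tm", quietly = TRUE)) {
  # The mirror below is an assumption; substitute your preferred CRAN mirror.
  install.packages("tm", repos = "https://cloud.r-project.org")
}
library(tm)
```

Inside the notebook container the condition is false, so nothing is installed and the package is simply attached.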
# Activating the 'tm' package
library(tm)
Loading required package: NLP
# We activate the 'acq' data, a corpus of 50 example news articles
data(acq)
acq
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 50
# We take a look at the first element in the corpus, a text document:
class(acq[[1]])
[1] "PlainTextDocument" "TextDocument"
<<PlainTextDocument>>
Metadata:  15
Content:  chars: 1287
<<PlainTextDocument>>
Metadata:  15
Content:  chars: 1287

Computer Terminal Systems Inc said it has completed the sale of 200,000 shares of its common stock, and warrants to acquire an additional one mln shares, to <Sedio N.V.> of Lugano, Switzerland for 50,000 dlrs. The company said the warrants are exercisable for five years at a purchase price of .125 dlrs per share. Computer Terminal said Sedio also has the right to buy additional shares and increase its total holdings up to 40 pct of the Computer Terminal's outstanding common stock under certain circumstances involving change of control at the company. The company said if the conditions occur the warrants would be exercisable at a price equal to 75 pct of its common stock's market price at the time, not to exceed 1.50 dlrs per share. Computer Terminal also said it sold the technolgy rights to its Dot Matrix impact technology, including any future improvements, to <Woodco Inc> of Houston, Tex. for 200,000 dlrs. But, it said it would continue to be the exclusive worldwide licensee of the technology for Woodco. The company said the moves were part of its reorganization plan and would help pay current operation costs and ensure product delivery. Computer Terminal makes computer generated labels, forms, tags and ticket printers and terminals.
Reuter
# We take a look at the document metadata
meta(acq[[1]])
  author       : character(0)
  datetimestamp: 1987-02-26 15:18:06
  description  :
  heading      : COMPUTER TERMINAL SYSTEMS <CPML> COMPLETES SALE
  id           : 10
  language     : en
  origin       : Reuters-21578 XML
  topics       : YES
  lewissplit   : TRAIN
  cgisplit     : TRAINING-SET
  oldid        : 5553
  places       : usa
  people       : character(0)
  orgs         : character(0)
  exchanges    : character(0)
  contributor: character(0)
  coverage   : character(0)
  creator    : character(0)
  date       : 1987-02-26 15:18:06
  description:
  format     : character(0)
  identifier : 10
  language   : en
  publisher  : character(0)
  relation   : character(0)
  rights     : character(0)
  source     : character(0)
  subject    : character(0)
  title      : COMPUTER TERMINAL SYSTEMS <CPML> COMPLETES SALE
  type       : character(0)
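The values printed above can also be retrieved programmatically. The following is a minimal sketch, not part of the original notebook: it assumes that meta() called with a tag name returns a single metadata field and that content() returns the document text as a character vector. The variable names doc and txt are chosen here for illustration only.

```r
library(tm)
data(acq)

doc <- acq[[1]]          # first document in the corpus

length(acq)              # number of documents in the corpus
meta(doc, "heading")     # a single metadata field, selected by tag name
meta(doc, "id")
txt <- content(doc)      # the document text as a character vector
sum(nchar(txt))          # total number of characters in the text

# Show only the beginning of the text instead of the whole article
substr(paste(txt, collapse = " "), 1, 80)
```

Selecting individual fields this way is convenient when only the heading or the identifier is needed, for instance when building an overview table of all documents in the corpus.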
Downloadable R script and interactive version
The link with the “jupyterhub” icon directs you to an interactive Jupyter[1] notebook, which runs inside a Docker container[2]. There are two variants of the interactive notebook. One shuts down after 60 seconds and does not require a sign-in. The other requires signing in with your ORCID[3] credentials but shuts down only after 24 hours. (There is no guarantee that such a container persists that long; it may be shut down earlier for maintenance purposes.) After shutdown, all data within the container is reset, i.e. all files created by the user are deleted.[4]
Above you see a rendered version of the Jupyter notebook.[5]
ORCID is a free service for the authentication of researchers. It also lets researchers showcase publications and contributions to the academic community, such as peer review. See https://info.orcid.org/what-is-orcid/ for more information.
The Jupyter notebooks come with NO WARRANTY whatsoever. They are provided for educational and illustrative purposes only. Do not use them for production work.
The notebook is rendered with the help of the nbsphinx extension.