wordpiece: R Implementation of Wordpiece Tokenization

Apply 'Wordpiece' (<arXiv:1609.08144>) tokenization to input text, given an appropriate vocabulary. The 'BERT' (<arXiv:1810.04805>) tokenization conventions are used by default.
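The Python sketch below shows the general idea behind greedy longest-match-first WordPiece tokenization under the BERT convention, where word-internal pieces are prefixed with "##" and a word with no valid split becomes "[UNK]". It is an illustration of the technique only and is not this package's R implementation. The function names, the toy vocabulary, and the simplified lowercase-and-whitespace pre-tokenization are hypothetical assumptions; BERT's basic tokenizer also splits punctuation and strips accents.

```python
def wordpiece_word(word, vocab, unk_token="[UNK]"):
    """Split one pre-tokenized word into pieces, greedy longest match first (sketch)."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking until a vocab hit.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # BERT marks word-internal pieces with "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matches at this position, so the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


def wordpiece_tokenize(text, vocab, unk_token="[UNK]"):
    # Assumption: lowercasing plus a whitespace split stands in for BERT's
    # full basic tokenizer (punctuation splitting, accent stripping, etc.).
    tokens = []
    for word in text.lower().split():
        tokens.extend(wordpiece_word(word, vocab, unk_token))
    return tokens


# Hypothetical toy vocabulary for illustration only.
vocab = {"un", "##aff", "##able", "token", "##ization", "the"}
print(wordpiece_tokenize("Unaffable tokenization", vocab))
# → ['un', '##aff', '##able', 'token', '##ization']
```

Because the search always takes the longest vocabulary entry available at the current position, the quality of the split depends entirely on the vocabulary supplied, which is why an appropriate vocabulary is required.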

Version: 2.0.1
Depends: R (≥ 3.3.0)
Imports: dlr (≥ 1.0.0), piecemaker (≥ 1.0.0), purrr (≥ 0.2.3), rlang, stringi (≥ 1.0), wordpiece.data (≥ 1.0.2)
Suggests: covr, knitr, rmarkdown, testthat (≥ 3.0.0)
Published: 2021-10-18
Author: Jonathan Bratt [aut, cre], Jon Harmon [aut], Bedford Freeman & Worth Pub Grp LLC DBA Macmillan Learning [cph]
Maintainer: Jonathan Bratt <jonathan.bratt at macmillan.com>
BugReports: https://github.com/macmillancontentscience/wordpiece/issues
License: Apache License (≥ 2)
URL: https://github.com/macmillancontentscience/wordpiece
NeedsCompilation: no
Materials: README NEWS
CRAN checks: wordpiece results

Documentation:

Reference manual: wordpiece.pdf
Vignettes: Using wordpiece

Downloads:

Package source: wordpiece_2.0.1.tar.gz
Windows binaries: r-devel: wordpiece_2.0.1.zip, r-release: wordpiece_1.0.2.zip, r-oldrel: wordpiece_2.0.1.zip
macOS binaries: r-release (arm64): wordpiece_1.0.2.tgz, r-release (x86_64): wordpiece_2.0.1.tgz, r-oldrel: wordpiece_2.0.1.tgz
Old sources: wordpiece archive

Linking:

Please use the canonical form https://CRAN.R-project.org/package=wordpiece to link to this page.