CRAN Package Check Results for Maintainer ‘Toby Hocking <toby.hocking at r-project.org>’

Last updated on 2019-11-21 23:49:20 CET.

Package          ERROR  NOTE  OK
animint2                   5   8
directlabels               8   5
namedCapture                  13
nc                   1        12
PeakError                     13
PeakSegDisk                   13
PeakSegDP                     13
PeakSegJoint                  13
PeakSegOptimal                13
penaltyLearning               13
WeightedROC                   13

Package animint2

Current CRAN status: NOTE: 5, OK: 8

Version: 2019.7.3
Check: dependencies in R code
Result: NOTE
    Namespaces in Imports field not imported from:
     ‘lazyeval’ ‘tibble’
     All declared Imports should be used.
Flavors: r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-patched-solaris-x86, r-release-osx-x86_64, r-oldrel-osx-x86_64

Version: 2019.7.3
Check: installed package size
Result: NOTE
     installed size is 5.3Mb
     sub-directories of 1Mb or more:
       R      1.0Mb
       data   2.9Mb
Flavor: r-patched-solaris-x86

Package directlabels

Current CRAN status: NOTE: 8, OK: 5

Version: 2018.05.22
Check: package dependencies
Result: NOTE
    Package suggested but not available for checking: 'inlinedocs'
Flavors: r-devel-linux-x86_64-debian-clang, r-devel-linux-x86_64-debian-gcc, r-devel-linux-x86_64-fedora-clang, r-devel-linux-x86_64-fedora-gcc, r-devel-windows-ix86+x86_64-gcc8, r-patched-linux-x86_64, r-patched-solaris-x86, r-release-linux-x86_64

Package namedCapture

Current CRAN status: OK: 13

Package nc

Current CRAN status: ERROR: 1, OK: 12

Version: 2019.10.19
Check: examples
Result: ERROR
    Running examples in ‘nc-Ex.R’ failed
    The error most likely occurred in:
    
    > ### Name: capture_all_str
    > ### Title: Capture all matches in a single subject string
    > ### Aliases: capture_all_str
    >
    > ### ** Examples
    >
    >
    > chr.pos.vec <- c(
    + "chr10:213,054,000-213,055,000",
    + "chrM:111,000-222,000",
    + "this will not match",
    + NA, # neither will this.
    + "chr1:110-111 chr2:220-222") # two possible matches.
    > keep.digits <- function(x)as.integer(gsub("[^0-9]", "", x))
    > ## By default elements of subject are treated as separate lines (and
    > ## NAs are removed). Named arguments are used to create capture
    > ## groups, and conversion functions such as keep.digits are used to
    > ## convert the previously named group.
    > int.pattern <- list("[0-9,]+", keep.digits)
    > (match.dt <- nc::capture_all_str(
    + chr.pos.vec,
    + chrom="chr.*?",
    + ":",
    + chromStart=int.pattern,
    + "-",
    + chromEnd=int.pattern))
     chrom chromStart chromEnd
    1: chr10 213054000 213055000
    2: chrM 111000 222000
    3: chr1 110 111
    4: chr2 220 222
    > str(match.dt)
    Classes ‘data.table’ and 'data.frame': 4 obs. of 3 variables:
     $ chrom : chr "chr10" "chrM" "chr1" "chr2"
     $ chromStart: int 213054000 111000 110 220
     $ chromEnd : int 213055000 222000 111 222
     - attr(*, ".internal.selfref")=<externalptr>
    >
    > ## Data downloaded from
    > ## https://en.wikipedia.org/wiki/Hindu%E2%80%93Arabic_numeral_system
    > numerals <- system.file(
    + "extdata", "Hindu-Arabic-numerals.txt.gz", package="nc")
    >
    > ## Use engine="ICU" for unicode character classes
    > ## http://userguide.icu-project.org/strings/regexp e.g. match any
    > ## character with a numeric value of 2 (including japanese etc).
    > nc::capture_all_str(
    + numerals,
    + " ",
    + two="[\\p{numeric_value=2}]",
    + " ",
    + engine="ICU")
     two
     1: 2
     2: 二
     3: २
     4: ੨
     5: ༢
     6: ২
     7: ೨
     8: ୨
     9: ൨
    10: ౨
    11: ២
    12: ๒
    13: ໒
    14: ၂
    15: ᠒
    >
    > ## Create a table of numerals with script names.
    > digits.pattern <- list()
    > for(digit in 0:9){
    + digits.pattern[[length(digits.pattern)+1]] <- list(
    + "[|]",
    + nc::group(digit, "[^{|]+"),
    + "[|]")
    + }
    > nc::capture_all_str(
    + numerals,
    + "\n",
    + digits.pattern,
    + "[|]",
    + " *",
    + "\\[\\[",
    + name="[^\\]|]+")
     0 1 2 3 4 5 6 7 8 9
     1: 0 1 2 3 4 5 6 7 8 9
     2: 〇/零 一 二 三 四 五 六 七 八 九
     3: ο/ō Αʹ Βʹ Γʹ Δʹ Εʹ Ϛʹ Ζʹ Ηʹ Θʹ
     4: א ב ג ד ה ו ז ח ט
     5: ० १ २ ३ ४ ५ ६ ७ ८ ९
     6: ૦ \t૧ \t૨ \t૩ \t૪ \t૫ \t૬ \t૭ \t૮ \t૯
     7: ੦ ੧ ੨ ੩ ੪ ੫ ੬ ੭ ੮ ੯
     8: <big>༠ ༡ ༢ ༣ ༤ ༥ ༦ ༧ ༨ ༩</big>
     9: ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯
    10: ೦ ೧ ೨ ೩ ೪ ೫ ೬ ೭ ೮ ೯
    11: ୦ ୧ ୨ ୩ ୪ ୫ ୬ ୭ ୮ ୯
    12: ൦ ൧ ൨ ൩ ൪ ൫ ൬ ൭ ൮ ൯
    13: ౦ ౧ ౨ ౩ ౪ ౫ ౬ ౭ ౮ ౯
    14: ០ ១ ២ ៣ ៤ ៥ ៦ ៧ ៨ ៩
    15: ๐ ๑ ๒ ๓ ๔ ๕ ๖ ๗ ๘ ๙
    16: ໐ ໑ ໒ ໓ ໔ ໕ ໖ ໗ ໘ ໙
    17: ၀ ၁ ၂ ၃ ၄ ၅ ၆ ၇ ၈ ၉
    18: ᠐ ᠑ ᠒ ᠓ ᠔ ᠕ ᠖ ᠗ ᠘ ᠙
     name
     1: Arabic_numerals
     2: East Asia
     3: Modern Greek
     4: Hebrew
     5: Devanagari
     6: Gujarati alphabet
     7: Gurmukhī alphabet
     8: Tibetan script
     9: Bengali alphabet
    10: Kannada alphabet
    11: Odia alphabet
    12: Malayalam script
    13: Telugu script
    14: Khmer alphabet
    15: Thai alphabet
    16: Lao alphabet
    17: Burmese script
    18: Mongolian_script
    >
    > ## Extract all fields from each alignment block, using two regex
    > ## patterns, then dcast.
    > info.txt.gz <- system.file(
    + "extdata", "SweeD_Info.txt.gz", package="nc")
    > info.vec <- readLines(info.txt.gz)
    > info.vec[24:40]
     [1] " Alignment 1" ""
     [3] "\t\tChromosome:\t\tscaffold_0" "\t\tSequences:\t\t14"
     [5] "\t\tSites:\t\t\t1670366" "\t\tDiscarded sites:\t1264068"
     [7] "" "\t\tProcessing:\t\t155.53 seconds"
     [9] "" "\t\tPosition:\t\t8.936200e+07"
    [11] "\t\tLikelihood:\t\t4.105582e+02" "\t\tAlpha:\t\t\t6.616326e-06"
    [13] "" ""
    [15] " Alignment 2" ""
    [17] "\t\tChromosome:\t\tscaffold_1"
    > info.dt <- nc::capture_all_str(
    + sub("Alignment ", "//", info.vec),
    + "//",
    + alignment="[0-9]+",
    + fields="[^/]+")
    > (fields.dt <- info.dt[, nc::capture_all_str(
    + fields,
    + "\t+",
    + variable="[^:]+",
    + ":\t*",
    + value=".*"),
    + by=alignment])
     alignment variable value
     1: 1 Chromosome scaffold_0
     2: 1 Sequences 14
     3: 1 Sites 1670366
     4: 1 Discarded sites 1264068
     5: 1 Processing 155.53 seconds
     6: 1 Position 8.936200e+07
     7: 1 Likelihood 4.105582e+02
     8: 1 Alpha 6.616326e-06
     9: 2 Chromosome scaffold_1
    10: 2 Sequences 14
    11: 2 Sites 1447008
    12: 2 Discarded sites 1093595
    13: 2 Processing 138.83 seconds
    14: 2 Position 8.722482e+07
    15: 2 Likelihood 2.531514e+02
    16: 2 Alpha 1.031963e-05
    17: 3 Chromosome scaffold_2
    18: 3 Sequences 14
    19: 3 Sites 1379975
    20: 3 Discarded sites 1043204
    21: 3 Processing 134.50 seconds
    22: 3 Position 8.461182e+07
    23: 3 Likelihood 2.945708e+02
    24: 3 Alpha 8.684652e-06
    25: 4 Chromosome scaffold_3
    26: 4 Sequences 14
    27: 4 Sites 1293978
    28: 4 Discarded sites 988465
    29: 4 Processing 120.76 seconds
    30: 4 Position 4.182126e+07
    31: 4 Likelihood 6.110444e+02
    32: 4 Alpha 3.335514e-06
    33: 5 Chromosome scaffold_4
    34: 5 Sequences 14
    35: 5 Sites 1319920
    36: 5 Discarded sites 1011446
    37: 5 Processing 126.99 seconds
    38: 5 Position 6.978721e+07
    39: 5 Likelihood 2.884914e+02
    40: 5 Alpha 1.062780e-05
    41: 6 Chromosome scaffold_5
    42: 6 Sequences 14
    43: 6 Sites 1295460
    44: 6 Discarded sites 990655
    45: 6 Processing 119.64 seconds
    46: 6 Position 8.837822e+07
    47: 6 Likelihood 3.304343e+02
    48: 6 Alpha 7.572795e-06
    49: 7 Chromosome scaffold_6
    50: 7 Sequences 14
    51: 7 Sites 1197964
    52: 7 Discarded sites 908454
    53: 7 Processing 115.17 seconds
    54: 7 Position 3.444713e+07
    55: 7 Likelihood 3.261829e+02
    56: 7 Alpha 3.427719e-06
    57: 8 Chromosome scaffold_7
    58: 8 Sequences 14
    59: 8 Sites 1315248
    60: 8 Discarded sites 998530
    61: 8 Processing 125.20 seconds
    62: 8 Position 2.337819e+07
    63: 8 Likelihood 4.023517e+02
    64: 8 Alpha 5.350802e-06
    65: 9 Chromosome scaffold_8
    66: 9 Sequences 14
    67: 9 Sites 1110658
    68: 9 Discarded sites 845039
    69: 9 Processing 109.15 seconds
    70: 9 Position 8.152571e+07
    71: 9 Likelihood 3.114815e+02
    72: 9 Alpha 3.899136e-06
    73: 10 Chromosome scaffold_9
    74: 10 Sequences 14
    75: 10 Sites 1091036
    76: 10 Discarded sites 833765
    77: 10 Processing 104.91 seconds
    78: 10 Position 2.669453e+07
    79: 10 Likelihood 1.829336e+02
    80: 10 Alpha 8.380941e-06
     alignment variable value
    > (fields.wide <- data.table::dcast(fields.dt, alignment ~ variable))
     alignment Alpha Chromosome Discarded sites Likelihood Position
     1: 1 6.616326e-06 scaffold_0 1264068 4.105582e+02 8.936200e+07
     2: 10 8.380941e-06 scaffold_9 833765 1.829336e+02 2.669453e+07
     3: 2 1.031963e-05 scaffold_1 1093595 2.531514e+02 8.722482e+07
     4: 3 8.684652e-06 scaffold_2 1043204 2.945708e+02 8.461182e+07
     5: 4 3.335514e-06 scaffold_3 988465 6.110444e+02 4.182126e+07
     6: 5 1.062780e-05 scaffold_4 1011446 2.884914e+02 6.978721e+07
     7: 6 7.572795e-06 scaffold_5 990655 3.304343e+02 8.837822e+07
     8: 7 3.427719e-06 scaffold_6 908454 3.261829e+02 3.444713e+07
     9: 8 5.350802e-06 scaffold_7 998530 4.023517e+02 2.337819e+07
    10: 9 3.899136e-06 scaffold_8 845039 3.114815e+02 8.152571e+07
     Processing Sequences Sites
     1: 155.53 seconds 14 1670366
     2: 104.91 seconds 14 1091036
     3: 138.83 seconds 14 1447008
     4: 134.50 seconds 14 1379975
     5: 120.76 seconds 14 1293978
     6: 126.99 seconds 14 1319920
     7: 119.64 seconds 14 1295460
     8: 115.17 seconds 14 1197964
     9: 125.20 seconds 14 1315248
    10: 109.15 seconds 14 1110658
    >
    > ## Capture all csv tables in report -- the file name can be given as
    > ## the subject to nc::capture_all_str, which calls readLines to get
    > ## data to parse.
    > (report.txt.gz <- system.file(
    + "extdata", "SweeD_Report.txt.gz", package="nc"))
    [1] "/home/ripley/R/Lib32/nc/extdata/SweeD_Report.txt.gz"
    > (report.dt <- nc::capture_all_str(
    + report.txt.gz,
    + "//",
    + alignment="[0-9]+",
    + "\n",
    + csv="[^/]+"
    + )[, {
    + data.table::fread(text=csv)
    + }, by=alignment])
     alignment Position Likelihood Alpha
     1: 1 700.0 4.637328e-03 2.763840e+02
     2: 1 130585.6 3.781283e-01 8.490200e-04
     3: 1 260471.2 3.602315e-02 4.691340e-03
     4: 1 390356.9 7.618749e-01 5.377668e-04
     5: 1 520242.5 2.979971e-08 1.411765e-01
     ---
     9996: 10 82991564.8 8.051006e-03 1.357819e-03
     9997: 10 83074967.8 7.048433e-03 1.825764e-03
     9998: 10 83158370.8 1.012360e-07 7.999999e-03
     9999: 10 83241773.8 3.977189e-08 9.999997e-01
    10000: 10 83325174.0 3.980538e-08 1.200000e+03
    >
    > ## Join report with info fields.
    > report.dt[fields.wide, on=.(alignment)]
     alignment Position Likelihood Alpha i.Alpha Chromosome
     1: 1 700.0 4.637328e-03 2.763840e+02 6.616326e-06 scaffold_0
     2: 1 130585.6 3.781283e-01 8.490200e-04 6.616326e-06 scaffold_0
     3: 1 260471.2 3.602315e-02 4.691340e-03 6.616326e-06 scaffold_0
     4: 1 390356.9 7.618749e-01 5.377668e-04 6.616326e-06 scaffold_0
     5: 1 520242.5 2.979971e-08 1.411765e-01 6.616326e-06 scaffold_0
     ---
     9996: 9 85297670.3 1.078915e-01 1.730811e-02 3.899136e-06 scaffold_8
     9997: 9 85383396.6 2.282976e-02 2.002634e-02 3.899136e-06 scaffold_8
     9998: 9 85469122.8 1.573487e+00 1.169200e-03 3.899136e-06 scaffold_8
     9999: 9 85554849.1 6.892966e-02 5.344763e-03 3.899136e-06 scaffold_8
    10000: 9 85640578.0 0.000000e+00 1.200000e+03 3.899136e-06 scaffold_8
     Discarded sites i.Likelihood i.Position Processing Sequences
     1: 1264068 4.105582e+02 8.936200e+07 155.53 seconds 14
     2: 1264068 4.105582e+02 8.936200e+07 155.53 seconds 14
     3: 1264068 4.105582e+02 8.936200e+07 155.53 seconds 14
     4: 1264068 4.105582e+02 8.936200e+07 155.53 seconds 14
     5: 1264068 4.105582e+02 8.936200e+07 155.53 seconds 14
     ---
     9996: 845039 3.114815e+02 8.152571e+07 109.15 seconds 14
     9997: 845039 3.114815e+02 8.152571e+07 109.15 seconds 14
     9998: 845039 3.114815e+02 8.152571e+07 109.15 seconds 14
     9999: 845039 3.114815e+02 8.152571e+07 109.15 seconds 14
    10000: 845039 3.114815e+02 8.152571e+07 109.15 seconds 14
     Sites
     1: 1670366
     2: 1670366
     3: 1670366
     4: 1670366
     5: 1670366
     ---
     9996: 1110658
     9997: 1110658
     9998: 1110658
     9999: 1110658
    10000: 1110658
    >
    > ## parsing nbib citation file.
    > (pmc.nbib <- system.file(
    + "extdata", "PMC3045577.nbib", package="nc"))
    [1] "/home/ripley/R/Lib32/nc/extdata/PMC3045577.nbib"
    > blank <- "\n "
    > pmc.dt <- nc::capture_all_str(
    + pmc.nbib,
    + Abbreviation="[A-Z]+",
    + " *- ",
    + value=list(
    + ".*",
    + list(blank, ".*"), "*"),
    + function(x)sub(blank, "", x))
    > str(pmc.dt)
    Classes ‘data.table’ and 'data.frame': 50 obs. of 2 variables:
     $ Abbreviation: chr "PMID" "OWN" "STAT" "DCOM" ...
     $ value : chr "21113027" "NLM" "MEDLINE" "20110512" ...
     - attr(*, ".internal.selfref")=<externalptr>
    >
    > ## What do the variable fields mean? It is explained on
    > ## https://www.nlm.nih.gov/bsd/mms/medlineelements.html which has a
    > ## local copy in this package (downloaded 18 Sep 2019).
    > fields.html <- system.file(
    + "extdata", "MEDLINE_Fields.html", package="nc")
    > if(interactive())browseURL(fields.html)
    > fields.vec <- readLines(fields.html)
    >
    > ## It is pretty easy to capture fields and abbreviations if gsub
    > ## used to remove some tags first.
    > no.strong <- gsub("</?strong>", "", fields.vec)
    > no.comments <- gsub("<!--.*?-->", "", no.strong)
    > ## grep then capture_first_vec can be used if each desired row in
    > ## the output comes from a single line of the input file.
    > (h3.vec <- grep("<h3", no.comments, value=TRUE))
     [1] "<h3><a id=\"ab\" name=\"ab\"></a>Abstract (AB)</h3>"
     [2] "<h3><a id=\"ci\" name=\"ci\"></a>Copyright Information (CI)</h3>"
     [3] "<h3><a id=\"ad\" name=\"ad\"></a>Affiliation (AD)</h3>"
     [4] "<h3><a id=\"irad\" name=\"irad\"></a>Investigator Affiliation (IRAD)</h3>"
     [5] "<h3><a id=\"aid\" name=\"aid\"></a>Article Identifier (AID)</h3>"
     [6] "<h3><a id=\"au\" name=\"au\"></a>Author (AU)</h3>"
     [7] "<h3><a id=\"auid\" name=\"auid\"></a>Author Identifier (AUID)</h3>"
     [8] "<h3><a id=\"fau\" name=\"fau\"></a>Full Author (FAU)</h3>"
     [9] "<h3><a id=\"cc2\" name=\"bti\"></a>Book Title (BTI)</h3>"
    [10] "<h3><a id=\"cc4\" name=\"cti\"></a>Collection Title (CTI)</h3>"
    [11] "<h3><a id=\"cc\" name=\"cc\"></a>Comments/Corrections (See fields and field tags listed below.)</h3>"
    [12] "<h3><a id=\"coi\" name=\"coi\"></a>Conflict of Interest Statement (COIS)</h3>"
    [13] "<h3><a id=\"cn\" name=\"cn\"></a>Corporate Author (CN)</h3>"
    [14] "<h3><a id=\"dcom2\" name=\"crdt\"></a>Create Date (CRDT)</h3>"
    [15] "<h3><a id=\"dcom\" name=\"dcom\"></a>Date Completed (DCOM)</h3>"
    [16] "<h3><a id=\"da\" name=\"da\"></a>Date Created (DA)</h3>"
    [17] "<h3><a id=\"lr\" name=\"lr\"></a>Date Last Revised (LR)</h3>"
    [18] "<h3><a id=\"dep\" name=\"dep\"></a>Date of Electronic Publication (DEP)</h3>"
    [19] "<h3><a id=\"dp\" name=\"dp\"></a>Date of Publication (DP)</h3>"
    [20] "<h3><a id=\"edat2\" name=\"ed\"></a>Editor (ED) and Full Editor Name (FED)</h3>"
    [21] "<h3><a id=\"edat3\" name=\"en\"></a>Edition (EN)</h3>"
    [22] "<h3><a id=\"edat\" name=\"edat\"></a>Entrez Date (EDAT)</h3>"
    [23] "<h3><a id=\"gs\" name=\"gs\"></a>Gene Symbol (GS): not currently input</h3>"
    [24] "<h3><a id=\"gn\" name=\"gn\"></a>General Note (GN)</h3>"
    [25] "<h3><a id=\"gr\" name=\"gr\"></a>Grant Number (GR)</h3>"
    [26] "<h3><a id=\"ir\" name=\"ir\"></a>Investigator Name (IR) and Full Investigator Name (FIR)</h3>"
    [27] "<h3><a id=\"is2\" name=\"isbn\"></a>ISBN (ISBN)</h3>"
    [28] "<h3><a id=\"is\" name=\"is\"></a>ISSN (IS)</h3>"
    [29] "<h3><a id=\"ip\" name=\"ip\"></a>Issue (IP)</h3>"
    [30] "<h3><a id=\"ta\" name=\"ta\"></a>Journal Title Abbreviation (TA)</h3>"
    [31] "<h3><a id=\"jt\" name=\"jt\"></a>Journal Title (JT)</h3>"
    [32] "<h3><a id=\"la\" name=\"la\"></a>Language (LA)</h3>"
    [33] "<h3><a id=\"la3\" name=\"lid\"></a>Location Identifier (LID)</h3>"
    [34] "<h3><a id=\"la2\" name=\"mid\"></a>Manuscript Identifier (MID)</h3>"
    [35] "<h3><a id=\"mhda\" name=\"mhda\"></a>MeSH Date (MHDA)</h3>"
    [36] "<h3><a id=\"mh\" name=\"mh\"></a>MeSH Terms (MH)</h3>"
    [37] "<h3><a id=\"jid\" name=\"jid\"></a>NLM Unique ID (JID)</h3>"
    [38] "<h3><a id=\"rf\" name=\"rf\"></a>Number of References (RF)</h3>"
    [39] "<h3><a id=\"oab\" name=\"oab\"></a>Other Abstract (OAB)</h3>"
    [40] "<h3><a id=\"oci\" name=\"oci\"></a>Other Copyright Information (OCI)</h3>"
    [41] "<h3><a id=\"oid\" name=\"oid\"></a>Other ID (OID)</h3>"
    [42] "<h3><a id=\"ot\" name=\"ot\"></a>Other Term (OT)</h3>"
    [43] "<h3><a id=\"oto\" name=\"oto\"></a>Other Term Owner (OTO)</h3>"
    [44] "<h3><a id=\"own\" name=\"own\"></a>Owner (OWN)</h3>"
    [45] "<h3><a id=\"pg\" name=\"pg\"></a>Pagination (PG)</h3>"
    [46] "<h3><a id=\"ps\" name=\"ps\"></a>Personal Name as Subject (PS)</h3>"
    [47] "<h3><a id=\"fps\" name=\"fps\"></a>Full Personal Name as Subject (FPS)</h3>"
    [48] "<h3><a id=\"pl\" name=\"pl\"></a>Place of Publication (PL)</h3>"
    [49] "<h3><a id=\"phst\" name=\"phst\"></a>Publication History Status (PHST)</h3>"
    [50] "<h3><a id=\"pst\" name=\"pst\"></a>Publication Status (PST)</h3>"
    [51] "<h3><a id=\"pt\" name=\"pt\"></a>Publication Type (PT)</h3>"
    [52] "<h3><a id=\"pubm\" name=\"pubm\"></a>Publishing Model (PUBM)</h3>"
    [53] "<h3><a id=\"pmid2\" name=\"pmc\"></a>PubMed Central Identifer (PMC)</h3>"
    [54] "<h3><a id=\"pmid3\" name=\"pmcr\"></a>PubMed Central Release (PMCR)</h3>"
    [55] "<h3><a id=\"pmid\" name=\"pmid\"></a>PubMed Unique Identifier (PMID)</h3>"
    [56] "<h3><a id=\"rn\" name=\"rn\"></a>Registry Number/EC Number (RN)</h3>"
    [57] "<h3><a id=\"nm\" name=\"nm\"></a>Substance Name (NM)</h3>"
    [58] "<h3><a id=\"si\" name=\"si\"></a>Secondary Source ID (SI)</h3>"
    [59] "<h3><a id=\"so\" name=\"so\"></a>Source (SO)</h3>"
    [60] "<h3><a id=\"sfm\" name=\"sfm\"></a>Space Flight Mission (SFM)</h3>"
    [61] "<h3><a id=\"stat\" name=\"stat\"></a>Status (STAT)</h3>"
    [62] "<h3><a id=\"sb\" name=\"sb\"></a>Subset (SB)</h3>"
    [63] "<h3><a id=\"ti\" name=\"ti\"></a>Title (TI)</h3>"
    [64] "<h3><a id=\"tt\" name=\"tt\"></a>Transliterated Title (TT)</h3>"
    [65] "<h3><a id=\"vi\" name=\"vi\"></a>Volume (VI)</h3>"
    [66] "<h3><a id=\"cc3\" name=\"vti\"></a>Volume Title (VTI)</h3>"
    > h3.pattern <- list(
    + nc::field("name", '="', '[^"]+'),
    + '"></a>',
    + fields.abbrevs="[^<]+")
    > first.fields.dt <- nc::capture_first_vec(
    + h3.vec, h3.pattern)
    > field.abbrev.pattern <- list(
    + Field=".*?",
    + " \\(",
    + Abbreviation="[^)]+",
    + "\\)",
    + "(?: and |$)?")
    > (first.each.field <- first.fields.dt[, nc::capture_all_str(
    + fields.abbrevs, field.abbrev.pattern),
    + by=fields.abbrevs])
     fields.abbrevs
     1: Abstract (AB)
     2: Copyright Information (CI)
     3: Affiliation (AD)
     4: Investigator Affiliation (IRAD)
     5: Article Identifier (AID)
     6: Author (AU)
     7: Author Identifier (AUID)
     8: Full Author (FAU)
     9: Book Title (BTI)
    10: Collection Title (CTI)
    11: Comments/Corrections (See fields and field tags listed below.)
    12: Conflict of Interest Statement (COIS)
    13: Corporate Author (CN)
    14: Create Date (CRDT)
    15: Date Completed (DCOM)
    16: Date Created (DA)
    17: Date Last Revised (LR)
    18: Date of Electronic Publication (DEP)
    19: Date of Publication (DP)
    20: Editor (ED) and Full Editor Name (FED)
    21: Editor (ED) and Full Editor Name (FED)
    22: Edition (EN)
    23: Entrez Date (EDAT)
    24: Gene Symbol (GS): not currently input
    25: General Note (GN)
    26: Grant Number (GR)
    27: Investigator Name (IR) and Full Investigator Name (FIR)
    28: Investigator Name (IR) and Full Investigator Name (FIR)
    29: ISBN (ISBN)
    30: ISSN (IS)
    31: Issue (IP)
    32: Journal Title Abbreviation (TA)
    33: Journal Title (JT)
    34: Language (LA)
    35: Location Identifier (LID)
    36: Manuscript Identifier (MID)
    37: MeSH Date (MHDA)
    38: MeSH Terms (MH)
    39: NLM Unique ID (JID)
    40: Number of References (RF)
    41: Other Abstract (OAB)
    42: Other Copyright Information (OCI)
    43: Other ID (OID)
    44: Other Term (OT)
    45: Other Term Owner (OTO)
    46: Owner (OWN)
    47: Pagination (PG)
    48: Personal Name as Subject (PS)
    49: Full Personal Name as Subject (FPS)
    50: Place of Publication (PL)
    51: Publication History Status (PHST)
    52: Publication Status (PST)
    53: Publication Type (PT)
    54: Publishing Model (PUBM)
    55: PubMed Central Identifer (PMC)
    56: PubMed Central Release (PMCR)
    57: PubMed Unique Identifier (PMID)
    58: Registry Number/EC Number (RN)
    59: Substance Name (NM)
    60: Secondary Source ID (SI)
    61: Source (SO)
    62: Space Flight Mission (SFM)
    63: Status (STAT)
    64: Subset (SB)
    65: Title (TI)
    66: Transliterated Title (TT)
    67: Volume (VI)
    68: Volume Title (VTI)
     fields.abbrevs
     Field Abbreviation
     1: Abstract AB
     2: Copyright Information CI
     3: Affiliation AD
     4: Investigator Affiliation IRAD
     5: Article Identifier AID
     6: Author AU
     7: Author Identifier AUID
     8: Full Author FAU
     9: Book Title BTI
    10: Collection Title CTI
    11: Comments/Corrections See fields and field tags listed below.
    12: Conflict of Interest Statement COIS
    13: Corporate Author CN
    14: Create Date CRDT
    15: Date Completed DCOM
    16: Date Created DA
    17: Date Last Revised LR
    18: Date of Electronic Publication DEP
    19: Date of Publication DP
    20: Editor ED
    21: Full Editor Name FED
    22: Edition EN
    23: Entrez Date EDAT
    24: Gene Symbol GS
    25: General Note GN
    26: Grant Number GR
    27: Investigator Name IR
    28: Full Investigator Name FIR
    29: ISBN ISBN
    30: ISSN IS
    31: Issue IP
    32: Journal Title Abbreviation TA
    33: Journal Title JT
    34: Language LA
    35: Location Identifier LID
    36: Manuscript Identifier MID
    37: MeSH Date MHDA
    38: MeSH Terms MH
    39: NLM Unique ID JID
    40: Number of References RF
    41: Other Abstract OAB
    42: Other Copyright Information OCI
    43: Other ID OID
    44: Other Term OT
    45: Other Term Owner OTO
    46: Owner OWN
    47: Pagination PG
    48: Personal Name as Subject PS
    49: Full Personal Name as Subject FPS
    50: Place of Publication PL
    51: Publication History Status PHST
    52: Publication Status PST
    53: Publication Type PT
    54: Publishing Model PUBM
    55: PubMed Central Identifer PMC
    56: PubMed Central Release PMCR
    57: PubMed Unique Identifier PMID
    58: Registry Number/EC Number RN
    59: Substance Name NM
    60: Secondary Source ID SI
    61: Source SO
    62: Space Flight Mission SFM
    63: Status STAT
    64: Subset SB
    65: Title TI
    66: Transliterated Title TT
    67: Volume VI
    68: Volume Title VTI
     Field Abbreviation
    >
    > ## If we want to capture the information after the initial h3 line
    > ## of the input, e.g. the rest column below which contains a
    > ## description/example for each field, then capture_all_str can be
    > ## used on the full input file.
    > h3.fields.dt <- nc::capture_all_str(
    + no.comments,
    + h3.pattern,
    + '</h3>\n',
    + rest="(?:.*\n)+?", #exercise: get the examples.
    + "<hr />\n")
    > (h3.each.field <- h3.fields.dt[, nc::capture_all_str(
    + fields.abbrevs, field.abbrev.pattern),
    + by=fields.abbrevs])
     fields.abbrevs
     1: Abstract (AB)
     2: Copyright Information (CI)
     3: Affiliation (AD)
     4: Investigator Affiliation (IRAD)
     5: Article Identifier (AID)
     6: Author (AU)
     7: Author Identifier (AUID)
     8: Full Author (FAU)
     9: Book Title (BTI)
    10: Collection Title (CTI)
    11: Comments/Corrections (See fields and field tags listed below.)
    12: Conflict of Interest Statement (COIS)
    13: Corporate Author (CN)
    14: Create Date (CRDT)
    15: Date Completed (DCOM)
    16: Date Created (DA)
    17: Date Last Revised (LR)
    18: Date of Electronic Publication (DEP)
    19: Date of Publication (DP)
    20: Editor (ED) and Full Editor Name (FED)
    21: Editor (ED) and Full Editor Name (FED)
    22: Edition (EN)
    23: Entrez Date (EDAT)
    24: Gene Symbol (GS): not currently input
    25: General Note (GN)
    26: Grant Number (GR)
    27: Investigator Name (IR) and Full Investigator Name (FIR)
    28: Investigator Name (IR) and Full Investigator Name (FIR)
    29: ISBN (ISBN)
    30: ISSN (IS)
    31: Issue (IP)
    32: Journal Title Abbreviation (TA)
    33: Journal Title (JT)
    34: Language (LA)
    35: Location Identifier (LID)
    36: Manuscript Identifier (MID)
    37: MeSH Date (MHDA)
    38: MeSH Terms (MH)
    39: NLM Unique ID (JID)
    40: Number of References (RF)
    41: Other Abstract (OAB)
    42: Other Copyright Information (OCI)
    43: Other ID (OID)
    44: Other Term (OT)
    45: Other Term Owner (OTO)
    46: Owner (OWN)
    47: Pagination (PG)
    48: Personal Name as Subject (PS)
    49: Full Personal Name as Subject (FPS)
    50: Place of Publication (PL)
    51: Publication History Status (PHST)
    52: Publication Status (PST)
    53: Publication Type (PT)
    54: Publishing Model (PUBM)
    55: PubMed Central Identifer (PMC)
    56: PubMed Central Release (PMCR)
    57: PubMed Unique Identifier (PMID)
    58: Registry Number/EC Number (RN)
    59: Substance Name (NM)
    60: Secondary Source ID (SI)
    61: Source (SO)
    62: Space Flight Mission (SFM)
    63: Status (STAT)
    64: Subset (SB)
    65: Title (TI)
    66: Transliterated Title (TT)
    67: Volume (VI)
    68: Volume Title (VTI)
     fields.abbrevs
     Field Abbreviation
     1: Abstract AB
     2: Copyright Information CI
     3: Affiliation AD
     4: Investigator Affiliation IRAD
     5: Article Identifier AID
     6: Author AU
     7: Author Identifier AUID
     8: Full Author FAU
     9: Book Title BTI
    10: Collection Title CTI
    11: Comments/Corrections See fields and field tags listed below.
    12: Conflict of Interest Statement COIS
    13: Corporate Author CN
    14: Create Date CRDT
    15: Date Completed DCOM
    16: Date Created DA
    17: Date Last Revised LR
    18: Date of Electronic Publication DEP
    19: Date of Publication DP
    20: Editor ED
    21: Full Editor Name FED
    22: Edition EN
    23: Entrez Date EDAT
    24: Gene Symbol GS
    25: General Note GN
    26: Grant Number GR
    27: Investigator Name IR
    28: Full Investigator Name FIR
    29: ISBN ISBN
    30: ISSN IS
    31: Issue IP
    32: Journal Title Abbreviation TA
    33: Journal Title JT
    34: Language LA
    35: Location Identifier LID
    36: Manuscript Identifier MID
    37: MeSH Date MHDA
    38: MeSH Terms MH
    39: NLM Unique ID JID
    40: Number of References RF
    41: Other Abstract OAB
    42: Other Copyright Information OCI
    43: Other ID OID
    44: Other Term OT
    45: Other Term Owner OTO
    46: Owner OWN
    47: Pagination PG
    48: Personal Name as Subject PS
    49: Full Personal Name as Subject FPS
    50: Place of Publication PL
    51: Publication History Status PHST
    52: Publication Status PST
    53: Publication Type PT
    54: Publishing Model PUBM
    55: PubMed Central Identifer PMC
    56: PubMed Central Release PMCR
    57: PubMed Unique Identifier PMID
    58: Registry Number/EC Number RN
    59: Substance Name NM
    60: Secondary Source ID SI
    61: Source SO
    62: Space Flight Mission SFM
    63: Status STAT
    64: Subset SB
    65: Title TI
    66: Transliterated Title TT
    67: Volume VI
    68: Volume Title VTI
     Field Abbreviation
    >
    > ## Either method of capturing abbreviations gives the same result.
    > identical(first.each.field, h3.each.field)
    [1] TRUE
    >
    > ## but the capture_all_str method returns the additional rest column
    > ## which contains data after the initial h3 line.
    > names(first.fields.dt)
    [1] "name" "fields.abbrevs"
    > names(h3.fields.dt)
    [1] "name" "fields.abbrevs" "rest"
    > cat(h3.fields.dt[fields.abbrevs=="Volume (VI)", rest])
    <p>The volume number of the journal in which the article was published is recorded here.</p>
    <p class="examplekm">Examples:<br />VI - 7<br />VI - 5 Spec No<br />VI - 49 Suppl 20</p>
    <p>Some records (especially records from <a href="/databases/databases_oldmedline.html">OLDMEDLINE</a>) contain the Issue field but lack the Volume field; some contain the Volume field but lack the Issue field; and some records contain Volume and Issue data in the Volume element.</p>
    >
    > ## There are 66 Field rows across three tables.
    > a.href <- list('<a href=[^>]+>')
    > (td.vec <- fields.vec[240:280])
     [1] "<td><a href=\"#ab\">Abstract</a></td>"
     [2] "<td><a href=\"#ab\">(AB)</a></td>"
     [3] "</tr>"
     [4] "<tr style=\"background-color: #cccccc;\">"
     [5] "<td><a href=\"#ci\">Copyright Information</a></td>"
     [6] "<td>"
     [7] "<div><a href=\"#ci\">(CI)</a></div>"
     [8] "</td>"
     [9] "</tr>"
    [10] "<tr>"
    [11] "<td><a href=\"#ad\">Affiliation</a></td>"
    [12] "<td>"
    [13] "<div><a href=\"#ad\">(AD)</a></div>"
    [14] "</td>"
    [15] "</tr>"
    [16] "<tr style=\"background-color: #cccccc;\">"
    [17] "<td><a href=\"#irad\">Investigator Affiliation</a></td>"
    [18] "<td>"
    [19] "<div><a href=\"#irad\">(IRAD)</a></div>"
    [20] "</td>"
    [21] "</tr>"
    [22] "<tr>"
    [23] "<td><a href=\"#aid\">Article Identifier</a></td>"
    [24] "<td>"
    [25] "<div><a href=\"#aid\">(AID)</a></div>"
    [26] "</td>"
    [27] "</tr>"
    [28] "<tr style=\"background-color: #cccccc;\">"
    [29] "<td><a href=\"#au\">Author</a></td>"
    [30] "<td>"
    [31] "<div><a href=\"#au\">(AU)</a></div>"
    [32] "</td>"
    [33] "</tr>"
    [34] "<tr>"
    [35] "<td><a href=\"#auid\">Author Identifier</a></td>"
    [36] "<td><a href=\"#auid\">(AUID)</a></td>"
    [37] "</tr>"
    [38] "<tr>"
    [39] "<td style=\"background-color: #cccccc;\"><a href=\"#fau\">Full Author</a></td>"
    [40] "<td style=\"background-color: #cccccc;\">"
    [41] "<div><a href=\"#fau\">(FAU)</a></div>"
    > fields.pattern <- list(
    + "<td.*?>",
    + a.href,
    + Fields="[^()<]+",
    + "</a></td>\n")
    > (td.only.Fields <- nc::capture_all_str(fields.vec, fields.pattern))
     Fields
     1: Abstract
     2: Copyright Information
     3: Affiliation
     4: Investigator Affiliation
     5: Article Identifier
     6: Author
     7: Author Identifier
     8: Full Author
     9: Book Title
    10: Collection Title
    11: Comments/Corrections
    12: Conflict of Interest Statement
    13: Corporate Author
    14: Create Date
    15: Date Completed
    16: Date Created
    17: Date Last Revised
    18: Date of Electronic Publication
    19: Date of Publication
    20: Edition
    21: Editor and Full Editor Name
    22: Entrez Date
    23: Gene Symbol
    24: General Note
    25: Grant Number
    26: Investigator Name and Full Investigator Name
    27: ISBN
    28: ISSN
    29: Issue
    30: Journal Title Abbreviation
    31: Journal Title
    32: Language
    33: Location Identifier
    34: Manuscript Identifier
    35: MeSH Date
    36: MeSH Terms
    37: NLM Unique ID
    38: Number of References
    39: Other Abstract
    40: Other Copyright Information
    41: Other ID
    42: Other Term
    43: Other Term Owner
    44: Owner
    45: Pagination
    46: Personal Name as Subject
    47: Full Personal Name as Subject
    48: Place of Publication
    49: Publication History Status
    50: Publication Status
    51: Publication Type
    52: Publishing Model
    53: PubMed Central Identifier
    54: PubMed Central Release
    55: PubMed Unique Identifier
    56: Registry Number/EC Number
    57: Substance Name
    58: Secondary Source ID
    59: Source
    60: Space Flight Mission
    61: Status
    62: Subset
    63: Title
    64: Transliterated Title
    65: Volume
    66: Volume Title
     Fields
    >
    > ## Extract Fields and Abbreviations. Careful: most fields have one
    > ## abbreviation, but one field has none, and two fields have two.
    > (td.fields.dt <- nc::capture_all_str(
    + fields.vec,
    + fields.pattern,
    + "<td[^>]*>",
    + "(?:\n<div>)?",
    + a.href, "?",
    + abbrevs=".*?",
    + "</"))
     Fields abbrevs
     1: Abstract (AB)
     2: Copyright Information (CI)
     3: Affiliation (AD)
     4: Investigator Affiliation (IRAD)
     5: Article Identifier (AID)
     6: Author (AU)
     7: Author Identifier (AUID)
     8: Full Author (FAU)
     9: Book Title (BTI)
    10: Collection Title (CTI)
    11: Comments/Corrections &nbsp;
    12: Conflict of Interest Statement (COIS)
    13: Corporate Author (CN)
    14: Create Date (CRDT)
    15: Date Completed (DCOM)
    16: Date Created (DA)
    17: Date Last Revised (LR)
    18: Date of Electronic Publication (DEP)
    19: Date of Publication (DP)
    20: Edition (EN)
    21: Editor and Full Editor Name (ED)<br />(FED)
    22: Entrez Date (EDAT)
    23: Gene Symbol (GS)
    24: General Note (GN)
    25: Grant Number (GR)
    26: Investigator Name and Full Investigator Name (IR) (FIR)
    27: ISBN (ISBN)
    28: ISSN (IS)
    29: Issue (IP)
    30: Journal Title Abbreviation (TA)
    31: Journal Title (JT)
    32: Language (LA)
    33: Location Identifier (LID)
    34: Manuscript Identifier (MID)
    35: MeSH Date (MHDA)
    36: MeSH Terms (MH)
    37: NLM Unique ID (JID)
    38: Number of References (RF)
    39: Other Abstract (OAB)
    40: Other Copyright Information (OCI)
    41: Other ID (OID)
    42: Other Term (OT)
    43: Other Term Owner (OTO)
    44: Owner (OWN)
    45: Pagination (PG)
    46: Personal Name as Subject (PS)
    47: Full Personal Name as Subject (FPS)
    48: Place of Publication (PL)
    49: Publication History Status (PHST)
    50: Publication Status (PST)
    51: Publication Type (PT)
    52: Publishing Model (PUBM)
    53: PubMed Central Identifier (PMC)
    54: PubMed Central Release (PMCR)
    55: PubMed Unique Identifier (PMID)
    56: Registry Number/EC Number (RN)
    57: Substance Name (NM)
    58: Secondary Source ID (SI)
    59: Source (SO)
    60: Space Flight Mission (SFM)
    61: Status (STAT)
    62: Subset (SB)
    63: Title (TI)
    64: Transliterated Title (TT)
    65: Volume (VI)
    66: Volume Title (VTI)
     Fields abbrevs
    >
    > ## Get each individual abbreviation from the previously captured td
    > ## data.
    > td.each.field <- td.fields.dt[, {
    + f <- nc::capture_all_str(
    + Fields,
    + Field=".*?",
    + "(?:$| and )")
    + a <- nc::capture_all_str(
    + abbrevs,
    + "\\(",
    + Abbreviation="[^)]+",
    + "\\)")
    + if(nrow(a)==0)list() else cbind(f, a)
    + }, by=Fields]
    > str(td.each.field)
    Classes ‘data.table’ and 'data.frame': 67 obs. of 3 variables:
     $ Fields : chr "Abstract" "Copyright Information" "Affiliation" "Investigator Affiliation" ...
     $ Field : chr "Abstract" "Copyright Information" "Affiliation" "Investigator Affiliation" ...
     $ Abbreviation: chr "AB" "CI" "AD" "IRAD" ...
     - attr(*, ".internal.selfref")=<externalptr>
    > td.each.field[td.fields.dt, .(
    + count=.N
    + ), on=.(Fields), by=.EACHI][order(count)]
     Fields count
     1: Comments/Corrections 0
     2: Abstract 1
     3: Copyright Information 1
     4: Affiliation 1
     5: Investigator Affiliation 1
     6: Article Identifier 1
     7: Author 1
     8: Author Identifier 1
     9: Full Author 1
    10: Book Title 1
    11: Collection Title 1
    12: Conflict of Interest Statement 1
    13: Corporate Author 1
    14: Create Date 1
    15: Date Completed 1
    16: Date Created 1
    17: Date Last Revised 1
    18: Date of Electronic Publication 1
    19: Date of Publication 1
    20: Edition 1
    21: Entrez Date 1
    22: Gene Symbol 1
    23: General Note 1
    24: Grant Number 1
    25: ISBN 1
    26: ISSN 1
    27: Issue 1
    28: Journal Title Abbreviation 1
    29: Journal Title 1
    30: Language 1
    31: Location Identifier 1
    32: Manuscript Identifier 1
    33: MeSH Date 1
    34: MeSH Terms 1
    35: NLM Unique ID 1
    36: Number of References 1
    37: Other Abstract 1
    38: Other Copyright Information 1
    39: Other ID 1
    40: Other Term 1
    41: Other Term Owner 1
    42: Owner 1
    43: Pagination 1
    44: Personal Name as Subject 1
    45: Full Personal Name as Subject 1
    46: Place of Publication 1
    47: Publication History Status 1
    48: Publication Status 1
    49: Publication Type 1
    50: Publishing Model 1
    51: PubMed Central Identifier 1
    52: PubMed Central Release 1
    53: PubMed Unique Identifier 1
    54: Registry Number/EC Number 1
    55: Substance Name 1
    56: Secondary Source ID 1
    57: Source 1
    58: Space Flight Mission 1
    59: Status 1
    60: Subset 1
    61: Title 1
    62: Transliterated Title 1
    63: Volume 1
    64: Volume Title 1
    65: Editor and Full Editor Name 2
    66: Investigator Name and Full Investigator Name 2
     Fields count
    >
    > ## There is a typo in the data captured from the h3 headings.
    > td.each.field[!Field %in% h3.each.field$Field]
     Fields Field Abbreviation
    1: PubMed Central Identifier PubMed Central Identifier PMC
    > h3.each.field[!Field %in% td.each.field$Field]
     fields.abbrevs
    1: Comments/Corrections (See fields and field tags listed below.)
    2: PubMed Central Identifer (PMC)
     Field Abbreviation
    1: Comments/Corrections See fields and field tags listed below.
    2: PubMed Central Identifer PMC
    >
    > ## Abbreviations are consistent.
    > td.each.field[!Abbreviation %in% h3.each.field$Abbreviation]
    Empty data.table (0 rows and 3 cols): Fields,Field,Abbreviation
    > h3.each.field[!Abbreviation %in% td.each.field$Abbreviation]
     fields.abbrevs
    1: Comments/Corrections (See fields and field tags listed below.)
     Field Abbreviation
    1: Comments/Corrections See fields and field tags listed below.
    >
    > ## There is a table that provides a description of each comment
    > ## type.
    > (comment.vec <- fields.vec[840:860])
     [1] "<tr>"
     [2] "<th><strong>Comment or Correction Type</strong></th>"
     [3] "<th><strong>MEDLINE Display Field Tag</strong></th>"
     [4] "<th><strong>Description</strong></th>"
     [5] "</tr>"
     [6] "<tr>"
     [7] "<td><strong>Comment in</strong></td>"
     [8] "<td><strong>(CIN)</strong></td>"
     [9] "<td>cites the reference containing a commentary about the article (appears on citation for original article); began use with journal issues published in 1989.</td>"
    [10] "</tr>"
    [11] "<tr>"
    [12] "<td><strong>Comment on</strong></td>"
    [13] "<td><strong>(CON)</strong></td>"
    [14] "<td>cites the reference upon which the article comments; began use with journal issues published in 1989.</td>"
    [15] "</tr>"
    [16] "<tr>"
    [17] "<td><strong>Erratum in</strong></td>"
    [18] "<td><strong>(EIN)</strong></td>"
    [19] "<td>cites a published erratum to the article (appears on citation for original article); began use in 1987.</td>"
    [20] "</tr>"
    [21] "<tr>"
    > comment.dt <- nc::capture_all_str(
    + fields.vec,
    + "<td><strong>",
    + Field="[^<]+",
    + "</strong></td>\n",
    + "<td><strong>\\(",
    + Abbreviation="[^)]+",
    + "\\)</strong></td>\n",
    + "<td>",
    + description=".*",
    + "</td>\n")
    > str(comment.dt)
    Classes ‘data.table’ and 'data.frame': 18 obs. of 3 variables:
     $ Field : chr "Comment in" "Comment on" "Erratum in" "Erratum for" ...
     $ Abbreviation: chr "CIN" "CON" "EIN" "EFR" ...
     $ description : chr "cites the reference containing a commentary about the article (appears on citation for original article); began"| __truncated__ "cites the reference upon which the article comments; began use with journal issues published in 1989." "cites a published erratum to the article (appears on citation for original article); began use in 1987." "cites the original article for which there is a published erratum. As of 2016, partial retractions are considered errata." ...
     - attr(*, ".internal.selfref")=<externalptr>
    >
    > ## Join to original PMC citation file in order to see what the
    > ## abbreviations used in that file mean.
    > all.abbrevs <- rbind(
    + td.each.field[, .(Field, Abbreviation)],
    + comment.dt[, .(Field, Abbreviation)])
    > all.abbrevs[pmc.dt, .(
    + Abbreviation,
    + Field,
    + value=substr(value, 1, 20)
    + ), on=.(Abbreviation)]
     Abbreviation Field value
     1: PMID PubMed Unique Identifier 21113027
     2: OWN Owner NLM
     3: STAT Status MEDLINE
     4: DCOM Date Completed 20110512
     5: LR Date Last Revised 20181113
     6: IS ISSN 1362-4962 (Electroni
     7: IS ISSN 0305-1048 (Print)
     8: IS ISSN 0305-1048 (Linking)
     9: VI Volume 39
    10: IP Issue 4
    11: DP Date of Publication 2011 Mar
    12: TI Title A manually curated C
    13: PG Pagination e25
    14: LID Location Identifier 10.1093/nar/gkq1187
    15: AB Abstract Chromatin immunoprec
    16: FAU Full Author Rye, Morten Beck
    17: AU Author Rye MB
    18: AD Affiliation Department of Cancer
    19: FAU Full Author Sætrom, Pål
    20: AU Author Sætrom P
    21: FAU Full Author Drabløs, Finn
    22: AU Author Drabløs F
    23: LA Language eng
    24: PT Publication Type Evaluation Studies
    25: PT Publication Type Journal Article
    26: PT Publication Type Research Support, No
    27: DEP Date of Electronic Publication 20101126
    28: TA Journal Title Abbreviation Nucleic Acids Res
    29: JT Journal Title Nucleic acids resear
    30: JID NLM Unique ID 0411011
    31: RN Registry Number/EC Number 0 (Transcription Fac
    32: SB Subset IM
    33: MH MeSH Terms Benchmarking
    34: MH MeSH Terms Binding Sites
    35: MH MeSH Terms *Chromatin Immunopre
    36: MH MeSH Terms *High-Throughput Nuc
    37: MH MeSH Terms *Software
    38: MH MeSH Terms Transcription Factor
    39: PMC PubMed Central Identifier PMC3045577
    40: EDAT Entrez Date 2010/11/30 06:00
    41: MHDA MeSH Date 2011/05/13 06:00
    42: CRDT Create Date 2010/11/30 06:00
    43: PHST Publication History Status 2010/11/30 06:00 [en
    44: PHST Publication History Status 2010/11/30 06:00 [pu
    45: PHST Publication History Status 2011/05/13 06:00 [me
    46: AID Article Identifier 10.1093/nar/gkq1187
    47: AID Article Identifier gkq1187 [pii]
    48: AID Article Identifier gkq1187 [pii]
    49: PST Publication Status ppublish
    50: SO Source Nucleic Acids Res. 2
     Abbreviation Field value
    >
    > ## There is a listing of examples for each comment type.
    > (comment.ex.dt <- nc::capture_all_str(
    + fields.vec[938],
    + "br />\\s*",
    + Abbreviation="[A-Z]+",
    + "\\s*-\\s*",
    + citation="[^<]+?",
    + list(
    + "[.] ",
    + nc::field("PMID", ": ", "[0-9]+")
    + ), "?",
    + "<"))
     Abbreviation citation
     1: CON Dev Cell. 2002 Jul;3(1):85-97
     2: CIN N Engl J Med. 2003 Jul 17;349(3):211-2
     3: CRI Orthop Nurs. 2003 May-Jun;22(3):232-9
     4: CRF Biochemistry. 1994 May 10;33(18):5614-22
     5: EIN Acta Obstet Gynecol Scand. 2003 Jan;82(1):102
     6: EFR J Arthroplasty. 2002 Jun;17(4):524-6
     7: RIN J Biochem Mol Biol. 2002 Nov 30;35(6):642
     8: ROF Ware FE, Lehrman MA. J Biol Chem. 1996 Jun 14;271(24):13935-8
     9: UIN Cochrane Database Syst Rev. 2002;(3):CD003688
    10: UOF Cochrane Database Syst Rev. 2002;(2):CD003680
    11: SPIN Ann Intern Med. 2003 Jun 3;138(11):I60
    12: ORI Ann Intern Med. 2003 Jun 3;138(11):907-16
     PMID
     1: 12110170
     2: 12867604
     3: 12872752
     4: 8180186
     5:
     6: 12066289
     7: 12476908
     8: 8663248
     9: 12137706
    10: 12076500
    11: 12779314
    12: 12779301
    >
    > ## Join abbreviations to see what kind of comments.
    > all.abbrevs[comment.ex.dt, on=.(Abbreviation)]
     Field Abbreviation
     1: Comment on CON
     2: Comment in CIN
     3: Corrected and Republished in CRI
     4: Corrected and Republished from CRF
     5: Erratum in EIN
     6: Erratum for EFR
     7: Retraction in RIN
     8: Retraction of ROF
     9: Update in UIN
    10: Update of UOF
    11: Summary for patients in SPIN
    12: Original report in ORI
     citation PMID
     1: Dev Cell. 2002 Jul;3(1):85-97 12110170
     2: N Engl J Med. 2003 Jul 17;349(3):211-2 12867604
     3: Orthop Nurs. 2003 May-Jun;22(3):232-9 12872752
     4: Biochemistry. 1994 May 10;33(18):5614-22 8180186
     5: Acta Obstet Gynecol Scand. 2003 Jan;82(1):102
     6: J Arthroplasty. 2002 Jun;17(4):524-6 12066289
     7: J Biochem Mol Biol. 2002 Nov 30;35(6):642 12476908
     8: Ware FE, Lehrman MA. J Biol Chem. 1996 Jun 14;271(24):13935-8 8663248
     9: Cochrane Database Syst Rev. 2002;(3):CD003688 12137706
    10: Cochrane Database Syst Rev. 2002;(2):CD003680 12076500
    11: Ann Intern Med. 2003 Jun 3;138(11):I60 12779314
    12: Ann Intern Med. 2003 Jun 3;138(11):907-16 12779301
    >
    > ## parsing bibtex file.
    > refs.bib <- system.file(
    + "extdata", "namedCapture-refs.bib", package="nc")
    > refs.vec <- readLines(refs.bib)
    > at.lines <- grep("@", refs.vec, value=TRUE)
    > str(at.lines)
     chr [1:24] " @Manual{namedCapture," " @Manual{TRE," " @Manual{re2r," ...
    > refs.dt <- nc::capture_all_str(
    + refs.vec,
    + "@",
    + type="[^{]+",
    + "{",
    + ref="[^,]+",
    + ",\n",
    + fields="(?:.*\n)+?.*",
    + "}\\s*(?:$|\n)")
    > str(refs.dt)
    Classes ‘data.table’ and 'data.frame': 24 obs. of 3 variables:
     $ type : chr "Manual" "Manual" "Manual" "Manual" ...
     $ ref : chr "namedCapture" "TRE" "re2r" "rematch2" ...
     $ fields: chr " title = {namedCapture: Named Capture Regular Expressions},\n author = {Toby Dylan Hocking},\n year = "| __truncated__ " title = {TRE: The free and portable approximate regex matching library},\n author = {Ville Laurikari},\n"| __truncated__ " title = {re2r: RE2 Regular Expression},\n author = {Qin Wenfeng},\n year = {2017},\n note = {R pac"| __truncated__ " title = {rematch2: Tidy Output from Regular Expression Matching},\n author = {Gábor Csárdi},\n year ="| __truncated__ ...
     - attr(*, ".internal.selfref")=<externalptr>
    >
    > ## parsing each field of each entry.
    > eq.lines <- grep("=", refs.vec, value=TRUE)
    > str(eq.lines)
     chr [1:140] " title = {namedCapture: Named Capture Regular Expressions}," ...
    > strip <- function(x)sub("^\\s*\\{*", "", sub("\\}*,?$", "", x))
    > refs.fields <- refs.dt[, nc::capture_all_str(
    + fields,
    + "\\s+",
    + variable="\\S+",
    + "\\s+=",
    + value=".*", strip),
    + by=.(type, ref)]
    > str(refs.fields)
    Classes ‘data.table’ and 'data.frame': 140 obs. of 4 variables:
     $ type : chr "Manual" "Manual" "Manual" "Manual" ...
     $ ref : chr "namedCapture" "namedCapture" "namedCapture" "namedCapture" ...
     $ variable: chr "title" "author" "year" "note" ...
     $ value : chr "namedCapture: Named Capture Regular Expressions" "Toby Dylan Hocking" "2019" "R package version 2019.01.14" ...
     - attr(*, ".internal.selfref")=<externalptr>
    > with(refs.fields[ref=="HockingUseR2011"], structure(
    + as.list(value), names=variable))
    $author
    [1] "Toby Dylan Hocking"
    
    $title
    [1] "Fast, named capture regular expressions in R 2.14"
    
    $year
    [1] "2011"
    
    $url
    [1] "http://web.warwick.ac.uk/statsdept/user-2011/TalkSlides/Lightening/2-StatisticsAndProg\\_3-Hocking.pdf"
    
    $booktitle
    [1] "useR 2011 conference proceedings"
    
    > ## the URL of my talk is now
    > ## https://user2011.r-project.org/TalkSlides/Lightening/2-StatisticsAndProg_3-Hocking.pdf
    >
    > ## Parsing wikimedia tables: each begins with {| and ends with |}.
    > emoji.txt.gz <- system.file(
    + "extdata", "wikipedia-emoji-text.txt.gz", package="nc")
    > tables <- nc::capture_all_str(
    + emoji.txt.gz,
    + "\n[{][|]",
    + first=".*",
    + '\n[|][+] style="',
    + nc::field("font-size", ":", '.*?'),
    + '" [|] ',
    + title=".*",
    + lines="(?:\n.*)*?",
    + "\n[|][}]")
    Error in substring(subject, first, last) :
     invalid multibyte string at '<f0>
    Calls: <Anonymous> -> substring
    Execution halted
Flavor: r-patched-solaris-x86
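The failure happens only on the Solaris flavor. The error says that `substring` hit the byte `<f0>`, which is the lead byte of a 4-byte UTF-8 sequence, the kind used to encode emoji. One plausible cause is that the check machine runs in a non-UTF-8 locale, but the log does not confirm this. The sketch below is a hypothetical guard and is not code from the nc package. It runs the emoji-table example only when `l10n_info()` reports a UTF-8 session, and only when the data file is installed. If the locale turns out not to be the cause, this guard would not fix the error.

```r
## Hypothetical guard (an assumption, not taken from nc): the emoji data
## file contains 4-byte UTF-8 sequences (lead byte <f0>), so only run the
## example when the current session locale is UTF-8.
run.emoji.example <- isTRUE(l10n_info()[["UTF-8"]])
## system.file() returns "" if nc or the file is not installed.
emoji.txt.gz <- system.file(
  "extdata", "wikipedia-emoji-text.txt.gz", package="nc")
if(run.emoji.example && nzchar(emoji.txt.gz)){
  ## Same pattern as in the failing example above: each MediaWiki table
  ## begins with {| and ends with |}.
  tables <- nc::capture_all_str(
    emoji.txt.gz,
    "\n[{][|]",
    first=".*",
    '\n[|][+] style="',
    nc::field("font-size", ":", '.*?'),
    '" [|] ',
    title=".*",
    lines="(?:\n.*)*?",
    "\n[|][}]")
}
```

In a non-UTF-8 session the example is skipped instead of halting, so the rest of `R CMD check` can continue.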

Package PeakError

Current CRAN status: OK: 13

Package PeakSegDisk

Current CRAN status: OK: 13

Package PeakSegDP

Current CRAN status: OK: 13

Package PeakSegJoint

Current CRAN status: OK: 13

Package PeakSegOptimal

Current CRAN status: OK: 13

Package penaltyLearning

Current CRAN status: OK: 13

Package WeightedROC

Current CRAN status: OK: 13