Kiwi Class

The Kiwi class provides methods for Korean morphological analysis.

Methods

Public methods

Kiwi$print()
Kiwi$new()
Kiwi$add_user_word()
Kiwi$add_pre_analyzed_words()
Kiwi$add_rules()
Kiwi$load_user_dictionarys()
Kiwi$extract_words()
Kiwi$analyze()
Kiwi$tokenize()
Kiwi$split_into_sents()
Kiwi$get_tidytext_func()
Kiwi$clone()

Method `print()`

Print method for Kiwi objects.

Usage

Kiwi$print(x, ...)

Arguments

x: self
...: ignored

Method `new()`

Create a Kiwi instance.

Usage

Kiwi$new(
  num_workers = 0,
  model_size = "base",
  integrate_allomorph = TRUE,
  load_default_dict = TRUE
)

Arguments

num_workers: int(optional): number of worker threads to use. Default is 0, which uses all available cores.
model_size: char(optional): Kiwi model to load. Default is "base"; "small" and "large" are also available.
integrate_allomorph: bool(optional): whether to integrate allomorphs. Default is TRUE.
load_default_dict: bool(optional): whether to load the default dictionary. Default is TRUE.
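
As a rough illustration of the constructor arguments, the sketch below creates an instance with an explicit worker count and the "small" model. It assumes that the Kiwi class is exported by the elbird package and that the model files are available on the machine; leaving one core free is only an illustrative choice, not a recommendation from this page.

```r
# Pick a worker count explicitly instead of 0 (= use all cores).
# parallel ships with base R; detectCores() can return NA on some platforms.
n_cores <- parallel::detectCores()
workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new(num_workers = workers, model_size = "small")
  print(kw)
}
```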

Method `add_user_word()`

Add a user word with its part-of-speech (POS) tag and score.

Usage

Kiwi$add_user_word(word, tag, score, orig_word = "")

Arguments

word: char(required): the word to add.
tag: Tags(required): POS tag for the word.
score: num(required): score for the word.
orig_word: char(optional): original word.
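
A minimal sketch of registering several user words from a small table. It assumes the elbird package exports both Kiwi and a Tags list whose nnp entry denotes proper nouns; the words and the score of 0 are made-up illustration values.

```r
# Illustrative user words; the scores are placeholders, not tuned values.
user_words <- data.frame(
  word = c("박찬울", "엘버드"),
  score = c(0, 0),
  stringsAsFactors = FALSE
)

# Assumption: elbird exports Kiwi and Tags, and Tags$nnp is the proper-noun tag.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  for (i in seq_len(nrow(user_words))) {
    kw$add_user_word(user_words$word[i], elbird::Tags$nnp, user_words$score[i])
  }
  print(kw$tokenize("박찬울은 엘버드를 만들었다."))
}
```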

Method `add_pre_analyzed_words()`

TODO

Usage

Kiwi$add_pre_analyzed_words(form, analyzed, score)

Arguments

form: char(required): the word form to which the analyzed result applies.
analyzed: data.frame(required): the expected analysis result.
score: num(required): score for the pre-analyzed result.

Method `add_rules()`

TODO

Usage

Kiwi$add_rules(tag, pattern, replacement, score)

Arguments

tag: Tags(required): the tag to which the rule applies.
pattern: char(required): regular expression to match.
replacement: char(required): replacement text.
score: num(required): score for the rule.

Method `load_user_dictionarys()`

Load a user dictionary from a text file.

Usage

Kiwi$load_user_dictionarys(user_dict_path)

Arguments

user_dict_path: char(required): path to the user dictionary file.

Method `extract_words()`

Extract noun candidates from texts.

Usage

Kiwi$extract_words(
  input,
  min_cnt,
  max_word_len,
  min_score,
  pos_threshold,
  apply = FALSE
)

Arguments

input: char(required): the text data to extract from.
min_cnt: int(required): minimum number of times a word must occur in the text.
max_word_len: int(required): maximum word length.
min_score: num(required): minimum score.
pos_threshold: num(required): POS threshold.
apply: bool(optional): whether to add the extracted words to the user word dictionary. Default is FALSE.
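
The call below is a sketch of how the arguments fit together, assuming the elbird package provides Kiwi. The threshold values are illustrative guesses, not defaults documented on this page, and a corpus this small may well yield no candidates.

```r
# Toy corpus: one sentence repeated so that words pass a small min_cnt.
texts <- rep("엘버드는 형태소 분석기 키위를 R에서 쓰게 해 준다.", 5)

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  candidates <- kw$extract_words(
    texts,
    min_cnt = 2,          # illustrative value
    max_word_len = 10,    # illustrative value
    min_score = 0.25,     # illustrative value
    pos_threshold = -3,   # illustrative value
    apply = FALSE         # do not add the candidates to the user dictionary
  )
  print(candidates)
}
```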

Method `analyze()`

Analyze text and return tokens with their tags.

Usage

Kiwi$analyze(text, top_n = 3, match_option = Match$ALL, stopwords = FALSE)

Arguments

text: char(required): the text to analyze.
top_n: int(optional): number of results to return. Default is 3.
match_option: Match(optional): matching option, given as a Match value. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.

Returns

A list of results.
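
A short sketch of requesting fewer candidate analyses than the default. It assumes the elbird package provides Kiwi; the structure of the returned list is not described on this page, so the example only inspects it with str().

```r
text <- "안녕하세요 엘버드입니다."

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  res <- kw$analyze(text, top_n = 2)  # ask for the two best analyses
  str(res, max.level = 2)             # inspect the shape of the result list
}
```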

Method `tokenize()`

Analyze text and return the tokens and POS tags of the top-1 result only.

Usage

Kiwi$tokenize(
  text,
  match_option = Match$ALL,
  stopwords = FALSE,
  form = "tibble"
)

Arguments

text: char(required): the text to tokenize.
match_option: Match(optional): matching option, given as a Match value. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.
form: char(optional): format of the return value. Default is "tibble"; "list" and "tidytext" are also available.
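
The sketch below shows the form argument in use. This page does not say whether text may be a vector, so the hypothetical helper tokenize_each() (not part of the package) calls tokenize() once per text. It assumes the elbird package provides Kiwi.

```r
# Hypothetical helper, not part of the package: tokenize texts one at a time.
tokenize_each <- function(kw, texts, form = "list") {
  lapply(texts, function(txt) kw$tokenize(txt, form = form))
}

texts <- c("안녕하세요.", "형태소 분석을 해 봅니다.")

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  print(kw$tokenize(texts[1]))                  # default form = "tibble"
  str(tokenize_each(kw, texts), max.level = 1)  # one "list"-form result per text
}
```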

Method `split_into_sents()`

Some text is not already separated into sentences. split_into_sents splits such text into individual sentences.

Usage

Kiwi$split_into_sents(text, match_option = Match$ALL, return_tokens = FALSE)

Arguments

text: char(required): the text to split.
match_option: Match(optional): matching option, given as a Match value. Default is Match$ALL.
return_tokens: bool(optional): whether to include the tokenized result. Default is FALSE.
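
A minimal sketch of splitting unpunctuated text, with and without the tokenized result. It assumes the elbird package provides Kiwi; the return structure is not documented on this page, so the example only prints and inspects it.

```r
# Text without sentence-final punctuation.
text <- "안녕하세요 엘버드입니다 반갑습니다"

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  print(kw$split_into_sents(text))
  with_tokens <- kw$split_into_sents(text, return_tokens = TRUE)
  str(with_tokens, max.level = 2)
}
```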

Method `get_tidytext_func()`

Return a tokenizer function that can be passed to tidytext's unnest_tokens.

Usage

Kiwi$get_tidytext_func(match_option = Match$ALL, stopwords = FALSE)

Arguments

match_option: Match(optional): matching option, given as a Match value. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves like FALSE.

Returns

function

Examples

if (FALSE) {
   kw <- Kiwi$new()
   tidytoken <- kw$get_tidytext_func()
   tidytoken("test")
}
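
To show where the returned function plugs in, here is a sketch that passes it as the token argument of tidytext::unnest_tokens, which accepts a custom tokenizer function. It assumes the elbird package provides Kiwi and that tidytext is installed; the docs data frame is made up for illustration.

```r
# Made-up documents to tokenize.
docs <- data.frame(
  id = 1:2,
  text = c("안녕하세요.", "형태소 분석을 해 봅니다."),
  stringsAsFactors = FALSE
)

# Assumption: the Kiwi class comes from the elbird package.
if (requireNamespace("elbird", quietly = TRUE) &&
    requireNamespace("tidytext", quietly = TRUE)) {
  kw <- elbird::Kiwi$new()
  tidytoken <- kw$get_tidytext_func()
  # output column "word", input column "text", custom tokenizer function
  tokens <- tidytext::unnest_tokens(docs, word, text, token = tidytoken)
  print(tokens)
}
```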

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Kiwi$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

if (FALSE) {
  kw <- Kiwi$new()
  kw$analyze("test")
  kw$tokenize("test")
}

## ------------------------------------------------
## Method `Kiwi$get_tidytext_func`
## ------------------------------------------------

if (FALSE) {
   kw <- Kiwi$new()
   tidytoken <- kw$get_tidytext_func()
   tidytoken("test")
}
