The Kiwi class provides methods for Korean morphological analysis.

Methods


Method print()

Print method for Kiwi objects.

Usage

Kiwi$print(x, ...)

Arguments

x

self

...

ignored


Method new()

Create a Kiwi instance.

Usage

Kiwi$new(
  num_workers = 0,
  model_size = "base",
  integrate_allomorph = TRUE,
  load_default_dict = TRUE
)

Arguments

num_workers

int(optional): number of cores to use for multithreading. Default is 0, which means use all cores.

model_size

char(optional): Kiwi model to use. Default is "base"; "small" and "large" are also available.

integrate_allomorph

bool(optional): whether to integrate allomorphs (for example, endings such as 아/어) into a single form. Default is TRUE.

load_default_dict

bool(optional): whether to load the default dictionary. Default is TRUE.


Method add_user_word()

Add a user word with its POS tag and score.

Usage

Kiwi$add_user_word(word, tag, score, orig_word = "")

Arguments

word

char(required): target word to add.

tag

Tags(required): POS tag of the word.

score

num(required): score of the word.

orig_word

char(optional): original word. Default is "" (none).
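
As a rough sketch of how this call might look in practice, the snippet below registers two proper nouns. It assumes the package that provides Kiwi is attached and that its Tags object exposes an nnp (proper noun) entry. The example words and the score of 0 are illustrative, not recommended values.

```r
# Illustrative words to register as proper nouns.
new_words <- c("박찬", "엘버드")

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  for (w in new_words) {
    # Tags$nnp is an assumption about how the Tags object is laid out.
    kw$add_user_word(w, Tags$nnp, score = 0)
  }
  # Tokenize a sentence containing one of the new words to check the effect.
  kw$tokenize("박찬은 오늘 학교에 갔다.")
}
```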


Method add_pre_analyzed_words()

Add a pre-analyzed result for a word form.

Usage

Kiwi$add_pre_analyzed_words(form, analyzed, score)

Arguments

form

char(required): target word form to which the analyzed result is attached.

analyzed

data.frame(required): the expected analysis result.

score

num(required): score of the pre-analyzed result.


Method add_rules()

Add a regular-expression replacement rule for words with the given tag.

Usage

Kiwi$add_rules(tag, pattern, replacement, score)

Arguments

tag

Tags(required): target tag the rule applies to.

pattern

char(required): regular expression pattern to match.

replacement

char(required): replacement text.

score

num(required): score of the rule.


Method load_user_dictionarys()

Add a user dictionary from a text file.

Usage

Kiwi$load_user_dictionarys(user_dict_path)

Arguments

user_dict_path

char(required): path to the user dictionary file.
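
The sketch below writes a small dictionary to a temporary file and then loads it. The file layout (one entry per line, with word, tag, and score separated by tabs) is an assumption based on the usual Kiwi user dictionary format and is not stated on this page. The entries themselves are illustrative.

```r
# Assumed layout: "word<TAB>tag<TAB>score", one entry per line.
dict_lines <- c("박찬\tNNP\t0", "엘버드\tNNP\t0")

# Write the entries as UTF-8 to a temporary text file.
dict_path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(dict_lines), dict_path, useBytes = TRUE)

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  kw$load_user_dictionarys(dict_path)
}
```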


Method extract_words()

Extract noun candidates from texts.

Usage

Kiwi$extract_words(
  input,
  min_cnt,
  max_word_len,
  min_score,
  pos_threshold,
  apply = FALSE
)

Arguments

input

char(required): target text data.

min_cnt

int(required): minimum number of times a word must occur in the text.

max_word_len

int(required): maximum word length.

min_score

num(required): minimum score.

pos_threshold

num(required): POS threshold.

apply

bool(optional): whether to add the extracted words to the user dictionary. Default is FALSE.
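
A minimal sketch of a call with every required argument supplied. The threshold values are illustrative starting points rather than documented defaults, and the input texts are made up for the example.

```r
# Small illustrative corpus; real use would pass many more documents.
texts <- c(
  "엘버드는 한국어 형태소 분석기입니다.",
  "엘버드를 사용하면 명사 후보를 추출할 수 있습니다.",
  "엘버드의 사용법은 간단합니다."
)

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  candidates <- kw$extract_words(
    input = texts,
    min_cnt = 2,          # keep words seen at least twice
    max_word_len = 10,    # ignore candidates longer than 10 characters
    min_score = 0.25,     # illustrative value
    pos_threshold = -3,   # illustrative value
    apply = FALSE         # only inspect; do not add to the user dictionary
  )
  candidates
}
```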


Method analyze()

Analyze text into tokens and their tags.

Usage

Kiwi$analyze(text, top_n = 3, match_option = Match$ALL, stopwords = FALSE)

Arguments

text

char(required): target text.

top_n

int(optional): number of results to return. Default is 3.

match_option

Match(optional): matching option, given as a Match value. Default is Match$ALL.

stopwords

Stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves the same as FALSE.

Returns

A list of results.
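
A short sketch that requests two candidate analyses instead of the default three and inspects the returned list. The input sentence is illustrative, and the inner structure of each result is not described on this page, so the example only uses str() to look at it.

```r
# Illustrative input sentence.
text <- "안녕하세요. 만나서 반갑습니다."

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  res <- kw$analyze(text, top_n = 2)
  # Inspect the top levels of the returned list.
  str(res, max.level = 2)
}
```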


Method tokenize()

Analyze text into tokens and POS tags, returning only the top result.

Usage

Kiwi$tokenize(
  text,
  match_option = Match$ALL,
  stopwords = FALSE,
  form = "tibble"
)

Arguments

text

char(required): target text.

match_option

Match(optional): matching option, given as a Match value. Default is Match$ALL.

stopwords

Stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves the same as FALSE.

form

char(optional): form of the return value. Default is "tibble"; "list" and "tidytext" are also available.
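
To compare the three return forms side by side, a sketch like the following may help. It runs the same sentence through each value of form and keeps the results in a named list. The sentence is illustrative.

```r
# The three values of `form` listed above.
forms <- c("tibble", "list", "tidytext")
text <- "오늘 날씨가 정말 좋네요."

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  results <- lapply(forms, function(f) kw$tokenize(text, form = f))
  names(results) <- forms
  # Look at the default tibble form first.
  results[["tibble"]]
}
```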


Method split_into_sents()

Some text is not already split into sentences. split_into_sents splits such text into individual sentences.

Usage

Kiwi$split_into_sents(text, match_option = Match$ALL, return_tokens = FALSE)

Arguments

text

char(required): target text.

match_option

Match(optional): matching option, given as a Match value. Default is Match$ALL.

return_tokens

bool(optional): whether to include the tokenized result. Default is FALSE.
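
A minimal sketch of splitting a run of text that has no sentence-final punctuation, once without and once with the tokenized result. The input is illustrative, and the shape of the return value is not described on this page.

```r
# Illustrative text with no punctuation between sentences.
text <- "안녕하세요 여러분 오늘 날씨가 좋네요 같이 산책 가실래요"

if (FALSE) {
  # Not run: needs the package that provides Kiwi and its model files.
  kw <- Kiwi$new()
  sents <- kw$split_into_sents(text)
  sents_with_tokens <- kw$split_into_sents(text, return_tokens = TRUE)
  str(sents_with_tokens, max.level = 2)
}
```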


Method get_tidytext_func()

Return a tokenizer function for use with tidytext's unnest_tokens.

Usage

Kiwi$get_tidytext_func(match_option = Match$ALL, stopwords = FALSE)

Arguments

match_option

Match(optional): matching option, given as a Match value. Default is Match$ALL.

stopwords

Stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a char, treat it as the path to a dictionary txt file and use that file. If a Stopwords object, use it. Any other value behaves the same as FALSE.

Returns

A function.

Examples

if (FALSE) {
  kw <- Kiwi$new()
  tidytoken <- kw$get_tidytext_func()
  tidytoken("test")
}


Method clone()

The objects of this class are cloneable with this method.

Usage

Kiwi$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples

if (FALSE) {
  kw <- Kiwi$new()
  kw$analyze("test")
  kw$tokenize("test")
}

## ------------------------------------------------
## Method `Kiwi$get_tidytext_func`
## ------------------------------------------------

if (FALSE) {
  kw <- Kiwi$new()
  tidytoken <- kw$get_tidytext_func()
  tidytoken("test")
}