An analysis of intelligent methods and algorithms for unlabeled data processing



Abstract

Intelligent methods and algorithms are well-suited to many data-processing problems in which unlabeled data are abundant. We survey previously used selection strategies for intelligent models and propose two novel algorithms that address their shortcomings, focusing on Active Learning (AL). Although AL has already been shown to markedly reduce annotation effort for many sequence labeling tasks compared to random selection, it remains unconcerned with the internal structure of the selected sequences (typically, sentences). We therefore propose a semi-supervised AL approach for sequence labeling.
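The selection strategies surveyed in the abstract can be illustrated by the classic least-confidence criterion: query the unlabeled items on which the current model's most probable prediction has the lowest probability. The sketch below is illustrative only; the function and model names are assumptions, not the authors' implementation.

```python
# Minimal sketch of uncertainty-based active learning selection
# (least-confidence strategy). All names are illustrative.

def least_confidence(probs):
    """Uncertainty of one prediction: 1 - probability of the top class."""
    return 1.0 - max(probs)

def select_batch(unlabeled, predict_proba, k):
    """Pick indices of the k unlabeled items the model is least sure about."""
    scored = [(least_confidence(predict_proba(x)), i)
              for i, x in enumerate(unlabeled)]
    scored.sort(reverse=True)           # most uncertain first
    return [i for _, i in scored[:k]]

# Toy stand-in for a trained classifier: it grows more confident
# on longer inputs (purely for demonstration).
def toy_model(x):
    p = min(0.5 + 0.1 * len(x), 0.99)
    return [p, 1.0 - p]

pool = ["ab", "abcdef", "a", "abcd"]
print(select_batch(pool, toy_model, 2))   # the two most uncertain items
```

In a real sequence-labeling setting, `predict_proba` would come from a sequence model (e.g. a CRF or HMM), and the per-token uncertainties would be aggregated over a whole sentence before ranking.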



Copyright (c) 2011 Engel E.A., Engel' E.A.

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
