Commit 08082884 authored by Duc Cao

Source code of the modified Wapiti library: allow running a Wapiti model with a folder as input

parent 1e9d0be3
Wapiti - A linear-chain CRF tool
Copyright (c) 2009-2013 CNRS
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
18/12/2013
Release v1.5.0: Update mode and bug fixes
Add precision specifier for model dumping
Add update mode to modify a model
Lots of English corrections in the manual
Fix bug in model format compatibility
Fix memory allocation with large models
Fix small memory bug in quark database
Fix bug with bigram features in raw mode
23/04/2012
Release v1.4.0: Forced decoding, optimizer state, and bug fixes
Add forced decoding to partially decode sequences
Add optimizer state saving for L-BFGS and R-PROP
Switched to elapsed time instead of wall time in progress
Fix bug in Makefile (thanks to Lars Buitinck)
Fix local normalization decoding in MEMM (thanks to Anoop Deoras)
Fix bad handling of objwin option
Fix bug in reader for single letter obs and lbl
03/11/2011
Release v1.3.0: MEMM, faster gradient and bug fixes
Added support for Maximum Entropy Markov Models.
Added use of atomic operations in gradient computation
Improved RProp numerical stability
Fix bug with unseen features in raw mode (thanks to Jurgen Van Gael)
Fix bug discarding some features in maxent mode (thanks to George Foster)
Switched code to stdint, should resolve some issues with size_t
on exotic systems.
07/07/2011
Release v1.2.0: RProp improvements and bug fix
Switch from splay-trees to critbit-tries
Add RProp+ and RProp- variants of the RProp algorithm
Add a new projection scheme for RProp with l1
Make maxent work with sequences
Add space matching in regexp
Fix a few small bugs
19/03/2011
Release v1.1.3: Some small improvements
New option --jobsize for fine-grained multi-threading
Improved SGD index construction: a lot faster
Fix a small bug in sparse multi-threaded gradient
12/11/2010
Release v1.1.2: Bug fix release
Fix a small bug in L-BFGS/OWL-QN, should slightly improve
convergence speed in some cases.
Fix a bug in multi-thread job system thanks to Alexander Fraser,
should fix error rates and training speed on large datasets.
Fix two small memory leaks.
Some improvements in quark database handling.
24/09/2010
Release v1.1.1: Mainly multi-threading improvements
RPROP algorithm is now fully multi-threaded.
Error rate estimation during training is now multi-threaded.
Better jobs scheduling in multi-threaded gradient.
Multi-threading code can be disabled (compilation on Windows should
be simpler).
Fixed bug in L1 optimization with RPROP (should improve stability).
08/09/2010
Release v1.1.0: A few new features
Added maxent mode.
Added decoding through posteriors, this should improve accuracy
at the price of computational time.
Added the RPROP optimization algorithm.
Added absolute indexing in patterns.
Changed the scored output format, as posterior decoding
provides a normalized score at each position. The output is now
compatible with CRF++.
Some code cleanup.
29/07/2010
Release v1.0.2: Mainly a bug fix version.
Fixed some memory leaks, thanks to David Keeler
Fixed argument processing to be more user friendly
Fixed small bug in model compaction
Added reading of raw files
Spelling corrections in man page
18/06/2010
Release v1.0.0: Initial public version.
Wapiti installation
If you have a recent compiler, you can normally just run the classic:
make
make install
Switch to superuser for the second command. If you want to install somewhere
other than /usr/local, you will have to edit the variable definitions at the
head of the Makefile.
You can disable the non-C99-compliant features by modifying wapiti.h in the
src/ directory. This should allow you to compile Wapiti on almost any platform
that has a C99 compiler.
CFLAGS =-std=c99 -W -Wall -Wextra -O3
LIBS =-lm -lpthread
DESTDIR=
PREFIX =/usr/local
INSTALL= install -p
INSTALL_EXEC= $(INSTALL) -m 0755
INSTALL_DATA= $(INSTALL) -m 0644
SRC=src/*.c
HDR=src/*.h
wapiti: $(SRC) $(HDR)
	@echo "CC: wapiti.c --> wapiti"
	@$(CC) -DNDEBUG $(CFLAGS) -o wapiti $(SRC) $(LIBS)

debug: $(SRC) $(HDR)
	@echo "CC: wapiti.c --> wapiti"
	@$(CC) -g $(CFLAGS) -o wapiti $(SRC) $(LIBS)

install: wapiti
	@echo "CP: wapiti --> $(DESTDIR)$(PREFIX)/bin"
	@mkdir -p $(DESTDIR)$(PREFIX)/bin
	@mkdir -p $(DESTDIR)$(PREFIX)/share/man/man1
	@$(INSTALL_EXEC) wapiti $(DESTDIR)$(PREFIX)/bin
	@$(INSTALL_DATA) doc/wapiti.1 $(DESTDIR)$(PREFIX)/share/man/man1

clean:
	@echo "RM: wapiti"
	@rm -f wapiti

.PHONY: clean install
# Wapiti - A linear-chain CRF tool
Copyright (c) 2009-2013 CNRS
All rights reserved.
For more detailed information see the [homepage](http://wapiti.limsi.fr).
Wapiti is a very fast toolkit for segmenting and labeling sequences with
discriminative models. It is based on maxent models, maximum entropy Markov
models and linear-chain CRF and proposes various optimization and regularization
methods to improve both the computational complexity and the prediction
performance of standard models. Wapiti has been ranked first on the sequence
tagging task for more than a year on the MLcomp web site.
Wapiti is developed by LIMSI-CNRS and was partially funded by ANR projects
CroTaL (ANR-07-MDCO-003) and MGA (ANR-07-BLAN-0311-02).
For suggestions, comments, or patches, you can contact me at lavergne@limsi.fr.
If you use Wapiti for research purposes, please use the following citation:
@inproceedings{lavergne2010practical,
author = {Lavergne, Thomas and Capp\'{e}, Olivier and Yvon,
Fran\c{c}ois},
title = {Practical Very Large Scale {CRFs}},
booktitle = {Proceedings of the 48th Annual Meeting of the Association
for Computational Linguistics ({ACL})},
month = {July},
year = {2010},
location = {Uppsala, Sweden},
publisher = {Association for Computational Linguistics},
pages = {504--513},
url = {http://www.aclweb.org/anthology/P10-1052}
}
*
U:Wrd-1 X=%x[ 0,0]
U:wrd-1LL=%X[-2,0]
U:wrd-1 L=%X[-1,0]
U:wrd-1 X=%X[ 0,0]
U:wrd-1 R=%X[ 1,0]
U:wrd-1RR=%X[ 2,0]
U:wrd-2 L=%X[-1,0]/%X[ 0,0]
U:wrd-2 R=%X[ 0,0]/%X[ 1,0]
*:Pos-1LL=%x[-2,1]
*:Pos-1 L=%x[-1,1]
*:Pos-1 X=%x[ 0,1]
*:Pos-1 R=%x[ 1,1]
*:Pos-1RR=%x[ 2,1]
U:Pos-2 L=%X[-1,1]/%X[ 0,1]
U:Pos-2 R=%X[ 0,1]/%X[ 1,1]
*:Pre-1 X=%m[ 0,0,"^.?"]
*:Pre-2 X=%m[ 0,0,"^.?.?"]
*:Pre-3 X=%m[ 0,0,"^.?.?.?"]
*:Pre-4 X=%m[ 0,0,"^.?.?.?.?"]
*:Suf-1 X=%m[ 0,0,".?$"]
*:Suf-2 X=%m[ 0,0,".?.?$"]
*:Suf-3 X=%m[ 0,0,".?.?.?$"]
*:Suf-4 X=%m[ 0,0,".?.?.?.?$"]
*:Caps? L=%t[-1,0,"\u"]
*:Caps? X=%t[ 0,0,"\u"]
*:Caps? R=%t[ 1,0,"\u"]
*:AllC? X=%t[ 0,0,"^\u*$"]
*:BegC? X=%t[ 0,0,"^\u"]
*:Punc? L=%t[-1,0,"\p"]
*:Punc? X=%t[ 0,0,"\p"]
*:Punc? R=%t[ 1,0,"\p"]
*:AllP? X=%t[ 0,0,"^\p*$"]
*:InsP? X=%t[ 0,0,".\p."]
*:Numb? L=%t[-1,0,"\d"]
*:Numb? X=%t[ 0,0,"\d"]
*:Numb? R=%t[ 1,0,"\d"]
*:AllN? X=%t[ 0,0,"^\d*$"]
*
U:Wrd-1 X=%x[ 0,0]
U:wrd-1LL=%X[-2,0]
U:wrd-1 L=%X[-1,0]
U:wrd-1 X=%X[ 0,0]
U:wrd-1 R=%X[ 1,0]
U:wrd-1RR=%X[ 2,0]
U:wrd-2 L=%X[-1,0]/%X[ 0,0]
U:wrd-2 R=%X[ 0,0]/%X[ 1,0]
*:Pos-1LL=%x[-2,1]
*:Pos-1 L=%x[-1,1]
*:Pos-1 X=%x[ 0,1]
*:Pos-1 R=%x[ 1,1]
*:Pos-1RR=%x[ 2,1]
U:Pos-2 L=%X[-1,1]/%X[ 0,1]
U:Pos-2 R=%X[ 0,1]/%X[ 1,1]
*:Pre-1 X=%m[ 0,0,"^.?"]
*:Pre-2 X=%m[ 0,0,"^.?.?"]
*:Pre-3 X=%m[ 0,0,"^.?.?.?"]
*:Pre-4 X=%m[ 0,0,"^.?.?.?.?"]
*:Suf-1 X=%m[ 0,0,".?$"]
*:Suf-2 X=%m[ 0,0,".?.?$"]
*:Suf-3 X=%m[ 0,0,".?.?.?$"]
*:Suf-4 X=%m[ 0,0,".?.?.?.?$"]
*:Caps? L=%t[-1,0,"\u"]
*:Caps? X=%t[ 0,0,"\u"]
*:Caps? R=%t[ 1,0,"\u"]
*:AllC? X=%t[ 0,0,"^\u*$"]
*:BegC? X=%t[ 0,0,"^\u"]
*:Punc? L=%t[-1,0,"\p"]
*:Punc? X=%t[ 0,0,"\p"]
*:Punc? R=%t[ 1,0,"\p"]
*:AllP? X=%t[ 0,0,"^\p*$"]
*:InsP? X=%t[ 0,0,".\p."]
*:Numb? L=%t[-1,0,"\d"]
*:Numb? X=%t[ 0,0,"\d"]
*:Numb? R=%t[ 1,0,"\d"]
*:AllN? X=%t[ 0,0,"^\d*$"]
# Unigram
*1:%x[-2,0]
*2:%x[-1,0]
*3:%x[ 0,0]
*4:%x[ 1,0]
*5:%x[ 2,0]
# Bigram
*6:%x[-1,0]/%x[0,0]
*7:%x[ 1,0]/%x[0,0]
# Trigram
*8:%x[-1,0]/%x[0,0]/%x[1,0]
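To make the unigram, bigram and trigram lines above more concrete, here is a minimal Python sketch of how a `%x[offset,column]` marker can be read: it is replaced by the value found in the given column of the row at the given offset from the current position. This is a simplified illustration, not Wapiti's implementation; the `expand` function and the `_OOB_` marker used for positions outside the sequence are hypothetical names chosen for this example.

```python
import re

# Matches %x[offset,column], allowing spaces as in "%x[ 0,0]".
MACRO = re.compile(r"%x\[\s*(-?\d+)\s*,\s*(\d+)\s*\]")


def expand(pattern, rows, pos):
    """Expand every %x[offset,column] marker of pattern at position pos.

    rows is a list of token rows, each row being a list of columns.
    Positions outside the sequence yield "_OOB_", a placeholder made up
    for this sketch.
    """
    def substitute(match):
        offset, column = int(match.group(1)), int(match.group(2))
        index = pos + offset
        if 0 <= index < len(rows):
            return rows[index][column]
        return "_OOB_"

    return MACRO.sub(substitute, pattern)


# One column per row: the word itself.
rows = [["the"], ["cat"], ["sleeps"]]

unigram = expand("*3:%x[ 0,0]", rows, 1)
bigram = expand("*6:%x[-1,0]/%x[0,0]", rows, 1)
trigram = expand("*8:%x[-1,0]/%x[0,0]/%x[1,0]", rows, 1)
at_start = expand("*1:%x[-2,0]", rows, 0)

print(unigram, bigram, trigram, at_start)
```

The numeric prefixes (`*1`, `*2`, ...) keep the expanded strings distinct, so the same word seen at two different offsets does not collapse into a single feature.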