Formats¶
Corpus Gesproken Nederlands¶
-
exception
pynlpl.formats.cgn.
InvalidFeatureException
¶
-
exception
pynlpl.formats.cgn.
InvalidTagException
¶
-
pynlpl.formats.cgn.
parse_cgn_postag
(rawtag, raisefeatureexceptions=False)¶
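To give a rough idea of what parsing a CGN part-of-speech tag involves, the sketch below splits a raw tag such as `N(soort,ev,basis,zijd,stan)` into its head and its list of features. The helper `split_cgn_tag` is hypothetical and uses only the standard library; it is not the implementation of `parse_cgn_postag` and does no validation against the CGN tag set.

```python
import re

# Hypothetical helper, NOT part of pynlpl: matches "HEAD(feat1,feat2,...)".
_TAG_PATTERN = re.compile(r"^([A-Za-z]+)\((.*)\)$")


def split_cgn_tag(rawtag):
    """Split a raw CGN-style tag into (head, [features])."""
    match = _TAG_PATTERN.match(rawtag.strip())
    if match is None:
        raise ValueError("not a HEAD(features) tag: %r" % rawtag)
    head, body = match.groups()
    # An empty feature list, as in "LET()", yields [].
    features = [f.strip() for f in body.split(",") if f.strip()]
    return head, features


head, features = split_cgn_tag("N(soort,ev,basis,zijd,stan)")
```

A real parser would additionally check each feature against the tag set, which is presumably where `InvalidTagException` and `InvalidFeatureException` come in.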
GIZA++¶
-
class
pynlpl.formats.giza.
GizaModel
(filename, encoding='utf-8')¶
-
class
pynlpl.formats.giza.
GizaSentenceAlignment
(sourceline, targetline, index)¶
-
getalignedtarget
(index)¶ Returns target range only if source index aligns to a single consecutive range of target tokens.
-
intersect
(other)¶
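The condition described for `getalignedtarget`, and the idea behind intersecting two alignments, can be sketched as follows. The data layout is an assumption made for illustration: an alignment is a list in which entry `i` holds the target indices aligned to source token `i`, and links are `(source, target)` pairs. The names `aligned_target_range` and `intersect_links` are hypothetical and are not pynlpl functions.

```python
def aligned_target_range(alignment, index):
    """Return (first, last) target indices if source token `index` aligns
    to one consecutive run of target tokens, otherwise None."""
    targets = sorted(set(alignment[index]))
    if not targets:
        return None  # unaligned source token
    if targets != list(range(targets[0], targets[-1] + 1)):
        return None  # the aligned target tokens have a gap
    return targets[0], targets[-1]


def intersect_links(source2target, target2source):
    """Keep only the links present in both directions.

    source2target holds (source, target) pairs; target2source holds
    (target, source) pairs and is flipped before intersecting.
    """
    flipped = {(s, t) for (t, s) in target2source}
    return set(source2target) & flipped


alignment = [[0], [1, 2], [4, 6]]
consecutive = aligned_target_range(alignment, 1)  # tokens 1 and 2 are adjacent
gapped = aligned_target_range(alignment, 2)       # tokens 4 and 6 are not
```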
-
-
class
pynlpl.formats.giza.
IntersectionAlignment
(source2target, target2source, encoding=False)¶
-
reset
()¶
-
-
class
pynlpl.formats.giza.
MultiWordAlignment
(filename, encoding=False)¶ Source-to-target alignment: reads source-target.A3.final files, in which each source word may be aligned to multiple target words (adapted from code by Sander Canisius).
-
reset
()¶
-
targetword
(index, targetwords, alignment)¶ Return the aligned target word for a specified index in the source words. Multiple words are concatenated with a space in between.
-
targetwords
(index, targetwords, alignment)¶ Return the aligned targetwords for a specified index in the source words
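For orientation, a GIZA++ `A3.final` file stores each sentence pair as three lines: a comment line, one sentence, and an alignment line in which every word of the other sentence is followed by the 1-based positions it aligns to, for example `NULL ({ }) het ({ 1 }) huis ({ 2 3 })`. The sketch below parses such an alignment line and joins multiple aligned words with a space, in the spirit of `targetword`. The functions `parse_a3_alignment_line` and `joined_target_words` are hypothetical stand-ins, not the pynlpl implementation.

```python
import re

# One entry looks like: word ({ 1 2 })
_ENTRY = re.compile(r"(\S+) \(\{([\d ]*)\}\)")


def parse_a3_alignment_line(line):
    """Return [(word, [zero-based positions]), ...] without the NULL entry."""
    entries = _ENTRY.findall(line)
    if entries and entries[0][0] == "NULL":
        entries = entries[1:]  # the leading NULL entry lists unaligned words
    return [(word, [int(p) - 1 for p in positions.split()])
            for word, positions in entries]


def joined_target_words(index, targetwords, alignment):
    """Join the words aligned to entry `index` with single spaces."""
    positions = alignment[index][1]
    return " ".join(targetwords[p] for p in positions)


targetwords = ["the", "big", "house"]
alignment = parse_a3_alignment_line("NULL ({ }) het ({ 1 }) huis ({ 2 3 })")
phrase = joined_target_words(1, targetwords, alignment)
```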
-
-
class
pynlpl.formats.giza.
WordAlignment
(filename, encoding=False)¶ Target to Source alignment: reads target-source.A3.final files, in which each source word is aligned to one target word
-
reset
()¶
-
targetword
(index, targetwords, alignment)¶ Return the aligned targetword for a specified index in the source words
-
-
pynlpl.formats.giza.
parseAlignment
(tokens)¶
Moses¶
-
class
pynlpl.formats.moses.
PhraseTable
(filename, quiet=False, reverse=False, delimiter='|||', score_column=3, max_sourcen=0, sourceencoder=None, targetencoder=None, scorefilter=None)¶
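The `delimiter` and `score_column` parameters reflect the layout of a Moses phrase table, where each line holds fields separated by `|||` (source phrase, target phrase, scores, and optionally more). The sketch below parses a single line in that layout. It is an illustration only: `parse_phrase_table_line` is a hypothetical function, and treating `score_column` as a 1-based column number is an assumption rather than documented behaviour of `PhraseTable`.

```python
def parse_phrase_table_line(line, delimiter="|||", score_column=3):
    """Split one phrase-table line into (source, target, scores)."""
    segments = [segment.strip() for segment in line.split(delimiter)]
    if len(segments) < max(score_column, 2):
        raise ValueError("too few fields: %r" % line)
    source, target = segments[0], segments[1]
    # Assumption: score_column counts columns starting from 1.
    scores = tuple(float(x) for x in segments[score_column - 1].split())
    return source, target, scores


entry = parse_phrase_table_line(
    "het huis ||| the house ||| 0.5 0.25 0.75 0.125 ||| 0-0 1-1 ||| 10 12 8"
)
```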
-
class
pynlpl.formats.moses.
PhraseTableClient
(host='localhost', port=65432)¶
SoNaR¶
-
class
pynlpl.formats.sonar.
Corpus
(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)¶
-
class
pynlpl.formats.sonar.
CorpusDocument
(filename, encoding='iso-8859-15')¶ This class represents one document/text of the Corpus (read-only)
-
paragraphs
(with_id=False)¶ Extracts paragraphs; returns a list of plain-text(!) paragraphs
-
sentences
()¶ Iterate over all sentences in the document as (sentence_id, sentence) pairs, where sentence is a list of 4-tuples (word, id, pos, lemma)
-
words
()¶
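The shape of the data yielded by `sentences()` can be illustrated without a corpus at hand. In the sketch below, `sample_sentences` is invented stand-in data that mimics `(sentence_id, sentence)` pairs with `(word, id, pos, lemma)` tuples; the identifiers and tags in it are made up, and `words_per_sentence` is a hypothetical helper.

```python
def words_per_sentence(sentences):
    """Yield (sentence_id, [words]) from (sentence_id, sentence) pairs,
    where each sentence is a list of (word, id, pos, lemma) tuples."""
    for sentence_id, sentence in sentences:
        yield sentence_id, [word for word, _id, _pos, _lemma in sentence]


# Invented stand-in for the output of a document's sentences() method.
sample_sentences = [
    ("doc.p.1.s.1", [
        ("Het", "doc.p.1.s.1.w.1", "LID(bep,stan,evon)", "het"),
        ("huis", "doc.p.1.s.1.w.2", "N(soort,ev,basis,onz,stan)", "huis"),
    ]),
]

result = list(words_per_sentence(sample_sentences))
```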
-
-
class
pynlpl.formats.sonar.
CorpusDocumentX
(filename, tree=None, index=True)¶ This class represents one document/text of the Corpus, loaded into memory at once and retaining the full structure
-
paragraphs
(node=None)¶ Iterate over paragraphs
-
save
(filename=None, encoding='iso-8859-15')¶
-
sentences
(node=None)¶ Iterate over sentences
-
validate
(formats_dir='../formats/')¶ Checks whether the document is valid
-
words
(node=None)¶ Iterate over words
-
xpath
(expression)¶ Executes an xpath expression using the correct namespaces
-
-
class
pynlpl.formats.sonar.
CorpusFiles
(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)¶
-
class
pynlpl.formats.sonar.
CorpusX
(corpusdir, extension='pos', restrict_to_collection='', conditionf=<function Corpus.<lambda>>, ignoreerrors=False)¶
-
pynlpl.formats.sonar.
ns
(namespace)¶ Resolves the namespace identifier to a full URL
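Resolving a namespace identifier to a URL, and using such a mapping in a namespace-aware query, can be sketched with the standard library alone. Everything named here is an assumption for illustration: the prefix `ex`, the URL `http://example.org/ns/1.0`, the sample document, and the helpers `resolve_namespace` and `find_words`. None of these are the identifiers or URLs pynlpl uses.

```python
import xml.etree.ElementTree as ET

# Placeholder mapping; the real identifiers and URLs are not shown here.
NAMESPACES = {"ex": "http://example.org/ns/1.0"}


def resolve_namespace(identifier):
    """Map a short namespace identifier to its full URL."""
    return NAMESPACES[identifier]


def find_words(xml_string):
    """Return the text of every <w> element in the placeholder namespace."""
    root = ET.fromstring(xml_string)
    return [w.text for w in root.findall(".//ex:w", NAMESPACES)]


SAMPLE = (
    '<text xmlns="http://example.org/ns/1.0">'
    "<p><s><w>Hallo</w><w>wereld</w></s></p>"
    "</text>"
)
words = find_words(SAMPLE)
```

Passing the prefix mapping to the query is what lets a short expression such as `.//ex:w` match elements in the document's default namespace.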