Strategy for Grammar Construction
Continuity
Compound verb forms can be interrupted by other elements, such as adverbials and nominals. In the initial phase of grammar development, however, compound verb forms are treated as predominantly continuous rather than discontinuous entities. This perspective reflects the paradigmatic thinking that characterizes the early modeling stage of the verb complex. Unsupervised use of regular expressions to predict where “external” elements may occur within compound verb forms is not very effective. Handling the discontinuity of these forms requires a carefully designed set of rules, based on a thorough examination of syntagmatic patterns.
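The contrast between a continuous pattern and a permissive one can be illustrated with a minimal sketch. It is not the project's grammar: the tag names (AUX, PART, ADV, N) are hypothetical coarse labels rather than the BulTreeBank tagset, and Python's `re` module stands in for a regular grammar engine. The sketch only shows why a pattern that blindly allows intervening tags tends to over-generate.

```python
import re

# Hypothetical coarse tags, one per token (NOT the BulTreeBank tagset):
# AUX = auxiliary verb, PART = participle, ADV = adverb, N = noun.

# Continuous view: one or more auxiliaries immediately followed by a participle.
CONTINUOUS = re.compile(r"\bAUX(?: AUX)* PART\b")

# Permissive view (assumption for illustration): allow up to two arbitrary
# tags between the auxiliary and the participle, with no further constraints.
PERMISSIVE = re.compile(r"\bAUX(?: \w+){0,2}? PART\b")


def find_forms(pattern, tags):
    """Return (start, end) token-index spans matched over a tag sequence."""
    text = " ".join(tags)
    spans = []
    for m in pattern.finditer(text):
        start = text[:m.start()].count(" ")        # token index of first tag
        end = start + m.group().count(" ") + 1     # exclusive end index
        spans.append((start, end))
    return spans


adjacent = ["N", "AUX", "PART", "N"]
interrupted = ["AUX", "ADV", "PART"]
doubtful = ["AUX", "N", "N", "PART"]

print(find_forms(CONTINUOUS, adjacent))
print(find_forms(CONTINUOUS, interrupted))
print(find_forms(PERMISSIVE, interrupted))
print(find_forms(PERMISSIVE, doubtful))
```

The continuous pattern misses the interrupted sequence, while the permissive pattern accepts both the adverb case and the two-noun case without being able to tell whether the intervening material really sits inside a verb form. Deciding that requires rules derived from observed syntagmatic patterns.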
Longest Versus Shortest Match Principle
The ClaRK system’s cascaded regular grammar engine allows the identification of segm
Parsing Bulgarian Verb Forms
Building a grammar starts with an abstraction from grammar books, which are typically written for human readers and not tailored for direct use in software. Grammar books and paper dictionaries lack the comprehensive data structures that real-life software applications need. Despite this limitation, a grammar is constructed by deductive inference from grammar books and the writer’s language competence. The outcome serves as a first attempt to address the parsing challenges posed by Bulgarian verb forms. The principles guiding the initial phase of grammar construction include:
Exhaustiveness
The grammar writer considers all constructs representing tense, mood, and voice, encompassing simplex forms as well as the various combinations of finite and non-finite auxiliary and main verb forms. Positive form
Sources of Linguistic Knowledge
Sources of Linguistic Knowledge and Grammar Writing Facilities
To support the construction of a grammar for parsing compound verb forms, the BulTreeBank project team provides a special-purpose corpus of one million word tokens, sourced from newspapers and organized in XML documents with TEI-conformant markup at the paragraph level. These texts are processed by a morphological analyzer and manually disambiguated using the constraint system in ClaRK (Simov et al. 2002a). The electronic lexicon (Popov et al. 1998) used for morphosyntactic analysis contains entries for single words only, so its information about verb tense, mood, and voice is limited to what single verb forms express.
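As a rough illustration of what paragraph-level markup makes available, the following sketch reads paragraphs from a small XML string with Python's standard library. The sample document, the absence of a namespace, and the whitespace tokenization are all assumptions made for the example; the actual corpus files and tokenization are not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature document with paragraph-level markup only.
SAMPLE = """<text><body>
<p>First paragraph of a newspaper article.</p>
<p>Second paragraph.</p>
</body></text>"""


def paragraphs(xml_string):
    """Return the text content of every <p> element, in document order."""
    root = ET.fromstring(xml_string)
    return ["".join(p.itertext()).strip() for p in root.iter("p")]


def token_count(xml_string):
    """Count whitespace-separated tokens (a simplifying assumption)."""
    return sum(len(p.split()) for p in paragraphs(xml_string))


print(paragraphs(SAMPLE))
print(token_count(SAMPLE))
```

Because the markup stops at the paragraph level, anything finer grained, such as token boundaries or verb-form spans, has to be added by later processing steps.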
The encoded information includes three verb tenses (present, aorist, and imperfect), imperative forms for mood, and certain special conditional forms for the auxiliary verb “sam” (‘to be’).
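The restriction described above can be pictured with a small data-structure sketch. The class name, field names, and string values are hypothetical and do not reflect the actual lexicon format; the example only encodes the stated limits (three tenses, imperative mood, and conditional forms reserved for “sam”) and shows that a compound tense has no place in a single-word entry.

```python
from dataclasses import dataclass
from typing import Optional

# Feature values available for single verb forms, as listed in the text.
SINGLE_WORD_TENSES = {"present", "aorist", "imperfect"}
SINGLE_WORD_MOODS = {"imperative", "conditional"}


@dataclass(frozen=True)
class VerbEntry:
    """Hypothetical single-word lexicon entry (illustrative, not the real format)."""
    form: str
    lemma: str
    tense: Optional[str] = None
    mood: Optional[str] = None

    def __post_init__(self):
        if self.tense is not None and self.tense not in SINGLE_WORD_TENSES:
            raise ValueError(f"tense {self.tense!r} is not encoded for single forms")
        if self.mood is not None and self.mood not in SINGLE_WORD_MOODS:
            raise ValueError(f"mood {self.mood!r} is not encoded for single forms")
        # Conditional forms are recorded only for the auxiliary "sam".
        if self.mood == "conditional" and self.lemma != "sam":
            raise ValueError("conditional is encoded only for the auxiliary 'sam'")


aux_conditional = VerbEntry(form="bih", lemma="sam", mood="conditional")
print(aux_conditional)
```

Under these assumptions, an attempt to record a compound tense such as the perfect on a single entry is rejected, which is why such categories must be recovered by the grammar from combinations of entries.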