Friday, April 5, 2019
Linguistic Automatic Generation Natural Language
1. Introduction

1.1. The Problem Statement

This thesis deals with the problem of automatic generation of a UML model from Natural Language Software Requirement Specifications. It describes the development of Auto Modeler, an automated software engineering tool that takes Natural Language Software System Requirement Specifications as input, performs an automated OO analysis and tries to produce a UML model as output (a partial one in its present state, i.e. static class diagrams only). The basis for Auto Modeler is described in [23].

1.2. Motivation

We conducted a short survey of the software industry in Islamabad in order to determine what kinds of automated software engineering tools were required by the software houses. The result of the survey (see Appendix-I for the survey report) indicated that there is demand for a tool such as Auto Modeler. Since the tools of this kind that have already been developed, i.e. [23], are either not available in the market or are very expensive, and thus out of the reach of most software houses, we decided to build our own tool that can be used by the software industry to become more productive and competitive. At present Auto Modeler is not ready for commercial use, but it is hoped that future versions will be able to cater to the needs of the software houses.

1.3. Background

1.3.1. The need for Automated Software Engineering Tools

In this era of Information Technology great demands are placed on software systems and on all those that are involved in the SDLC. The developed software should not only be of high quality but should also be developed in a minimal amount of time.
When it comes to software quality, the software must be highly reliable, and it should meet the customer's needs and satisfy the customer's expectations.

Automated software engineering tools can assist software engineers and software developers in producing high quality software in a minimal amount of time.

1.3.2. Requirements Engineering

Requirements engineering consists of the following tasks [6]:
* Requirements Elicitation
* Requirements Analysis
* Requirements Specification
* Requirements Validation / Verification
* Requirements Management

Requirements engineering is recognized as a critical task, since many software failures arise from inconsistent, incomplete or simply incorrect system requirements specifications.

1.3.3. Natural Language Requirement Specifications

Formal methods have been successfully used to express requirements specifications, but often the customer cannot understand them and therefore cannot validate them [4]. Natural language is the only common medium understood by both the customer and the analyst [4]. So the system requirements specifications are often written in natural language.

1.3.4. Object Oriented Analysis

The system analyst must manually process the natural language requirements specification document, perform an OO analysis and produce the results in the form of a UML model, which has become a standard in the software industry. The manual process is laborious, time consuming and prone to errors. Some specified requirements might be left out. If there are problems or errors in the original requirements specifications, they may not be discovered in the manual process.

OOA applies the OO paradigm to models of proposed systems by defining classes, objects and the relationships between them. Classes are the most important building block of an OO system and from these we instantiate objects.
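As a small illustration of classes and instantiation, here is a sketch in Python. The `Book` class, its attributes and its `reserve` operation are hypothetical names chosen for this example only; they are not taken from Auto Modeler.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """A class: the template from which individual objects are instantiated."""
    title: str               # attribute: holds the value of a property
    reserved: bool = False   # attribute with a default value

    def reserve(self) -> None:
        """An operation (method): something that can be done to a Book object."""
        self.reserved = True


# Instantiate two objects from the same class (hypothetical titles).
first = Book("Book A")
second = Book("Book B")
first.reserve()
print(first.reserved, second.reserved)  # True False
```

Both objects share the operations defined in the class, while each one holds its own attribute values.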
Once an individual object is created it inherits the same operations, relationships, semantics, and attributes identified in the class. Attributes of classes, and hence objects, hold values of properties. Operations, also called methods, describe what can be done to an object/class [1].

A relationship between classes/objects can take various forms, such as aggregation, composition, generalization and dependency. Attributes and operations represent the semantics of the class, while relationships represent the semantics of the model [1].

The KRB seven-step method, introduced by Kapur, Ravindra and Brown, proposes how to find classes and objects manually [1]:
(1) Identify candidate classes (nouns in NL).
(2) Define classes (look for instantiations of classes).
(3) Establish associations (capture verbs to create an association for each pair of classes found in steps 1 and 2).
(4) Expand many-to-many associations.
(5) Identify class attributes.
(6) Normalize attributes so that they are associated with the class of objects that they truly describe.
(7) Identify class operations.

From this process we can see that one goal of OOA is to identify NL concepts that can be transformed into OO concepts, which can then be used to form system models in particular notations. Here we shall focus on UML [1].

1.3.5. Natural Language Processing (NLP)

If an automated analysis of the NL requirements document is carried out, then it is not only possible to quickly find errors in the specifications, but with the right methods we can also quickly generate a UML model from the requirements.

Natural language is inherently ambiguous, imprecise and incomplete; a natural language document is often redundant, and several classes of terminological problems (e.g., jargon or specialist terms) can arise to make communication difficult [2]. It has also been shown that natural language processing with holistic objectives is a very complex task. Nevertheless, it is possible to extract sufficient meaning from NL sentences to produce reliable models. Complexities of language range from simple synonyms and antonyms to such complex issues as idioms, anaphoric relations or metaphors. Efforts in this particular area have had some success in generating static object models from some complex NL requirement sentences.

1.3.5.1. Linguistic analysis

Linguistic analysis studies NL text at different linguistic levels, i.e. words, sentence and meaning [1].

(i) Word-tagging analyses how a word is used in a sentence. In particular, the use of a word can change from one sentence to another depending on context (e.g. "light" can be used as noun, verb, adjective and adverb, and "while" can be used as preposition, conjunction, verb and noun). Tagging techniques are used to specify the word-form of each single word in a sentence, and each word is tagged with a Part Of Speech (POS), e.g. an NN1 tag would denote a singular noun, while VVB would signify the base form of a verb [1].

(ii) Syntactic analysis applies phrase marker, or labeled bracketing, techniques to segment NL into phrases, clauses and sentences, so that the NL is delineated by syntactic/grammatical annotations.
Hence we can show how words are grouped and connected to each other in a sentence [1].

(iii) Semantic analysis is the study of meaning. It uses discourse annotation techniques to analyze open-class or content words and closed-class words (i.e. prepositions, conjunctions, pronouns). The POS tags and syntactic elements mentioned previously can be linked in the NL text to create relationships.

Applying these linguistic analysis techniques, natural language processing tools can carry out morphological processing, syntactic processing and semantic processing. The processing of NL text can be supported by a Semantic Network (SN) and corpora that make up a knowledge base for text analysis.

The difficulty of OOA is not just due to the ambiguity and complexity of NL itself, but also to the gap in meaning between the NL concepts and OO concepts [1].

1.3.6. From NLP to UML Model Creation

After NLP the sentences are simplified in order to make the identification of UML model elements from NL elements easy. Simple heuristics are used to identify UML model elements from natural text (see Chapter 7):
* Nouns indicate a class
* Verbs indicate an operation
* Possessive relationships and verbs like to have, identify, denote indicate attributes
* Determiners are used to identify the multiplicity of roles in associations.

1.5. Plan of the thesis

In Chapter 2 we present a brief survey of previous work and work similar to our own. Chapters 3, 4, 5, 6 and 7 describe the theoretical basis for Auto Modeler. Chapter 8 describes the architecture of Auto Modeler. In Chapter 9 we describe Auto Modeler in action with a case study. In Chapter 10 we present conclusions.

2. Literature Survey

The first relevant published technique attempting to provide a systematic procedure for producing design models from NL requirements was that of Abbott. Abbott (1983) proposes a linguistic based method for analyzing software requirements, expressed in English, to derive basic data types and operations.
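The idea of reading types and operations off the parts of speech can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input is already POS-tagged by hand, the tag names `NOUN` and `VERB` and the function `candidate_elements` are hypothetical, and nothing here reproduces Abbott's procedure or Auto Modeler's implementation.

```python
def candidate_elements(tagged_words):
    """Split hand-tagged (word, tag) pairs into candidate classes and operations.

    Nouns become candidate classes and verbs candidate operations; duplicates
    are dropped while the order of first appearance is kept.
    """
    classes, operations = [], []
    for word, tag in tagged_words:
        word = word.lower()
        if tag == "NOUN" and word not in classes:
            classes.append(word)
        elif tag == "VERB" and word not in operations:
            operations.append(word)
    return classes, operations


# Hypothetical requirement sentence, tagged by hand for the example.
sentence = [
    ("The", "DET"), ("librarian", "NOUN"), ("issues", "VERB"),
    ("a", "DET"), ("book", "NOUN"), ("to", "PREP"),
    ("the", "DET"), ("member", "NOUN"),
]
classes, operations = candidate_elements(sentence)
print(classes)     # ['librarian', 'book', 'member']
print(operations)  # ['issues']
```

A real tool would still need a tagger to produce the input and an analyst or further rules to discard irrelevant nouns and verbs.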
This approach [1] was further developed by Booch (1986). Booch describes an Object-Oriented Design method where nouns in the problem description suggest objects and classes of objects, and verbs suggest operations [1].

Saeki et al. (1987) describe a process of incrementally constructing software modules from object-oriented specifications obtained from informal natural language requirements. Their system analyses the informal requirements one sentence at a time. Nouns and verbs are automatically extracted from the informal requirements, but the system cannot determine which words are relevant for the construction of the formal specification. Hence an important role is played by the human analyst, who reviews and refines the system results manually after each sentence is processed [1].

Dunn and Orlowska (1990) describe a natural language interpreter for the construction of NIAM (Nijssen's, or Natural-language, Information Analysis Method) conceptual schemas. The construction of conceptual schemas involves allocating objects to entity types (semantic classes) and the identification of elementary fact types. The system accepts declarative sentences only and uses grammar rules and a dictionary for type allocation and the identification of elementary fact types [1].

Meziane (1994) implemented a system for the identification of VDM data types and simple operations from natural language software requirements. The system first generates an Entity-Relationship Model (ERM) from the input text and then generates VDM data types from the ERM [1].

Mich and Garigliano (1994) and Mich (1996) describe an NL-based prototype system, NL-OOPS, that is aimed at the generation of object-oriented analysis models from natural language specifications. This system demonstrated how a large scale NLP system called LOLITA can be used to support the OO analysis stage [1].

V. Ambriola and V. Gervasi [4] have developed CIRCE, an environment for the analysis of natural language requirements.
It is based on the concept of successive transformations that are applied to the requirements in order to obtain concrete (i.e., rendered) views of models extracted from the requirements. CIRCE uses CICO, a domain-based, fuzzy-matching parser, which parses the requirements document and converts it into an abstract parse tree. This parse tree is encoded as tuples and stored in a shared repository by CICO. A group of related tuples constitutes a T-Model. CIRCE uses internal tools to refine the encoded tuples, called extensional knowledge, and uses knowledge about the basic behaviour of software systems, called intensional knowledge and derived from modelers, to further enrich the tuple space. When a specific concrete view on the requirements is desired, a projector is called to build an abstract view of the data from the tuple space. A translator then converts the abstract view to a concrete view. In [5] V. Ambriola and V. Gervasi describe their experience of automatic synthesis of UML diagrams from natural language requirement specifications using their CIRCE environment.

Delisle et al., in their project DIPETT-HAIKU, capture candidate objects, linguistically differentiating between Subjects (S) and Objects (O), and processes, Verbs (V), using the syntactic S-V-O sentence structure. This work also suggests that candidate attributes can be found in the noun modifier in compound nouns, e.g. "reserved" is the value of an attribute of "reserved book" [1].

Harmain and Gaizauskas developed an NLP based CASE tool, CM-Builder [23], which automatically constructs an initial class model from NL text. It captures candidate classes, rather than candidate objects.

Börstler constructs an object model automatically based on pre-specified key words in a use case description. The verbs in the key words are transformed to behaviors and the nouns are transformed to objects [1].

Overmyer and Rambow developed an NLP system to construct UML class diagrams from NL descriptions.
Both these efforts require user interaction to identify OO concepts [1].

The prototype tool developed by Perez-Gonzalez and Kalita supports automatic OO modeling from NL problem descriptions into UML notations, and produces both static and dynamic views. The underlying methodology includes theta roles and semi-natural language [1].

3. Software Requirements Engineering

Software requirements engineering is the science and discipline concerned with establishing and documenting software requirements [6]. It consists of:
* Software requirements elicitation: The process through which the customers (buyers and/or users) and the developer (contractor) of a software system discover, review, articulate, and understand the users' needs and the constraints on the software and the development activity.
* Software requirements analysis: The process of analyzing the customers' and users' needs to arrive at a definition of software requirements.
* Software requirements specification: The development of a document that clearly and precisely records each of the requirements of the software system.
* Software requirements verification: The process of ensuring that the software requirements specification is in compliance with the system requirements, conforms to document standards of the requirements phase, and is an adequate basis for the architectural (preliminary) design phase.
* Software requirements management: The planning and controlling of the requirements elicitation, specification, analysis, and verification activities.

In turn, system requirements engineering is the science and discipline concerned with analyzing and documenting system requirements.
It involves transforming an operational need into a system description, system performance parameters, and a system configuration. This is accomplished through the use of an iterative process of analysis, design, trade-off studies, and prototyping.

Software requirements engineering has a similar definition as the science and discipline concerned with analyzing and documenting software requirements. It involves partitioning system requirements into major subsystems and tasks, then allocating those subsystems or tasks to software. It also transforms allocated system requirements into a description of software requirements and performance parameters through the use of an iterative process of analysis, design, trade-off studies, and prototyping.

A system can be considered a collection of hardware, software, data, people, facilities, and procedures organised to accomplish some common objectives. In software engineering, a system is a set of software programs that provide the cohesiveness and control of data that enables the system to solve the problem [6].

The major difference between system requirements engineering and software requirements engineering is that the origin of system requirements lies in user needs, while the origin of software requirements lies in the system requirements and/or specifications. Therefore, the system requirements engineer works with users and customers, eliciting their needs, schedules, and available resources, and must produce documents understandable by them as well as by management, software requirements engineers, and other system requirements engineers.

The software requirements engineer works with the system requirements documents and engineers, translating system documentation into software requirements which must be understandable by management and software designers as well as by software and system requirements engineers.
Accurate and timely communication must be ensured all along this chain if the software designers are to begin with a valid set of requirements [6].

4. Automated Software Engineering Tools

Software engineering is concerned with the analysis, design, implementation, testing, and maintenance of large software systems. Automated software engineering focuses on how to automate or partially automate these tasks to achieve significant improvements in quality and productivity.

Automated software engineering applies computation to software engineering activities. The goal is to partially or fully automate these activities, thereby significantly increasing both quality and productivity. This includes the study of techniques for constructing, understanding, adapting and modeling both software artifacts and processes. Automatic and collaborative systems are both important areas of automated software engineering, as are computational models of human software engineering activities. Knowledge representations and artificial intelligence techniques applicable in this field are of particular interest, as are formal techniques that support or provide theoretical foundations [7].

Automated software engineering approaches have been applied in many areas of software engineering. These include requirements definition, specification, architecture, design and synthesis, implementation, modeling, testing and quality assurance, verification and validation, maintenance and evolution, configuration management, deployment, reengineering, reuse and visualization.
Automated software engineering techniques have also been used in a wide range of domains and application areas including industrial software, embedded and real-time systems, aerospace, automotive and medical systems, Web-based systems and computer games [7].

Research into automated software engineering includes the following areas:
* Automated reasoning techniques
* Component-based systems
* Computer-supported cooperative work
* Configuration management
* Domain modeling and meta-modeling
* Human-computer interaction
* Knowledge acquisition and management
* Maintenance and evolution
* Model-based software development
* Modeling language semantics
* Ontologies and methodologies
* Open systems development
* Product line architectures
* Program understanding
* Program synthesis
* Program transformation
* Re-engineering
* Requirements engineering
* Specification languages
* Software architecture and design
* Software visualization
* Testing, verification, and validation
* Tutoring, help, and documentation systems

5. Natural Language Processing

Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

5.1. Language Processing

Language processing can be divided into two tasks [11]:
* Processing written text, using lexical, syntactic, and semantic knowledge of the language as well as any required real world information [11].
* Processing spoken language, using all the information needed above, plus additional knowledge about phonology as well as enough additional information to handle the further ambiguities that arise in speech [11].

5.2. Uses for NLP

5.2.1. User interfaces.
Better than obscure command languages. It would be nice if you could just tell the computer what you want it to do. Of course we are talking about a textual interface, not speech [10].

5.2.2. Knowledge-Acquisition. Programs that could read books and manuals or the newspaper, so you don't have to explicitly encode all of the knowledge they need to solve problems or do whatever they do [10].

5.2.3. Information Retrieval. Find articles about a given topic. The program has to be able, in some way, to determine whether the articles match a given query [10].

5.2.4. Translation. It sure would be nice if machines could automatically translate from one language to another. This was one of the first tasks they tried applying computers to. It is very hard [10].

5.3. Linguistic levels of Analysis

Language obeys regularities and exhibits useful properties at a number of somewhat separable levels [10].

Think of language as transfer of information. It is much more than that, but that is a good place to start.

Suppose that the speaker has some meaning that they wish to convey to some hearer [10].

Speech (or gesture) imposes a linearity on the signal. All you can play with is the properties of a sequence of tokens. Actually, why tokens? Well, for one thing that makes it possible to learn [10].

So the other thing to play with is the order in which the tokens can occur.

So somehow, a meaning gets encoded as a sequence of tokens, each of which has some set of distinguishable properties, and is then interpreted by figuring out what meaning corresponds to those tokens in that order [10].

Another way to think about it is that the properties of the tokens and their sequence somehow elicit an understanding of the meaning. Language is a set of resources to enable us to share meanings, but isn't best thought of as a means for *encoding* meanings.
This is a sort of philosophical issue perhaps, but if this point of view is true, it makes much of the AI approach to NLP somewhat suspect, as it is really based on the "encoded meanings" view of language [10].

The lowest level is the actual properties of the signal stream:
* phonology: speech sounds and how we make them
* morphology: the structure of words
* syntax: how the sequences are structured
* semantics: meanings of the strings

There are important interfaces among all of these levels. For example, sometimes the meaning of sentences can determine how individual words are pronounced [10].

This many levels is obviously needed. But language turns out to be more clever than this. For example, language can be more efficient by not having to say the same thing twice, so we have pronouns and other ways of making use of what has already been said:

A bear went into the woods. It found a tree.

Also, since language is most often used among people who are in the same situation, it can make use of features of the situation:
* this/that
* you/me/they
* here/there
* now/then

The study of the mechanisms whereby features of the context are used, whether it is the context created by a sequence of sentences or the actual context where the speaking happens, is called pragmatics [10].

Another issue has to do with the fact that the simple model of language as information transfer is clearly not right. For one thing, we know there are at least the following three types of sentences:
* statements
* imperatives
* questions

And each of them can be used to do a different kind of thing. The first *might* be called information transfer. But what about imperatives? What about questions? To some degree the analysis of such sentences can draw on a basic notion of meaning: speech acts [10].

There are other, higher levels of structuring that language exhibits. For example there is conversational structure, where people know when they get to talk in a conversation, and what constitutes a valid contribution.
There is narrative structure, whereby stories are put together in ways that make sense and are interesting. There is expository structure, which involves the way that informative texts (like encyclopedias) are arranged so as to usefully convey information. These issues blend off from linguistics into literature and library science, among other things [10].

Of course with hypertext and multi-media and virtual reality, these higher levels of structure are being explored in new ways [10].

5.4. Steps in Natural Language Understanding

The steps in the process of natural language understanding are [11]:

5.4.1. Morphological analysis. Individual words are analyzed into their components, and non-word tokens (such as punctuation) are separated from the words. For example, in the phrase "Bill's house" the proper noun "Bill" is separated from the possessive suffix "'s" [11].

5.4.2. Syntactic analysis. Linear sequences of words are transformed into structures that show how the words relate to one another. This parsing step converts the flat list of words of the sentence into a structure that defines the units represented by that list. Constraints imposed include word order ("manager the key" is an illegal constituent in the sentence "I gave the manager the key"), number agreement, and case agreement [11].

5.4.3. Semantic analysis. The structures created by the syntactic analyzer are assigned meanings. In most universes, the sentence "Colorless green ideas sleep furiously" [Chomsky, 1957] would be rejected as semantically anomalous. This step must map individual words into appropriate objects in the knowledge base, and must create the correct structures to correspond to the way the meanings of the individual words combine with each other [11].

5.4.4. Discourse integration. The meaning of an individual sentence may depend on the sentences that precede it and may influence the sentences yet to come.
The entities involved in the sentence must either have been introduced explicitly or they must be related to entities that were. The overall discourse must be coherent [11].

5.4.5. Pragmatic analysis. The structure representing what was said is reinterpreted to determine what was actually meant [11].

5.5. Syntactic Processing

Syntactic parsing determines the structure of the sentence being analyzed. Syntactic analysis involves parsing the sentence to extract whatever information the word order contains. Syntactic parsing is computationally less expensive than semantic processing [10].

A grammar is a declarative representation that defines the syntactic facts of a language. The most common way to represent grammars is as a set of production rules, and the simplest structure for them to build is a parse tree which records the rules and how they are matched [10].

Sometimes backtracking is required (e.g., "The horse raced past the barn fell"), and sometimes multiple interpretations may exist for the beginning of a sentence (e.g., "Have the students who miss the exam ...") [10].

Example: Syntactic processing interprets the difference between "John hit Mary" and "Mary hit John".

5.6. Semantic Analysis

After (or sometimes in conjunction with) syntactic processing, we must still produce a representation of the meaning of a sentence, based upon the meanings of the words in it. The following steps are usually taken to do this [10]:

5.6.1. Lexical processing. Look up the individual words in a dictionary. It may not be possible to choose a single correct meaning, since there may be more than one. The process of determining the correct meaning of individual words is called word sense disambiguation or lexical disambiguation. For example, "I'll meet you at the diamond" can be understood since "at" requires either a time or a location. This usually leads to preference semantics when it is not clear which definition we should prefer [10].

5.6.2. Sentence-level processing.
There are several approaches to sentence-level processing. These include semantic grammars, case grammars, and conceptual dependencies [10].

Example: Semantic processing determines the differences between such sentences as "The pig is in the pen" and "The ink is in the pen".

5.6.3. Discourse and Pragmatic Processing. To understand most sentences, it is necessary to know the discourse and pragmatic context in which they were uttered. In general, for a program to participate intelligently in a dialog, it must be able to represent its own beliefs about the world, as well as the beliefs of others (and their beliefs about its beliefs, and so on) [10].

The context of goals and plans can be used to aid understanding. Plan recognition has served as the basis for many understanding programs; PAM is an early example [10].

5.7. Issues in Syntax

For various reasons, a lot of attention in computational linguistics has been paid to syntax. Partly this has to do with the fact that real linguists have spent a lot of work on it. Partly it is because it needs to be done before just about anything else can be done.

I won't talk much about morphology. We will assume that words can be associated with a set of features or properties. For example the word "dog" is a noun, it is singular, and its meaning involves a kind of animal. The word "dogs" is related, obviously, but has the property of being plural. The word "eat" is a verb, it is in what we might call the base form, and it denotes a particular kind of action. The word "ate" is related; it is in the past tense form. You can imagine, I'm sure, that the techniques of knowledge representation that we have looked at can be applied to the problem of representing facts about the properties and relations among words [11].

The key observation in the theory of syntax is that the words in a sentence can be more or less naturally grouped into what are called phrases, and those phrases can often be treated as a unit.

So in a sentence "The dog chased the bear", the sequence "the dog" forms a natural unit. The sequence "chased the bear" is a natural unit, as is "the bear" [11].

Why do I say that "the dog" is a natural unit? Well, one thing is that I can replace it by another sequence that has the same referent, or a related referent. For example I could replace it by [11]:
* Snoopy (a name)
* It (a pronoun)
* My brother's favorite pet (a more complex description)

What about "chased the bear"? Again, I could replace it by:
* died (a single word)
* was hit by a truck (a more complex event)

This basic structure, in English, is sometimes called the subject-predicate structure. The subject is a nominal, something that can refer to an object or thing; the predicate is a verb phrase, which describes an action or event. Of course, as in the example, the verb phrase can also contain other constituents, for example another nominal [11].

These phrases also have structure. For example a noun phrase (a kind of nominal) can have a determiner, zero or more adjectives, and a noun, maybe followed by another phrase, like:

the big dog that ate my homework

Verb phrases can have complicated verb groups like:

will not be eaten

Syntactic theories try to predict and explain what patterns are used in a language. Sometimes this involves figuring out what patterns just don't work. For example the following sentences have something wrong with them [11]:
* the dogs runs home
* he died the book
* she saw himself in the mirror
* they told it to she

Figuring out exactly what is wrong with such sentences allows linguists to create theories that help understand the way that sentences
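The phrase patterns described in this section, and the agreement failure in "the dogs runs home", can be sketched as a toy parser in Python. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the three rules S -> NP VP, NP -> DET ADJ* N and VP -> V (NP | ADV)?, and the function names are hypothetical and cover only a handful of words.

```python
# Hypothetical toy lexicon: word -> (category, number); None means "unmarked".
LEXICON = {
    "the": ("DET", None), "big": ("ADJ", None),
    "dog": ("N", "sg"), "dogs": ("N", "pl"), "bear": ("N", "sg"),
    "chased": ("V", None), "runs": ("V", "sg"), "run": ("V", "pl"),
    "home": ("ADV", None),
}


def category(words, i):
    """Return the category of words[i], or None past the end of the input."""
    return LEXICON[words[i]][0] if i < len(words) else None


def parse_np(words, i):
    """NP -> DET ADJ* N. Returns (tree, number, next_index) or None."""
    if category(words, i) != "DET":
        return None
    children = [("DET", words[i])]
    i += 1
    while category(words, i) == "ADJ":
        children.append(("ADJ", words[i]))
        i += 1
    if category(words, i) != "N":
        return None
    children.append(("N", words[i]))
    return ("NP", children), LEXICON[words[i]][1], i + 1


def parse_vp(words, i):
    """VP -> V (NP | ADV)?. Returns (tree, number, next_index) or None."""
    if category(words, i) != "V":
        return None
    children = [("V", words[i])]
    number = LEXICON[words[i]][1]
    i += 1
    obj = parse_np(words, i)
    if obj is not None:
        children.append(obj[0])
        i = obj[2]
    elif category(words, i) == "ADV":
        children.append(("ADV", words[i]))
        i += 1
    return ("VP", children), number, i


def parse_sentence(text):
    """S -> NP VP with subject-verb number agreement. Returns a tree or None."""
    words = text.lower().split()
    if any(w not in LEXICON for w in words):
        return None
    subject = parse_np(words, 0)
    if subject is None:
        return None
    predicate = parse_vp(words, subject[2])
    if predicate is None or predicate[2] != len(words):
        return None
    if predicate[1] is not None and predicate[1] != subject[1]:
        return None  # e.g. plural subject with a singular verb
    return ("S", [subject[0], predicate[0]])


print(parse_sentence("the dog chased the bear"))
print(parse_sentence("the dogs runs home"))  # None: number agreement fails
```

The first call prints a nested subject-predicate tree, while the second is rejected because the plural subject does not agree with the singular verb.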