In automatic text classification, a computer system has the job of assigning texts to defined categories. In some applications the system itself decides how to define the categories (classes) of texts. For example, if someone wants to classify the latest news reports, but is not certain what topics they will relate to, they may tell the artificial intelligence (AI) system to divide the set of reports into a specified number of classes (10, say). The system will then group the reports in such a way that each category contains texts that use similar vocabulary.
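The grouping described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular AI system works: it assumes that texts are compared by word counts with cosine similarity, and that the first k texts serve as starting points for the k classes. The names `vectorize`, `cosine` and `cluster` are made up for this example.

```python
from collections import Counter
import math


def vectorize(text):
    """Bag of words: map each lowercase word to the number of times it occurs."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(texts, k, iterations=10):
    """Group texts into k classes by vocabulary overlap (a simple k-means-style loop)."""
    vectors = [vectorize(t) for t in texts]
    # Assumption: the first k texts seed the classes; real systems pick seeds more carefully.
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each text to the class whose centroid it resembles most.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the summed word counts of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return labels


reports = [
    "the team won the football match",
    "the bank raised interest rates",
    "football fans cheered the team",
    "interest rates hit the bank",
]
print(cluster(reports, 2))  # prints [0, 1, 0, 1]
```

The two football reports end up in one class and the two banking reports in the other, purely because of the words they share.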
Artificial intelligence (AI) is a type of computer system whose task is to imitate actions performed by a human.
How can we evaluate whether an AI system performs its task effectively? I will try to answer that question in this blog.
Morphological analysis is used to determine how a word is built up. The result of such analysis might be the statement that, for example, the word "classes" is made up of the stem "class" and the ending "-es", which is used in English to make the plural form of certain nouns and the third person singular form of certain verbs. From this information we may deduce that "classes" is likely the plural of a noun (or the third person singular of a verb) whose base form, or lemma, is "class".
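A toy version of this analysis can be written as follows. It is a sketch under strong assumptions: the list of endings and the small lexicon of lemmas are invented for the example, and a real morphological analyser relies on far richer dictionaries and rules.

```python
# Hypothetical, deliberately tiny resources for illustration only.
LEMMAS = {"class", "word", "text"}
ENDINGS = ["es", "s", ""]  # tried in this order; "" means "no ending"


def analyse(word):
    """Return (stem, ending) if the word is a known lemma plus a known ending, else None."""
    for ending in ENDINGS:
        stem = word[:len(word) - len(ending)]
        if word.endswith(ending) and stem in LEMMAS:
            return stem, ending
    return None


print(analyse("classes"))  # prints ('class', 'es')
print(analyse("class"))    # prints ('class', '')
```

Checking the stem against the lexicon is what stops the sketch from splitting "class" into "clas" plus "-s".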
How does a child normally learn to read in its native language? First it gets to know all the graphical symbols (letters, punctuation marks, accent marks, and so on) that are used to write that language. Next it learns the relationships between symbols and sounds, and learns to connect the sounds into words and phrases while reading. After some time the child is able to interpret even whole sentences at a single glance.
The idea of using numbers to represent words, or texts made up of words, is “as old as time”. Texts are converted into number sequences in a process of encryption; then in the process of decryption the reverse operation is performed, in a manner known only to the intended recipient. In this way, even if the encrypted message falls into the wrong hands, it cannot be read. Encrypting tools are reported to have been used even in ancient Greece: “A narrow strip of parchment or leather was wound onto a cane, and text was written along it on the touching edges. The addressee, having a cane of the same thickness, could quickly read the text of the message. When unfurled, to show meaningless scattered letters, it would be of no use to a third party; it was understandable only to the intended recipient, who would match it to his template” (https://pl.wikipedia.org/wiki/Skytale).
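The device in the quotation can be imitated with a short Python sketch. Note that it rearranges letters rather than turning them into numbers. The sketch assumes the text is written in `faces` lines along the cane, one letter per turn of the strip, and that unwinding the strip reads the letters turn by turn. The parameter name `faces` and both function names are inventions for this example.

```python
def encrypt(message, faces):
    """Write the message in `faces` rows along the cane, then unwind the strip."""
    turns = -(-len(message) // faces)       # number of turns needed (ceiling division)
    padded = message.ljust(faces * turns)   # pad with spaces to fill the last row
    rows = [padded[r * turns:(r + 1) * turns] for r in range(faces)]
    # Unwinding the strip reads one letter from every row for each turn.
    return "".join(rows[r][t] for t in range(turns) for r in range(faces))


def decrypt(ciphertext, faces):
    """Rewind the strip onto a cane of the same thickness and read along the rows."""
    turns = len(ciphertext) // faces
    plain = "".join(ciphertext[t * faces + r] for r in range(faces) for t in range(turns))
    return plain.rstrip()  # drops the padding (and any trailing spaces of the original)


secret = encrypt("HELPMEIAMUNDERATTACK", 4)
print(secret)              # prints HENTEIDTLAEAPMRCMUAK
print(decrypt(secret, 4))  # prints HELPMEIAMUNDERATTACK
```

The thickness of the cane, here the value 4, plays the role of the key: decrypting with a different value yields scrambled letters.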
One of the main advantages of storing documents in digital form is the ease of searching them for particular words and phrases. If you had the paper version of the book A Game of Thrones, and you wanted to find the first time the character name “Daenerys” appeared, it would be like looking for the proverbial needle in a haystack. But in the digital version? Simply use the Find function. Replacing text is just as easy: using a global Replace command, you could, for example, change all instances of “Daenerys” to the alternative spelling “Denerys”.
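The same two operations, Find and Replace, look like this in Python, using the built-in string methods `str.find` and `str.replace`. The sample sentence is made up for the example and is not a quotation from the book.

```python
# Hypothetical sample sentence, not a quotation from the book.
text = "The name Daenerys appears here, and Daenerys appears again."

# str.find returns the index of the first occurrence, or -1 if there is none.
first_position = text.find("Daenerys")

# str.replace without a count argument replaces every occurrence.
updated = text.replace("Daenerys", "Denerys")

print(first_position)
print(updated)
```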
It is commonly believed that the processing of text documents requires knowledge of a high-level programming language. A decade or so ago it was considered proper to have good knowledge of the Perl language, while today a specialist in the field “absolutely must” have mastery of Python. But is this knowledge really indispensable?
In this post I will show that even sophisticated tasks in text processing can be completed quickly and easily without knowledge of a single command in any programming language.
In a previous post on this blog I discussed the task of classification, which involves determining automatically which class a given object belongs to. A classification system assigns to each object a class label, where the number of possible labels is defined in advance.
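To make the idea of a fixed set of labels concrete, here is a minimal sketch of a classifier that picks the label whose keywords overlap most with a text. The labels, the keyword lists and the function name `classify` are all assumptions made for the illustration; practical systems learn such associations from data instead of using hand-written lists.

```python
# Hypothetical labels and keyword sets, defined in advance.
LABELS = {
    "sport": {"match", "team", "goal"},
    "finance": {"bank", "rates", "market"},
}


def classify(text):
    """Return the label whose keywords share the most words with the text.

    On a tie (including no matches at all) the first label in LABELS wins.
    """
    words = set(text.lower().split())
    return max(LABELS, key=lambda label: len(words & LABELS[label]))


print(classify("The team scored a late goal"))  # prints sport
print(classify("The bank cut interest rates"))  # prints finance
```

Whatever the input, the answer is always one of the labels listed in `LABELS`, which is what distinguishes classification from grouping texts into classes that the system defines itself.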
A model is a representation of some entity, serving to make it easier to work with. A model is often a miniature version of an object. When playing with a miniature model of an aeroplane, a child can add or subtract various components – mount wings, for example, or remove an engine. In an atlas or on a globe, which represent the Earth, we can cover hundreds or thousands of miles with just one sweep of a finger. On the other hand, if we want to observe the motion of elementary particles, the model of the atom must be a great deal larger than the original. A model as a representation of something makes it easier for us to understand and get to know that thing, from the general concept to detailed properties.
A natural language is a language used by humans to communicate with each other, like English, Polish, etc. It contrasts, for example, with a programming language (Java, Python, etc.), which is used by humans to give instructions to a computer.
How can we determine whether an automatic translation system is doing its job – that is, translating texts correctly and preserving the original meaning? How should we compare the quality of two translation systems so as to choose the one that best meets our needs? I will be trying to answer these questions in this blog.