One of the main advantages of representing documents in digital form is how easy it is to search them for words and phrases. In a paper copy of the book “Gra o tron” (“A Game of Thrones”), trying to find the first occurrence of the name of the heroine Daenerys can feel like “looking for a needle in a haystack”. And in the digital version? You simply use the “Find” function. Replacing text is just as easy: by applying the “Replace” command globally, we can, for example, change every occurrence of the name Daenerys to the spelling Denerys.
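For readers curious what “Find” and a global “Replace” amount to under the hood, here is a minimal sketch in Python. The sample sentence is made up for illustration and is not a quotation from the book; the variable names are likewise just assumptions of this sketch.

```python
# Made-up sample text (not a quotation from the book).
text = "The girl named Daenerys looked out to sea. Later, Daenerys went home."

name = "Daenerys"

# "Find": position of the first occurrence, or -1 if the name never appears.
first_index = text.find(name)

# Global "Replace": every occurrence is swapped; the original string is left unchanged.
replaced = text.replace(name, "Denerys")

print(first_index)
print(replaced)
```

An editor's “Find” and “Replace All” buttons do essentially the same two things, only without asking you to type any code.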
It is commonly believed that the processing of text documents requires knowledge of a high-level programming language. A decade or so ago it was considered proper to have good knowledge of the Perl language, while today a specialist in the field “absolutely must” have mastery of Python. But is this knowledge really indispensable?
In this post I will show that even sophisticated tasks in text processing can be completed quickly and easily without knowledge of a single command in any programming language.
In a previous post on this blog I discussed the task of classification, which involves determining automatically which class a given object belongs to. A classification system assigns to each object a class label, where the number of possible labels is defined in advance.
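The idea of a fixed, predefined set of labels can be illustrated with a toy sketch. The labels, keyword lists and the `classify` function below are hypothetical and chosen only for illustration; a real classification system would typically learn its decision rule from data rather than rely on hand-written keywords.

```python
# The set of possible labels is defined in advance, before anything is classified.
LABELS = ("sports", "politics", "other")

# Hypothetical keyword lists, used only for this illustration.
KEYWORDS = {
    "sports": {"match", "goal", "team"},
    "politics": {"election", "parliament", "minister"},
}


def classify(text: str) -> str:
    """Assign exactly one label from LABELS to the given text."""
    words = set(text.lower().split())
    best_label, best_hits = "other", 0  # fall back to "other" when nothing matches
    for label, keywords in KEYWORDS.items():
        hits = len(words & keywords)  # how many of the label's keywords appear
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


for doc in ["The team scored a late goal",
            "The minister called an election",
            "It rained all day"]:
    print(doc, "->", classify(doc))
```

Whatever the input, the function can only ever return one of the three labels listed in `LABELS`, which is the defining property of classification described above.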
A model is a representation of some entity that makes the entity easier to work with. A model is often a miniature version of an object. When playing with a miniature model of an aeroplane, a child can add or remove various components, for example by mounting wings or taking out an engine. In an atlas or on a globe, which represent the Earth, we can cover hundreds or thousands of miles with a single sweep of a finger. On the other hand, if we want to observe the motion of elementary particles, the model of the atom must be a great deal larger than the original. In each case, a model makes it easier for us to understand the thing it represents, from the general concept down to its detailed properties.
A natural language is a language that humans use to communicate with each other, such as English or Polish. It contrasts with, for example, a programming language (such as Java or Python), which humans use to give instructions to a computer.
How can we determine whether an automatic translation system is doing its job – that is, translating texts correctly and preserving the original meaning? How should we compare the quality of two translation systems so as to choose the one that best meets our needs? I will be trying to answer these questions in this blog.