Task: You have four documents: a fat cat sat on a mat and ate a fat rat, little funny fluffy cat, the cat, huge green

Author: Top-Urok.Ru

Task: You have four documents: "a fat cat sat on a mat and ate a fat rat", "little funny fluffy cat", "the cat", "huge green crocodile". There are also stop words: a, the, on, cat. A query comes in: funny fat cat. Find all the documents that contain at least one word from the query. Take the stop words into account and do not include them in the search results.

Approximate algorithm:
- Take a word from the query, checking that it is not a stop word;
- Search for the word in the container;
- Ask the container in which documents this word is found. The container already has the answer;
- Add the document indexes to the result;
- Repeat all the steps for each word in the query;
- Place all the results in a vector and send it to the user.

The container mentioned in the algorithm is a map. The key in the map is the word, and the value is a container of the documents in which this word is found. It is important to write the elements to the dictionary correctly when adding a document.

The query funny fat cat should work as follows:
- Take the word funny, which is not in the stop word list;
- Look it up in the dictionary and find funny in it;
- Document 1 should already be written in the dictionary under the key funny;
- Add document 1 to the result vector;
- Take the word fat. It is also not in the stop word list;
- Document 0 is stored in the dictionary under the key fat;
- Add the document to the result;
- Cat is a stop word, so you are not interested in it;
- Return a vector with two elements, 0 and 1, to the user.

Write the functions AddDocument and FindDocuments that implement the solution to the described task.
AddDocument should populate the word_to_documents index:

void AddDocument(map<string, set<int>>& word_to_documents, const set<string>& stop_words, int document_id, const string& document);

The function FindDocuments should search and provide the required document IDs in the form of a vector:

vector<int> FindDocuments(const map<string, set<int>>& word_to_documents, const set<string>& stop_words, const string& query);

There should be no duplicates in the result vector. Use a set container as an intermediate to avoid duplicates. Do not change the signature of FindDocuments.

Stop words: a the on cat
Number of documents: 4
Document 0: a fat cat sat on a mat and ate a fat rat
Document 1: little funny fluffy cat
Document 2: the cat
Document 3: huge green crocodile
Query: funny fat cat
Output (document IDs): 0 1

Detailed explanation:

Task: Document search using a dictionary

Description:
To solve this task, we need two functions: AddDocument and FindDocuments.

The AddDocument function populates the word_to_documents dictionary, which maps each word to the IDs of the documents in which that word occurs. It takes the word_to_documents dictionary, the stop_words set, the document ID, and the document itself.
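The description of AddDocument can be sketched roughly as follows. This is one possible implementation, not the only correct one. It assumes that words in a document are separated by whitespace and that the index has the type map<string, set<int>>. The stream variable name words is purely illustrative.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

using namespace std;

// Adds one document to the inverted index: every word of the document
// that is not a stop word gets document_id recorded under its key.
void AddDocument(map<string, set<int>>& word_to_documents,
                 const set<string>& stop_words,
                 int document_id,
                 const string& document) {
    // Assumption: words are separated by whitespace, so operator>> splits them.
    istringstream words(document);
    string word;
    while (words >> word) {
        if (stop_words.count(word) == 0) {
            // operator[] creates an empty set the first time a word is seen.
            word_to_documents[word].insert(document_id);
        }
    }
}
```

Because the value is a set, a word that occurs several times in one document (such as fat in document 0) still stores that document ID only once.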

The FindDocuments function searches the index and returns the required document IDs as a vector. It takes the word_to_documents dictionary, the stop_words set, and the query.

The solution algorithm is as follows:
- Split the query into individual words.
- For each word in the query:
  - Check that the word is not a stop word.
  - Check whether the word is present in the word_to_documents dictionary.
  - If the word is in the dictionary, add its document IDs to a temporary set.
- Return the search results as a vector without duplicates.
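The steps above could be written along these lines. This is a minimal sketch that keeps the FindDocuments signature from the task and assumes the query is split on whitespace. The local names found, words and it are illustrative.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

// Returns the IDs of all documents that contain at least one
// non-stop word of the query, in ascending order and without duplicates.
vector<int> FindDocuments(const map<string, set<int>>& word_to_documents,
                          const set<string>& stop_words,
                          const string& query) {
    set<int> found;  // intermediate set removes duplicate document IDs
    istringstream words(query);  // assumption: whitespace-separated words
    string word;
    while (words >> word) {
        if (stop_words.count(word) > 0) {
            continue;  // stop words never contribute to the result
        }
        // find() is used instead of operator[] because the map is const.
        const auto it = word_to_documents.find(word);
        if (it != word_to_documents.end()) {
            found.insert(it->second.begin(), it->second.end());
        }
    }
    return vector<int>(found.begin(), found.end());
}
```

For the index built from the four documents of the task, the query funny fat cat yields the IDs 0 and 1: funny contributes document 1, fat contributes document 0, and cat is skipped as a stop word.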

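Usage example:
```cpp
map<string, set<int>> word_to_documents;  // word -> IDs of the documents containing it
set<string> stop_words = {"a", "the", "on", "cat"};  // stop words from the task
```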
