Extracting Word Groups from Text

05/03/2024 (last updated: 05/03/2024)


To extract every group of n consecutive words from each line of a text, you can use a short Python function:

python
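def extract_n_words(text, n):
    # Process each line independently so groups never span a line break
    lines = text.split('\n')
    result = []
    for line in lines:
        # split() with no argument splits on any run of whitespace
        words = line.split()
        # Slide a window of n words; lines shorter than n yield nothing
        for i in range(len(words) - n + 1):
            result.append(' '.join(words[i:i+n]))
    return '\n'.join(result)

# Example usage
text = 'he "he" went "go" to "to" school "school" with "with" his "his" friends "friend"'
n = 3
result = extract_n_words(text, n)
print(result)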

This function first splits the text into lines with split('\n'), then splits each line into words with split(). For each line, it slides a window of size n across the words and joins each window back into a single string. Finally, it joins all extracted groups into one string separated by newline characters (\n).

Each line of the output therefore contains exactly n consecutive words. An input line with fewer than n words contributes nothing to the output.
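To make the sliding window concrete, here is a minimal sketch that isolates that step for a single line of plain words. The helper name sliding_windows and the check on n are illustrative assumptions, not part of the function above.

```python
def sliding_windows(words, n):
    """Return every run of n consecutive words as a list of strings."""
    # Assumption for this sketch: reject non-positive window sizes
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # A list with fewer than n words gives an empty range, so no windows
    return [' '.join(words[i:i + n]) for i in range(len(words) - n + 1)]

words = "he went to school with his friends".split()
for group in sliding_windows(words, 3):
    print(group)
# he went to
# went to school
# to school with
# school with his
# with his friends
```

Seven words with n = 3 give 7 - 3 + 1 = 5 groups, which matches the range used in the loop.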

More Information

To lemmatize the words as well, you can use the nltk library for lemmatization and re for more robust handling of the text. Here is how the code can be modified to do that:

python
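import re
from nltk.stem import WordNetLemmatizer

# Requires the WordNet corpus, e.g. via nltk.download('wordnet')

def extract_n_words(text, n):
    lemmatizer = WordNetLemmatizer()
    lines = text.split('\n')
    result = []
    for line in lines:
        # Keep only word tokens; quotes and punctuation are dropped
        words = re.findall(r'\b\w+\b', line)
        # lemmatize() treats words as nouns by default (pos='n'),
        # so verb forms such as "went" are returned unchanged
        lemmatized_words = [lemmatizer.lemmatize(word) for word in words]
        for i in range(len(lemmatized_words) - n + 1):
            # Quote each lemma in the current n-word window
            result.append(' '.join(f'"{w}"' for w in lemmatized_words[i:i+n]))
    return '\n'.join(result)

# Example usage
text = 'he "he" went "go" to "to" school "school" with "with" his "his" friends "friend"'
n = 3
result = extract_n_words(text, n)
print(result)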

This code uses nltk to reduce words to their base form (lemmatization) and re to find the words in the text. The pattern \b\w+\b drops punctuation and quotation marks, although it still keeps digits and underscores. Each line is split into word tokens this way, and the tokens are then lemmatized.
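As a quick check of what the pattern \b\w+\b actually keeps, the following sketch tokenizes a made-up sample line using only the standard library. The sample string and the extra isalpha() filter are assumptions added for illustration.

```python
import re

# Hypothetical sample line with quotes, punctuation, a digit run and an underscore
line = 'he "he" went, "go" to school_1 in 2024!'

# \w matches letters, digits and underscores, so those survive tokenization
tokens = re.findall(r'\b\w+\b', line)
print(tokens)
# ['he', 'he', 'went', 'go', 'to', 'school_1', 'in', '2024']

# Optional extra step if only purely alphabetic tokens are wanted
alpha_only = [t for t in tokens if t.isalpha()]
print(alpha_only)
# ['he', 'he', 'went', 'go', 'to', 'in']
```

If numbers or identifiers with underscores should be excluded before lemmatization, a filter like the second step is one simple option.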
