Additional Features

In-memory cache

WordHoard uses an in-memory cache, which prevents redundant queries to an individual resource for the same word. The cache is currently cleared at the end of each session.

Logging

WordHoard also uses Python logging, which is written to the log file wordhoard_error.yaml. The maintainers of WordHoard have attempted to catch every potential exception and write the error message to that log file. The log file is useful for troubleshooting issues with this package or with the sources that WordHoard queries.
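The general pattern of catching an exception and recording it in a log file can be sketched with the standard library as follows. This is only an assumption about the approach, not WordHoard's code: the logger name, the `safe_query` function and the temporary `example_error.log` file are hypothetical and chosen so the example does not touch the real wordhoard_error.yaml file.

```python
import logging
import os
import tempfile
from typing import List, Optional

# Hypothetical log location in a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "example_error.log")

logger = logging.getLogger("example_wordhoard_sketch")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the example quiet on the console

handler = logging.FileHandler(log_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)


def safe_query(word: str) -> Optional[List[str]]:
    """Simulate a failing query and log the exception instead of raising it."""
    try:
        raise ConnectionError(f"source unavailable for {word!r}")
    except ConnectionError:
        # logger.exception records the message plus the full traceback.
        logger.exception("query failed")
        return None


result = safe_query("mother")

with open(log_path) as log_file:
    log_text = log_file.read()
```

After the call, `log_text` holds the logged message and traceback, which is the kind of entry to look for when troubleshooting.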

Rate limiting

Some sources have rate limits, which can affect querying and parsing for that source. In some cases, exceeding these rate limits will trigger a Cloudflare challenge session. Errors related to these blocked sessions are written to the wordhoard_error.yaml file. Such entries can have either a status code of 521, which is a Cloudflare-specific message, or a status code of 403.

The maintainers of WordHoard have added rate limits to multiple modules. These rate limits can be modified, but increasing these predefined limits can lead to querying sessions being dropped or blocked by a source.

Currently, two parameters can be set:

max_number_of_requests
rate_limit_timeout_period

These parameters currently default to 30 requests every 60 seconds.

from wordhoard import Synonyms

synonym = Synonyms(search_string='mother', 
                   max_number_of_requests=30, 
                   rate_limit_timeout_period=60)

results = synonym.find_synonyms()

When a rate limit is triggered, a warning message is written to both the console and the wordhoard_error.yaml file. The rate limit resets automatically after a set time period. This reset period cannot be modified through a parameter passed to a class object.
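To make the meaning of the two parameters concrete, here is a minimal sliding-window limiter that allows at most max_number_of_requests calls per rate_limit_timeout_period seconds. It is a sketch under stated assumptions and may differ from how WordHoard enforces its limits: the class name `SlidingWindowLimiter` and the injectable `clock` argument are hypothetical, and only the two parameter names come from the text above.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical limiter: N requests per rolling window of T seconds."""

    def __init__(self, max_number_of_requests: int = 30,
                 rate_limit_timeout_period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_number_of_requests = max_number_of_requests
        self.rate_limit_timeout_period = rate_limit_timeout_period
        self._clock = clock
        self._timestamps: Deque[float] = deque()

    def allow(self) -> bool:
        """Return True and record the request if it fits in the window."""
        now = self._clock()
        # Discard requests that have aged out of the window.
        while (self._timestamps
               and now - self._timestamps[0] >= self.rate_limit_timeout_period):
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_number_of_requests:
            self._timestamps.append(now)
            return True
        return False


# A fake clock makes the behavior visible without waiting 60 seconds.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_number_of_requests=2,
                               rate_limit_timeout_period=60,
                               clock=lambda: fake_now[0])

results = [limiter.allow(), limiter.allow(), limiter.allow()]
fake_now[0] = 60.0  # the window has elapsed
after_reset = limiter.allow()
```

With a limit of two requests, the third call is refused until the window has passed, after which requests are accepted again.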

Proxy usage

WordHoard supports proxies out of the box. Define your proxy configuration as a dictionary and pass it to the corresponding module, as shown below.

from wordhoard import Synonyms

proxies_example = {
    "http": "your http proxy if available", # example: http://149.28.94.152:8080
    "https": "your https proxy",  # example: https://128.230.60.178:3128
}

synonym = Synonyms(search_string='mother', proxies=proxies_example)
results = synonym.find_synonyms()

A reliable commercial proxy service is highly recommended over free ones, such as free-proxy.cz or Free Proxy List.

User Agents

WordHoard has an embedded file that contains an array of common user agents for the following platforms.

user_agent_keys = {'chrome macOS': 'chrome_mac_os_x', 
                   'chrome windows': 'chrome_windows_10',
                   'firefox macOS': 'firefox_mac_os_x', 
                   'firefox windows': 'firefox_windows_10',
                   'safari macOS': 'safari_mac_os_x', 
                   'safari iphone': 'safari_iphone',
                   'safari ipad': 'safari_ipad', 
                   'android': 'samsung_browser_android'}

Without any intervention, WordHoard randomly selects one of these user agents when a module (e.g. Synonyms) is called. An end user can override this global randomness by passing a string to the user_agent parameter.

For example, an end user who wants to use only Safari iPhone user agents would do the following. The example below randomly selects a Safari iPhone user agent with each class call.

from wordhoard import Synonyms
from wordhoard.utilities.user_agents import get_specific_user_agent

user_agent = get_specific_user_agent('safari iphone')

synonym = Synonyms(search_string='mother', user_agent=user_agent)
results = synonym.find_synonyms()

An end user who wants to pass a specific user agent would do the following.

from wordhoard import Synonyms

synonym = Synonyms(search_string='mother', user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 14_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Mobile/15E148 Safari/604.1')
results = synonym.find_synonyms()
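The default-versus-override behavior described in this section can be sketched as follows. This is an illustration only and not WordHoard's internal code: `choose_user_agent` and the `EXAMPLE_USER_AGENTS` pool with its placeholder strings are hypothetical, although the two platform keys are taken from the mapping shown earlier.

```python
import random
from typing import Dict, List, Optional

# Hypothetical pool; the real embedded file holds actual browser strings.
EXAMPLE_USER_AGENTS: Dict[str, List[str]] = {
    "chrome_windows_10": ["ExampleAgent/1.0 (Windows; Chrome)"],
    "safari_iphone": ["ExampleAgent/1.0 (iPhone; Safari)",
                      "ExampleAgent/2.0 (iPhone; Safari)"],
}


def choose_user_agent(user_agent: Optional[str] = None) -> str:
    """Use the caller's string if given; otherwise pick one at random."""
    if user_agent is not None:
        return user_agent
    platform = random.choice(list(EXAMPLE_USER_AGENTS))
    return random.choice(EXAMPLE_USER_AGENTS[platform])


random_agent = choose_user_agent()                    # random selection
fixed_agent = choose_user_agent("MyCustomAgent/0.1")  # explicit override
```

Passing a string always wins over the random choice, which mirrors how the user_agent argument is described above.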

Last update: February 13, 2023