Tricks to make saving images quicker? #636
Comments
@bigmit2011 Try concurrent processing for the large data set. Read more here: https://docs.python.org/3/library/concurrency.html
For concurrent processing, I highly recommend using the It is essentially the same as the python standard library That said, there are potentially several sources of inefficiency here:
Another point, implied above, regarding items 1 and 4 above. If indeed it is the case that a ticker may be in |
Hi, Thank you so much for the detailed reply. Regarding 1 and 2: The historical data files depend on the ticker. https://finance.yahoo.com/quote/AAPL/history/
@bigmit2011 To achieve this, try storing the historical data in a local database or a cached file format. Then keep updating only the last candle's open, high, low, and close data, which significantly reduces the time spent requesting data from Yahoo. For better performance, I suggest using an API.
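The caching idea above could look roughly like the following. This is a minimal sketch using only the standard library; the `update_cache` helper, the column layout in `FIELDS`, and the one-CSV-per-symbol layout are assumptions made for illustration, not part of any library.

```python
import csv
from pathlib import Path

# Assumed column layout of the cached file (hypothetical)
FIELDS = ["Date", "Open", "High", "Low", "Close"]


def update_cache(path, latest):
    """Merge one freshly fetched candle (a dict keyed by FIELDS) into a CSV cache.

    If the last cached row has the same Date, it is overwritten, since that
    candle was still forming. Otherwise the new candle is appended.
    Returns the number of rows in the cache after the update.
    """
    path = Path(path)
    rows = []
    if path.exists():
        with path.open(newline="") as f:
            rows = list(csv.DictReader(f))

    if rows and rows[-1]["Date"] == latest["Date"]:
        rows[-1] = latest
    else:
        rows.append(latest)

    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

With a cache like this, only the most recent candle needs to be requested on each run, and the charting step reads everything else from disk.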
Hi, I actually have the data saved locally as CSV files and am not scraping while creating the charts.
I recently wrote some code for saving mplfinance chart images to disk using concurrent processing: https://github.com/BennyThadikaran/stock-pattern/blob/main/src/init.py#L107

Below is the main outline of the code, using the concurrent.futures module. It assumes your data is already on disk. If you are downloading the data over the network, see the second part.

import concurrent.futures

import matplotlib.pyplot as plt
import mplfinance as mpf
import pandas as pd


def process(sym):
    """This runs in a child process"""
    # load the file in a DataFrame, do some processing
    # (the file path is a placeholder; mplfinance expects a DatetimeIndex)
    df = pd.read_csv(f"{sym}.csv", index_col=0, parse_dates=True)

    # switch to a non-interactive backend when working inside a child process
    plt.switch_backend("AGG")
    plt.ioff()

    mpf.plot(df, type="candle", style="tradingview", savefig=f"{sym}.png")

    # return something useful
    return f"success {sym}"


def main():
    """Main entry point of script"""
    futures = []
    sym_list = ["tcs", "infosys"]  # your fairly long list of symbols

    with concurrent.futures.ProcessPoolExecutor() as executor:
        for sym in sym_list:
            # Pass the process function and any additional
            # positional and keyword arguments to executor.submit
            future = executor.submit(process, sym)
            futures.append(future)

        for future in concurrent.futures.as_completed(futures):
            # do something with the result
            print(future.result())


if __name__ == "__main__":
    # run the script
    main()

If you're making network requests for stock data, you can get a big performance boost using asyncio (stdlib) and aiohttp (external package). The benefit is not having to wait for each stock's data to be downloaded. Make sure to use a throttler or you will exceed the server's API limits.

import asyncio
import concurrent.futures

import aiohttp


async def main():
    sym_list = []  # your symbol list

    async with aiohttp.ClientSession() as session:
        tasks = []
        for sym in sym_list:
            # call your data fetch function with create_task
            # data_fetch takes the sym and session arguments and calls session.get(url)
            task = asyncio.create_task(data_fetch(sym, session))
            tasks.append(task)

        loop = asyncio.get_running_loop()
        futures_list = []

        # the executor must be instantiated; the with block shuts it down on exit
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # as_completed yields awaitables in the order the downloads finish
            for completed in asyncio.as_completed(tasks):
                stock_data = await completed
                # hand the CPU-bound chart rendering to a child process
                future = loop.run_in_executor(executor, process, stock_data)
                futures_list.append(future)

            results = await asyncio.gather(*futures_list)


if __name__ == "__main__":
    # run the script
    asyncio.run(main())
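One simple way to throttle the downloads mentioned above is to cap the number of requests in flight with asyncio.Semaphore. The following is a minimal sketch, not a full rate limiter: it bounds concurrency, not requests per second. The names `fetch_throttled`, `fetch_all`, `fake_fetch`, and the `MAX_CONCURRENT` value are hypothetical; `fake_fetch` stands in for a real aiohttp call so the example runs without network access.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed limit; tune it to the API's documented limits


async def fetch_throttled(sym, semaphore, fetch):
    """Run fetch(sym) while holding one slot of the semaphore."""
    async with semaphore:
        return await fetch(sym)


async def fake_fetch(sym):
    """Stand-in for a real request such as session.get(url) (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"data for {sym}"


async def fetch_all(sym_list, fetch=fake_fetch, limit=MAX_CONCURRENT):
    """Fetch every symbol, with at most `limit` fetches running at once."""
    semaphore = asyncio.Semaphore(limit)
    tasks = [
        asyncio.create_task(fetch_throttled(sym, semaphore, fetch))
        for sym in sym_list
    ]
    # gather returns the results in the same order as sym_list
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(fetch_all(["tcs", "infosys", "wipro"])))
```

In a real script, `fetch` would be a coroutine that wraps the aiohttp request. If the server enforces a per-second quota, a dedicated rate limiter may be a better fit than a plain semaphore.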
I am using a friend's script (so I don't know all the details of it), but I wonder if there are simple tricks I can use here to make it save more quickly.
I want to be able to save around 10k images.
I plan to incorporate multiprocessing to make it even quicker.
Thank you.