Usually, there will be more than one file within a project directory, because keeping everything in a project in one Python file quickly becomes overwhelming. It is therefore conventional to split different kinds of code and operations into files by category. However, when you run a single file, Python does not automatically see the variables and functions defined in the other files; the file you run can only use them if it explicitly imports the other file. This makes it a good idea to create one file that holds all the variables that many files use. Every file that needs those variables can then ‘import’ it and use them directly. This is especially helpful in big projects that involve many files.
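To make the idea concrete, here is a minimal sketch of importing a shared-variables module. It assumes the credentials are read from environment variables; the environment variable names (POSTGRES_USER, POSTGRES_HOST, and so on) and the variable names inside sharedvars.py are hypothetical. Because one code block can only be one file, the sketch writes a throwaway sharedvars.py into a temporary folder and imports it, which is what a plain import sharedvars does in main.py when both files sit in the same project directory.

```python
import importlib
import os
import sys
import tempfile

# Contents of a hypothetical sharedvars.py: one place for values many files need.
# Reading them from environment variables (with placeholder defaults) is an
# assumption for this sketch, not necessarily how the original project does it.
SHAREDVARS_SOURCE = '''
import os

PG_USER = os.environ.get("POSTGRES_USER", "postgres")
PG_PASSWORD = os.environ.get("POSTGRES_PASSWORD", "")
PG_HOST = os.environ.get("POSTGRES_HOST", "localhost")
PG_PORT = int(os.environ.get("POSTGRES_PORT", "5432"))
PG_DB = os.environ.get("POSTGRES_DB", "hackernews")
'''


def load_shared_module(project_dir):
    """Write sharedvars.py into project_dir and import it the way main.py would."""
    with open(os.path.join(project_dir, "sharedvars.py"), "w") as f:
        f.write(SHAREDVARS_SOURCE)
    # Running `python main.py` puts main.py's own folder on sys.path automatically;
    # here the temporary folder is added by hand to mimic that.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    return importlib.import_module("sharedvars")


project_dir = tempfile.mkdtemp()
sharedvars = load_shared_module(project_dir)
print(sharedvars.PG_HOST, sharedvars.PG_PORT)
```

In the actual project, main.py would simply contain import sharedvars (or from sharedvars import PG_HOST) and use the values directly, with no sys.path manipulation needed.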
Hence, we will be using 5 different files for this HackerNews project:
- sharedvars.py – a single location for all shared variables, such as the Postgres credentials
- customFunctionsGeneral.py – general-purpose functions called from main.py, including initializing connections and managing the Postgres database with Python SQLAlchemy
- customFunctionsHackernews.py – functions that retrieve information from HackerNews (API requests)
- main.py – the file that runs the ETL process, and the one Task Scheduler points to when it executes the job weekly
- runOnce.py – a file that is run only once, for one-time work such as creating the Postgres tables and other one-off database changes
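With this split, main.py mostly wires together functions defined in the other files. A rough skeleton might look like the following. Every function name here (extract_stories, transform, load) and the table name are hypothetical, and the helpers are stubbed in the same file with placeholder data so the sketch runs on its own; in the project they would live in customFunctionsHackernews.py and customFunctionsGeneral.py and be brought in with import statements.

```python
# Hypothetical skeleton of main.py. In the real project the helpers below would be
# imported (e.g. from customFunctionsHackernews import ...); they are stubbed here.


def extract_stories(limit):
    """Stub standing in for an API call in customFunctionsHackernews.py (placeholder data)."""
    return [{"id": i, "title": f"story {i}", "score": i * 10} for i in range(1, limit + 1)]


def transform(stories, min_score):
    """Keep only the rows and fields worth storing."""
    return [(s["id"], s["title"], s["score"]) for s in stories if s["score"] >= min_score]


def load(rows, table):
    """Stub standing in for a database write in customFunctionsGeneral.py."""
    # A real version would insert the rows into Postgres; this only reports them.
    print(f"would insert {len(rows)} rows into {table}")
    return len(rows)


def main():
    stories = extract_stories(limit=5)
    rows = transform(stories, min_score=20)
    return load(rows, table="stories")


# The guard means the ETL only runs when this file is executed directly
# (e.g. by Task Scheduler), never as a side effect of being imported.
if __name__ == "__main__":
    main()
```

Keeping the helper files limited to definitions, and putting the actual run behind the if __name__ == "__main__": guard, means importing any of these files never kicks off the ETL by accident.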
WARNING: With this multi-file structure, Python creates a __pycache__ folder as a “side effect” whenever one of your files imports another, so add __pycache__/ to the .gitignore file to keep it out of version control.
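The folder appears because, the first time Python imports a module, it compiles the source to bytecode and caches it as a .pyc file (normally inside a __pycache__ folder next to the source) so later imports can skip recompiling. The script you run directly, such as main.py, is not cached this way; only the files it imports are. The sketch below uses a throwaway module with the hypothetical name helpers_demo in a temporary folder and asks importlib.util.cache_from_source where that cached file lands.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# Bytecode writing can be disabled with the PYTHONDONTWRITEBYTECODE environment
# variable or the -B flag; switch it back on so the demo behaves as usual.
sys.dont_write_bytecode = False

project_dir = tempfile.mkdtemp()
module_path = os.path.join(project_dir, "helpers_demo.py")  # hypothetical module
with open(module_path, "w") as f:
    f.write("GREETING = 'hello'\n")

sys.path.insert(0, project_dir)
importlib.invalidate_caches()
importlib.import_module("helpers_demo")  # the import is what writes the .pyc

# Where CPython stores the compiled file, normally
# <project_dir>/__pycache__/helpers_demo.<cache tag>.pyc
pyc_path = importlib.util.cache_from_source(module_path)
print(pyc_path)
print(os.path.exists(pyc_path))
```

The cached files are regenerated automatically on every machine, which is why they belong in .gitignore rather than in the repository.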