Title | Author |
---|---|
README |
David Budzynski |
Inspired by cookiecutter data science repo
.
├── data
│ └── raw <- The original, immutable data dump.
├── Dockerfile <- Dockerfile for building the docker image
├── .editorconfig <- Configuration for your editor
├── LICENSE <- License for this project
├── Makefile <- Makefile with commands like `make data` or `make train
├── output
│ ├── data <- Processed data
│ ├── plots <- Generated graphics and figures to be used in reporting
│ └── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
├── README.md <- The top-level README for developers using this project.
├── references <- Data dictionaries, manuals, and all other explanatory materials.
├── rstudio-project-file.Rproj <- RStudio project file
└── src <- Source code for use in this project.
To initialize the project, simply run make init
. This will rename the
rstudio-project-file.Rproj
to the name of the directory you are currently in.
After initializing the project, you can start developing your analysis, by
adding any code to the src
directory. You can also add any data to the
data/raw
directory.
By using a Makefile, you can define targets that will run certain commands. For
example, you can define a target that will run your analysis, or clean the
project. To be able to use it, you need to have GNU Make installed on your
system. Learn more about GNU Make here To
read more about the existing Makefile, see the Makefile
in the root of the
project.
Most likely the most common targets you will use are:
- lint
- analysis
- clean
- all (lint and analysis)
To run the analysis target, you need to provide the commands that will run the analysis.
Additionally, the Makefile contain docker-specific targets, that will allow to build image and run the analysis in the container.
The project contains a Dockerfile
that will allow you to build a docker image.
This image uses the Rocker project as base image, and installs all the necessary
packages and dependencies. You need to pay attention to the Dockerfile
as it
contains important information about the image, like the R version, and the
packages that are installed. Make sure to update the Dockerfile
with the
correct information. For example, if you want to use a different version of R,
you can change that, as well as the packages that are installed. In order to
make sure your entire project is reproducible, you need to add run the analysis
as part of the docker image build process. This can be done by adding the
analysis
target to the Dockerfile
.
RUN make analysis
Once you have built the image, you can run the analysis in the container. This will allow you to add volume mounts, and run the analysis in a controlled environment and be able to see the results of your analysis executed in the container. To run the image, you can use the following command:
make docker-run
Docker is a very powerful tool, and it is a complex topic. To learn more about it, specifically about using docker with R, you can read Building reproducible analytical pipelines with R.
This is the place where you can add any information that is specific to your project. For example, you can add information about the data, the analysis, or any other information that is relevant to the project that will help other developers understand it better.