Workflow Matlab with Git

Question

Sargondjani on 6 Jan 2022

https://uk.mathworks.com/matlabcentral/answers/1623720-workflow-matlab-with-git

Edited: Bogdan Bodnarescu on 14 Jun 2022

I am new to Git. I mainly use Git to publish releases of Matlab packages. The packages consist of a core with some functions and folders. In addition, I have support folders and files (with examples, documentation and some test scripts).

Before using git I would work on a new release by copying only the core, and some test scripts to a new folder. I would then update the core, and the test scripts until everything works. And then I would update the support folders and files such that they are consistent with the core, and publish the release.

My question is: how would you organize this workflow with Git?

Intuitively, I would basically keep doing the same: make a new branch, say "update_main", and copy only the core and the test scripts into that branch. Make that branch work, and then update and add the support files to that branch. And finally merge (or simply overwrite) "main" with "update_main" and publish the new release.

This procedure ensures that the core and support files are consistent within each branch, which seems kind of important to me.

However, my suggested approach implies that I would hardly use Git for the main process, since I could just as well make a new directory, and build the new release there. Git would then only help with merging code (for example when I work on two updates at the same time).

One disadvantage of using branches in this case would be that they are not directly visible as directories, so I am not even sure using Git branches has that many benefits compared to my old approach, where I would build a new release in a new directory.

Does this make sense? Does anybody have any thoughts or advice on this?

2 Comments

Ilya Gurin on 6 Jan 2022

First, welcome to Git! It's my all-time favorite productivity tool.

I'm not sure what exactly you're trying to do with your "packages." How many packages are we talking about? Do they have a common "core", with other elements that are unique to each package?

Sargondjani on 6 Jan 2022

@Ilya Gurin yeah, so far I love Git too! Just trying to get the hang of efficient workflows...

What I call a "package" is basically a toolbox (core) plus support files. So we may basically assume one package = one toolbox = one repository. Everything is unique for each toolbox (no overlap), so we can discuss as if there is only one toolbox.

Answer 1

Benjamin Kraus on 7 Jan 2022

https://uk.mathworks.com/matlabcentral/answers/1623720-workflow-matlab-with-git#answer_869825

Edited: Benjamin Kraus on 7 Jan 2022

Welcome to the world of version control. Git is an amazing and powerful tool, but it can take some getting used to.

I would suggest you take a step back from the specifics of your project (or even MATLAB) and just start with some Git (or even version control) basics.

The first basic is that if you find yourself copying an entire folder for a release, you are probably not using Git to its full advantage. If you are a developer working alone on a project, you can still benefit heavily from Git without ever branching or copying your folder. I assume you've done this already, but start by creating a Git repository in your code directory, then checking in all the files. My suggestion would be to start with the latest release, check that code into Git, then immediately tag the current state (using git tag). By tagging this state, you can always restore your working directory back to that state (using git checkout). There is no need to manually make a copy of the folder. As long as you've committed any changes into Git, you can always use git checkout to switch to any other version of your files, all within the same directory.

As you work, whenever you complete a small chunk of work (it is up to you to decide what "small" and "chunk of work" mean), check that into Git. Every time you call "git commit" the code you've submitted is given a unique label (the commit hash), allowing you to restore that state. Git tag is just a way to give a commit a friendly name (rather than the long, automatically generated hash). Once you are ready for a new release, use git tag again to name that specific version of the code.
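To make the commit, tag, and checkout cycle described above concrete, here is a minimal sketch. It runs in a throwaway directory; the file name core.m, the tag name v1.0, and the user identity are placeholders chosen for illustration, not anything from the original project.

```shell
# Throwaway repository so the sketch is safe to run anywhere.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Check in the latest release and tag it with a friendly name.
echo "function y = core(x), y = x; end" > core.m
git add core.m
git commit -q -m "Import release 1.0"
git tag v1.0

# Keep working: every commit gets its own hash.
echo "function y = core(x), y = 2*x; end" > core.m
git commit -q -am "Change core behaviour"

# Restore the files of release 1.0, in the same directory.
git checkout -q v1.0
cat core.m
```

After this, "git checkout -" switches back to the branch you were on before.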

The only reason to have two separate folders with two different copies of your code is if you want to be able to run two versions of your code at the same time (or perhaps open them side-by-side, but there are ways to do that in Git as well). However, once your code is in Git, you shouldn't be copy/pasting an entire folder any more. You should be using "git clone", "git push", and "git pull" (and "git fetch") to create a clone of one directory in another directory and keep the two synchronized. This isn't required, but it will work best if you pick a hosting service (like GitHub or GitLab), so that each copy of your code can synchronize with that server.
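A rough sketch of that clone/push/pull cycle follows. To keep it runnable offline, a local bare repository (hub.git) stands in for the hosting service; that stand-in, the directory names copyA and copyB, and the file demo.m are all assumptions made for the example.

```shell
base=$(mktemp -d)
cd "$base"
git init -q --bare hub.git                 # local stand-in for GitHub/GitLab

# First copy: clone, commit, push.
git clone -q hub.git copyA                 # Git warns that the repository is empty
cd copyA
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "disp('v1')" > demo.m
git add demo.m
git commit -q -m "Add demo"
git push -q -u origin HEAD

# Second copy: created with "git clone", not by copying the folder.
cd "$base"
git clone -q hub.git copyB

# New work in copyA reaches copyB through push and pull.
cd "$base/copyA"
echo "disp('v2')" > demo.m
git commit -q -am "Update demo"
git push -q
cd "$base/copyB"
git pull -q --ff-only
cat demo.m
```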

Once you've got those basics down, there are a few reasons to branch, such as:

You want to work on two independent features. You can create a branch for each feature, and then when the feature is done you can merge it back into the main branch.
You want to apply bug fixes to a past release, without incorporating all the new features into the past release. You create a branch based on the release (you can create a new branch from a git tag), and apply the fix to just that branch.
You are working in a team.
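As an illustration of the second reason (fixing a past release without pulling in new features), here is a small sketch. The branch name hotfix-1.0, the tags, and the file names are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "% release 1.0" > core.m
git add core.m
git commit -q -m "Release 1.0"
git branch -M main
git tag v1.0

# Development continues on main after the release.
echo "% new feature" > feature.m
git add feature.m
git commit -q -m "Add new feature"

# Bug-fix branch that starts at the tag, so it does not contain the new feature.
git checkout -q -b hotfix-1.0 v1.0
echo "% release 1.0, bug fixed" > core.m
git commit -q -am "Fix bug in release 1.0"
git tag v1.0.1
ls
```

The listing shows only core.m: feature.m exists on main but not on the hotfix branch.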

When you get to that point, you may want to look at some online articles and tutorials regarding different branching models for Git. There are a ton of different articles (and opinions) on this topic. I did some very quick Google searching using search terms like "git branching models" or "git branching strategies" (and I am not endorsing any of these specific models), but to give you some specific examples of what I mean, here are some links:

This isn't required, but my personal recommendation would be to learn the command line versions of all of the above first, and get a solid understanding of how Git works and what it means to commit, branch, tag, merge, rebase, push, and pull. Once you've done that, you can start leveraging the tools built into MATLAB to make your life easier, but it will be easier to understand those tools if you've learned the command line versions.

I hope that helps get you started!

15 Comments

Benjamin Kraus on 7 Jan 2022

Issue 1:

If your goal is to just be able to arbitrarily pick which version to run, then you could do something like this:

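repoDir = 'C:\codeRepo';
addpath(repoDir)
% opt_version is the tag, branch name, or commit hash to run
cmd = sprintf('git -C "%s" checkout %s', repoDir, opt_version);
status = system(cmd);  % a nonzero status means the checkout failed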

This just checks out from Git whatever version of the code you want to run, and only one directory is ever added to the path. Note this only works if you don't have to compile anything (and nothing is P-coded).

However, I agree, there are many reasons you might want two directories with two different revisions of your code. The complication is keeping the two directories in sync with one another: not just the code itself, but the underlying repositories. With Git, every copy of your repository contains the entire history, so if you have two directories with code from your repository you need to do something to keep the two copies in sync with each other.

If one of the two directories is static (for example, a past release that you are not changing any more), then I guess in this case the simplest approach would be to just copy the directory. At that point it may be considered best practice to delete the ".git" directory, to prevent you from accidentally committing new changes into that directory (and effectively making independent repositories that can now drift from each other). You could also look into git commands to do this for you (I've never used it, but "git archive" looks promising).
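For what it's worth, here is a minimal sketch of the "git archive" idea mentioned above: it exports the files of a tagged release into a plain folder that has no ".git" directory. The tag v1.0 and the folder name release-1.0 are placeholders for the example.

```shell
base=$(mktemp -d)
git init -q "$base/repo"
cd "$base/repo"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "% core of release 1.0" > core.m
git add core.m
git commit -q -m "Release 1.0"
git tag v1.0

# Export the tagged files into a plain folder without any Git history.
mkdir "$base/release-1.0"
git archive --format=tar v1.0 | tar -x -f - -C "$base/release-1.0"
ls -A "$base/release-1.0"
```

Because the exported folder is not a repository, nothing can be committed there by accident.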

If you plan to actively develop in both directories, then you want to take some action that keeps the Git histories in sync between the two copies. Ironically, this most likely involves a third copy of the repository. This third copy will be something called a "bare repository", which can live on your local computer but much more commonly lives on a service such as GitHub or GitLab. The first hit I found on Google for bare repositories is this: https://www.saintsjd.com/2011/01/what-is-a-bare-git-repository/

Basically, the idea is you have two "working directories" and a single central repository. Each working directory has a copy of the repository, and with "git push" and "git pull" you can synchronize the repository (Git history) in each working directory with the central repository. The basic process is:

You start with a local git repository that exists exclusively in your working directory.
You create a "bare repository", either in a separate folder, or on a hosting service like GitHub. This repository starts with no history and no files.
You link your working directory with the bare repository using "git remote". Note that when you first create a new GitHub (or GitLab) repository, it prominently shows you exactly what code to run to complete this step, customized for your specific repository.
You "git push" from your local repository into the new central repository.
You use "git clone" to create the second working directory. When you use "git clone" the repository is already linked to the central repository that you cloned from.
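The five steps above can be sketched as follows, using a bare repository in a local folder instead of a hosting service. The directory names work1, work2, and central.git are invented for the example; "origin" is just the conventional name for the remote.

```shell
base=$(mktemp -d)

# 1. A local repository that exists only in the first working directory.
git init -q "$base/work1"
cd "$base/work1"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "% toolbox core" > core.m
git add core.m
git commit -q -m "Initial commit"

# 2. A bare repository: history only, no checked-out files.
git init -q --bare "$base/central.git"

# 3. Link the working directory to the bare repository.
git remote add origin "$base/central.git"

# 4. Push the existing history into the central repository.
git push -q -u origin HEAD

# 5. Clone it to get a second working directory, already linked to "origin".
git clone -q "$base/central.git" "$base/work2"
git -C "$base/work2" remote -v
```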

I think technically you can do this with just two working directories (synchronizing between each other), but it is much more common to use the third bare repository. This process would be something like this:

You start with a local git repository that exists exclusively in your working directory.
You then run "git clone" to create a new repository based on the working directory.
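A sketch of this two-directory variant, under the assumption that changes only need to flow from the first directory to the second (Git refuses by default to push into a branch that is checked out in a non-bare repository). The names work1 and work2 are placeholders.

```shell
base=$(mktemp -d)
git init -q "$base/work1"
cd "$base/work1"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "% v1" > core.m
git add core.m
git commit -q -m "First version"

# The second working directory is cloned straight from the first.
git clone -q "$base/work1" "$base/work2"

# Later changes in work1 are brought into work2 with "git pull".
echo "% v2" > core.m
git commit -q -am "Second version"
git -C "$base/work2" pull -q --ff-only
cat "$base/work2/core.m"
```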

Issue 2:

Deleting the examples may be a bit extreme. The examples should be checked-in to the same repository with the code, and then you can track changes to the code along with changes to the examples.

I think if you were going to follow a formal code development process, this is basically how you should do it:

Every example should have an associated test that runs the example against the current code base. You can take a look at the MATLAB Unit Test Framework for how to write MATLAB based tests.
When you start working on a new feature (not a new release, but a new feature) you create a new git branch (a "feature branch") for that new feature. A feature can be very small (one new input argument) or large (completely rewriting an entire function), that part is up to you.
As you develop the feature, run the tests before each commit. If any of them fail, update the corresponding example immediately, so that each commit has examples that are consistent with the current code base.
Once the feature is done being developed, you can add any additional examples (and any corresponding tests) and commit those to the feature branch.
Once you are satisfied with the feature, you merge that feature branch back into the main branch.
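Here is one way the Git side of that process could look. The run_tests function is a stub standing in for a real test run (for instance something like matlab -batch "runtests", which is an assumption about your setup), and the branch and file names are invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Stub: replace with the command that actually runs your MATLAB tests.
run_tests() { return 0; }

echo "function y = core(x), y = x; end" > core.m
echo "disp(core(1))" > example.m
git add core.m example.m
git commit -q -m "Initial toolbox"
git branch -M main

# Feature branch: code and example change together, tests run before the commit.
git checkout -q -b feature/scale-argument
echo "function y = core(x, s), y = s*x; end" > core.m
echo "disp(core(1, 2))" > example.m
run_tests && git commit -q -am "Add scale argument and update example"

# Merge the finished feature back into main.
git checkout -q main
git merge -q --no-ff -m "Merge feature/scale-argument" feature/scale-argument
git --no-pager log --oneline --graph
```

The "--no-ff" option keeps an explicit merge commit, so the feature stays visible as a unit in the history; a plain "git merge" would also work.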

Technically speaking, a very formal code development process will include something called "test driven development"... in other words, you update the examples and write the tests first, then you update the code until the tests pass.

The nice thing about this process is that when you are done with the feature, the examples are already done (and tested). And... be honest, how often have you discovered bugs in your code while writing examples? This process helps you fix those bugs while the code for the new feature is fresh in your memory, and prevent regressions. However, it can be a lot for a small project being developed by one person, so you can choose how much of this process to follow.

Perhaps a more casual variant of this approach would be to create a branch, develop several features, checking them in along the way, then just remember that before you merge that branch back into the main code you need to update the examples, then you just update the examples and commit those as a separate step.

Benjamin Kraus on 7 Jan 2022

I'm not aware of any mechanism built into Git to flag a file as still needing to be updated.

One really simple idea would be to just add the following line of code to the top of every example:

error('This example has not been updated to reflect the new release yet.')

You can check-in that change instead of deleting the examples, so that the examples remain in the version control, but you have a clear way to search for and track the examples that have not been updated. Then, as you update examples, you can just delete that error line. Of course, you can use disp instead of error, or any other way of signaling the example is outdated.
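If you go this route, "git grep" is one way to do the searching: it lists the tracked files that still contain the marker line. A small sketch, with an invented examples folder and file names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir examples
marker="This example has not been updated to reflect the new release yet."
printf "error('%s')\ndisp('old example')\n" "$marker" > examples/ex_old.m
printf "disp('updated example')\n" > examples/ex_new.m
git add examples
git commit -q -m "Flag outdated example"

# List the tracked example files that still carry the marker.
git grep -l "has not been updated" -- examples
```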

Maybe another option that is less invasive, and perhaps even more useful: at the start or end of each example, add a statement to the effect of "This example was last updated for release number X" or "This example was last updated on December 24, 2021 for release number Y".

This provides your end-users with a clear statement that the example may be outdated, and also serves as a reference for you: you can quickly see which examples have not been updated.

There are ways to automate that with Git (i.e. you can have Git automatically update the date in the file whenever the file is modified), but that is probably more work than is necessary for your purposes.

If you went with the test based approach, there is a mechanism built-in to the testing framework for marking a test as not finished (or "expected to fail"). In this case, you would have a test file that runs all your examples, then instead of the error statement in the example, you would add this line of code to your test:

testCase.assumeFail('This example has not been updated to reflect the new release yet.')

This signals to the testing framework not to bother running this test because you know it will fail, but the test code remains, and the testing framework tracks "passed", "failed", and "filtered" tests, so you can keep track of tests (and therefore examples) that still need to be updated.

Benjamin Kraus on 7 Jan 2022

The common approach is to do all development on the same repository, and then tag releases. There are built-in mechanisms on GitHub to do release tracking, so users can always download the latest release, even if the code has been updated since then. Most projects on GitHub have the latest and greatest "bleeding edge" available with all the latest commits, but it will also have a tag with the latest release so you can download the version that is stable.

The important thing to note about Git is that every copy of your repository is technically a different repository, and the repositories are kept in sync using "git push" and "git pull". If you do all your work locally and never call "git push", the remote repository will never know.

Further, technically, each copy of the repository has its own branches, but when you create branches you can tell Git whether to mirror the branches across repositories. By default branches are not mirrored, unless you specifically tell it to push the branch to the remote server.
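A quick sketch of that behaviour, again with a local bare repository (server.git) standing in for the remote server; the branch name experiment is a placeholder.

```shell
base=$(mktemp -d)
git init -q --bare "$base/server.git"      # local stand-in for the remote server
git clone -q "$base/server.git" "$base/work"
cd "$base/work"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "% core" > core.m
git add core.m
git commit -q -m "Initial commit"
git push -q -u origin HEAD

# A new branch exists only in this repository ...
git checkout -q -b experiment
echo "% experiment" >> core.m
git commit -q -am "Try something"
git ls-remote --heads origin               # "experiment" is not listed yet

# ... until it is pushed explicitly.
git push -q -u origin experiment
git ls-remote --heads origin               # now "experiment" is listed
```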

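In addition, once you call "git push", what your end-users will see depends on your merging strategy: you can merge and preserve the full history, or you can squash (for example with "git merge --squash" or an interactive rebase), which collapses all revisions into a single commit. A plain "git rebase", by contrast, replays each commit onto the target branch and keeps them as separate commits. Do a search for "merge vs. rebase" and I'm sure you can find some articles on the topic.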

So, for instance, if your server (such as GitHub) has a repository, and you are doing local development, you may have dozens of local branches, but they will never be visible unless you push them to the server. You can accomplish your goal of hiding intermediate states by having release branches that are mirrored on the server and development branches that are only local, and by always squashing when you merge from a development branch into your release branch (for example with "git merge --squash" or an interactive rebase), which will "collapse" the history into a single commit.
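One way to get that single collapsed commit is "git merge --squash". Note this is a swapped-in technique rather than a literal "git rebase" call; an interactive rebase can achieve the same result. The branch name dev and the commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"
echo "% v1" > core.m
git add core.m
git commit -q -m "Release 1.0"
git branch -M main

# Local development branch with several intermediate commits.
git checkout -q -b dev
echo "% step 1" >> core.m
git commit -q -am "WIP: step 1"
echo "% step 2" >> core.m
git commit -q -am "WIP: step 2"

# Bring all of dev's changes into main as one commit.
git checkout -q main
git merge -q --squash dev                  # stages the changes, does not commit
git commit -q -m "Release 1.1"
git --no-pager log --oneline main
```

The log of main shows only "Release 1.0" and "Release 1.1"; the WIP commits stay on the local dev branch.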

Sargondjani on 12 Jan 2022

Yeah, it's good you mention this testing thing. I also automated my testing a lot more. It helps a lot.

Bogdan Bodnarescu on 14 Jun 2022

Edited: Bogdan Bodnarescu on 14 Jun 2022

Is there a similar topic for workflow Simulink with Git?

Until now I couldn't find a way to merge Simulink models that are modified on different branches. Merging the generated code files, the .mat files, and the other types of files that Embedded Coder generates is also a nightmare.

For a simple branch merge it takes me around 1 day of work to resolve all the conflicts, and even after so much work I can still find bugs introduced by the automatic merging of the Simulink models.

Is there any way to tell Git not to do anything with some specific file types? I saw that if I configure a Matlab Project then a .gitattributes file is generated, but the automatic merge is still triggered when using meld or modelmerge from Matlab, so this doesn't seem to work either.

Should I make a new topic for Simulink and Git only?
