The Use of MATLAB in Open Science
Learn how to write transparent and comprehensible code using MATLAB®, create MATLAB code that is interoperable with other languages, share research artifacts with your community, including those without a license, and use MATLAB on Science Gateways.
The different offerings for Open Science are divided into three categories aimed at making code transparent, available, and accessible.
- Making code more transparent deals with Live Scripts and using source control with MATLAB.
- The section on availability shows how to package software in projects and share them as MATLAB toolboxes. It introduces File Exchange as a platform for re-usable software and explains how you can share your code with non-MATLAB users through third-party access for academia or MATLAB integration with Python and other languages.
- Finally, the section on accessibility introduces MATLAB in a Docker container, Parallel Computing Toolbox™ for faster execution, and MATLAB Online™ and Simulink Online™. This section also shows what successful MATLAB integration on a Science Gateway looks like for researchers using online shared resources.
Published: 7 Feb 2021
My name is Shubo. And this webinar has a slight philosophical aspect to it, because it's not about an algorithm. It's about how you make your science more open, and what the possibilities are to do so. OK. So let's jump in. So before I dive into all of the details that you can use, I wanted to give you a very high level overview first of why MATLAB and Open Science. Because you might ask yourself, OK, MATLAB is a proprietary language. So how is that compatible with Open Science?
It turns out MATLAB is very well suited for open scientific principles. I'm going to show you how. So MATLAB might be a proprietary language, but it turns out that most of MATLAB's functions, most of MATLAB code, is actually open. It's transparent. You can open it irrespective of having a MATLAB license in any text editor and look at what was done, how it was done, and even get a reference. Most of our users, most of you, most of the four million people globally who use MATLAB do not actually care about what is inside the MATLAB code. What they care about, what they do is they use the MATLAB environment. This famous toolbar that you all know.
And they use it to take these pieces of code, pieces of proprietary code from MATLAB, which they can look in and see, and they use them. They trust them. They know they work. But they use them to build applications and analyses for their own data. And this environment, the MATLAB environment, makes this easy to do, because you can click and point and join things in a fairly easy manner. But all the time, it's not a black box. At every stage, you see the code that is behind this specific analysis. And definitely, all of the applications, the analyses you build, are not proprietary. They're yours to share.
And in fact, we encourage you strongly-- I'm going to move me out of the way here. We encourage you strongly to share your code. So if you go onto the MATLAB toolbar and you click on the community button there, this will take you to MATLAB Central, which is the MathWorks open exchange for the MATLAB and Simulink community. And this includes a free open-source tool exchange portal. You can test your coding capabilities there. You can answer challenges. You can join discussion forums with other MATLAB users. So it's really all about sharing and a lot of people on there.
And finally, we understand that as people move from local desktops to science gateways to supercomputing centers, MATLAB needs to be present everywhere. So it's about coverage. And as we were talking briefly before the webinar started, campus-wide and institution-wide licenses try to do that, get entire campuses under one license umbrella. So everybody on these campuses and across these campuses can easily share and collaborate. If you are not a part of a campus-wide license-- we have 1,800 campus-wide licenses at the moment. If you're not part of it, don't worry. I'll show you what you can do with your licenses. But it's about coverage, geographical coverage.
Then, MATLAB has strong integration capabilities now with several different languages, including Python, which means there is coverage across methods. If somebody you know is using a toolbox in another language that you need to integrate, you can do that very easily from MATLAB. And finally, there is constant support. Our developers build constant support for various kinds of data formats in MATLAB. And this is quite important, because we know that in every single domain we have specific data file formats. And the more of these data file formats we support, the more domain coverage we have.
And you can get all of these different kinds of data in your favorite MATLAB environment. So that's a very high level overview of why MATLAB is compatible with Open Science. I'm now going to dive into specific offerings, specific features, that you can use to practice Open Science. So first, I'm going to tell you how you can use your data to tell a story. And I'm going to talk about developing software effectively, source control, talk a little bit about tips on sharing tools and software easily. I'm going to talk about downloading and reusing software for free from MATLAB Central, File Exchange.
I'm going to briefly-- in detail, talk about connecting with those users who are not MATLAB users, how you can share code with them, collaborators. I'm going to talk about building portable software capsules with MATLAB, if you guys have heard of Docker, Docker containers. Then, I'm going to briefly touch upon big compute, big data, parallel computation, and talk about accessing MATLAB over the browser. And finally, I'm going to discuss having compute and data in the same place. This is actually my favorite part of the webinar, because it is directly related to what I do. I'm going to talk about science gateways. OK.
And I try to classify these into three bins, transparent, available, and accessible. Although I admit, it's not a fair classification. OK. So let's start off with how you can use your data to tell a story. Now here, I show you two pieces of MATLAB code. They do exactly the same thing. On the left, you have your traditional MATLAB code, well commented, but still static code. And on the right, you have what we call a Live Script. And which one do you think is more transparent and easier to understand when somebody is trying to grasp your workflow? The one on the right, the Live Script. So a Live Script is nothing else than a digital notebook.
It enables you to take MATLAB code and other related outputs together, and this related output can be text explaining what you did, graphics, equations, and combine them into one document. You've got user control, so that the user-- UI controls, so that the user can interact with your code by changing some parameters. And you can export this into a bunch of different formats, so you can share easily. And I find this really useful when you want to communicate your research. So I did this a little bit in lab meetings, where people keep asking, what exactly did you do? How do you normalize? Collaborators can be great too. It allows others to actually interact with your code.
And sometimes when you need to upload your code, documentation, and results in one document, that can come in handy. So now I'm going to show you how to come up with a Live Script. It's a short video. So just to introduce my MATLAB Workspace, this is my Editor, Live Editor in this case. And this is my Workspace, just so you're familiar with how I have my IDE organized. OK. So to start a new Live Script, go to the Home tab and you can just click on New Live Script. There's more than one way to do this. And once you do that, you'll have a Live Script.
Just a second, I'll get rid of these. Live Scripts are not .m files but .mlx files. And then you can go to the Live Editor, which is the toolbar with all the tools to make your Live Script. So first of all, you can go to Text, and you can choose a heading. You can give your Live Script a heading, so create a simple script. And here in this case, we are going to make a very simple sine wave, nothing fancy. And you have some text explaining.
It'll take amplitude and frequency and construct a sine wave. And then, you go and click on Code, and that inserts a little code block. And there you can type in your code, which is basically an amplitude and frequency variable, a time axis, and plot the sine wave. Go ahead and click on Run. It gives you the output, and you can have this output on the side. You can also have it inline, like I showed in the previous example. Now comes the interactive part of it.
Imagine someone wants to change around the frequency of your sine wave. So they can obviously go change this value and rerun the code. But a much easier way, a much more useful way of doing it, is going to Control and replacing it with a UI control, in this case a numeric slider. So now, you set the limits, 0 and 20, and by moving the slider along, the user can choose any frequency between 0 and 20 Hertz, and the output automatically updates. So it becomes quite interactive. Now we're going to do something else.
We are going to do some standard data pre-processing tasks. We're going to find maxima, local maxima. You can use any other thing, but this is just a standard one. To do these kinds of tasks, go to the Task button. And you'll have a bunch of pre-programmed plug and play task modules, and we select Find Local Extrema from this palette and we just insert it here. And what that does is give you a GUI. And you can just choose your signal, your input signal, from one of your Workspace variables, which would be y. And that immediately takes it and outputs the result of this maxima detection.
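For readers following along without the video, here is a minimal sketch of the computation the demo builds up: a sine wave made from an amplitude and a frequency, followed by a simple local-maxima search. It is written in plain Python (the language used in the engine demo later in this talk) rather than MATLAB, so it is not the code the Live Script or the Find Local Extrema task generates. The function names, the 100 Hz sample rate, and the 2-second duration are illustrative assumptions.

```python
import math


def sine_wave(amplitude, frequency_hz, duration_s=2.0, sample_rate_hz=100):
    """Return a time axis and a sine wave sampled on it.

    duration_s and sample_rate_hz are assumed values chosen for illustration.
    """
    n_samples = int(duration_s * sample_rate_hz) + 1
    t = [i / sample_rate_hz for i in range(n_samples)]
    y = [amplitude * math.sin(2 * math.pi * frequency_hz * ti) for ti in t]
    return t, y


def local_maxima(y):
    """Return the indices of samples strictly larger than both neighbours."""
    return [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] > y[i + 1]]


# frequency_hz plays the role of the slider value from the demo (0 to 20 Hz).
t, y = sine_wave(amplitude=2.0, frequency_hz=1.0)
peaks = local_maxima(y)
peak_times = [t[i] for i in peaks]
```

Calling sine_wave again with a different frequency and re-running local_maxima mimics what happens when the slider is moved and the output updates.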
And this comes back to my point about transparency. At every point, you can go to those three dots, and you can look at the code behind this pre-programmed GUI. So you can learn how this was done and you can do it yourself. So there you have it. That's how you put together a Live Script. You can put in equations, links, and you can even put in a table of contents. And it's like a digital notebook, and you can go ahead and share this with your collaborators. Now, if you're interested in more Live Scripts, we've got a Live Script Gallery from MathWorks. I've shown you the link on the slide. It's got a bunch of Live Scripts from different disciplines.
So if you're interested, go ahead and take a look. And a neat thing about the Live Script Gallery is you can actually open these Live Scripts online in your web browser-- you don't need a license for this-- and run them, and download the code from File Exchange. OK. Next, let's talk about developing software effectively. You can use Source Control. And what Source Control does-- you've got a working copy on your desktop, on your local machine, and you've got another copy, the repository, either on a server or on the cloud somewhere. And at every stage, your local copy, when you say so, is synchronized with this repository, with the result that any collaborator can just download the most updated version, the synchronized version of your code, from this remote repository.
So it keeps track of changes, and it helps you collaborate efficiently. And two of the most common source control systems are Git and Subversion; Git is the one behind GitHub. So now you can integrate Git and SVN with your MATLAB code. You can access them from the IDE, as well as from the command line by typing an exclamation point and then git (!git). OK. How do you do that? So first, from the IDE, if you go to your folder view-- so you go to Home, Layout, Show Current Folder, and you right click. You see Source Control in the menu.
And then you go and click on Manage Files. You do this before you start a new project. And then you get this pop up, and you can put in your repository path. That can be a GitHub address. It can be local, inside your lab. It can be a location on your local server. And what will happen from that point onwards is that this folder will be under Source Control. And whenever you do something, you can go into the-- you can do it from this menu, but you can also go into the command line and just type, git add, which adds all your changes. Git commit, which commits them with a specific comment from you.
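The add and commit steps just described, together with the push that comes next, can be sketched as a small script. This is only an illustration in Python using the standard subprocess module, assuming git is installed and the folder is already a repository with a remote configured. The function name publish_changes and its injectable run parameter are hypothetical conveniences, not part of MATLAB or Git.

```python
import subprocess


def publish_changes(message, repo_dir=".", run=subprocess.run):
    """Stage, commit, and push all changes in repo_dir (hypothetical helper).

    run is injectable so the command sequence can be inspected without
    touching a real repository; by default it is subprocess.run.
    """
    commands = [
        ["git", "add", "--all"],           # stage every change in the folder
        ["git", "commit", "-m", message],  # record them with your comment
        ["git", "push"],                   # send the commits to the remote
    ]
    for command in commands:
        # check=True raises CalledProcessError if a step fails, for example
        # when there is nothing new to commit.
        run(command, cwd=repo_dir, check=True)
    return commands
```

From the MATLAB command window the same three git commands can be typed directly, each prefixed with an exclamation point as mentioned above.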
And then git push, and what that does is take all of your committed changes and send them to the remote repository. So now, let's talk a little bit more about sharing tools and software, using projects and toolboxes, sharing software as a project or toolbox. So what a project does is it packages all the dependencies with your code inside that project. So you can try this out. You can try out an example, matlab.project.example.timesTable. If you type that into a command window, then you should get the MATLAB Project toolbar to pop up.
And in that Project toolbar, you see a dependency analyzer. And if you click that dependency analyzer, you see such a graph. And what this graph shows you is which pieces of code in that package are dependent on which other pieces of code. And on the side, you even get a list of all of the products, all of the MATLAB toolboxes, that you need to run this package.
So you can check if you have everything. If you want to share a project, then a good way to share it is by using a toolbox. What a toolbox does is it takes everything, and it spits out a single installer file (a .mltbx file), so that the user can just install your entire package using that file, pretty much like you do with any program.
And to do that, it's pretty simple. You go on to the Add-Ons button, and you click on Package Toolbox, and then that'll take you through a very easy set of click steps that will spit out the toolbox at the end. So now, you've developed your code effectively using Git source control with your collaborators. You've made it into a project. Where do you share it?
Now, there are a number of platforms to share. One platform that maybe you don't know about is MATLAB File Exchange. File Exchange is one part of MATLAB Central. File Exchange is a platform for repositories, code repositories. You can get, share, and reuse tools for free.
There are about 40,000 repositories available at the moment. These are authored by experts in the field with many thousands of downloads. This is what File Exchange looks like. So each of these are toolboxes that users have uploaded on the site. You have a nice search bar by discipline and subdiscipline.
So you can hone in on maybe the tools in your specific area of research. And you can also filter tools by type, so Simulink models versus toolboxes versus apps. And many of you probably share your toolboxes on GitHub. And if you do that, you can actually link them to File Exchange now without having to double the work.
You can simply link your GitHub repository on File Exchange. Users can download using this link from your GitHub repository, and it just gives you more visibility for your toolbox. So I strongly encourage the toolbox authors out there to go and try out File Exchange. OK. Now let's talk about connecting with non-MATLAB users. And this is really about sharing MATLAB. And let's face it, not everybody has a license.
MATLAB is not the tool of choice for everyone. So what do you do when you're working on a project and you have collaborators who don't use MATLAB? So the first scenario is you develop software, or you are the center of the project, and you have a bunch of other people who depend on your code, or who want to use your code, and they don't have a MATLAB license.
There are a number of possibilities, and it depends a bit on the time frame of this sharing. So the first is short term, 30 days. If somebody somewhere doesn't have a MATLAB license, and simply needs to run your code on their data once, then I think the easiest way to do that would be to ask them to simply download a free MATLAB trial, 30 days, and run the code. OK.
However, this is of limited use. So the most common scenario is an extended project, where several universities together collaborate on an academic project. And what you can do now for academic projects is you can share your license. And this is open access basically for educational institutions.
And this is called third party access. What happens is you, as the owner of that license, from a university that has that license or having an individual license, you can open up. You can give your collaborators credentials, and then they can access your license and use MATLAB. The only thing you have to be careful about is that MATLAB use is on your hardware. So you have to give them some sort of access to your hardware. Or they could be visiting professors at your site. But that enables a lot of flexibility when working on these collaborative projects.
And the third option is long term. Say you've developed an app or you've developed an analysis, and you want this to be accessible for everyone for a long time. One of the things you could try out there are MATLAB Web Apps. It's very simple. You go to App Designer. And then you just simply convert your MATLAB App into a Web App by clicking on Share and clicking on Web App.
And what that does is it converts your app into an app that can be hosted on a web server and everybody, irrespective of whether they have a license, can dial in the web address and interact with your server. Now let's talk about connecting with non-MATLAB users in a slightly different context, which is users with other software.
So the earlier icon I showed you about you being in the center and others not having licenses is one scenario. A more common scenario when you develop software in a project is this icon here, where everybody is working together. And somebody has MATLAB, somebody uses Python, somebody uses Java, and somebody uses C++. So the good news here is that MATLAB has integration with a bunch of other languages.
You can call other language libraries from MATLAB, including Python. You can also do it the other way around. So you can call MATLAB from another language. And again, there's a bunch of different languages supported.
I should mention here that the RESTful API, for web services and reading data off the web, is quite useful if you're accessing an online database. To share MATLAB on Python, there's actually two ways you can do this. You can directly interface with MATLAB, something I'm going to show you. You can also take your MATLAB code and pack it as a Python package and make it available within Python.
So this is a demo to show you how to access MATLAB from within Python. So here, I have my MATLAB workspace with one variable y, my MATLAB command window. And this is Windows PowerShell, but you can use any other command line interface or Python IDE.
So let's get into Python. And then the first thing you should do is to type in import matlab.engine, which imports the MATLAB engine. So you can use MATLAB functionality from within Python. Then let's call matlab.engine.start_matlab. And let's use eng, a variable, as an instance of this class. So now you have eng, and you can call MATLAB functions from eng.
So let's call a MATLAB function. Let's call randn. And let's give it normal input like you would do in MATLAB. And if you do that, then you see that the data type is a MATLAB data type, double. Now let's quit this command. I'm going to show you how to connect to an ongoing MATLAB session.
To do that, in your command window, type in matlab.engine.shareEngine. And what you're doing is you're sharing your current MATLAB session-- and you give it a name-- with Python. And now from Python, you can access the session, including its workspace. So now instead of saying, start_matlab, let's go and say, connect_matlab.
And let's give it the name of the session that we want to connect to, which was my session. And when we do that, we will be connected to this particular session. So now we can access variables in the workspace. And we do that using eng again. And we use the method workspace to access the workspace.
And within square brackets, you can type in the name of the variable you want to access, y in this case. And let's put that in a variable in Python. And now we can go ahead and play with it. You can even plot in this way, using MATLAB plotting features from Python.
So you can call figure using this method, just like you normally would. So you give it a name. Let's call this CalledFromPython, so we know this figure was actually called from Python. And if you do that, you get a MATLAB window. And you can start plotting in this window.
So let's plot our variable y in this window, using standard MATLAB plotting commands, so color and line style. And there you go: you've actually done some MATLAB plotting from Python. So this is just the tip of the iceberg. There's a bunch of other functionality.
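The two halves of the demo can be summarized in a short sketch. It assumes MATLAB and the MATLAB Engine API for Python are installed, and for the second half that the running MATLAB session was shared beforehand with matlab.engine.shareEngine. The helper function names, the "r--" line specification, and the session name used in the usage comment are illustrative assumptions, not what was typed in the video.

```python
def load_matlab_engine():
    """Import the engine lazily so this file loads even without MATLAB."""
    import matlab.engine
    return matlab.engine


def sample_from_new_session(engine_module, n=3):
    """Start a fresh MATLAB session, call randn, then shut the session down."""
    eng = engine_module.start_matlab()
    try:
        # Pass a float so MATLAB receives a double, as it would from randn(3).
        return eng.randn(float(n))
    finally:
        eng.quit()


def plot_from_shared_session(engine_module, session_name, var_name="y"):
    """Connect to a shared session, read a workspace variable, and plot it."""
    eng = engine_module.connect_matlab(session_name)
    values = eng.workspace[var_name]
    # nargout=0 tells the engine not to expect a return value from MATLAB.
    eng.figure("Name", "CalledFromPython", nargout=0)
    eng.plot(values, "r--", nargout=0)  # red dashed line: color and line style
    return eng, values


# Usage, on a machine with MATLAB (the session name here is hypothetical):
#   engine = load_matlab_engine()
#   matrix = sample_from_new_session(engine)
#   eng, y = plot_from_shared_session(engine, "MySession")
```

Passing the engine module in as an argument is a design choice made only so the functions can be exercised with a stand-in object when MATLAB is not available.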
This is a cheat sheet putting together some of the important things you should know when you do this kind of MATLAB Python integration. If you found it really useful, check it out. Check out the product pages on that. There's a lot more information as well as videos. OK. Let's go now a little bit more into working with shared data and reuse-- reproducibility of research.
So first let's talk about creating portable software capsules, MATLAB in a Docker container. So the fundamental problem is, no matter how well you write your code, you often get this situation that at some point, many years later, when you want to reuse your code, you figure out your code doesn't work.
And this is a big issue in research, the reproducibility crisis and all of this. And then you could have other issues. Your old version might not be supported, maybe your operating system changed, the DLL is missing, or dependencies are missing. And what you can do in that case is you can use a Docker container.
For those of you who don't know this, a Docker container is essentially a capsule or a box, in which your code, with the data it needs to run, with all of the dependencies, even some aspects of the operating system are packed inside this box. And you can just download this box, this container, from Docker Hub, and you can fire it up, and it'll run. It'll take some resources from your host operating system, and it will run. And the nice thing about that is now you have a portable container that runs anywhere that you go. And it's portable, it's reproducible, and reusable. And now it comes with MATLAB in it.
So if you want to have MATLAB in a Docker container, the link below takes you to the GitHub repository with the architecture to construct your own Dockerfile. And if you're interested in seeing how MATLAB and Docker run online, I'm going to share an example with you from Code Ocean. Code Ocean is a science gateway. They encourage you, once you have a publication, to upload the code that is associated with that publication, with the idea that they're going to pack it in a container for you, and keep it on their website, and anyone can go and spin that container up, and run that code. So it's trying to tackle the reproducibility challenge.
So if you go to Code Ocean's website and you go to Explore, you see a bunch of these Docker containers, or capsules as they call them, with subject area, and often the publication associated with it. And here you see the different capsules. You see some are C++, some are MATLAB, and some are Python. And if you click on one of these capsules, then it takes you to an IDE for MATLAB on Code Ocean. And on the left, you have the MATLAB-- sorry this was a bit too fast. On the left here, you have the MATLAB m files. And this is the editor.
And on the right here, you have this thing called Reproducible Run. And if you click on Reproducible Run, then it fires up this container, and it spins up this container. And now you're going to see what happens. Once the container spins up, you're going to see MATLAB come in the command line.
And then it spits out the results of that particular computation. So it's actually very nice to have code in a reproducible fashion. Of course, if you don't want to run it online, the container, you can just simply download all of those files on your local machine and play with it. OK. So that's the idea of MATLAB in a Docker container. That's a practical example.
Now, I'd like to briefly touch upon big compute and big data. In many, many fields of research, data is increasing exponentially. 20 years ago, when I started my PhD, in neuroscience, we were recording from one electrode, maybe a few electrodes. People today have recording chips that have several hundred electrodes, and they put several of these chips in the brain.
And I'm sure this is true of many other fields as well. So as data becomes big, data handling becomes an issue. And even when we work on our local lab workstation, we are using more compute resources. We may be using a multi-core CPU, or we are using a GPU.
And many of us are using facilities like a cluster. It can be on-prem. It can be a supercomputing center or an HPC center. It can be one of the clouds. It can be public clouds, like AWS and Azure. And we deal with big data. And what you can do with MATLAB Parallel Computing Toolbox and MATLAB Parallel Server is you have a way of easily scaling up your code for parallelization.
So there's a couple of changes you need to make. You don't need to recode it and rewrite your code again for deployment on a cluster. A couple of points and clicks and you can actually deploy it. MATLAB Parallel Server, which works on the server side of this, requisitions the server resources in an intelligent way so that your code can run in an optimal fashion.
You can now access all of the available clusters that a particular service offers you. And you can do this with a few changes in code as I said. So if parallel computing is what you're interested in, we have a lot of information, including videos and demos on parallel computing, and you should check them out. Here are some links to get you started.
And now, I would like to talk about accessing MATLAB from your browser. So these days, it's becoming quite common that people do browser-based work. And we are all away from our workstations during the COVID crisis. So now, you can access MATLAB and even Simulink online through your browser via MATLAB and Simulink Online.
Any browser with an internet connection enables you to do that. Even if you don't have any installation, you can collaborate over your browser. You can synchronize your data with cloud storage called MATLAB Drive. You can even host your own MATLAB Online, which I will show you in a couple of slides. So here's how MATLAB Online works. You type in matlab.mathworks.com in the browser window.
Of course, you need to have a license to use MATLAB Online. And then you get the MATLAB IDE inside your browser. And you can do everything you can with normal MATLAB. You can play around with graphics, deploy apps, you can run Live Scripts. You can go on to your Add-Ons link and get access to File Exchange, and actually import File Exchange tools, and install them.
And you can put data from your local machine onto MATLAB Drive. And this is showing you that. The video now is showing MATLAB Drive. And then you can access MATLAB Drive from MATLAB Online and access your data. OK. So the last point I want to go over with you is having compute next to data on science gateways.
So what is a science gateway? Science gateways provide access to resources for science and engineering. And these resources can be data, it can be code, it can be teaching resources, or it can be compute. And essentially, the way it works is groups of scientists all log in via an online portal.
And this online portal then is a conduit, is an interface, to different kinds of resources. It can be data banks, data repositories, it can be computational resources, or it can be repos of software tools. And all of this is the science gateway.
Advantages are A, you don't have to lug around huge amounts of data with you; B, it's affordable, accessible; and C, it really helps you with sharing things easily, and reusing and reproducing research output. What you can do is if you have a favorite science gateway where you would like MATLAB to be on-- first of all, get in touch with us. We'll help you. And what the science gateway can do is it can become a hosting provider, which means it has all of the MATLAB resources, and it enables its users to use MATLAB.
And I'll give you an example of a science gateway that's actually doing that. HydroShare is the science gateway of CUAHSI, which is the North American association of hydrologists. These are people who study water. And this is HydroShare's website. And see, it's an online collaboration environment for sharing data, models, and code. Basically, if you go into HydroShare, you get a bunch of data. And this is one example of a data resource. It does a particular analysis on a particular data file. You can go to Open with, and you can choose MATLAB Online for your data resource.
And what that does is it spins up an instance of MATLAB Online in the browser, and you can look at the resource and the analysis in MATLAB. And in this case, if you look at the web address, it says, matlab-online.cuahsi.org. So in this case, this science gateway is actually hosting MATLAB Online.
And your science gateway could do that too. So if you think this is something that would be interesting for you, please get in touch with us. OK. So in summary, I've gone through all of these different-- I brought this up in the beginning, to go to all of these different points. I have tried to cover them with offerings that are available to you that you can use optimally to do these kinds of things.
I hope there were some things in this list that you found useful, found new. And before we end, as a last slide, I would just like to say where I see in my vision your life or your research being affected by all of these tools. I think these help you collaborate globally. And I think the way I see-- I envision your workflow is it enables you to engage in global collaboration.
And there are several parts of this workflow where these tools can come in handy. You can create transparent code, share it, use it on File Exchange, do all of that stuff, a lot of features there. Institution-wide licensing helps cover a lot of institutions, so you can easily exchange information between people in these institutions.
Even if your collaborator doesn't actually have a license, flexible licensing means you can open up your license through third party access and share it with your collaborators. And you can even take your license with you when you access a remote facility. It can be a supercomputing center. It can be a science gateway. And finally, as you move around, MATLAB is accessible in more and more places, accessible on science gateways, accessible via the browser using MATLAB and Simulink Online. So hopefully, all of these things combined will make your life collaborating a little bit easier.