How Code Notebooks Enable Open Source Research
2024-3-6 23:34:28 Author: www.bellingcat.com

Bellingcat has also published a repository of open source notebooks, which you can find on our GitHub here.

The number of open source tools out there is growing rapidly, but technical bars to entry mean they remain inaccessible to many researchers. GitHub, a platform where developers share and discuss their code, is home to many of these tools. Searching the website for open source investigation tools can appear daunting to the uninitiated: there are more than six thousand results. Beyond this, many more of the platform’s over 300 million other projects, from social media scrapers to AI models, also have a useful application in open source research.

But even many experienced researchers don’t use these tools. A 2022 survey by Bellingcat found that 45 per cent of researchers can’t use these tools, and in total 75 per cent have never used them. The core issue is accessibility: most tools are code scripts and command line interfaces. There’s no user interface to install, no web page to go to. While we encourage researchers to learn the command line and also teach it at our workshops, some tools require setup, debugging, and coding knowledge that limit who can effectively use them.

If you are part of that 45 per cent or that 75 per cent, there is a way to unlock this world of open source tools for your own research: code Notebooks! These are widely known as Jupyter Notebooks.

Notebooks started off as a scientific tool, and they are still mostly used for data science and AI projects, but their application can be much broader. Simply put, they are files in the .ipynb format where you can store and test code. They allow you to run Python, a coding language known for its simplicity. This is important for our purposes as it’s also the most popular language for open source research tools.

They are run through interactive coding environments composed of sequential blocks (or cells), where each cell contains a piece of code or documentation about it. Usually a Notebook is accessed via a specific application on your computer. Under the hood, the Notebook connects to a computer or server, its running environment. This can be your personal computer, if you set it up accordingly.

But there’s a much easier way to familiarise yourself with Notebooks, and that’s using online services that can read, display, and run Notebooks. Some of these have a more accessible user interface and, crucially, require less knowledge of the command line. The most prominent of these is Google Colab, a browser tool which displays Notebook files no differently from a normal Google Docs or Sheets document, with an accessible interface to match. Notebooks can be organised in your Drive, shared with others, edited by multiple people and, most importantly, executed safely like in a virtual machine.

For example, the cell in the screenshot below from Google Colab contains some simple Python code. Pressing the play button in the top left-hand corner runs the code. Below the cell you can see the result of the code as executed on a remote server hosted by the Notebook service you’re using.
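As a rough illustration of what such a cell might hold, a few lines of Python are enough to produce visible output beneath the cell. The words and variable names here are our own placeholders, not taken from the screenshot.

```python
# A single code cell: build a short piece of text and print it.
# The words and variable names are illustrative placeholders.
words = ["open", "source", "research"]
title = " ".join(word.capitalize() for word in words)

# Whatever the cell prints appears directly below it in the Notebook.
print(title)
```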

There are also several alternative platforms like Kaggle or Binder. By using them, all you need to run a Notebook is the Notebook file itself (a file ending in .ipynb), while the running environment is hosted on a remote server.

Why are Notebooks useful?

Notebooks can hugely simplify experimentation with coding tools, scripts and data analysis. They offer the following advantages:

1. Accessibility: You don’t need to know how to code to run a Notebook. It’s a simple click, observe, and scroll experience. All you need is a browser and an internet connection, and you can even run them on your mobile devices.
2. Security: When running a Notebook on Google Colab you do so in an isolated environment. Even if you download and execute a malicious piece of code, its impact is limited to the information you have in the Notebook and does not reach your local computer, much like a virtual machine.
3. Replicability: Many tools are built and tested on a limited number of operating systems and software versions. When installing and using them it’s not uncommon to get errors that are specific to your system because it was not part of the original tests. Online Notebook platforms give you standard environments, ensuring that what “works on my machine” will actually work on yours too.
4. Readability: Notebooks look and feel like a text document. Good Notebooks add not just “code” cells but also “text” cells with rich markdown explanations of the code. They can be self-documenting. In fact, Notebooks can be exported to PDF format and shared as a static product so you can assess the code without needing to run it.
5. Customisation: With minimal changes to the Notebook you can tailor its functions specifically for your investigations. When instructing a Notebook to install a tool to download a YouTube video, you can add a code cell where the video URL is specified. If you wanted to download a different video you’d only need to change the URL in that cell and hit run. Most tools and methods can be replicated by simple input changes such as the hashtag you search for, the website you’re inspecting, or the dates you want to limit the results to.
6. Flexibility: Although Notebooks were originally designed to run only code, the “code” cells can actually be used to run general command line instructions. This means a Notebook becomes like a virtual machine, allowing you to install a program, create a folder, download files, zip a folder and so on.
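The customisation and flexibility points can be sketched in a single cell. This is an illustration rather than code from Bellingcat’s Notebooks: `VIDEO_URL` and `build_download_command` are hypothetical names, the URL is a placeholder, and the cell only assembles the yt-dlp command without running it.

```python
import shlex

# Input cell: the only value you would edit to fetch a different video.
# Placeholder URL, not a real video.
VIDEO_URL = "https://www.youtube.com/watch?v=EXAMPLE_ID"


def build_download_command(url, output_template="downloads/%(title)s.%(ext)s"):
    """Return the yt-dlp command for one video as a list of arguments."""
    # -o sets yt-dlp's output filename template.
    return ["yt-dlp", "-o", output_template, url]


command = build_download_command(VIDEO_URL)

# Show the command as it would be typed on a command line.
print(shlex.join(command))

# In a Notebook cell, the same instruction can be run directly by
# prefixing it with "!", for example:  !yt-dlp {VIDEO_URL}
```

Changing `VIDEO_URL` and re-running the cell is all it takes to target a different video.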

Interacting With Notebooks

You can explore our sample Notebook on Google Colab and run the code in its cells one at a time. Each cell is a self-contained block that instructs the running environment to perform an action.

Cells can be executed many times and in any order, but ideally a Notebook should be consistent. What you do with code in cells has a cumulative effect on the virtual environment that runs it. For example, if you run a cell that installs a tool needed to download YouTube videos (like yt-dlp), the next cell can download a video, but not if you skip the installation cell.

Schematic of a Notebook with two code cells. If you run Code cell 2 before Code cell 1, you will get an error saying that yt-dlp is not installed.
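One way to make this dependency explicit is to have the download cell check for the tool before using it. The sketch below assumes yt-dlp is exposed as a command on the environment’s PATH once it is installed; `tool_is_installed` is a hypothetical helper name.

```python
import shutil


def tool_is_installed(name):
    """Return True if a command called `name` can be found on the PATH."""
    return shutil.which(name) is not None


# Guard at the top of "Code cell 2": only proceed if "Code cell 1" has run.
if tool_is_installed("yt-dlp"):
    message = "yt-dlp found: the download cell can run."
else:
    message = "yt-dlp not found: run the installation cell first."

print(message)
```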

In the sample Notebook linked to above, we show you how to run a simple Python program and how to provide input so the Notebook reacts to your needs. You’ll learn the difference between Python and the command line, and how to download a file to your Notebook and then to your own device.
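The last step, moving a file from the Notebook’s environment to your own device, might look roughly like the sketch below. It is not taken from the sample Notebook: the file name `results.txt` is hypothetical, and the download call relies on the `google.colab` helper module, which is only available inside Google Colab, so the code falls back to printing the file’s location anywhere else.

```python
from pathlib import Path

# Step 1: create a small file inside the Notebook's running environment.
output_path = Path("results.txt")  # hypothetical file name
output_path.write_text("Example output saved by a Notebook cell.\n", encoding="utf-8")

# Step 2: hand the file to the browser so it lands on your own device.
try:
    from google.colab import files  # only importable inside Google Colab
except ImportError:
    files = None

if files is not None:
    files.download(str(output_path))
else:
    print(f"Not running in Colab; the file is at {output_path.resolve()}")
```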

Open Source Research Notebooks

In our GitHub repository, you will find an updated list of Notebooks that help you run both Bellingcat and community tools. One example is Bellingcat’s telegram-phone-number-checker, a command line tool to find Telegram accounts from a phone number. While that tool is relatively simple to install on your computer, you will need to check if it is compatible with your version of Python or your operating system. Once again, the advantage of using our Notebook in Google Colab is that you don’t need to worry about all that. All you need are valid Telegram API keys, which you can get online.

Our hope is that this repository expands to include more Notebooks for useful tools and methods that lack a visual interface. In fact, we encourage you to give us feedback on GitHub if there’s a popular tool you’d like to see covered, and also to list your own Notebooks so others in the open source investigations community can benefit from them.

Much like the open source tools themselves, Notebooks may be new to many researchers. However, they provide so much out-of-the-box convenience that learning to use them is time well spent: the tools you’ll be able to use may just unlock your next investigation.




Source: https://www.bellingcat.com/resources/2024/03/06/how-code-notebooks-enable-open-source-research/