It shouldn’t come as any surprise that I think Machine Learning/Artificial Intelligence (and, more generally, data analytics) is one of the greatest things to arise in the last twenty years. In this period we have seen rapid developments in autonomous robots and vehicles, healthcare, research and Generative AI. The last of these in particular has launched AI into the public conversation.
While it is a good thing that the wider public is discussing, debating and using AI, a problem follows: as these systems produce more and more content, hallucinations proliferate with it, calling the validity of all content into question. For me, the first step in addressing this is transparency. We need to know what people are using AI for, and how they are using it, so that we can critically evaluate the output they produce.
To this end, this blog post is about how I use AI in my life and why I think it works well for these use cases. I will keep this page up to date as my workflows, tools and approaches change.
Coding
Using GenAI for coding has massive potential to increase the quantity of code written; however, this speed comes at the cost of code quality. Studies have shown that AI-generated code exerts downward pressure on code quality, evidenced by more frequent rewrites and duplicated code blocks.
To mitigate this, I limit AI-written code to small, purposeful functions and boilerplate. This means that I remain responsible for the larger decisions of a project, which are usually architectural and design-oriented. To reinforce this, I have also started ensuring that all my code, especially code written by AI, has a wide range of tests that I have written myself to ensure quality. Furthermore, if I am using a library for the first time, I will try to learn the basics from the documentation and write the majority of the code myself. Ultimately, continuous learning is the core benefit.
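To make this concrete, here is a hypothetical example of the pattern: a small, AI-drafted helper function paired with hand-written assertions. The function and its cases are invented for illustration, not taken from a real project.

```python
# Hypothetical example: an AI-drafted helper plus tests I write myself.
def normalise_scores(scores):
    """Scale a list of numbers to the 0-1 range (AI-drafted)."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        # Avoid division by zero when all values are identical.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

# Hand-written tests pin down the behaviour I actually want,
# including the edge cases a model tends to gloss over.
assert normalise_scores([2, 4, 6]) == [0.0, 0.5, 1.0]
assert normalise_scores([5, 5]) == [0.0, 0.0]
assert normalise_scores([-1, 1]) == [0.0, 1.0]
```

Writing the tests myself forces me to think through the edge cases, which is exactly the understanding I don't want to outsource.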
When coding “by hand” I find myself not wanting to break out of flow state, and will use GenAI to quickly show me the arguments and methods of a function or class. I find this much faster than searching through documentation or Stack Overflow. Aside from the speed, I can also provide more context to help find the right solution, and I use the opportunity to learn more about the library by asking follow-up questions. My only real policy is this: question every output, remain curious, and leverage GenAI as a catalyst for deeper learning.
Collecting data from the internet often requires parsing large HTML, JSON or XML documents. All of these formats require additional libraries and code to navigate trees or find tags, which can be a lengthy process. Using GenAI to process these files and write the parsing code speeds this up considerably. For example, I have recently been scraping BoardGameGeek’s website and API to create a dataset for Kaggle, which involves both HTML and XML. GenAI has made writing this code much easier: I only need to provide a single sample file, some context about what I am looking for, and the library I want to use, and I get a functional code foundation to build on.
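The kind of foundation I mean looks something like the sketch below. The XML snippet is a hypothetical sample in the general shape of a BoardGameGeek-style API response (the real responses contain many more fields), parsed with Python's standard-library `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a BGG-style XML API response.
sample = """<items>
  <item id="13" type="boardgame">
    <name type="primary" value="Catan"/>
    <yearpublished value="1995"/>
  </item>
</items>"""

def parse_items(xml_text):
    """Extract (id, name, year) tuples from an <items> document."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("item"):
        # ElementTree supports simple XPath predicates like [@type='primary'].
        name = item.find("name[@type='primary']").get("value")
        year = int(item.find("yearpublished").get("value"))
        rows.append((item.get("id"), name, year))
    return rows

print(parse_items(sample))  # -> [('13', 'Catan', 1995)]
```

Given one sample file and a description like this, GenAI can produce the boilerplate in seconds; my job is then to check the field names against the real response and extend it.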
Learning from papers
Reading and understanding the techniques and methods in scientific papers is an essential part of being a data scientist. However, these papers are often lengthy and highly information-dense. To speed this process up, I use Google’s NotebookLM.
For those who don’t know, NotebookLM is a research tool from Google that lets you upload your own sources and then ask questions about them through a chat interface. Colleagues have previously asked me why I don’t just use ChatGPT or another LLM. In the past, I have used other LLMs for information-retrieval tasks and they have often missed the mark or made up content. Since NotebookLM uses RAG, it can cite the input sources, allowing you to validate a response by reading the exact segment of text it drew from.
Aside from interacting with source material, NotebookLM can generate podcasts and mind maps to help learning. The AI-generated podcasts are a personal favourite of mine, as I often end up generating these before I go for a run.
Writing
One’s first instinct might be that writing becomes much easier with GenAI: write out some notes, include them in a prompt, and get text back very quickly. Personally, getting an LLM to write large quantities of text is a practice I avoid.
The philosophical arguments about human connection and dead internet theory aside, LLMs are currently just not good writers. It’s slop. Since LLMs are predictors trained on data, they output what they have seen most in their training set, and since most people are not writing great works of literature, the output is unlikely to be great either. Evidence of this is their overuse of certain phrases, words and clichés. More anecdotally, I find that they never quite understand the aim, complexity or nuance of what I am trying to write.
Instead, I use LLMs for feedback on my writing. I find that LLMs are much better reviewers than writers. By using them as reviewers, and asking for comments on flow, structure and grammar, I can produce a much stronger piece of writing in a much shorter time frame. Before LLMs, I would read the same text many times before asking colleagues to review it when they had time.
Crucially, this approach preserves the human communication purpose of writing.
References
Code quality
Gitclear’s latest report indicates GenAI is having a negative impact on code quality by Rob Bowley (17th Feb 2025)
AI is eroding code quality states new in-depth report by Tim Anderson for DevClass (20th Feb 2025)
LLM writing
ChatGPT Most Overused Words & Expressions by Buchert Jean-marc (6th Aug 2025)
Typical AI Words and Phrases Used by LLMs by unknown author for ai-text-humanizer.com (date unknown)