I remember reading the excellent Beej's Guide to Network Programming[0] and Beej's Guide to Unix IPC[1] as a teenager, which were incredibly approachable while still having depth—fantastic reads both and very influential on the programmer I ended up being.
I remember translating Beej's network guide to Italian while learning how to use select, which I wanted to learn to make some port scanner ("grabb", I think?) go faster. Fun times.
Came here to see if it was the same person, though I felt very sure with the throwback web design - back when each page had its character, and you had to save the page for offline reading so that Dad wasn't pissed at the phone bill!
And when the code worked - it was validation against all the previous failures (and rejections) in life! Oh the joy of sending message from one computer to the other!
Same here! I was also a teenager in the mid-90s. And I was amazed by IRCd server code and bots. I bought a used copy of the book Slackware Linux Unleashed w/CD-ROM and it had some networking code examples in C. I found Beej's networking site because I was confused by a lot of that networking code. Became even more obsessed and went down a deep rabbit hole. I spent a lot of time visiting different book stores hoping they had programming books. Bought Richard Stevens' amazing reference books and never looked back. Thanks for enabling my passion all these years later Beej!
Not wrong, but since you’re mentioning vim in the context of git, might be worth adding :cq as a way to exit with a non-zero status to prevent git from finishing the commit / operation.
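For anyone who hasn't seen it, a minimal sketch of that flow (assuming vim is the configured commit editor):

    git commit     # git opens vim for the commit message
    :cq            # inside vim: quit with a non-zero exit code
    # git sees the editor fail and aborts instead of creating the commit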
Actual Beej? Wow I remember absolutely loving reading your networking guide. It taught me so much and really showed me the depths and breadths of what can be done in code, how the net works (pun unintended), it was a great experience for me as a kid. Thanks! <3
Just a quick shout-out; I was one of the many many students you taught at Lambda School, and just wanted to say your instruction was one of the highlights of my time there. Thanks for doing what you do!
Beej, your Guide to Network Programming helped me through my early UNIX career. In fact, your guide was so influential to so many people, it very quickly became recommended reading in my university's network course.
I'm delighted to see that you're still active and still producing guides. Well done!
Along with many others here, your network programming guide helped me so much back in the early days of my education and career. So thanks for that too…
I found your networking guide as a kid with only some scripting experience, and it served to get me into C programming in general, so I have a special fondness for it.
Appreciate the work! Neat to see you still writing pieces like this all these years later!
I didn't even know git switch existed, let alone git checkout was considered the old alternative. I feel old.
To be fair, I started learning git a little less than 10 years ago, but whoa, I can't express how it feels that someone learning git today will be confused about why I use git checkout. It's like using old-fashioned language.
More on topic, this guide would've been super useful when I was learning. It is really easy to follow and covers common FAQs.
I fondly remember being intimidated by my first merge conflict, aborting it and just doing some workarounds to prevent the conflict.
Git switch is fairly new, it first shipped in 2019.
Here's, respectively, a discussion from 2021, and a discussion from a few weeks ago. In the latter, it's brought up that `git switch` is still considered experimental by the docs:
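For anyone who hasn't made the jump yet, a rough before/after (Git 2.23 split checkout's jobs into switch and restore; branch and file names below are just placeholders):

    # older, overloaded form
    git checkout main              # switch branches
    git checkout -b feature/foo    # create a branch and switch to it

    # newer, narrower commands
    git switch main
    git switch -c feature/foo
    git restore path/to/file.c     # the "give me back that file" half of checkout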
Skimming over it, it looks like it's just been expanded out way more than what most guides would do. Like other guides would use a paragraph or two for what this one has spread over several sections with extra examples.
I think it's probably the opposite: Git has amassed a lot of complexity because it's been adapted into a tool that can satisfy the majority of requirements.
I've never found that I need to touch most of it in the 15 or so years I've been using it, but it's there if your project needs it.
Git was always confusing to use. There's a reason it has gained a "switch" command, and that's because the "checkout" command was confusing, while being there from the beginning.
Probably you've been using it for ten years or more at this point and have internalized it, but when it came out git felt way more confusing than other VCSs.
I've never been confused by git checkout. git checkout <branch> switches to a branch. git checkout <commit> switches to a commit in a detached head state. git checkout <file> switches just the content of a file. You can also combine these ofc but it all works pretty much as expected. The -b switch for creating new branches is something you need to look up once and then just remember - and it does make sense - you are switching to a new branch.
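In other words, something like this (branch, commit, and file names are placeholders):

    git checkout feature/foo     # branch: move HEAD to the branch
    git checkout 1a2b3c4         # commit: detached HEAD at that commit
    git checkout -- main.c       # file: overwrite the working copy from the index
    git checkout -b feature/bar  # create a new branch and switch to it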
Nope. It was initially built for the use-case most people will never have: multiple remotes that work on a project with multiple similar products each with its own set of differences from the others, with emphasis and dependency on Unix tools and text-only emails.
Most Git users will never have more than one remote per project, and so will only have a single product built from their source code. Probably wouldn't even know how to configure their mua to send text-only emails, in case that option is even available to them, and would struggle with basic Unix utilities like Vim and diff.
I don't know why Git won the VCS contest. But, I'm afraid, as with many such wins, there wasn't a clear rational reason why it should have won. It didn't make some of the obvious bad decisions which would disqualify it, but neither did a few others. My take on this is that communication tools naturally gravitate towards monopoly, so if one starts to win even slightly, the win becomes a landslide.
Because GitHub offered free git hosting, and heroku came along and offered free hosting that was pretty much point and go.
Combined, you all of a sudden went from needing a sysadmin and two servers (this was pre-containers), plus the sysadmin skills to operate SVN and your web app, to “it's now free and it auto-deploys when I commit”.
No. Git is a complex program, but version control is an inherently complex problem that requires powerful tools. There's a certain set of problems where, as a programmer, you're going to have to sit down and actually read the book.
The universe doesn't owe you an easy 10 minute video solution to everything, it's an annoying educational expectation that people seem to have developed. Some things are just that difficult and you have to learn them regardless.
No. Source control is not that complicated. Git is just bad. As an existence proof: Mercurial is much better and simpler.
I can teach someone who has never even heard of source control how to use Perforce in about 10 minutes. They will never shoot themselves in the foot and they will never lose work. There are certainly more advanced techniques that require additional training. But the basics are very easy.
Git makes even basic things difficult. And allows even experts to shoot their face off with a rocket launcher.
Git sucks. The best tool doesn't always win. If MercurialHub had been founded instead of GitHub we'd all be using a different tool. Alas.
How do I switch to a branch? (Note that you need to fetch before you switch. Also, switch is experimental, but it's not really.)
How do I undo a change and get it to other people on the team?
- follow-up: what happens if someone has made an unrelated change since?
- someone has committed an enormous change and I want the commit immediately after it but the enormous change doesn’t affect me. How do I get that single file without needing the mega change.
- someone has committed a mega change and pushed it to main, and it’s busted. How do I put my perfectly functioning feature in while i wait for the west coast to wake up and fix it?
I don't need an explanation on how to solve these issues, I am perfectly able to. But these are daily workflow issues that git requires you to use external tools, and change entire development processes, to work around. And trying to skirt them is a guaranteed one-way ticket to a busted workspace.
Messing up conflicts during a rebase, thinking I did it right, and then finalizing the rebase and losing work that accidentally disappeared. That's my most common mistake at least.
1. it’s possible to get into a bad state
2. it’s not clear what exactly that state is
3. it’s not clear how you got into that state
4. it’s not clear how to get out of it
I understand Git reasonably well. I know a good bit how it works under the hood. When I have a gitastrophe I rarely understand what I did wrong and how to avoid it in the future.
Here’s a recent post from a friend:
“ 0) clicked fetch all to make sure I had latest everything
1) right clicked on master and selected "merge master into branch"
2) made sure there were no merge errors
3) get on the master branch
4) clicked pull again because sometimes switching to branches doesn't work without it
5) right clicked on my branch and selected "merge branch into master"
6) clicked check in and push buttons
About an hour later, someone noticed that everyone's work from the past week was gone. I mean the checkins were still there in the graph, but all their code was basically gone in latest. And because my branch had many commits in it, apparently no one could just revert my merge and it took someone an hour to work out how to fix everything during which no one could touch git”
Somewhere along the way he didn’t do what he thought he did. No one could figure out what he actually did wrong. No lessons were learned from this Gitastrophe.
Well first, that's not git, that's some other GUI giving its own interface to git. The majority of the time my co-workers have a git problem, it's because their GUI tool has done something weird without telling them exactly what it did - one of theirs has a "Sync Branch" button which he'd click on whenever the IDE highlighted it, and I have no idea what that's even supposed to do, but I think it was some sort of rebase.
Without knowing for sure what was going on and whether your friend was describing it using the right verbs, I'm thinking (0) didn't pull in the changes ("fetch" instead of "pull") so (1) didn't merge in any new commits, but (4) did advance master, causing it to diverge from what was on the server. Then (6) probably decided to be helpful and did a force-push instead of a regular push so it wouldn't fail on the user. That would cause the server to have your friend's changes, but be missing anything that had been pushed to master after they started working on their branch.
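If that reading is right, the distinction boils down to something like this (a sketch of the underlying commands, not a reconstruction of whatever the GUI actually ran):

    git fetch origin            # updates origin/master; local master is untouched
    git pull origin master      # fetch + merge into the current branch

    git push origin master                      # refuses if the remote has commits you don't
    git push --force origin master              # overwrites them, which can lose others' work
    git push --force-with-lease origin master   # fails instead if the remote moved since your last fetch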
You've just described computers. It's possible to get into a bad state because git can't read your mind, and, at the end of the day, it is incumbent upon you, the programmer, to make the computer do what you want. That is our responsibility as practitioners.
You need to think about what you're actually trying to accomplish, and that requires having a mental model of how the tool works. And no, I don't mean under the hood, I mean stuff like "what does a rebase do?" and "how do branches work?"
The Git Book is a great resource for this. I recommend reading it and trying the examples over and over until they stick. I promise, git is not inscrutable.
And yet only git has these problems. I work with artists and designers - non technical people - all day. 10 minutes with p4v is all they need to be able to check in, update, roll back bad changes, and shelve to share with others, and even these people who manage to get their computers into the most unbelievable of states can do that without breaking their workspace and needing help.
While I agree with many points you make... let's keep Perforce out of it. The amount of damage that program has done to my source code, and the amount of pain caused by it in daily use, tells me that 10 minutes will not cut it.
Here's a simple example of how people shoot themselves in the foot with Perforce all the time: it makes files you edit read-only, and then you run with your pants on fire trying to figure out how to save your changes, because the system won't let you do that. And then you do something dumb, like copy the contents of the file you aren't able to save someplace else, and then try to incorporate those changes back once you've dealt with the file that Perforce wouldn't save. And then end up with a merge conflict just working with your one single change that you are trying to add to your branch.
I never regretted never having to touch Perforce ever again. Just no.
Why does git get a free pass for shitty defaults but perforce doesn’t? Perforce long predates git, and the checkout operation (which can be done on save with any modern editor) fixes that immediately.
You could have marked the file as not read-only and later reconciled. Or you could have checked the file out of Perforce. You would have had a merge conflict either way.
I mean I haven't even talked about how Git can't handle large files. And no, Git LFS doesn't count. And Git doesn't even pretend to have a solution to file locking.
I'm not saying Perforce is perfect. There's numerous things Git does better. But Perforce is exceedingly simple to teach someone. And it doesn't require a 193 page guide. I can teach artists and designers how to use Perforce and not lose their work. A senior engineer who is a Git expert can still shoot themselves in the foot and get into really nasty situations they may or may not be able to get out of.
There's a reason that like 105% of AAA game dev uses Perforce.
So, there's a difference between how common problems are, how many of them there are, and how hard it is to deal with them.
So, is it possible to deal with Perforce marking files read-only? -- Yes. And it's not complicated, but the whole idea that that's how the system should work is stupid. The problem is, however, exceptionally common. In my days working with Perforce, not a day went by without this problem rearing its ugly head.
So, maybe Perforce is scoring better on some metric, but in the day-to-day it generates so much hatred towards itself that I don't care if it can handle big files better than Git does. I only need to handle big files maybe a few times a year. And I prefer to get frustrated only two or three times a year rather than to be fuming every time I have to touch files in my project.
I appreciate your hatred of Perforce. But I think you've let your hatred blind yourself to the argument I actually made. In my original comment I made two arguments:
1. Git is complicated
2. Perforce is so simple to use that I can teach an artist or designer who has never even heard of source control how to use it in 10 minutes.
Then you came in and said the way Perforce handles read-only files is stupid. You know what, I agree! That's a solvable problem. If Perforce wasn't acquired by a private equity firm maybe they'd actually work to make it better. Alas.
This isn't about Git vs Perforce. I post in all of these Git HN threads because I desperately want people to realize that Git sucks ass and is actually really bad. WE COULD HAVE SOURCE CONTROL THAT DOESN'T SUCK. IT'S POSSIBLE. I'm not saying that Perforce is that solution. I'm saying that Mercurial and Perforce are existence proofs that certain things are possible. Source control doesn't have to be complicated! It doesn't have to have footguns! It doesn't need a 189 page guide!
I've been using Jujutsu a little and over the weekend lost a bunch of files. I'd been working happily in an anonymous branch. I had a bunch of content in a thirdparty folder that was hidden by .gitignore. I couldn't figure out how to merge my anonymous branch into master. Somehow wound up on the old master and it deleted all of my gitignored files. Then jj status got totally fubar and couldn't complete in under 5 minutes because something something 7500 files (everything that got deleted, it was compiler toolchains).
It was a disaster. Literally the most important thing for any VCS tool is to never ever delete files I don't want deleted. No more Jujutsu for me.
Someday someone will invent a VCS tool that doesn't suck. Today is not that day.
One of the modern curiosities is that people have somehow made using Git to be part of their identity. Dear HN user ‘gitgood’ who has a one hour old account, I suggest you take a step back and reevaluate things.
Git porcelain stuff's plenty good for probably 95% of users. `rebase -i` comes with a guide on which commands do what, and you could write a couple of paragraphs about how to format `git log`'s output with your own preferences and tradeoffs -- and porcelain usually includes stuff as eclectic as `git gc`, `git fsck`, and `git rev-parse` by most accounts.
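A couple of the log knobs that paragraph alludes to, as an example of the kind of preference/tradeoff involved (the format string is just one possibility):

    git log --graph --oneline --decorate --all
    git log --pretty=format:'%h %ad %an %s' --date=short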
Git plumbing's definitely a bit more obscure, and does a bunch of stuff on its own that you can't always easily do with porcelain commands because they're optimized for the common use cases.
TL;DR: while Git's big (huge even), a lot of what it provides is way off the beaten path for most devs.
not my experience - almost always some edge case leads me to a git rabbit hole
tldr: even if you never plan to use anything advanced, you’ll end up in some weird situation where you need to do something even if you’re in the “95% of the users”
no shade, yes ofc you “could this, could that” to make things work and we have been stuck with this for so long that an alternative doesn’t even seem plausible
I can't remember the last time I ended up in a weird situation, I stick to basic options with init, clone, fetch, checkout, branch, commit, rebase, remote, log, stash, cherry-pick, blame, config.
It did take maybe a year or so to develop the mental model of how the commands map to the underlying structure of commits, and another few years to avoid footguns (like always "push --force-with-lease").
So I think it is probably too complicated and would be happy to switch to a better alternative if one comes up, but what seems really implausible to me is going back to the bad old days of SVN.
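On the --force-with-lease habit, one low-effort way to make it the default reflex is an alias (the alias name here is arbitrary):

    git config --global alias.pushf 'push --force-with-lease'
    git pushf    # instead of reaching for plain --force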
Maybe you’re young, but git is better than all of the other shit before it.
Try to come up with something simpler than git, and you’ll end up with something like SVN or CVS that struggled with more than a couple of people working on the same files.
Try to make something that is more content aware, and you’ll find out how git got its name in the first place.
For some people. Back when I only knew how to use subversion, I tried out both git and mercurial, and found mercurial confusing while git clicked immediately.
Unfortunately it's been long enough I don't remember details why, just that it was something with how it handled branches.
You are using Git in a non-automated way, basically, as a substitute for rsync (you never edit history, you only append to it, you don't deal with a possibility of multiple remotes, you don't deal with modular projects and you don't automate anything).
At least, this is what it looks like from your own description. This is, probably, what most people do with it most of the time. And the weird corners will be found when they need to automate things, or when they want to deal with modular repositories, or history rewrites. These aren't everyday tasks / not for everyone on the team, but, these things do happen, especially in smaller teams, where there's no dedicated infra person or group these can be delegated to.
If the tool is designed to support the use case of the 1% with concessions for the other 99%, the tool is badly designed.
Git is designed for the case where you have multiple remotes with no central authority. Except that’s not how any project I’ve _ever_ worked on functions in reality. It makes sense for some applications, but if I say that I run Linux, there’s an assumption that I’m running something compiled from https://github.com/torvalds/linux - I.e. there is a central space.
I’ve used git and perforce in anger for a decade, in teams of 1 to 150+ (with a brief blip in the middle where I tried plasticscm which was a mistake), and I’ve been the “git guy” on teams during that time. If git’s defaults were tweaked for “one origin, probably authoritative” and it had better authentication support out of the box it would be a significantly better experience for 99% of people. Those 1% of people who are left over are going to customise their config anyway, so make them add the distributed-defaults=true flag and the rest of us can get on with our work.
I'm not totally sure what you mean by "non-automated" here, can you clarify? I have managed repos for small teams, that's actually the majority of my experience with it.
I do deal with multiple remotes quite often and haven't encountered issues. You're right about submodules, I avoid setting up projects with them, even at the expense of more manual work or complicated automation.
I'm definitely not using it as a substitute for rsync - I do prefer to put rules in place to avoid editing (shared) history, for obvious reasons.
Honestly, 99% of the pain of git is simply because people use it through the CLI. If you use tortoisegit or a visual tool, you don't need to worry about any of this because it's self-explanatory, and it becomes trivial to use.
Learning git like this is honestly just hampering yourself
I've seen tortoise users break their repo, struggle to understand the issue, and then push it through, making it everyone's problem. Git's language is screwed up, and you cannot unscrew it with a right-click GUI, because you're basically clicking some Latin-looking hieroglyphs that you don't understand either way.
I highly doubt tortoise or any tool can "break" a repo. This might be a sign that you don't understand git either. Now I'm sure it can lead to people who don't know what they're doing doing the wrong thing, but if they're allowed to push somewhere and make it someone else's problem, that's not their fault. They've been forced to use git, so there should be someone else who actually understands git.
I disagree. Version control is kind of a pain, you need to understand some of the underlying concepts or you'll break your git repo in spectacular ways.
The command line isn't that hard to use if you've ever used the command line before. Beginners trying to learn git and command line at the same time (which is very common) will get utterly confused, though, and for a lot of beginners that's the case. The only difficult part with git over the command line is fixing merge conflicts, I'd recommend anyone to use any IDE rather than doing that manually.
No IDE will be of any help for getting back to normal when you get into a detached HEAD state, which IDEs will gladly let you do if you click the right button.
Learning it like this makes one learn the concepts though and build something closer to an actual understanding. I have seen people struggle with understanding what git does or with making fine grained commits or mostly atomic commits a lot, especially GUI users, because many of them do not have the underlying concepts understood well enough.
The worst part about Git is the bad defaults. Second only to mismanaged storage. Or maybe being designed for the use-case most of its users will never have. Or maybe the horrible authentication mechanism. Or maybe the lack of a bug-tracker or any sensible feedback from its developers.
None of this can be helped by the GUI. In fact, aside from Magit, any sort of front-end to Git I've seen is hands down awful and teaches you to do the wrong thing, is usually very difficult to use efficiently, and mystifies how things actually work. But, even with Magit, I'd still advise getting familiar with the CLI and configuration files prior to using it: it makes it easier to understand what operations it is trying to improve.
I use the IntelliJ family of IDEs and the number of times I’ve had to reach for the cli for day to day use is incredibly close to 0. It handles GitHub auth, PR’s, branching and merging, rebase, and local branch switching pretty much effortlessly
My sense, bluntly, is that if people spent half the effort learning git that they do whining about it, no one would bother making a 30+ part guide just explaining stuff you could find in a man page.
Commits are snapshots of a tree. They have a list of ancestors (usually, but not always, just one). Tags are named pointers to a commit that don't change. Branches are named pointers to a commit that do change. The index is a tiny proto-commit still in progress that you "add" to before committing.
There. That's git. Want to know more? Don't read the guide, just google "how do I switch to a specific git commit without affecting my tree?", or "how do I commit only some of my changed files?", or "how do I copy this commit from another place into my current tree?".
The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter. Don't read guides.
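If it helps, the claim that these are all just pointers and snapshots is directly inspectable (branch, tag, and hash names below are placeholders; refs may live in .git/packed-refs rather than loose files):

    git cat-file -p HEAD           # a commit: tree, parent(s), author, message
    git cat-file -p 'HEAD^{tree}'  # the snapshot: a list of blobs and sub-trees
    cat .git/refs/heads/main       # a branch is a file containing a commit hash
    git rev-parse v1.0 HEAD        # tags and HEAD resolve to commit hashes too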
Commits are sets of files. They form a tree. A branch is a named location in this tree. The index aka staging area is a pre-commit that has no message. Workdir is just workdir, it doesn't go in the repo unless you stage it. HEAD is where the next commit will land its changes.
Do I understand git? Seems like yes. Let’s run a quiz then! Q? A.
How to make a branch? Git branch -a? Git checkout -b --new? Idk.
How to switch to a branch? Git switch <name>, but not sure what happens to a non-clean workdir. Better make a copy, probably. Also make sure the branch was fetched, or you may create a local branch with the same name.
How to revert a file in a workdir to HEAD? Oh, I know that, git restore <path>! Earlier it was something like git reset --hard, but dangerous wrt the workdir if you miss a filename, so you just download the file from git{hub,lab} and replace it in the workdir.
How to revert a file to what was staged? No idea.
How to return to a few commits back? Hmmm… git checkout <hash>, but then HEAD gets detached, I guess. So you can’t just commit further, you have to… idfk, honestly. Probably move main branch “pointer” to there, no idea how.
If you have b:main with some file, and b:br1 with it, and b:br2 with it, and git doesn't store patches, only files, then when you change b:main/file, then change and merge+resolve b:br1/file, then merge that into b:br2 to make it up-to-date, will these changes, when merged back into the already changed b:main, become conflicted? IOW, where does git keep track of the 3-way diff base for back-and-forth reactualization merges? How does rebase know that? Does it? I have no idea. Better make a copy and /usr/bin/diff [--ignore-pattern] the trees afterwards to make sure the changes were correct.
As demonstrated, knowing the base abstractions doesn’t make you know how to do things in git.
I don’t even disagree, just wanted to say fuck git, I guess. Read guides or not, google or reason, you’re screwed either way.
Facetiousness aside, the things you do often, you learn once and you don't really have to remember/think when doing them. Most of the esoteric operations are mostly unnecessary to burden yourself with until you actually have to do them, when you just read the documentation.
Literally every one of those questions can be trivially googled. (In previous generations and fora, this is where you'd be mocked with LMGTFY links). You just, to continue to embrace the frame, don't want to do the work.
If you insist on memorizing commands for all these tasks (of which there are many), indeed, you're going to struggle and decide you need a 30 section guide. But you don't, and want to whine about it.
> I don’t even disagree, just wanted to say fuck git, I guess.
Not the parent commenter. Git the version control system is superb, fast, robust, well-thought out. Git the CLI tool is by far one of the worst CLIs I have ever had the misfortune of using. I think the one-dimensional, string-y command-line massively complicates mental and reasoning models for a tool with a fundamental data structure—a tree—that is multi-dimensional.
A powerful Git GUI makes even moderately-complicated actions like cherry-picking, interactive rebasing, and even ref-logging absolutely trivial. In fact it was precisely said GUI tool that allowed me to develop an intuition for how Git worked internally; the CLI does no such thing.
> Literally every one of those questions can be trivially googled. (In previous generations and fora, this is where you'd be mocked with LMGTFY links). You just, to continue to embrace the frame, don't want to do the work.
I find this an odd statement. I mean, no, I don't want to do the work! Not if it isn't necessary in the first place.
Take staging (or the index, because another of Git's foibles is poor naming conventions that stick around and confuse newcomers). It's kind of a commit, right? In the sense that it's a snapshot of work that represents a change to the code. Except we can't really make it behave like a commit, we have to interact with it using special commands until we turn it into a commit. Are these special commands really doing much differently than we might do with another commit? Not really.
Or take stashes. Stashes are more like commits — they even appear in the reflog! But they also aren't real commits either, in the sense that you can't check them out, or rebase them, or manipulate them directly. Again, you need a bunch of extra commands for working with the stash, when really they're just free-standing anonymous commits.
Or take branches. Branches are, as everyone knows, pointers to commits. So why is it important whether I'm checking out a branch or a commit? Why is one of these normal, but the other produces hideous warnings and looks like it loses me data if I haven't learned what a reflog is yet? And if a branch is just a pointer, why can't I move it around freely? I can push it forwards, but I can't move it to another arbitrary commit without fiddling around a lot. Why?
Or take tags, which are like branches, but they don't move. Is "moves" and "doesn't move" such a deeply important distinction that Git needs branches and two different kinds of tag?
---
To be clear, I think Git is a good tool, and I agree that once you've started to familiarise yourself with it, it's not so complicated in day-to-day usage. Yes, you'll probably need to Google a few specific commands every so often, but the general UX of the tool has significantly improved since the early days, and it is getting better.
That said, I also don't like the idea of settling with Git just because it's good. If there are alternatives out there that can do everything that Git can do, but with a simpler conceptual model, then I want to try them! (And, spoiler alert, I think there are better alternatives out there — in particular, I think Jujutsu is just as powerful as Git, if not more so, while simplifying and removing unnecessarily duplicated concepts.)
> Commits are snapshots of a tree. They have a list of ancestors (usually, but not always, just one). Tags are named pointers to a commit that don't change. Branches are named pointers to a commit that do change. The index is a tiny proto-commit still in progress that you "add" to before committing.
This is about as useful as "A monad is just a monoid in the category of endofunctors."
It's basically a lot of words which make zero sense for a user starting to use git -- even if it happens to be the most succinct explanation once they've understood git.
> The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter.
You can't really learn the former -- you can't even see it till you've experienced it for a while. The typical user groks what it means after that experience. Correction, actually: the typical user simply gives up in abject frustration. The user who survived many months of using a tool they don't understand might finally be enlightened about the elegant conceptual model of git.
Pre-commit hooks aren't enforceable. People need to opt in to them, and the people who opt in to them are the people who will check for passwords before they commit.
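Right: hooks live in the unversioned .git/hooks directory, so the most a repo can do is ship them and ask everyone to opt in, e.g. (the .githooks path is just a convention):

    ls .git/hooks/                          # local only, never pushed or pulled
    git config core.hooksPath .githooks     # per-clone, still an opt-in step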
Or do read books and guides. But in an exploratory manner. So when you do have a need for a specific operation (which happens rarely) you have a mental map that can give you directions.
I think the trickiness with the simple abstraction is that you end up looking at a commit graph and thinking "I would like to make a couple new nodes in this in a very specific shape, but one that many people have likely done in the past. Is there a trick?"
Like so much of the porcelain is those kinds of tricks, and make otherwise tedious work much simpler.
Imagine if you didn't have interactive rebases! You could trudge through the work that is done in interactive rebases by hand, but there's stuff to help you with that specific workflow, because it is both complicated yet common.
I think jujutsu is a great layer over git precisely because you end up with much simpler answers to "how do I change up the commit graph", though.... the extra complication of splitting up changes from commits ends up making other stuff simpler IMO. But I still really appreciate git.
Sigh. Another git thread, another pile of posts telling me that if I would _just do the work_ to understand the underlying data structure I could finally allow myself to be swept up in the _overwhelming beauty_ of the something something something.
The evidence that the git UI is awful is _overwhelming_. Yes, yes, I’m sure the people that defend it are very very very very smart, and don’t own a TV, and only listen to albums of Halloween sounds from the 1950s and are happy to type the word “shrug“ and go on to tell us how they’ve always found git transparent and easy. The fact is that brilliant people struggle with git every single day, and would almost certainly be better served by something that makes more sense.
GP isn't describing the underlying data structures, they're describing the basic interface of commits, branches, and tags. The 101 stuff you have to learn regardless, for any version control, not just git. Dismissing it like this just sounds like someone who refuses to hold scissors by the handles.
You’re right, of course, and I apologize to GP for conflating what they were saying with what I, to be fair, do often see in these threads.
Like others in these comments, I can use it just fine right up until I can't. Then it's back to the many, many posts and questions and tutorials sprawled across the Internet to try and solve whatever the issue is. JJ has shown that a better chrome can be put over the underlying model, and it's frustrating to me that we are all collectively, apparently, expected to put up with a tool that generates so much confusion seemingly regardless of brilliance or expertise.
Which brilliant people, who have put in an appropriate amount of time into learning any (D)VCS, are struggling with having a day-to-day working knowledge of git? Can you point to some? "Brilliant" is of course a question of definition. But one of the defining qualities I would ascribe to a brilliant person is the ability to quickly grasp concepts and ideas and reason about them. That seems to me to be the core quality one needs to use git, since it requires one to have a mental model, whether actually correct (which I think few people have) or just close enough to be useful.
There are tools for the UI part. Most people I know only use command-line git for doing stuff where GUIs give up (i.e. fixing repos in weird states). Usually, checking out a clean clone and switching to that will accomplish the same thing without the command line; it just takes a bit longer than knowing the command-line fixes.
The issues most people seem to have with git are common version control issues. Version control is actually hard, even if it's just "what has changed", once you start going beyond two users editing a file at once. When three people edit a file at the same time, there's going to be complexity when those changes need to be applied back, and that's where you start getting into branching/merging/rebasing.
Just like some people simply cannot get functional programming/object oriented programming/imperative programming to click in their head, others will never truly grasp version control. It's a paradigm of its own. People who know lots of data structures like to trivialise version control into data structures ("it's just a list of ...") but the data structures are the chosen solution, not the problem.
Another complexity issue is that git is actually pretty smart, and will fix most problems automatically in the background. Often, when you need to manually operate on a git repo, you're in a situation where git doesn't know what to do either, and leaves it up to you as the expert to fix whatever is going on. And frankly, most people who use git are nowhere close to experts. The better Git tooling gets at fixing these situations for you, the worse your situation will be once you need to manually correct anything, and the worse your perception might get.
I have no good advice for you on how to work Git better. All I can say is that I'm very productive with Jetbrains' IDE integration, others seem to prefer Visual Studio Code's git integration, and then there's the Tortoise people. Find whatever tool works best for you and hope you'll have a random epiphany one day.
I don't struggle with git, and I can assure you, I am not brilliant. I do, however, refuse to give up when something seems hard, and I refuse to ask the computer to be easier for me. (Understandably, I started programming computers to make them do what I wanted them to do, not to sit and whine when they didn't.)
Pretty much, yeah. Just do the work. It's not nearly as hard as whatever it is you're committing into it, I promise. Continuing to mock it via florid metaphor doesn't help anyone at this point.
I'm always kind of aghast at the number of people who not only don't know git, but who cannot or will not learn it over years, or even decades.
Listen, I'm not that smart, and I managed to figure out how to solve even gnarly git issues one summer during an internship... 11 years ago? Ish? Now, I know git well, and not just "the three commands". I would be, honestly, so ashamed if it were a decade on and I still hadn't committed to learning this fundamental tool.
Version control is a hard problem, fundamentally, and a tool for experts will always take more effort to understand. I mean, aren't we supposed to be the software experts? If people can't learn git, I wouldn't trust them with the even harder parts of software development.
But this is a common attitude in industry now, unfortunately: a petulant demand for things to be easier, and for someone else to do the learning. Is it any wonder software today is so bad?
If people can't learn git, I wouldn't trust them with the even harder parts of software development.
This idea breaks under pressure. People have limited concentration, and the more you demand for the daily routine, the less there's left for the actual job. This argument only makes sense in a relaxed setting with lots of time and coffee breaks. But all these problems tend to happen on Friday evening, when you're expected to pick up your kids in an hour or something and this damn repo got broken again.
Yes, things should be easier. Cause you get what you get. If you want people who have no issues with git, feel free to enjoy the greatly reduced hiring pool and stop whining about someone not being able to juggle fifty things at once in their mind - focus on your hiring process and where to get the budget for inflated compensation instead.
Is it any wonder software today is so bad?
I remember the Delphi and VB days, when people - who were unable to understand or use CVS and SVN - made full-blown apps for real sectors, and it worked. Because it was easy. Nowadays all we have is important dudes with pseudo-deep knowledge of git, css, framework-of-the-month and a collection of playbooks, who cannot make a db-enabled hello username message box in less than a day. I don't think you're moving in the right direction at all with this. This paradigm is going further and further from good software, actually.
There is one fundamental piece missing in your description of git that I think is the main reason people don't understand it. You have described a single DAG, but in git there are multiple DAGs. This is what it means to be a distributed version control system.
In my experience people come to git and start using it with the centralised paradigm in their heads: that there is one repo and one DAG etc. They think that their master branch is the same as "the" master branch. You just can't get good at git with this wrong understanding.
I'll just chime in with congrats on the new book. I was a huge fan of the Network Programming book that I first read in 2013, and which I still consider as having the best balance of approachability and rigor. Looking forward to checking the new one out. :)
Hey Beej, can you talk about what tool you use to create your guides? I'm assuming something like pandoc is involved for supporting your various formats?
I'm decent with git (usual flow, merging, rebasing, etc). I'm seriously considering switching over to jujutsu instead of becoming "better" at Git. jj is compatible with git and you can use it while your teammates can also just use git.
I regularly conduct 2 hr long "Intro to the Git Data Model" courses at my workplace (1-2 times a year). I literally take them into the .git directory and unzip the files to show how everything is just plain text representation of basic data structures. It's honestly cool to see it click in their heads.
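For anyone curious what that exercise looks like, loose objects are just zlib-deflated files named by their hash; a rough sketch (hashes and paths are placeholders):

    ls .git/objects/        # two-character directories full of loose objects
    git cat-file -t <hash>  # its type: blob, tree, commit or tag
    git cat-file -p <hash>  # its decompressed contents
    # or decompress one by hand:
    python3 -c 'import sys,zlib; sys.stdout.buffer.write(zlib.decompress(open(sys.argv[1],"rb").read()))' .git/objects/<xx>/<rest-of-hash>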
We have a basic Git cookbook we share with any new joinees so that they start committing code, but most of them just follow it religiously and don't understand what's going on (unsurprisingly).
However, literally everyone who attends the course comes out with a reasonable working understanding of Git so that they know what's actually happening.
That does NOT mean that they know all the commands well, but those can be trivially Googled. As long as your mental model is right, the commands are not a big deal. And yet, the vast majority of the discussion on HN on every single Git post is about the command line.
Funnily enough the class sounds a lot like the alt text of https://xkcd.com/1597/ (Just think of branches as...), the difference is that that is unironically the right way to teach Git to a technical audience, and they will come out with a fundamental understanding of it that they will never forget.
I honestly think it's such a high ROI time investment that it's silly to not do it.
> As long as your mental model is right, the commands are not a big deal.
A priori, I would have assumed this was one of those "just understand how every layer and every part of Linux works, and using Linux is easy" type arguments people used to make in the 90s - i.e. theoretically true, practically infeasible for most people.
Thankfully, I was lucky enough to come across a video explaining (some of) the git internal model early on, and it really doesn't take that much or that deep a knowledge of the internals for it to make a big difference. I'd say I know maybe 5% of how git works, and that already gave me a much better understanding of what the commands do and how to use them.
I did it once, and it was indeed really nice, and the discussion we had afterwards was very cool. I put some questions on the last slide of the presentation for my colleagues to answer based on the Git data model, e.g.: "Can we move a commit to another branch?" or "What guarantees that we don't have cycles in the commit graph?" It was really satisfying that people came out thinking Git, not only using it!
This is precisely why it enrages me when all HN discussion about Git devolves to the same stuff about how it's complex and this and that.
A technical person who has general sense about basic data structures (Leetcode nonsense not needed) can be taught Git in under 2 hours and they will retain this knowledge forever.
If you can't invest that little time to learning a tool you will use everyday and instead will spend hours Googling and blindly copy-pasting Git commands, that's on you, not on Git.
> I regularly conduct 2 hr long "Intro to the Git Data Model" courses at my workplace (1-2 times a year)
Does the course material (and perhaps any recordings) have any proprietary information or constraints to prevent you from sharing it publicly? Is this based on something that’s publicly available yet concise enough to fit within two hours? If yes, please share (perhaps in this thread and as a post submission on HN).
I’m asking because I believe that there can never be enough variety of training materials that handle a topic with different assumptions, analogies, focus, etc.
Just discovered this all for the first time, and these guides are incredible! I've downloaded the Git and networking ones. The humour is the best. I wish all textbooks were filled with jokes...
I feel that the big problem with git is how it applies names to procedures that are MUCH easier to understand unnamed. You can have a model of the current repo state, and the state you wish to reach. Instead of just coding this difference on the data structure level, as imperative statements or functional expressions, we're forced to translate them into a sequence of weird names and flags representing conversions into intermediate states.
Love seeing this. Beej is one of the greatest in our industry. His educational content is top notch and always free...an increasingly rare thing in the age where everyone tries to monetize their knowledge via paid courses and newsletters.
I think the biggest problem with VCSs is the lack of consensus on what and how to push to the repo.
On one hand you have the ideal world scenario when each and every change is granular and you can annotate and blame every single line of code with description. On the other hand you have a real world where teams are encouraged to squash changes so that every commit corresponds to a business requirement and you have to engage a whole cabal to smuggle a refactor.
A long time ago I implemented a routine to use both SVN and Git, so that I could use Git on file save and SVN on feature release. I think it was inspired by the Eclipse workflow. Definitely not something I would recommend these days.
On the subject of going back in time, I'm finding myself getting more utility out of VS Code's timed snapshots than my own commits.
I find it hard to judge when things are in a good enough state to commit and especially good enough to have a title.
I might start writing a new function, decide that I want it to be a class, only to give up on the class and want to return to my almost complete function. Snapshots work pretty well for that, but git isn't really centered around snapshots, and doing good snapshots is not straightforward, at least to me.
When working together with other people using Git, I commit fast and often. My commit messages can be anything from "jdwqidqwd" to "add widget frubble() method" while I'm working. Sometimes repeated several times over, sometimes I remember to hit the "amend" checkbox. Basically, whenever I'm somewhat satisfied with the state of my program, I commit, finished or not. Everything in a nice, local, separate branch, pushed occasionally to make sure I don't lose any data.
And then when everything works, compress commits into a few big commits with squash, and actually try to merge that back into the main branch.
> I might start writing a new function, decide that I want it to be a class only to give up the class and wanting to return to my almost complete function.
For me, that would easily be three commits in my dev branch (one with a first implementation of the function, one with a refactor to a class, then another one back to a single function) and when the function is finished, one squashed commit in a merge request. If everything goes right, it's as if the class file was never there.
It has to be said, relying on squashing doesn't work well when you're working in a team that doesn't pay too close attention to merge requests (accidentally merging the many tiny commits). You also have to be careful not to squash over merges/use rebase wherever possible so your squashed commits don't become huge conflicts during merge trains.
When I work on my own stuff that I don't share, I don't bother squashing and just write tons of tiny commits. Half of them leave the code in a non-compiling state but I don't necessarily care, I use them as reference points before I try something that I'm not sure works.
There is something to be said for carefully picking commit points, though. While finding the source of a bug, git becomes incredibly powerful when you can work git bisect right, and for that you need a combination of granularity and precision. Every commit needs to have fully working code, but every commit should also only contain minimal changes. If you can find that balance, you can find the exact moment a bug was introduced in a program within minutes, even if that program is half a decade old. It rarely works perfectly, but when it does, it's a magical troubleshooting tool.
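For reference, the bisect loop being described goes roughly like this (the good tag and test script names are placeholders):

    git bisect start
    git bisect bad               # the current commit is broken
    git bisect good v2.3         # last known good point
    # git checks out a commit roughly in the middle; test it and report:
    git bisect good              # or: git bisect bad
    # repeat until git names the first bad commit, then clean up:
    git bisect reset
    # or automate it, if every commit builds and the bug has a test:
    git bisect run ./test.sh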
A commit is literally a snapshot :) It is also very easy to make.
Stop worrying about titles and content and commit to your heart’s content.
When ready, restructure those snapshots into a coherent story you want to tell others by squashing commits and giving the remaining ones proper titles and commit messages. I use interactive rebase for that, but there are probably other ways too.
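The interactive-rebase version of that cleanup looks something like this (the commit count is arbitrary):

    git rebase -i HEAD~5
    # in the editor, keep the first "pick" and change the rest to
    # "squash" (keep their messages) or "fixup" (drop them), then save
    # and write the final commit message when prompted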
> I find it hard to judge when things are in a good enough state to commit
Work in a feature branch. Commit often. Squash away the junk commits at the end.
> ...and especially good enough to have a title.
Who needs a title? It's perfectly fine to rapid-fire commits with no comment, to create quick save points as you work. Bind to a key in your editor.
I treat commits in a private branch the same as the undo log of the text editor. No one cares about the undo log of your editor as they never see it. The same should be true of your private feature branch commits. They are squashed away never to be seen by human eyes again.
I have nothing but fond memories of reading Beej's guides.
It's also this sort of work that's becoming less necessary with AI, for better or worse. This appears to be a crazy good guide, but I bet asking e.g. Claude to teach you about git (specific concepts or generate the whole guide outline and go wide on it) would be at least as good.
Seems more efficient to have one reference book rather than generating entire new 20 chapter books for every person.
I also think if you are at the “don’t know what you don’t know” point of learning a topic it’s very hard to direct an AI to generate comprehensive learning material.
> Seems more efficient to have one reference book rather than generating entire new 20 chapter books for every person.
The main advantage of LLMs is that you can ask specific questions about things that confuse you, which makes iterating to a correct mental model much faster. It's like having your own personal tutor at your beck and call. Good guidebooks attempt to do this statically... anticipate questions and confusions at the right points, and it's a great skill to do this well. But it's still not the same as full interactivity.
I think a mix is the right approach. I’ve used LLMs to learn a variety of topics. I like having a good book to provide structure and a foundation to anchor my learning. Then use LLMs to explore the topics I need more help with.
When it’s just a book. I find myself having questions like you mentioned. When it’s just LLMs I feel like I don’t have any structure for my mind to hold on to.
I also feel like there is an art to picking the right order to approach learning a topic, which authors are better at than LLMs.
A good book by an expert is still better than LLMs at providing high-level priorities, a roadmap for new territory, and an introduction to the way practitioners think about their subject (though for many subjects LLMs are pretty good at this too). But the LLMs boost a book's effectiveness by being your individualized tutor.
This is a bit of a stretch, but it's a little like distillation, where you are extracting from the vast knowledge of the LLM and inserting those patterns into your brain. Where you have an incomplete or uncertain mental model and you ask a tutor to fill in the blanks.
True, although the "don't know what you don't know" aspect is where LLMs will be magic. I envy today's youth for having them (and I'm not that old at all).
I remember fumbling around for ages when I first started coding trying to work out how to save data from my programs. Obviously I wanted a file but 13 year old me took a surprisingly long time to work that out.
Almost impossible to imagine with AI on hand but we will see more slop-merchants.
Definitely more efficient in terms of power consumed, not so in terms of human effort to build such guides across nearly every topic one could think of. But you're right, we shouldn't ignore the power consumption.
I have found that asking AI "You are an expert teacher in X. I'd like to learn about X, where should I start?" is actually wildly effective.
Whoever, or whatever, is creating the thing that needs reference materials would have to seed the initial set (just as they/it seeded the thing itself) and then go from there.
If you didn't, then you won't be included in the training set (obviously) and the AI would not easily know about you. Sort of like how, if you start a really cool company but don't make a website, Google doesn't know about you and can't return you in their search results. It's valuable for Google (AI) to know about you, so it's valuable to build the sites (docs) to get indexed (trained on).
I don't disagree, but since the quality of AI is largely a function of the quality of human content, there's always going to be value in well-written human content. If humans stop producing content, I think the value of AI/LLMs drop significantly as well.
Haven't checked out the article yet, but I'm sure it's great. Another reco is boot.dev's git course taught by Primeagen. It's interactive, and he goes really deep, down to manipulating files in the .git directory. I came out of that course with a whole new mental model of how git works.
I feel like a lot of the problems with Git UI come from needing to interact with a "normal" filesystem. I'd much rather have a FUSE/9p mount with all commits, index, etc. available as directories, where they can be read, written, diffed, etc.
I am not a git fan. After many years (following use of RCS, SCCS, CVS, SVN) I tried it and found that its whole mental model was weird and awkward. I can get around in it but any complicated merge is just painful.
Anyway, the comment I really wanted to make was that I tried git lfs for the first time. I downloaded 44TB (https://huggingface.co/datasets/HuggingFaceFW/fineweb/tree/m...) over 3-4 days, which was pretty impressive until I noticed that it seems to double disk space (90TB total). I did a little reading just to confirm it, and even learned a new term, "git smudge". Double disk space isn't an issue, except when you're using git to download terabytes.
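If it's useful to anyone hitting the same thing, I believe git-lfs has a couple of knobs for this, though versions vary, so treat these as things to check rather than gospel:

    GIT_LFS_SKIP_SMUDGE=1 git clone <repo-url>   # clone with pointer files only
    git lfs pull --include "some/path"           # then materialize only what you need
    git lfs dedup   # on filesystems with reflink support, share blocks between .git/lfs and the worktree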
I'm really interested and really hoping this is something I can sink my teeth into. I've always had frustrating experiences with trying to wrap my head around git and have to regularly use it at my job.
Branching, making commits, and creating pull requests come easy, but beyond that, I know utterly nothing about it.
One mistake that I see people making with Git is trying to learn more commands, more flags, more tricks, but not trying to really understand how it works. Perhaps it's your case. You know Git well enough to use it on a daily basis, so maybe it's time to dive into a lower level, and then everything else will be natural.
I strongly suggest reading Pro Git, the official Git book by Scott Chacon and Ben Straub, available for free here: https://git-scm.com/book/en/v2.
I find it very pleasant to read and it really changed my perspective not only about Git but about how to write code in general. You don't need to read it entirely, but suggest at least these sections:
- 1.3 Getting Started - What is Git?: explains a little about snapshots and the three states
- 10.1 ~ 10.3 Plumbing and Porcelain, Git Objects and Git References: this explains Git in its lowest level, which is surprisingly simple but powerful. Those sections were enough for me to write my own "Git" (you can see it here: https://github.com/lucasoshiro/oshit)
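If it helps anyone decide whether to read those sections: chapter 10 essentially builds a commit out of plumbing commands, roughly like this (the file and branch names are just examples, and the hashes are placeholders you'd paste in from the previous step):

    echo 'hello' | git hash-object -w --stdin          # store a blob, print its hash
    git update-index --add --cacheinfo 100644 <blob-hash> hello.txt
    git write-tree                                     # snapshot the index into a tree
    echo 'first commit' | git commit-tree <tree-hash>  # wrap the tree in a commit
    git update-ref refs/heads/main <commit-hash>       # point a branch at it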
This is partially a question and the rest is a shameful confession: I had haltingly used cvs as a solo programmer, and when I was suddenly no longer a solo programmer and had to use git, everything went haywire.
I am an Old and we never were taught anything about coding with other people who were also working on the same project. I have had many successful projects but never with another person.
With that as a background, does your guide cover things like:
1) Merging. I was told that merging happens "automagically" and I cannot, for the life of me, understand how a computer program manages to just ... blend two functions or whatever and it "works." Does your guide make sense of this?
2) Apparently there are holy wars (see also vi versus emacs) about the One True Way to ... decide on branches and whatnot. Are there pros and cons laid out anywhere?
3) Everything seems broken down into teensy tiny functions when I look at someone's git repository, just skillions of files all over the place. Is this a git thing, a code repository thing, or simply that, in order for multiple people to work on the same project, everything must be atomized and then reassembled later? What's your opinion?
1) I say it happens automagically. :) I'm not really familiar with merging strategies, but I would imagine that it happens in a way similar to a diff. But it only works if the changes are "distant" from one another. If they are happening too close in the file, Git punts and tells you there's a conflict you have to manually resolve.
2) I try to stay out of holy wars. Use the right tool for the job, I say, and sometimes that could even be Emacs. ;) I do talk about a few of the more common options for branching in teams (mostly focused on the small teams we tend to have here in university). And I talk about their pros and cons. But I stop short of declaring one the winner since the best option depends on the circumstances. IMHO.
3) I've seen repos with big functions and small functions, so I don't think it's a Git thing. Students tend to write functions that do too much and are too large, but it's certainly possible to break things down into pieces that are prohibitively tiny. Overall this seems more of a software engineering design choice than anything Git-related.
Jumping in on 3: This isn't a git thing, this is a "bad design" thing. I'm thinking it looks like a git thing because two things happened at the same time: git got popular right as there was a huge influx of juniors from a mix of bootcamps and being self-taught, who never learned how to architect their code.
I feel like there is a trick that is missed by many guides (including this one) and most git GUIs I've looked at (with notable exception being magit).
That is, to set your upstream branch to the branch you want to merge into, aka the integration branch. So instead of setting upstream of "feature/foo" to "origin/feature/foo", you would set it to "master" or "origin/master".
This simplifies a lot of things. When you run `git status` it will now tell you how far you have diverged from the integration branch, which is useful. When you run `git rebase` (without any arguments), it will just rebase you on to upstream.
Setting the upstream to `origin/feature/foo` is less useful. Developers tend to "own" their branches on the remote too, so it's completely irrelevant to know if you've diverged from it and you'll never want to rebase there.
If you set `push.default` to "current", then `git push` will do what you expect too, namely push `feature/foo` to `origin/feature/foo`.
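For the curious, a minimal sketch of that setup, assuming your integration branch is origin/master and your branch is feature/foo (both names are just examples):

    # Point the feature branch's upstream at the integration branch
    git switch feature/foo
    git branch --set-upstream-to=origin/master
    # Now `git status` reports divergence from origin/master,
    # and a bare `git rebase` rebases onto it.

    # Keep a bare `git push` going to origin/feature/foo:
    git config --global push.default current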
As a cloud security analyst who is thinking of going back to coding or DevSecOps, if I'm honest with myself, there is nothing new here that I have not seen before... (This is not a criticism or anything. If anything, the problem is me: whether I can allocate time to learn this, or use Anki to retain it.)
I remember reading the excellent Beej's Guide to Network Programming[0] and Beej's Guide to Unix IPC[1] as a teenager, which were incredibly approachable while still having depth—fantastic reads both and very influential on the programmer I ended up being.
[0] https://beej.us/guide/bgnet/ [1] https://beej.us/guide/bggit/
[1] https://beej.us/guide/bgipc/
mispasted, thanks!
I remember translating Beej's network guide to Italian while learning how to use select, which I wanted to learn to make some port scanner ("grabb' I think?) go faster. Fun times.
Came here to see if it was the same person, though I felt very sure with the throwback web design - back when each page had its character, and you had to save the page for offline reading so that Dad wasn't pissed at the phone bill! And when the code worked - it was validation against all the previous failures (and rejections) in life! Oh the joy of sending message from one computer to the other!
Thank you Beej.
Same here! I was also a teenager in the mid-90s. And I was amazed by IRCd server code and bots. I bought a used copy of the book Slackware Linux unleashed w/CD-ROM and it had some networking code examples in C. I found Beej's Networking site because I was confused by a lot of that networking code. Became even more obsessed and went a deep rabbit hole. I spent a lot of time visiting different book stores hoping they had programming books. Bought Richard Stevens' amazing reference books and never looked back. Thanks for enabling my passion all these years later Beej!
Indeed, my first steps in network programming years ago were with the help of this excellent guide.
I had no idea about IPC! I better go read it!
+1, I have almost exactly the same story!
(I didn't read the IPC guide.)
Hey all--if you find things wrong, post 'em. I'll clean 'em up. :)
Love, Beej
Not wrong, but since you’re mentioning vim in the context of git, might be worth adding :cq as a way to exit with a non-zero status to prevent git from finishing the commit / operation.
This is a fantastic mention! I've been commenting out my commit message lines and then saving as a way to trigger this. Feeling like a caveman...
Beej you are a legend. We all love you! You were a beacon of light for us in the 90s
Actual Beej? Wow I remember absolutely loving reading your networking guide. It taught me so much and really showed me the depths and breadths of what can be done in code, how the net works (pun unintended), it was a great experience for me as a kid. Thanks! <3
I read your guide to C programming as a teen, and as a firmware dev today I'm forever indebted to you.
I really appreciate you offering the content as a single page.
Thanks for all your guides over the years. Truly invaluable.
Just a quick shout-out; I was one of the many many students you taught at Lambda School, and just wanted to say your instruction was one of the highlights of my time there. Thanks for doing what you do!
Beej, your Guide to Network Programming helped me through my early UNIX career. In fact, your guide was so influential to so many people, it very quickly became recommended reading in my university's network course.
I'm delighted to see that you're still active and still producing guides. Well done!
Thank you for this Beej!
Along with many others here, your network programming guide helped me so much back in the early days of my education and career. So thanks for that too…
Loved your Network programming guide :) !
There's an issue for the IPC guide on GitHub that's almost a year old with zero reaction.
Your network programming guide really saved my bacon back when I was taking a networking class, I appreciate all your hard work!
Doing Chico proud!
I found your networking guide as a kid with only some scripting experience, and it served to get me into C programming in general, so I have a special fondness for it.
Appreciate the work! Neat to see you still writing pieces like this all these years later!
I am just happy and thankful that people like you exist.
[dead]
> The Old Command: git checkout
I didn't even know git switch existed, let alone git checkout was considered the old alternative. I feel old.
To be fair I started learning git a little less than 10 years ago but woah, I can't express how it feels that someone learning git today will be confused of why I use git checkout. Like using old fashioned language.
More on topic, this guide would've been super useful when I was learning. It is really easy to follow and covers common FAQs.
I fondly remember being intimidated by my first merge conflict, aborting it and just doing some workarounds to prevent the conflict.
Git switch is fairly new, it first shipped in 2019.
Here's, respectively, a discussion from 2021, and a discussion from a few weeks ago. In the latter, it's brought up that `git switch` is still considered experimental by the docs:
https://news.ycombinator.com/item?id=28024972
https://news.ycombinator.com/item?id=42649858
Could you add a dark mode for the HTML version, please?
Thanks a lot for publishing, Beej.
Well, what's terrifying is that the guide is so long.
I am aware that Beej's guides are typically quite comprehensive, but the vast nuances of git truly eluded me until this.
I guess Jujutsu would wind up being a much slimmer guide, or at least one that would be discoverable largely by humans?
With most of my guides I try to make it so you can quit reading when you feel you've read enough. No need to read the whole thing.
And on that note, I feel like the guide covers maybe 10% of Git :), but hopefully 90% of common usage.
can I repost this on leddit
>And on that note, I feel like the guide covers maybe 10% of Git
guh
I'm just going to be emailing myself versions of files with MyFile.Final.RealFinal2.txt from now on
Skimming over it, it looks like it's just been expanded out way more than what most guides would do. Like other guides would use a paragraph or two for what this one has spread over several sections with extra examples.
[flagged]
Please learn how to understand humor and sarcasm.
The guide is comprehensive, on the other extreme, this one-pager contains 90% of git commands you'll ever need: https://wizardzines.com/git-cheat-sheet.pdf
It tells me that git is the wrong tool for the majority of people but it just happened to stick.
I think it's probably the opposite: Git has amassed a lot of complexity because it's been adapted into a tool that is able to satisfy the majority of requirements.
I've never found that I need to touch most of it in the 15 or so years I've been using it, but it's there if your project needs it.
Git was always confusing to use. There's a reason it gained a "switch" command, and that's because the "checkout" command was confusing, despite having been there from the beginning.
Probably you've been using it for ten years or more at this point and have internalized it, but when it came out git felt way more confusing than other VCSs.
Compare git diff with hg diff for example.
I've never been confused by git checkout. git checkout <branch> switches to a branch. git checkout <commit> switches to a commit in a detached head state. git checkout <file> switches just the content of a file. You can also combine these ofc but it all works pretty much as expected. The -b switch for creating new branches is something you need to look up once and then just remember - and it does make sense - you are switching to a new branch.
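Spelled out with example names (the branch, commit, and file below are all placeholders):

    git checkout main            # switch to the branch 'main'
    git checkout a1b2c3d         # check out a commit: detached HEAD
    git checkout -- README.md    # restore just that file's content (from the index)
    git checkout -b new-branch   # create a new branch and switch to it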
Nope. It was initially built for the use-case most people will never have: multiple remotes that work on a project with multiple similar products each with its own set of differences from the others, with emphasis and dependency on Unix tools and text-only emails.
Most Git users will never have more than one remote per project, and so will only have a single product built from their source code. They probably wouldn't even know how to configure their MUA to send text-only emails, in case that option is even available to them, and would struggle with basic Unix utilities like Vim and diff.
I don't know why Git won the VCS contest. But, I'm afraid, as with many such wins, there wasn't a clear rational reason why it should have won. It didn't make some of the obvious bad decisions which would disqualify it, but so did a few others. My take on this is that communication tools naturally gravitate towards monopoly, so, if one starts to win even slightly, the win will become a landslide win.
> I don't know why Git won the VCS contest
Because GitHub offered free git hosting, and heroku came along and offered free hosting that was pretty much point and go.
Combined, you all of a sudden went from needing a sysadmin and two servers (this was pre-containers), and the sysadmin skills to operate SVN and your web app, to “it’s now free and it auto-deploys when I commit”.
No. Git is a complex program, but version control is an inherently complex problem that requires powerful tools. There's a certain set of problems where, as a programmer, you're going to have to sit down and actually read the book.
The universe doesn't owe you an easy 10 minute video solution to everything, it's an annoying educational expectation that people seem to have developed. Some things are just that difficult and you have to learn them regardless.
No. Source control is not that complicated. Git is just bad. As an existence proof: Mercurial is much better and simpler.
I can teach someone who has never even heard of source control how to use Perforce in about 10 minutes. They will never shoot themselves in the foot and they will never lose work. There are certainly more advanced techniques that require additional training. But the basics are very easy.
Git makes even basic things difficult. And allows even experts to shoot their face off with a rocket launcher.
Git sucks. The best tool doesn't always win. If MercurialHub had been founded instead of GitHub we'd all be using a different tool. Alas.
Out of curiosity, what are the most common foot guns, in your opinion?
How do you switch to a branch? (Note that you need to fetch before you switch. Also, switch is documented as experimental, but it's not really.)
How do I undo a change and get it to other people on the team?
- Follow-up: what happens if someone has made an unrelated change since?
- Someone has committed an enormous change and I want the commit immediately after it, but the enormous change doesn’t affect me. How do I get that single file without pulling in the mega change?
- someone has committed a mega change and pushed it to main, and it’s busted. How do I put my perfectly functioning feature in while i wait for the west coast to wake up and fix it?
I don’t need an explanation of how to solve these issues, I am perfectly able to. But these are daily workflow issues that git requires you to use external tools, and to change entire development processes, to work around. And trying to skirt them is a guaranteed one-way ticket to a busted workspace.
Not realizing that git branches are cheap.
1. Create a branch from the intended destination.
2. Merge/rebase/whatever complex operation you need to do into that branch.
3. If successful merge this branch (fast forward merge) into the intended destination.
4. If unsuccessful delete the branch and start over at step 1.
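A minimal sketch of that scratch-branch approach (the branch names here are examples):

    git switch main
    git switch -c scratch          # 1. cheap throwaway branch at the intended destination
    git merge feature/foo          # 2. do the risky merge/rebase here
    git switch main                # 3. if it worked, fast-forward the destination onto it
    git merge --ff-only scratch
    git branch -d scratch
    # 4. if it went sideways, throw the branch away and start over:
    #    git switch main && git branch -D scratch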
Messing up conflicts during a rebase, thinking I did it right, and then finalizing the rebase and losing work that accidentally disappeared. That's my most common mistake at least.
Find your last entry before the rebase using the reflog and reset your local branch to that entry.
The work isn’t lost, it is sitting right there.
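Roughly like this (the reflog index is an example; pick whichever entry predates the rebase):

    git reflog                     # find the entry just before "rebase (start)"
    git reset --hard HEAD@{5}      # point the branch back at that state
    # or, since git records it before history-rewriting operations:
    git reset --hard ORIG_HEAD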
> Find your last entry before the rebase using the reflog and reset your local branch to that entry.
You shouldn’t ever need to go to the reflog unless you’re in an exceptional case, and git makes it very, very easy to get into that exceptional case.
Honestly? I’m not sure. Here’s the problem:
1. It’s possible to get into a bad state.
2. It’s not clear what exactly that state is.
3. It’s not clear how you got into that state.
4. It’s not clear how to get out of it.
I understand Git reasonably well. I know a good bit how it works under the hood. When I have a gitastrophe I rarely understand what I did wrong and how to avoid it in the future.
Here’s a recent post from a friend:
“0) Clicked fetch all to make sure I had latest everything.
1) Right-clicked on master and selected "merge master into branch".
2) Made sure there were no merge errors.
3) Got on the master branch.
4) Clicked pull again because sometimes switching to branches doesn't work without it.
5) Right-clicked on my branch and selected "merge branch into master".
6) Clicked the check in and push buttons.
About an hour later, someone noticed that everyone's work from the past week was gone. I mean the checkins were still there in the graph, but all their code was basically gone in latest. And because my branch had many commits in it, apparently no one could just revert my merge and it took someone an hour to work out how to fix everything during which no one could touch git”
Somewhere along the way he didn’t do what he thought he did. No one could figure out what he actually did wrong. No lessons were learned from this Gitastrophe.
Well first, that's not git, that's some other GUI giving its own interface to git. The majority of the time my co-workers have a git problem, it's because their GUI tool has done something weird without telling them exactly what it did - one of them has a "Sync Branch" button which he'd click whenever the IDE highlighted it, and I have no idea what that's even supposed to do, but I think it was some sort of rebase.
Without knowing for sure what was going on and whether your friend was describing it using the right verbs, I'm thinking (0) didn't pull in the changes ("fetch" instead of "pull") so (1) didn't merge in any new commits, but (4) did advance master, causing it to diverge from what was on the server. Then (6) probably decided to be helpful and did a force-push instead of a regular push so it wouldn't fail on the user. That would cause the server to have your friend's changes, but be missing anything that had been pushed to master after they started working on their branch.
You've just described computers. It's possible to get into a bad state because git can't read your mind, and, at the end of the day, it is incumbent upon you, the programmer, to make the computer do what you want. That is our responsibility as practitioners.
You need to think about what you're actually trying to accomplish, and that requires having a mental model of how the tool works. And no, I don't mean under the hood, I mean stuff like "what does a rebase do?" and "how do branches work?"
The Git Book is a great resource for this. I recommend reading it and trying the examples over and over until they stick. I promise, git is not inscrutable.
And yet only git has these problems. I work with artists and designers - non technical people - all day. 10 minutes with p4v is all they need to be able to check in, update, roll back bad changes, and shelve to share with others, and even these people who manage to get their computers into the most unbelievable of states can do that without breaking their workspace and needing help.
While I agree with many points you make... let's keep Perforce out of it. The amount of damage that program has done to my source code, and the amount of pain caused by it in daily use, tells me that 10 minutes will not cut it.
Here's a simple example of how people shoot themselves in the foot with Perforce all the time: it makes files you edit read-only, and then you run with your pants on fire trying to figure out how to save your changes, because the system won't let you do that. And then you do something dumb, like copy the contents of the file you aren't able to save someplace else, and then try to incorporate those changes back once you've dealt with the file that Perforce wouldn't save. And then end up with a merge conflict just working with your one single change that you are trying to add to your branch.
I never regretted never having to touch Perforce ever again. Just no.
Why does git get a free pass for shitty defaults but perforce doesn’t? Perforce long predates git, and the checkout operation (which can be done on save with any modern editor) fixes that immediately.
You could have marked the file as not read-only and later reconciled. Or you could have checked the file out of Perforce. You would have had a merge conflict either way.
I mean I haven't even talked about how Git can't handle large files. And no Git LFS doesn't count. And Git doesn't even pretend to have a solution to file locking.
I'm not saying Perforce is perfect. There's numerous things Git does better. But Perforce is exceedingly simple to teach someone. And it doesn't require a 193 page guide. I can teach artists and designers how to use Perforce and not lose their work. A senior engineer who is a Git expert can still shoot themselves in the foot and get into really nasty situations they may or may not be able to get out of.
There's a reason that like 105% of AAA game dev uses Perforce.
So, there's a difference between how common problems are, how many of them there are, and how hard it is to deal with them.
So, is it possible to deal with Perforce marking files read-only? -- Yes. And it's not complicated, but the whole idea that that is how the system should work is stupid. The problem is, however, exceptionally common. In my days working with Perforce there wasn't a day when this problem didn't rear its ugly head.
So, maybe Perforce is scoring better on some metric, but in the day-to-day it generates so much hatred towards itself that I don't care if it can handle big files better than Git does. I only need to handle big files maybe a few times a year. And I prefer to get frustrated only twice or three times a year than to be fuming every time I have to touch files in my project.
I appreciate your hatred of Perforce. But I think you've let your hatred blind yourself to the argument I actually made. In my original comment I made two arguments:
1. Git is complicated.
2. Perforce is so simple to use that I can teach an artist or designer who has never even heard of source control how to use it in 10 minutes.
Then you came in and said the way Perforce handles read-only files is stupid. You know what, I agree! That's a solvable problem. If Perforce wasn't acquired by a private equity firm maybe they'd actually work to make it better. Alas.
This isn't about Git vs Perforce. I post in all of these Git HN threads because I desperately want people to realize that Git sucks ass and is actually really bad. WE COULD HAVE SOURCE CONTROL THAT DOESN'T SUCK. IT'S POSSIBLE. I'm not saying that Perforce is that solution. I'm saying that Mercurial and Perforce are existence proofs that certain things are possible. Source control doesn't have to be complicated! It doesn't have to have footguns! It doesn't need a 189 page guide!
You're welcome to switch back to CVS or RCS at any time. You're also welcome to deal with their specific tradeoffs.
Why not switch forward to Jujutsu? It's simpler and more powerful than Git.
I've been using Jujutsu a little and over the weekend lost a bunch of files. I'd been working happily in an anonymous branch. I had a bunch of content in a thirdparty folder that was hidden by .gitignore. I couldn't figure out how to merge my anonymous branch into master. Somehow wound up on the old master and it deleted all of my gitignored files. Then jj status got totally fubar and couldn't complete in under 5 minutes because something something 7500 files (everything that got deleted, it was compiler toolchains).
It was a disaster. Literally the most important thing for any VCS tool is to never, ever delete files I don't want deleted. No more Jujutsu for me.
Someday someone will invent a VCS tool that doesn't suck. Today is not that day.
One of the modern curiosities is that people have somehow made using Git to be part of their identity. Dear HN user ‘gitgood’ who has a one hour old account, I suggest you take a step back and reevaluate things.
I can't help but feel that Git has completely missed the forest for the trees when you can make a 30+ part guide explaining how to use it.
And still shoot yourself in the foot.
Eh, yes and no.
Git porcelain stuff's plenty good for probably 95% of users. `rebase -i` comes with a guide on which commands do what, and you could write a couple of paragraphs about how to format `git log`'s output with your own preferences and tradeoffs -- and porcelain usually includes stuff as eclectic as `git gc`, `git fsck`, and `git rev-parse` by most accounts.
Git plumbing's definitely a bit more obscure, and does a bunch of stuff on its own that you can't always easily do with porcelain commands because they're optimized for the common use cases.
TL;DR: while Git's big (huge even), a lot of what it provides is way off the beaten path for most devs.
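As an example of the `git log` formatting mentioned above, a couple of common porcelain-level tweaks (the alias name is just an example):

    git log --oneline --graph --decorate --all    # compact picture of the branch structure
    git log -p -- path/to/file                    # one file's history, with patches
    git config --global alias.lg "log --oneline --graph --decorate"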
With 'gitk' I'm not sure I'll ever have to learn how to use 'git log'. A Good Enough preinstalled GUI is too convenient
not my experience - almost always some edge case leads me to a git rabbit hole
tldr: even if you never plan to use anything advanced, you’ll end up in some weird situation where you need to do something even if you’re in the “95% of the users”
no shade, yes ofc you “could this, could that” to make things work and we have been stuck with this for so long that an alternative doesn’t even seem plausible
I can't remember the last time I ended up in a weird situation, I stick to basic options with init,clone,fetch,checkout,branch,commit,rebase,remote,log,stash,cherry-pick,blame,config.
It did take maybe a year or so to develop the mental model of the how commands map to the underlying structure of commits, and another few years to avoid footguns (like always "push --force-with-lease").
So I think it is probably too complicated and would be happy to switch to a better alternative if one comes up, but what seems really implausible to me is going back to the bad old days of SVN.
>It did take maybe a year or so
we have normalized this for git - a tool to store versions of text. That’s the problem
Maybe you’re young, but git is better than all of the other shit before it.
Try to come up with something simpler than git, and you’ll end up with something like SVN or CVS that struggled with more than a couple of people working on the same files.
Try to make something that is more content aware, and you’ll find out how git got its name in the first place.
Mercurial is simpler than git. It is (or was) just too damn slow
For some people. Back when I only knew how to use subversion, I tried out both git and mercurial, and found mercurial confusing while git clicked immediately.
Unfortunately it's been long enough I don't remember details why, just that it was something with how it handled branches.
Mercurial and perforce are a better solution for anything with a centralised repo IMO
A folder with a list of files is a tool to store versions of text. Git is somewhat more useful.
You are using Git in a non-automated way, basically, as a substitute for rsync (you never edit history, you only append to it, you don't deal with a possibility of multiple remotes, you don't deal with modular projects and you don't automate anything).
At least, this is what it looks like from your own description. This is, probably, what most people do with it most of the time. And the weird corners will be found when they need to automate things, or when they want to deal with modular repositories, or history rewrites. These aren't everyday tasks / not for everyone on the team, but, these things do happen, especially in smaller teams, where there's no dedicated infra person or group these can be delegated to.
If the tool is designed to support the use case of the 1% with concessions for the other 99%, the tool is badly designed.
Git is designed for the case where you have multiple remotes with no central authority. Except that’s not how any project I’ve _ever_ worked on functions in reality. It makes sense for some applications, but if I say that I run Linux, there’s an assumption that I’m running something compiled from https://github.com/torvalds/linux - I.e. there is a central space.
I’ve used git and perforce in anger for a decade, in teams of 1 to 150+ (with a brief blip in the middle where I tried plasticscm which was a mistake), and I’ve been the “git guy” on teams during that time. If git’s defaults were tweaked for “one origin, probably authoritative” and it had better authentication support out of the box it would be a significantly better experience for 99% of people. Those 1% of people who are left over are going to customise their config anyway, so make them add the distributed-defaults=true flag and the rest of us can get on with our work.
I'm not totally sure what you mean by "non-automated" here, can you clarify? I have managed repos for small teams, that's actually the majority of my experience with it.
I do deal with multiple remotes quite often and haven't encountered issues. You're right about submodules, I avoid setting up projects with them, even at the expense of more manual work or complicated automation.
I'm definitely not using it as a substitute for rsync - I do prefer to put rules in place to avoid editing (shared) history, for obvious reasons.
Honestly, 99% of the pain of git is simply because people use it through the CLI. If you use TortoiseGit or a visual tool, you don't need to worry about any of this because it's self-explanatory, and it becomes trivial to use.
Learning git like this is honestly just hampering yourself
I’ve seen tortoise users break their repo, struggle to understand the issue and then push it through, making it everyone’s problem. Git language is screwed, you cannot unscrew it with a right-click gui because you basically click some latin-looking hieroglyphs that you don’t know either way.
I highly doubt tortoise or any tool can "break" a repo. This might be a sign that you don't understand git either. Now I'm sure it can lead to people who don't know what they're doing doing the wrong thing, but if they're allowed to push somewhere and make it someone else's problem, that's not their fault. They've been forced to use git, so there should be someone else who actually understands git.
Ah, we're holding it wrong. Got it.
I disagree. Version control is kind of a pain, you need to understand some of the underlying concepts or you'll break your git repo in spectacular ways.
The command line isn't that hard to use if you've ever used the command line before. Beginners trying to learn git and command line at the same time (which is very common) will get utterly confused, though, and for a lot of beginners that's the case. The only difficult part with git over the command line is fixing merge conflicts, I'd recommend anyone to use any IDE rather than doing that manually.
No IDE will be of any help for getting back to normal when you get into a detached HEAD state, which IDEs will gladly let you do if you click the right button.
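For reference, a minimal sketch of getting back out of a detached HEAD (the branch name is an example):

    git switch -c rescue-work    # keep any commits made while detached on a real branch
    # or, if no commits were made and you just want to go back:
    git switch -                 # return to the previously checked-out branch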
Learning it like this makes one learn the concepts though and build something closer to an actual understanding. I have seen people struggle with understanding what git does or with making fine grained commits or mostly atomic commits a lot, especially GUI users, because many of them do not have the underlying concepts understood well enough.
I think you could use a CLI and still not really understand the core concepts.
Depends. You could live on a UI for a start, but scripting the git CLI gives you very high speed... it's kind of a timeline database for your code.
Not at all. Not in the least.
The worst part about Git is the bad defaults. Seconded only by mismanaged storage. Or maybe being designed for the use-case most of its users will never have. Or maybe horrible authentication mechanism. Or maybe the lack of bug-tracker or any sensible feedback from its developers.
None of this can be helped by the GUI. In fact, besides Magit, any sort of front-end to Git I've seen is hands down awful and teaches you to do the wrong thing, and is usually very difficult to use efficiently, and mystifies how things actually work. But, even with Magit, I'd still advise getting familiar with the CLI and configuration files prior to using it: it would make it easier to understand what operations it is trying to improve.
I use the IntelliJ family of IDEs and the number of times I’ve had to reach for the cli for day to day use is incredibly close to 0. It handles GitHub auth, PR’s, branching and merging, rebase, and local branch switching pretty much effortlessly
It is still better than (As)SVN
My sense, bluntly, is that if people spent half the effort learning git that they do whining about it, no one would bother making a 30+ part guide just explaining stuff you could find in a man page.
Commits are snapshots of a tree. They have a list of ancestors (usually, but not always, just one). Tags are named pointers to a commit that don't change. Branches are named pointers to a commit that do change. The index is a tiny proto-commit still in progress that you "add" to before committing.
There. That's git. Want to know more? Don't read the guide, just google "how do I switch to a specific git commit without affecting my tree?", or "how do I commit only some of my changed files?", or "how do I copy this commit from another place into my current tree?".
The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter. Don't read guides.
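If you want to see those abstractions directly, the plumbing makes them concrete (the output in your repo will obviously differ):

    git cat-file -p HEAD          # a commit: tree hash, parent(s), author, message
    git cat-file -p HEAD^{tree}   # the tree: the actual snapshot of files
    git rev-parse main            # a branch is literally just one commit hash
    cat .git/refs/heads/main      # ...stored as a small text file (unless packed into .git/packed-refs)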
This doesn’t work. Look:
Commits are sets of files. They form a tree. A branch is a named location in this tree. The index aka staging area is a pre-commit that has no message. Workdir is just the workdir, it doesn’t go in the repo unless you stage it. HEAD is the point after which commit will put new changes.
Do I understand git? Seems like yes. Let’s run a quiz then! Q? A.
How to make a branch? Git branch -a? Git checkout -b --new? Idk.
How to switch to a branch? Git switch <name>, but not sure what happens to a non-clean workdir. Better make a copy, probably. Also make sure the branch was fetched, or you may create a local branch with the same name.
How to revert a file in a workdir to HEAD? Oh, I know that, git restore <path>! Earlier it was something git reset -hard, but dangerous wrt workdir if you miss a filename, so you just download it from git{hub,lab} and replace it in a workdir.
How to revert a file to what was staged? No idea.
How to return to a few commits back? Hmmm… git checkout <hash>, but then HEAD gets detached, I guess. So you can’t just commit further, you have to… idfk, honestly. Probably move main branch “pointer” to there, no idea how.
If you have b:main with some file and b:br1 with it, and b:br2 with it, and git doesn’t store patches, only files, then when you change b:main/file, then change and merge+resolve b:br1/file, then merge that into b:br2 to make it up-to-date, will these changes, when merged back to already changed b:main become conflicted? Iow, where does git keep track of 3-way diff base for back-and-forth reactualization merges? How does rebase know that? Does it? I have no idea. Better make a copy and /usr/bin/diff [—ignore-pattern] the trees afterwards to make sure the changes were correct.
As demonstrated, knowing the base abstractions doesn’t make you know how to do things in git.
I don’t even disagree, just wanted to say fuck git, I guess. Read guides or not, google or reason, you’re screwed either way.
Skill issue, it seems.
Facetiousness aside, the things you do often, you learn once and you don't really have to remember/think when doing them. Most of the esoteric operations are mostly unnecessary to burden yourself with until you actually have to do them, when you just read the documentation.
Literally every one of those questions can be trivially googled. (In previous generations and fora, this is where you'd be mocked with LMGTFY links). You just, to continue to embrace the frame, don't want to do the work.
If you insist on memorizing commands for all these tasks (of which there are many), indeed, you're going to struggle and decide you need a 30 section guide. But you don't, and want to whine about it.
> I don’t even disagree, just wanted to say fuck git, I guess.
Pretty much.
Not the parent commenter. Git the version control system is superb, fast, robust, well-thought out. Git the CLI tool is by far one of the worst CLIs I have ever had the misfortune of using. I think the one-dimensional, string-y command-line massively complicates mental and reasoning models for a tool with a fundamental data structure—a tree—that is multi-dimensional.
A powerful Git GUI makes even moderately-complicated actions like cherry-picking, interactive rebasing, and even ref-logging absolutely trivial. In fact it was precisely said GUI tool that allowed me to develop an intuition for how Git worked internally; the CLI does no such thing.
> Literally every one of those questions can be trivially googled. (In previous generations and fora, this is where you'd be mocked with LMGTFY links). You just, to continue to embrace the frame, don't want to do the work.
I find this an odd statement. I mean, no, I don't want to do the work! Not if it isn't necessary in the first place.
Take staging (or the index, because another of Git's foibles is poor naming conventions that stick around and confuse newcomers). It's kind of a commit, right? In the sense that it's a snapshot of work that represents a change to the code. Except we can't really make it behave like a commit, we have to interact with it using special commands until we turn it into a commit. Are these special commands really doing much differently than we might do with another commit? Not really.
Or take stashes. Stashes are more like commits — they even appear in the reflog! But they also aren't real commits either, in the sense that you can't check them out, or rebase them, or manipulate them directly. Again, you need a bunch of extra commands for working with the stash, when really they're just free-standing anonymous commits.
Or take branches. Branches are, as everyone knows, pointers to commits. So why is it important whether I'm checking out a branch or a commit? Why is one of these normal, but the other produces hideous warnings and looks like it loses me data if I haven't learned what a reflog is yet? And if a branch is just a pointer, why can't I move it around freely? I can push it forwards, but I can't move it to another arbitrary commit without fiddling around a lot. Why?
Or take tags, which are like branches, but they don't move. Is "moves" and "doesn't move" such a deeply important distinction that Git needs branches and two different kinds of tag?
---
To be clear, I think Git is a good tool, and I agree that once you've started to familiarise yourself with it, it's not so complicated in day-to-day usage. Yes, you'll probably need to Google a few specific commands every so often, but the general UX of the tool has significantly improved since the early days, and it is getting better.
That said, I also don't like the idea of settling with Git just because it's good. If there are alternatives out there that can do everything that Git can do, but with a simpler conceptual model, then I want to try them! (And, spoiler alert, I think there are better alternatives out there — in particular, I think Jujutsu is just as powerful as Git, if not more so, while simplifying and removing unnecessarily duplicated concepts.)
> Commits are snapshots of a tree. They have a list of ancestors (usually, but not always, just one). Tags are named pointers to a commit that don't change. Branches are named pointers to a commit that do change. The index is a tiny proto-commit still in progress that you "add" to before committing.
This is about as useful as "A monad is just a monoid in the category of endofunctors."
It's basically a lot of words which make zero sense for a user starting to use git -- even if it happens to be the most succinct explanation once they've understood git.
> The base abstractions are minimalist and easy. The things you want to do with them are elaborate and complicated. Learn the former, google the latter.
You can't really learn the former -- you can't even see it till you've experienced it for a while. The typical user groks what it means after that experience. Correction, actually: the typical user simply gives up in abject frustration. The user who survived many months of using a tool they don't understand might finally be enlightened about the elegant conceptual model of git.
The deal killer for me, the inescapable aspect of my users, is that they insist upon checking passwords into revision control.
Because the C and PL/SQL people are on CVS, I can fix this with vi on the ,v archive.
First on TFS repositories, and now with git grep I can easily find exposed passwords for many things. But it's just SQL Server!
We will never be able to use git responsibly, so I will peruse this guide with academic interest.
Don't even get me started on secrecy management.
I am looking forward to retirement!
The devs shouldn’t have access to prod credentials in the first place. That’s the real issue.
Then you need to hire someone else to manage the deployment of services though.
Internal audit said the same thing.
Quelle surprise!
Commiting credentials is also a real issue, best to avoid doing both.
Sounds like you need a pre-commit hook to check.
Pre-commit hooks aren’t enforceable. People need to opt in to them, and the people who opt in to them are the people who will check for passwords before they commit.
Server-side pre-receive hooks are better.
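A hedged sketch of what such a hook might look like (the "password" pattern and the '*.sql' pathspec are purely illustrative; a real check would use a dedicated scanner like gitleaks):

    #!/bin/sh
    # pre-receive: reject pushes whose new commits contain a suspicious string.
    zero="0000000000000000000000000000000000000000"
    while read oldrev newrev refname; do
        # new branches have an all-zero oldrev, so scan the whole history in that case
        if [ "$oldrev" = "$zero" ]; then range="$newrev"; else range="$oldrev..$newrev"; fi
        for commit in $(git rev-list "$range"); do
            if git grep -I -i -q -e "password" "$commit" -- '*.sql'; then
                echo "Rejected: possible credential in $commit ($refname)" >&2
                exit 1
            fi
        done
    done
    exit 0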
Or do read books and guides. But in an exploratory manner. So when you do have a need for a specific operation (which happens rarely) you have a mental map that can give you directions.
I think the trickiness with the simple abstraction is that you end up looking at a commit graph and thinking "I would like to make a couple new nodes in this in a very specific shape, but one that many people have likely done in the past. Is there a trick?"
Like, so much of the porcelain is those kinds of tricks, and they make otherwise tedious work much simpler.
Imagine if you didn't have interactive rebases! You could trudge through the work that is done in interactive rebases by hand, but there's stuff to help you with that specific workflow, because it is both complicated yet common.
I think jujutsu is a great layer over git precisely because you end up with much simpler answers to "how do I change up the commit graph", though.... the extra complication of splitting up changes from commits ends up making other stuff simpler IMO. But I still really appreciate git.
Sigh. Another git thread, another pile of posts telling me that if I would _just do the work_ to understand the underlying data structure I could finally allow myself to be swept up in the _overwhelming beauty_ of the something something something.
The evidence that the git UI is awful is _overwhelming_. Yes, yes, I’m sure the people that defend it are very very very very smart, and don’t own a TV, and only listen to albums of Halloween sounds from the 1950s and are happy to type the word “shrug“ and go on to tell us how they’ve always found git transparent and easy. The fact is that brilliant people struggle with git every single day, and would almost certainly be better served by something that makes more sense.
GP isn't describing the underlying data structures, they're describing the basic interface of commits, branches, and tags. The 101 stuff you have to learn regardless, for any version control, not just git. Dismissing it like this just sounds like someone who refuses to hold scissors by the handles.
You’re right, of course, and I apologize to GP for conflating what they were saying with what I, to be fair, do often see in these threads.
Like others in these comments, I can use it just fine right up until I can’t. Then it’s back to the many, many posts and questions and tutorials, sprawled across the Internet, to try and solve whatever the issue is. JJ has shown that a better chrome can be put over the underlying model, and it’s frustrating to me that we are all collectively, apparently, expected to put up with a tool that generates so much confusion seemingly regardless of brilliance or expertise.
Which brilliant people, who have put an appropriate amount of time into learning any (D)VCS, are struggling to maintain a day-to-day working knowledge of git? Can you point to some? "Brilliant people" is of course a question of definition. But one of the defining qualities I would ascribe to a brilliant person is the ability to quickly grasp concepts and ideas and reason about them. That seems to me to be the core quality one needs to use git, since it requires one to have a mental model, whether actually correct (which I think few people have) or just close enough to be useful.
There are tools for the UI part. Most people I know only use command line git for doing stuff where GUIs give up (i.e. fixing repos in weird states). Usually, checking out a clean clone and switching to that will do the same without the GUI, just takes a bit longer if you know the command line fixes.
The issues most people seem to have with git are common version control issues. Version control is actually hard, even if it's just "what has changed", once you start going beyond two users editing a file at once. When three people edit a file at the same time, there's going to be complexity when those changes need to be applied back, and that's where you start getting into branching/merging/rebasing.
Just like some people simply cannot get functional programming/object oriented programming/imperative programming to click in their head, others will never truly grasp version control. It's a paradigm of its own. People who know lots of data structures like to trivialise version control into data structures ("it's just a list of ...") but the data structures are the chosen solution, not the problem.
Another complexity issue is that git is actually pretty smart, and will fix most problems automatically in the background. Often, when you need to manually operate on a git repo, you're in a situation where git doesn't know what to do either, and leaves it up to you as the expert to fix whatever is going on. And frankly, most people who use git are nowhere close to experts. The better Git tooling gets at fixing these situations for you, the worse your situation will be once you need to manually correct anything, and the worse your perception might get.
I have no good advice for you on how to work Git better. All I can say is that I'm very productive with Jetbrains' IDE integration, others seem to prefer Visual Studio Code's git integration, and then there's the Tortoise people. Find whatever tool works best for you and hope you'll have a random epiphany one day.
I don't struggle with git, and I can assure you, I am not brilliant. I do, however, refuse to give up when something seems hard, and I refuse to ask the computer to be easier for me. (Understandably, I started programming computers to make them do what I wanted them to do, not to sit and whine when they didn't.)
Pretty much, yeah. Just do the work. It's not nearly as hard as whatever it is you're committing into it, I promise. Continuing to mock it via florid metaphor doesn't help anyone at this point.
I'm always kind of aghast at the number of people who not only don't know git, but who cannot or will not learn it over years, or even decades.
Listen, I'm not that smart, and I managed to figure out how to solve even gnarly git issues one summer during an internship... 11 years ago? Ish? Now, I know git well, and not just "the three commands". I would be, honestly, so ashamed if it were a decade on and I still hadn't committed to learning this fundamental tool.
Version control is a hard problem, fundamentally, and a tool for experts will always take more effort to understand. I mean, aren't we supposed to be the software experts? If people can't learn git, I wouldn't trust them with the even harder parts of software development.
But this is a common attitude in industry now, unfortunately: a petulant demand for things to be easier, and for someone else to do the learning. Is it any wonder software today is so bad?
> If people can't learn git, I wouldn't trust them with the even harder parts of software development.
This idea breaks under pressure. People have limited concentration, and the more you demand for the daily routine, the less there’s left for the actual job. This argument only makes sense in a relaxed setting with lots of time and coffee breaks. But all these problems tend to happen on a Friday evening when you’re expected to go get your kids in an hour or something and this damn repo got broken again.
Yes, things should be easier. Cause you get what you get. If you want people who have no issues with git, feel free to enjoy the greatly reduced hiring pool and stop whining about someone not being able to juggle fifty things at once in their mind - focus on your hiring process and where to get the budget for inflated compensation instead.
> Is it any wonder software today is so bad?
I remember delphi and vb time, when people - who were unable to understand or use CVS and SVN - made full-blown apps for real sectors, and it worked. Because it was easy. Nowadays all we have is important dudes with pseudo-deep knowledge of git, css, framework-of-the-month and a collection of playbooks, who cannot make a db-enabled hello username message box in less than a day. I don’t think you’re moving in the right direction at all with this. This paradigm is going further and further from good software, actually.
There is one fundamental piece missing in your description of git that I think is the main reason people don't understand it. You have described a single DAG, but in git there are multiple DAGs. This is what it means to be a distributed version control system.
In my experience people come to git and start using it with the centralised paradigm in their heads: that there is one repo and one DAG etc. They think that their master branch is the same as "the" master branch. You just can't get good at git with this wrong understanding.
I'll just chime in with congrats on the new book. I was a huge fan of the Network Programming book that I first read in 2013, and which I still consider as having the best balance of approachability and rigor. Looking forward to checking the new one out. :)
Hey Beej, can you talk about what tool you use to create your guides? I'm assuming something like pandoc is involved for supporting your various formats?
I'm decent with git (usual flow, merging, rebasing, etc). I'm seriously considering switching over to jujutsu instead of becoming "better" at Git. jj is compatible with git and you can use it while your teammates can also just use git.
I regularly conduct 2 hr long "Intro to the Git Data Model" courses at my workplace (1-2 times a year). I literally take them into the .git directory and unzip the files to show how everything is just plain text representation of basic data structures. It's honestly cool to see it click in their heads.
We have a basic Git cookbook we share with any new joinees so that they start committing code, but most of them just follow it religiously and don't understand what's going on (unsurprisingly).
However, literally everyone who attends the course comes out with a reasonable working understanding of Git so that they know what's actually happening.
That does NOT mean that they know all the commands well, but those can be trivially Googled. As long as your mental model is right, the commands are not a big deal. And yet, the vast majority of the discussion on HN on every single Git post is about the command line.
Funnily enough the class sounds a lot like the alt text of https://xkcd.com/1597/ (Just think of branches as...), the difference is that that is unironically the right way to teach Git to a technical audience, and they will come out with a fundamental understanding of it that they will never forget.
I honestly think it's such a high ROI time investment that it's silly to not do it.
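For anyone who wants to try that spelunking themselves, a minimal sketch (the object hash is a placeholder; the loose object files are zlib-deflated, which is what cat-file is decompressing for you):

    ls .git/objects/              # loose objects live in two-character directories
    git cat-file -t ab12cd3       # type of an object: blob, tree, commit, or tag
    git cat-file -p ab12cd3       # pretty-print its contents
    git cat-file -p HEAD^{tree}   # the snapshot the current commit points at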
> As long as your mental model is right, the commands are not a big deal.
A priori, I would have assumed this was one of those "just understand how every layer and every part of Linux works, and using Linux is easy" type arguments people used to make in the 90s - i.e. theoretically true, practically infeasible for most people.
Thankfully, I was lucky enough to come across a video explaining (some of) the git internal model early on, and it really doesn't take that much or that deep a knowledge of the internals for it to make a big difference. I'd say I know maybe 5% of how git works, and that already gave me a much better understanding of what the commands do and how to use them.
I did it once, and it was indeed really nice, and the discussion that we had afterwards was very cool. I put in the last slide of the presentation some questions for my colleagues to answer based on the Git data model, e.g.: "Can we move a commit to another branch?" or "What guarantees that we don't have cycles in the commit graph?". It was really satisfying that people came out thinking Git, not only using it!
Exactly, and it's such a high success rate!
This is precisely why it enrages me when all HN discussion about Git devolves to the same stuff about how it's complex and this and that.
A technical person who has general sense about basic data structures (Leetcode nonsense not needed) can be taught Git in under 2 hours and they will retain this knowledge forever.
If you can't invest that little time to learning a tool you will use everyday and instead will spend hours Googling and blindly copy-pasting Git commands, that's on you, not on Git.
That last question is a cryptography question in disguise; the answer is "the fact that SHA-1 collisions are still impractical for most people".
> I regularly conduct 2 hr long "Intro to the Git Data Model" courses at my workplace (1-2 times a year)
Does the course material (and perhaps any recordings) have any proprietary information or constraints to prevent you from sharing it publicly? Is this based on something that’s publicly available yet concise enough to fit within two hours? If yes, please share (perhaps in this thread and as a post submission on HN).
I’m asking because I believe that there can never be enough variety of training materials that handle a topic with different assumptions, analogies, focus, etc.
Is there a copy/video of your talk, or a similar one you recommend?
Share your video!
Just discovered this all for the first time, and these guides are incredible! I've downloaded the Git and networking ones. The humour is the best. I wish all textbooks were filled with jokes...
I feel that the big problem with git is how it applies names to procedures that are MUCH easier to understand unnamed. You can have a model of the current repo state and the state you wish to reach. Instead of just coding this difference at the data-structure level, as imperative statements or functional expressions, we’re forced to translate them into a sequence of weird names and flags representing conversions into intermediate states.
Love seeing this. Beej is one of the greatest in our industry. His educational content is top notch and always free...an increasingly rare thing in the age where everyone tries to monetize their knowledge via paid courses and newsletters.
Keep on fighting the good fight, Beej.
I think the biggest problem with VCS is the lack of consensus on what and how to push to the repo.
On one hand you have the ideal world scenario when each and every change is granular and you can annotate and blame every single line of code with description. On the other hand you have a real world where teams are encouraged to squash changes so that every commit corresponds to a business requirement and you have to engage a whole cabal to smuggle a refactor.
A long time ago I implemented a routine to use both SVN and Git, so that I could commit to Git on file save, and to SVN on feature release. I think it was inspired by the Eclipse workflow. Definitely not something I would recommend these days.
On the promise of going back in time, I'm finding myself getting more utility out of VS Code's timed snapshots than my own commits.
I find it hard to judge when things are in a good enough state to commit and especially good enough to have a title.
I might start writing a new function, decide that I want it to be a class, only to give up on the class and want to return to my almost-complete function. Snapshots work pretty well for that, but git isn't really centered around snapshots, and doing good snapshots is not straightforward, at least to me.
What do you guys do?
When working together with other people using Git, I commit fast and often. My commit messages can be anything from "jdwqidqwd" to "add widget frubble() method" while I'm working. Sometimes repeated several times over, sometimes I remember to hit the "amend" checkbox. Basically, whenever I'm somewhat satisfied with the state of my program, I commit, finished or not. Everything in a nice, local, separate branch, pushed occasionally to make sure I don't lose any data.
And then when everything works, compress commits into a few big commits with squash, and actually try to merge that back into the main branch.
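A rough sketch of what that workflow looks like on the command line, with made-up branch and message names:

    # work in a throwaway local branch, committing as often as you like
    git switch -c feature/frubble
    git commit -am "wip"                   # messy checkpoint, that's fine
    git commit -am "add widget frubble()"  # another checkpoint
    git push -u origin feature/frubble     # push occasionally as a backup

    # when everything works, fold it into one tidy commit on the main branch
    git switch main
    git merge --squash feature/frubble
    git commit -m "Add widget frubble()"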
> I might start writing a new function, decide that I want it to be a class only to give up the class and wanting to return to my almost complete function.
For me, that would easily be three commits in my dev branch (one with a first implementation of the function, one with a refactor to a class, then another one back to a single function) and when the function is finished, one squashed commit in a merge request. If everything goes right, it's as if the class file was never there.
It has to be said, relying on squashing doesn't work well when you're working in a team that doesn't pay too close attention to merge requests (accidentally merging the many tiny commits). You also have to be careful not to squash over merges (use rebase wherever possible) so your squashed commits don't become huge conflicts during merge trains.
When I work on my own stuff that I don't share, I don't bother squashing and just write tons of tiny commits. Half of them leave the code in a non-compiling state but I don't necessarily care, I use them as reference points before I try something that I'm not sure works.
There is something to be said for carefully picking commit points, though. While finding the source of a bug, git becomes incredibly powerful when you can work git bisect right, and for that you need a combination of granularity and precision. Every commit needs to have fully working code, but every commit should also only contain minimal changes. If you can find that balance, you can find the exact moment a bug was introduced in a program within minutes, even if that program is half a decade old. It rarely works perfectly, but when it does, it's a magical troubleshooting tool.
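For anyone who hasn't driven it before, the bisect loop is roughly this (the tag and test script are made up for illustration):

    git bisect start
    git bisect bad                # the commit you're on has the bug
    git bisect good v1.2.0        # some older ref known to be fine
    # git checks out a commit roughly in the middle; test it, then repeat:
    git bisect good               # ...or `git bisect bad`, until it names the culprit
    git bisect reset              # go back to where you started

    # or let a script answer for you (exit 0 = good, non-zero = bad)
    git bisect run ./run_tests.sh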
A commit is literally a snapshot :) It is also very easy to make.
Stop worrying about titles and content and commit to your heart’s content.
When ready, restructure those snapshots into a coherent story you want to tell others by squashing commits and giving the remaining ones proper titles and commit messages. I use interactive rebase for that, but there are probably other ways too.
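If it helps, the interactive rebase step is just one command; assuming the work lives on a branch off main, something like:

    # reopen every commit on this branch since it left main
    git rebase -i main
    # in the editor that pops up, change `pick` to `squash` (or `fixup`)
    # on the commits you want folded into the one above, then reword the result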
> I find it hard to judge when things are in a good enough state to commit
Work in a feature branch. Commit often. Squash away the junk commits at the end.
> ...and especially good enough to have a title.
Who needs a title? It's perfectly fine to rapid-fire commits with no comment, to create quick save points as you work. Bind to a key in your editor.
I treat commits in a private branch the same as the undo log of the text editor. No one cares about the undo log of your editor as they never see it. The same should be true of your private feature branch commits. They are squashed away never to be seen by human eyes again.
If I had 5 cents for every commit in a feature branch with the commit message "wip"...
I have nothing but fond memories of reading Beej's guides.
It's also this sort of work that's becoming less necessary with AI, for better or worse. This appears to be a crazy good guide, but I bet asking e.g. Claude to teach you about git (specific concepts or generate the whole guide outline and go wide on it) would be at least as good.
Seems more efficient to have one reference book rather than generating entire new 20 chapter books for every person.
I also think if you are at the “don’t know what you don’t know” point of learning a topic it’s very hard to direct an AI to generate comprehensive learning material.
> Seems more efficient to have one reference book rather than generating entire new 20 chapter books for every person.
The main advantage of LLMs is that you can ask specific questions about things that confuse you, which makes iterating to a correct mental model much faster. It's like having your own personal tutor at your beck and call. Good guidebooks attempt to do this statically... anticipate questions and confusions at the right points, and it's a great skill to do this well. But it's still not the same as full interactivity.
I think a mix is the right approach. I’ve used LLMs to learn a variety of topics. I like having a good book to provide structure and a foundation to anchor my learning. Then use LLMs to explore the topics I need more help with.
When it's just a book, I find myself having questions like you mentioned. When it's just LLMs, I feel like I don't have any structure for my mind to hold on to.
I also feel like there is an art to picking the right order to approach learning a topic, which authors are better at than LLMs.
Agree and didn't mean to imply otherwise.
A good book by an expert is still better than LLMs at providing high-level priorities, a roadmap for new territory, and an introduction to the way practitioners think about their subject (though for many subjects LLMs are pretty good at this too). But the LLMs boost a book's effectiveness by being your individualized tutor.
This is a bit of a stretch, but it's a little like distillation, where you are extracting patterns from the vast knowledge of the LLM and inserting them into your brain: where your mental model is incomplete or uncertain, you ask a tutor to fill in the blanks.
Although maybe I'm stretching the analogy too far.
Wow this is really well stated, thanks.
True, although the "don't know what you don't know" aspect is where LLMs will be magic. I envy today's youth for having them (and I'm not that old at all).
I remember fumbling around for ages when I first started coding trying to work out how to save data from my programs. Obviously I wanted a file but 13 year old me took a surprisingly long time to work that out.
Almost impossible to imagine with AI on hand but we will see more slop-merchants.
Definitely more efficient in terms of power consumed, not so in terms of human effort to build such guides across nearly every topic one could think of. But you're right, we shouldn't ignore the power consumption.
I have found that asking AI "You are an expert teacher in X. I'd like to learn about X, where should I start?" is actually wildly effective.
> not so in terms of human effort to build such guides across nearly every topic
How will LLMs be trained if no humans are making learning materials?
Whoever, or whatever, is creating the thing that needs reference materials would have to seed the initial set (just as they/it seeded the thing itself) and then go from there.
If you didn't, then you won't be included in the training set (obviously) and the AI would not easily know about you. Sort of like how, if you start a really cool company but don't make a website, Google doesn't know about you and can't return you in its search results. It's valuable for Google (AI) to know about you, so it's valuable to build the sites (docs) to get indexed (trained on).
This is the sort of work that makes it possible for AI to be useful at all.
I view AI as a challenge. It means I have to really raise the bar. :)
I don't disagree, but since the quality of AI is largely a function of the quality of human content, there's always going to be value in well-written human content. If humans stop producing content, I think the value of AI/LLMs drop significantly as well.
Haven't checked out the article, but I'm sure it's great. Another recommendation is boot.dev's Git course taught by Primeagen. It's interactive, and he goes really deep, down to manipulating files in the .git directory. I came out of that course with a whole new mental model of how Git works.
I feel like a lot of the problems with Git UI come from needing to interact with a "normal" filesystem. I'd much rather have a FUSE/9p mount with all commits, index, etc. available as directories, where they can be read, written, diffed, etc.
I love how there's a section for exiting vim
I am not a git fan. After many years (following use of RCS, SCCS, CVS, SVN) I tried it and found that its whole mental model was weird and awkward. I can get around in it but any complicated merge is just painful.
Anyway, the comment I really wanted to make was that I tried git lfs for the first time. I downloaded 44TB (https://huggingface.co/datasets/HuggingFaceFW/fineweb/tree/m...) over 3-4 days, which was pretty impressive until I noticed that it seems to double the disk space used (90TB total). I did a little reading just to confirm it, and even learned a new term, "git smudge". Double disk space isn't an issue, except when you're using git to download terabytes.
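From that reading, my (possibly imperfect) understanding is that the doubling comes from LFS keeping one copy of each object under .git/lfs/objects while the smudge filter writes a second copy into the working tree at checkout. Something like this can at least avoid materializing everything up front (the URL and path below are placeholders, not the actual dataset layout):

    # clone with LFS pointer files only; no large objects are downloaded yet
    GIT_LFS_SKIP_SMUDGE=1 git clone <repo-url>
    cd <repo>

    # then fetch and check out only the paths you actually need
    git lfs pull --include="some/subset/*"

The objects you do pull still land in .git/lfs as well as the working tree, so the doubling remains for whatever you check out, but you can limit it to a subset.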
Git is absolutely terrible for large files, especially binary files. That's why git LFS rarely ever uses git as a storage mechanism.
I know programmers like everything to be in version control, but AI models and git just aren't compatible.
This guide looks really nice, targeting novice Git users.
I've also written a guide, targeting devs with basic Git experience. It is much shorter; maybe you or your team can benefit from it [1]
[1] https://www.augmentedmind.de/2024/04/07/ultimate-git-guide-f...
I'm really interested and really hoping this is something I can sink my teeth into. I've always had frustrating experiences with trying to wrap my head around git and have to regularly use it at my job.
Branching, making commits, and creating pull requests come easy, but beyond that, I know utterly nothing about it.
One mistake that I see people making with Git is trying to learn more commands, more flags, more tricks, but not trying to really understand how it works. Perhaps that's your case. You know Git well enough to use it on a daily basis, so maybe it's time to dive into the lower level, and then everything else will come naturally.
I strongly suggest reading Pro Git, the official Git book by Scott Chacon and Ben Straub, available for free here: https://git-scm.com/book/en/v2.
I find it very pleasant to read, and it really changed my perspective not only about Git but about how to write code in general. You don't need to read it entirely, but I suggest at least these sections:
- 1.3 Getting Started - What is Git?: explains a little about snapshots and the three states
- 10.1 ~ 10.3 Plumbing and Porcelain, Git Objects and Git References: this explains Git at its lowest level, which is surprisingly simple but powerful. Those sections were enough for me to write my own "Git" (you can see it here: https://github.com/lucasoshiro/oshit)
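If you want a quick taste of that low level before (or instead of) reading the whole chapter, a few plumbing commands go a long way; the hashes below are whatever your own repo gives you:

    # store a blob in the object database and print its hash
    echo 'hello git' | git hash-object -w --stdin

    # inspect any object by hash
    git cat-file -t <hash>    # its type: blob, tree, commit, or tag
    git cat-file -p <hash>    # pretty-print its contents

    # see the tree and parent(s) the current commit points at
    git cat-file -p HEAD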
If you do check it out and there are parts that are confusing, I'd love to hear about it.
This is partially a question and the rest is shameful confession: I had haltingly used cvs as a solo programmer, and when I was suddenly no longer a solo programmer and had to use git, everything went haywire.
I am an Old and we never were taught anything about coding with other people who were also working on the same project. I have had many successful projects but never with another person.
With that as a background, does your guide cover things like:
1) Merging. I was told that merging happens "automagically" and I cannot, for the life of me, understand how a computer program manages to just ... blend two functions or whatever and it "works." Does your guide make sense of this?
2) Apparently there are holy wars (see also vi versus emacs) about the One True Way to ... decide on branches and whatnot. Are there pros and cons laid out anywhere?
3) Everything seems broken down into teensy tiny functions when I look at someone's git repository, just skillions of files all over the place. Is this a git thing, a code repository thing, or simply that, in order for multiple people to work on the same project, everything must be atomized and then reassembled later? What's your opinion?
1) I say it happens automagically. :) I'm not really familiar with merging strategies, but I would imagine that it happens in a way similar to a diff. But it only works if the changes are "distant" from one another. If they are happening too close in the file, Git punts and tells you there's a conflict you have to manually resolve. (There's a small example of what a conflict looks like after my third answer, below.)
2) I try to stay out of holy wars. Use the right tool for the job, I say, and sometimes that could even be Emacs. ;) I do talk about a few of the more common options for branching in teams (mostly focused on the small teams we tend to have here in university). And I talk about their pros and cons. But I stop short of declaring one the winner since the best option depends on the circumstances. IMHO.
3) I've seen repos with big functions and small functions, so I don't think it's a Git thing. Students tend to write functions that do too much and are too large, but it's certainly possible to break things down into pieces that are prohibitively tiny. Overall this seems more of a software engineering design choice than anything Git-related.
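To make the first answer concrete: when Git can't merge cleanly, it leaves both versions in the file between conflict markers, something like this (the code and branch name are invented):

    <<<<<<< HEAD
    total = price * quantity
    =======
    total = price * quantity * (1 - discount)
    >>>>>>> feature/discounts

You edit the file down to the version you actually want, delete the markers, `git add` it, and finish the merge.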
Jumping in on 3: This isn't a git thing, this is a "bad design" thing. I'm thinking it looks like a Git thing because two things happened at the same time: Git got popular right as there was a huge influx of juniors, from a mix of bootcamps and self-teaching, who never learned how to architect their code.
I feel like there is a trick that is missed by many guides (including this one) and most Git GUIs I've looked at (with the notable exception of magit).
That is, to set your upstream branch to the branch you want to merge into, aka the integration branch. So instead of setting upstream of "feature/foo" to "origin/feature/foo", you would set it to "master" or "origin/master".
This simplifies a lot of things. When you run `git status` it will now tell you how far you have diverged from the integration branch, which is useful. When you run `git rebase` (without any arguments), it will just rebase you on to upstream.
Setting `origin/feature/foo` to upstream is less useful. Developers tend to "own" their branches on the remote too, so it's completely irrelevant to know if you've diverged from it and you'll never want to rebase there.
If you set `push.default` to "current", then `git push` will do what you expect too, namely push `feature/foo` to `origin/feature/foo`.
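Concretely, the setup being described is something like this (branch names are just examples):

    # point the feature branch's upstream at the integration branch
    git branch --set-upstream-to=origin/master feature/foo

    # make a bare `git push` still go to origin/feature/foo
    git config push.default current

    # now `git status` reports how far you've diverged from origin/master,
    # and a bare `git rebase` rebases onto it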
Why isn't this a more common setup?
Something like 28 years ago I created my first TCP/IP sockets in C by reading one of Beej's tutorials. Thanks Beej!
I love the Beej guides, and I'm OK with Git, but I'm sad there isn't any really viable hosting for Mercurial any more.
The Beej books are awesome. I remember reading these and watching the 3DBuzz video tutorials when I was first learning how to program.
In case you missed it, also on the front page Tuesday: 20+ anecdotes recommending a newer Git-compatible VCS:
Jujutsu VCS: Introduction and patterns
https://news.ycombinator.com/item?id=42934427
Initial impressions: Looks great.
As a cloud security analyst who is thinking of going back to coding or DevSecOps, if I'm honest with myself there is nothing new here that I haven't seen before... (This is not a criticism or anything. If anything, the problem is me: whether I can allocate time to learn this, or use Anki to retain it.)
I'm a fan of Beej's writing style.
Extremely well-written!
Thanks!
I’ve been using Git for years, but I bet that I’ll learn something from this.
Always been a delighted enjoyer of Beej's guides, especially the Python and networking ones.
Wow 200+ pages, will def have fun sifting thru this one. Thanks!
Why are there so many guides to git?
Because it's challenging to use, and writing effectively about challenging-to-use things is a really fun challenge.
At least, that's why I'm in it. That, and to do my best to help students succeed.
Wake up babe. A new Beej’s guide just dropped.
A new Beej guide? In this economy!?
> GitHub is a web-based front-end to Git.
¯\_(ツ)_/¯
208 mentions of GitHub.
4 mentions of Gitea.
3 mentions of GitLab.
Why is it so biased, and why is it helping to continue teaching people to centralize Git?
Wow, a 200-page guide on Git.
I just tell ChatGPT what I want and it gives me a command to do it.
Wowza a new Beej guide!