Intro

Not long ago I posed a challenge for those of us learning rust: https://lemmy.ml/post/12478167.

Basically write an equivalent of git diff --no-index A B … a file differ.

While it’s never too late to attempt it, I figured now would be a good time to check in: partly because some people may have forgotten about it and would still like to have a shot, and partly because I had a shot myself and am happy with what I wrote.

Check In

I’ll post where I got up to below (probably as a comment), but before that: does anyone have anything to share about where they got up to, or any general thoughts on the challenge and on the general idea of these challenges?

My experience

My personal experience was that I’d not kept up with my Rust “studies” for a bit and used this as a “warm up” or “restart” exercise, and it worked really well. Obviously learning through doing is a good idea, and the Rust Book is, IMO, a bit too light on good exercises or similar activities. But I found this challenge just difficult enough to make me feel more comfortable with the language.

Future Challenges

Any ideas for future challenges??

My quick thoughts

  • A simple web app, like a todo app, using actix_web and diesel and some templating crate.
  • Extend my diffing program to output JSON/HTML and then to diff by characters in a string
  • A markdown parser??
    • maegul (he/they)@lemmy.ml (OP)

      I didn’t know about John Crickett! Thanks!! For those that don’t know: their home page … and their recommended challenges for rust.

      Yea, that seems like a good resource for sure. There’s been talk of running through Advent of Code here, but no one has picked it up yet.

      The trick with such things, AFAICT, is to not get too stuck in the details of the abstract problem, but instead to find a problem that’s solvable enough that you can flex your Rust muscles. I personally found this diff challenge well balanced in that regard, but I can imagine it might be too hard or too easy for others. Some of Crickett’s problems seem like they might be too involved, at least for some, so some curation would make sense.

      In the end, the reason I ask people if they have any ideas is that writing something you want to write is likely to work well. I wanted to write a diff, so I did, and my quick thoughts in the OP are also things I’d be happy to try. So finding challenges that are interesting to people, from Crickett’s list or anywhere else, would probably be best.

      From the recommended list of challenges for Rust (linked above), the Redis server and git client strike me as appropriate for here … ?

  • maegul (he/they)@lemmy.ml (OP)

    Please provide any feedback or thoughts on the approach and the posted code below!

    My Approach

    So my approach was a straightforward one (no real algorithm), implemented basically as a state machine. I knew ahead of time that using match statements in Rust was likely a good approach, so I first prototyped the idea in Python and debugged a bit there before implementing it in Rust.

    My approach was basically to go through both files simultaneously, line by line, looking for where lines matched or differed. When consecutive lines matched/differed in the same way, they were grouped into “chunks”, each of a specific kind depending on the nature of the matching/differing. The key piece to sort out was how to detect the various ways in which the two files could differ from each other (eg, addition, deletion, modification).

    I settled on this idea, which seems to work: a particular kind of changed-chunk is identified by (and ends with) how its final line matches one of the initial lines of the changed-chunk. The diagram below illustrates.

    Beyond this the rest was sorting out the various states the process can enter and how to move between them.

    Linked Lines are found to match or be identical
    
    Addition:
    
     A           B
    
     -           -
    |-| <-\     |-| <--- highlighted lines = current chunk of lines identified as changed
    |-|    \    |-|
     -      \--> -  <--- Current line
    
    Deletion:
    
     -           -
    |-|     /-> |-|
    |-|    /    |-|
     -  <-/      -
    
    Modification:
    
     -           -
    |-|         |-|
    |-|         |-|
     -  <------> -
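
    For reference, here is a minimal, self-contained sketch of that chunk-ending rule in isolation. The function and variable names are purely illustrative (they aren’t the ones used in the full program posted below):

    enum ChunkKind { Addition, Deletion, Modification }

    // Decide whether the current pair of lines closes the in-progress changed chunk,
    // by comparing against the chunk's first pair of lines on the *opposite* side.
    fn chunk_end(
        a: &[&str], b: &[&str],
        chunk_start: (usize, usize), current: (usize, usize),
    ) -> Option<ChunkKind> {
        let (start_a, start_b) = chunk_start;
        let (cur_a, cur_b) = current;
        if a[start_a] == b[cur_b] {
            Some(ChunkKind::Addition)      // B gained lines: A's first changed line reappears in B
        } else if a[cur_a] == b[start_b] {
            Some(ChunkKind::Deletion)      // A lost lines: B's first changed line reappears in A
        } else if a[cur_a] == b[cur_b] {
            Some(ChunkKind::Modification)  // both sides line up again on the same line
        } else {
            None                           // still inside the changed chunk
        }
    }

    fn main() {
        let a = ["x", "y", "z"];
        let b = ["x", "new", "y", "z"];
        // A changed chunk started at (1, 1); at the current pair (1, 2),
        // a[1] ("y") == b[2] ("y"), so the chunk closes as an Addition.
        assert!(matches!(chunk_end(&a, &b, (1, 1), (1, 2)), Some(ChunkKind::Addition)));
    }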
    

    Rust programming

    As for the details of using rust for this, I stumbled on a state machine pattern:

    let mut state = State::Start;
    loop {
        state = match state {
            ...
            State::Finished => break
        }
    }
    

    … which I found pretty nice.

    I also found using an enum for the state variable along with associating data with each state a useful, flexible and elegant way to go. The program basically becomes just a single enum and the looping pattern above, along with a few data structures and some methods. Is this normal/idiomatic for rust??
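
    For anyone curious, here’s a minimal, self-contained sketch of that enum-with-data pattern, with made-up states (not the ones from the diff program), just to show the shape of it:

    // Each state carries exactly the data it needs; the loop hands that data
    // from one state to the next by value.
    enum State {
        Counting { seen: usize },
        Finished { total: usize },
    }

    fn main() {
        let mut state = State::Counting { seen: 0 };
        let total = loop {
            state = match state {
                State::Counting { seen } if seen < 3 => State::Counting { seen: seen + 1 },
                State::Counting { seen } => State::Finished { total: seen },
                State::Finished { total } => break total,
            };
        };
        println!("counted up to {total}");
    }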

    The only hiccough I ran into, apart from not knowing how to do things like read standard input or a file, was thinking I could trivially match on two variables at a time. I wrote about that recently here. Once the compiler got angry at me about that, I got worried that I didn’t know what I was doing at all, but AFAICT it was a silly idea in the first place and best avoided for the sort of thing I was trying to do.

    Another minor issue I ran into was that I was tracking line numbers throughout the program, which were used as indices into a vector of strings; I wanted both to do arithmetic on these numbers and to compare them against the lengths of the vectors of lines. I settled on sticking with usize for all such numbers, which required some conversions … but writing this now, I think I was just scared and should have converted and stored the vector lengths as a default i32 from the beginning and made things easier … oh well.
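
    For what it’s worth, a small sketch (hypothetical names, nothing from the program below) of how the usize route can avoid most of the conversions: comparisons against the vector lengths stay usize-to-usize, and checked_sub covers the one spot where subtraction could go below zero.

    fn step_back(cursor: usize, by: usize) -> usize {
        // panic with a clear message instead of silently wrapping around
        cursor.checked_sub(by).expect("cursor went negative")
    }

    fn main() {
        let lines = vec!["a", "b", "c"];
        let cursor: usize = 2;
        assert!(cursor < lines.len());   // both sides are usize: no i32 round-trip needed
        assert_eq!(step_back(cursor, 1), 1);
    }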

    I’ll post the code in a separate comment below to keep things clean.

    • Jayjader

      I also found using an enum for the state variable along with associating data with each state a useful, flexible and elegant way to go. The program basically becomes just a single enum and the looping pattern above, along with a few data structures and some methods. Is this normal/idiomatic for rust??

      In my experience, this is the sweet spot for Rust programming. If you can formulate your approach as a state machine like this, then you almost always should - especially when writing in Rust.

      The only time I’d pass on the pattern is if stuffing all of the ambient state of a problem into the different states of a state machine, just to make the giant loop work correctly, introduces too much cognitive complexity (aka makes the code too hard to read / follow).

      • maegul (he/they)@lemmy.ml (OP)

        Thanks for the feedback! What you say tracks exactly with my feelings after writing this.

        I personally did encounter enough issues with the borrow checker while writing this, largely because I’m not on top of ownership and the language enough, that I’m still somewhat wary of moving off of clean data-flow patterns like this state machine one.

        Maybe I should try re-writing it using “dumb” globals and a procedural/imperative style and see how I go?

        What would you be reaching for, in terms of patterns/approaches, when you know your data flow is too messy for a state machine?

        • Jayjader

          What would you be reaching for, in terms of patterns/approaches, when you know your data flow is too messy for a state machine?

          Bare if and loop expressions. Basically what you described as “dumb globals and a procedural/imperative style”. Of course, defining some custom types is almost always useful, if not necessary, for “properly” handling ownership.

    • maegul (he/they)@lemmy.ml (OP)

      Code

      use std::env;
      use std::fs::read_to_string;
      
      // > Data Types
      
      #[derive(Debug)]
      struct FileLens {
          a: usize,
          b: usize
      }
      impl FileLens {
          fn new(a_lines: &Vec<&str>, b_lines: &Vec<&str>) -> FileLens {
              FileLens{
                  a: a_lines.len(),
                  b: b_lines.len()
              }
          }
      }
      
      
      #[derive(Debug)]
      struct Cursors {
          a: usize,
          b: usize,
      }
      impl Cursors {
          fn increment_cursor(&self, chunk: Option<&Chunk>) -> Cursors {
              let (a_inc, b_inc) = match chunk {
                  Some(chnk) => {
                      match chnk.chunk_type {
                          ChunkType::Addition => (
                              -1 * (i32::try_from(chnk.b_lines.len())
                                  .expect("Lines vector too big for i32!")),
                              0,
                          ),
                          ChunkType::Deletion => (
                              0,
                              -1 * (i32::try_from(chnk.a_lines.len())
                                  .expect("Lines vector too big for i32!")),
                          ),
                          ChunkType::Match | ChunkType::Modification => (1, 1),
                      }
                  }
                  None => (1, 1),
              };
      
              let new_a_cursor = (i32::try_from(self.a).expect("oops")) + a_inc;
              let new_b_cursor = (i32::try_from(self.b).expect("oops")) + b_inc;
      
              // negative cursors should be captured here (despite bad error msg)
              Cursors {
                  a: usize::try_from(new_a_cursor).expect("oops"),
                  b: usize::try_from(new_b_cursor).expect("oops"),
              }
          }
      
          fn clone(&self) -> Cursors {
              Cursors{a: self.a, b: self.b}
          }
      }
      
      #[derive(Debug)]
      enum ChunkType {
          Match,
          Addition,
          Deletion,
          Modification,
      }
      #[derive(Debug)]
      struct Chunk {
          chunk_type: ChunkType,
          a_lines: Vec<String>,
          b_lines: Vec<String>,
          cursor: Cursors,
      }
      #[derive(Debug)]
      #[derive(Clone)]
      enum FileSpec {
          A,
          B,
          AB
      }
      #[derive(Debug)]
      enum CompType {
          Match,
          Change,
          FileEnd(FileSpec)
      }
      // No Union for LineComp and FileSpec, so fold together
      #[derive(Debug)]
      struct LineComp {
          // don't store the string, as copying/borrowing gets involved, instead the cursor
          // ... we can get all the strings at the end when necessary
          cursor: Cursors,
          kind: CompType
      }
      
      
      // > States
      
      #[derive(Debug)]
      struct ChunkData {
          lines: Vec<LineComp>,
          cursor: Cursors,
          start_cursor: Cursors
      }
      
      impl ChunkData {
          fn set_cursor(&mut self, cursor: Cursors) {
              self.cursor = cursor;
          }
      }
      
      enum State {
          NewChunk {cursor: Cursors},
          NewLine {chunk_data: ChunkData},
          ContinuingChunk {chunk_data: ChunkData, line_read: LineComp},
          ContinuingChangeChunk {chunk_data: ChunkData, line_read: LineComp},
          EndingChunk {chunk_data: ChunkData, line_read: LineComp},
          EndingChangedChunk {chunk_data: ChunkData, chunk_type: ChunkType},
          // hmmm, LineComp type is actually LineComp with FileEnd as kind ... how type?
          FileEnd {chunk_data: ChunkData, line_read: FileSpec},
          FileEndChange {chunk_data: ChunkData},
          End
      }
      
      
      fn read_line(
          cursor: &Cursors, file_lens: &FileLens,
          a_lines: &Vec<&str>, b_lines: &Vec<&str>) -> LineComp {
      
          let a_end = ! (cursor.a < file_lens.a);
          let b_end = ! (cursor.b < file_lens.b);
      
          match (a_end, b_end) {
              (false, false) => {
                  if a_lines[cursor.a] == b_lines[cursor.b] {
                      LineComp{cursor: cursor.clone(), kind: CompType::Match}
                  }
                  else {
                      LineComp{cursor: cursor.clone(), kind: CompType::Change}
                  }
              },
              (false, true) => LineComp{cursor: cursor.clone(), kind: CompType::FileEnd(FileSpec::B)},
              (true, false) => LineComp{cursor: cursor.clone(), kind: CompType::FileEnd(FileSpec::A)},
              (true, true) => LineComp{cursor: cursor.clone(), kind: CompType::FileEnd(FileSpec::AB)},
          }
      
      }
      
      ...
      
      
      • maegul (he/they)@lemmy.ml (OP)

        … continued

        fn main() {
            // > Getting args
            let args: Vec<String> = env::args().collect();
        
            if args[1..].len() != 2 {
                panic!(
                    "Must provide two paths. Instead provided {:}",
                    args[1..].len()
                );
            }
        
            println!("Args:");
            for a in args[1..].iter() {
                println!("{}", a);
            }
        
            // > Reading files and splitting into lines
        
            let a = read_to_string(&args[1]).expect("Failed to read file");
            let b = read_to_string(&args[2]).expect("Failed to read file");
            let a_lines: Vec<&str> = a.split("\n").collect();
            let b_lines: Vec<&str> = b.split("\n").collect();
        
            // > Initialising globals
        
            let file_lengths = FileLens::new(&a_lines, &b_lines);
        
            let cursor = Cursors { a: 0, b: 0 };
            let mut chunks: Vec<Chunk> = vec![];
        
            // mut necessary as overwriting in each loop
            let mut state: State = State::NewChunk {cursor};
        
            // > Loop
            loop {
                state = match state {
                    State::NewChunk { cursor } => State::NewLine {
                        chunk_data: ChunkData{
                            lines: Vec::new(), cursor: cursor.clone(), start_cursor: cursor}
                    },
                    State::NewLine { chunk_data } => {
                        let line_read = read_line(&chunk_data.cursor, &file_lengths, &a_lines, &b_lines);
        
                        match chunk_data.lines.as_slice() {
                            [] => {
                                match line_read.kind {
                                    CompType::Match => State::ContinuingChunk {
                                        chunk_data, line_read },
                                    CompType::Change => State::ContinuingChunk {
                                        chunk_data, line_read },
                                    CompType::FileEnd(file_spec) => State::FileEnd {
                                        chunk_data, line_read: file_spec},
                                }
                            },
                            [.., lc] => {
                                match lc.kind {
                                    CompType::Match => {
                                        match line_read.kind {
                                            CompType::Match => State::ContinuingChunk {
                                                chunk_data, line_read },
                                            CompType::Change => State::EndingChunk {
                                                chunk_data, line_read },
                                            CompType::FileEnd(file_spec) => State::FileEnd {
                                                chunk_data, line_read: file_spec},
                                        }
                                    }
                                    CompType::Change => {
                                        match line_read.kind {
                                            CompType::Match => State::EndingChunk {
                                                chunk_data, line_read },
                                            CompType::Change => State::ContinuingChangeChunk {
                                                chunk_data, line_read },
                                            CompType::FileEnd(_) => State::FileEndChange {
                                                chunk_data},
                                        }
                                    }
                                    CompType::FileEnd(_) => panic!(
                                    // error! should not have come here from FileEnd
                                        "Failed to process file end correctly (failed at lines a:{},b:{})",
                                        line_read.cursor.a, line_read.cursor.b),
                                }
                            }
                        }
                    },
                    State::ContinuingChunk { mut chunk_data, line_read } => {
                        chunk_data.lines.push(line_read);
                        let new_cursor = chunk_data.cursor.increment_cursor(None);
                        chunk_data.set_cursor(new_cursor);
                        State::NewLine{chunk_data}
                    },
                    State::ContinuingChangeChunk { chunk_data, line_read } => {
                        let first_lc = chunk_data.lines.first().unwrap();
                        if a_lines[first_lc.cursor.a] == b_lines[line_read.cursor.b] {
                            State::EndingChangedChunk{
                                chunk_data, chunk_type: ChunkType::Addition
                            }
                        }
                        else if a_lines[line_read.cursor.a] == b_lines[first_lc.cursor.b] {
                            State::EndingChangedChunk{
                                chunk_data, chunk_type: ChunkType::Deletion
                            }
                        }
                        else {
                            State::ContinuingChunk { chunk_data, line_read }
                        }
                    },
                    State::EndingChunk { chunk_data, line_read } => {
                        let chunk_type = match line_read.kind {
                            CompType::Match => ChunkType::Modification,
                            CompType::Change => ChunkType::Match,
                            CompType::FileEnd(_) => panic!(
                                    // error! should not have come here from FileEnd
                                    "Failed to process file end correctly (failed at lines a:{},b:{})",
                                    line_read.cursor.a, line_read.cursor.b
                                )
                        };
                        let new_a_lines: Vec<String> = chunk_data.lines.iter()
                                                        .map(|lc| String::from(a_lines[lc.cursor.a]))
                                                        .collect();
                        let new_b_lines: Vec<String> = chunk_data.lines.iter()
                                                        .map(|lc| String::from(b_lines[lc.cursor.b]))
                                                        .collect();
                        chunks.push(
                            Chunk{
                                chunk_type,
                                a_lines: new_a_lines,
                                b_lines: new_b_lines,
                                cursor: chunk_data.start_cursor,
                            }
                        );
        
                        // continue from last read line, but with a new chunk
                        // ... repetitive yes, but cleaner code I think
                        State::NewChunk { cursor: line_read.cursor }
                    },
                    State::EndingChangedChunk { chunk_data, chunk_type } => {
        
                        let new_a_lines: Vec<String> = chunk_data.lines.iter()
                                                        .map(|lc| String::from(a_lines[lc.cursor.a]))
                                                        .collect();
                        let new_b_lines: Vec<String> = chunk_data.lines.iter()
                                                        .map(|lc| String::from(b_lines[lc.cursor.b]))
                                                        .collect();
                        chunks.push(
                            Chunk{
                                chunk_type,
                                a_lines: new_a_lines,
                                b_lines: new_b_lines,
                                cursor: chunk_data.start_cursor,
                            }
                        );
                        let new_cursor = chunk_data.cursor.increment_cursor(Some(chunks.last().unwrap()));
        
                        State::NewChunk { cursor: new_cursor }
                    },
                    State::FileEnd { chunk_data, line_read} => {
                        match line_read {
                            FileSpec::A => {
                                chunks.push(
                                    Chunk{
                                        chunk_type: ChunkType::Addition,
                                        a_lines: vec![],
                                        b_lines: b_lines[chunk_data.cursor.b..].iter()
                                                    .map(|s| String::from(*s)).collect(),
                                        cursor: chunk_data.cursor,
                                    }
                                );
                                State::End
                            },
                            FileSpec::B => {
                                chunks.push(
                                    Chunk{
                                        chunk_type: ChunkType::Deletion,
                                        a_lines: a_lines[chunk_data.cursor.a..].iter()
                                                    .map(|s| String::from(*s)).collect(),
                                        b_lines: vec![],
                                        cursor: chunk_data.cursor,
                                    }
                                );
                                State::End
                            },
                            FileSpec::AB => State::End,
                        }
                    },
                    State::FileEndChange { chunk_data } => {
                        let a_cursor = chunk_data.start_cursor.a.clone();
                        let b_cursor = chunk_data.start_cursor.b.clone();
                        chunks.push(
                            Chunk{
                                chunk_type: ChunkType::Deletion,
                                a_lines: a_lines[a_cursor..].iter()
                                            .map(|s| String::from(*s)).collect(),
                                b_lines: vec![],
                                cursor: chunk_data.start_cursor,
                            }
                        );
                        chunks.push(
                            Chunk{
                                chunk_type: ChunkType::Addition,
                                a_lines: vec![],
                                b_lines: b_lines[b_cursor..].iter()
                                            .map(|s| String::from(*s)).collect(),
                                cursor: chunk_data.cursor,
                            }
                        );
                        State::End
                    },
                    State::End => break,
                };
        
            }
        
            // > Wrap up
        
            println!("Done!");
        
            for (i,c) in chunks.iter().enumerate() {
                println!("\n--- Chunk: {} ---", i);
                println!("Type: {:?}", c.chunk_type);
                println!("A: {:?}", c.a_lines);
                println!("B: {:?}", c.b_lines);
            }
        }
        
        
  • Jayjader

    A markdown parser??

    This is a great challenge to get into writing parsers, and/or try out the excellent nom crate.
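
    To give a flavour of the combinator style, here’s a rough, untested sketch of a tiny nom parser for an ATX-style markdown heading (assuming nom’s complete combinators; the function and names are mine, purely for illustration):

    use nom::{
        bytes::complete::take_while1,
        character::complete::{char, not_line_ending},
        sequence::preceded,
        IResult,
    };

    // Parse a heading like "## Hello" into (level, text).
    fn heading(input: &str) -> IResult<&str, (usize, &str)> {
        let (rest, hashes) = take_while1(|c| c == '#')(input)?;
        let (rest, text) = preceded(char(' '), not_line_ending)(rest)?;
        Ok((rest, (hashes.len(), text)))
    }

    fn main() {
        assert_eq!(heading("## Hello\n"), Ok(("\n", (2, "Hello"))));
    }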

    • maegul (he/they)@lemmy.ml (OP)

      Thanks! I was just gonna “state machine” it, but nom looks like fun too! I hadn’t seen a parsing library/approach like that before (“combinator”?)

        • Jayjader

        If I’m not mistaken, it’s how they teach parsing in schools. But even then, they have you figure out the exact combinators that you need first, on paper, and then implement them more-or-less entirely by hand. So you wouldn’t encounter a combinator library such as this one.

        I’m aware of few combinator libraries, mostly because they really require functions-as-first-class-citizens to be supported by whatever language you’re using in the first place. I wouldn’t be surprised if languages like Haskell, OCaml, and F# have many.

  • Jayjader

    An idea for a future challenge: use actix_web (for example) to make a web interface for your diff tool.

    Notably, this gives you an excuse to try dealing with refactoring existing code into separate modules - albeit this can be greatly trivialized with a sufficiently powerful IDE. I don’t know what you’ve been using so far.

    Dealing with file uploads can provide an interesting change from the “classic” todo CRUD as well, if one is tired of that. Not to mention the diff output/result is similarly a bit more interesting data to communicate over HTTP.

    This might be more appropriate to attempt once the first two challenges you listed are tackled (todo web app & JSON/HTML output for the diff).
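
    To illustrate the shape of it, a minimal actix_web endpoint that accepts the two texts and could hand them to the existing diff logic might look roughly like this (the route, handler and request struct are my own invention for illustration; assumes actix_web 4 and serde with the derive feature):

    use actix_web::{post, web, App, HttpServer, Responder};
    use serde::Deserialize;

    #[derive(Deserialize)]
    struct DiffRequest {
        a: String,
        b: String,
    }

    #[post("/diff")]
    async fn diff(req: web::Json<DiffRequest>) -> impl Responder {
        // hypothetical: call into the diff program's chunking logic here and
        // render the chunks as JSON or HTML instead of this placeholder
        format!(
            "a: {} lines, b: {} lines",
            req.a.lines().count(),
            req.b.lines().count()
        )
    }

    #[actix_web::main]
    async fn main() -> std::io::Result<()> {
        HttpServer::new(|| App::new().service(diff))
            .bind(("127.0.0.1", 8080))?
            .run()
            .await
    }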

    • maegul (he/they)@lemmy.ml (OP)

      An idea for a future challenge: use actix_web (for example) to make a web interface for your diff tool.

      Nice idea (I honestly hadn’t thought of that)! Kinda keen on this!

      Notably, this gives you an excuse to try dealing with refactoring existing code into separate modules - albeit this can be greatly trivialized with a sufficiently powerful IDE. I don’t know what you’ve been using so far.

      I’m just using rust-analyzer (inside of Sublime), so nothing really powerful. What are you using/recommending? … I realise I’m not aware of any “go-to” Rust IDE. Does IntelliJ have a Rust IDE, or do people just use CLion?

      • Jayjader

        As of last November, JetBrains has released their new Rust-flavored variant of IntelliJ, named RustRover, into their Early Access Program.

        I have found it very pleasant, from the debugger working as expected to being able to tell it to use cargo clippy instead of cargo check for code hints and warnings.