How does a computer determine the data type of a byte?

For example, if 10111100 is stored in one particular byte of RAM, how does the computer know whether to interpret this byte as an integer, an ASCII character, or something else? Is type data stored in an adjacent byte? (I don't think so, because that would use twice the space for one byte.)

Perhaps the computer doesn't even know the data type, and only the program using it knows. My guess is that because RAM is RAM (random-access memory) and therefore not read sequentially, a particular program just tells the CPU to fetch the info from a specific address and the program defines how to treat it. This would seem to fit nicely with programming concepts such as the need for type casting.

Am I on the right track?



Answer

Your suspicion is correct: the CPU doesn't care about the meaning of the data. Sometimes, however, it does make a difference. For example, some arithmetic operations produce different results depending on whether the arguments are semantically signed or unsigned. In that case you have to tell the CPU which interpretation you intended.
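To make that concrete, here is a minimal C sketch (my own illustration, not part of the original answer; it assumes the usual two's-complement representation) that interprets the question's bit pattern 10111100 both ways:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint8_t raw = 0xBC;   /* the bit pattern 10111100 from the question */

        /* The same eight bits, read through two different "glasses": */
        printf("as unsigned: %u\n", (unsigned)raw);        /* 188 */
        printf("as signed:   %d\n", (int)(int8_t)raw);     /* -68 */

        /* Comparisons also depend on the chosen interpretation: */
        printf("unsigned > 100? %d\n", (unsigned)raw > 100);   /* 1 (true)  */
        printf("signed   > 100? %d\n", (int8_t)raw   > 100);   /* 0 (false) */
        return 0;
    }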

It is up to the programmer to make sense of the data. The CPU simply obeys its orders, blissfully unaware of their meaning or purpose.


Answer

As others have already answered, today’s common CPUs do not know what a given memory position contains; the software decides.

However, there are other possibilities. Lisp Machines for example used a tagged architecture which stored the type of each memory position; that way the hardware itself could do some of the work of high-level languages.

And even now, I guess you could consider the NX bit in Intel, AMD, ARM and other architectures to follow the same principle: distinguish at the hardware level whether a given memory zone contains data or instructions.

Also, just for completeness, in Harvard architectures (like some microcontrollers) data and instructions are physically separated, so the CPU does have some idea of what it is reading.

In this Quora question there’s some commentary on how the tagged memory worked, its performance implications and demise, and more.


Answer

Yes. The program just gets a byte from memory and it can interpret it however it wants.
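For instance (my own sketch, not part of this answer), the same byte printed two different ways in C:

    #include <stdio.h>

    int main(void) {
        unsigned char byte = 0x41;   /* the bit pattern 01000001 */

        printf("interpreted as an integer:         %d\n", byte);   /* 65  */
        printf("interpreted as an ASCII character: %c\n", byte);   /* 'A' */
        return 0;
    }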


Answer

There are no type annotations.
RAM stores pure data, and then the program defines what to do with it.

With CPU registers it is a bit different: if you have registers of a given type (like the FPU registers), you are the one saying what is inside them.
Operations on floating-point registers explicitly use typed data; you or your compiler decide what gets put there and when, so you don't have quite the same freedom.
The computer makes no assumptions about the underlying data in RAM, and the same holds for registers with one exception: typed registers in the CPU are of a known type and are optimized to deal with it. This only shows that there are places where the data is expected to be of a certain type, but nothing stops you from casting strings to floats and multiplying them.

In programming languages you specify the type, or in higher-level languages the data is kept general and the compiler / interpreter / VM encodes what is inside, at some overhead.
For example, in C the type of a pointer tells what to do with the data it points to and how to access it.

Of course you can read a string (characters) and treat it as floating-point values or integers, and mix them.
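As a rough illustration of that last point (my own sketch, not part of the original answer; it assumes a 4-byte float), the bytes of a string reused as a floating-point number:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Four ordinary characters... */
        char text[4] = { 'A', 'B', 'C', 'D' };

        /* ...copied bit-for-bit into a float (assumes sizeof(float) == 4). */
        float f;
        memcpy(&f, text, sizeof f);

        printf("as characters: %.4s\n", text);   /* ABCD */
        printf("as a float:    %g\n", f);        /* a valid float; which one depends on byte order */
        return 0;
    }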


Answer

The CPU doesn’t care: it executes assembly code, which merely moves data around, shifts it, adds it, or multiplies it…

Data types are a higher-level language concept: in C or C++ you need to specify a type for every single piece of data you manipulate; the C/C++ compiler takes care of transforming those typed pieces of data into the right commands for the CPU to process (compilers write assembly code).
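A small C sketch of this (my own illustration, not from the answer): the declared types, not the data itself, determine which operations the compiler asks the CPU to perform.

    #include <stdio.h>

    int main(void) {
        /* The same "/" operator in the source, but the declared types tell the
           compiler which kind of divide instruction to ask the CPU for. */
        int    a = 7, b = 2;
        double x = 7, y = 2;

        printf("int    division: %d\n", a / b);   /* 3   */
        printf("double division: %g\n", x / y);   /* 3.5 */
        return 0;
    }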

In some even higher-level languages, types may be inferred: in Python or JavaScript, for example, one does not have to specify data types, yet data still has a type, and you can’t add a string to an integer even though you can add a float to an integer. The ‘compiler’ in JavaScript’s case is a JIT (just-in-time) compiler; JavaScript is often called an ‘interpreted’ language because historically browsers interpreted JavaScript code, but nowadays JavaScript engines are compilers.

Code always ends up being compiled to machine code, but obviously the machine-code format depends on the machine you’re targeting (x86 64-bit code won’t work on an x86 32-bit machine or an ARM processor, for example).

So there are actually a lot of layers involved in running interpreted code.

Java and C# are other interesting ones: Java or C# code is technically ‘compiled’, but to bytecode rather than machine code, and that bytecode is then interpreted by a runtime (the Java Runtime in Java’s case) that is specific to the underlying hardware; one needs to install the JRE targeting the right machine to run Java binaries (JARs).


Answer

Datatypes are not a hardware feature. The CPU knows a couple (well, a lot) of different commands. Those are called the instruction set of a CPU.

One of the best-known ones is the x86 instruction set. If you search for “multiply” on this page, you get 50 results: MULPD and MULSD for the multiplication of doubles, FIMUL for integer multiplication, …

Those commands work on registers. Registers are memory slots which can contain a fixed number of bits (often 32 or 64, depending on which architecture your CPU uses), no matter what these bits represent.
Hence each CPU instruction interprets the values of the registers in its own way, but the values themselves don’t have types.
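As a C-level sketch of the same idea (my own example, not from the answer; it assumes the usual IEEE 754 64-bit double), the choice of operation, not the bits, supplies the type:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        /* One 64-bit pattern; by itself it has no type. */
        uint64_t bits = 0x4010000000000000ULL;

        double d;
        memcpy(&d, &bits, sizeof d);   /* the same bits viewed as an IEEE 754 double (4.0) */

        /* Choosing an integer add or a floating-point add is what gives
           "add the register to itself" its meaning. */
        printf("integer add:        %llu\n", (unsigned long long)(bits + bits));
        printf("floating-point add: %g\n", d + d);   /* 8 */
        return 0;
    }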

An example of this was given at PyCon 2017 by Stuart Williams.


Answer

…that a particular program just tells the CPU to fetch the info from a specific address and the program defines how to treat it.

Exactly. And RAM is indeed not read “sequentially”: it stands for Random Access Memory, which is exactly the opposite.

Besides not knowing what a byte means, you don’t even know whether it is a byte at all, or just a fragment of a larger item such as a floating-point number.

I’d like to add to other answers by giving some specific examples.

Consider 01000001. The program might copy it from one place to another as part of a large parcel of data without any regard to its meaning. But copying that to the address used by the text-mode video buffer will cause the letter A to show in some position on the screen. The exact same action when the card is in a CGA graphics mode will display a red pixel and a blue pixel.

In a register, it could be the number 65 as an integer. Doing arithmetic to set the 32’s bit could mean anything without context, but might specifically be changing a letter to lower case.
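A minimal C sketch of that last remark (my own example, assuming ASCII):

    #include <stdio.h>

    int main(void) {
        unsigned char c = 65;   /* in a register this is just the bits 01000001 */

        /* To the CPU this is plain bit arithmetic; in ASCII, setting the 32's
           bit (0x20) happens to turn an upper-case letter into lower case. */
        printf("%c -> %c\n", c, c | 0x20);   /* prints: A -> a */
        return 0;
    }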

The 8086 CPU (still) has a special instruction called DAA that is used when a register holds two packed decimal digits, so if you use that instruction you are interpreting the byte as the two digits 4 and 1, that is, 41.
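To illustrate the packed-BCD view (my own sketch, not from the answer):

    #include <stdio.h>

    int main(void) {
        unsigned char b = 0x41;   /* 01000001 again */

        /* Read the same byte as two packed BCD (binary-coded decimal) digits,
           which is roughly the view that instructions like DAA assume. */
        int tens = b >> 4;        /* 4 */
        int ones = b & 0x0F;      /* 1 */
        printf("as packed BCD: %d%d\n", tens, ones);   /* 41 */
        return 0;
    }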

Programs crash when a memory word is read as though it were a pointer even though something else was stored there.

When you inspect memory with a debugger, a symbol map is used to guide the interpretation for display. Without that symbol information, a low-level debugger lets you specify the interpretation yourself: show this address as 16-bit words, show this address as long floating point, as strings… whatever. Puzzling out a network packet dump or an unknown file format is a challenge for exactly this reason.
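As a rough sketch of that kind of re-interpretation (my own example, not from the answer; the word and double views depend on the machine’s byte order):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        /* Eight bytes with no inherent meaning. */
        unsigned char mem[8] = { 'H', 'i', ' ', 't', 'h', 'e', 'r', 'e' };

        uint16_t words[4];
        double   d;
        memcpy(words, mem, sizeof words);   /* view 1: four 16-bit words */
        memcpy(&d,    mem, sizeof d);       /* view 2: one 64-bit double */

        printf("as 16-bit words: %04x %04x %04x %04x\n",
               words[0], words[1], words[2], words[3]);
        printf("as a double:     %g\n", d);
        printf("as characters:   %.8s\n", (const char *)mem);   /* view 3: text */
        return 0;
    }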

That’s a major source of power and flexibility in modern computer architecture: a memory cell can mean anything, data or instruction, its meaning implicit only in what it “means” to the program, by what the program does with the value and how that affects subsequent operations. Meaning goes deeper than integer width: are these characters in ASCII or EBCDIC? Do they form words in English or SKU product codes? Is this the address to send to or the return address it came from? The lowest-level interpretation (logical bits; integer-like, signed or unsigned; float; BCD; pointer) is contextual at the instruction-set level, but you see that it’s all context at some level: the “to” address is what it is because of the position where it’s printed on the envelope. That is contextual to the rules of the postman, not the CPU. The context is one big continuum, with bits at one end of it.


※ Footnote: The DAA instruction is encoded as the byte 00100111. So that byte is the aforementioned instruction if read in the instruction stream, the digits 27 if interpreted as BCD digits, and 0x27 = 39 as an integer, which is the apostrophe character in ASCII, and part of the interrupt table (half of INT 13 2-byte address, used for BIOS service routines).